The first step to reliable agents

Eval (Evaluate agent responses) is the entry point to Rippletide. Before adding memory or decision runtime, start by testing what your agent already does. Rippletide evaluates your agent’s responses against expected answers, detects hallucinations, and gives you a clear pass/fail report. Once you know where your agent stands, you can move to the Context Graph for persistent memory and Decision runtime for deterministic decision-making.

How It Works

1. Define expected Q&A pairs: Provide questions and their expected answers in a qanda.json file, a Pinecone index, or a PostgreSQL database.
2. Send questions to your agent: Rippletide sends each question to your agent’s endpoint and collects the response.
3. Compare and score: Each response is compared against the expected answer. The evaluation engine checks for factual accuracy, hallucinations, and completeness.
4. View results: Get a summary with total tests, pass/fail count, duration, and a link to the detailed dashboard at trust.rippletide.com.

Evaluation Criteria

Each agent response is evaluated on:
| Criterion | What it checks |
| --- | --- |
| Factual accuracy | Does the response match the expected answer’s facts? |
| Hallucination detection | Does the response contain information not present in the knowledge base? |
| Completeness | Does the response cover all key points from the expected answer? |
A response passes when it is factually accurate and free of hallucinations. It fails when it contains fabricated information or contradicts the expected answer.
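The pass/fail rule above can be expressed directly. This is a hedged sketch of the stated rule, not Rippletide's implementation: `facts` follows the report shape shown below (a list of facts labeled correct or hallucination), and `contradicts_expected` is a hypothetical flag for contradictions with the expected answer.

```python
def verdict(facts, contradicts_expected=False):
    """Sketch of the stated rule: a response passes when it is factually
    accurate and free of hallucinations; it fails when it contains
    fabricated information or contradicts the expected answer."""
    hallucinated = any(f["label"] == "hallucination" for f in facts)
    return "fail" if hallucinated or contradicts_expected else "pass"
```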

Evaluation report

After evaluation, each response gets:
  • A label (pass or fail)
  • A justification explaining the verdict
  • A list of facts extracted from the response, each labeled as correct or hallucinated
{
  "label": "fail",
  "justification": "The response claims free shipping on all orders, which contradicts the knowledge base.",
  "facts": [
    { "fact": "Returns accepted within 30 days", "label": "correct" },
    { "fact": "Free shipping on all orders", "label": "hallucination" }
  ]
}
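A report in this shape is easy to post-process. For example, a small helper (hypothetical, not part of any Rippletide SDK) can pull out just the facts the evaluation flagged as hallucinated:

```python
def hallucinated_facts(report):
    """Return the facts labeled as hallucinations in one per-response
    evaluation report (the JSON structure shown above)."""
    return [f["fact"] for f in report["facts"] if f["label"] == "hallucination"]
```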

Two Ways to Evaluate

CLI

Run evaluations from the terminal with an interactive UI, real-time progress, and template support. See the CLI guide →

API Endpoints

The evaluation API is separate from the SDK API:
|  | SDK API | Evaluation API |
| --- | --- | --- |
| Purpose | Create agents, manage knowledge, chat | Evaluate agent responses |
| Base URL | https://agent.rippletide.com/api/sdk | Evaluation server |
| Auth | x-api-key header | x-api-key header |
See the API Reference for full endpoint documentation.
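Both APIs authenticate with the same x-api-key header. As a sketch only: the base URL and `/evaluate` path below are placeholders (the real endpoints are in the API Reference), but the header construction is what any call to either API would look like.

```python
import json
import urllib.request

def build_eval_request(base_url, api_key, payload):
    """Build an authenticated POST request for the evaluation server.
    `base_url` and the /evaluate path are placeholders; consult the
    API Reference for the actual endpoints."""
    return urllib.request.Request(
        f"{base_url}/evaluate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```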

What’s next?

Now that you can evaluate your agent, the next step is to give it memory: