Submit evaluation result

POST /api/agents/{agentId}/test-results/{promptId}

curl --request POST \
  --url https://agent-evalserver-production.up.railway.app/api/agents/{agentId}/test-results/{promptId} \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "status": "passed",
  "response": "Customers can request a refund within 30 days of purchase.",
  "expectedAnswer": "Customers can request a refund within 30 days.",
  "hallucinationLabel": "FactIsPresent",
  "hallucinationFindings": []
}
'

Example response (200):
{
  "id": "4a2dc063-36f8-47b0-a703-bb2d7d4e44f5",
  "agentId": "2f7e9c9c-9a2e-4e3a-b77f-6d9d1a7e3a11",
  "promptId": 42,
  "status": "passed",
  "sessionId": null,
  "response": "Customers can request a refund within 30 days of purchase.",
  "expectedAnswer": "Customers can request a refund within 30 days.",
  "hallucinationLabel": "FactIsPresent",
  "hallucinationFindings": [],
  "created_at": "2026-06-02T15:20:00Z",
  "updated_at": "2026-06-02T15:21:00Z"
}

Authorizations

x-api-key
string
header
required

Platform API key for authenticated, account-level API access.

Path Parameters

agentId
string<uuid>
required

Evaluation agent ID.

promptId
integer
required

Question ID returned by the Add evaluation questions endpoint.
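
As a minimal sketch of how the path parameters and the x-api-key header fit together, the Python snippet below builds the request without sending it. The base URL is copied from the curl example; the function name build_submit_request and the client-side checks are illustrative assumptions, not part of the API.

```python
import json
import urllib.request
import uuid

# Base URL taken from the curl example on this page.
BASE_URL = "https://agent-evalserver-production.up.railway.app"


def build_submit_request(agent_id: str, prompt_id: int, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for one test result.

    Hypothetical helper: the name and the local validation are assumptions.
    """
    # agentId is documented as string<uuid>; raises ValueError if malformed.
    uuid.UUID(agent_id)
    # promptId is documented as an integer (bool is rejected explicitly,
    # because bool is a subclass of int in Python).
    if isinstance(prompt_id, bool) or not isinstance(prompt_id, int):
        raise TypeError("prompt_id must be an integer")

    url = f"{BASE_URL}/api/agents/{agent_id}/test-results/{prompt_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen. Validating the identifiers locally is a design choice that surfaces malformed values before a network round trip; the server's own validation behavior is not described here.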

Body

application/json

Submit the agent response and evaluation outcome for a prompt.

status
enum<string>
required

Evaluation outcome.

Available options: passed, failed, ambiguous

Example:

"passed"

response
string
required

Agent response to the prompt.

Example:

"Customers can request a refund within 30 days of purchase."

expectedAnswer
string | null

Reference answer used for evaluation.

Example:

"Customers can request a refund within 30 days."

hallucinationLabel
string | null

Optional hallucination label returned by your evaluation step.

Example:

"FactIsPresent"

hallucinationFindings
object[] | null

Optional fact-level findings returned by your evaluation step.

Example:

[]
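
The body fields above can be assembled as in the following sketch. It checks status against the three documented options and includes the nullable fields only when a value is supplied. The helper name build_result_body is hypothetical, and treating an omitted optional field as equivalent to null is an assumption, since the schema only states that these fields are nullable and not required.

```python
from typing import Any, Optional

# Allowed values for "status", as listed in the body schema.
ALLOWED_STATUSES = {"passed", "failed", "ambiguous"}


def build_result_body(
    status: str,
    response: str,
    expected_answer: Optional[str] = None,
    hallucination_label: Optional[str] = None,
    hallucination_findings: Optional[list] = None,
) -> dict:
    """Assemble the JSON body for one evaluation result (hypothetical helper)."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUSES)}")
    if not isinstance(response, str):
        raise TypeError("response must be a string")

    # Required fields.
    body: dict[str, Any] = {"status": status, "response": response}

    # Optional fields: added only when provided. "is not None" keeps an
    # explicitly supplied empty findings list in the body.
    if expected_answer is not None:
        body["expectedAnswer"] = expected_answer
    if hallucination_label is not None:
        body["hallucinationLabel"] = hallucination_label
    if hallucination_findings is not None:
        body["hallucinationFindings"] = hallucination_findings
    return body
```

Called with the values from the curl example, this produces the same five-field object shown there; called with only status and response, it produces a two-field body.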

Response

200 - application/json

Test result stored

Stored evaluation result for one prompt.

id
string<uuid>

Unique result identifier.

Example:

"4a2dc063-36f8-47b0-a703-bb2d7d4e44f5"

agentId
string<uuid>

Evaluation agent ID.

Example:

"2f7e9c9c-9a2e-4e3a-b77f-6d9d1a7e3a11"

promptId
integer

Prompt ID this result belongs to.

Example:

42

status
enum<string>

Evaluation outcome.

Available options: passed, failed, ambiguous

Example:

"passed"

sessionId
string<uuid> | null

Optional session identifier associated with the result.

Example:

"a75b1bb5-0c7c-4302-978e-e2452b79df26"

response
string | null

Agent response that was evaluated.

Example:

"Customers can request a refund within 30 days of purchase."

expectedAnswer
string | null

Reference answer used for evaluation.

Example:

"Customers can request a refund within 30 days."

hallucinationLabel
string | null

Hallucination label returned by the evaluator, when available.

Example:

"FactIsPresent"

hallucinationFindings
object[] | null

Fact-level findings returned by the evaluator.

Example:

[]

created_at
string<date-time>

Creation timestamp.

Example:

"2026-06-02T15:20:00Z"

updated_at
string<date-time>

Last update timestamp.

Example:

"2026-06-02T15:21:00Z"
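
As a rough illustration of consuming the 200 response, the sketch below parses the JSON into a small dataclass. The class name StoredResult and the function parse_result are hypothetical. Note that the response mixes camelCase keys (agentId, promptId) with snake_case timestamps (created_at, updated_at), so the mapping is written out field by field.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StoredResult:
    """Typed view of the stored result (hypothetical container)."""
    id: str
    agent_id: str
    prompt_id: int
    status: str
    session_id: Optional[str]
    response: Optional[str]
    expected_answer: Optional[str]
    hallucination_label: Optional[str]
    hallucination_findings: Optional[list]
    created_at: datetime
    updated_at: datetime


def _parse_ts(value: str) -> datetime:
    # Replace a trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_result(raw: str) -> StoredResult:
    """Parse the response body; nullable fields fall back to None if absent."""
    data = json.loads(raw)
    return StoredResult(
        id=data["id"],
        agent_id=data["agentId"],
        prompt_id=data["promptId"],
        status=data["status"],
        session_id=data.get("sessionId"),
        response=data.get("response"),
        expected_answer=data.get("expectedAnswer"),
        hallucination_label=data.get("hallucinationLabel"),
        hallucination_findings=data.get("hallucinationFindings"),
        created_at=_parse_ts(data["created_at"]),
        updated_at=_parse_ts(data["updated_at"]),
    )
```

Reading the nullable fields with data.get means a missing key is handled the same way as an explicit null; whether the server ever omits these keys is not stated on this page.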