Documentation

Everything you need to get started with AgentEvals.

Install agentevals, run your first evaluation from OpenTelemetry traces, and learn where to go next.

Advanced usage patterns for agentevals: evaluation backends, deployment, trace compatibility, and scaling.

Define custom evaluation logic for agentevals when built-in metrics are not enough.

How agentevals fits into trace pipelines, model backends, container deployments, and external evaluation systems.

Explore agentevals results in the web UI.

The structure agentevals uses to define evaluation datasets, metadata, and scoring inputs.

Understand what OpenTelemetry trace data agentevals expects and how compatibility affects scoring quality.

Run agentevals in streaming workflows for incremental or near-real-time evaluation.

Deploy agentevals with Docker images and the Helm chart on Kubernetes.

Frequently asked questions about agentevals.

Delegate evaluation to OpenAI's Evals API while keeping agentevals as the trace-oriented evaluation layer.