Documentation
Everything you need to get started with agentevals.
Quick Start
Install agentevals, run your first evaluation from OpenTelemetry traces, and learn where to go next.
Advanced
Advanced usage patterns for evaluation backends, deployment, trace compatibility, and scaling agentevals.
Custom Evaluators
Define custom evaluation logic for agentevals when built-in metrics are not enough.
Integrations
How agentevals fits into trace pipelines, model backends, container deployments, and external evaluation systems.
UI Walkthrough
Explore agentevals results in the web UI.
Eval Set Format
The structure agentevals uses to define evaluation datasets, metadata, and scoring inputs.
OTel Compatibility
Understand what OpenTelemetry trace data agentevals expects and how compatibility affects scoring quality.
Streaming
Run agentevals in streaming workflows for incremental or near-real-time evaluation.
Kubernetes & Helm
Deploy agentevals with Docker images and the Helm chart on Kubernetes.
FAQ
Frequently asked questions about agentevals.
OpenAI Evals API Backend
Delegate evaluation to OpenAI's Evals API while keeping agentevals as the trace-oriented evaluation layer.