Quick Start
agentevals scores AI agent behavior from OpenTelemetry traces without re-running the agent.
Install
pip install agentevals
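A quick way to confirm the install is to query the installed distribution with the Python standard library; this does not rely on any agentevals API:

from importlib.metadata import version
print(version("agentevals"))  # prints the installed agentevals version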
What you need
To evaluate agent behavior from traces, you need:
- OpenTelemetry traces from your agent runs (see the instrumentation sketch after this list)
- an eval configuration that defines which metrics or evaluators to run
- optional API keys if you use model-backed or delegated evaluators
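The traces themselves come from standard OpenTelemetry instrumentation. The sketch below shows one instrumented agent step; the span and attribute names are purely illustrative, and the attributes agentevals actually reads are documented in OTel Compatibility.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter  # needs opentelemetry-exporter-otlp

# Send spans to an OTLP endpoint (an OpenTelemetry Collector, for example).
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-agent")  # tracer name is illustrative
with tracer.start_as_current_span("agent.tool_call") as span:
    span.set_attribute("tool.name", "web_search")  # illustrative attribute, not a required schema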
Basic workflow
The standard workflow is:
- collect traces from your agent system
- load those traces into agentevals
- run built-in metrics, custom evaluators, or delegated backends
- inspect results in the CLI or UI
agentevals is designed so evaluation happens from trace data rather than by replaying the original agent execution.
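In code, this workflow might look roughly like the sketch below. Every agentevals name in it (load_traces, run_evals, the configuration path) is an assumption made for illustration, not the documented API; the sections linked under Typical next steps define the real interfaces.

# Hypothetical sketch only: function names and arguments are assumptions.
import agentevals

traces = agentevals.load_traces("exported_traces/")          # 1. load collected OTel traces
results = agentevals.run_evals(traces, config="evals.yaml")  # 2. run the metrics/evaluators defined in a config
for result in results:                                       # 3. inspect scores before opening the UI
    print(result)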
Typical next steps
After installation, most teams go in one of these directions:
- Define evals using the Eval Set Format
- Write custom logic with Custom Evaluators
- Use delegated judging with the OpenAI Evals API backend
- Run in production environments with Kubernetes & Helm
- Understand trace requirements in OTel Compatibility
- Work with live data using Streaming
- Inspect results visually in the UI Walkthrough
Deployment options
agentevals can be used in several ways:
- as a local Python package
- through container images
- on Kubernetes with the Helm chart
If you are deploying agentevals as a service, start with Kubernetes & Helm.
Choosing an evaluation approach
agentevals supports multiple ways to score behavior:
- built-in metrics for common trace-derived signals
- custom evaluators for project-specific logic (sketched after this list)
- delegated evaluation backends, beginning with the OpenAI Evals API integration
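As a rough illustration of the custom-evaluator option, an evaluator is typically a small function that receives trace data and returns a score. The signature, the trace object shape, and the registration mechanism are assumptions here; the Custom Evaluators section defines the real interface.

# Hypothetical sketch: the signature and trace shape are assumptions, not the real interface.
def tool_call_budget(trace) -> float:
    """Return 1.0 if the agent made at most five tool calls, else 0.0."""
    tool_spans = [s for s in trace.spans if s.name.startswith("tool.")]  # span naming is illustrative
    return 1.0 if len(tool_spans) <= 5 else 0.0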
For architecture guidance, see Advanced.