Quick Start
agentevals scores AI agent behavior from OpenTelemetry traces without re-running the agent.
Install
pip install agentevals
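A quick way to confirm the install is to query the installed distribution with the Python standard library; this does not rely on any agentevals API:

from importlib.metadata import version
print(version("agentevals"))  # prints the installed agentevals version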
What you need
To evaluate agent behavior from traces, you need:
- OpenTelemetry traces from your agent runs (see the instrumentation sketch after this list)
- an eval configuration that defines which metrics or evaluators to run
- optional API keys if you use model-backed or delegated evaluators
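The traces themselves come from standard OpenTelemetry instrumentation. The sketch below shows one instrumented agent step; the span and attribute names are purely illustrative, and the attributes agentevals actually reads are documented in OTel Compatibility.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter  # needs opentelemetry-exporter-otlp

# Send spans to an OTLP endpoint (an OpenTelemetry Collector, for example).
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-agent")  # tracer name is illustrative
with tracer.start_as_current_span("agent.tool_call") as span:
    span.set_attribute("tool.name", "web_search")  # illustrative attribute, not a required schema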
Basic workflow
The standard workflow is:
- collect traces from your agent system
- load those traces into agentevals
- run built-in metrics, custom evaluators, or delegated backends
- inspect results in the CLI or UI
agentevals is designed so evaluation happens from trace data rather than by replaying the original agent execution.
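In code, this workflow might look roughly like the sketch below. Every agentevals name in it (load_traces, run_evals, the configuration path) is an assumption made for illustration, not the documented API; the sections linked under Typical next steps define the real interfaces.

# Hypothetical sketch only: function names and arguments are assumptions.
import agentevals

traces = agentevals.load_traces("exported_traces/")          # 1. load collected OTel traces
results = agentevals.run_evals(traces, config="evals.yaml")  # 2. run the metrics/evaluators defined in a config
for result in results:                                       # 3. inspect scores before opening the UI
    print(result)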
Typical next steps
After installation, most teams go in one of these directions:
- Define evals using the Eval Set Format
- Write custom logic with Custom Evaluators
- Use delegated judging with the OpenAI Evals API backend
- Run in production environments with Kubernetes & Helm
- Understand trace requirements in OTel Compatibility
- Work with live data using Streaming
- Inspect results visually in the UI Walkthrough
Deployment options
agentevals can be used in several ways:
- as a local Python package
- through container images
- on Kubernetes with the Helm chart
If you are deploying agentevals as a service, start with Kubernetes & Helm.
Choosing an evaluation approach
agentevals supports multiple ways to score behavior:
- built-in metrics for common trace-derived signals
- custom evaluators for project-specific logic (sketched after this list)
- delegated evaluation backends, beginning with the OpenAI Evals API integration
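As a rough illustration of the custom-evaluator option, an evaluator is typically a small function that receives trace data and returns a score. The signature, the trace object shape, and the registration mechanism are assumptions here; the Custom Evaluators section defines the real interface.

# Hypothetical sketch: the signature and trace shape are assumptions, not the real interface.
def tool_call_budget(trace) -> float:
    """Return 1.0 if the agent made at most five tool calls, else 0.0."""
    tool_spans = [s for s in trace.spans if s.name.startswith("tool.")]  # span naming is illustrative
    return 1.0 if len(tool_spans) <= 5 else 0.0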
For architecture guidance, see Advanced.