Quick Start

agentevals scores AI agent behavior from OpenTelemetry traces without re-running the agent.

Install

pip install agentevals

What you need

To evaluate traces, you need:

  • OpenTelemetry traces from your agent runs
  • an eval configuration that defines which metrics or evaluators to run
  • optional API keys if you use model-backed or delegated evaluators
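The eval configuration typically lists the metrics or evaluators to run. As a rough sketch only — the key names below are illustrative, not agentevals' actual schema:

```yaml
# Hypothetical eval configuration; keys are illustrative, not the real schema.
evaluators:
  - type: builtin
    metric: span_error_rate      # a trace-derived signal
  - type: custom
    module: my_project.evals     # project-specific logic
  - type: delegated
    backend: openai-evals        # model-backed; needs an API key
```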

Basic workflow

The standard workflow is:

  1. collect traces from your agent system
  2. load those traces into agentevals
  3. run built-in metrics, custom evaluators, or delegated backends
  4. inspect results in the CLI or UI
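To make step 3 concrete, here is a minimal, self-contained sketch of what a trace-derived metric looks like: it scores exported span data directly, with no agent replay. The span shape below loosely follows OTLP JSON and is an assumption; it is not agentevals' API, just an illustration of the idea.

```python
# Illustrative only: compute a simple trace-derived metric (span error rate)
# from exported OpenTelemetry span data, without re-running the agent.
# The span dicts loosely follow the OTLP JSON shape; adapt to your exporter.

def span_error_rate(spans):
    """Fraction of spans whose status code marks an error."""
    if not spans:
        return 0.0
    errors = sum(1 for s in spans if s.get("status", {}).get("code") == "ERROR")
    return errors / len(spans)

spans = [
    {"name": "llm.call", "status": {"code": "OK"}},
    {"name": "tool.search", "status": {"code": "ERROR"}},
    {"name": "tool.search", "status": {"code": "OK"}},
    {"name": "agent.plan", "status": {"code": "OK"}},
]

print(span_error_rate(spans))  # 0.25
```

Because the metric reads recorded spans, it can be recomputed at any time against historical runs.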

agentevals is designed so that evaluation runs against recorded trace data rather than replaying the original agent execution, so past runs can be scored without re-invoking the agent.

Typical next steps

After installation, most teams take one of two directions: choosing how to deploy agentevals, or choosing how to score behavior.

Deployment options

agentevals can be used in several ways:

  • as a local Python package
  • through container images
  • on Kubernetes with the Helm chart

If you are deploying agentevals as a service, start with Kubernetes & Helm.

Choosing an evaluation approach

agentevals supports multiple ways to score behavior:

  • built-in metrics for common trace-derived signals
  • custom evaluators for project-specific logic
  • delegated evaluation backends such as the initial OpenAI Evals API integration
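A custom evaluator is the middle option above: a small piece of project-specific logic that scores one run's spans. The callable interface shown here is a generic sketch under assumed conventions; agentevals' actual evaluator API may differ, and the `tool.` name prefix is an assumption about how spans are named.

```python
# Generic sketch of a custom trace evaluator: a callable that takes the spans
# of one agent run and returns a named score. Interface is illustrative only;
# agentevals' actual evaluator API may differ.

def tool_call_budget(spans, max_calls=3):
    """Pass if the agent stayed within a tool-call budget (assumed 'tool.' prefix)."""
    tool_calls = [s for s in spans if s.get("name", "").startswith("tool.")]
    return {
        "metric": "tool_call_budget",
        "value": len(tool_calls),
        "passed": len(tool_calls) <= max_calls,
    }

spans = [{"name": "tool.search"}, {"name": "llm.call"}, {"name": "tool.fetch"}]
print(tool_call_budget(spans))  # value 2, passed True
```

Delegated backends follow the same shape conceptually, but hand the scoring decision to an external service instead of local logic.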

For architecture guidance, see Advanced.