Integrations
agentevals is designed to sit on top of OpenTelemetry-based agent observability.
Core integration points
OpenTelemetry traces
agentevals evaluates from traces, so your primary integration surface is the telemetry your agent stack already emits.
For compatibility details and expectations around trace shape, see OTel Compatibility.
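Instrumentation usually looks like ordinary OpenTelemetry spans around each agent step. The sketch below uses the OpenTelemetry Python SDK with a console exporter; the span name and attribute keys are illustrative assumptions rather than a required schema, and OTel Compatibility describes the trace shape agentevals actually expects.

```python
# Minimal sketch: instrument an agent step so it emits an OpenTelemetry span.
# Span names and attribute keys here are illustrative, not a required schema.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-agent")

def run_tool(query: str) -> str:
    # Each agent step becomes a span that downstream evaluation can read.
    with tracer.start_as_current_span("agent.tool_call") as span:
        span.set_attribute("agent.tool.name", "search")  # illustrative attribute
        span.set_attribute("agent.input", query)
        result = f"results for {query}"                   # stand-in for real tool work
        span.set_attribute("agent.output", result)
        return result

run_tool("weather in Paris")
```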
Custom evaluators
If you want evaluation logic to stay inside your own Python code, use Custom Evaluators.
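As a rough illustration, a code-first evaluator is typically a function that receives trace data and returns a score. The sketch below is hypothetical: the trace dictionary layout, the return shape, and any registration mechanism are assumptions for illustration only, and the actual interface is documented in Custom Evaluators.

```python
# Hypothetical sketch of a code-first evaluator; the trace structure and
# return shape below are assumptions, not the documented agentevals interface.
from typing import Any

def tool_call_success_rate(trace: dict[str, Any]) -> dict[str, Any]:
    """Score the fraction of tool-call spans that completed without error."""
    spans = [s for s in trace.get("spans", []) if s.get("name") == "agent.tool_call"]
    if not spans:
        return {"key": "tool_call_success_rate", "score": None, "comment": "no tool calls"}
    ok = sum(1 for s in spans if s.get("status") == "OK")
    return {"key": "tool_call_success_rate", "score": ok / len(spans)}
```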
OpenAI Evals API backend
agentevals includes an initial option to delegate eval execution to OpenAI’s Evals API.
Use this when you want agentevals to remain your trace-oriented orchestration layer while an external backend performs judging.
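To illustrate the division of responsibilities, the hypothetical sketch below shows agentevals orchestrating over collected traces while judging is delegated to the external backend. None of the names shown (`evaluate_traces`, `backend`, `rubric`) are confirmed agentevals API; they only convey the shape of the pattern.

```python
# Hypothetical sketch: agentevals stays the trace-oriented orchestration layer,
# and rubric judging is delegated to OpenAI's Evals API. The import, function
# name, and parameters below are assumptions for illustration only.
from agentevals import evaluate_traces  # hypothetical import

results = evaluate_traces(
    traces="otel-export/",   # traces collected from the instrumented agent
    backend="openai-evals",  # delegate rubric judging to the external backend
    rubric="Did the agent answer the user's question using the retrieved context?",
)
```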
Containers and Kubernetes
For shared environments, CI systems, or cluster deployment, agentevals can run from a Docker image or be installed on Kubernetes via the Helm chart.
See Kubernetes & Helm.
UI
The UI lets you inspect results, compare runs, and review evaluation outputs visually.
See UI Walkthrough.
Typical integration patterns
- local development with Python package install
- CI evaluation jobs using a container image
- Kubernetes deployment via Helm
- trace ingestion from instrumented agent systems
- delegated rubric judging through OpenAI Evals API
Choosing the right path
- Choose custom evaluators for code-first extensibility.
- Choose OpenAI Evals API when you want delegated judging.
- Choose Docker/Kubernetes when you need reproducible deployment.
- Choose streaming when you want near-real-time evaluation pipelines.