AgentEvals

Documentation

Everything you need to get started with AgentEvals.

Quick Start

Install agentevals, run your first evaluation from OpenTelemetry traces, and learn where to go next.

Advanced

Advanced usage patterns for evaluation backends, deployment, trace compatibility, and scaling agentevals.

Custom Evaluators

Define custom evaluation logic for agentevals when built-in metrics are not enough.

Integrations

How agentevals fits into trace pipelines, model backends, container deployments, and external evaluation systems.

UI Walkthrough

Explore agentevals results in the web UI.

Eval Set Format

The structure agentevals uses to define evaluation datasets, metadata, and scoring inputs.

OTel Compatibility

Understand what OpenTelemetry trace data agentevals expects and how compatibility affects scoring quality.

Streaming

Run agentevals in streaming-oriented workflows for incremental or near-real-time evaluation.

Kubernetes & Helm

Deploy agentevals with Docker images and the Helm chart on Kubernetes.

FAQ

Frequently asked questions about agentevals.

OpenAI Evals API Backend

Delegate evaluation to OpenAI's Evals API while keeping agentevals as the trace-oriented evaluation layer.

© 2026 AgentEvals. Open source under Apache 2.0.