UI Walkthrough

The agentevals UI is the easiest way to inspect evaluation runs and understand how scores were produced.

What the UI is for

Use the UI to:

browse evaluation runs
inspect individual trace-derived results
compare scores across runs or datasets
review metric outputs and evaluator judgments

Typical workflow

1. run evaluations from your trace data
2. open the UI
3. select a run or dataset
4. inspect scores, failures, and trace-level context
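
To make the last step concrete, here is a minimal sketch of the kind of comparison the UI supports: mean scores per run and the traces that failed. It is not the agentevals API or schema. The `Result` record, its fields, the function names, the `accuracy` metric, and the sample data are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Result:
    """One trace-derived result. Hypothetical shape, not the agentevals schema."""
    run_id: str
    trace_id: str
    metric: str
    score: float
    passed: bool


def mean_scores_by_run(results):
    """Return {run_id: {metric: mean score}} for side-by-side comparison."""
    grouped = defaultdict(lambda: defaultdict(list))
    for r in results:
        grouped[r.run_id][r.metric].append(r.score)
    return {
        run_id: {metric: mean(scores) for metric, scores in metrics.items()}
        for run_id, metrics in grouped.items()
    }


def failures(results, run_id):
    """Return the sorted trace IDs in a run with at least one failing result."""
    return sorted({r.trace_id for r in results if r.run_id == run_id and not r.passed})


# Made-up sample data for illustration only.
results = [
    Result("run-a", "t1", "accuracy", 1.0, True),
    Result("run-a", "t2", "accuracy", 0.0, False),
    Result("run-b", "t1", "accuracy", 1.0, True),
    Result("run-b", "t2", "accuracy", 1.0, True),
]

summary = mean_scores_by_run(results)
failed_in_a = failures(results, "run-a")
```

Here `summary` holds one mean per metric for each run, and `failed_in_a` lists the traces worth opening for trace-level context.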

How it fits with newer features

The UI complements the newer capabilities in v0.6.3:

if you deploy agentevals with containers or Kubernetes, the UI becomes easier to share across teams
if you use delegated evaluation through the OpenAI Evals API, the UI remains the place to review the resulting outputs within the broader agentevals workflow
if you use streaming, the UI helps you inspect results as they arrive