UI Walkthrough
The agentevals UI is the easiest way to inspect evaluation runs and understand how scores were produced.
What the UI is for
Use the UI to:
- browse evaluation runs
- inspect individual trace-derived results
- compare scores across runs or datasets
- review metric outputs and evaluator judgments
Typical workflow
1. run evaluations from your trace data
2. open the UI
3. select a run or dataset
4. inspect scores, failures, and trace-level context
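The final step, inspecting scores and failures, amounts to grouping per-trace results by run and metric and flagging the low scorers. The sketch below illustrates that idea in plain Python. It is not the agentevals API or export format: the record fields (`run`, `trace_id`, `metric`, `score`), the `summarize` helper, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-trace results. The field names are illustrative only
# and are NOT the agentevals export schema.
results = [
    {"run": "run-a", "trace_id": "t1", "metric": "tool_accuracy", "score": 0.9},
    {"run": "run-a", "trace_id": "t2", "metric": "tool_accuracy", "score": 0.4},
    {"run": "run-b", "trace_id": "t1", "metric": "tool_accuracy", "score": 0.8},
    {"run": "run-b", "trace_id": "t2", "metric": "tool_accuracy", "score": 0.7},
]


def summarize(records, threshold=0.5):
    """Group records by (run, metric); return mean score and failing trace IDs."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record["run"], record["metric"])].append(record)

    summary = {}
    for key, rows in grouped.items():
        summary[key] = {
            "mean": mean(row["score"] for row in rows),
            # A trace counts as a failure when its score is below the threshold.
            "failures": [row["trace_id"] for row in rows if row["score"] < threshold],
        }
    return summary


summary = summarize(results)
for (run, metric), stats in sorted(summary.items()):
    print(run, metric, round(stats["mean"], 2), stats["failures"])
```

A summary like this shows which run scores higher on a metric and which traces to open for trace-level context, which is the same comparison the UI presents interactively.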
How it fits with newer features
The UI complements the newer capabilities in v0.6.3:
- if you deploy agentevals in containers or on Kubernetes, the UI is easier to share across teams
- if you delegate evaluation to the OpenAI Evals API, the UI is still where you review the resulting outputs within the broader agentevals workflow
- if you use streaming, the UI helps you inspect results as they arrive