Quick Start

Installation

From PyPI (recommended): the published package includes the CLI, REST API, and embedded web UI.

pip install agentevals-cli

Optional extras:

pip install "agentevals-cli[live]"        # MCP server (`agentevals mcp`)

The GitHub releases page also ships core wheels (CLI and API only) and bundle wheels (with the embedded UI) if you need to pin a specific version or install offline with pip install ./path/to.whl. For MCP support, install with the [live] extra just as from PyPI (for example, pip install "./your-wheel.whl[live]").
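For a fully offline install, one common pattern is to fetch the wheel and its dependencies on a connected machine, then point pip at that local directory. These are standard pip flags, not anything specific to agentevals, and the ./wheelhouse directory name is just an example:

```shell
# On a machine with network access: download the wheel plus its
# dependencies into a local directory (directory name is arbitrary).
pip download agentevals-cli -d ./wheelhouse

# On the offline machine: install from that directory only,
# without contacting the package index.
pip install --no-index --find-links ./wheelhouse agentevals-cli
```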

From source with uv or Nix:

uv sync
# or: nix develop .

See DEVELOPMENT.md for build instructions.

CLI Quick Start

Run an evaluation against a sample trace:

agentevals run samples/helm.json \
  --eval-set samples/eval_set_helm.json \
  -m tool_trajectory_avg_score

List available evaluators:

agentevals evaluator list

Live UI Quick Start

Start the server with the embedded web UI:

agentevals serve

Open http://localhost:8001 to upload traces and eval sets, select metrics, and view results with interactive span trees.

From source (two terminals):

uv run agentevals serve --dev     # Terminal 1
cd ui && npm install && npm run dev  # Terminal 2 → http://localhost:5173

Live-streamed traces appear in the “Local Dev” tab, grouped by session ID.
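The session grouping in the "Local Dev" tab can be pictured as a plain group-by over incoming trace events. This is an illustrative sketch only, not the server's actual data model; the event schema (the session_id and span fields) is an assumption:

```python
from collections import defaultdict

# Illustrative trace events; this schema is assumed for the example,
# not taken from agentevals itself.
events = [
    {"session_id": "abc", "span": "plan"},
    {"session_id": "abc", "span": "tool_call"},
    {"session_id": "def", "span": "plan"},
]

# Group events by session ID, as the Local Dev tab does for display.
sessions = defaultdict(list)
for event in events:
    sessions[event["session_id"]].append(event["span"])

print(dict(sessions))  # {'abc': ['plan', 'tool_call'], 'def': ['plan']}
```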

What’s Next