Skip to content

Results

The results command family works on existing local AgentV run workspaces and index.jsonl manifests. Use it after an eval run to inspect failures, validate manifests, export artifact layouts, combine/delete local run workspaces, or generate a shareable HTML report.

Remote result repository exchange is intentionally not part of agentv results. New eval runs can auto-export to a configured results repo when auto_push: true; manual remote status and sync are Dashboard/API workflows. See Dashboard Remote Results for configuration and sync behavior, and WIP checkpoints for recovering in-progress runs before final publish.

SubcommandPurpose
results reportGenerate a self-contained static HTML report from an existing run workspace
results exportMaterialize or normalize the artifact workspace structure for a manifest
results combineCombine partial local run workspaces into a new local run workspace
results deleteDelete one or more local run workspaces
results summaryPrint aggregate metrics for a run
results failuresShow only failing cases
results showDisplay case-level rows from a run workspace
results validateValidate that a workspace or manifest resolves correctly

The results report command turns an existing run workspace or index.jsonl manifest into a self-contained HTML report for sharing, inspection, and human review.

AgentV results report overview showing 11 tests across 2 eval files with pass, fail, pass rate, duration, and cost summary cards
Terminal window
agentv results report <run-workspace-or-index.jsonl>

Examples:

Terminal window
# Generate report.html next to the run manifest
agentv results report .agentv/results/runs/2026-03-14T10-32-00_claude
# Use an explicit output path
agentv results report .agentv/results/runs/2026-03-14T10-32-00_claude/index.jsonl \
--out ./reports/human-review.html

What it shows:

  • Summary stats — total tests, passed, failed, pass rate, duration, and cost
  • Eval file groups — test cases grouped by eval file with pass rate, test count, and duration
  • Expandable details — unified assertions with pass/fail indicators and type badges, collapsible input/output
  • Criteria column — shows the test prompt or description inline for quick scanning

The generated file is self-contained HTML: no Dashboard server, API endpoint, or external asset host is required after it is written. That makes it a good fit for public result repositories served by GitHub Pages.

One minimal publication workflow is:

Terminal window
# 1. Run an eval and sync or copy the run workspace into your public results repo.
agentv eval evals/demo.eval.yaml --output .agentv/results/runs/demo-live
# 2. In the public results repo, render the report into the Pages source directory.
agentv results report .agentv/results/runs/demo-live --out docs/index.html
# 3. Review the generated HTML before publishing.
grep -RInE 'sk-[A-Za-z0-9]|Bearer |localhost|127\.0\.0\.1|/home/|/Users/|/tmp/' docs/index.html
# 4. Commit the run artifacts and docs/index.html, then enable GitHub Pages
# for the repository's docs/ directory or the branch used for Pages.
git add .agentv/results/runs/demo-live docs/index.html README.md
git commit -m "docs(results): publish static AgentV report"
git push

Use --out docs/<name>.html when a repository should publish multiple runs. Link those files from the result repository README so readers can browse a dashboard-like report from GitHub Pages instead of running agentv dashboard or opening raw JSONL.

AgentV results report showing an expanded failing test case with unified assertions, deterministic type badges, pass/fail indicators, evidence text, and collapsible input/output
OptionDescription
--out, -oOutput HTML file (defaults to <run-dir>/report.html)
--dir, -dWorking directory used to resolve the source path

Use results export when you need the artifact workspace layout itself rather than a rendered report.

Terminal window
agentv results export <run-workspace-or-index.jsonl> [--out <dir>]

This is useful when a manifest needs to be materialized into a predictable artifact tree for other tooling, review, or archiving. The run workspace is also where generated task bundles live: index.jsonl rows may point to per-result task_dir, eval_path, targets_path, files_path, and graders_path entries. Keep those generated artifacts with the run when sharing or auditing results.

For lightweight terminal workflows:

Terminal window
agentv results summary .agentv/results/runs/<timestamp>
agentv results failures .agentv/results/runs/<timestamp>
agentv results show .agentv/results/runs/<timestamp> --test-id my-case
agentv results validate .agentv/results/runs/<timestamp>

For a review-centric workflow built around these artifacts, see Human Review Checkpoint.

The CLI contract is deliberately narrow: agentv results manages local result artifacts only. It does not expose results remote status or results remote sync subcommands.

Use these supported remote workflows instead:

  • Automatic publishing: configure projects[].results.sync.auto_push: true; new agentv eval and agentv pipeline bench runs push their artifacts after the run completes. Add projects[].results.branch when the results repo stores artifacts on a non-default branch. While an eval is still running, WIP checkpoints can keep partial run output durable on agentv/inflight/... branches.
  • Manual Dashboard sync: run agentv dashboard, open the project, and use Sync Project.
  • Manual API sync: while Dashboard is running, call GET /api/projects/:projectId/remote/status or POST /api/projects/:projectId/remote/sync for project-scoped automation. Single-project sessions also expose GET /api/remote/status and POST /api/remote/sync.
  • Git escape hatch: for advanced recovery, inspect or repair the configured projects[].results.path clone with git directly, then sync again.