Results

The results command family works on existing local AgentV run workspaces and index.jsonl manifests. Use it after an eval run to inspect failures, validate manifests, export artifact layouts, combine/delete local run workspaces, or generate a shareable HTML report.

Remote result repository exchange is intentionally not part of agentv results. New eval runs can auto-export to a configured results repo when auto_push: true; manual remote status and sync are Dashboard/API workflows. See Dashboard Remote Results for configuration and sync behavior, and WIP checkpoints for recovering in-progress runs before final publish.

Subcommands

Subcommand	Purpose
`results report`	Generate a self-contained static HTML report from an existing run workspace
`results export`	Materialize or normalize the artifact workspace structure for a manifest
`results combine`	Combine partial local run workspaces into a new local run workspace
`results delete`	Delete one or more local run workspaces
`results summary`	Print aggregate metrics for a run
`results failures`	Show only failing cases
`results show`	Display case-level rows from a run workspace
`results validate`	Validate that a workspace or manifest resolves correctly

`results report`

The results report command turns an existing run workspace or index.jsonl manifest into a self-contained HTML report for sharing, inspection, and human review.

agentv results report <run-workspace-or-index.jsonl>

Examples:

# Generate report.html next to the run manifest
agentv results report .agentv/results/runs/2026-03-14T10-32-00_claude

# Use an explicit output path
agentv results report .agentv/results/runs/2026-03-14T10-32-00_claude/index.jsonl \
  --out ./reports/human-review.html

What it shows:

Summary stats — total tests, passed, failed, pass rate, duration, and cost
Eval file groups — test cases grouped by eval file with pass rate, test count, and duration
Expandable details — unified assertions with pass/fail indicators and type badges, collapsible input/output
Criteria column — shows the test prompt or description inline for quick scanning

Publish a static report with GitHub Pages

The generated file is self-contained HTML: no Dashboard server, API endpoint, or external asset host is required after it is written. That makes it a good fit for public result repositories served by GitHub Pages.

One minimal publication workflow is:

# 1. Run an eval and sync or copy the run workspace into your public results repo.
agentv eval evals/demo.eval.yaml --output .agentv/results/runs/demo-live

# 2. In the public results repo, render the report into the Pages source directory.
agentv results report .agentv/results/runs/demo-live --out docs/index.html

# 3. Review the generated HTML before publishing.
grep -RInE 'sk-[A-Za-z0-9]|Bearer |localhost|127\.0\.0\.1|/home/|/Users/|/tmp/' docs/index.html

# 4. Commit the run artifacts and docs/index.html, then enable GitHub Pages
#    for the repository's docs/ directory or the branch used for Pages.
git add .agentv/results/runs/demo-live docs/index.html README.md
git commit -m "docs(results): publish static AgentV report"
git push

Use --out docs/<name>.html when a repository should publish multiple runs. Link those files from the result repository README so readers can browse a dashboard-like report from GitHub Pages instead of running agentv dashboard or opening raw JSONL.

AgentV results report showing an expanded failing test case with unified assertions, deterministic type badges, pass/fail indicators, evidence text, and collapsible input/output

Option	Description
`--out`, `-o`	Output HTML file (defaults to `<run-dir>/report.html`)
`--dir`, `-d`	Working directory used to resolve the source path

`results export`

Use results export when you need the artifact workspace layout itself rather than a rendered report.

agentv results export <run-workspace-or-index.jsonl> [--out <dir>]

This is useful when a manifest needs to be materialized into a predictable artifact tree for other tooling, review, or archiving. The run workspace is also where generated task bundles live: index.jsonl rows may point to per-result task_dir, eval_path, targets_path, files_path, and graders_path entries. Keep those generated artifacts with the run when sharing or auditing results.

Inspection helpers

For lightweight terminal workflows:

agentv results summary .agentv/results/runs/<timestamp>
agentv results failures .agentv/results/runs/<timestamp>
agentv results show .agentv/results/runs/<timestamp> --test-id my-case
agentv results validate .agentv/results/runs/<timestamp>

For a review-centric workflow built around these artifacts, see Human Review Checkpoint.

Remote results sync/status

The CLI contract is deliberately narrow: agentv results manages local result artifacts only. It does not expose results remote status or results remote sync subcommands.

Use these supported remote workflows instead:

Automatic publishing: configure projects[].results.sync.auto_push: true; new agentv eval and agentv pipeline bench runs push their artifacts after the run completes. Add projects[].results.branch when the results repo stores artifacts on a non-default branch. While an eval is still running, WIP checkpoints can keep partial run output durable on agentv/inflight/... branches.
Manual Dashboard sync: run agentv dashboard, open the project, and use Sync Project.
Manual API sync: while Dashboard is running, call GET /api/projects/:projectId/remote/status or POST /api/projects/:projectId/remote/sync for project-scoped automation. Single-project sessions also expose GET /api/remote/status and POST /api/remote/sync.
Git escape hatch: for advanced recovery, inspect or repair the configured projects[].results.path clone with git directly, then sync again.