Add reporting visualization command #81

Open
ftshijt wants to merge 1 commit into main from codex/reporting-visualization

ftshijt commented May 29, 2026

Summary

  • add a packaged versa-visualize command for HTML/CSV/Markdown reports from result JSONL files
  • expand aggregation reporting with mean/std, confidence intervals, failure counts, rankings, outlier examples, and multiple export formats
  • add a tiny reporting example and docs for a basic smoke check

What to expect

Run:

versa-visualize demo/reporting_example_results.jsonl \
  --out /tmp/versa-reporting-example/report.html \
  --csv /tmp/versa-reporting-example/report.csv \
  --markdown /tmp/versa-reporting-example/report.md \
  --group-by model

The HTML report opens with four summary counters: utterances, numeric metrics, metric categories, and missing/invalid metric values. It then shows a radar overview, category sunburst, category data-completeness table, optional per-metric ranking by group, the detailed metric summary table, and outlier examples.

For the bundled example, the category summary should show complete coverage:

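| Category | Metrics | Observed Values | Missing | Invalid | Coverage |
| --- | --- | --- | --- | --- | --- |
| asr_wer_cer | 1 | 6 | 0 | 0 | 100% |
| pitch_f0 | 1 | 6 | 0 | 0 | 100% |
| similarity | 1 | 6 | 0 | 0 | 100% |
| speech_enhancement | 2 | 12 | 0 | 0 | 100% |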
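
As a quick sanity check on the table above, here is a minimal sketch of how the Coverage column could be derived. It assumes coverage is observed / (observed + missing + invalid); that formula and the `coverage` helper are illustrative assumptions, not taken from versa/reporting.py.

```python
def coverage(observed: int, missing: int, invalid: int) -> float:
    """Share of expected values that are usable (assumed definition)."""
    expected = observed + missing + invalid
    # Guard against a category with no expected values at all.
    return observed / expected if expected else 0.0


# (category, observed, missing, invalid) as shown for the bundled example.
rows = [
    ("asr_wer_cer", 6, 0, 0),
    ("pitch_f0", 6, 0, 0),
    ("similarity", 6, 0, 0),
    ("speech_enhancement", 12, 0, 0),
]
for name, observed, missing, invalid in rows:
    print(f"{name}: {coverage(observed, missing, invalid):.0%}")
```

Under this assumption every category in the bundled example comes out at 100%, and a category with, say, 3 observed and 1 missing value would show 75%.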

CSV/Markdown exports contain one row per metric, with columns for count, missing, invalid, mean, median, std, stderr, ci95_low, ci95_high, min, max, higher_is_better, best_key, best_value, worst_key, worst_value, and outliers.
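
For reviewers who want to inspect the CSV export programmatically, a small sketch using the standard `csv` module follows. The `load_metric_rows` helper is hypothetical, the `metric` column name is an assumption (the other column names come from the list above), and the sample text is an abbreviated illustration rather than real command output.

```python
import csv
import io


def load_metric_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per metric row, keyed by header name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Abbreviated, illustrative sample; a real export has more columns.
sample = "metric,count,mean,ci95_low,ci95_high\nmcd,6,5.65,4.852,6.448\n"

for row in load_metric_rows(sample):
    # DictReader yields strings, so numeric fields need explicit conversion.
    print(row["metric"], int(row["count"]), float(row["mean"]))
```

To run it against a real export, read the file contents (for example `/tmp/versa-reporting-example/report.csv`) and pass the text to the same helper.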

Example rows from the export (columns condensed for readability):

| metric | category | count | mean | std | 95% CI | best | worst |
| --- | --- | --- | --- | --- | --- | --- | --- |
| mcd | pitch_f0 | 6 | 5.65 | 0.9975 | 4.852 to 6.448 | utt_005 (4.5) | utt_003 (7.1) |
| pesq | speech_enhancement | 6 | 2.633 | 0.6022 | 2.151 to 3.115 | utt_005 (3.4) | utt_003 (1.9) |
| wer | asr_wer_cer | 6 | 0.14 | 0.05933 | 0.09253 to 0.1875 | utt_005 (0.07) | utt_003 (0.23) |
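
The intervals in these rows appear consistent with a normal-approximation interval, mean ± 1.96 · std / √count. That is an inference from the numbers shown here, not a statement about how versa/reporting.py computes them; the `ci95` helper below is a hypothetical sketch for cross-checking.

```python
import math


def ci95(mean: float, std: float, count: int) -> tuple[float, float]:
    """Normal-approximation 95% interval: mean +/- 1.96 * std / sqrt(count)."""
    half_width = 1.96 * std / math.sqrt(count)
    return mean - half_width, mean + half_width


# Values copied from the mcd row of the example table.
low, high = ci95(5.65, 0.9975, 6)
print(f"{low:.4g} to {high:.4g}")
```

With the mcd values this reproduces 4.852 to 6.448; the pesq and wer rows match the same formula to the displayed precision.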

Tests

  • .codex-test-venv/bin/pytest test/test_reporting.py
  • .codex-test-venv/bin/python versa/bin/visualize.py demo/reporting_example_results.jsonl --out /private/tmp/versa-reporting-example/report.html --csv /private/tmp/versa-reporting-example/report.csv --markdown /private/tmp/versa-reporting-example/report.md --group-by model
  • python3 -m py_compile versa/__init__.py versa/reporting.py versa/bin/visualize.py versa/bin/aggregate_results.py
  • git diff --check
