feat: add SQLite persistent storage for benchmark results (#26)#34
feat: add SQLite persistent storage for benchmark results (#26)#34Sugaria0427 wants to merge 2 commits into
Conversation
Replaces flat JSON/CSV log storage with a queryable SQLite database while maintaining full backward compatibility. New files: - utils/storage.py — Storage class wrapping Python stdlib sqlite3. Schema: runs (run_id, timestamp, config_json) + results (run_id FK, backend, turn, metric, value). API: save_run, get_run, list_runs, compare_runs. - utils/migrate_legacy_logs.py — one-shot idempotent migration script that imports existing experiment_logs/*.json into SQLite. Modified files: - evaluation/logger.py — log_run() now calls Storage().save_run() alongside existing JSON+CSV writes. list_runs() queries SQLite first, falls back to filesystem scan. Fixes pre-existing has_llm_eval bug in _append_csv_summary. - tests/test_pipeline.py — 6 new tests (4 Storage CRUD + 2 logger integration). - CHANGELOG.md — documented the new feature. - .gitignore — added experiment_logs/memorylens.db. Closes Neal006#26
Code Review — PR #34 (SQLite persistent storage)Thanks for the contribution! The overall design is clean and the backward-compatibility approach (JSON/CSV unchanged, SQLite as opt-in) is exactly right. I found 3 bugs that will cause crashes in production and 2 lower-priority issues that need addressing before merge. 🔴 Bug 1 — Duplicate
|
| # | Severity | File | Issue |
|---|---|---|---|
| 1 | 🔴 High | utils/storage.py |
Duplicate results rows corrupt get_run() on repeated run_id |
| 2 | 🔴 High | evaluation/logger.py |
IndexError when rows is empty in _append_csv_summary |
| 3 | 🔴 High | evaluation/logger.py |
Storage() connections never closed → file handle leak |
| 4 | 🟡 Medium | utils/storage.py |
None values in metric arrays break arithmetic downstream |
| 5 | 🟡 Low | tests/test_pipeline.py |
Integration tests leak connections and use raw SQL for cleanup |
Once the three 🔴 issues are fixed I'm happy to approve. The design and test coverage are solid — these are all mechanical fixes.
- Bug 1: DELETE stale results before INSERT in save_run() to prevent duplicate rows on repeated run_id (fixes get_run() corruption) - Bug 2: Add empty-rows guard in _append_csv_summary() to prevent IndexError - Bug 3: Close Storage() connections in log_run() and list_runs() via try/finally to prevent file handle leaks on Windows - Issue 4: Document None-handling contract in get_run() docstring; filter None from compare_runs() recall arrays - Issue 5: Close Storage() in test cleanup to match the Storage unit tests Adds test_storage_save_run_idempotent to verify fix for Bug 1.
022423e to
6fcc8db
Compare
|
[@Neal006] Thanks for the thorough review! All 5 issues have been addressed: 🔴 Bug 1 — Added 🟡 Issue 4 — Added docstring note in All 30 tests pass (23 existing + 7 new). |
Summary
Replace flat JSON/CSV log storage with a queryable SQLite database for benchmark results, while maintaining full backward compatibility.
Related issue
Closes #26
Type of change
How was this tested?
Checklist