feat: restrict post-processing (refine) to verbatim mode only (#86)#87
Conversation
The refine step exposed three modes — verbatim, standard, and caption. standard and caption shifted the SRT timing (caption re-segments by design; standard dropped fillers and the LLM did not reliably preserve segmentation), so subtitles drifted out of sync with the audio. Consolidate to verbatim, which preserves timestamps and segmentation. - refine.py: drop standard/caption prompts; _PROMPT_MAP is verbatim-only; add canonical VALID_REFINE_MODES / DEFAULT_REFINE_MODE (single source). - jobs.py: re-export VALID_REFINE_MODES from refine (standard/caption now rejected with INVALID_REFINE_MODE). - transcribe.py: normalize refine_mode to verbatim on both the streaming and batch (_run_refinement) paths — legacy standard/caption rows fall back at read time, no migration needed. - upload.html / settings.html: drop the mode selector and the standard/caption custom-prompt editors. - i18n: prune unused mode strings; reword persona hints. - tests: cover the batch-path normalization regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR restricts the LLM “refine” post-processing feature to verbatim-only to prevent subtitle timing drift, removing the previously supported standard and caption modes across backend, frontend, i18n, and tests.
Changes:
- Remove
standard/captionrefine prompts and makeverbatimthe sole supported mode (with a single source of truth for valid modes/default). - Normalize refine mode usage to
verbatimin both streaming and batch refinement paths; reject removed modes at the API boundary. - Update UI to remove refine-mode selection and prune i18n strings; adjust test suite accordingly (including a regression test for legacy-mode normalization).
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/services/refine.py |
Removes non-verbatim prompts and introduces canonical VALID_REFINE_MODES/DEFAULT_REFINE_MODE. |
src/services/transcribe.py |
Forces verbatim refinement for streaming and batch paths; iterates custom prompts via VALID_REFINE_MODES. |
src/api/jobs.py |
Re-exports VALID_REFINE_MODES from refine module for consistent validation. |
src/templates/upload.html |
Removes refine-mode selector and stops sending refine_mode from the upload UI. |
src/templates/settings.html |
Reduces refine prompt editing UI to verbatim-only. |
src/i18n/en.json |
Removes standard/caption mode strings and rewrites verbatim/persona copy. |
src/i18n/ja.json |
Removes standard/caption mode strings and rewrites verbatim/persona copy. |
tests/test_refine.py |
Updates prompt-map tests for verbatim-only behavior. |
tests/test_pipeline.py |
Updates refine tests to verbatim-only and adds regression coverage for legacy-mode normalization in batch refinement. |
tests/test_api_validation.py |
Updates API validation tests for verbatim-only and ensures removed modes are rejected. |
tests/e2e/test_smoke.py |
Updates persona preset assertions now that refine mode selection is removed. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Should pass validation (may fail later due to no real file, but not 400 for refine_mode) | ||
| assert resp.status_code != 400 or "refine_mode" not in resp.json().get("detail", "") | ||
| if resp.status_code == 200: | ||
| await cleanup_job(make_client, resp.json()["id"]) |
| @@ -313,7 +291,6 @@ <h2 class="text-lg font-medium text-gray-900 dark:text-gray-100 mb-3">{{ t('uplo | |||
| if (this.selectedModel && this.enableRefine) params.set('model', this.selectedModel); | |||
| if (this.language) params.set('language', this.language); | |||
| params.set('enable_refine', this.enableRefine); | |||
| # Canonical declaration of valid refine modes. jobs.py and settings.py import | ||
| # this so there is a single source of truth. |
- jobs.py: persist refine_mode=verbatim when enable_refine is set so the History "Mode" column shows the effective mode instead of "-" (the UI no longer sends refine_mode after #86). - test_api_validation.py: fix a weak assertion — AppError serializes under {"error": {"code": ...}}, not "detail", so the old check always passed. Now assert verbatim is not rejected with INVALID_REFINE_MODE. - refine.py: correct the VALID_REFINE_MODES comment — settings.py imports it from jobs.py (re-export), not directly from refine.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed all three Copilot review comments in c8b9a5a:
324 tests pass, ruff clean. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/services/transcribe.py:544
- This batch refinement path also hard-codes
refine_mode = "verbatim", duplicating the new canonicalDEFAULT_REFINE_MODE. Consider usingDEFAULT_REFINE_MODEhere as well so streaming and batch paths remain consistent if the default ever changes.
# Only verbatim remains; legacy jobs stored as standard/caption fall back
# to it at read time (#86). Mirrors the streaming path so a custom verbatim
# prompt is honored on the verify pipeline too.
refine_mode = "verbatim"
| custom_prompts = await _load_custom_prompts(session) | ||
| refine_mode = job.refine_mode or "standard" | ||
| # Only verbatim remains; legacy jobs stored as standard/caption | ||
| # fall back to it at read time (#86) — no migration needed. | ||
| refine_mode = "verbatim" | ||
| context_before_n = 3 |
Address Copilot follow-up: both refine_mode normalization sites (streaming and _run_refinement) hard-coded the literal "verbatim", duplicating the canonical DEFAULT_REFINE_MODE from refine.py. Import and use the constant so there is a true single source of truth, and reword the comments to describe normalization regardless of the stored legacy value (not just a read-time fallback). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed the follow-up Copilot comment in 0d5b22d:
324 tests pass, ruff clean. |
| if resp.status_code == 400: | ||
| assert resp.json().get("error", {}).get("code") != "INVALID_REFINE_MODE" | ||
| if resp.status_code == 200: |
|
|
||
| # Canonical declaration of valid refine modes (single source of truth). | ||
| # jobs.py imports and re-exports this; settings.py imports it from jobs.py. | ||
| VALID_REFINE_MODES = frozenset(_PROMPT_MAP) |
Address Copilot round-3 comments:
- test_create_job_valid_refine_mode: assert unconditionally that verbatim is
never rejected with INVALID_REFINE_MODE, so an unexpected 422/500 can no
longer make the test pass silently. Intentionally not pinned to {200,400}
since downstream status depends on API-key presence (orthogonal to mode
validation).
- refine.py: use frozenset(_PROMPT_MAP.keys()) for explicitness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed round-3 Copilot comments in 8c9d435:
324 tests pass, ruff clean. |
Closes #86
Summary
standardandcaption.standard/captionshifted SRT timing (caption re-segments by design; standard dropped fillers and the LLM did not reliably preserve segmentation), so subtitles drifted out of sync with the audio.verbatimpreserves timestamps and segmentation.Implementation notes
standard/captionprompts;_PROMPT_MAPis verbatim-only; add canonicalVALID_REFINE_MODES = frozenset(_PROMPT_MAP)andDEFAULT_REFINE_MODE(single source of truth).VALID_REFINE_MODESfrom refine;standard/captionnow rejected withINVALID_REFINE_MODE(400).refine_modetoverbatimon both the streaming and batch (_run_refinement) paths — legacystandard/captionjob rows fall back at read time, no migration needed._load_custom_promptsiteratesVALID_REFINE_MODES.standard/captioncustom-prompt editors; the UI no longer sendsrefine_mode.Pre-PR review summary
A
/code-reviewpass at xhigh effort (5 angles) was run this session before commit: it found one bug — the batch (_run_refinement) path was initially not normalized to verbatim, silently ignoring custom verbatim prompts on the verify pipeline — which was fixed and is now guarded by a regression test (test_run_refinement_normalizes_legacy_mode_to_verbatim). All 324 tests pass; ruff clean.Full reviews are saved in the plugin cache:
🤖 Generated via /gh-issue-driven:ship