fix(translation): only extract chunks from the marked text, not its c… #f408e83#49
Conversation
…ontext The detailed-translation and advice models intermittently surfaced reusable patterns (chunks) drawn from the surrounding `context` rather than the user's marked selection. e.g. selecting "improvements are included" returned "related to" and "based on", which appear only in the context paragraph. Add a deterministic guardrail, `filterChunksToSelection`, that drops any chunk whose text does not appear (case- and whitespace-insensitively) inside the marked `phrase`, and apply it to both `getDetailedTranslation` and `getTranslationAdvice`. Also harden both prompts to state explicitly that the context is only for disambiguating meaning and must never be a source of chunks. https://claude.ai/code/session_01UTb6tc1wE8UyWxCKvBHmwT
navidshad
left a comment
There was a problem hiding this comment.
Automated PR Review
Primary Task: CU-86exw91re — Fix chunks confusion on translation function
Task alignment
The task describes a concrete bug: selecting "improvements are included" was returning "related to" and "based on" as chunks — words pulled from the surrounding context, not from the marked phrase itself.
- ✅ Prompt hardening (
service.ts:82,service.ts:155) — bothgetDetailedTranslationandgetTranslationAdvicesystem prompts now explicitly distinguishcontext(disambiguation only) fromphrase(the only valid source of chunks). - ✅ Deterministic guardrail (
chunk-filter.ts) —filterChunksToSelectionenforces the contract post-model regardless of what the LLM returns. Applied in both call sites (service.ts:121,service.ts:189). - ✅ Tests (
__tests__/chunk-filter.test.ts) — 6 cases covering: the exact real-world failing example from the task, sub-phrase matching, case/whitespace normalisation, empty/whitespace chunks, empty selection, andundefined/nullinput. The test file directly references the task's failing case, making the regression net self-documenting. - ✅ Scope is appropriate — no over-engineering, nothing missing.
One subtle correctness point worth calling out: the advice guardrail correctly uses if (result.chunks) { result.chunks = filterChunksToSelection(...) } rather than an unconditional assignment. In the advice flow, chunks: undefined means "don't touch the existing highlights", whereas chunks: [] means "remove all highlights" — preserving undefined is the right call here.
Commit messages
fix(translation): only extract chunks from the marked text, not its context✅ — typefixmatches the real impact (bug, not a new capability), scope is correct, description is clear.
Prior review follow-up
No prior reviews.
Convention check
- ✅ Module placement follows the established pattern (
server/src/modules/translation/chunk-filter.ts,__tests__/chunk-filter.test.ts). - ✅ No new DB collections;
config.tsuntouched. - ✅ Conventional Commits:
fix:→ patch bump, correctly applied. ⚠️ PR body — the "Related Tasks" and "Commit List" links point tohttps://app.clickup.com/t/f408e83(a git commit SHA), not the ClickUp task. The correct URL ishttps://app.clickup.com/t/86exw91re. Cosmetic only — the auto-linked comment is correct.
Verdict
APPROVE — Clean, well-tested fix that directly addresses the exact bug in the task. The dual approach (prompt hardening + deterministic post-processing guardrail) makes the fix resilient to future model drift.
Generated by Claude Code
|
🎉 This PR is included in version 1.0.0-dev.1 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
📋 Summary
This PR fixes the translation extraction process to only extract chunks from the explicitly marked text segments, excluding their surrounding context.
🔗 Related Tasks
#f408e83 - Fix translation extraction to exclude context outside marked text
📝 Additional Details
Previously, translation extraction included surrounding context which could lead to incorrect or extraneous translation chunks. This update ensures extraction is limited strictly to the marked text.
📜 Commit List
f408e83 fix(translation): only extract chunks from the marked text, not its context