
fix(llm): treat truncated LLM responses as fallback in RAG intent #3129

Merged
marevol merged 1 commit into master from fix/llm-rag-intent-truncation on May 4, 2026

Conversation

@marevol marevol (Contributor) commented May 3, 2026

Summary

Fixes #3125. AbstractLlmClient.detectIntent could classify a user message as UNCLEAR when the LLM response was truncated by output-token exhaustion (Gemini finishReason=MAX_TOKENS, OpenAI finish_reason=length, Anthropic max_tokens), surfacing a "please clarify" reply for an answerable question.

Two cooperating gaps caused the symptom:

  1. isEmptyContentWithLengthFinish only matched OpenAI-style "length" and required blank content, so Gemini MAX_TOKENS with non-empty truncated content slipped past the defensive fallback.
  2. parseIntentResponse silently routed a truncated-but-parseable JSON to IntentDetectionResult.unclear because ChatIntent.fromValue maps a missing/blank intent field to UNCLEAR by design.

Changes

  • Add a TRUNCATION_FINISH_REASONS set (length, MAX_TOKENS, max_tokens, model_length) and an isTruncatedFinish helper covering OpenAI, Ollama, Gemini, Anthropic, and self-hosted endpoints. Generalize isEmptyContentWithLengthFinish to delegate to the helper (kept for subclass back-compat). See the first sketch after this list.
  • Replace the empty-content gate in detectIntent (both overloads) and in evaluateResults with a strict truncation gate, so any truncated response falls back to fallbackSearch / fallbackAllRelevant regardless of whether the content is empty (second sketch below).
  • Add a WARN-only truncation log to regenerateQuery without short-circuiting: the query field is atomic and the extraction regex requires a closing quote, so partial values are filtered out and any extractable refinement is preserved.
  • Harden parseIntentResponse: a missing/blank intent field or an unknown value now falls back to search instead of a confident UNCLEAR; UNCLEAR is reserved strictly for an explicit "intent":"unclear" from the model (third sketch below).
  • Sanitize WARN-level logs against operator-log data exposure: introduce truncateForLog (capped at LOG_RESPONSE_HEAD_MAX_CHARS=200, included in the first sketch below) and emit responseLength + responseHead + intentLength + intentHead instead of unbounded raw response/intent/userMessage. Applied consistently to the new truncation gates and to the existing parse-failure catch paths in parseIntentResponse and parseEvaluationResponse.
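
A minimal sketch of the two new helpers, assuming a static-utility shape. The constant names, the finish-reason values, and the 200-character cap come from this PR; the visibility and exact signatures inside AbstractLlmClient are assumptions.

```java
import java.util.Set;

// Sketch only: names follow the PR text; the real AbstractLlmClient
// members may differ in visibility and exact signature.
final class TruncationHelpers {

    // OpenAI/Ollama report "length", Gemini "MAX_TOKENS",
    // Anthropic "max_tokens", and some self-hosted servers "model_length".
    private static final Set<String> TRUNCATION_FINISH_REASONS =
            Set.of("length", "MAX_TOKENS", "max_tokens", "model_length");

    private static final int LOG_RESPONSE_HEAD_MAX_CHARS = 200;

    /** True when generation stopped because the output-token budget ran out. */
    static boolean isTruncatedFinish(final String finishReason) {
        return finishReason != null && TRUNCATION_FINISH_REASONS.contains(finishReason);
    }

    /** Caps a value for WARN logs so raw LLM output never lands in operator logs unbounded. */
    static String truncateForLog(final String value) {
        if (value == null) {
            return "";
        }
        return value.length() <= LOG_RESPONSE_HEAD_MAX_CHARS
                ? value
                : value.substring(0, LOG_RESPONSE_HEAD_MAX_CHARS) + "...";
    }
}
```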
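
With the helper in place, the strict truncation gate is a cheap pre-check before any parsing. The fragment below is hypothetical in shape: logger, response, finishReason, and fallbackSearch stand in for the real method context, which this PR does not reproduce.

```java
// Hypothetical fragment of detectIntent: the real overloads carry more
// context (prompt construction, provider call) that is omitted here.
if (isTruncatedFinish(finishReason)) {
    logger.warn("Intent response truncated: finishReason={} responseLength={} responseHead={}",
            finishReason,
            response == null ? 0 : response.length(),
            truncateForLog(response));
    // A truncated reply must never surface as a confident UNCLEAR.
    return fallbackSearch(userMessage);
}
```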
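
And a sketch of the hardened fallback order in parseIntentResponse. Only ChatIntent.fromValue, ChatIntent.UNCLEAR, and the unclear result are named by this PR; extractIntentField, searchResult, and resultFor are illustrative stand-ins, and the unclear factory is written as a method call for readability.

```java
// Sketch: reserve UNCLEAR for an explicit "intent":"unclear" and let
// anything missing, blank, or unknown degrade to a search intent.
final String intent = extractIntentField(json);   // hypothetical extraction helper
if (intent == null || intent.isBlank()) {
    return searchResult(userMessage);             // was: a confident UNCLEAR
}
if ("unclear".equalsIgnoreCase(intent.trim())) {
    return IntentDetectionResult.unclear();       // the one legitimate UNCLEAR path
}
final ChatIntent parsed = ChatIntent.fromValue(intent);
return parsed == ChatIntent.UNCLEAR               // fromValue maps unknowns to UNCLEAR
        ? searchResult(userMessage)               // unknown value, not an explicit unclear
        : resultFor(parsed);                      // hypothetical result factory
```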

Test plan

  • mvn test -Dtest=AbstractLlmClientTest — 120/120 green (49 new tests)
  • mvn test -Dtest='org.codelibs.fess.llm.*Test' — 191/191 green
  • mvn test -Dtest='org.codelibs.fess.llm.*Test,org.codelibs.fess.chat.*Test' — 302/302 green (no regression in ChatClient consumers)
  • Codex independent review — iterated to convergence (ok=true, no remaining issues)

New tests cover:

  • provider variants of isTruncatedFinish and the Gemini MAX_TOKENS reproduction
  • OpenAI, Anthropic, and model_length truncation paths
  • parseIntentResponse: blank, unknown, whitespace, case-insensitive, explicit-unclear, JSON-null, null-response, and invalid-JSON inputs
  • evaluateResults truncation across providers, plus happy-path and empty-content sanity checks
  • regenerateQuery truncation with extractable-query preservation
  • truncateForLog null/empty/short/boundary/over-boundary cases
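
For flavor, a reduced JUnit 5 style illustration of the provider-variant and log-cap coverage, written against the TruncationHelpers sketch above rather than the real AbstractLlmClientTest:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

class TruncationHelpersTest {

    @Test
    void coversEveryProviderVariant() {
        assertTrue(TruncationHelpers.isTruncatedFinish("length"));        // OpenAI / Ollama
        assertTrue(TruncationHelpers.isTruncatedFinish("MAX_TOKENS"));    // Gemini
        assertTrue(TruncationHelpers.isTruncatedFinish("max_tokens"));    // Anthropic
        assertTrue(TruncationHelpers.isTruncatedFinish("model_length"));  // self-hosted
        assertFalse(TruncationHelpers.isTruncatedFinish("stop"));
        assertFalse(TruncationHelpers.isTruncatedFinish(null));
    }

    @Test
    void logHeadIsCappedAtTwoHundredChars() {
        assertEquals("", TruncationHelpers.truncateForLog(null));
        assertEquals("short", TruncationHelpers.truncateForLog("short"));
        final String boundary = "x".repeat(200);
        assertEquals(boundary, TruncationHelpers.truncateForLog(boundary));           // exactly at the cap
        assertEquals(203, TruncationHelpers.truncateForLog("x".repeat(201)).length()); // 200 + "..."
    }
}
```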

@marevol marevol self-assigned this May 3, 2026
@marevol marevol added this to the 15.7.0 milestone May 3, 2026
@marevol marevol merged commit 5cdb1fb into master May 4, 2026
1 check passed


Development

Successfully merging this pull request may close these issues.

RAG: intent detection falls into UNCLEAR when LLM response is truncated by MAX_TOKENS / length
