
fix(llm): treat truncated LLM responses as fallback in RAG intent #3129

Merged
marevol merged 1 commit into master from fix/llm-rag-intent-truncation on May 4, 2026

Conversation

@marevol marevol (Contributor) commented May 3, 2026

Summary

Fixes #3125. AbstractLlmClient.detectIntent could classify a user message as UNCLEAR when the LLM response was truncated by output-token exhaustion (Gemini finishReason=MAX_TOKENS, OpenAI finish_reason=length, Anthropic max_tokens), surfacing a "please clarify" reply for an answerable question.

Two cooperating gaps caused the symptom:

  1. isEmptyContentWithLengthFinish only matched OpenAI-style "length" and required blank content, so Gemini MAX_TOKENS with non-empty truncated content slipped past the defensive fallback.
  2. parseIntentResponse silently routed a truncated-but-parseable JSON to IntentDetectionResult.unclear because ChatIntent.fromValue maps a missing/blank intent field to UNCLEAR by design.

Changes

  • Add a TRUNCATION_FINISH_REASONS set (length, MAX_TOKENS, max_tokens, model_length) and an isTruncatedFinish helper covering OpenAI, Ollama, Gemini, Anthropic, and self-hosted endpoints. Generalize isEmptyContentWithLengthFinish to delegate to the helper (kept for subclass back-compat). See the first sketch after this list.
  • Replace the empty-content gate in detectIntent (both overloads) and in evaluateResults with a strict truncation gate, so any truncated response falls back to fallbackSearch / fallbackAllRelevant regardless of whether the content is empty (second sketch below).
  • Add a WARN-only truncation log to regenerateQuery without short-circuiting: the query field is atomic and the extraction regex requires a closing quote, so partial values are filtered out and any extractable refinement is preserved.
  • Harden parseIntentResponse: a missing/blank intent field or an unknown value now falls back to search instead of a confident UNCLEAR; UNCLEAR is reserved strictly for an explicit "intent":"unclear" from the model (third sketch below).
  • Sanitize WARN-level logs against operator-log data exposure: introduce truncateForLog (capped at LOG_RESPONSE_HEAD_MAX_CHARS=200, included in the first sketch below) and emit responseLength + responseHead + intentLength + intentHead instead of unbounded raw response/intent/userMessage. Applied consistently to the new truncation gates and to the existing parse-failure catch paths in parseIntentResponse and parseEvaluationResponse.
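
A minimal sketch of the two new helpers, assuming a static-utility shape. The constant names, the finish-reason values, and the 200-character cap come from this PR; the visibility and exact signatures inside AbstractLlmClient are assumptions.

```java
import java.util.Set;

// Sketch only: names follow the PR text; the real AbstractLlmClient
// members may differ in visibility and exact signature.
final class TruncationHelpers {

    // OpenAI/Ollama report "length", Gemini "MAX_TOKENS",
    // Anthropic "max_tokens", and some self-hosted servers "model_length".
    private static final Set<String> TRUNCATION_FINISH_REASONS =
            Set.of("length", "MAX_TOKENS", "max_tokens", "model_length");

    private static final int LOG_RESPONSE_HEAD_MAX_CHARS = 200;

    /** True when generation stopped because the output-token budget ran out. */
    static boolean isTruncatedFinish(final String finishReason) {
        return finishReason != null && TRUNCATION_FINISH_REASONS.contains(finishReason);
    }

    /** Caps a value for WARN logs so raw LLM output never lands in operator logs unbounded. */
    static String truncateForLog(final String value) {
        if (value == null) {
            return "";
        }
        return value.length() <= LOG_RESPONSE_HEAD_MAX_CHARS
                ? value
                : value.substring(0, LOG_RESPONSE_HEAD_MAX_CHARS) + "...";
    }
}
```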
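
With the helper in place, the strict truncation gate is a cheap pre-check before any parsing. The fragment below is hypothetical in shape: logger, response, finishReason, and fallbackSearch stand in for the real method context, which this PR does not reproduce.

```java
// Hypothetical fragment of detectIntent: the real overloads carry more
// context (prompt construction, provider call) that is omitted here.
if (isTruncatedFinish(finishReason)) {
    logger.warn("Intent response truncated: finishReason={} responseLength={} responseHead={}",
            finishReason,
            response == null ? 0 : response.length(),
            truncateForLog(response));
    // A truncated reply must never surface as a confident UNCLEAR.
    return fallbackSearch(userMessage);
}
```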
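
And a sketch of the hardened fallback order in parseIntentResponse. Only ChatIntent.fromValue, ChatIntent.UNCLEAR, and the unclear result are named by this PR; extractIntentField, searchResult, and resultFor are illustrative stand-ins, and the unclear factory is written as a method call for readability.

```java
// Sketch: reserve UNCLEAR for an explicit "intent":"unclear" and let
// anything missing, blank, or unknown degrade to a search intent.
final String intent = extractIntentField(json);   // hypothetical extraction helper
if (intent == null || intent.isBlank()) {
    return searchResult(userMessage);             // was: a confident UNCLEAR
}
if ("unclear".equalsIgnoreCase(intent.trim())) {
    return IntentDetectionResult.unclear();       // the one legitimate UNCLEAR path
}
final ChatIntent parsed = ChatIntent.fromValue(intent);
return parsed == ChatIntent.UNCLEAR               // fromValue maps unknowns to UNCLEAR
        ? searchResult(userMessage)               // unknown value, not an explicit unclear
        : resultFor(parsed);                      // hypothetical result factory
```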

Test plan

  • mvn test -Dtest=AbstractLlmClientTest — 120/120 green (49 new tests)
  • mvn test -Dtest='org.codelibs.fess.llm.*Test' — 191/191 green
  • mvn test -Dtest='org.codelibs.fess.llm.*Test,org.codelibs.fess.chat.*Test' — 302/302 green (no regression in ChatClient consumers)
  • Codex independent review — iterated to convergence (ok=true, no remaining issues)

New tests cover:

  • provider variants of isTruncatedFinish and the Gemini MAX_TOKENS reproduction
  • OpenAI, Anthropic, and model_length truncation paths
  • parseIntentResponse: blank, unknown, whitespace, case-insensitive, explicit-unclear, JSON-null, null-response, and invalid-JSON inputs
  • evaluateResults truncation across providers, plus happy-path and empty-content sanity checks
  • regenerateQuery truncation with extractable-query preservation
  • truncateForLog null/empty/short/boundary/over-boundary cases
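
For flavor, a reduced JUnit 5 style illustration of the provider-variant and log-cap coverage, written against the TruncationHelpers sketch above rather than the real AbstractLlmClientTest:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

class TruncationHelpersTest {

    @Test
    void coversEveryProviderVariant() {
        assertTrue(TruncationHelpers.isTruncatedFinish("length"));        // OpenAI / Ollama
        assertTrue(TruncationHelpers.isTruncatedFinish("MAX_TOKENS"));    // Gemini
        assertTrue(TruncationHelpers.isTruncatedFinish("max_tokens"));    // Anthropic
        assertTrue(TruncationHelpers.isTruncatedFinish("model_length"));  // self-hosted
        assertFalse(TruncationHelpers.isTruncatedFinish("stop"));
        assertFalse(TruncationHelpers.isTruncatedFinish(null));
    }

    @Test
    void logHeadIsCappedAtTwoHundredChars() {
        assertEquals("", TruncationHelpers.truncateForLog(null));
        assertEquals("short", TruncationHelpers.truncateForLog("short"));
        final String boundary = "x".repeat(200);
        assertEquals(boundary, TruncationHelpers.truncateForLog(boundary));           // exactly at the cap
        assertEquals(203, TruncationHelpers.truncateForLog("x".repeat(201)).length()); // 200 + "..."
    }
}
```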

@marevol marevol self-assigned this May 3, 2026
@marevol marevol added this to the 15.7.0 milestone May 3, 2026
@marevol marevol merged commit 5cdb1fb into master May 4, 2026
1 check passed


Development

Successfully merging this pull request may close these issues.

RAG: intent detection falls into UNCLEAR when LLM response is truncated by MAX_TOKENS / length
