fix(llm): treat truncated LLM responses as fallback in RAG intent #3129
Merged
Conversation
AbstractLlmClient.detectIntent could classify a user message as UNCLEAR
when the underlying LLM response was truncated by output-token exhaustion
(Gemini finishReason=MAX_TOKENS, OpenAI finish_reason=length, Anthropic
max_tokens), surfacing a "please clarify" reply for an answerable question.
Two cooperating gaps in AbstractLlmClient caused the symptom:
1. isEmptyContentWithLengthFinish only matched OpenAI-style "length" and
required blank content, so Gemini MAX_TOKENS with non-empty truncated
content slipped past the defensive fallback.
2. parseIntentResponse silently routed a truncated-but-parseable JSON payload
   to IntentDetectionResult.unclear, because ChatIntent.fromValue maps a
   missing/blank intent field to UNCLEAR by design.
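Gap 2 can be made concrete with a minimal sketch. The ChatIntent enum below is a stand-in for the real Fess type, modeling only the behavior described above (missing, blank, or unknown values map to UNCLEAR by design); the constant set and exact signature are assumptions for illustration.

```java
import java.util.Locale;

// Stand-in for the real org.codelibs.fess ChatIntent enum: only the
// fromValue behavior described in the PR text is modeled here.
enum ChatIntent {
    SEARCH, UNCLEAR;

    static ChatIntent fromValue(final String value) {
        if (value == null || value.isBlank()) {
            // A truncated-but-parseable payload that lost its "intent"
            // field lands here and becomes a confident UNCLEAR.
            return UNCLEAR;
        }
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (final IllegalArgumentException e) {
            return UNCLEAR;
        }
    }
}

public class IntentGapDemo {
    public static void main(final String[] args) {
        // Output-token exhaustion can leave valid JSON without an intent
        // field; pre-fix, that parsed "successfully" as UNCLEAR.
        System.out.println(ChatIntent.fromValue(null));     // UNCLEAR
        System.out.println(ChatIntent.fromValue("search")); // SEARCH
    }
}
```

This is why the hardened parseIntentResponse must distinguish an explicit "intent":"unclear" from a field that simply never arrived.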
Changes
- Add a TRUNCATION_FINISH_REASONS set ("length", "MAX_TOKENS", "max_tokens",
  "model_length") and an isTruncatedFinish helper covering OpenAI, Ollama,
  Gemini, Anthropic, and self-hosted endpoints.
- Generalize isEmptyContentWithLengthFinish to delegate to the helper
(kept for subclass back-compat).
- Replace the empty-content gate in detectIntent (both overloads) and
evaluateResults with a strict truncation gate so any truncated response
falls back to fallbackSearch / fallbackAllRelevant regardless of content
emptiness.
- Add a WARN-only truncation log to regenerateQuery without short-circuiting
(the "query" field is atomic and the regex requires a closing quote, so
partial values are filtered out and any extractable refinement is kept).
- Harden parseIntentResponse so a missing/blank intent field or an unknown
value falls back to search instead of confident UNCLEAR; reserve UNCLEAR
strictly for an explicit "intent":"unclear" from the model.
- Sanitize WARN-level logs against operator-log data exposure: introduce
truncateForLog (cap at LOG_RESPONSE_HEAD_MAX_CHARS=200) and emit
responseLength + responseHead + intentLength + intentHead instead of
unbounded raw response/intent/userMessage. Apply consistently to the new
truncation gates and to the existing parse-failure catch paths in
parseIntentResponse and parseEvaluationResponse.
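The two new helpers named in the list above can be sketched as follows. Names match the PR description (TRUNCATION_FINISH_REASONS, isTruncatedFinish, truncateForLog, LOG_RESPONSE_HEAD_MAX_CHARS), but the exact signatures and the null-handling choices are assumptions, not the actual AbstractLlmClient source.

```java
import java.util.Set;

// Hedged sketch of the helpers described in the Changes list.
public class TruncationHelpers {
    // Provider finish reasons that signal output-token exhaustion:
    // OpenAI/Ollama "length", Gemini "MAX_TOKENS", Anthropic "max_tokens",
    // some self-hosted endpoints "model_length".
    static final Set<String> TRUNCATION_FINISH_REASONS =
            Set.of("length", "MAX_TOKENS", "max_tokens", "model_length");

    static final int LOG_RESPONSE_HEAD_MAX_CHARS = 200;

    static boolean isTruncatedFinish(final String finishReason) {
        return finishReason != null && TRUNCATION_FINISH_REASONS.contains(finishReason);
    }

    // Caps raw model output before it reaches WARN logs, so operator logs
    // carry a bounded head (plus lengths) instead of the full response.
    // Returning "" for null input is an assumption made for this sketch.
    static String truncateForLog(final String value) {
        if (value == null) {
            return "";
        }
        if (value.length() <= LOG_RESPONSE_HEAD_MAX_CHARS) {
            return value;
        }
        return value.substring(0, LOG_RESPONSE_HEAD_MAX_CHARS) + "...";
    }

    public static void main(final String[] args) {
        System.out.println(isTruncatedFinish("MAX_TOKENS")); // true
        System.out.println(isTruncatedFinish("stop"));       // false
    }
}
```

Keeping the lookup as an exact-match set (rather than lowercasing the reason) works here because both "MAX_TOKENS" and "max_tokens" are enumerated explicitly.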
Tests
- 49 new unit tests in AbstractLlmClientTest covering: provider variants of
isTruncatedFinish, the Gemini MAX_TOKENS reproduction, OpenAI/Anthropic/
model_length truncation paths, parseIntentResponse blank/unknown/whitespace/
case-insensitive/explicit-unclear/JSON-null/null-response/invalid-JSON paths,
evaluateResults truncation across providers + happy path + empty-content
sanity, regenerateQuery truncation + extractable-query Option B preservation,
and truncateForLog null/empty/short/boundary/over-boundary cases.
- 120/120 AbstractLlmClientTest green; 191/191 LLM package; 302/302 with the
chat package included.
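The regenerateQuery cases in the test list rest on the closing-quote requirement described in the Changes section: a value cut off mid-string can never match, so truncation only costs the refinement, never corrupts it. A sketch with an assumed pattern (the actual Fess regex may differ):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryExtractionDemo {
    // Assumed shape of the extraction pattern: the closing quote is
    // mandatory, so a "query" value truncated by token exhaustion
    // simply fails to match instead of yielding a partial refinement.
    static final Pattern QUERY_PATTERN =
            Pattern.compile("\"query\"\\s*:\\s*\"([^\"]*)\"");

    static String extractQuery(final String response) {
        final Matcher m = QUERY_PATTERN.matcher(response);
        return m.find() ? m.group(1) : null;
    }

    public static void main(final String[] args) {
        // Complete field: the refinement is kept (WARN log aside).
        System.out.println(extractQuery("{\"query\": \"fess crawler settings\"}"));
        // Truncated mid-value: no closing quote, partial value is dropped.
        System.out.println(extractQuery("{\"query\": \"fess craw"));
    }
}
```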
Codex review iterated to convergence (ok=true) with no remaining issues.
Closes #3125
Summary
Fixes #3125.
Test plan
- mvn test -Dtest=AbstractLlmClientTest (120/120 green, 49 new tests)
- mvn test -Dtest='org.codelibs.fess.llm.*Test' (191/191 green)
- mvn test -Dtest='org.codelibs.fess.llm.*Test,org.codelibs.fess.chat.*Test' (302/302 green, no regression in ChatClient consumers)
- Codex review iterated to convergence (ok=true, no remaining issues)