Reasoning loss diagnostic logging #123
Open
mikeperry-tor wants to merge 3 commits into
This should become a teep factor for reports, if it works.
This causes GLM to strip prior reasoning via its Jinja chat template.
Contributor
Pull request overview
Adds diagnostic logging in the proxy’s chat-completions path to help identify common causes of “reasoning loss” (e.g., frameworks stripping assistant reasoning fields, model-specific chat template flags that discard reasoning, and trailing user addendums after tool output).
Changes:
- Introduces request-metadata extraction (`chatRequestStats`) and structured slog output (`logChatRequestStats`) for chat requests.
- Adds a per-server hourly log limiter to rate-limit repeated reasoning-loss warnings.
- Adds unit tests covering reasoning-loss classification, model-flag warnings/suppression, and rate limiting behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| internal/proxy/proxy.go | Adds chat request reasoning-loss metadata parsing, rate-limited diagnostic logging, and hooks it into the chat endpoint handler. |
| internal/proxy/proxy_internal_test.go | Adds focused unit tests for stats extraction and emitted log signals (including rate-limiting behavior). |
Comment on lines +626 to +628

```go
if string(bytes.TrimSpace(raw)) == "null" {
	return "", false, true
}
```
Comment on lines +648 to +650

```go
if string(bytes.TrimSpace(raw)) == "null" {
	return false, true, false
}
```
Comment on lines +767 to +770

```go
func logChatRequestStats(ctx context.Context, limiter *hourlyLogLimiter, model, providerName, upstreamModel, path string, body []byte) {
	stats, err := chatRequestStats(body)
	if err != nil {
		if allowHourlyLog(limiter, "chat_reasoning_metadata_unavailable") {
```
Collaborator
Author
This recommended fix drops the rate limiter for no apparent reason. We should instead keep the rate limiter and also use it to skip parsing entirely, in addition to the log-level checks.
Collaborator
Author
We should also reference #124 in these log messages for more information.
Diagnostic logging for common sources of agent reasoning loss.
Bug 1: Most frameworks omit `reasoning_content` when talking to "OpenAI-compatible" APIs such as teep -> WARN if detected within the same turn; INFO if detected across turns.
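A rough sketch of the Bug 1 classification, assuming (this is an interpretation, not confirmed by the PR) that the "current turn" is everything after the last user message. An assistant message in the current turn with no `reasoning_content` means the agent loop dropped reasoning mid-task (WARN); one only in earlier turns may be harmless, since many chat templates discard earlier-turn reasoning anyway (INFO). `Message`, `Severity`, and `classifyMissingReasoning` are hypothetical names.

```go
package main

// Message is a minimal, hypothetical view of an OpenAI-style chat message.
type Message struct {
	Role             string
	ReasoningContent *string // nil when the field was omitted
}

// Severity of a reasoning-loss finding.
type Severity string

const (
	SevNone Severity = "none"
	SevInfo Severity = "info"
	SevWarn Severity = "warn"
)

// classifyMissingReasoning treats everything after the last user message as
// the current turn (an assumption). A missing reasoning_content on an
// assistant message in the current turn is WARN; only in earlier turns, INFO.
func classifyMissingReasoning(msgs []Message) Severity {
	lastUser := -1
	for i, m := range msgs {
		if m.Role == "user" {
			lastUser = i
		}
	}
	sev := SevNone
	for i, m := range msgs {
		if m.Role != "assistant" || m.ReasoningContent != nil {
			continue
		}
		if i > lastUser {
			return SevWarn // same-turn loss: reasoning dropped mid-task
		}
		sev = SevInfo // cross-turn loss: may be harmless for some templates
	}
	return sev
}
```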
Bug 2: Onyx, and possibly other frameworks, append a reminder user message, which causes most model families (GLM, DeepSeek, Qwen) to purge prior reasoning because they treat it as the start of a new turn -> WARN if detected.
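A minimal heuristic for Bug 2, assuming the signal is "the request ends with one or more user messages that directly follow a tool result". In a normal agent loop the model replies to the tool output next, so a user message wedged in between looks like a new turn to templates that drop reasoning before the last user message. `hasTrailingUserAddendum` is a hypothetical helper operating on message roles in request order.

```go
package main

// hasTrailingUserAddendum reports whether the request ends with a run of user
// messages that immediately follows a tool result, the pattern produced by
// frameworks that append a reminder after tool output.
func hasTrailingUserAddendum(roles []string) bool {
	i := len(roles) - 1
	if i < 0 || roles[i] != "user" {
		return false
	}
	// Skip over any run of trailing user messages (e.g. several reminders).
	for i >= 0 && roles[i] == "user" {
		i--
	}
	return i >= 0 && roles[i] == "tool"
}
```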
Bug 3: If a request tries to preserve all reasoning, including prior turns (typical of well-written coding agents), but omits the model-specific reasoning-preservation flag (which can happen when the framework does not recognize teep's model type) -> WARN.
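A sketch of the Bug 3 check under explicit assumptions: the preservation flag is passed in the request's `chat_template_kwargs` object (a field accepted by servers such as vLLM), and the caller has already determined whether earlier-turn assistant messages carry reasoning. The family-to-flag table and the flag name below are placeholders, not verified template parameters; the real names depend on each model's Jinja template.

```go
package main

import "strings"

// preserveFlags maps a model-family substring to the chat_template_kwargs key
// that (hypothetically) tells its template to keep earlier-turn reasoning.
// The flag name is a placeholder, not a verified template parameter.
var preserveFlags = map[string]string{
	"glm": "example_keep_reasoning_flag",
}

// missingPreserveFlag returns the expected flag name when the request carries
// prior-turn reasoning but omits the model family's preservation flag, or ""
// when there is nothing to warn about. Matching is by case-insensitive
// substring, since proxied model names may not follow upstream conventions.
func missingPreserveFlag(model string, priorTurnReasoning bool, templateKwargs map[string]any) string {
	if !priorTurnReasoning {
		return ""
	}
	m := strings.ToLower(model)
	for family, flag := range preserveFlags {
		if !strings.Contains(m, family) {
			continue
		}
		if v, ok := templateKwargs[flag]; ok && v != nil {
			return "" // flag present: the client handled it
		}
		return flag
	}
	return ""
}
```

With more than one family in the table, map iteration order is random, so overlapping substrings would need an ordered slice instead.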
None of these are bugs in teep per se, but they are easy hazards for agent frameworks to trip over, especially given teep's unusually formatted model names, its unfamiliar base URL, and the generic provider API type frameworks must use for it.
Arguably, we could add a camouflage mode to help mitigate these, for example by normalizing model names to their OpenRouter equivalents and officially speaking the OpenRouter API instead of "OpenAI Compatible", or some similar option. However, this would then become a user configuration hazard, or be outright impossible in frameworks that lack an API base URL input field.