fix(server): Qwen3.6-27B tool calling for claude-code Anthropic path #276
Open
dusterbloom wants to merge 8 commits into
howard0su reviewed on May 26, 2026
```cpp
// assistant suffix, which leaves the model in the wrong decoding state
// for tool use. Mirror the hard-coded behavior here when the rendered
// prompt ends with a bare assistant generation prompt.
if (!enable_thinking) {
```
Contributor: This is a general code path. Based on your comment, we should check the model or arch here.
dusterbloom added a commit to dusterbloom/lucebox-hub that referenced this pull request on May 26, 2026:
… only
The kAssistantBare -> kAssistantPrefill post-processing in render_chat_template_jinja was applied to all Jinja-rendered prompts. Add an arch_hint (ChatFormat) parameter, defaulting to QWEN3, and guard the block with arch_hint == ChatFormat::QWEN3. The call site in http_server.cpp passes chat_format_, so other archs (Laguna, Gemma4) are unaffected.
Addresses howard0su's review comment on PR Luce-Org#276.
Collaborator, Author: Addressed in 0e3c79a: added …
easel added a commit to easel/lucebox-hub that referenced this pull request on May 26, 2026:
…template
Anthropic tool definitions use `input_schema` as the schema key; Qwen3-Coder's
chat template expects `parameters`. With claude-code's 24-tool requests the model
couldn't ground its tool schemas and fell back to plain-text `<bash>` blocks.
Adds `normalize_tools_for_qwen()` (38 LOC) that handles three input shapes:
- Anthropic (input_schema) → {type:function, function:{name,description,parameters}}
- OpenAI envelope already present → pass through unchanged
- Bare Qwen top-level (name+parameters, no wrapper) → wrap to OpenAI envelope
Wired into request parsing at body["tools"] assignment.
5 new unit tests: anthropic_bare, openai_passthrough, bare_qwen_passthrough,
mixed (both shapes in one array), empty (defensive). All 1454 assertions pass.
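A minimal sketch of the three-way shape dispatch described above, written against a hypothetical `RawTool` struct rather than the server's real JSON type (`RawTool`, `OpenAITool`, `ToolShape`, `classify` and `normalize_tools_sketch` are illustrative names, not the PR's code). The schema is carried as an opaque serialized string, because only the key it lives under changes:

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of body["tools"]: just the fields the
// shape detection looks at. The real helper works on JSON values.
struct RawTool {
    bool has_function_wrapper = false;        // {"type":"function","function":{...}}
    std::string name;
    std::string description;
    std::optional<std::string> input_schema;  // Anthropic key (serialized schema)
    std::optional<std::string> parameters;    // OpenAI / bare-Qwen key (serialized schema)
};

// Flattened OpenAI envelope: {type:"function", function:{name, description, parameters}}
struct OpenAITool {
    std::string name;
    std::string description;
    std::string parameters;
};

enum class ToolShape { Anthropic, OpenAIEnvelope, BareQwen };

ToolShape classify(const RawTool& t) {
    if (t.has_function_wrapper) return ToolShape::OpenAIEnvelope;  // pass through unchanged
    if (t.input_schema) return ToolShape::Anthropic;               // rename input_schema -> parameters
    return ToolShape::BareQwen;                                    // wrap top-level name+parameters
}

std::vector<OpenAITool> normalize_tools_sketch(const std::vector<RawTool>& tools) {
    std::vector<OpenAITool> out;
    out.reserve(tools.size());
    for (const auto& t : tools) {
        switch (classify(t)) {
            case ToolShape::Anthropic:
                out.push_back({t.name, t.description, *t.input_schema});
                break;
            case ToolShape::OpenAIEnvelope:
            case ToolShape::BareQwen:
                // Assumption: a missing schema becomes an empty object.
                out.push_back({t.name, t.description, t.parameters.value_or("{}")});
                break;
        }
    }
    return out;
}
```

Defaulting a missing `parameters` to `{}` is an assumption of this sketch; the commit message does not say how the real helper treats that case.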
…calls
Model emits <bash>CMD</bash>, <ls>PATH</ls> etc. when its system prompt uses that format. Extend tool_parser (Pattern 6) and sse_emitter hit-detection to recognise these 7 tags: bash, read, write, edit, ls, grep, glob.
Case-insensitive lookup maps the emitted tag to the canonical tool name from the request's tools array (e.g. <bash> → "Bash").
Eight new unit tests added; 1483 assertions all pass.
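The tag-to-tool mapping can be sketched as below. This is a hypothetical illustration (`parse_native_tag`, `ToolCall` and `iequals` are not from the PR): it uses plain `std::string::find` instead of the PR's regex, which sidesteps backtracking entirely, matches tags case-sensitively as the later P1-1 fix requires, keeps the canonical-name lookup case-insensitive, and applies the P1-2 rule of never emitting a call when the request offers no tools.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type: canonical tool name plus the raw tag body.
struct ToolCall {
    std::string name;      // e.g. "Bash", taken from the request's tools array
    std::string argument;  // e.g. the shell command between <bash> and </bash>
};

// ASCII case-insensitive equality, used only for the tool-name lookup.
bool iequals(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](unsigned char x, unsigned char y) {
               return std::tolower(x) == std::tolower(y);
           });
}

// Returns the earliest complete native-tag call in `text`, or std::nullopt.
std::optional<ToolCall> parse_native_tag(const std::string& text,
                                         const std::vector<std::string>& tool_names) {
    static const char* const kTags[] = {"bash", "read", "write", "edit", "ls", "grep", "glob"};
    if (tool_names.empty()) return std::nullopt;  // no tools offered: never fabricate a call

    std::optional<ToolCall> best;
    std::size_t best_pos = std::string::npos;
    for (const char* tag : kTags) {
        const std::string open = std::string("<") + tag + ">";  // case-sensitive match
        const std::string close = std::string("</") + tag + ">";
        const std::size_t start = text.find(open);
        if (start == std::string::npos || start >= best_pos) continue;
        const std::size_t body = start + open.size();
        const std::size_t end = text.find(close, body);
        if (end == std::string::npos) continue;  // unterminated tag: ignore
        // Map the emitted tag to the canonical name, e.g. "bash" -> "Bash".
        const auto it = std::find_if(tool_names.begin(), tool_names.end(),
                                     [&](const std::string& n) { return iequals(n, tag); });
        if (it == tool_names.end()) continue;  // tool not offered in this request
        best = ToolCall{*it, text.substr(body, end - body)};
        best_pos = start;
    }
    return best;
}
```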
… Jinja XML collisions
The Unsloth Jinja template's render_extra_keys macro unrolls every JSON-Schema key as a literal XML tag. Keys like $schema, additionalProperties, and $defs produced garbage XML (<$schema>...</$schema>, <additionalProperties>False</additionalProperties>) and, crucially, a nested <name> tag for each parameter that collided with the outer function's <name> tag, causing the model to hallucinate function names like <function=cls> with bogus parameters.
Adds scrub_schema_metadata() (28 LOC) that strips the five metadata keys at every level of the schema tree (recursive through properties and items). Applied in all three normalization paths (Anthropic input_schema, OpenAI passthrough, bare Qwen).
3 new unit tests: strips_schema_metadata, strips_metadata_recursively, preserves_real_fields. All 1504 assertions pass, 0 failures.
End-to-end replay of req_003.json (22.8K-token claude-code request): model now emits name:Write (real tool), stop_reason:tool_use, finish=tool_calls. No <function=cls> hallucination.
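A hypothetical sketch of the recursive scrub on a toy tree type: `SchemaNode` stands in for a JSON value, and array elements (such as the entries of `anyOf`) are modelled as children with an empty key. It strips only the three keys the commit names, since the other two are not listed, and it recurses into every nested value, which covers `properties`, `items` and the combinators at the cost of also visiting places a real implementation might skip:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy JSON tree. An object node lists its keys in `keys` with
// the matching values in `children`; array elements use an empty key; a
// leaf stores its value in `scalar`.
struct SchemaNode {
    std::string scalar;
    std::vector<std::string> keys;
    std::vector<SchemaNode> children;  // vector of an incomplete type: allowed since C++17
};

// Only the three keys named in the commit; the PR strips five.
const std::set<std::string> kMetadataKeys = {"$schema", "additionalProperties", "$defs"};

// Removes metadata keys at every depth, in place.
void scrub_schema_metadata_sketch(SchemaNode& node) {
    for (std::size_t i = 0; i < node.keys.size();) {
        if (kMetadataKeys.count(node.keys[i]) != 0) {
            node.keys.erase(node.keys.begin() + static_cast<std::ptrdiff_t>(i));
            node.children.erase(node.children.begin() + static_cast<std::ptrdiff_t>(i));
        } else {
            scrub_schema_metadata_sketch(node.children[i]);  // properties, items, anyOf, ...
            ++i;
        }
    }
}
```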
…e leakage
Cap each tool and parameter description at 500 chars using paragraph-break > sentence-boundary > hard-cut priority, snapping back past UTF-8 multibyte sequences.
Verified by 6 new unit tests (1529 assertions, 0 failures).
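A sketch of that priority order, assuming a byte budget and the UTF-8 ellipsis U+2026 (`E2 80 A6`) as the appended marker. `truncate_description_sketch` and `is_utf8_continuation` are hypothetical names, and unlike a production version this sketch does not enforce a minimum length before accepting a paragraph break:

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
bool is_utf8_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

std::string truncate_description_sketch(const std::string& s, std::size_t max_bytes = 500) {
    if (s.size() <= max_bytes) return s;
    const std::string window = s.substr(0, max_bytes);

    // 1. Prefer the last paragraph break inside the budget.
    std::size_t cut = window.rfind("\n\n");
    // 2. Otherwise the last sentence boundary, keeping the period.
    if (cut == std::string::npos || cut == 0) {
        const std::size_t dot = window.rfind(". ");
        cut = (dot == std::string::npos) ? std::string::npos : dot + 1;
    }
    // 3. Otherwise a hard cut, moved back until it sits on the first byte of
    //    a code point, so no multibyte character is split.
    if (cut == std::string::npos) {
        cut = max_bytes;
        while (cut > 0 && is_utf8_continuation(static_cast<unsigned char>(s[cut]))) --cut;
    }
    return s.substr(0, cut) + "\xE2\x80\xA6";  // U+2026 HORIZONTAL ELLIPSIS
}
```

For example, `"a" + "ééé"` with a 4-byte budget keeps `"aé"`: byte 4 is the second half of the second `é`, so the cut steps back to byte 3.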
…nking is off
When thinking is disabled, the hardcoded Qwen renderer appends <think>\n\n</think>\n\n to put the model in the right decoding state for tool use. The Jinja path was missing this suffix when the template ends with a bare <|im_start|>assistant\n (e.g. the official Qwen3.6 template), so /v1/messages requests rendered through Jinja produced a different prompt shape than the OpenAI path. Mirror the hardcoded behavior.
Diagnosed by Codex rescue session 019e5fd0 against captured req_003.json from a real claude-code run.
The patch is dormant for templates that already append their own assistant suffix (Unsloth Qwen3-Coder).
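A sketch of the suffix logic, combining this commit with the later trailing-whitespace tolerance (P2-5) and the `arch_hint` guard added in response to the review. `apply_think_prefill_sketch` and the `LAGUNA`/`GEMMA4` enumerators are hypothetical; only `ChatFormat::QWEN3` and the `<think>\n\n</think>\n\n` suffix come from the commits:

```cpp
#include <cstddef>
#include <string>

// Only QWEN3 is named in the PR; the other enumerators are placeholders.
enum class ChatFormat { QWEN3, LAGUNA, GEMMA4 };

// Appends the empty-think prefill when a Qwen3 Jinja render ends in a bare
// "<|im_start|>assistant" generation prompt and thinking is disabled.
std::string apply_think_prefill_sketch(const std::string& prompt, bool enable_thinking,
                                       ChatFormat arch_hint) {
    if (enable_thinking || arch_hint != ChatFormat::QWEN3) return prompt;
    static const std::string kMarker = "<|im_start|>assistant";
    // Tolerate "\n", "\n\n", trailing spaces, etc. after the marker.
    const std::size_t last = prompt.find_last_not_of(" \t\r\n");
    const std::string trimmed =
        (last == std::string::npos) ? std::string() : prompt.substr(0, last + 1);
    const bool ends_bare =
        trimmed.size() >= kMarker.size() &&
        trimmed.compare(trimmed.size() - kMarker.size(), kMarker.size(), kMarker) == 0;
    if (!ends_bare) return prompt;  // template already emitted its own suffix
    // Re-emit the marker line followed by the empty think block.
    return trimmed + "\n<think>\n\n</think>\n\n";
}
```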
…names
Quantized models (notably Qwen3.6-27B-Q3) emit short forms of canonical
parameter names: <parameter=cmd> instead of <parameter=command>, <path>
instead of <file_path>, <expr> instead of <expression>. The schema-checking
client (claude-code) then rejects the tool call.
Add resolve_param_alias() that maps emitted keys to the schema's actual
keys via case-insensitive direct match, then a small alias table for
common cmd/command, path/file_path, query/pattern, expr/expression,
src/source, dst/destination shortenings. Helper is pure, returns the
original key if no canonical match exists.
Verified: Qwen3.6-27B-Q3_K_S now produces {"command":"ls -lhS /tmp..."}
for claude-code's Bash tool (was {"cmd":...} pre-fix).
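The lookup order can be sketched like this. `resolve_param_alias_sketch` is a hypothetical stand-in, not the PR's function; the alias pairs are read from the commit message, taking the left-hand name as the emitted short form:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

std::string to_lower_ascii(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// emitted short form -> canonical schema key (pairs from the commit message)
const std::vector<std::pair<std::string, std::string>> kAliases = {
    {"cmd", "command"}, {"path", "file_path"}, {"query", "pattern"},
    {"expr", "expression"}, {"src", "source"}, {"dst", "destination"},
};

std::string resolve_param_alias_sketch(const std::string& emitted,
                                       const std::vector<std::string>& schema_keys) {
    const std::string lower = to_lower_ascii(emitted);
    // 1. Case-insensitive direct match against the schema's own keys.
    for (const auto& k : schema_keys)
        if (to_lower_ascii(k) == lower) return k;
    // 2. Alias table, used only if the canonical key exists in this schema.
    for (const auto& [alias, canonical] : kAliases)
        if (alias == lower &&
            std::find(schema_keys.begin(), schema_keys.end(), canonical) != schema_keys.end())
            return canonical;
    // 3. No canonical match: return the original key unchanged.
    return emitted;
}
```

Requiring the canonical name to exist in the target schema (an assumption of this sketch) means `path` is only rewritten for tools that actually declare `file_path`; a tool that declares `path` itself is caught by the direct match first.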
…P2-5, P2-8)
P1 blockers:
- P1-1 (tool_parser.cpp): drop std::regex::icase from re_native_tag so that
Pattern 6 matches case-sensitively, like sse_emitter::find_tool_start.
Also bound the body quantifier to {0,65536}? to prevent catastrophic
backtracking on adversarial input.
- P1-2 (tool_parser.cpp): gate Pattern 6 on tools.is_array() && !empty()
so prose like 'please read the manual' or 'grep for the pattern' doesn't
get fabricated into phantom tool calls.
- P1-3 (test_server_unit.cpp): rewrite test_truncate_preserves_unicode
assertion to actually verify the byte before the ellipsis is not a UTF-8
continuation byte. Add 2-byte (é) and 4-byte (𝄞) coverage too.
P2 fixes:
- P2-1 (http_server.cpp): scrub_schema_metadata now recurses into JSON
Schema combinators (oneOf, anyOf, allOf, not). Anthropic tool defs use
these for polymorphic params; without recursion the noise leaks.
- P2-3 (test_server_unit.cpp): add four resolve_param_alias tests
(cmd→command, path→file_path, case-insensitive direct, passthrough)
via the public parse_tool_calls API.
- P2-5 (chat_template.cpp): make think-prefill suffix check tolerant of
trailing whitespace variants (\n\n, trailing space). Trim trailing
whitespace, check for bare <|im_start|>assistant, then re-emit
marker + prefill.
- P2-8 (test_server_unit.cpp): fix tautological assertion in
test_truncate_at_paragraph_break (it compared result.back() against '\xE2',
but result.back() is always '\xA6', the last byte of the UTF-8 ellipsis E2 80 A6).
Existing tests updated: bash_multiline/ls_with_path now pass tools (the
new P1-2 gate requires it). bash_no_match repurposed; new
no_tools_no_fabrication tests added to lock in the gate.
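For the P1-3 test fix, a stricter property one could assert is that the text before the ellipsis ends on a complete code point. The hypothetical helpers below (`utf8_seq_len`, `ends_on_code_point`, `truncation_is_utf8_safe`) walk the UTF-8 structure from the start; unlike a single-byte check, they also accept a result whose last kept character is itself multibyte, such as `é` (`C3 A9`), where the byte before the ellipsis is legitimately a continuation byte:

```cpp
#include <cstddef>
#include <string>

// Length of the UTF-8 sequence introduced by lead byte `b`, or 0 if `b`
// cannot start a sequence (continuation byte 10xxxxxx, or 0xF8..0xFF).
std::size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

// True if every sequence in `s` is complete, so `s` ends on a code-point boundary.
bool ends_on_code_point(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t n = utf8_seq_len(static_cast<unsigned char>(s[i]));
        if (n == 0 || i + n > s.size()) return false;  // stray or truncated sequence
        for (std::size_t k = 1; k < n; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        i += n;
    }
    return true;
}

// Requires a trailing U+2026 (E2 80 A6), then checks the text kept before it.
bool truncation_is_utf8_safe(const std::string& result) {
    static const std::string kEllipsis = "\xE2\x80\xA6";
    if (result.size() < kEllipsis.size() ||
        result.compare(result.size() - kEllipsis.size(), kEllipsis.size(), kEllipsis) != 0)
        return false;
    return ends_on_code_point(result.substr(0, result.size() - kEllipsis.size()));
}
```

These helpers check sequence structure only; they do not reject overlong encodings or surrogate code points.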
… only
The kAssistantBare -> kAssistantPrefill post-processing in render_chat_template_jinja was applied to all Jinja-rendered prompts. Add an arch_hint (ChatFormat) parameter, defaulting to QWEN3, and guard the block with arch_hint == ChatFormat::QWEN3. The call site in http_server.cpp passes chat_format_, so other archs (Laguna, Gemma4) are unaffected.
Addresses howard0su's review comment on PR Luce-Org#276.
Branch updated from 0e3c79a to 5e861b4.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 28, 2026:
Merge PR Luce-Org#276 as a stack parent while preserving the existing server unit coverage already carried by auto-integration. The PR head's only tree delta duplicated normalize/tool-call tests in test_server_unit.cpp and left invalid duplicate definitions, so restore the pre-merge file and record the reconciliation in the manifest.
Summary
Qwen3.6-27B (Q3_K_M / Q4_K_M, non-UD) tool calling silently fails for real claude-code traffic on `/v1/messages`, while the same model works fine through hermes (OpenAI shape). This is a confirmed community bug (HF discussion, reference fix repo).
Symptoms reproduced on a captured real claude-code request (22.8K-token system prompt + 24 tools):
- Before: `` ```bash\nls -lS / `` as plain text, then `finish=stop`. No structured `tool_use`.
- After: `content_block_start` with `type:tool_use`, `name:Bash` and `{"command":"ls -lhS /tmp 2>/dev/null | head -30"}`, then `finish=tool_calls`.

What's in the PR
Six coordinated server-side fixes, sequenced because each exposes the next:
1. `normalize_tools_for_qwen()`: convert Anthropic `input_schema` → OpenAI `parameters` shape so the Jinja template sees the schema key Qwen3-Coder was trained on.
2. `<bash>`/`<read>`/`<write>`/`<edit>`/`<ls>`/`<grep>`/`<glob>` native-tag parser fallback: Pattern 6 in `tool_parser.cpp`. Gated on `tools` being present, so phantom calls are not fabricated from prose.
3. `scrub_schema_metadata()`: strip JSON-Schema metadata keys (`$schema`, `additionalProperties`, `$defs`, …) at every level of the schema, recursing through `properties`, `items` and the `oneOf`/`anyOf`/`allOf`/`not` combinators, before the Jinja `render_extra_keys` macro turns them into garbage XML tags that make the model hallucinate function names like `<function=cls>`.
4. `truncate_description()`: cap each tool and parameter description at 500 bytes (paragraph break → sentence boundary → UTF-8-safe hard cut). Claude-code embeds 12KB of "use other tools instead" recipes inside Bash's own description, which steered Qwen to pick Write.
5. `<think>` prefill in the Jinja renderer when thinking is disabled: mirrors the hardcoded Qwen renderer, handles trailing-whitespace variants in the template, and is gated to `ChatFormat::QWEN3`. (Diagnosed by Codex.)
6. `resolve_param_alias()`: the Q3 quant hallucinates short parameter names (`cmd`→`command`, `path`↔`file_path`, `expr`→`expression`, etc.). Each emitted name is resolved to the schema's canonical name via a case-insensitive direct match plus a small alias table.

Plus, PR #271's `find_tool_start` is extended to recognize the 7 native tags (rebase-resolved cleanly).
Test plan
Recipe to deploy
```bash
dflash_server <Qwen3.6-27B GGUF> \
--host 127.0.0.1 --port 18099 --max-ctx 131072 \
--draft \
--chat-template-file
```
Plus `~/.claude/settings.json` env block:
```json
{"env": {
"CLAUDE_CODE_ATTRIBUTION_HEADER": "0",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"CLAUDE_CODE_DISABLE_TELEMETRY": "1"
}}
```
(`CLAUDE_CODE_ATTRIBUTION_HEADER=0` must be set in the JSON `env` block, not via shell `export`; see the Unsloth docs.)
Known caveats