fix(server): Qwen3.6-27B tool calling for claude-code Anthropic path by dusterbloom · Pull Request #276 · Luce-Org/lucebox-hub

dusterbloom · 2026-05-25T17:55:14Z

Summary

Qwen3.6-27B (Q3_K_M / Q4_K_M, non-UD) tool calling silently fails for real claude-code traffic on /v1/messages — same model works fine through hermes (OpenAI shape). Confirmed community bug (HF discussion, reference fix repo).

Symptoms reproduced on a captured real claude-code request (22.8K-token system prompt + 24 tools):

- Pre-fix: model emits 16 leading newlines, then ```bash\nls -lS /, then finish=stop. No structured tool_use.
- Post-fix: model emits proper content_block_start type:tool_use, name:Bash with {"command":"ls -lhS /tmp 2>/dev/null | head -30"}, finish=tool_calls.

What's in the PR

Six coordinated server-side fixes, sequenced because each exposes the next:

1. `normalize_tools_for_qwen()`: convert Anthropic `input_schema` → OpenAI `parameters` shape so the Jinja template sees the schema key Qwen3-Coder was trained on.
2. `<bash>` / `<read>` / `<write>` / `<edit>` / `<ls>` / `<grep>` / `<glob>` native-tag parser fallback: Pattern 6 in tool_parser.cpp. Gated on `tools` being present (avoids fabricating phantom calls from prose).
3. `scrub_schema_metadata()`: strip JSON-Schema metadata (`$schema`, `additionalProperties`, `$defs`, `oneOf`/`anyOf`/`allOf`/`not`) before the Jinja `render_extra_keys` macro turns them into garbage XML tags that hallucinate function names like `<function=cls>`.
4. `truncate_description()`: cap each tool description at 500 bytes (paragraph break → sentence boundary → UTF-8-safe hard cut). Claude-code embeds 12KB of "use other tools instead" recipes inside Bash's own description, which steered Qwen to pick Write.
5. Closed `<think>` prefill in Jinja renderer when thinking is disabled: mirrors the hardcoded Qwen renderer. Handles trailing whitespace variants in the template. (Diagnosed by Codex.)
6. `resolve_param_alias()`: Q3 quant hallucinates short forms (`cmd`→`command`, `path`↔`file_path`, `expr`→`expression`, etc.). Resolves the emitted parameter name to the schema's canonical name via case-insensitive direct match plus a small alias table.

Plus PR #271's `find_tool_start` is extended to recognize the 7 native tags (rebase-resolved cleanly).

Test plan

- 1013 LOC diff: 337 production + 676 tests (~2:1 test:prod ratio)
- New unit coverage: native-tag parsing, scrub recursion incl. combinators, description truncation incl. UTF-8 2/3/4-byte safety, alias resolution, no-tools-no-fabrication gate, Anthropic→OpenAI shape normalization
- All sources compile clean (only pre-existing fread warnings)
- End-to-end verified against captured real claude-code request: `finish=tool_calls`, structured `tool_use` block, valid `command` argument
- Underwent self-review cycle (cubic-style); 3 P1 + 5 P2 findings addressed in commit `129e606`
- Local server interactive verification: launch with `--chat-template-file ` against your dflash_server build, run a multi-turn claude-code session, confirm agentic loop completes
- Regression check on hermes (OpenAI shape): should be unaffected; Pattern 6 gate ensures no behavior change when tools shape is already OpenAI

Recipe to deploy

```bash
dflash_server <Qwen3.6-27B GGUF> \
--host 127.0.0.1 --port 18099 --max-ctx 131072 \
--draft \
--chat-template-file
```

Plus `~/.claude/settings.json` env block:
```json
{"env": {
"CLAUDE_CODE_ATTRIBUTION_HEADER": "0",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"CLAUDE_CODE_DISABLE_TELEMETRY": "1"
}}
```
(`CLAUDE_CODE_ATTRIBUTION_HEADER=0` must be in JSON, not shell `export` — see Unsloth docs.)

Known caveats

- Q3_K_S works with all 6 fixes. Q4_K_M also works structurally, but the model prefers writing Python scripts over running Bash directly (Qwen3-Coder training bias amplified at higher precision).
- UD-Q3_K_XL / UD-Q4_K_XL (Unsloth Dynamic, calibrated on tool-calling examples) likely need fewer of these patches; not tested in this PR since the download wasn't available in-session.
- Native-tag fallback (`<bash>` etc.) is opportunistic: `<edit>` and `<write>` can't satisfy Anthropic's Edit/Write tool schemas without `file_path` / `old_string` / `new_string`. They're kept for parser correctness but won't produce useful calls; flagged in code comments.

cubic-dev-ai

No issues found across 5 files

howard0su · 2026-05-26T05:59:05Z

+        // assistant suffix, which leaves the model in the wrong decoding state
+        // for tool use. Mirror the hard-coded behavior here when the rendered
+        // prompt ends with a bare assistant generation prompt.
+        if (!enable_thinking) {


This is a general code path. Based on your comment, we should check the model or arch here.

dusterbloom · 2026-05-26T13:00:07Z

Addressed in 0e3c79a: added arch_hint (ChatFormat) parameter to render_chat_template_jinja, gated the closed-think prefill block on arch_hint == ChatFormat::QWEN3. Call site in http_server.cpp passes chat_format_ so Laguna and Gemma4 Jinja templates are untouched.

dusterbloom · 2026-05-26T13:00:30Z

Addressed in 0e3c79a: added arch_hint (ChatFormat) parameter to render_chat_template_jinja, gated the closed-think prefill injection on arch_hint == ChatFormat::QWEN3. Call site in http_server.cpp passes chat_format_ so Laguna and Gemma4 Jinja templates are untouched.

…template

Anthropic tool definitions use `input_schema` as the schema key; Qwen3-Coder's chat template expects `parameters`. With claude-code's 24-tool requests the model couldn't ground its tool schemas and fell back to plain-text `<bash>` blocks.

Adds `normalize_tools_for_qwen()` (38 LOC) that handles three input shapes:
- Anthropic (input_schema) → {type:function, function:{name,description,parameters}}
- OpenAI envelope already present → pass through unchanged
- Bare Qwen top-level (name+parameters, no wrapper) → wrap to OpenAI envelope

Wired into request parsing at body["tools"] assignment. 5 new unit tests: anthropic_bare, openai_passthrough, bare_qwen_passthrough, mixed (both shapes in one array), empty (defensive). All 1454 assertions pass.

…calls

Model emits <bash>CMD</bash>, <ls>PATH</ls> etc. when its system prompt uses that format. Extend tool_parser (Pattern 6) and sse_emitter hit-detection to recognise these 7 tags: bash, read, write, edit, ls, grep, glob. Case-insensitive lookup maps the emitted tag to the canonical tool name from the request's tools array (e.g. <bash> → "Bash"). Eight new unit tests added; 1483 assertions all pass.
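
The native-tag fallback described in this commit can be sketched roughly as follows. This is an illustration only, not the PR's code: `NativeCall`, `parse_native_tags`, and `lower_copy` are hypothetical names, the tool list is simplified to a vector of names, and plain `std::string::find` is swapped in for the regex the PR uses. It matches tags case-sensitively (as the later review fix does), returns nothing when the request carries no tools, and maps the tag to the request's canonical tool name case-insensitively.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: one parsed native-tag call.
struct NativeCall {
    std::string tool;  // canonical tool name taken from the request
    std::string body;  // raw text between the opening and closing tag
};

// Lowercase copy used for the case-insensitive tool-name lookup.
static std::string lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Assumption: tool_names holds the "name" field of each tool in the request.
std::vector<NativeCall> parse_native_tags(
    const std::string& text, const std::vector<std::string>& tool_names) {
    std::vector<NativeCall> calls;
    if (tool_names.empty()) return calls;  // gate: no tools, no calls

    static const std::array<const char*, 7> kTags = {
        "bash", "read", "write", "edit", "ls", "grep", "glob"};

    std::size_t pos = 0;
    while ((pos = text.find('<', pos)) != std::string::npos) {
        bool matched = false;
        for (const char* tag : kTags) {
            const std::string open = std::string("<") + tag + ">";
            if (text.compare(pos, open.size(), open) != 0) continue;

            const std::string close = std::string("</") + tag + ">";
            const std::size_t body_start = pos + open.size();
            const std::size_t end = text.find(close, body_start);
            if (end == std::string::npos) break;  // unterminated tag

            // Emit a call only if the request declares a matching tool.
            for (const std::string& name : tool_names) {
                if (lower_copy(name) == tag) {
                    calls.push_back(
                        {name, text.substr(body_start, end - body_start)});
                    break;
                }
            }
            pos = end + close.size();
            matched = true;
            break;
        }
        if (!matched) ++pos;  // not a native tag, keep scanning
    }
    return calls;
}
```

For example, with tools `{"Bash", "Read"}` the text `<bash>ls -lhS /tmp</bash>` would yield a single call with tool `Bash` and body `ls -lhS /tmp`, while the same text with an empty tool list yields nothing.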

… Jinja XML collisions

The Unsloth Jinja template's render_extra_keys macro unrolls every JSON-Schema key as a literal XML tag. Keys like $schema, additionalProperties, and $defs produced garbage XML (<$schema>...</$schema>, <additionalProperties>False</additionalProperties>) and crucially a nested <name> tag for each parameter that collided with the outer function's <name> tag, causing the model to hallucinate function names like <function=cls> with bogus parameters.

Adds scrub_schema_metadata() (28 LOC) that strips the five metadata keys at every level of the schema tree (recursive through properties and items). Applied in all three normalization paths (Anthropic input_schema, OpenAI passthrough, bare Qwen).

3 new unit tests: strips_schema_metadata, strips_metadata_recursively, preserves_real_fields. All 1504 assertions pass, 0 failures.

End-to-end replay of req_003.json (22.8K-token claude-code request): model now emits name:Write (real tool), stop_reason:tool_use, finish=tool_calls. No <function=cls> hallucination.

…e leakage

Cap each tool and parameter description at 500 chars using paragraph-break > sentence-boundary > hard-cut priority, snapping back past UTF-8 multibyte sequences. Verified by 6 new unit tests (1529 assertions, 0 failures).
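
The truncation priority described in this commit can be sketched as below. This is a minimal illustration under stated assumptions, not the PR's implementation: the signature, the `". "` sentence heuristic, and appending the ellipsis after the cut (so the result can exceed the cap by up to three bytes) are all guesses made for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of the truncation order described above. The name
// mirrors the PR, but the signature and heuristics are assumptions.
std::string truncate_description(const std::string& text,
                                 std::size_t max_bytes = 500) {
    if (text.size() <= max_bytes) return text;  // already short enough

    const std::string ellipsis = "\xE2\x80\xA6";  // U+2026 encoded as UTF-8
    const std::string head = text.substr(0, max_bytes);
    std::size_t cut = std::string::npos;

    // 1. Prefer the last paragraph break inside the window.
    const std::size_t para = head.rfind("\n\n");
    if (para != std::string::npos && para > 0) {
        cut = para;
    } else {
        // 2. Otherwise fall back to the last sentence boundary.
        const std::size_t sentence = head.rfind(". ");
        if (sentence != std::string::npos) cut = sentence + 1;  // keep the period
    }

    if (cut == std::string::npos) {
        // 3. Hard cut: step back while the byte at the cut is a UTF-8
        //    continuation byte (10xxxxxx) so no code point is split.
        cut = max_bytes;
        while (cut > 0 &&
               (static_cast<unsigned char>(text[cut]) & 0xC0) == 0x80) {
            --cut;
        }
    }
    return text.substr(0, cut) + ellipsis;
}
```

The hard-cut branch inspects the byte at the cut position rather than the byte before it: if that byte is a continuation byte, the cut would land inside a multibyte sequence, so the loop backs up to the sequence's lead byte and drops the whole code point.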

…nking is off

When the Jinja template ends with a bare <|im_start|>assistant\n (e.g. the official Qwen3.6 template) and the request has thinking disabled, the hardcoded Qwen renderer appends <think>\n\n</think>\n\n to put the model in the right decoding state for tool use. The Jinja path was missing this suffix, so /v1/messages requests rendered through Jinja produced a different prompt shape than the OpenAI path. Mirror the hardcoded behavior.

Diagnosed by Codex rescue session 019e5fd0 against captured req_003.json from a real claude-code run. Patch is dormant for templates that already append their own assistant suffix (Unsloth Qwen3-Coder).
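
A rough sketch of the prefill step described here, including the whitespace tolerance and architecture gate added later in this PR. The function name and the boolean `is_qwen3` parameter are hypothetical stand-ins (the PR gates on `arch_hint == ChatFormat::QWEN3` inside `render_chat_template_jinja`); only the marker and suffix strings come from the commit text.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: post-process an already rendered prompt.
// is_qwen3 stands in for the PR's arch_hint == ChatFormat::QWEN3 check.
std::string apply_closed_think_prefill(std::string prompt,
                                       bool enable_thinking,
                                       bool is_qwen3) {
    if (enable_thinking || !is_qwen3) return prompt;  // nothing to do

    const std::string marker = "<|im_start|>assistant";

    // Ignore trailing whitespace variants such as "\n", "\n\n" or " ".
    const std::size_t last = prompt.find_last_not_of(" \t\r\n");
    if (last == std::string::npos) return prompt;  // blank prompt
    const std::size_t trimmed_len = last + 1;

    // Only act when the prompt ends with a bare assistant marker.
    if (trimmed_len < marker.size() ||
        prompt.compare(trimmed_len - marker.size(), marker.size(),
                       marker) != 0) {
        return prompt;
    }

    // Re-emit the marker's newline followed by the closed think block.
    prompt.resize(trimmed_len);
    prompt += "\n<think>\n\n</think>\n\n";
    return prompt;
}
```

A prompt that already ends in `</think>\n\n` does not end with the bare marker after trimming, so it passes through unchanged, which matches the "dormant for templates that already append their own assistant suffix" behavior described above.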

…names

Quantized models (notably Qwen3.6-27B-Q3) emit short forms of canonical parameter names: <parameter=cmd> instead of <parameter=command>, <path> instead of <file_path>, <expr> instead of <expression>. The schema-checking client (claude-code) then rejects the tool call.

Add resolve_param_alias() that maps emitted keys to the schema's actual keys via case-insensitive direct match, then a small alias table for common cmd/command, path/file_path, query/pattern, expr/expression, src/source, dst/destination shortenings. Helper is pure, returns the original key if no canonical match exists.

Verified: Qwen3.6-27B-Q3_K_S now produces {"command":"ls -lhS /tmp..."} for claude-code's Bash tool (was {"cmd":...} pre-fix).
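
The resolution order described above might look like the following sketch. It is an assumption-laden illustration rather than the PR's code: the signature (schema keys passed as a vector of strings), the `to_lower` helper, and treating every alias pair as bidirectional are choices made for the example; the alias pairs themselves are the ones listed in the commit message.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Lowercase copy for case-insensitive comparison.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Hypothetical signature: schema_keys are the property names of the
// tool's parameter schema.
std::string resolve_param_alias(const std::string& emitted,
                                const std::vector<std::string>& schema_keys) {
    // 0. Exact match wins outright.
    for (const std::string& key : schema_keys) {
        if (key == emitted) return key;
    }

    // 1. Case-insensitive direct match.
    const std::string want = to_lower(emitted);
    for (const std::string& key : schema_keys) {
        if (to_lower(key) == want) return key;
    }

    // 2. Small alias table, consulted in both directions (an assumption).
    static const std::vector<std::pair<std::string, std::string>> kAliases = {
        {"cmd", "command"},     {"path", "file_path"}, {"query", "pattern"},
        {"expr", "expression"}, {"src", "source"},     {"dst", "destination"}};
    for (const auto& alias : kAliases) {
        std::string other;
        if (want == alias.first) {
            other = alias.second;
        } else if (want == alias.second) {
            other = alias.first;
        } else {
            continue;
        }
        for (const std::string& key : schema_keys) {
            if (to_lower(key) == other) return key;
        }
    }

    // 3. No canonical match: pass the emitted key through unchanged.
    return emitted;
}
```

Because the helper only ever returns a key that exists in the schema (or the original key), it cannot invent parameter names; an unknown key is left for the client's schema check to reject.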

…P2-5, P2-8)

P1 blockers:
- P1-1 (tool_parser.cpp): drop std::regex::icase from re_native_tag so Pattern 6 aligns with sse_emitter::find_tool_start (case-sensitive). Also bound the body quantifier to {0,65536}? to prevent catastrophic backtracking on adversarial input.
- P1-2 (tool_parser.cpp): gate Pattern 6 on tools.is_array() && !empty() so prose like 'please read the manual' or 'grep for the pattern' doesn't get fabricated into phantom tool calls.
- P1-3 (test_server_unit.cpp): rewrite test_truncate_preserves_unicode assertion to actually verify the byte before the ellipsis is not a UTF-8 continuation byte. Add 2-byte (é) and 4-byte (𝄞) coverage too.

P2 fixes:
- P2-1 (http_server.cpp): scrub_schema_metadata now recurses into JSON Schema combinators (oneOf, anyOf, allOf, not). Anthropic tool defs use these for polymorphic params; without recursion the noise leaks.
- P2-3 (test_server_unit.cpp): add four resolve_param_alias tests (cmd→command, path→file_path, case-insensitive direct, passthrough) via the public parse_tool_calls API.
- P2-5 (chat_template.cpp): make think-prefill suffix check tolerant of trailing whitespace variants (\n\n, trailing space). Trim trailing whitespace, check for bare <|im_start|>assistant, then re-emit marker + prefill.
- P2-8 (test_server_unit.cpp): fix tautological assertion in test_truncate_at_paragraph_break (was checking '\xE2' on result.back() which is always the last byte of the ellipsis '\xA6').

Existing tests updated: bash_multiline/ls_with_path now pass tools (the new P1-2 gate requires it). bash_no_match repurposed; new no_tools_no_fabrication tests added to lock in the gate.

… only

The kAssistantBare -> kAssistantPrefill post-processing in render_chat_template_jinja was applied to all Jinja-rendered prompts. Add arch_hint (ChatFormat) parameter, defaulting to QWEN3, and guard the block with arch_hint == ChatFormat::QWEN3. Call site in http_server.cpp passes chat_format_ so other archs (Laguna, Gemma4) are unaffected.

Addresses howard0su's review comment on PR Luce-Org#276.

Merge PR Luce-Org#276 as a stack parent while preserving the existing server unit coverage already carried by auto-integration. The PR head's only tree delta duplicated normalize/tool-call tests in test_server_unit.cpp and left invalid duplicate definitions, so restore the pre-merge file and record the reconciliation in the manifest.

cubic-dev-ai Bot reviewed May 25, 2026

howard0su reviewed May 26, 2026

easel added a commit to easel/lucebox-hub that referenced this pull request May 26, 2026

Merge PR Luce-Org#276 into auto-integration

193d6ca

dusterbloom added 8 commits May 28, 2026 20:58

dusterbloom force-pushed the fix/qwen36-claude-code-tool-calling branch from 0e3c79a to 5e861b4 Compare May 28, 2026 18:59

dusterbloom wants to merge 8 commits into Luce-Org:main from dusterbloom:fix/qwen36-claude-code-tool-calling