Fix extract_fenced_code_block for language tags with non-word characters by vidigoat · Pull Request #1468 · simonw/llm

vidigoat · 2026-06-02T18:17:20Z

extract_fenced_code_block captured the language/info tag with (\w+)?. \w only matches [A-Za-z0-9_], so a fenced block whose info string contains a non-word character fails to match entirely and the function returns None. Common, real-world language tags trigger this: ```c++, ```objective-c, ```c#, ```f#. The effect is that llm -x / --extract / --extract-last and the template extract: / extract_last: options silently extract nothing when a model labels the block with one of these languages.
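The failure mode can be reproduced in isolation. The snippet below is a minimal sketch, not the code in llm itself: OLD_PATTERN and extract_old are hypothetical names, and the regex is a simplified stand-in that assumes only what the description states, namely that the info string is captured with (\w+)? and must be followed directly by a newline.

```python
import re

# Hypothetical, simplified stand-in for the old behaviour: the info string
# is captured with (\w+)? and a newline must follow it immediately.
OLD_PATTERN = re.compile(
    r"^(?P<fence>`{3,})(?P<lang>\w+)?\n(?P<code>.*?)^(?P=fence)[ ]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_old(text):
    """Return the body of the first fenced block, or None if nothing matches."""
    match = OLD_PATTERN.search(text)
    return match.group("code") if match else None


FENCE = "`" * 3  # built at runtime so this example does not nest fences

python_block = f"{FENCE}python\nprint('hi')\n{FENCE}"
cpp_block = f"{FENCE}c++\nint main() {{}}\n{FENCE}"

print(repr(extract_old(python_block)))  # the block body is returned
print(extract_old(cpp_block))           # None: "+" is not matched by \w
```

With the c++ tag, \w+ consumes only the "c", the required newline is not the next character, and no other position in the text can start a match, so the whole search fails rather than returning a block with a truncated tag.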

This widens the capture to [^\n`]* so the whole info string up to the newline is consumed, regardless of punctuation. A plain ``` fence (no language) and existing word-only tags are unaffected.
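The effect of the wider capture can be sketched the same way. This is an illustration under assumptions rather than the patched function: NEW_PATTERN and extract_new are hypothetical names, the surrounding regex is simplified, and the character class follows the form given in the commit message ([^\n`]*, which also excludes backticks).

```python
import re

# Hypothetical, simplified pattern: the info string is any run of characters
# other than a newline or a backtick, so punctuation in the tag is accepted.
NEW_PATTERN = re.compile(
    r"^(?P<fence>`{3,})(?P<info>[^\n`]*)\n(?P<code>.*?)^(?P=fence)[ ]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_new(text):
    """Return the body of the first fenced block, or None if nothing matches."""
    match = NEW_PATTERN.search(text)
    return match.group("code") if match else None


FENCE = "`" * 3  # built at runtime so this example does not nest fences

# An empty tag and a word-only tag behave as before; the other tags now match.
for tag in ["", "python", "c++", "objective-c", "c#", "f#"]:
    block = f"{FENCE}{tag}\nbody\n{FENCE}"
    assert extract_new(block) == "body\n", tag
```

Excluding the backtick from the class is consistent with CommonMark, where the info string of a backtick-fenced block may not itself contain backticks.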

Added parametrized cases to test_extract_fenced_code_block for c++, objective-c, and c#. They fail on main (return None) and pass with this change; the rest of tests/test_utils.py stays green.

The language-tag capture used \w+, which only matches [A-Za-z0-9_]. When a fenced block was labelled with an info string containing a non-word character (e.g. ```c++, ```objective-c, ```c#), the pattern failed to match and the function returned None, silently breaking llm -x/--extract and the template extract: options. Widen the capture to [^\n`]* so the whole info string is consumed up to the newline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vidigoat wants to merge 1 commit into simonw:main from vidigoat:fix-fenced-code-lang-symbols
