Skip to content

fix(csharp): recover declarations from namespaces with internal errors (#115)#116

Merged
odvcencio merged 2 commits into
odvcencio:mainfrom
richardwooding:fix/csharp-namespace-internal-error-recovery
Jun 16, 2026
Merged

fix(csharp): recover declarations from namespaces with internal errors (#115)#116
odvcencio merged 2 commits into
odvcencio:mainfrom
richardwooding:fix/csharp-namespace-internal-error-recovery

Conversation

@richardwooding

Copy link
Copy Markdown
Contributor

Summary

Fixes #115. Large real-world C# files (e.g. Newtonsoft.Json's JsonReader.cs) could parse into a single top-level ERROR node, so a tree walk found 0 class_declaration / method_declaration nodes. Every individual construct parses fine in isolation — it's a cumulative GLR parse failure — so the fix is in the C# recovery normalizers, not the grammar blob.

Root cause

  • normalizeCSharpRecoveredNamespaces was all-or-nothing: csharpRecoverNamespaceNodeFromRange only succeeded when the whole namespace body re-parsed cleanly. With any internal error it gave up.
  • The subsequent normalizeCSharpRecoveredTypeDeclarations pass then bailed on the leading preproc_region child.
  • The brace matcher used during recovery ignored char/string/comment content, truncating spans on files with braces inside char literals ('{' / '}').

Changes

  • Namespace fallback (parser_result_csharp_namespace.go): when the clean extraction fails, synthesise a namespace_declaration from the sub-parse that was already produced (no extra reparse) and recover its body declarations leniently, so the type declarations and the members that parsed still surface.
  • Lenient member recovery (parser_result_csharp_type_body.go): thread a lenient flag so an unrecoverable ERROR fragment is skipped instead of failing the whole enclosing type; keep nested namespaces as body children.
  • Trivia-aware brace matching (parser_result_csharp_helpers.go): csharpFindMatchingBraceByte skips braces inside line/block comments, regular/verbatim strings, and char literals. Existing C# recovery call sites now use it.
  • Top-level allow-list (parser_result_csharp.go): permit preprocessor directive nodes so a recovered root can retag to compilation_unit.

Bounding

Respects the prior recovery-cost work (#64, #98, #106): the fallback reuses the sub-parse already produced (no new namespace reparse), does a single O(children) pass, and leaves the 4096-byte method-body reparse gate and the csharpMaxNamespaceRecoveries cap in place. The existing boundedness tests stay green.

Result

File Before After
JsonReader.cs single ERROR, 0 decls compilation_unit, class + 13 properties
JsonTextWriter.cs single ERROR, 0 decls compilation_unit, class + 11 methods
JsonTextReader.cs single ERROR, 0 decls compilation_unit, namespace recovered

Recovery is best-effort: declarations whose header is shredded into ERROR fragments by the GLR failure (e.g. JsonTextReader's class) are not fully reconstructed — that would need source-based reconstruction and is intentionally out of scope here to keep the change bounded. The catastrophic whole-file collapse is gone in all cases.

Tests

  • New regression tests in parser_csharp_namespace_recovery_test.go (namespace-with-internal-error recovers declarations; char-literal braces don't truncate the span; clean namespaces are unaffected).
  • go test ./... is green, including the C# boundedness suite.

odvcencio#115)

Large real-world C# files (e.g. Newtonsoft.Json's JsonReader.cs) could
collapse into a single top-level ERROR node, so a tree walk found zero
class_declaration / method_declaration nodes. The cause is a cumulative
GLR parse failure (every construct parses fine in isolation), so the fix
belongs in the C# recovery normalizers rather than the grammar.

Namespace recovery was all-or-nothing: it only succeeded when the whole
namespace body re-parsed cleanly. When the body contained internal errors
it gave up, and the subsequent top-level type-declaration pass bailed on
the leading preprocessor directive.

- Add a best-effort namespace fallback that, when the clean extraction
  fails, synthesises a namespace_declaration from the existing sub-parse
  (no extra reparse) and recovers the body declarations leniently, so the
  type declarations and the members that parsed still surface.
- Thread a lenient flag through the type-declaration body recovery so an
  unrecoverable ERROR fragment is skipped instead of failing the whole
  enclosing type; keep nested namespaces as body children.
- Make C# brace matching trivia-aware so braces inside char literals,
  strings and comments no longer truncate a recovered declaration span.
- Allow preprocessor directive nodes at the top level so a recovered root
  can retag to compilation_unit.

The recovery stays bounded: it reuses the sub-parse already produced
(no new namespace reparse), does a single O(children) pass, and leaves
the existing 4096-byte method-body reparse gate and namespace recovery
cap in place.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements robust C# namespace recovery when parsing errors occur inside namespace bodies (resolving issue #115). It introduces a trivia-aware brace matcher (csharpFindMatchingBraceByte) that correctly handles comments, strings, and char literals, and adds a fallback mechanism to construct best-effort namespace declarations from error roots. Feedback on the PR points out a bug in the brace matcher where verbatim interpolated strings using the @$"..." syntax are not correctly identified, which could lead to parsing issues, and provides a code suggestion to resolve it.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread parser_result_csharp_helpers.go
The trivia-aware brace matcher and the top-level chunk scanner only flagged
a string as verbatim when '@' immediately preceded the opening quote, missing
verbatim interpolated strings written as @$"..." (C# 8+), where the quote is
preceded by '$' and then '@'. In that case "" was treated as two separate
strings and backslashes as escapes, which could let braces inside the string
be miscounted and truncate a recovered declaration span.

Detect both @"... / $@"... ('@' before the quote) and @$"... ('@$' before the
quote). Add a verbatim-string recovery regression test.

Addresses review feedback on odvcencio#116.
@odvcencio

Copy link
Copy Markdown
Owner

heya, thanks for the contribution- just a small note, im going to merge this as a temporary pain relief while i attempt to revisit the machinery to not have to require (any) normalization or recovery paths per grammar. thanks again!

@odvcencio odvcencio merged commit aeccf2c into odvcencio:main Jun 16, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

C#: large real-world files collapse to a single ERROR node (0 declarations extracted)

2 participants