fix(csharp): recover declarations from namespaces with internal errors (#115)#116
Conversation
odvcencio#115) Large real-world C# files (e.g. Newtonsoft.Json's JsonReader.cs) could collapse into a single top-level ERROR node, so a tree walk found zero class_declaration / method_declaration nodes. The cause is a cumulative GLR parse failure (every construct parses fine in isolation), so the fix belongs in the C# recovery normalizers rather than the grammar. Namespace recovery was all-or-nothing: it only succeeded when the whole namespace body re-parsed cleanly. When the body contained internal errors it gave up, and the subsequent top-level type-declaration pass bailed on the leading preprocessor directive. - Add a best-effort namespace fallback that, when the clean extraction fails, synthesises a namespace_declaration from the existing sub-parse (no extra reparse) and recovers the body declarations leniently, so the type declarations and the members that parsed still surface. - Thread a lenient flag through the type-declaration body recovery so an unrecoverable ERROR fragment is skipped instead of failing the whole enclosing type; keep nested namespaces as body children. - Make C# brace matching trivia-aware so braces inside char literals, strings and comments no longer truncate a recovered declaration span. - Allow preprocessor directive nodes at the top level so a recovered root can retag to compilation_unit. The recovery stays bounded: it reuses the sub-parse already produced (no new namespace reparse), does a single O(children) pass, and leaves the existing 4096-byte method-body reparse gate and namespace recovery cap in place.
There was a problem hiding this comment.
Code Review
This pull request implements robust C# namespace recovery when parsing errors occur inside namespace bodies (resolving issue #115). It introduces a trivia-aware brace matcher (csharpFindMatchingBraceByte) that correctly handles comments, strings, and char literals, and adds a fallback mechanism to construct best-effort namespace declarations from error roots. Feedback on the PR points out a bug in the brace matcher where verbatim interpolated strings using the @$"..." syntax are not correctly identified, which could lead to parsing issues, and provides a code suggestion to resolve it.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
The trivia-aware brace matcher and the top-level chunk scanner only flagged
a string as verbatim when '@' immediately preceded the opening quote, missing
verbatim interpolated strings written as @$"..." (C# 8+), where the quote is
preceded by '$' and then '@'. In that case "" was treated as two separate
strings and backslashes as escapes, which could let braces inside the string
be miscounted and truncate a recovered declaration span.
Detect both @"... / $@"... ('@' before the quote) and @$"... ('@$' before the
quote). Add a verbatim-string recovery regression test.
Addresses review feedback on odvcencio#116.
|
heya, thanks for the contribution- just a small note, im going to merge this as a temporary pain relief while i attempt to revisit the machinery to not have to require (any) normalization or recovery paths per grammar. thanks again! |
Summary
Fixes #115. Large real-world C# files (e.g. Newtonsoft.Json's
JsonReader.cs) could parse into a single top-levelERRORnode, so a tree walk found 0class_declaration/method_declarationnodes. Every individual construct parses fine in isolation — it's a cumulative GLR parse failure — so the fix is in the C# recovery normalizers, not the grammar blob.Root cause
normalizeCSharpRecoveredNamespaceswas all-or-nothing:csharpRecoverNamespaceNodeFromRangeonly succeeded when the whole namespace body re-parsed cleanly. With any internal error it gave up.normalizeCSharpRecoveredTypeDeclarationspass then bailed on the leadingpreproc_regionchild.'{'/'}').Changes
parser_result_csharp_namespace.go): when the clean extraction fails, synthesise anamespace_declarationfrom the sub-parse that was already produced (no extra reparse) and recover its body declarations leniently, so the type declarations and the members that parsed still surface.parser_result_csharp_type_body.go): thread alenientflag so an unrecoverableERRORfragment is skipped instead of failing the whole enclosing type; keep nested namespaces as body children.parser_result_csharp_helpers.go):csharpFindMatchingBraceByteskips braces inside line/block comments, regular/verbatim strings, and char literals. Existing C# recovery call sites now use it.parser_result_csharp.go): permit preprocessor directive nodes so a recovered root can retag tocompilation_unit.Bounding
Respects the prior recovery-cost work (#64, #98, #106): the fallback reuses the sub-parse already produced (no new namespace reparse), does a single
O(children)pass, and leaves the 4096-byte method-body reparse gate and thecsharpMaxNamespaceRecoveriescap in place. The existing boundedness tests stay green.Result
JsonReader.csERROR, 0 declscompilation_unit, class + 13 propertiesJsonTextWriter.csERROR, 0 declscompilation_unit, class + 11 methodsJsonTextReader.csERROR, 0 declscompilation_unit, namespace recoveredRecovery is best-effort: declarations whose header is shredded into
ERRORfragments by the GLR failure (e.g.JsonTextReader's class) are not fully reconstructed — that would need source-based reconstruction and is intentionally out of scope here to keep the change bounded. The catastrophic whole-file collapse is gone in all cases.Tests
parser_csharp_namespace_recovery_test.go(namespace-with-internal-error recovers declarations; char-literal braces don't truncate the span; clean namespaces are unaffected).go test ./...is green, including the C# boundedness suite.