Fix ProForma writer for ranges bearing multiple modifications (§4.5) #116

Open
trishorts wants to merge 1 commit into topdownproteomics:master from trishorts:fix/proforma-writer-multi-mod-range

@trishorts

Problem

ProFormaWriter.WriteString throws ProFormaParseException: "Can't nest ranges within each other." whenever a term contains a sequence range that carries more than one modification — the ProForma 2.0 §4.5 construct (SEQ)[mod1][mod2]…. The parser accepts these strings, so they cannot currently be round-tripped: parsing succeeds but writing the resulting term fails.

Reproduce

var parser = new ProFormaParser();
var writer = new ProFormaWriter();
var term = parser.ParseString("PRT(ESFRMS)[Oxidation][Oxidation][half cystine][half cystine]ISK");
writer.WriteString(term);   // throws: Can't nest ranges within each other.

This also blocks real specification examples (e.g. the §4.5 insulin-style range bearing multiple oxidations and half-cystines).

Root cause

Each modification on a range is parsed as a separate ProFormaTag spanning the same (ZeroBasedStartIndex, ZeroBasedEndIndex) — the parser keeps the range open across consecutive )[mod][mod] until the next residue resets it. When the writer reaches the range, its inner loop scans the tags that fall within the range and unconditionally throws for any whose start != end. But a co-located range modification legitimately has start != end (it spans the whole range), so these were mistaken for nested ranges.
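To make the misclassification concrete, here is a minimal, library-free sketch of the tag shape described above and of the check that misfires. `SketchTag`, `RootCauseSketch`, and `OldCheckRejects` are hypothetical names used only for illustration; they are not the library's `ProFormaTag` or the writer's actual code, and the sketch assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parsed tag (NOT the library's ProFormaTag).
public sealed record SketchTag(int Start, int End, string Descriptor);

public static class RootCauseSketch
{
    // Shape described above for "PRT(ESFRMS)[Oxidation][half cystine]ISK":
    // each modification becomes its own tag, and both span residues 3..8.
    public static List<SketchTag> ExampleTags() => new()
    {
        new SketchTag(3, 8, "Oxidation"),
        new SketchTag(3, 8, "half cystine"),
    };

    // Assumed form of the old check: any tag inside the current range whose
    // start differs from its end is treated as a nested range.
    public static bool OldCheckRejects(SketchTag range, SketchTag inner) =>
        inner.Start >= range.Start
        && inner.End <= range.End
        && inner.Start != inner.End;
}
```

Under these assumptions, `OldCheckRejects(tags[0], tags[1])` is true for the two example tags: the second modification sits inside the range and has start != end, so it is reported as a nested range even though it spans exactly the same residues.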

Fix

In the range branch of WriteString, a tag whose start and end both equal the current range's start and end is treated as another modification on that same range and emitted as a consecutive [descriptor] after the range closes. Genuinely nested ranges (a tag inside the range with start != end and a span that differs from the range's) still throw as before.
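The rule can be sketched as a small standalone writer. This is an illustrative model, not the code in this PR: `RangeTag` and `RangeWriterSketch` are hypothetical names, descriptors are plain strings, tags are assumed to be sorted by start index with a range listed before any single-residue tag on its first residue, and the sketch throws `InvalidOperationException` where the real writer throws `ProFormaParseException`. It assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a parsed tag (NOT the library's ProFormaTag).
public sealed record RangeTag(int Start, int End, string Descriptor);

public static class RangeWriterSketch
{
    public static string Write(string sequence, IReadOnlyList<RangeTag> tags)
    {
        var sb = new StringBuilder();
        int t = 0; // next unconsumed tag

        for (int i = 0; i < sequence.Length; )
        {
            if (t < tags.Count && tags[t].Start == i && tags[t].End > i)
            {
                // Range branch: open the range and collect its descriptors.
                RangeTag range = tags[t++];
                var afterRange = new List<string> { range.Descriptor };
                sb.Append('(');

                for (int j = range.Start; j <= range.End; j++)
                {
                    sb.Append(sequence[j]);
                    while (t < tags.Count && tags[t].Start == j)
                    {
                        RangeTag inner = tags[t++];
                        if (inner.Start == range.Start && inner.End == range.End)
                            afterRange.Add(inner.Descriptor); // same span: another mod on this range
                        else if (inner.Start != inner.End)
                            throw new InvalidOperationException("Can't nest ranges within each other.");
                        else
                            sb.Append('[').Append(inner.Descriptor).Append(']'); // single-residue mod
                    }
                }

                // Close the range, then emit every co-located descriptor in order.
                sb.Append(')');
                foreach (string d in afterRange) sb.Append('[').Append(d).Append(']');
                i = range.End + 1;
            }
            else
            {
                sb.Append(sequence[i]);
                while (t < tags.Count && tags[t].Start == i && tags[t].End == i)
                    sb.Append('[').Append(tags[t++].Descriptor).Append(']');
                i++;
            }
        }

        // Guard against input that violates the ordering assumption.
        if (t != tags.Count) throw new ArgumentException("Tags must be sorted by start index.");
        return sb.ToString();
    }
}
```

With the sequence `SEQUENCE` and two tags spanning residues 2..5 (`+14.05` and `Oxidation`), the sketch returns `SE(QUEN)[+14.05][Oxidation]CE`, while a tag spanning 3..4 inside that range still throws.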

After

PRT(ESFRMS)[Oxidation][Oxidation][half cystine][half cystine]ISK   // round-trips unchanged
SE(QUEN)[+14.05][Oxidation]CE                                       // two modifications on one range

Tests

  • WriteMultipleModificationsOnSameRange — constructs a range carrying two modifications and asserts SE(QUEN)[+14.05][Oxidation]CE.
  • RoundTripMultipleModificationsOnRange — parse→write round-trip of a four-modification range.
  • Full suite green: 235 passed, 7 skipped, 0 failed (net472 + net6.0).

🤖 Generated with Claude Code

https://claude.ai/code/session_01JNnpzBNwfnTbZxtZdtAzLG

The writer threw "Can't nest ranges within each other" for a range carrying more
than one modification (e.g. "(SEQ)[mod1][mod2]", ProForma 2.0 section 4.5). Each
modification on a range is parsed as a separate tag spanning the same start/end,
and the writer's internal-tag loop mistook those co-located modifications for
nested ranges.

Treat a tag whose start and end match the current range as another modification
on that range and emit it as a consecutive descriptor after the range closes;
genuine nested ranges still throw. Adds a constructed unit test and a parse/write
round-trip regression test; full suite green (235 passed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JNnpzBNwfnTbZxtZdtAzLG
@trishorts trishorts requested a review from rfellers June 19, 2026 16:09
@trishorts trishorts added the bug Something isn't working label Jun 19, 2026