Skip to content

Fix SqlTransaction race condition causing 500 on client disconnect in bundles#5553

Draft
mikaelweave wants to merge 1 commit intomainfrom
agents/incident-repair-unassigned-sprint
Draft

Fix SqlTransaction race condition causing 500 on client disconnect in bundles#5553
mikaelweave wants to merge 1 commit intomainfrom
agents/incident-repair-unassigned-sprint

Conversation

@mikaelweave
Copy link
Copy Markdown
Contributor

Summary

Fixes Bug #188321: Batch/Transaction bundles fail with 'SqlTransaction has completed; it is no longer usable' race condition.

When a client disconnects mid-request during a sequential transaction bundle, the server was returning HTTP 500 instead of HTTP 408 (Request Timeout).

Root Cause

Two interacting bugs:

Bug A — Cancelled token in error catch block

SqlServerFhirDataStore.MergeInternalAsync passes an already-cancelled CancellationToken to MergeResourcesCommitTransactionAsync inside the error catch block. The cancelled token causes OpenAsync to throw OperationCanceledException immediately, masking the original exception and leaving server-side transaction state unclean.

Bug B — SqlTransactionScope.Dispose() zombie chain

SqlTransactionScope.Dispose() disposes SqlConnection before SqlTransaction. Closing the connection zombies the transaction, so the internal Rollback() call in SqlTransaction.Dispose() throws InvalidOperationException("This SqlTransaction has completed"). During C# using-block unwinding this exception replaces the active FhirTransactionCancelledException, which BundleHandler then catches and returns HTTP 500.

Fixes

  • Fix A (SqlServerFhirDataStore.cs): Pass CancellationToken.None to MergeResourcesCommitTransactionAsync in the catch block so the server-side abort always completes.

  • Fix B (BundleHandler.cs): Check cancellationToken.IsCancellationRequested in the IsCompletedTransactionException catch block and throw FhirTransactionCancelledException (408) instead of FhirTransactionFailedException (500) when the client cancelled.

  • Fix C — defense-in-depth (BundleHandler.cs, BundleHandlerParallelOperations.cs): Expand the transaction abort guard to fire on any 4xx/5xx status code, not only when entry.Response.Outcome != null. When a client disconnects mid-stream the response body is empty and Outcome is null, so the old guard would not abort the transaction.

Testing

  • Added unit test GivenATransaction_WithCancellationAndSqlTransactionZombied_ReturnsCancelledException covering the full failure chain: 2-entry transaction bundle + cancelled token + SqlTransaction zombie exception → asserts FhirTransactionCancelledException (408).
  • All 1282 existing unit tests pass (net8.0 + net9.0).

Related

ADO Bug #188321

… bundles

Bug #188321: Batch/Transaction bundles fail with 'SqlTransaction has
completed; it is no longer usable' race condition.

Root cause is two interacting bugs:

Bug A - SqlServerFhirDataStore.MergeInternalAsync passes an already-
cancelled CancellationToken to MergeResourcesCommitTransactionAsync in
the error catch block. The cancelled token causes OpenAsync to throw
OperationCanceledException immediately, masking the original exception
and leaving server-side transaction state unclean.

Fix A: Pass CancellationToken.None so the abort call always completes.

Bug B - SqlTransactionScope.Dispose() (healthcare-shared-components)
disposes SqlConnection before SqlTransaction. This zombies the
transaction, causing SqlTransaction.Dispose()'s internal Rollback() to
throw InvalidOperationException('This SqlTransaction has completed').
During C# using-block unwinding this replaces the active exception
(FhirTransactionCancelledException), which BundleHandler then catches
and returns HTTP 500 instead of HTTP 408.

Fix B: Check cancellationToken.IsCancellationRequested in the
IsCompletedTransactionException catch block and throw
FhirTransactionCancelledException (408) instead of
FhirTransactionFailedException (500) when the client cancelled.

Fix C (defense-in-depth): Expand the transaction abort guard in both
sequential (BundleHandler) and parallel (BundleHandlerParallelOperations)
paths to abort on any 4xx/5xx status code, not only when
entry.Response.Outcome != null (outcome is null when client disconnects
mid-stream and the response body is empty).

Also adds a unit test covering the full failure chain:
2-entry transaction bundle + cancelled token + SqlTransaction zombie
exception -> asserts FhirTransactionCancelledException (408).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mikaelweave mikaelweave requested a review from a team as a code owner May 5, 2026 22:07
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.39%. Comparing base (a894a3b) to head (b2be9bf).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5553      +/-   ##
==========================================
+ Coverage   77.02%   77.39%   +0.36%     
==========================================
  Files         983      985       +2     
  Lines       36007    36193     +186     
  Branches     5469     5501      +32     
==========================================
+ Hits        27736    28010     +274     
+ Misses       6927     6825     -102     
- Partials     1344     1358      +14     

see 27 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mikaelweave mikaelweave marked this pull request as draft May 7, 2026 04:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants