Skip to content

cc: Fix dirty VACB list corruption causing FAST_FAIL_CORRUPT_LIST_ENTRY crashes#5

Closed
Copilot wants to merge 2 commits into
masterfrom
copilot/random-crash-debugging
Closed

cc: Fix dirty VACB list corruption causing FAST_FAIL_CORRUPT_LIST_ENTRY crashes#5
Copilot wants to merge 2 commits into
masterfrom
copilot/random-crash-debugging

Conversation

Copilot AI commented Jun 2, 2026

Copy link
Copy Markdown

Random BugCheck 0x139 (FAST_FAIL_CORRUPT_LIST_ENTRY) crashes during CcRosUnmarkDirtyVacb → RemoveEntryList caused by corrupted DirtyVacbListEntry Flink/Blink pointers. Three bugs in dirty VACB tracking:

Double-insert into dirty list (primary crash)

CcRosMarkDirtyVacb only had ASSERT(!Vacb->Dirty) — compiled out in release. Race:

  1. CcRosFlushVacb calls CcRosUnmarkDirtyVacbDirty=FALSE, VACB removed from list
  2. Concurrent writer calls CcRosMarkDirtyVacb → VACB inserted at tail, Dirty=TRUE
  3. MmFlushSegment fails → CcRosMarkDirtyVacb called again → same node inserted twice

Second InsertTailList corrupts the previous tail's Flink without unlinking first. Next RemoveEntryList on a neighbor fails RtlpCheckListEntryFAST_FAIL.

Fix: CcRosMarkDirtyVacb now checks Vacb->Dirty under the spinlock and returns early if already dirty.

Double-unmark (counter underflow / extra refcount drop)

CcRosUnmarkDirtyVacb only had ASSERT(Vacb->Dirty). CcFlushCache reads vacb->Dirty after CcRosLookupVacb releases its locks; lazy writer can flush the same VACB in the window, causing CcTotalDirtyPages/DirtyPages unsigned underflow and a spurious CcRosVacbDecRefCount.

Fix: CcRosUnmarkDirtyVacb now returns BOOLEAN and early-returns FALSE (under the lock) if Vacb->Dirty is already clear.

Unconditional re-mark on flush failure

CcRosFlushVacb always called CcRosMarkDirtyVacb on failure regardless of whether it had actually unmarked anything, compounding the double-insert risk.

Fix: CcRosFlushVacb captures the BOOLEAN from CcRosUnmarkDirtyVacb and only re-marks dirty on failure when WasMarked == TRUE.

Suspicious code annotated

Lockless vacb->Dirty reads in CcFlushCache and CcRosReleaseVacb are now commented explaining the races and why the new guards make them safe.

- CcRosMarkDirtyVacb: guard against double-insertion (primary crash fix)
- CcRosUnmarkDirtyVacb: return BOOLEAN, guard against double-unmark
- CcRosFlushVacb: only re-mark dirty on failure if we were the one who unmarked
- Add comments to suspicious lockless Dirty reads in CcFlushCache and CcRosReleaseVacb
Copilot AI changed the title cc: Fix dirty VACB list corruption and race conditions cc: Fix dirty VACB list corruption causing FAST_FAIL_CORRUPT_LIST_ENTRY crashes Jun 2, 2026
Copilot AI requested a review from tkreuzer June 2, 2026 18:52
Comment thread ntoskrnl/cc/view.c
PROS_SHARED_CACHE_MAP SharedCacheMap = Vacb->SharedCacheMap;

CcRosUnmarkDirtyVacb(Vacb, TRUE);
/*

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Please explain better what "for correctness" means here. What could happen, if we bail out, when the VACB was not marked dirty at this point?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the latest commit. The key point: if we bailed out early when WasMarked == FALSE, we'd return STATUS_SUCCESS to the caller without having verified that the section pages were actually written. The concurrent flush that beat us may have failed, meaning dirty data is still in the section. By always calling MmFlushSegment regardless of WasMarked, we get a reliable write status to return to the caller. We just must not re-mark dirty on failure in that case, since we were not the one who removed the VACB from the dirty list.

Comment thread ntoskrnl/cc/view.c
Comment on lines +620 to +626
/*
* NOTE: Vacb->Dirty is read here without holding the spinlock as an
* optimisation to avoid an unnecessary lock acquisition. The check may
* race with a concurrent CcRosUnmarkDirtyVacb; however CcRosMarkDirtyVacb
* re-checks Dirty under the lock and is a safe no-op if the VACB has
* already been re-marked dirty by another thread in the meantime.
*/

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot What would happen here, if the VACB is unmarked dirty right after this check? Is that still a safe path? Add this to the comment.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the latest commit. If Vacb->Dirty is TRUE at the check (so we skip CcRosMarkDirtyVacb) and a concurrent flush then immediately unmarks it, that flush's MmFlushSegment call covers all dirty section pages — including those written by the caller of this function before CcRosReleaseVacb was invoked. Should that flush fail, CcRosFlushVacb re-marks the VACB dirty via the WasMarked == TRUE path, ensuring a retry. No dirty data can be silently lost through this race.

Copilot AI requested a review from tkreuzer June 8, 2026 07:25
@tkreuzer tkreuzer closed this Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants