cc: Fix dirty VACB list corruption causing FAST_FAIL_CORRUPT_LIST_ENTRY crashes#5
cc: Fix dirty VACB list corruption causing FAST_FAIL_CORRUPT_LIST_ENTRY crashes#5Copilot wants to merge 2 commits into
Conversation
- CcRosMarkDirtyVacb: guard against double-insertion (primary crash fix) - CcRosUnmarkDirtyVacb: return BOOLEAN, guard against double-unmark - CcRosFlushVacb: only re-mark dirty on failure if we were the one who unmarked - Add comments to suspicious lockless Dirty reads in CcFlushCache and CcRosReleaseVacb
| PROS_SHARED_CACHE_MAP SharedCacheMap = Vacb->SharedCacheMap; | ||
|
|
||
| CcRosUnmarkDirtyVacb(Vacb, TRUE); | ||
| /* |
There was a problem hiding this comment.
@copilot Please explain better what "for correctness" means here. What could happen, if we bail out, when the VACB was not marked dirty at this point?
There was a problem hiding this comment.
Updated in the latest commit. The key point: if we bailed out early when WasMarked == FALSE, we'd return STATUS_SUCCESS to the caller without having verified that the section pages were actually written. The concurrent flush that beat us may have failed, meaning dirty data is still in the section. By always calling MmFlushSegment regardless of WasMarked, we get a reliable write status to return to the caller. We just must not re-mark dirty on failure in that case, since we were not the one who removed the VACB from the dirty list.
| /* | ||
| * NOTE: Vacb->Dirty is read here without holding the spinlock as an | ||
| * optimisation to avoid an unnecessary lock acquisition. The check may | ||
| * race with a concurrent CcRosUnmarkDirtyVacb; however CcRosMarkDirtyVacb | ||
| * re-checks Dirty under the lock and is a safe no-op if the VACB has | ||
| * already been re-marked dirty by another thread in the meantime. | ||
| */ |
There was a problem hiding this comment.
@copilot What would happen here, if the VACB is unmarked dirty right after this check? Is that still a safe path? Add this to the comment.
There was a problem hiding this comment.
Updated in the latest commit. If Vacb->Dirty is TRUE at the check (so we skip CcRosMarkDirtyVacb) and a concurrent flush then immediately unmarks it, that flush's MmFlushSegment call covers all dirty section pages — including those written by the caller of this function before CcRosReleaseVacb was invoked. Should that flush fail, CcRosFlushVacb re-marks the VACB dirty via the WasMarked == TRUE path, ensuring a retry. No dirty data can be silently lost through this race.
Random
BugCheck 0x139(FAST_FAIL_CORRUPT_LIST_ENTRY) crashes duringCcRosUnmarkDirtyVacb → RemoveEntryListcaused by corruptedDirtyVacbListEntryFlink/Blink pointers. Three bugs in dirty VACB tracking:Double-insert into dirty list (primary crash)
CcRosMarkDirtyVacbonly hadASSERT(!Vacb->Dirty)— compiled out in release. Race:CcRosFlushVacbcallsCcRosUnmarkDirtyVacb→Dirty=FALSE, VACB removed from listCcRosMarkDirtyVacb→ VACB inserted at tail,Dirty=TRUEMmFlushSegmentfails →CcRosMarkDirtyVacbcalled again → same node inserted twiceSecond
InsertTailListcorrupts the previous tail'sFlinkwithout unlinking first. NextRemoveEntryListon a neighbor failsRtlpCheckListEntry→FAST_FAIL.Fix:
CcRosMarkDirtyVacbnow checksVacb->Dirtyunder the spinlock and returns early if already dirty.Double-unmark (counter underflow / extra refcount drop)
CcRosUnmarkDirtyVacbonly hadASSERT(Vacb->Dirty).CcFlushCachereadsvacb->DirtyafterCcRosLookupVacbreleases its locks; lazy writer can flush the same VACB in the window, causingCcTotalDirtyPages/DirtyPagesunsigned underflow and a spuriousCcRosVacbDecRefCount.Fix:
CcRosUnmarkDirtyVacbnow returnsBOOLEANand early-returnsFALSE(under the lock) ifVacb->Dirtyis already clear.Unconditional re-mark on flush failure
CcRosFlushVacbalways calledCcRosMarkDirtyVacbon failure regardless of whether it had actually unmarked anything, compounding the double-insert risk.Fix:
CcRosFlushVacbcaptures theBOOLEANfromCcRosUnmarkDirtyVacband only re-marks dirty on failure whenWasMarked == TRUE.Suspicious code annotated
Lockless
vacb->Dirtyreads inCcFlushCacheandCcRosReleaseVacbare now commented explaining the races and why the new guards make them safe.