perf: O(log n) priority_queue delete via in-element index tracking (#263)#401
Draft
bushidocodes wants to merge 1 commit into
Draft
perf: O(log n) priority_queue delete via in-element index tracking (#263)#401bushidocodes wants to merge 1 commit into
bushidocodes wants to merge 1 commit into
Conversation
…ng (#263) The min-heap priority_queue located elements for deletion with an O(n) linear scan. This makes delete O(log n) by having each element store its own 1-based heap index, which the queue keeps in sync on every move (append, swaps during percolate up/down, dequeue, delete). delete then reads the element's slot directly and repairs the heap in O(log n). Mirrors the approach in the composite codebase referenced in the issue. - priority_queue.h: * Add an optional `get_index_fn` accessor (returns a pointer to the element's in-struct size_t index slot). NULL preserves the legacy O(n) linear-scan delete, so the change is opt-in per queue and behavior-preserving where not wired up. * Maintain the index slot on every element move via small record/clear/swap helpers; generalize percolate_up into percolate_up_from(start_index). * Rewrite delete_nolock: O(1) lookup via the slot (with a defensive items[i]==value validation that still returns -1 for absent elements), then repair the heap. Also fixes a latent correctness bug: arbitrary-position delete must percolate the replacement UP or DOWN, not down-only — the old code could leave the heap invalid when the moved last element was smaller than the hole's parent. - Add a `size_t pq_idx` slot + accessor to the four element types that are delete()'d (sandbox, tenant_global_request_queue, perworker_tenant_sandbox_queue, tenant_timeout) and wire the accessor into the 7 queues that call delete. Queues that are only enqueued/dequeued (global_request_scheduler_minheap, tgrq->sandbox_requests) keep the NULL/legacy path. Each tracked element is a member of at most one tracked queue at a time, so a single slot is sufficient. Verified with a deterministic unit test against the real header (heap-invariant checks, the percolate-up-on-delete case, absent-element delete, and 200k random ops proving the tracked O(log n) path is behaviorally identical to the legacy O(n) path), clean under ASan+UBSan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
Author
|
@emil916 - I'm pretty fried and don't have the braincells to review this closely and compare to the composite link you posted way back when, but here's an Opus 4.8 attempt at what you want with some preliminary perf numbers. I'm leaving this as a draft because I haven't reviewed this as closely. You might be more familiar here and be able to determine if this is an improvement a bit faster. |
Contributor
|
I have already implemented this in my feat branch. I thought this was already merged. I remember there were some complications when I did it. So, it is good to have another view. I'll look at this later, as well. Thank you, Sean! |
Contributor
Author
|
@emil916 - Feel free to close this out if it's a dupe of what you're going to merge. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #263.
What
The min-heap
priority_queuelocated elements for deletion with an O(n) linear scan (priority_queue_delete_nolock). This makes delete O(log n): each element stores its own 1-based heap index, the queue keeps that slot in sync on every move, anddeletereads it directly and repairs the heap in O(log n). This is the approach from the composite codebase referenced in the issue — and thelocal_runqueue_mtds.cdelete path even carried aTODO: Apply the composite way of PQ deletion O(logn).Design
priority_queue_initializetakes a newget_index_fnaccessor that returns a pointer to the element's in-structsize_tindex slot. PassNULLto keep the legacy O(n) linear-scan delete — so queues that are neverdelete()'d from (global_request_scheduler_minheap,tgrq->sandbox_requests) are untouched and behavior-preserving.append, swaps in percolate up/down,dequeue,delete) via smallrecord/clear/swaphelpers;percolate_upis generalized topercolate_up_from(start).size_t pq_idxslot + accessor to the four element types that aredelete()'d (sandbox,tenant_global_request_queue,perworker_tenant_sandbox_queue,tenant_timeout) and wired the accessor into the 7 queues that call delete. Each tracked element is a member of at most one tracked queue at a time, so a single slot suffices.Correctness
Deterministic unit test against the real header (heap-invariant assertions, the percolate-up-on-delete case the old code got wrong, absent-element delete returning −1, and 200k random enqueue/delete/dequeue ops proving the tracked O(log n) path is behaviorally identical to the legacy O(n) path). Clean under ASan + UBSan.
Scheduler validation (in the dev container)
common/*.env)traps + stack_overflow sweep MTDS, exercising all four tracked element types' delete paths (sandbox, tgrq, pwt, tenant_timeout).
Microbenchmark — changed algorithm vs existing O(n)
Standalone harness, container clang-13
-O3, timing delete-all-in-random-order at heap size N (best of 9).old O(n)is the true pre-change header;new O(logn)is this PR.Indexed delete stays ~flat (~21→40 ns) as the heap grows; the O(n) scan grows linearly (18→441 ns). Crossover ≈ N=128: for tiny heaps the index-maintenance constant makes it marginally slower, but it wins increasingly past that — up to ~11× at 4096.
Mapping to real queue sizes: the MTDS tenant queues are small (
RUNTIME_MAX_TENANT_COUNT=32, ~neutral), but the local runqueue (RUNTIME_RUNQUEUE_SIZE=256, grows) and per-tenant sandbox queues (RUNTIME_TENANT_QUEUE_SIZE=4096) get deep under load — exactly where the old O(n) delete hurt. Beyond average throughput, this bounds worst-case delete latency (O(n) on a full queue is ~440 ns and climbs with load; O(log n) stays ~40 ns), which helps scheduler tail-latency/predictability under overload.🤖 Generated with Claude Code