Skip to content

test(infra): migrate integration tests to explicit CleanupGuard cleanup#515

Merged
Detair merged 19 commits into
mainfrom
fix/cleanup-guard-test-flake
Apr 11, 2026
Merged

test(infra): migrate integration tests to explicit CleanupGuard cleanup#515
Detair merged 19 commits into
mainfrom
fix/cleanup-guard-test-flake

Conversation

@Detair

@Detair Detair commented Apr 11, 2026

Copy link
Copy Markdown
Owner

Summary

Eliminates the recurring CI flake where Rust integration tests intermittently time out at nextest's 120s slow-timeout. Three different tests (later 4) hit this on PRs #512, #513, and #514 — always reported as test #900/900 because nextest reports completion in finish-time order.

Root cause: CleanupGuard::drop (server/tests/integration/helpers/mod.rs) spawned a thread + new tokio runtime + .join().expect() with no timeout. When cleanup hangs (DB pool contention, slow DELETE, lock wait), the test process holds for 120s and gets killed.

Issue #509 documented the same anti-pattern in the setup tests with the same root cause analysis.

Layers

Layer 1 — Stopgap (1 commit, helpers/mod.rs): Bounded join in CleanupGuard::drop. 30-second polling join, then detach. Immediately removes the failure mode.

Layer 2 — Proper fix (1 commit + 16 migration commits, ~136 sites across 16 files): Added pub async fn cleanup(mut self) that consumes the guard and runs cleanup actions on the caller's existing tokio runtime. Migrated all 136 test sites to call guard.cleanup().await; before returning. The Drop fallback stays as a permanent safety net for tests that panic before reaching cleanup (RAII semantics) and prints a "dropped with N pending actions" warning so unmigrated tests are visible.

Layer 3 — Deferred: After ~2 weeks of clean main-branch CI with zero "dropped with N pending actions" warnings, a follow-up commit will replace the Drop fallback's eprintln with panic!() to permanently enforce the contract. Tracked in: #516.

Migration coverage

File Sites
workspaces.rs 24
filters_http.rs 17
governance.rs 12
guild_limits.rs 10
webhooks.rs 10
channel_pins.rs 10
connectivity_http.rs 8
custom_status.rs 6
setup_http.rs 6
messages_http.rs 6
media_processing.rs 6
bot_intents.rs 6
dm_http.rs 5
channels_http.rs 5
uploads_http.rs 3
setup_concurrent_http.rs 2
Total 136

Test plan

  • cargo build --tests -p vc-server clean
  • cargo clippy -p vc-server --lib -- -D warnings clean
  • cargo nextest run -p vc-server --lib → 430/430 passing
  • cargo fmt --check clean
  • Per-file grep -c "let mut guard" equals grep -c "guard.cleanup().await" across all 16 files (136/136)
  • CI Rust Tests passes 3 consecutive runs on this PR
  • No "dropped with N pending actions" warnings appear in CI logs

Refs

🤖 Generated with Claude Code

Detair and others added 19 commits April 11, 2026 21:19
CleanupGuard::drop currently calls .join() with no timeout. When test
cleanup hangs (DB pool contention, slow DELETE, lock wait, etc.), the
test process holds until nextest's 120s slow-timeout kills it.

Add a 30-second polling join: if cleanup completes within 30s, join
normally. Otherwise, log to stderr and return — the cleanup thread
continues detached but the test process is no longer blocked.

This is a stopgap. The proper fix (explicit .cleanup().await) is in
follow-up commits.

Refs #509 (same anti-pattern, different test set)
Tests should call .cleanup().await at the end of the body instead of
relying on Drop. The new method consumes self, runs all cleanup actions
on the caller's tokio runtime, and leaves Drop as a no-op.

The Drop fallback now also prints a 'dropped with N pending actions'
warning so the migration progress is visible. After all tests are
migrated and CI is green for ~2 weeks, the warning will be replaced
with a panic in a follow-up commit.

Refs spec docs/superpowers/specs/2026-04-11-security-audit-followups-design.md
10 test sites migrated. Includes the 2 known-flaky tests that triggered
the CleanupGuard CI fix:
- test_globally_banned_user_cannot_join_via_discovery
- test_channel_limit
6 test sites migrated. Includes the known-flaky test:
- test_custom_status_with_expiry_persists
- test_custom_status_clear_in_database
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Detair Detair merged commit ef9be18 into main Apr 11, 2026
18 checks passed
@Detair Detair deleted the fix/cleanup-guard-test-flake branch April 12, 2026 00:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant