test(ci): parallelize the test suite with pytest-xdist [WIP]#262
Closed
Elarwei001 wants to merge 1 commit into
CI runs ~29 min per Python version, dominated by network-bound live tests (test_virus alone is ~64% of the runtime because each filter test downloads a broad NCBI taxon). These tests are I/O-bound, so running them concurrently should cut wall-clock time substantially without changing what is tested.

- Add pytest-xdist to the test dependency group and run hatch-test with `-n auto`.
- Isolate the test_virus output directory per test (tempfile.mkdtemp) instead of a single shared "test_virus_output" dir, which would otherwise race under parallel workers.

WIP: opened to measure the real CI speedup before tuning (worker count, optional NCBI API key). All tests remain live, with no semantic change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #262      +/-   ##
==========================================
+ Coverage   56.70%   56.83%   +0.12%
==========================================
  Files          29       29
  Lines        9392     9392
==========================================
+ Hits         5326     5338      +12
+ Misses       4066     4054      -12
Closing this experiment. The parallel run gave only a modest speedup (py3.12 ~28→23 min, py3.13 ~26→24 min) and py3.14 failed — most of the wall-clock is a handful of network-bound live tests ( |
Motivation
CI runs ~29 min per Python version (py3.12/3.13/3.14), dominated by network-bound live tests. From a local `--durations` run, `test_virus` alone is ~64% of the runtime: each `test_virus_with_*_filter` downloads a broad NCBI taxon ("Zika virus", thousands of genomes), server-side-filtered, so every test re-fetches a large package (75–195 s each). These tests are I/O-bound (waiting on NCBI/Ensembl), so running them concurrently should cut wall-clock time considerably without changing what is tested.

Change
- Add `pytest-xdist` to the test dependency group; run hatch-test with `-n auto`.
- Isolate the `test_virus` output directory per test (`tempfile.mkdtemp`) instead of a single shared `"test_virus_output"` dir. The shared dir (rmtree + recreate in `setUp`) would race across parallel workers. This is the only fixed shared output dir in the suite.

All tests remain live: no mocking, no semantic change; only the output-dir isolation and the parallel runner.
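For illustration, the per-test output directory isolation could look roughly like the sketch below. This is not the code from this PR: the class name `VirusOutputDirExample`, the `test_virus_output_` prefix, and the test body are hypothetical stand-ins, and the real `test_virus` setup may differ. It only shows the general pattern of replacing a fixed shared directory with `tempfile.mkdtemp` so that parallel workers never touch the same path.

```python
import os
import shutil
import tempfile
import unittest


class VirusOutputDirExample(unittest.TestCase):
    """Hypothetical stand-in for a test case that writes output files."""

    def setUp(self):
        # mkdtemp creates a fresh, uniquely named directory on every call,
        # so two tests (or two xdist workers) never share an output dir.
        self.output_dir = tempfile.mkdtemp(prefix="test_virus_output_")
        # Registered cleanups run even when the test fails.
        self.addCleanup(shutil.rmtree, self.output_dir, ignore_errors=True)

    def test_writes_into_private_dir(self):
        # Placeholder body: a real test would run the download here.
        path = os.path.join(self.output_dir, "result.txt")
        with open(path, "w") as fh:
            fh.write("ok")
        self.assertTrue(os.path.exists(path))
```

Compared with an rmtree-and-recreate of one fixed directory in `setUp`, this removes the shared state entirely, so no locking or worker-aware naming is needed.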
Local check
4 virus filter tests run concurrently (`-n 4`) pass with no races; ~70–90 s serial → 31 s parallel.

Open questions (why WIP)
`-n` or add an `NCBI_API_KEY` secret (the `virus` module already supports `api_key`).

🤖 Generated with Claude Code