feat(enrichr): retrieve gene sets incl. MSigDB collections (#139) #241
Elarwei001 wants to merge 6 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #241      +/-   ##
==========================================
+ Coverage   56.70%   57.26%   +0.55%
==========================================
  Files          29       29
  Lines        9392     9469      +77
==========================================
+ Hits         5326     5422      +96
+ Misses       4066     4047      -19
Add `gget.enrichr_library()` (CLI: `gget enrichr --get_library`) to fetch the
gene sets (members) of any Enrichr gene-set library — the recommended way to
retrieve MSigDB gene sets (e.g. MSigDB_Hallmark_2020) without MSigDB login.
- Returns a long-format DataFrame (gene_set, gene), or a {gene_set: [genes]}
dict with json=True. `gene_set=` returns a single set; `species` selects the
non-human Enrichr variants.
- CLI: new --get_library/-gl and --gene_set/-gs; genes/--database made optional
in library mode (still enforced for enrichment). Backward compatible.
- Detects Enrichr's HTML-404 (HTTP 200) response for unknown libraries.
- Tests + fixtures (live Enrichr) and docs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
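
The parsing and error detection described in this commit can be sketched roughly as follows. This is not the code from the PR: it assumes the library arrives as GMT-style text (gene set name, a description column, then genes, all tab-separated), and `parse_library_text` and `to_long_format` are hypothetical helper names used only for illustration.

```python
def parse_library_text(text):
    """Parse GMT-style text into {gene_set: [genes]}.

    Assumed line layout (not confirmed by the PR):
    <gene set name> TAB <description> TAB <gene> TAB <gene> ...
    """
    # An unknown library can come back as an HTML page with HTTP 200,
    # so the status code alone does not reveal the error.
    if text.lstrip().lower().startswith(("<!doctype", "<html")):
        raise ValueError("Response looks like HTML, not a gene-set library.")

    library = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        name = fields[0].strip()
        # fields[1] is the (often empty) description; genes start at index 2.
        library[name] = [g.strip() for g in fields[2:] if g.strip()]

    if not library:
        raise ValueError("No gene sets found in response.")
    return library


def to_long_format(library):
    """Flatten {gene_set: [genes]} into (gene_set, gene) rows."""
    return [(name, gene) for name, genes in library.items() for gene in genes]
```

The dict corresponds to the `json=True` shape and the row list to the long-format table; a real implementation would presumably build a pandas DataFrame from those rows.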
Add a network-free TestEnrichrLibraryOffline class that mocks requests to cover enrichr_library: invalid species, verbose logging, blank-line parsing, bad/empty library errors, gene_set filter + not-found, and the json/json+save/CSV-save branches. All PR-added lines now covered.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
69cc3ea to 32105d7
…ests
Implements the follow-up improvements flagged in review:
- enrichr_libraries(): list available Enrichr gene-set libraries (via the
  datasetStatistics endpoint), with an optional substring filter (e.g.
  "MSigDB"); CLI `gget enrichr --list_libraries [FILTER]`.
- enrichr_library(descriptions=True): also return each gene set's description
  (adds a 'description' column / nested dict); CLI `--descriptions`.
- Harden the de-facto-live library tests: skip (not fail) on network errors or
  a transient non-data Enrichr response; add network-free unit tests for
  enrichr_libraries and the descriptions option.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
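
One way to express the skip-instead-of-fail behaviour for live tests is sketched below. It is a simplified illustration, not the PR's test code: `TransientUpstreamError` and `fetch_or_skip` are hypothetical names, and the outage is simulated so the example runs without a network. With `requests`, the caught exception would more likely be `requests.exceptions.RequestException`.

```python
import unittest


class TransientUpstreamError(Exception):
    """Hypothetical marker for a transient, non-data upstream response."""


def fetch_or_skip(test_case, fetch):
    """Run fetch(); skip the test on transient failures, let real errors through."""
    try:
        return fetch()
    except (ConnectionError, TimeoutError, TransientUpstreamError) as exc:
        # skipTest raises unittest.SkipTest, so the test is reported as skipped.
        test_case.skipTest(f"Upstream unavailable: {exc}")


class TestLibraryLive(unittest.TestCase):
    def test_skips_on_network_error(self):
        def flaky():
            raise ConnectionError("simulated outage")

        fetch_or_skip(self, flaky)
        self.fail("not reached: the test was skipped above")

    def test_real_errors_still_fail(self):
        def bad_library():
            raise ValueError("unknown library")

        # A genuine error (e.g. a bad library name) is not swallowed.
        with self.assertRaises(ValueError):
            fetch_or_skip(self, bad_library)
```

Keeping the list of skippable exceptions narrow is the point of the design: anything outside it still surfaces as a failure.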
Also add the get_library/gene_set library docs + example to the Spanish page (the original library feature had only been documented in English).
Add MSigDB references (Subramanian 2005, Liberzon 2011/2015, and Castanza 2023 for mouse MSigDB) to the References section of the en/es enrichr docs, following gget's per-source citation convention now that enrichr can retrieve MSigDB collections.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
enrichr_libraries rejected species="mouse" because it looked the value up directly in DATASETSTATISTICS_ENRICHR_URLS, which has no "mouse" key. enrichr_library and enrichr already treat mouse as the human/Enrichr variant, so align enrichr_libraries: accept mouse and query the human endpoint. Add an offline test asserting the human URL is used.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
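
The fix described above amounts to aliasing one key before the lookup. A minimal sketch, assuming a URL table shaped like the one the commit mentions; the table contents and the `resolve_statistics_url` helper are illustrative stand-ins, not copied from gget:

```python
# Illustrative stand-in for the real table; keys and URLs are assumptions.
DATASETSTATISTICS_URLS = {
    "human": "https://maayanlab.cloud/Enrichr/datasetStatistics",
    "fly": "https://maayanlab.cloud/FlyEnrichr/datasetStatistics",
}


def resolve_statistics_url(species):
    """Return the statistics URL, treating mouse as the human variant."""
    # "mouse" has no key of its own, so map it to "human" before the lookup.
    key = "human" if species == "mouse" else species
    try:
        return DATASETSTATISTICS_URLS[key]
    except KeyError:
        raise ValueError(f"Unsupported species: {species!r}") from None
```

An offline test can then assert that `resolve_statistics_url("mouse")` equals the human URL without any request being made.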
Hi @lauraluebbert — quick summary of #241 (resolves #139):
btw, tests are network-free where possible; the live ones call skipTest (rather than fail) on transient Enrichr/Ensembl errors, while still asserting real errors like bad library names. Happy to change anything, and I would appreciate a review when you have a moment. Thanks!
Resolves #139
What this adds
`gget enrichr` can now fetch the gene sets behind an Enrichr library, so users can pull MSigDB collections (Hallmark, Oncogenic, Computational, …) directly, not just run enrichment.
- `enrichr_library(name)`: member genes of any library (e.g. `MSigDB_Hallmark_2020`) as a long-format `gene_set`/`gene` data frame. `gene_set=` returns a single set; `descriptions=True` adds each set's description. CLI: `--get_library`/`-gl`, `--gene_set`, `--descriptions`/`-desc`.
- `enrichr_libraries(species="human", filter=None)`: list the available libraries (name, term count, gene coverage) to discover what can be fetched. CLI: `--list_libraries`/`-ll`.

Existing enrichment behaviour is unchanged.
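
To illustrate the listing step, here is a rough sketch of filtering library statistics by substring. The input field names (`libraryName`, `numTerms`, `geneCoverage`), the case-insensitive match, and the `filter_libraries` helper are all assumptions made for this example, and the sample numbers in the usage note are made up.

```python
def filter_libraries(statistics, substring=None):
    """Reduce raw statistics entries to name, term count and gene coverage.

    `statistics` is assumed to be a list of dicts with the keys
    'libraryName', 'numTerms' and 'geneCoverage'.
    """
    rows = [
        {
            "library": entry["libraryName"],
            "num_terms": entry["numTerms"],
            "gene_coverage": entry["geneCoverage"],
        }
        for entry in statistics
    ]
    if substring:
        needle = substring.lower()  # case-insensitive match is an assumption
        rows = [row for row in rows if needle in row["library"].lower()]
    return sorted(rows, key=lambda row: row["library"])


# Made-up payload, for illustration only.
fake_statistics = [
    {"libraryName": "MSigDB_Hallmark_2020", "numTerms": 3, "geneCoverage": 30},
    {"libraryName": "Example_Library", "numTerms": 2, "geneCoverage": 20},
]
msigdb_only = filter_libraries(fake_statistics, "msigdb")
```

Here `msigdb_only` holds a single row for `MSigDB_Hallmark_2020`, whose name could then be passed on to fetch the gene sets themselves.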
Testing
Extended `tests/test_enrichr.py` with offline tests (mocked library parsing, descriptions, and listing) plus live tests that skip on transient upstream errors. Docs updated (en + es).
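
The offline style of test mentioned above can be sketched with `unittest.mock`. This is an illustration under simplifying assumptions, not the PR's test code: rather than patching `requests` inside gget, the hypothetical `get_library_text` takes the HTTP getter as a parameter, and the URL is a placeholder.

```python
from unittest import mock


def get_library_text(http_get, url):
    """Fetch a library as text through an injected getter (e.g. requests.get)."""
    response = http_get(url)
    if response.status_code != 200:
        raise RuntimeError(f"Request failed with status {response.status_code}")
    return response.text


def make_fake_get(text, status_code=200):
    """Build a stand-in getter that returns a canned response, no network used."""
    fake_response = mock.Mock(status_code=status_code, text=text)
    return mock.Mock(return_value=fake_response)


# Placeholder URL; the real endpoint is not reproduced here.
fake_get = make_fake_get("SET_A\t\tTP53\tMYC\n")
body = get_library_text(fake_get, "https://example.org/library")

# The mock records its calls, so the requested URL can be asserted offline.
fake_get.assert_called_once_with("https://example.org/library")
```

Because the fake response exposes only `status_code` and `text`, the test stays independent of the HTTP library and of upstream availability.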