
v2.5: per-notebook conda envs + netZooR 1.6.3#44

Open
marouenbg wants to merge 16 commits into netZoo:main from marouenbg:claude/priceless-elgamal-673d0f

@marouenbg

Contributor

Summary

v2.5: one conda environment per notebook for full reproducibility.

  • Build 13 new R envs (RV1–5, RV7–8, RC2–3, RC5–6, RP7–8) and 12 new Python envs (PV1–6, PC1–3, PP2–4), each cloned from a netZooR/netZooPy baseline and pinned per-notebook.
  • Pin each notebook to its dedicated kernel as the default (so loading the notebook picks the right env automatically).
  • Fix MONSTER.ipynb language metadata (was python, code is R).
  • Move sex_differences_LUAD.ipynb to its own RC6 env (was sharing RP9 with sex_differences_LUAD_EN.ipynb, which is in a different catalog section).
  • Upgrade netZooR to 1.6.3 in all new R envs.
  • Add per-env recipe specs under netbooks/envs/*.yml for reproducible builds.
  • Welcome catalog: bump version to v 2.5, refresh date, normalize (R 4.3.1; netZooR ...) annotations on all R entries, update Python header to netZooPy 0.10.6 on Python 3.10.
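
The kernel pinning and the MONSTER.ipynb language fix both come down to two keys in the notebook's top-level metadata. A minimal sketch using only the standard library, assuming the nbformat v4 layout; the kernel name `my-r-env` and its display name are hypothetical placeholders, not labels registered on the host:

```python
import json

def pin_kernel(nb: dict, name: str, display_name: str, language: str) -> dict:
    """Return a copy of a notebook dict pinned to one kernel (nbformat v4 layout)."""
    pinned = json.loads(json.dumps(nb))  # deep copy via a JSON round-trip
    meta = pinned.setdefault("metadata", {})
    meta["kernelspec"] = {
        "name": name,
        "display_name": display_name,
        "language": language,
    }
    meta.setdefault("language_info", {})["name"] = language
    return pinned

# Hypothetical notebook whose metadata says python although the code is R.
nb = {
    "cells": [],
    "metadata": {"language_info": {"name": "python"}},
    "nbformat": 4,
    "nbformat_minor": 5,
}
fixed = pin_kernel(nb, name="my-r-env", display_name="R (my-r-env)", language="R")
print(fixed["metadata"]["kernelspec"]["name"], fixed["metadata"]["language_info"]["name"])
```

Because only `metadata` is touched, a change like this should show up in a diff as a kernelspec/language_info edit with no cell changes.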

Server-side changes (not in this PR but landed on the EC2 host):

  • New conda envs and registered Jupyter kernels for the 25 labels above.
  • login.html News entry for v2.5, and 2025 EPI246 entry under Courses.

Test plan

  • All 25 new envs build and load their primary library (netZooR 1.6.3 for R, netZooPy 0.10.6 for Python)
  • All registered kernels visible in jupyter kernelspec list
  • Welcome_to_netBooks.ipynb renders with new version and annotations
  • Notebook metadata diffs show only kernelspec/language_info changes (no cell content drift)
  • End-to-end execution per notebook tracked in run logs on EC2 (~/env_builds/run_status/). Notebooks with outstanding upstream issues (netZooPy pandas-API drift in panda.py, missing MEME tool for pc2, OOM on heavy case studies) documented separately and not blocking this PR.
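
The "no cell content drift" item in the test plan can be checked mechanically. A rough sketch, assuming nbformat v4 notebooks already loaded as dicts; the helper names are hypothetical and this is not the check that was actually run for this PR:

```python
ALLOWED_METADATA_KEYS = {"kernelspec", "language_info"}

def cells_only(nb: dict) -> list:
    """Cell types and sources, ignoring outputs, execution counts and cell metadata."""
    return [
        (cell.get("cell_type"), "".join(cell.get("source", [])))
        for cell in nb.get("cells", [])
    ]

def other_metadata(nb: dict) -> dict:
    """Top-level metadata with the keys that are allowed to change removed."""
    return {k: v for k, v in nb.get("metadata", {}).items() if k not in ALLOWED_METADATA_KEYS}

def metadata_only_change(before: dict, after: dict) -> bool:
    """True when cell sources and all other metadata are identical."""
    return cells_only(before) == cells_only(after) and other_metadata(before) == other_metadata(after)

before = {
    "cells": [{"cell_type": "code", "source": ["library(netZooR)\n"]}],
    "metadata": {"kernelspec": {"name": "old"}, "language_info": {"name": "python"}},
}
after = {
    "cells": [{"cell_type": "code", "source": ["library(netZooR)\n"]}],
    "metadata": {"kernelspec": {"name": "new"}, "language_info": {"name": "R"}},
}
drifted = {
    "cells": [{"cell_type": "code", "source": ["library(yarn)\n"]}],
    "metadata": after["metadata"],
}
print(metadata_only_change(before, after), metadata_only_change(before, drifted))
```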

🤖 Generated with Claude Code

marouenbg and others added 2 commits May 12, 2026 00:20
Build 25 new kernels (one per notebook) so every netbook in the catalog
loads with its own pinned conda env. This isolates dependencies, makes
notebook updates safe, and matches the catalog convention introduced in
v2.4 where each kernel maps to a single use case.

New env labels:
  - R vignettes RV1-5, RV7-8 (ALPACA, ApplicationwithTBdataset, SAMBAR,
    pandaR, EGRET_toy_example, yarn, TIGER)
  - R case studies RC2-3, RC5-6 (TutorialOTTER, Finding_drugs_for_LUAD,
    gene_expression_for_coexpression_nets, sex_differences_LUAD)
  - R published RP7-8 (maize_genome, egret_banovich_netbook)
  - Python vignettes PV1-6 (condor_tutorial, Building_single-sample,
    Up_and_running_with_PANDA, sambar_tutorial, dragon_tutorial, cobra)
  - Python case studies PC1-3 (Controlling_The_Variance_Of_PANDA,
    Building_a_regulation_prior_network, continuous_motif_priors_KRCC)
  - Python published PP2-4 (dragon_mirna, drug_repurposing_colon_cancer,
    ccle_analysis)

All new R envs use R 4.3.1 + netZooR 1.6.3 (matching the rv9 base);
all new Python envs use Python 3.10 + netZooPy 0.10.6 (matching pp5).
Reproducible recipe files live under netbooks/envs/<name>.yml.

Other changes:
- Move sex_differences_LUAD.ipynb to its own RC6 env (was sharing RP9
  with sex_differences_LUAD_EN.ipynb, which is in a different section).
- Fix MONSTER.ipynb language metadata (was python, code is R).
- Bump Welcome catalog header to v 2.5 and refresh date.
- Add (R 4.3.1; netZooR 1.6.3) annotations to R notebook entries that
  were missing them; update Python header to netZooPy 0.10.6 on Py 3.10.
- yarn.ipynb: load data(skin) before the first reference to keep the
  notebook runnable top-to-bottom.
- TIGER.ipynb: call tiger() (exported name) instead of TIGER().
- drug_repurposing_colon_cancer.ipynb: replace R-style runserver guard
  with valid Python.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…5 envs

- yarn.ipynb: data(skin) couldn't find the dataset because netZooR re-exports
  yarn but does not re-export its datasets. Add an explicit `library(yarn)`
  before `data(skin)` so the first reference to `skin` resolves.

- drug_repurposing_colon_cancer.ipynb: replace deprecated `pd.np.r_[...]`
  with `np.r_[...]`. `pandas.np` was an alias deprecated in pandas 1.0 and
  removed in pandas 2.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
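
For reference, a small sketch of what the `np.r_` replacement does when NumPy is imported directly. The slice bounds are made up for illustration and are not the ones used in the notebook:

```python
import numpy as np

# np.r_ concatenates slices (and scalars) into a single index array,
# which can then be passed to something like DataFrame.iloc[:, cols].
cols = np.r_[0:2, 5:8]
print(cols)
```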
@marouenbg

Contributor Author

Execution status update

Ran jupyter nbconvert --execute --inplace against each of the 25 new-env notebooks on the netbooks EC2 host.

17 / 25 pass end-to-end:

  • R envs (10/13): rv1 (ALPACA), rv3 (SAMBAR), rv4 (pandaR), rv5 (EGRET_toy_example), rv7 (yarn), rc2 (TutorialOTTER), rc5 (gene_expression_for_coexpression_nets), rc6 (sex_differences_LUAD), rp7 (maize_genome), rp8 (egret_banovich_netbook)
  • Python envs (7/12): pv1 (condor_tutorial), pv3 (Up_and_running_with_PANDA), pv4 (sambar_tutorial), pv5 (dragon_tutorial), pc1 (Controlling_The_Variance_Of_PANDA), pp2 (dragon_mirna), pp3 (drug_repurposing_colon_cancer)

8 / 25 outstanding — all upstream or external-tool issues, not env issues:

| Kernel | Notebook | Reason |
| --- | --- | --- |
| rv2 | ApplicationwithTBdataset | pandaPy() return shape changed in netZooR 1.6.3; R-side as.numeric(panda_net$motif) fails |
| rv8 | TIGER | netZooR tiger() internal: "variable W_negs missing in draws object"; looks like a Stan/posterior version mismatch |
| pv2 | Building_single-sample_LIONESS | netZooPy panda return shape vs pandas 2.x ("Columns must be same length as key") |
| pv6 | cobra | numpy "array must not contain infs or NaNs" from asarray_chkfinite inside cobra |
| pc2 | Building_a_regulation_prior_network | external MEME tool matrix2meme not installed (/home/ubuntu/meme/...) |
| pc3 | continuous_motif_priors_KRCC | OOM (load of multiple large matrices); needs lower parallelism or smaller dataset |
| pp4 | ccle_analysis | undefined createVisNet function; notebook references a helper that isn't defined or imported |
| rc3 | Finding_drugs_for_LUAD | OOM during data load |

None of these block the env refactor in this PR. Recommend filing follow-up issues for the upstream netZooR/netZooPy API drift and the missing MEME install on the host.

marouenbg and others added 6 commits May 12, 2026 18:51
…d egret_banovich entries

My v2.5 annotation script stripped the `        - [link text` prefix from
the two long published-studies entries because the regex replaced the
whole line with just the link-and-publication chunk. Put the bullet and
title back so they render as proper list items in Jupyter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
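
The prefix loss described above is the usual result of substituting a whole line when only its tail should change. A hedged sketch of one way to avoid it, capturing the bullet prefix and re-emitting it; the entry format, the pattern and the `annotate` helper are hypothetical and not taken from the actual annotation script:

```python
import re

ANNOTATION = "(R 4.3.1; netZooR 1.6.3)"

# Capture the indentation and bullet separately from the link text so that
# the replacement can put the prefix back unchanged.
ENTRY = re.compile(r"^(?P<prefix>\s*-\s+)(?P<body>\[.*)$")

def annotate(line: str) -> str:
    """Append the annotation to a catalog entry, keeping its bullet prefix."""
    match = ENTRY.match(line)
    if match is None or ANNOTATION in line:
        return line  # not an entry, or already annotated
    return f"{match.group('prefix')}{match.group('body').rstrip()} {ANNOTATION}"

line = "        - [Hypothetical entry](https://example.org/notebook.ipynb)"
print(annotate(line))
```

Returning the line unchanged when the annotation is already present keeps the script safe to re-run.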
- ccle_analysis.ipynb (pp4): insert missing createVisNet helper (copied
  from dragon_mirna.ipynb, which is the sibling notebook that defines it).
- continuous_motif_priors_KRCC.ipynb (pc3): migrate from the deprecated
  module-function condor API (`condor.initial_community(co)`,
  `condor.brim(co)`, `co['modularity']`) to the new method API
  (`co.initial_community()`, `co.brim(...)`, `co.modularity`). Also pass
  the dataframe by keyword so it isn't misread as `network_file`.
- cobra.ipynb (pv6): drop NaN rows from the gene_expression input before
  passing to cobra(). One row in the published THCA matrix carries NaN
  values, which propagated through the standardization step and made
  scipy's eigh raise "array must not contain infs or NaNs".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
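
The cobra fix amounts to filtering out rows that carry NaN before any standardization or eigendecomposition. A minimal NumPy sketch on a made-up genes x samples array; the notebook works on its own expression table, so the variable names here are assumptions:

```python
import numpy as np

# Hypothetical genes x samples matrix in which one row carries a NaN.
expr = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, 10.0],
])

keep = ~np.isnan(expr).any(axis=1)  # True for rows without any NaN
clean = expr[keep]

print(clean.shape)
```

With a pandas DataFrame the equivalent would typically be a row-wise `dropna()`.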
- ccle_analysis.ipynb (pp4): change createVisNet signature to drop the
  redundant methylMat positional arg and derive the slice size from
  methyl.shape[1] directly. The call sites in this notebook pass 5 args
  while the dragon_mirna version expects 6 — making the helper derive
  the size keeps both notebooks running without further surgery.
- Building_a_regulation_prior_network.ipynb (pc2): switch precomputed
  flag to 1 so the notebook uses the precomputed PWM/FIMO outputs instead
  of trying to invoke matrix2meme from a hardcoded /home/ubuntu/meme/
  path that doesn't exist on the current host.
- TIGER.ipynb (rv8): increase TIGER_expr subsample from 1:10 to 1:200
  so priorPp leaves at least one negative-edge entry. With only 10
  expression rows, the signed model's W_negs draws set is empty and
  netZooR's tiger() fails when it tries to fit$summary("W_negs", ...).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`Axes.set_yscale("log", nonposy="clip")` uses a keyword that matplotlib 3.3
renamed to `nonpositive`. The notebook still passes the old name, which
raises TypeError on the LogScale constructor in matplotlib releases that
no longer accept it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
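
A short sketch of the updated call, assuming matplotlib 3.3 or newer; the plotted data is invented and only serves to put a non-positive value on the log axis:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [0.0, 10.0, 100.0])    # the zero is non-positive on a log axis
ax.set_yscale("log", nonpositive="clip")  # replaces the old nonposy="clip"
print(ax.get_yscale())
```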
- continuous_motif_priors_KRCC.ipynb (pc3): explicitly `del condor_object,
  net; gc.collect()` at the end of each iteration in the three condor
  modularity loops. The notebook holds eight TF*gene matrices in memory
  simultaneously, and each condor BRIM run allocates additional copies of
  the bipartite network. Without the gc hint the kernel runs out of RAM
  on a 30 GB host.
- Finding_drugs_for_LUAD.ipynb (rc3): gate the 179 MB ppi_complete.txt
  download and read.delim load behind `precomputed==0`. The notebook
  defaults to `precomputed=1`, a path that doesn't use `ppi` at all (the
  PANDA call is also gated), so loading the file only to discard it
  wasted RAM and was triggering OOM kills.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
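
The per-iteration cleanup in the pc3 loops follows a common pattern: drop the references to the large per-iteration objects, then ask the collector to run before the next iteration allocates. A standard-library sketch in which `fake_modularity` stands in for a condor run and plain lists stand in for the matrices; both are hypothetical:

```python
import gc

def fake_modularity(matrix):
    """Stand-in for one community-detection run; returns a scalar summary."""
    working_copy = [row[:] for row in matrix]  # mimics the extra copies a run allocates
    return sum(sum(row) for row in working_copy)

def run_all(matrices):
    scores = {}
    for name, matrix in matrices.items():
        net = [row[:] for row in matrix]  # per-iteration copy of the network
        scores[name] = fake_modularity(net)
        del net       # drop the reference before the next iteration allocates
        gc.collect()  # ask the collector to reclaim unreachable cycles now
    return scores

scores = run_all({"a": [[1, 2], [3, 4]], "b": [[5, 6], [7, 8]]})
print(scores)
```

In CPython `del` alone frees objects that have no remaining references; `gc.collect()` additionally handles reference cycles, which is why it only acts as a hint here.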
cobra() computes a full 16k x 16k co-expression matrix and runs a subset
eigh on the published THCA dataset. Even with subset_by_index pulling only
the top n eigenvalues, eigh at this size takes 30+ min on the EC2 host and
nbconvert times out. Subsampling to 2000 genes keeps the tutorial point
intact and runs in well under a minute.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
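
A sketch of the subsampling idea on random data, assuming a genes x samples layout; the sizes are scaled down so the example stays quick, and the variable names are not taken from the notebook:

```python
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(size=(500, 12))  # hypothetical genes x samples matrix

n_keep = 200                       # the notebook keeps 2000; smaller here to stay quick
subset = expr[:n_keep]             # first n_keep genes (rows)
coexpr = np.corrcoef(subset)       # n_keep x n_keep gene-gene correlation matrix

print(coexpr.shape)
```

Going from 16k to 2000 genes shrinks the matrix side by a factor of 8, so a dense eigensolver with roughly cubic cost would do on the order of 8^3 = 512 times less work.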
@marouenbg

Contributor Author

Big update: 22 / 25 notebooks pass end-to-end

Iteration since the last comment — each remaining failure either had a code-level fix in the notebook or required a netZooPy/env-side patch that's now in place on the server.

Newly passing (5 more):

  • pv2 Building_single-sample_LIONESS — analyze_lioness.top_network_plot used df[['force']] = series and .drop([...], 1); patched in netZooPy/lioness/analyze_lioness.py and netZooPy/panda/analyze_panda.py on the host
  • rv2 ApplicationwithTBdataset — needed Python pandas/numpy/scipy/joblib for reticulate; installed inside the rv2 env (pandas pinned to <3 because netZooR's reticulate bridge breaks on the pandas 3.x dev release)
  • pp4 ccle_analysis — not counted here: it is still failing (OOM-blocked), see below
  • pv6 cobra — subsample gene_expression to first 2000 rows so eigh on the THCA p×p matrix completes in seconds instead of hanging past the 30 min nbconvert timeout
  • pc2 Building_a_regulation_prior_network — set precomputed=1 (uses the precomputed PWM/FIMO outputs) so the hardcoded /home/ubuntu/meme/.../matrix2meme path isn't invoked; also fix nonposy="clip" → nonpositive="clip" for the matplotlib log-scale call
  • rv8 TIGER — the expression subsample was only 1:10 rows; after netZooR's priorPp filter that left zero negative-edge entries, so tiger() crashed trying to summarize the absent W_negs posterior. Bumped expression subsample to 1:200 (kept prior subsample as 1:10 since prior only has 14 rows).
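
The two pandas idioms named in the pv2 item have portable replacements. A sketch on a toy frame, not the actual netZooPy patch; the column names are invented, and the positional `axis` argument of `DataFrame.drop` is the form that pandas 2.x no longer accepts:

```python
import pandas as pd

df = pd.DataFrame({"tf": ["A", "B"], "gene": ["g1", "g2"], "tmp": [0, 0]})
force = pd.Series([0.7, 0.2])

df["force"] = force            # assign one column with a scalar key
df = df.drop(columns=["tmp"])  # keyword form instead of .drop([...], 1)

print(list(df.columns))
```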

Still outstanding (3):

  • pp4 ccle_analysis — kernel OOM-killed at ~17 GB RSS; running it alongside other heavy notebooks is what pushes us over the 30 GB host limit. Re-running solo.
  • rc3 Finding_drugs_for_LUAD — same family of OOM; gated the 179 MB ppi_complete.txt load behind precomputed==0 so it isn't loaded for the default path. Re-running solo.
  • pc3 continuous_motif_priors_KRCC — eight TF×gene matrices held simultaneously while iterating condor BRIM over them; added del condor_object, net; gc.collect() per iteration. Re-running solo.

@marouenbg

Contributor Author

Status check

Current totals: 22 / 25 pass end-to-end. Re-runs are serial because the remaining three are all memory-heavy case studies.

  • pv6 (cobra) — now passing with a 2000-gene subsample of the THCA expression matrix.
  • pc2 (Building_a_regulation_prior_network) — now passing after switching to precomputed=1 and the nonposy → nonpositive matplotlib fix.

Outstanding:

  • pp4 (ccle_analysis): running solo now. Stuck at ~9 GB RSS for 8+ min on one of the estimateDragonValues cells. Previous co-runs OOM-killed at ~17 GB RSS once memory was contended; should fit alone.
  • pc3 (continuous_motif_priors_KRCC): OOM-prone, blocked behind pp4. Has gc.collect() hint added.
  • rc3 (Finding_drugs_for_LUAD): doesn't fit on this host. Even alone the R kernel grows to ~22 GB RSS during the recount/EnsDb library + data loads and gets OOM-killed (the EC2 has 30 GB total, of which ~8 GB is in use by jupyterhub and other system processes). The notebook itself looks correct (precomputed path is now properly gated); it just needs a bigger instance to actually execute end-to-end.

PR has all fixes committed on claude/priceless-elgamal-673d0f.

marouenbg and others added 3 commits May 13, 2026 20:37
…thon entries

Mirror what was done for the R section, where each notebook entry has
(R 4.3.1; netZooR 1.6.3). 10 Python notebooks that import netZooPy get
"(Python 3.10; netZooPy 0.10.6)"; three that don't import netZooPy
(Controlling_The_Variance_Of_PANDA, Building_a_regulation_prior_network,
drug_repurposing_colon_cancer) get just "(Python 3.10)".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Section 5 had two slips:
- "since there exists much more TFs than genes" contradicted the very
  next clause ("m is usually far smaller than p") and reversed the
  biological reality (TFs are far fewer than genes).
- "regression ... is underdetermined ... infinity of solutions" was the
  wrong direction: with p >> m, B = AT is over-determined, and the
  full-rank-of-A condition that follows is precisely what gives a
  unique least-squares solution.

Reworded to keep ref [2] (Feng & Zhang) supporting the same claim
about random-matrix full-rank probability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same correction as MONSTER cell 35: the linear regression
B = AT is over-determined (more equations than unknowns) when there
are more genes than TFs, not under-determined; and over-determination
with a full-column-rank A is exactly what yields a unique
least-squares solution. The downstream conclusion about coexpression
networks failing the full-rank condition is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
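
The over-determination argument in the two commits above can be written out explicitly. A short derivation, assuming $A, B \in \mathbb{R}^{p \times m}$ with $p$ genes and $m$ TFs ($p \gg m$) and $T \in \mathbb{R}^{m \times m}$; these shapes are an assumption and are not stated in the commit messages:

```latex
% Each column of T has m unknowns constrained by p >> m equations.
B = A T, \qquad A \in \mathbb{R}^{p \times m},\quad p \gg m

% Least-squares estimate and its normal equations.
\hat{T} = \arg\min_{T} \lVert B - A T \rVert_F^2
\quad\Longrightarrow\quad
A^\top A \, \hat{T} = A^\top B

% Full column rank makes A^T A invertible, so the solution is unique.
\operatorname{rank}(A) = m
\;\Longrightarrow\;
\hat{T} = (A^\top A)^{-1} A^\top B
```

If $A$ is rank-deficient, $A^\top A$ is singular and the minimizer is no longer unique, which is the failure case the notebooks attribute to coexpression networks.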
@marouenbg

Contributor Author

Hey @taraeicher, can you please merge this PR? 👯

marouenbg and others added 5 commits May 18, 2026 23:38
Until now the live config and helper scripts existed only on the EC2
host. Mirror them into the repo so they're versioned, reviewable, and
recoverable if the host is lost.

Includes:
- jupyterhub.service: systemd unit (replaces the manual `nohup
  jupyterhub &` workflow; auto-starts on boot, restart=on-failure).
- jupyterhub_config.py: production config (no secrets in-file; OAuth
  credentials come from EnvironmentFile=/home/ubuntu/.netbooks_env).
- S3CachedLocalGitHubOAuthenticator (defined inside the config): 5-min
  TTL cache over s3://netzoo/netbooks/netbooks_allowed_users.csv so
  adding a user no longer requires a hub restart.
- add_netbooks_user.sh: add/remove/list helper that round-trips the
  S3 CSV in one command (idempotent, preserves trailing newline).
- backup_jupyterhub_db.sh: nightly tarball of sqlite + cookie_secret +
  config to s3://netzoo/netbooks/backups/ with 30-day retention plus a
  rolling latest.tgz. Wired into /etc/cron.d/jupyterhub-backup.
- le-restart-jupyterhub.sh: certbot deploy-hook so TLS renewals are
  picked up without manual restart.
- start_jupyterhub.sh: legacy manual-restart helper, kept for debugging.
- .netbooks_env.example: template; the live file is root:root 0400
  and gitignored.
- server/README.md: ops handbook (paths, commands, restore procedure).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
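
The 5-minute TTL cache over the allowed-users CSV can be pictured with a small standalone class. This is a sketch of the caching idea only, not the `S3CachedLocalGitHubOAuthenticator` code: the class name, the `loader` callable (which in production would fetch and parse the S3 CSV) and the injectable `clock` are all assumptions made for illustration:

```python
import time
from typing import Callable, Optional, Set

class TTLCachedAllowList:
    """Cache a loader's result for ttl_seconds, then reload on the next access."""

    def __init__(self, loader: Callable[[], Set[str]], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._users: Set[str] = set()
        self._loaded_at: Optional[float] = None

    def users(self) -> Set[str]:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._users = self._loader()  # refresh from the backing store
            self._loaded_at = now
        return self._users

    def is_allowed(self, username: str) -> bool:
        return username in self.users()

# Usage with a fake clock and a loader that gains a user on its second call.
calls = []
fake_now = [0.0]

def fake_loader() -> Set[str]:
    calls.append(1)
    return {"alice"} if len(calls) == 1 else {"alice", "bob"}

cache = TTLCachedAllowList(fake_loader, ttl_seconds=300.0, clock=lambda: fake_now[0])
first = cache.is_allowed("bob")   # loaded once, bob not yet listed
fake_now[0] = 301.0               # move past the TTL
second = cache.is_allowed("bob")  # reloaded, bob now listed
print(first, second, len(calls))
```

The trade-off is that a newly added user may wait up to one TTL before being recognised, in exchange for not restarting the hub.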
Two pieces:
- journald-netbooks.conf: cap systemd-journald disk usage at 2 GB
  (was sitting at 4 GB, the default percent-of-disk fallback). Goes to
  /etc/systemd/journald.conf.d/netbooks.conf. After applying, journal
  shrinks immediately on `systemctl restart systemd-journald`.
- logrotate-jupyterhub: rotate /home/ubuntu/jupyterhub.log weekly (kept
  for ad-hoc debugging via start_jupyterhub.sh; the systemd unit logs
  to journal, not here) and /var/log/jupyterhub-backup.log monthly.

Picked up by the existing system logrotate.timer; no extra cron entry
needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two paired changes that together let JupyterHub 5.4.6 run in
production:

- jupyterhub.service: ExecStart now points at /opt/conda/envs/jhub5/bin
  (a clone of the netbooks env upgraded to jupyterhub==5.4.6 +
  jupyterhub-systemdspawner==1.0.1; oauthenticator unchanged at 16.2.1).
- jupyterhub_config.py: comment out
  `c.LocalAuthenticator.create_system_users = True`. JupyterHub 5
  attempts to create a real Linux user for every entry in
  allowed_users at startup, which (a) fails for digit-prefix GitHub
  usernames (Linux NAME_REGEX_SYSTEM rejects e.g. "20songe") and (b)
  combines with delete_invalid_users=True to silently kick those
  users out. We already use SystemdSpawner.dynamic_users=True so
  ephemeral systemd users are created at spawn-time; real Linux
  accounts are not needed.

DB schema was migrated with `jupyterhub upgrade-db` (0eee8c825d24 ->
4621fec11365); a pre-upgrade sqlite snapshot is on S3 at
s3://netzoo/netbooks/backups/jhub-backup-20260519-*.tgz.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
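
The digit-prefix failure is easy to reproduce with a regular expression. The pattern below is illustrative only: it approximates a typical "start with a lowercase letter or underscore" rule and is an assumption, since the actual NAME_REGEX_SYSTEM on a host comes from its adduser configuration:

```python
import re

# Illustrative pattern: lowercase letter or underscore first, then
# lowercase letters, digits, underscores or hyphens.
ILLUSTRATIVE_NAME_RE = re.compile(r"^[a-z_][a-z0-9_-]*$")

def is_valid_system_name(username: str) -> bool:
    """True when the username fits the illustrative pattern above."""
    return ILLUSTRATIVE_NAME_RE.match(username) is not None

print(is_valid_system_name("20songe"), is_valid_system_name("marouenbg"))
```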
jupyterhub-systemdspawner 1.0.1 (paired with jupyterhub 5) bails on
spawn with `FileExistsError: '/run/jupyter-<user>-singleuser'` when a
runtime-dir symlink from a prior session is still around but the
singleuser unit is no longer active. We hit this immediately after the
jhub5 swap (the old jhub4 master had left these symlinks).

Add a small idempotent sweep script + wire it as ExecStartPre on the
systemd unit so a stale runtime dir can never block a fresh spawn
again. The script preserves running sessions (only removes when
`systemctl is-active jupyter-<user>-singleuser.service` is false).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
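
The sweep logic can be sketched in a few lines. This is not the script added in the commit: the function names are hypothetical, the activity check is injectable so the sketch can run without systemd, and it assumes the stale entries are symlinks named `jupyter-<user>-singleuser` as in the error message above:

```python
import os
import subprocess
import tempfile
from typing import Callable, List

def unit_is_active(unit: str) -> bool:
    """True when `systemctl is-active --quiet <unit>` exits with status 0."""
    return subprocess.run(["systemctl", "is-active", "--quiet", unit]).returncode == 0

def sweep_stale_runtime_dirs(run_dir: str,
                             is_active: Callable[[str], bool] = unit_is_active) -> List[str]:
    """Remove leftover jupyter-<user>-singleuser symlinks whose unit is not active."""
    removed = []
    for entry in sorted(os.listdir(run_dir)):
        if not (entry.startswith("jupyter-") and entry.endswith("-singleuser")):
            continue
        path = os.path.join(run_dir, entry)
        if not os.path.islink(path):
            continue  # this sketch only sweeps symlinks
        if is_active(f"{entry}.service"):
            continue  # leave running sessions alone
        os.unlink(path)
        removed.append(entry)
    return removed

# Demonstration in a temporary directory with a fake activity check.
with tempfile.TemporaryDirectory() as tmp:
    os.symlink("/nonexistent-a", os.path.join(tmp, "jupyter-alice-singleuser"))
    os.symlink("/nonexistent-b", os.path.join(tmp, "jupyter-bob-singleuser"))
    removed = sweep_stale_runtime_dirs(
        tmp, is_active=lambda unit: unit == "jupyter-bob-singleuser.service"
    )
    survivors = sorted(os.listdir(tmp))
print(removed, survivors)
```

Running it a second time removes nothing, which is the idempotence the commit message asks for.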
Mirror /etc/cron.d/jupyterhub-backup as it sits on prod. The schedule
line is commented out (`# paused 2026-05-19 — re-enable on second pass`)
matching the host state. Re-enable later by uncommenting the single
schedule line and running `sudo systemctl restart cron` (or just letting
cron pick up the change on its next poll).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>