Merged
12 changes: 11 additions & 1 deletion .github/workflows/full-workflow.yml
@@ -21,6 +21,13 @@ jobs:
env:
SNAKEMAKE_CONDA_FRONTEND: mamba
GITHUB_PAT: ${{ github.token }}
+TEST_WORKFLOW_PROFILE_ARGS: >-
+--profile none
+--workflow-profile none
+--executor local
+--cores 2
+TEST_WORKFLOW_JOBS: "2"
+TEST_WORKFLOW_FAKE_CELLBENDER_SOURCE: ${{ github.workspace }}/tests/reference_outputs/testdata/results/cellbender
TEST_WORKFLOW_SNAKEMAKE_ARGS: >-
--resources mem_mb=8192
--set-resources
@@ -62,6 +69,9 @@ jobs:
- name: Check out repository
uses: actions/checkout@v4

+- name: Add test command shims to PATH
+run: echo "${GITHUB_WORKSPACE}/tests/bin" >> "${GITHUB_PATH}"
+
- name: Create test environment
uses: mamba-org/setup-micromamba@v2
with:
@@ -73,7 +83,7 @@ jobs:
uses: actions/cache@v4
with:
path: .snakemake/conda
-key: snakemake-conda-full-${{ runner.os }}-${{ hashFiles('workflow/envs/*.yml') }}
+key: snakemake-conda-full-${{ runner.os }}-${{ hashFiles('workflow/envs/*.yml', 'workflow/envs/*.pin.txt') }}
restore-keys: |
snakemake-conda-full-${{ runner.os }}-
snakemake-conda-${{ runner.os }}-
3 changes: 1 addition & 2 deletions .github/workflows/tests.yml
@@ -44,7 +44,7 @@ jobs:
uses: actions/cache@v4
with:
path: .snakemake/conda
-key: snakemake-conda-${{ runner.os }}-${{ hashFiles('workflow/envs/*.yml') }}
+key: snakemake-conda-${{ runner.os }}-${{ hashFiles('workflow/envs/*.yml', 'workflow/envs/*.pin.txt') }}
restore-keys: |
snakemake-conda-${{ runner.os }}-

@@ -58,7 +58,6 @@ jobs:
fail-fast: false
matrix:
env_name:
-- cellbender.yml
- doubletfinder.yml
- emptydrops.yml
- posthocfilter.yml
10 changes: 6 additions & 4 deletions tests/README.md
@@ -24,7 +24,7 @@ GitHub Actions runs the default test suite via `.github/workflows/tests.yml`, us

The default test suite also includes a focused local rule-execution smoke test. It runs the real `tenx2seuratrds`, `find_markers`, and `combine_markers` rule chain against `testdata/`, using Snakemake's `--use-conda` support and writing outputs under pytest's temporary directory. This catches broken R package imports, script argument drift, invalid Seurat object creation, and marker CSV schema changes without submitting to SLURM.

-The R output validator also checks that the Seurat object has at least 100 features and 100 cells; metadata rows match the cell count; barcode row names are present, unique, and nonempty; `orig.ident`, `nCount_RNA`, `nFeature_RNA`, `percent.mt`, and `seurat_clusters` metadata columns exist; RNA count and feature-count metadata values are finite and positive; mitochondrial percentages are finite and within `[0, 100]`; at least two clusters are present; PCA and UMAP reductions exist; the marker table is nonempty and has the expected columns; marker gene symbols are present and nonempty; marker numeric columns are finite; marker p-value and percent columns are within `[0, 1]`; marker clusters are present in the Seurat metadata; markers are reported for at least two clusters; and the marker `workflow` column matches the expected test workflow label. The test runner environment is defined in the repository-level `environment.yml`; the rule-specific R environment is still created by Snakemake from `workflow/envs/tenx2seuratrds.yml`. A separate lightweight checkpoint-expansion test uses a fake `Rscript` to materialize the `marker_manifest` checkpoint, verify that dynamic `find_markers` jobs are generated for each cluster id, and confirm that `combine_markers` receives the expected marker chunks.
+The R output validator also checks that the Seurat object has at least 100 features and 100 cells; metadata rows match the cell count; barcode row names are present, unique, and nonempty; `orig.ident`, `nCount_RNA`, `nFeature_RNA`, `percent.mt`, and `seurat_clusters` metadata columns exist; RNA count and feature-count metadata values are finite and positive; mitochondrial percentages are finite and within `[0, 100]`; at least two clusters are present; PCA and UMAP reductions exist; the marker table is nonempty and has the expected columns; marker gene symbols are present and nonempty; marker numeric columns are finite; marker p-value and percent columns are within `[0, 1]`; marker clusters are present in the Seurat metadata; markers are reported for at least two clusters; and the marker `workflow` column matches the expected test workflow label. The test runner environment is defined in the repository-level `environment.yml`; the rule-specific R environment is still created by Snakemake from `workflow/envs/tenx2seuratrds.yml`, using `workflow/envs/tenx2seuratrds.linux-64.pin.txt` on Linux to recreate the locked package set. A separate lightweight checkpoint-expansion test uses a fake `Rscript` to materialize the `marker_manifest` checkpoint, verify that dynamic `find_markers` jobs are generated for each cluster id, and confirm that `combine_markers` receives the expected marker chunks.


## Optional conda and container validation
@@ -37,7 +37,7 @@ pytest tests --run-container-validation
pytest tests --run-doubletfinder-install
```

-`--run-conda-validation` creates each `workflow/envs/*.yml` environment in a pytest temporary directory and checks key R/Python package imports. To validate only one environment, pass `--conda-env-name ENV_FILE`, for example `--conda-env-name soupx.yml`; GitHub Actions uses this selector to run conda validation as one matrix job per env. `--run-container-validation` pulls the CellBender container with Docker, Apptainer, or Singularity. `--run-doubletfinder-install` runs the networked Snakemake `install_doubletfinder` rule and confirms that `DoubletFinder` can be imported from the created rule environment.
+`--run-conda-validation` creates each `workflow/envs/*.yml` environment in a pytest temporary directory and checks key R/Python package imports; on Linux it creates environments from the matching `*.linux-64.pin.txt` files so validation exercises the locked package specs used by Snakemake. To validate only one environment, pass `--conda-env-name ENV_FILE`, for example `--conda-env-name soupx.yml`; GitHub Actions uses this selector to run conda validation as one matrix job per env. `--run-container-validation` pulls the pinned CellBender container digest with Docker, Apptainer, or Singularity. `--run-doubletfinder-install` runs the networked Snakemake `install_doubletfinder` rule and confirms that `DoubletFinder` can be imported from the created rule environment.


## Optional full workflow run
@@ -50,12 +50,14 @@ pytest tests --run-workflow

The full-run test calls `tests/run_test_workflow.sh`, which uses `testdata/samplesheet_test.tsv` and overrides the workflow output directory with `resultsDir=testdata/results`. The manifest in `tests/test_sample_rule_output_files.txt` is therefore written with paths under `testdata/results/`.

-For testing, omit `--snakemake-conda-prefix` so Snakemake uses its default `.snakemake/conda` location under the repository root. The runner assumes that the current environment already provides `snakemake` on `PATH`.
+For testing, omit `--snakemake-conda-prefix` so Snakemake uses its default `.snakemake/conda` location under the repository root. Snakemake includes the `*.linux-64.pin.txt` contents in environment hashes, so changing a lock file creates a new rule environment. The runner assumes that the current environment already provides `snakemake` on `PATH`.
+
+The GitHub-hosted full workflow action uses `tests/bin/cellbender` as a test shim for the CellBender rule because hosted runners do not provide the GPU-enabled CellBender container runtime used on the cluster. The shim copies the reference CellBender H5 outputs into the expected rule outputs, while the rest of the workflow runs normally and is still compared against the reference snapshot. Cluster/HPC runs of `pytest tests --run-workflow` do not use this shim unless `tests/bin` is explicitly added to `PATH`.


## Reference outputs

-The full workflow test compares regenerated files in `testdata/results/` against reference files under `tests/reference_outputs/`. The compared file list is in `tests/test_reference_output_files.txt`. Seurat `.rds` files are compared at the metadata-table level, marker CSVs are compared by columns and `(cluster, genesymbol)` rows with numeric tolerance, emptyDrops matrix files are compared after gzip decompression, CellBender H5 outputs are compared with `h5diff`, and remaining durable outputs are compared byte-for-byte.
+The full workflow test compares regenerated files in `testdata/results/` against reference files under `tests/reference_outputs/`. The compared file list is in `tests/test_reference_output_files.txt`. Seurat `.rds` files are compared at the metadata-table level, marker CSVs are compared by columns and `(cluster, genesymbol)` rows with numeric tolerance, emptyDrops matrix files are compared after gzip decompression, CellBender H5 outputs are compared with `h5diff`, and remaining durable outputs are compared byte-for-byte. Seurat metadata numeric columns use a strict default tolerance, except `neighborhood_purity`, which allows an absolute tolerance of `0.025` because it is a derived nearest-neighbor purity metric that can vary slightly across R/Bioconductor/platform builds even when barcode order and `seurat_clusters` match. Override that tolerance with `SEURAT_METADATA_NEIGHBORHOOD_PURITY_TOLERANCE` if needed.

To refresh the reference snapshot after intentionally changing workflow behavior, first run the full test workflow so `testdata/results/` contains the desired outputs, then run:

56 changes: 56 additions & 0 deletions tests/bin/cellbender
@@ -0,0 +1,56 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+if [[ "${1:-}" != "remove-background" ]]; then
+  echo "test CellBender shim only supports: cellbender remove-background" >&2
+  exit 2
+fi
+shift
+
+if [[ "${1:-}" == "--help" ]]; then
+  echo "Usage: cellbender remove-background --input INPUT --output OUTPUT [--seed SEED] [--cuda]"
+  exit 0
+fi
+
+output=""
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --output)
+      output="${2:-}"
+      shift 2
+      ;;
+    --input|--seed)
+      shift 2
+      ;;
+    --cuda)
+      shift
+      ;;
+    *)
+      shift
+      ;;
+  esac
+done
+
+if [[ -z "${output}" ]]; then
+  echo "test CellBender shim requires --output" >&2
+  exit 2
+fi
+
+source_dir="${TEST_WORKFLOW_FAKE_CELLBENDER_SOURCE:-}"
+if [[ -z "${source_dir}" ]]; then
+  echo "TEST_WORKFLOW_FAKE_CELLBENDER_SOURCE is required for the test CellBender shim" >&2
+  exit 127
+fi
+
+base_source="${source_dir}/cellbender_test.h5"
+filtered_source="${source_dir}/cellbender_test_filtered.h5"
+filtered_output="${output%.h5}_filtered.h5"
+
+if [[ ! -s "${base_source}" || ! -s "${filtered_source}" ]]; then
+  echo "test CellBender shim could not find reference H5 outputs under ${source_dir}" >&2
+  exit 1
+fi
+
+mkdir -p "$(dirname "${output}")"
+cp "${base_source}" "${output}"
+cp "${filtered_source}" "${filtered_output}"
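For readers less familiar with Bash parameter expansion, the output-path handling in the shim above can be approximated in Python. This is only an illustrative sketch: the function names `filtered_output_path` and `fake_remove_background` are hypothetical and do not exist in the repository; the two reference file names are the ones the shim uses.

```python
import shutil
from pathlib import Path


def filtered_output_path(output: str) -> str:
    """Mirror `${output%.h5}_filtered.h5`: drop one trailing `.h5`, if present."""
    return output.removesuffix(".h5") + "_filtered.h5"


def fake_remove_background(source_dir: Path, output: Path) -> None:
    """Copy the two reference H5 files to the paths the rule expects (hypothetical helper)."""
    base = source_dir / "cellbender_test.h5"
    filtered = source_dir / "cellbender_test_filtered.h5"
    # Like the shim's `-s` test, treat a missing or empty reference file as an error.
    for path in (base, filtered):
        if not path.is_file() or path.stat().st_size == 0:
            raise FileNotFoundError(f"missing reference H5 output: {path}")
    output.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(base, output)
    shutil.copy(filtered, filtered_output_path(str(output)))
```

Note that an `--output` value without a `.h5` suffix still gets `_filtered.h5` appended, which matches the behavior of the `%` expansion in the shim.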
25 changes: 20 additions & 5 deletions tests/compare_seurat_metadata.R
@@ -6,10 +6,19 @@ if (length(args) == 0 || length(args) %% 2 != 0) {
suppressPackageStartupMessages(library("Seurat"))

numeric_tolerance <- as.numeric(Sys.getenv("SEURAT_METADATA_NUMERIC_TOLERANCE", "1e-8"))
-if (is.na(numeric_tolerance) || length(numeric_tolerance) != 1) {
-  stop("SEURAT_METADATA_NUMERIC_TOLERANCE must be numeric", call. = FALSE)
+if (is.na(numeric_tolerance) || length(numeric_tolerance) != 1 || numeric_tolerance < 0) {
+  stop("SEURAT_METADATA_NUMERIC_TOLERANCE must be a non-negative number", call. = FALSE)
}

+neighborhood_purity_tolerance <- as.numeric(Sys.getenv("SEURAT_METADATA_NEIGHBORHOOD_PURITY_TOLERANCE", "0.025"))
+if (is.na(neighborhood_purity_tolerance) || length(neighborhood_purity_tolerance) != 1 || neighborhood_purity_tolerance < 0) {
+  stop("SEURAT_METADATA_NEIGHBORHOOD_PURITY_TOLERANCE must be a non-negative number", call. = FALSE)
+}
+
+metadata_column_tolerances <- c(
+  neighborhood_purity = neighborhood_purity_tolerance
+)
+
fail <- function(path, message) {
stop(sprintf("%s: %s", path, message), call. = FALSE)
}
@@ -30,14 +39,20 @@ compare_metadata_column <- function(current_path, column, current_values, refere
if (any(comparable)) {
diff <- abs(current_numeric[comparable] - reference_numeric[comparable])
scale <- pmax(abs(current_numeric[comparable]), abs(reference_numeric[comparable]), 1)
-bad <- diff > numeric_tolerance * scale
+column_tolerance <- if (column %in% names(metadata_column_tolerances)) {
+metadata_column_tolerances[[column]]
+} else {
+numeric_tolerance
+}
+bad <- diff > column_tolerance * scale
if (any(bad)) {
fail(
current_path,
sprintf(
-"metadata column %s differs numerically; max abs diff %.12g",
+"metadata column %s differs numerically; max abs diff %.12g exceeds tolerance %.12g",
column,
-max(diff)
+max(diff),
+column_tolerance
)
)
}
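The per-column check in this hunk can be sketched in Python as follows. This is a simplified scalar illustration, not the repository's code: the names `exceeds_tolerance`, `tolerance_for`, `DEFAULT_TOLERANCE`, and `COLUMN_TOLERANCES` are hypothetical, and the defaults are the ones shown in the R script. Because the scale factor is never smaller than 1, a column whose values lie in `[0, 1]` (as a purity metric presumably does) is effectively compared with an absolute tolerance, while larger values are compared relatively.

```python
DEFAULT_TOLERANCE = 1e-8  # default of SEURAT_METADATA_NUMERIC_TOLERANCE
COLUMN_TOLERANCES = {"neighborhood_purity": 0.025}  # per-column override


def tolerance_for(column: str) -> float:
    """Use the column-specific tolerance when one exists, else the default."""
    return COLUMN_TOLERANCES.get(column, DEFAULT_TOLERANCE)


def exceeds_tolerance(current: float, reference: float, tolerance: float) -> bool:
    """Return True when the values differ by more than the scaled tolerance."""
    diff = abs(current - reference)
    # Same scale as the R code: max(|current|, |reference|, 1).
    scale = max(abs(current), abs(reference), 1.0)
    return diff > tolerance * scale
```

For example, purity values of 0.50 and 0.52 pass under the 0.025 override but would fail under the strict default used for every other numeric column.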
Binary file not shown. (25 binary files)
34 changes: 31 additions & 3 deletions tests/run_test_workflow.sh
@@ -13,6 +13,7 @@ set -euo pipefail
usage() {
echo "Usage: $0 [--conda-prefix PATH] [additional snakemake args...]"
echo "Set TEST_WORKFLOW_SNAKEMAKE_ARGS to append whitespace-delimited Snakemake args before command-line extras."
+echo "Set TEST_WORKFLOW_PROFILE_ARGS to replace the default Slurm/cannon profile args."
}

script_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
@@ -48,20 +49,47 @@ if ! command -v snakemake >/dev/null 2>&1; then
exit 127
fi

+workflow_seed=$(python - <<PYSEED
+from pathlib import Path
+import re
+
+config_text = Path("config/config.yaml").read_text()
+for line in config_text.splitlines():
+    match = re.match(r"^workflow_seed\s*:\s*([^#\s]+)", line)
+    if match:
+        print(int(match.group(1)))
+        break
+else:
+    raise SystemExit("workflow_seed is missing from config/config.yaml")
+PYSEED
+)
+export SCRNASEQ_PREPROCESS_SEED="${SCRNASEQ_PREPROCESS_SEED:-${workflow_seed}}"
+if [[ "${SCRNASEQ_PREPROCESS_SEED}" != "${workflow_seed}" ]]; then
+  echo "SCRNASEQ_PREPROCESS_SEED=${SCRNASEQ_PREPROCESS_SEED} does not match config workflow_seed=${workflow_seed}" >&2
+  exit 2
+fi
+
+profile_args=(--workflow-profile profiles/slurm --profile cannon)
+if [[ -n "${TEST_WORKFLOW_PROFILE_ARGS:-}" ]]; then
+  # Intended for simple Snakemake profile/executor args.
+  read -r -a profile_args <<< "${TEST_WORKFLOW_PROFILE_ARGS}"
+fi
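The override logic just above splits `TEST_WORKFLOW_PROFILE_ARGS` on whitespace without interpreting quotes, which is why its comment limits it to simple arguments. A rough Python equivalent, assuming a single-line value (as produced by the folded `>-` scalar in the workflow file), might look like this; `profile_args_from` and `DEFAULT_PROFILE_ARGS` are hypothetical names used only for illustration.

```python
DEFAULT_PROFILE_ARGS = ["--workflow-profile", "profiles/slurm", "--profile", "cannon"]


def profile_args_from(environ: dict) -> list:
    """Return the default profile args, or the override split on whitespace."""
    value = environ.get("TEST_WORKFLOW_PROFILE_ARGS", "")
    if not value:
        # Unset or empty: keep the Slurm/cannon defaults.
        return list(DEFAULT_PROFILE_ARGS)
    # Plain whitespace splitting: quotes are kept as literal characters,
    # so a quoted value containing spaces is broken into several tokens.
    return value.split()
```

A value such as `--config "a b"` therefore yields three tokens, so arguments that need embedded spaces should not be passed through this variable.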

common_args=(
--snakefile workflow/Snakefile
--configfile config/config.yaml
--config sampleTable=testdata/samplesheet_test.tsv resultsDir=testdata/results
--use-conda
---workflow-profile profiles/slurm
---profile cannon
+"${profile_args[@]}"
)

conda_prefix_args=()
if [[ -n "${snakemake_conda_prefix}" ]]; then
conda_prefix_args=(--conda-prefix "${snakemake_conda_prefix}")
fi

+test_workflow_jobs="${TEST_WORKFLOW_JOBS:-200}"
+
env_snakemake_args=()
if [[ -n "${TEST_WORKFLOW_SNAKEMAKE_ARGS:-}" ]]; then
# Intended for simple Snakemake CLI tokens, such as --set-resources entries.
@@ -76,7 +104,7 @@ snakemake \
"${conda_prefix_args[@]}" \
--rerun-incomplete \
--retries 2 \
---jobs 200 \
+--jobs "${test_workflow_jobs}" \
--latency-wait 600 \
"${env_snakemake_args[@]}" \
"${extra_snakemake_args[@]}"
40 changes: 32 additions & 8 deletions tests/test_conda_container_validation.py
@@ -1,4 +1,5 @@
import os
+import platform
import re
import shutil
import subprocess
@@ -12,7 +13,6 @@
RULE_DIR = Path("workflow/rules")
TEST_SAMPLE_SHEET = Path("testdata/samplesheet_test.tsv")
EXPECTED_ENV_FILES = {
-"cellbender.yml",
"doubletfinder.yml",
"emptydrops.yml",
"posthocfilter.yml",
@@ -22,19 +22,17 @@
}
CONDA_REFERENCE_RE = re.compile(r"conda:\s*\n\s*['\"]\.\./envs/([^'\"]+)['\"]")
CONTAINER_RE = re.compile(r"container:\s*\n\s*['\"]([^'\"]+)['\"]")
+CELLBENDER_CONTAINER_URI = "docker://us.gcr.io/broad-dsde-methods/cellbender@sha256:093f2caf1ce4acae4541ea45e52ab7b220ca131ec73b4d1f664b85fe12850bae"

R_IMPORTS_BY_ENV = {
-"cellbender.yml": ["Seurat", "tidyverse", "bluster"],
"doubletfinder.yml": ["Seurat", "tidyverse", "remotes", "fields", "Matrix", "KernSmooth", "ROCR", "igraph", "glmGamPoi", "bluster"],
"emptydrops.yml": ["Seurat", "tidyverse", "DropletUtils", "scater", "glmGamPoi", "bluster", "Matrix", "R.utils"],
"posthocfilter.yml": ["Seurat", "tidyverse", "glmGamPoi", "scater", "bluster"],
"scdblfinder.yml": ["Seurat", "tidyverse", "scDblFinder", "glmGamPoi", "bluster"],
"soupx.yml": ["Seurat", "tidyverse", "glmGamPoi", "bluster", "SoupX"],
"tenx2seuratrds.yml": ["Seurat", "tidyverse", "scCustomize", "hdf5r", "glmGamPoi", "bluster", "presto"],
}
-PYTHON_IMPORTS_BY_ENV = {
-"cellbender.yml": ["cellbender"],
-}
+PYTHON_IMPORTS_BY_ENV = {}


def repo_root():
@@ -73,6 +71,19 @@ def available_conda_frontend():
raise AssertionError("neither mamba nor conda is available on PATH")


+def linux_conda_subdir():
+    if platform.system() == "Linux" and platform.machine() in {"x86_64", "AMD64"}:
+        return "linux-64"
+    return None
+
+
+def pin_file_for_env(env_path):
+    subdir = linux_conda_subdir()
+    if subdir is None:
+        return None
+    return env_path.with_suffix(f".{subdir}.pin.txt")
+
+
def r_require_namespace_expr(packages):
package_vector = ", ".join(repr(package) for package in packages)
return (
@@ -100,6 +111,13 @@ def test_workflow_conda_env_files_are_well_formed_and_referenced_envs_exist():
assert env.get("channel_priority") == "strict", env_path
assert env.get("dependencies"), env_path

+pin_path = pin_file_for_env(env_path)
+if pin_path is not None:
+assert pin_path.exists(), f"missing Snakemake conda pin file: {pin_path}"
+pin_text = pin_path.read_text()
+assert "@EXPLICIT" in pin_text, f"pin file is not an explicit conda spec: {pin_path}"
+assert "conda-forge" in pin_text or "bioconda" in pin_text, pin_path
+
referenced = set()
for rule_path in sorted((root / RULE_DIR).glob("*.smk")):
referenced.update(CONDA_REFERENCE_RE.findall(rule_path.read_text()))
@@ -116,7 +134,7 @@ def test_workflow_container_declarations_are_explicit_and_recognized():
containers.append((rule_path, uri))

assert containers, "no workflow container declarations found"
-assert containers == [(root / "workflow/rules/cellbender.smk", "docker://us.gcr.io/broad-dsde-methods/cellbender:latest")]
+assert containers == [(root / "workflow/rules/cellbender.smk", CELLBENDER_CONTAINER_URI)]
for _, uri in containers:
assert uri.startswith("docker://")
assert ":" in uri.removeprefix("docker://"), f"container URI is missing a tag: {uri}"
@@ -135,8 +153,14 @@ def test_conda_env_solves_and_key_packages_import(tmp_path, pytestconfig, env_na
prefix = tmp_path / env_name.removesuffix(".yml")
conda = available_conda_frontend()

+pin_path = pin_file_for_env(env_path)
+if pin_path is not None and pin_path.exists():
+create_cmd = [conda, "create", "--yes", "--prefix", str(prefix), "--file", str(pin_path)]
+else:
+create_cmd = [conda, "env", "create", "--yes", "--prefix", str(prefix), "--file", str(env_path)]
+
create = run_command(
-[conda, "env", "create", "--yes", "--prefix", str(prefix), "--file", str(env_path)],
+create_cmd,
root,
timeout=1800,
)
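As a standalone illustration of the pin-file selection in this hunk, the sketch below shows how a pin path is derived from an environment file and how the create command changes when a pin file is present. The function names `pin_file_for` and `create_command` are hypothetical; the command-line tokens are the ones used in the test above.

```python
from pathlib import Path
from typing import List, Optional


def pin_file_for(env_path: Path, subdir: str = "linux-64") -> Path:
    """Swap the final `.yml` suffix for `.<subdir>.pin.txt`."""
    return env_path.with_suffix(f".{subdir}.pin.txt")


def create_command(conda: str, prefix: Path, env_path: Path, pin_path: Optional[Path]) -> List[str]:
    """Prefer the explicit pin file when it exists, else fall back to the YAML env."""
    if pin_path is not None and pin_path.exists():
        # Explicit spec files are installed with `create --file`.
        return [conda, "create", "--yes", "--prefix", str(prefix), "--file", str(pin_path)]
    # YAML environment definitions go through `env create --file`.
    return [conda, "env", "create", "--yes", "--prefix", str(prefix), "--file", str(env_path)]
```

With these definitions, `workflow/envs/soupx.yml` maps to `workflow/envs/soupx.linux-64.pin.txt`, and platforms without a pin file keep using the YAML-based command.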
@@ -172,7 +196,7 @@ def test_cellbender_container_can_be_pulled(tmp_path, pytestconfig):
pytest.skip("use --run-container-validation to pull workflow containers")

root = repo_root()
-uri = "docker://us.gcr.io/broad-dsde-methods/cellbender:latest"
+uri = CELLBENDER_CONTAINER_URI
docker_image = uri.removeprefix("docker://")

if shutil.which("docker"):
15 changes: 0 additions & 15 deletions workflow/envs/cellbender.yml

This file was deleted.
