Post-processing a singlem renew run of SRA data

Processing of public metagenomic data that has been analysed with SingleM.

Most of the code here is not intended for public use, because many paths and other settings are specific to the CMR / QUT / Woodcroft group, but it may nonetheless be useful to others.

Post-processing a singlem renew run of SRA data

First, modify the paths at the top of the Snakefile.

Then set up the environment:

pixi install --all

and run

pixi run snakemake --cores 1

Make sure the correct taxonomic level is chosen for applying predictions in the Snakefile. See {base_output_directory}/logs/host_or_not_prediction.log for the results of the cross-validation.

Host-vs-not metagenome prediction

Example code for host-vs-not prediction is contained within the Snakefile. In

Publishing a release to Zenodo

push_to_zenodo.smk creates a new draft (unpublished) Zenodo version of the sandpiper record from the data in /work/microbiome/db/sandpiper/<data_version>. It bases the new version on the published parent record (PARENT_RECORD_ID near the top of the file), uploads the gtdb / per-acc-summary / parsed-metadata / kingfisher-metadata files (renamed to sandpiper<version>.*), removes the files inherited from the previous version, and sets the version metadata.

Run it with the API token in the environment:

ZENODO_TOKEN="$(cat ~/.zenodo_draft_release_api_token)" \
  pixi run snakemake -s push_to_zenodo.smk --config version=2.0.1 -j 1

version sets the Zenodo version and the uploaded filenames.
data_version (optional) is the source data dir under /work/microbiome/db/sandpiper/ to read from; defaults to version. Use it to build a release from a different data dir, e.g. version=2.0.1 data_version=2.0.0.
Logs and the resulting draft URL are written under zenodo_drafts/<version>/ (kept local because /work is often mounted read-only).
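
The relationship between these options can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in push_to_zenodo.smk: the function name resolve_release_paths and the keys of the returned dict are hypothetical, and only the directory layout and the defaulting rule come from the description above.

```python
from pathlib import Path

# Source data root and local draft directory, as described in this README.
SANDPIPER_DB_ROOT = Path("/work/microbiome/db/sandpiper")
DRAFTS_ROOT = Path("zenodo_drafts")


def resolve_release_paths(config: dict) -> dict:
    """Hypothetical helper: derive release settings from --config values."""
    if "version" not in config:
        raise ValueError("version is required, e.g. --config version=2.0.1")
    version = str(config["version"])
    # data_version is optional and falls back to version.
    data_version = str(config.get("data_version", version))
    return {
        "version": version,
        # Directory the release files are read from.
        "data_dir": SANDPIPER_DB_ROOT / data_version,
        # Local directory for logs and the draft URL.
        "draft_dir": DRAFTS_ROOT / version,
        # Uploaded files are renamed to start with this prefix.
        "upload_prefix": f"sandpiper{version}.",
    }


if __name__ == "__main__":
    # Build release 2.0.1 from the 2.0.0 data directory.
    print(resolve_release_paths({"version": "2.0.1", "data_version": "2.0.0"}))
```

In this sketch version alone controls the uploaded filenames and the draft directory, while data_version only changes where the files are read from.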

The draft is not published — review it in the Zenodo web UI and publish manually. Two things still need handling by hand before publishing:

The GlobDB file (sandpiper<x>.globdb.csv.gz) is uploaded separately.
Grants under the legacy DOE funder (0114b2m14::, the DE-SC... grants) are rejected by the current Zenodo API and are dropped automatically; re-add them in the web UI if they are needed.
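
The automatic dropping of legacy DOE grants amounts to a prefix filter on grant identifiers. A minimal sketch, assuming grants are held as a list of dicts with an "id" key of the form <funder>::<award>; that data shape, the function name split_grants, and the example identifiers are assumptions for illustration and are not taken from push_to_zenodo.smk:

```python
# Funder prefix of the legacy DOE grants mentioned above.
LEGACY_DOE_FUNDER_PREFIX = "0114b2m14::"


def split_grants(grants):
    """Hypothetical helper: separate grants to keep from legacy DOE grants."""
    kept, dropped = [], []
    for grant in grants:
        grant_id = str(grant.get("id", ""))
        if grant_id.startswith(LEGACY_DOE_FUNDER_PREFIX):
            dropped.append(grant)
        else:
            kept.append(grant)
    return kept, dropped


if __name__ == "__main__":
    # Placeholder identifiers, not real grant records.
    example = [
        {"id": "0114b2m14::DE-SC0000000"},
        {"id": "examplefunder::12345"},
    ]
    kept, dropped = split_grants(example)
    print("kept:", [g["id"] for g in kept])
    print("re-add in the web UI if needed:", [g["id"] for g in dropped])
```

Returning the dropped grants alongside the kept ones makes it easy to log which grants may need re-adding by hand.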
