This repository holds datasets, MASS run outputs, and analysis code used alongside MASS (Maximum Agreement Secondary Structures). It is the companion data repository; clone the MASS repository for the tool itself.
| Path | Purpose |
|---|---|
| `data/` | Input structure ensembles (mostly FASTA: sequence + dot-bracket per entry). |
| `results/` | CSV outputs from running MASS (e.g. per-family or per-cluster runs, various tau ranges and algorithms). |
| `analysis/` | Python modules and Jupyter notebooks used to process outputs and generate figures. |
- `Rfam/`: Rfam family-level FASTA files used as inputs.
- `CoDNaS-RNA/`: CoDNaS-RNA cluster FASTA files (combined structure sets per cluster).
- `Simulated/small/`: Simulated ensembles (`sim_small_row*.fasta`).
- `Simulated/large/`: Larger simulated ensembles (`sim_large_row*.fasta`).
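As a rough illustration of reading one of these input files, the sketch below parses FASTA-like text in which each entry carries a sequence and a dot-bracket structure. The exact on-disk layout is an assumption here (one header line, one sequence line, one structure line per entry), and `Entry`, `parse_entries`, and `is_balanced` are hypothetical names, not functions from `parse_structures.py`.

```python
from typing import Iterator, NamedTuple


class Entry(NamedTuple):
    name: str
    sequence: str
    structure: str


def parse_entries(text: str) -> Iterator[Entry]:
    """Yield (name, sequence, structure) entries from FASTA-like text.

    Assumption: each '>' header is followed by exactly two non-empty
    lines, the sequence and then its dot-bracket structure.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    i = 0
    while i < len(lines):
        if not lines[i].startswith(">"):
            raise ValueError(f"expected a header line, got: {lines[i]!r}")
        if i + 2 >= len(lines):
            raise ValueError(f"incomplete entry after header: {lines[i]!r}")
        name, seq, struct = lines[i][1:].strip(), lines[i + 1], lines[i + 2]
        if len(seq) != len(struct):
            raise ValueError(f"length mismatch in entry {name!r}")
        yield Entry(name, seq, struct)
        i += 3


def is_balanced(structure: str) -> bool:
    """Check that '(' and ')' are properly nested; other symbols are ignored."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Made-up example data, not taken from the repository.
example = ">s1\nGGGAAACCC\n(((...)))\n>s2\nGGGAAACCC\n.((...)).\n"
entries = list(parse_entries(example))
```

If the real files wrap sequences over several lines or use extended bracket alphabets for pseudoknots, the parser would need to be adapted accordingly.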
Outputs are grouped similarly to inputs (e.g. `Rfam/`, `CoDNaS-RNA/`). Filenames typically encode the source identifier, tau range, and algorithm (e.g. `mstp_beam1000`, `ilp`).
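When sorting through many result files, it can help to pull the algorithm label out of the filename. The sketch below recognises only the two labels mentioned above (`mstp_beam<N>` and `ilp`); the full naming scheme is not specified here, so the example filenames and the `algorithm_label` helper are hypothetical.

```python
import re
from typing import Optional

# Matches only the two algorithm labels quoted in this README.
# The lookarounds keep "ilp" from matching inside a longer word.
ALGO_PATTERN = re.compile(r"(?<![A-Za-z0-9])(mstp_beam\d+|ilp)(?![A-Za-z0-9])")


def algorithm_label(filename: str) -> Optional[str]:
    """Return the algorithm label found in a filename, or None."""
    match = ALGO_PATTERN.search(filename)
    return match.group(1) if match else None


# Hypothetical filenames, for illustration only.
label_beam = algorithm_label("RF00001_tau0-10_mstp_beam1000.csv")
label_ilp = algorithm_label("RF00001_tau0-10_ilp.csv")
label_none = algorithm_label("notes.txt")
```

Other algorithm labels, if any exist in `results/`, would need to be added to the pattern.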
- `paths.py`: Resolves `REPO_ROOT`, `DATA_ROOT` (`data/`), `OUTPUT_ROOT` (`results/`), and `FIGS_ROOT` (`Figs/`). Override the repo root with the `REPO_ROOT` environment variable if needed; an optional `EXTERNAL_DATA_ROOT` can point to external CSVs.
- `utils.py`: Shared helpers for loading and plotting.
- `parse_structures.py`: Structure parsing and related utilities.
- Notebooks:
  - `rfam.ipynb`, `codnas-rna.ipynb`, `simulated.ipynb`: end-to-end processing and figures for each dataset class.
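The root resolution described for `paths.py` could look roughly like the following. This is a minimal sketch, not the module's actual code: `resolve_roots` is a hypothetical function, the fallback to the current working directory is an assumption, and treating `EXTERNAL_DATA_ROOT` as an environment variable is also an assumption.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def resolve_roots(
    env: Optional[Mapping[str, str]] = None,
    default_root: Optional[Path] = None,
) -> Dict[str, Path]:
    """Build the directory roots from an environment mapping.

    Assumptions: REPO_ROOT falls back to the current working directory,
    and EXTERNAL_DATA_ROOT is read from the environment when present.
    """
    env = os.environ if env is None else env
    default_root = Path.cwd() if default_root is None else default_root
    repo_root = Path(env.get("REPO_ROOT", default_root)).expanduser()
    roots = {
        "REPO_ROOT": repo_root,
        "DATA_ROOT": repo_root / "data",
        "OUTPUT_ROOT": repo_root / "results",
        "FIGS_ROOT": repo_root / "Figs",
    }
    external = env.get("EXTERNAL_DATA_ROOT")
    if external:
        roots["EXTERNAL_DATA_ROOT"] = Path(external).expanduser()
    return roots


# Passing an explicit mapping avoids touching the real environment.
roots = resolve_roots({"REPO_ROOT": "/tmp/mass-data"})
```

Passing the environment as an argument keeps the function easy to test; in a notebook, calling `resolve_roots()` with no arguments would read `os.environ` directly.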