elkebir-group/MASS-data

MASS-data

This repository holds datasets, MASS run outputs, and analysis code used alongside MASS (Maximum Agreement Secondary Structures). It is the companion data repository; clone the MASS repository for the tool itself.

Repository layout

Path        Purpose
data/       Input structure ensembles (mostly FASTA; each entry holds a sequence and a dot-bracket structure).
results/    CSV outputs from running MASS (e.g. per-family or per-cluster runs, across various tau ranges and algorithms).
analysis/   Python modules and Jupyter notebooks used to process outputs and generate figures.
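
The FASTA layout described for data/ can be read with a few lines of Python. The sketch below is not the repository's parse_structures.py. It assumes each record is a ">" header line followed by exactly one sequence line and one dot-bracket line, which may not hold for every file. The names StructureEntry and parse_structure_fasta, and the two example records, are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class StructureEntry(NamedTuple):
    name: str       # header text after ">"
    sequence: str   # nucleotide sequence
    structure: str  # dot-bracket string, same length as the sequence


def _build(name: str, body: List[str]) -> StructureEntry:
    # Assumption: a record has exactly two body lines (sequence, then structure).
    if len(body) != 2:
        raise ValueError(f"{name}: expected 2 body lines, got {len(body)}")
    sequence, structure = body
    if len(sequence) != len(structure):
        raise ValueError(f"{name}: sequence and structure lengths differ")
    return StructureEntry(name, sequence, structure)


def parse_structure_fasta(lines: Iterable[str]) -> Iterator[StructureEntry]:
    """Yield one entry per header; blank lines and text before the first header are skipped."""
    name = None
    body: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield _build(name, body)
            name, body = line[1:].strip(), []
        elif name is not None:
            body.append(line)
    if name is not None:
        yield _build(name, body)


# Made-up records for illustration; they are not taken from the repository.
example = """>entry_1
GGGAAACCC
(((...)))
>entry_2
ACGU
....
"""
entries = list(parse_structure_fasta(example.splitlines()))
for entry in entries:
    print(entry.name, len(entry.sequence), entry.structure)
```

To read a real file, pass an open file handle instead of `example.splitlines()`. If a file wraps sequences over several lines, the two-line assumption needs relaxing.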

data/ subcollections

  • Rfam/: Rfam family-level FASTA files used as inputs.
  • CoDNaS-RNA/: CoDNaS-RNA cluster FASTA files (combined structure sets per cluster).
  • Simulated/small/: Smaller simulated ensembles (sim_small_row*.fasta).
  • Simulated/large/: Larger simulated ensembles (sim_large_row*.fasta).

results/ subcollections

Outputs are grouped similarly to inputs (e.g. Rfam/, CoDNaS-RNA/). Filenames typically encode the source identifier, tau range, and algorithm (e.g. mstp_beam1000, ilp).
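
As a rough illustration of working with this layout, the sketch below indexes CSV files under a results directory by subcollection and labels each with an algorithm tag when one of the two tags quoted above appears in the filename. The exact filename pattern is not documented here, so the code only does substring matching. The function name index_results and the demo filenames (famA_...) are hypothetical.

```python
import tempfile
from pathlib import Path

# Algorithm tags quoted in the README; the repository may use others as well.
KNOWN_ALGORITHM_TAGS = ("mstp_beam1000", "ilp")


def index_results(results_root):
    """Map each top-level subdirectory name to a list of (filename, tag) pairs.

    The tag is the first known algorithm tag found in the file stem, or None.
    Substring matching is a simplification and can mislabel unrelated names.
    """
    root = Path(results_root)
    index = {}
    for csv_path in sorted(root.rglob("*.csv")):
        rel = csv_path.relative_to(root)
        # Files directly under the root are grouped under the empty string.
        collection = rel.parts[0] if len(rel.parts) > 1 else ""
        tag = next((t for t in KNOWN_ALGORITHM_TAGS if t in csv_path.stem), None)
        index.setdefault(collection, []).append((csv_path.name, tag))
    return index


# Demo on a throwaway directory with made-up filenames.
with tempfile.TemporaryDirectory() as tmp:
    rfam_dir = Path(tmp) / "Rfam"
    rfam_dir.mkdir()
    (rfam_dir / "famA_mstp_beam1000.csv").write_text("")
    (rfam_dir / "famA_ilp.csv").write_text("")
    demo_index = index_results(tmp)

print(demo_index)
```

Pointing `index_results` at a local clone's results/ directory gives a quick inventory of which runs are present before any CSV is opened.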

analysis/ contents

  • paths.py: Resolves REPO_ROOT, DATA_ROOT (data/), OUTPUT_ROOT (results/), and FIGS_ROOT (Figs/). Set the REPO_ROOT environment variable to override the repository root if needed. EXTERNAL_DATA_ROOT is optional and points to external CSVs.
  • utils.py: Shared helpers for loading and plotting.
  • parse_structures.py: Structure parsing and related utilities.
  • Notebooks (rfam.ipynb, codnas-rna.ipynb, simulated.ipynb): End-to-end processing and figure generation for each dataset class.
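
The path-resolution behavior described for paths.py could look roughly like the sketch below. It is an illustration under stated assumptions, not the module's actual code: the function name resolve_roots is hypothetical, Figs/ is assumed to sit directly under the repository root, and the fallback root when REPO_ROOT is unset is assumed to be the current working directory.

```python
import os
from pathlib import Path


def resolve_roots(env=None, default_root=None):
    """Return a dict of root paths derived from REPO_ROOT.

    env: mapping to read variables from (defaults to os.environ).
    default_root: root used when REPO_ROOT is unset (assumed: current directory).
    """
    env = os.environ if env is None else env
    fallback = Path.cwd() if default_root is None else Path(default_root)
    repo_root = Path(env.get("REPO_ROOT", fallback))

    roots = {
        "REPO_ROOT": repo_root,
        "DATA_ROOT": repo_root / "data",
        "OUTPUT_ROOT": repo_root / "results",
        "FIGS_ROOT": repo_root / "Figs",  # assumed to live under the repo root
    }
    # EXTERNAL_DATA_ROOT is optional; None means no external CSVs are configured.
    external = env.get("EXTERNAL_DATA_ROOT")
    roots["EXTERNAL_DATA_ROOT"] = Path(external) if external else None
    return roots


# Example with an explicit mapping, so the real environment is not touched.
roots = resolve_roots(env={"REPO_ROOT": "/tmp/MASS-data"})
print(roots["DATA_ROOT"])
```

Passing the environment as an argument keeps the function easy to test. In a notebook session, exporting REPO_ROOT before launching Jupyter would have the same effect as the explicit mapping above.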
