Rishit Kar1, Ved Ambular1, Sargam Nagar1, Varun Shenai1
1 Department of Computer Engineering, DJ Sanghvi College of Engineering
OMNIGATE is a deep learning framework designed for robust multi-modal cancer subtype classification. Unlike traditional fusion methods that simply concatenate features, OMNIGATE utilizes a dynamic context gating mechanism that learns to weigh the importance of specific omics layers (mRNA, miRNA, CNV, Methylation) on a per-sample basis.
The dataset used in this study is extracted from the MLOmics dataset, a publicly available multi-omics benchmark designed for cancer subtype classification tasks. It integrates heterogeneous molecular data collected from large-scale cancer genomics projects.
The core model learns a latent representation for each omics modality and then applies a context-aware gate to each latent block before final classification. This lets the network dynamically emphasize the most informative modality for each sample instead of relying on static concatenation alone.
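The gating idea can be illustrated with a small NumPy sketch (the shapes, the single linear gate head, and the softmax over modalities are illustrative assumptions; the real encoders and gate network live in `src/models.py`):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy latent vectors: 4 modalities, 2 samples, latent dim 8
latents = {m: rng.normal(size=(2, 8)) for m in ["mRNA", "miRNA", "CNV", "Methy"]}

# Global context = concatenation of all modality latents (per sample)
context = np.concatenate([latents[m] for m in latents], axis=1)  # (2, 32)

# A linear "gate head" maps context to one score per modality;
# softmax turns the scores into per-sample modality weights
W_gate = rng.normal(size=(context.shape[1], len(latents)))
gates = softmax(context @ W_gate, axis=1)                        # (2, 4)

# Gated fusion: scale each latent block by its gate, then concatenate
fused = np.concatenate(
    [gates[:, [i]] * latents[m] for i, m in enumerate(latents)], axis=1
)
print(fused.shape)
```

Because the gates depend on the per-sample context vector, two samples can weight the same modality differently, which is exactly what static concatenation cannot do.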
The current `src/` pipeline supports:
- Multi-cancer training across `GS-BRCA`, `GS-LGG`, `GS-OV`, `GS-COAD`, and `GS-GBM`
- Gated multi-omics neural fusion with focal loss and regularization terms
- Dynamic fold selection based on the minimum class count
- Classifier-head ablation with `Base_MLP`, `SVM`, `XGBoost`, and `Deeper_MLP`
- Aggregated gate-importance plots and Top-20 feature sensitivity plots
- Fold-wise and global CSV export for downstream analysis
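The dynamic fold selection listed above can be sketched as follows (the cap of 5 folds and the floor of 2 are illustrative assumptions, not the shipped defaults):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def pick_n_splits(labels, max_folds=5):
    """Choose the fold count from the rarest class so every fold
    can contain at least one sample of each class."""
    min_class_count = int(np.bincount(labels).min())
    return max(2, min(max_folds, min_class_count))

labels = np.array([0] * 30 + [1] * 20 + [2] * 3)  # rarest class: 3 samples
n_splits = pick_n_splits(labels)                  # capped at 3 by the rare class
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
print(n_splits)
```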
```
OMNIGATE/
├── preprocessing/
│   └── processed_multicancer/
│       └── GS-*/             # Per-cancer processed arrays and feature-name files
├── results_aggregated/       # Generated outputs after training
├── src/
│   ├── config.py             # Global settings, paths, runtime configuration
│   ├── data.py               # Dataset and feature-name loading
│   ├── models.py             # Losses and gated fusion network
│   ├── training.py           # Fold training and classifier ablation
│   ├── reporting.py          # Plotting and CSV export
│   └── main.py               # End-to-end training entrypoint
├── docs/
│   └── assets/               # README figures
├── final_ablation_summary_all_cancers.csv
└── requirements.txt
```
Each cancer directory under `preprocessing/processed_multicancer/` is expected to contain:
- `mRNA_processed.npy`
- `miRNA_processed.npy`
- `CNV_processed.npy`
- `Methy_processed.npy`
- `labels.npy`
- `mRNA_features.json`
- `miRNA_features.json`
- `CNV_features.json`
- `Methy_features.json`
Example:
```
preprocessing/processed_multicancer/GS-BRCA/
├── mRNA_processed.npy
├── miRNA_processed.npy
├── CNV_processed.npy
├── Methy_processed.npy
├── labels.npy
├── mRNA_features.json
├── miRNA_features.json
├── CNV_features.json
└── Methy_features.json
```
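A minimal loader for this layout might look like the sketch below (the function name and the synthetic example directory are illustrative; the actual loading logic lives in `src/data.py`):

```python
import json
import tempfile
from pathlib import Path

import numpy as np

OMICS = ["mRNA", "miRNA", "CNV", "Methy"]

def load_cancer_dir(cancer_dir):
    """Load per-modality arrays, feature names, and labels for one cancer."""
    cancer_dir = Path(cancer_dir)
    data = {m: np.load(cancer_dir / f"{m}_processed.npy") for m in OMICS}
    features = {
        m: json.loads((cancer_dir / f"{m}_features.json").read_text())
        for m in OMICS
    }
    labels = np.load(cancer_dir / "labels.npy")
    return data, features, labels

# Demonstrate with a tiny synthetic directory matching the expected layout
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp) / "GS-BRCA"
    d.mkdir()
    for m in OMICS:
        np.save(d / f"{m}_processed.npy", np.zeros((4, 3)))
        (d / f"{m}_features.json").write_text(
            json.dumps([f"{m}_f{i}" for i in range(3)])
        )
    np.save(d / "labels.npy", np.array([0, 1, 0, 1]))
    data, features, labels = load_cancer_dir(d)
    print(data["mRNA"].shape, len(features["CNV"]), labels.shape)
```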
The easiest way to run this project reproducibly is with Docker. The container installs all dependencies, includes the training code, and runs the pipeline in a headless environment that is already configured for matplotlib.
```bash
docker build -t omnigate .
docker run --rm -it omnigate
```

This starts the full multi-cancer training pipeline and writes outputs inside the container at `/app/results_aggregated`.
If you want the generated plots and CSV files to appear directly in your local project folder, mount the results directory when running the container:
```bash
docker run --rm -it \
  -v "$(pwd)/results_aggregated:/app/results_aggregated" \
  omnigate
```

This is the most practical way to work with the project because the exported files remain available on your host system after the container exits.
Before running the container, make sure the processed dataset is already present in `preprocessing/processed_multicancer/`.
Each cancer directory should include:
- `mRNA_processed.npy`
- `miRNA_processed.npy`
- `CNV_processed.npy`
- `Methy_processed.npy`
- `labels.npy`
- `mRNA_features.json`
- `miRNA_features.json`
- `CNV_features.json`
- `Methy_features.json`
For most users, Docker is the recommended training path:
```bash
docker run --rm -it \
  -v "$(pwd)/results_aggregated:/app/results_aggregated" \
  omnigate
```

This command will:
- Load each cancer dataset from `preprocessing/processed_multicancer/`
- Train the gated fusion model with stratified cross-validation
- Evaluate alternative classifier heads on the learned fused representation
- Compute sensitivity-based Top-20 feature rankings for each omics modality
- Save all aggregated figures and CSV summaries to `results_aggregated/`
If you prefer not to use Docker, you can still run the code locally with a Python environment and `requirements.txt`. Docker remains the recommended path for the most reproducible setup.
After training, the pipeline writes outputs such as:
- `results_aggregated/final_ablation_summary_all_cancers.csv`
- `results_aggregated/<CANCER>/detailed_ablation_results.csv`
- `results_aggregated/<CANCER>/aggregated_gate_importance.png`
- `results_aggregated/<CANCER>/aggregated_top20_mRNA.csv`
- `results_aggregated/<CANCER>/aggregated_top20_mRNA.png`
Equivalent Top-20 feature files are also produced for miRNA, CNV, and Methy.
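For downstream analysis, the summary CSV can be inspected with pandas. A sketch is below; the column names (`cancer`, `head`, `accuracy`, `macro_f1`) and values are hypothetical stand-ins for the real schema:

```python
import io

import pandas as pd

# Hypothetical rows mirroring final_ablation_summary_all_cancers.csv;
# the real column names and metrics may differ.
csv_text = """cancer,head,accuracy,macro_f1
GS-BRCA,Base_MLP,0.88,0.85
GS-BRCA,SVM,0.86,0.83
GS-LGG,Base_MLP,0.91,0.90
"""
df = pd.read_csv(io.StringIO(csv_text))

# Best classifier head per cancer, ranked by macro F1
best = df.loc[df.groupby("cancer")["macro_f1"].idxmax(),
              ["cancer", "head", "macro_f1"]]
print(best)
```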
Main runtime settings are defined in src/config.py, including:
`MAX_EPOCHS`, `MIN_EPOCHS`, `PATIENCE`, `LR`, `WEIGHT_DECAY`, `ALIGN_W`, `ORTHO_W`, `GATE_ENT_W`, `SPARSITY_W`, `OMICS_DROPOUT_P`, and `LATENT_DIM`.
If you want to adapt the pipeline for new experiments, this is the first file to modify.
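As a rough sketch, the tunables in `src/config.py` follow this shape (all values below are placeholders for illustration, not the shipped defaults):

```python
# Illustrative excerpt of src/config.py; values are placeholders
MAX_EPOCHS = 200        # hard cap on training epochs per fold
MIN_EPOCHS = 20         # minimum epochs before early stopping may trigger
PATIENCE = 15           # early-stopping patience on validation loss
LR = 1e-3               # optimizer learning rate
WEIGHT_DECAY = 1e-4     # L2 regularization strength
ALIGN_W = 0.1           # weight of the cross-modality alignment term
ORTHO_W = 0.1           # weight of the latent orthogonality term
GATE_ENT_W = 0.01       # weight of the gate-entropy penalty
SPARSITY_W = 0.01       # weight of the gate-sparsity penalty
OMICS_DROPOUT_P = 0.2   # probability of dropping a whole modality in training
LATENT_DIM = 64         # per-modality latent size
```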
The model trains one encoder per modality, concatenates latent vectors to build global context, and then predicts modality-specific gates from that context. The gated latent vectors are fused and passed into a classifier head. Training combines focal loss with alignment, orthogonality, gate-entropy, and sparsity terms to improve robustness and reduce redundant modality usage.
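The focal-loss component of that objective can be sketched in NumPy (the γ value is illustrative; with γ = 0 the expression reduces to plain cross-entropy):

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, eps=1e-12):
    """Mean focal loss: -(1 - p_t)^gamma * log(p_t), where p_t is the
    predicted probability of each sample's true class."""
    p_t = probs[np.arange(len(targets)), targets]
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t + eps)))

probs = np.array([[0.9, 0.1],
                  [0.4, 0.6],
                  [0.2, 0.8]])
targets = np.array([0, 1, 1])

# Confident predictions are down-weighted relative to cross-entropy
print(focal_loss(probs, targets, gamma=2.0))
print(focal_loss(probs, targets, gamma=0.0))  # plain cross-entropy
```

The `(1 - p_t)^gamma` factor shrinks the contribution of already well-classified samples, so training focuses on the hard (often minority-class) cases.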
The ablation workflow reuses learned fused embeddings and compares:
- Neural baseline head
- SVM head
- XGBoost head
- Deeper MLP head
This design makes it easier to test whether performance gains come from the representation itself, the classifier head, or both.
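The head comparison can be sketched with scikit-learn on a stand-in fused embedding (synthetic data; the `XGBoost` head is omitted here but follows the same fit/score pattern):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for learned fused embeddings: two well-separated clusters
X = np.vstack([rng.normal(0, 1, (60, 16)), rng.normal(2, 1, (60, 16))])
y = np.array([0] * 60 + [1] * 60)

# Swappable classifier heads evaluated on the same representation
heads = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "Deeper_MLP": MLPClassifier(hidden_layer_sizes=(64, 32),
                                max_iter=500, random_state=0),
}
scores = {name: cross_val_score(clf, X, y, cv=3).mean()
          for name, clf in heads.items()}
for name, acc in sorted(scores.items()):
    print(f"{name}: {acc:.3f}")
```

If all heads score similarly on the frozen embedding, the gains are attributable to the representation; large gaps between heads point at the classifier instead.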
- Random seeds are fixed in `src/config.py`
- Fold generation uses `StratifiedKFold`
- CUDA is used automatically when available
- Output directories are created automatically on startup
This codebase is structured for research experimentation, internal benchmarking, and figure generation around multi-omics cancer subtype classification. For production or clinical deployment, additional dataset validation, calibration, uncertainty estimation, and external evaluation would be required.
