TreesLab/eLncRNA_mediation

eLncRNA-mediated_regulatory_axes

Pipeline for identifying and evaluating eLncRNA-mediated regulatory axes using TCGA, GTEx, eQTL, clinical, and genetic evidence.

Overview

This repository is organized as a stepwise Snakemake pipeline. Each numbered directory contains one analysis stage and its own README.md.

| Step | Directory | Purpose |
| --- | --- | --- |
| 01 | 01_prepare_TCGA_data/ | Prepare TCGA expression, genotype, clinical, and annotation files. |
| 02 | 02_prepare_GTEx_data/ | Prepare GTEx tissue expression, genotype, covariate, and annotation files. |
| 03 | 03_eQTLs_prediction/ | Run cis- and trans-eQTL prediction. |
| 04 | 04_get_trios/ | Build candidate lncRNA-SNP-gene regulatory trios. |
| 05 | 05_clinical_evidences/ | Evaluate clinical and survival evidence. |
| 06 | 06_genetic_evidences/ | Evaluate MR, coloc, SMR, PredictDB, and SPrediXcan evidence. |
| 07 | 07_GWAS_enrichment_analyses/ | Run GWAS enrichment analyses using eQTL-derived annotations. |
| 08 | 08_add_druggability_and_STRING_IAS/ | Annotate mediation eGenes with druggability and STRING IAS evidence. |

Results

Selected processed results are available on Zenodo (DOI: 10.5281/zenodo.17605304).

Basic Usage

Run each step from its own directory. The exact Snakefile names and aggregate targets are documented in each numbered directory's README.md.

cd 01_prepare_TCGA_data
snakemake -s TCGA_SNPs_data.smk --use-apptainer --cores 8
snakemake -s TCGA_gene_expression_data.smk --use-apptainer --cores 8

For steps with a prepare_input.py script, run it before Snakemake:

python prepare_input.py
snakemake -s workflow.smk --use-apptainer --cores 8

The prepare_input.py scripts use hard links to avoid duplicating large files.
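
The hard-link approach can be sketched as follows. This is a minimal illustration, not the code in the repository's prepare_input.py scripts: the helper name link_or_copy and the fallback to a regular copy are assumptions introduced here.

```python
import os
import shutil
from pathlib import Path


def link_or_copy(src, dst):
    """Hard-link src to dst; fall back to a regular copy if linking fails.

    Hypothetical helper, not taken from the repository.
    Returns "link" or "copy" so the caller can see what happened.
    """
    src, dst = Path(src), Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)

    if dst.exists():
        if os.path.samefile(src, dst):
            return "link"  # dst already refers to the same file as src
        dst.unlink()  # replace a stale file so the call is repeatable

    try:
        os.link(src, dst)  # no data is copied; both names share one inode
        return "link"
    except OSError:
        # Typical cause: src and dst are on different filesystems.
        shutil.copy2(src, dst)
        return "copy"
```

A "copy" result means the file was duplicated on disk, which is worth knowing before staging large genotype or expression files.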

Input Preparation

Inputs that must be supplied manually are mainly reference files, TCGA/GTEx source data, and GWAS summary statistics. Check the README in each numbered directory for the required paths before running that step.

Some steps also require large reference resources to be prepared with step-specific download scripts before running Snakemake.

Configuration

Each step has its own config.yaml. Update sample lists, tissue lists, TCGA cancer types, and GWAS IDs there before running the workflow.

Notes

  • The workflows are designed to run with Apptainer/Singularity containers through Snakemake.
  • Hard links require source and destination files to be on the same filesystem.
  • Run the pipeline in numerical order unless you already have the required intermediate inputs.
  • Large protected datasets such as TCGA/GTEx genotype data may need to be downloaded manually.
  • Large public reference files should be downloaded or linked according to the step-specific README before running the corresponding workflow.
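
To see the step order and which steps need an input-preparation run before Snakemake, the numbered directories can be inspected with a short script. This is only a sketch: the function names list_steps and describe_step are hypothetical, and it assumes the two-digit directory prefixes and the file names (README.md, config.yaml, prepare_input.py) described in this README.

```python
import re
from pathlib import Path

# Matches directory names such as 01_prepare_TCGA_data
STEP_PATTERN = re.compile(r"^(\d{2})_")


def list_steps(repo_root="."):
    """Return numbered step directories under repo_root, sorted by step number."""
    steps = []
    for path in Path(repo_root).iterdir():
        match = STEP_PATTERN.match(path.name)
        if path.is_dir() and match:
            steps.append((int(match.group(1)), path))
    return [path for _, path in sorted(steps)]


def describe_step(step_dir):
    """Report which of the files named in this README a step directory contains."""
    step_dir = Path(step_dir)
    return {
        "step": step_dir.name,
        "has_readme": (step_dir / "README.md").is_file(),
        "has_config": (step_dir / "config.yaml").is_file(),
        "needs_prepare_input": (step_dir / "prepare_input.py").is_file(),
    }
```

Calling describe_step on each entry of list_steps(".") from the repository root gives one summary per step, in the order the steps should be run.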
