eLncRNA-mediated_regulatory_axes

Pipeline for identifying and evaluating eLncRNA-mediated regulatory axes using TCGA, GTEx, eQTL, clinical, and genetic evidence.

Overview

This repository is organized as a stepwise Snakemake pipeline. Each numbered directory contains one analysis stage and its own README.md.

| Step | Directory | Purpose |
| --- | --- | --- |
| 01 | `01_prepare_TCGA_data/` | Prepare TCGA expression, genotype, clinical, and annotation files. |
| 02 | `02_prepare_GTEx_data/` | Prepare GTEx tissue expression, genotype, covariate, and annotation files. |
| 03 | `03_eQTLs_prediction/` | Run cis- and trans-eQTL prediction. |
| 04 | `04_get_trios/` | Build candidate lncRNA-SNP-gene regulatory trios. |
| 05 | `05_clinical_evidences/` | Evaluate clinical and survival evidence. |
| 06 | `06_genetic_evidences/` | Evaluate MR, coloc, SMR, PredictDB, and SPrediXcan evidence. |
| 07 | `07_GWAS_enrichment_analyses/` | Run GWAS enrichment analyses using eQTL-derived annotations. |
| 08 | `08_add_druggability_and_STRING_IAS/` | Annotate mediation eGenes with druggability and STRING IAS evidence. |

Results

Selected processed results are available on Zenodo (DOI: 10.5281/zenodo.17605304).

Basic Usage

Run each step from its own directory. The exact Snakefile names and aggregate targets are documented in each numbered directory's README.md.

cd 01_prepare_TCGA_data
snakemake -s TCGA_SNPs_data.smk --use-apptainer --cores 8
snakemake -s TCGA_gene_expression_data.smk --use-apptainer --cores 8

For steps with a prepare_input.py script, run it before Snakemake:

python prepare_input.py
snakemake -s workflow.smk --use-apptainer --cores 8

The prepare_input.py scripts use hard links to avoid duplicating large files.
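As a rough illustration of the hard-link approach, the sketch below links one input file into a step directory without copying its data. It is not the repository's actual `prepare_input.py`; the function name `link_input` and its replace-if-different behaviour are assumptions made for this example.

```python
import os
from pathlib import Path


def link_input(src, dst):
    """Hard-link src to dst (hypothetical helper, not taken from the repository).

    Both paths end up pointing at the same data on disk, so no extra space is used.
    """
    src, dst = Path(src), Path(dst)
    # Create the destination directory if it does not exist yet.
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.exists():
        if os.path.samefile(src, dst):
            return dst  # already linked to the same file, nothing to do
        dst.unlink()  # assumption: a stale, different file is replaced
    # os.link raises OSError if src and dst are on different filesystems.
    os.link(src, dst)
    return dst
```

Calling `link_input("source/expression.tsv", "input/expression.tsv")` with real paths would leave two directory entries for one file; both paths here are placeholders.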

Input Preparation

Manual inputs are mainly reference files, TCGA/GTEx source data, and GWAS summary statistics. Check the README in each numbered directory for the required paths before running that step.

Some steps also require large reference resources to be prepared with step-specific download scripts before running Snakemake.

Configuration

Each step has its own config.yaml. Update sample lists, tissue lists, TCGA cancer types, and GWAS IDs there before running the workflow.

Notes

The workflows are designed to run with Apptainer/Singularity containers through Snakemake.
Hard links require source and destination files to be on the same filesystem.
Run the pipeline in numerical order unless you already have the required intermediate inputs.
Large protected datasets such as TCGA/GTEx genotype data may need to be downloaded manually.
Large public reference files should be downloaded or linked according to the step-specific README before running the corresponding workflow.
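The same-filesystem requirement for hard links can be checked before running an input-preparation script. The sketch below compares device IDs as a quick preflight test. The function name `same_filesystem` is hypothetical, and device-ID equality is only a heuristic: some setups (for example bind mounts or subvolumes) can still reject a hard link.

```python
import os


def same_filesystem(src, dst_dir):
    """Return True if src and dst_dir report the same device ID.

    Hypothetical preflight check; both paths must already exist.
    """
    return os.stat(src).st_dev == os.stat(dst_dir).st_dev
```

If the check returns False, one option is to move the source data onto the filesystem that holds the pipeline directories before preparing inputs.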
