This repository contains a complete RNA-Seq analysis pipeline designed to study differential gene expression in enterobacteria under two experimental conditions: control vs treatment.
The workflow processes raw sequencing data through quality control, preprocessing, transcriptome assembly, annotation, and differential expression analysis.
- Organism: Enterobacteria
- Conditions: Control vs Treatment
- Replicates: 3 biological replicates per condition
- Sequencing: Paired-end reads (300 bp)
The analysis consists of the following steps:
- Assessment of raw read quality
- Metrics evaluated:
  - Per-base sequence quality
  - GC content
  - Sequence duplication
  - Adapter contamination
- Removal of low-quality reads:
  - Mean quality < 25
  - Length < 100 bp
- Trimming of ambiguous bases (Ns)
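The preprocessing thresholds above can be illustrated with a minimal Python sketch. This is not the PRINSEQ implementation used by the pipeline; it only mirrors the stated criteria (mean quality < 25 and length < 100 bp are removed, Ns are trimmed). The function names, the Phred+33 encoding, trimming only at the read ends, and applying the trim before the filters are all assumptions made for illustration.

```python
from statistics import mean

MIN_MEAN_QUALITY = 25   # reads with mean quality below this are removed
MIN_LENGTH = 100        # reads shorter than this (in bp) are removed


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_ns(seq, qual):
    """Strip ambiguous bases (N) from both read ends, keeping qualities aligned."""
    start = len(seq) - len(seq.lstrip("Nn"))
    end = len(seq.rstrip("Nn"))
    return seq[start:end], qual[start:end]


def keep_read(seq, qual):
    """Apply the length and mean-quality thresholds to an already trimmed read."""
    if len(seq) < MIN_LENGTH:
        return False
    return mean(phred_scores(qual)) >= MIN_MEAN_QUALITY
```

A read would typically be passed through `trim_ns` first and then through `keep_read`; for paired-end data, both mates would need to pass for the pair to be kept, which this sketch does not handle.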
- Assembly using multiple k-mers:
  - 21, 33, 55, 77, 99, 127
- Combined reads from all samples
- Alignment against Enterobacteria gene database
- Filtering criteria:
  - ≥ 90% identity
- Functional assignment of transcripts
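As a rough illustration of the ≥ 90% identity filter, the sketch below parses BLAST hits and keeps those at or above the threshold. It assumes the alignments were written in BLAST tabular format (`-outfmt 6`), where the first three tab-separated columns are the query ID, subject ID, and percent identity; the README does not state the output format. Keeping only the highest-identity hit per transcript is also an assumption, and the function names are hypothetical.

```python
MIN_IDENTITY = 90.0  # percent identity threshold from the filtering criteria


def filter_hits(lines, min_identity=MIN_IDENTITY):
    """Yield (query, subject, identity) for hits at or above the threshold.

    Assumes BLAST tabular output (-outfmt 6): qseqid, sseqid, pident, ...
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, identity = fields[0], fields[1], float(fields[2])
        if identity >= min_identity:
            yield query, subject, identity


def best_hit_per_transcript(hits):
    """Keep the highest-identity hit for each query transcript (an assumption)."""
    best = {}
    for query, subject, identity in hits:
        if query not in best or identity > best[query][1]:
            best[query] = (subject, identity)
    return best
```

With a results file opened as `handle`, `best_hit_per_transcript(filter_hits(handle))` would return a dictionary mapping each transcript to its assigned gene and identity.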
- Alignment of reads to assembled transcriptome
- Conversion and processing:
  - SAM → BAM → sorted BAM → indexed BAM
- Clustering of transcripts
- Generation of gene-level count matrix
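The SAM → BAM → sorted BAM → indexed BAM chain in the mapping step can be sketched as three SAMtools calls per sample. The snippet below only builds the command lines and prints them by default; the file-naming scheme and the helper names are assumptions, and the actual options used are in the repository scripts and the project report.

```python
import subprocess
from pathlib import Path


def bam_commands(sam_path):
    """Build the SAMtools commands for one sample (file names are assumed)."""
    sam = Path(sam_path)
    bam = sam.with_suffix(".bam")
    sorted_bam = sam.with_suffix(".sorted.bam")
    return [
        # SAM -> BAM (binary, compressed)
        ["samtools", "view", "-b", "-o", str(bam), str(sam)],
        # BAM -> coordinate-sorted BAM
        ["samtools", "sort", "-o", str(sorted_bam), str(bam)],
        # sorted BAM -> index (.bai) written next to the sorted BAM
        ["samtools", "index", str(sorted_bam)],
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run(bam_commands("control_1.sam"))` prints the three commands without executing them; passing `dry_run=False` requires SAMtools to be installed and on the `PATH`.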
- Statistical analysis in R
- Outputs:
  - log2 Fold Change (logFC)
  - p-values and FDR
- Significance threshold: p-value < 0.05
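To make the relationship between the reported p-values and FDR concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. It assumes the FDR column is a Benjamini-Hochberg adjustment, which is the default in edgeR's `topTags`; the pipeline itself computes these values in R, and the p-values in the usage note are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant(values, threshold=0.05):
    """Indices of tests whose (raw or adjusted) p-value is below the threshold."""
    return [i for i, v in enumerate(values) if v < threshold]
```

For example, with hypothetical p-values `[0.01, 0.04, 0.03, 0.20]`, `significant(p)` selects the first three tests, while `significant(benjamini_hochberg(p))` selects only the first, which shows why the two columns can give different gene counts at the same cutoff.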
- ~98% mapping rate across samples
- 3385 assembled transcripts
- 6675 expression clusters
- ~1200 significantly differentially expressed genes
- Clear separation between conditions in MDS analysis
.
├── README.md
├── docs/
│   ├── project_report.pdf
│   └── diagram_pipeline.jpeg
├── scripts/
│   ├── 01_fastqc.sh
│   ├── 02_prinseq.sh
│   ├── 03_spades.sh
│   ├── 04_annotation.sh
│   ├── 05_mapping.sh
│   ├── 06_quantification.sh
│   └── 07_differential_expression.R
└── data/
    └── README.md
A detailed explanation of methods, commands, and results is available in docs/project_report.pdf.
- FastQC
- PRINSEQ
- SPAdes
- BLAST
- Bowtie2
- SAMtools
- Corset
- edgeR (R / Bioconductor)
See the full bibliography in docs/project_report.pdf.
Inés García de la Peña Marco
Computational Omics Analysis