This GitLab repository contains the source code of the interactive web application that illustrates the examples and visual analyses presented in the paper:
Díaz, I., Enguita, J. M., Cuadrado, A. A., García, D., González, A., Valdés, N., & Chiara, M. D. (2023).
Exploratory Analysis of the Gene Expression Matrix Based on Dual Conditional Dimensionality Reduction.
IEEE Journal of Biomedical and Health Informatics.
DOI: 10.1109/JBHI.2023.3264029
The app allows users to explore the gene expression matrix interactively and reproduce the main visualizations discussed in the article.
The app uses the data file df_data.csv, which contains a GEM (Gene Expression Matrix) dataframe with samples as rows and gene expression levels as columns. It holds the minimal TCGA data needed to run the app and reproduce the results of the paper (only samples from the PCPG and KIRC studies are used, n=449).
The file was obtained from the TCGA database, downloaded from the Xenabrowser portal (https://xenabrowser.net/datapages/) and curated by removing samples with erroneous or missing data for some of the genes or miRNAs of interest. See https://gitlab.com/idiazblanco/morphing-projections-demo-and-dataset-preparation for more details on the data preparation process.
The resulting dataset contains 157 samples of pheochromocytoma and paraganglioma (PCPG), some carrying mutations in hypoxia-related genes such as VHL, SDH and EPAS1, 221 samples of kidney clear cell carcinoma (KIRC), and 71 samples of normal renal tissue. For each sample, we selected the expression levels of 442 hypoxia-related genes and 129 miRNAs.
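As a quick sanity check of the data file, the following minimal sketch loads the CSV with pandas and reports its basic shape. The helper names `load_gem` and `describe_gem` are hypothetical and are not part of the app. Whether the file stores sample identifiers in its first column is an assumption left to the `index_col` parameter, and the file may include label or metadata columns in addition to the 442 + 129 = 571 expression columns.

```python
import pandas as pd


def load_gem(path="data/df_data.csv", index_col=None):
    """Read the GEM file into a DataFrame (rows = samples, columns = features).

    index_col is an assumption: pass 0 if the first column holds sample IDs.
    `path` may also be an open file-like object.
    """
    return pd.read_csv(path, index_col=index_col)


def describe_gem(df):
    """Return basic shape information about a GEM DataFrame."""
    numeric = df.select_dtypes(include="number")
    return {
        "n_samples": len(df),                          # one row per sample
        "n_columns": df.shape[1],                      # all columns, including labels
        "n_numeric_columns": numeric.shape[1],         # expression-like columns only
        "n_missing": int(numeric.isna().sum().sum()),  # missing numeric values
    }
```

Calling `describe_gem(load_gem())` from the main folder should report 449 samples if the file matches the description above. The column counts depend on how labels and metadata are stored, so they are not asserted here.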
The file structure of the app is as follows:
README.md
gem_i.py
imlpy.py
data/df_data.csv
data/mirna_mature.txt
Package       Version
------------- -------
bokeh         2.4
pandas        1.2.4
scikit-learn  0.24.1
umap-learn    0.5.2
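To compare an existing environment against the versions listed above, a small check based on the standard library's `importlib.metadata` (Python 3.8+) can help. This is only a sketch: the helper names `matches` and `check_versions` are hypothetical, and treating `2.4` as "any 2.4.x release" is an assumption about how the listed versions should be read.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed in this README.
LISTED = {
    "bokeh": "2.4",
    "pandas": "1.2.4",
    "scikit-learn": "0.24.1",
    "umap-learn": "0.5.2",
}


def matches(installed, listed):
    """True if `installed` equals `listed` or is a more specific release of it."""
    return installed == listed or installed.startswith(listed + ".")


def check_versions(listed=LISTED):
    """Map each package name to (installed version or None, listed version, ok flag)."""
    report = {}
    for name, wanted in listed.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (found, wanted, found is not None and matches(found, wanted))
    return report
```

Printing the result of `check_versions()` shows, for each package, the installed version next to the listed one and whether they agree.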
To launch the app, run the following command from the main folder:
bokeh serve gem_i.py --show
- Funding 🇪🇸. This work was supported in part by the Spanish Ministerio de Ciencia e Innovación / Agencia Estatal de Investigación MCIN/AEI/10.13039/501100011033 under Grant PID2020-115401GB-I00.
- The results shown in this app are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga