CELIOS
CEll LIne OmicS processor for extracting omics data into integrated activity datasets, from which calibration files can be created for Boolean models used in the DrugLogics and the TRAFIKK pipelines.
New to CELIOS? Start with QUICKSTART.md for a 5-minute introduction, or see INSTALL.md for detailed setup instructions.
| Document | Purpose |
|---|---|
| QUICKSTART.md | 5-minute quick reference with common commands |
| INSTALL.md | Installation guide, virtual environments, troubleshooting |
| PROJECT_STRUCTURE.md | Code organization, module descriptions, architecture |
| README.md | Full API reference, usage examples, configuration details (you are here) |
| notebooks/celios_visualize_bashi_final.ipynb | Interactive example: tissue selection, CELIOS execution, visualization |
Run the pipeline from the repository root with a configuration file (JSON or YAML):
```bash
python -m celios.cli run --config path\to\config.yaml --verbose
```

Options:

- `--config` (required): Path to pipeline config (JSON or YAML)
- `--verbose`: Print detailed output
- `--plan`: Only print execution plan without running
- `--stop-after`: Stop execution after a specific step
Alternatively, define the configuration directly in a Python script and call run_celios():
```python
from celios.core import run_celios

config = {
    "paths": {
        "base": ".",
        "input": "data",
        "output": "results",
        "cellfiles_dir": "results/cell_lines",
    },
    "steps": {
        "Node": {
            "node_input": "node_dic_input/DNAdamage.sif",
            "hgnc_symbols_file": "node_dic_input/hgnc_complete_set.txt",
            "manual_symbols_file": "node_dic_input/manual_symbols.csv",
            "include_alias_previous_symbols": False,
            "directory_output": "results",
        },
        "Activity": {
            "activity_file": "activity_input/rnaseq_tpm_20220624.csv",
            "cell_line_file": "activity_input/cell_line_list.csv",
            "tf_activity_file": "activity_input/ccle_tf_activities.csv",
            "mutations_file": "activity_input/CCLE_muts_binary.csv",
            "cnv_file": "activity_input/CCLE_CNV_binary.csv",
            "directory_output": "results",
            "data_sources": ["mutations", "cnv", "TF"],
        },
    },
}

artifacts = run_celios(config=config, plan=False, verbose=True)
```

See `tests/test_run_celios.py` and `tests/celios_consensus.py` for additional examples.
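To run the same configuration through the CLI, the dictionary can first be saved as a JSON file. The sketch below assumes that a file-based config uses the same keys as the Python dictionary, which this README does not state explicitly. `write_config` and `plan_command` are hypothetical helper names; only `run`, `--config`, and `--plan` come from the options listed earlier.

```python
import json
import sys
from pathlib import Path


def write_config(config, path):
    """Serialize a config dictionary to JSON (no extra packages needed)."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return target


def plan_command(config_path):
    """Build the argument list for a dry run that only prints the execution plan."""
    # Assumption: the CLI is invoked as a module from the repository root.
    return [sys.executable, "-m", "celios.cli", "run",
            "--config", str(config_path), "--plan"]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root to inspect the plan before a full run.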
CELIOS supports organizing DrugLogics training files by tissue type. When paths.tissue_dir is specified, training files are written to tissue_dir/<Tissue>/<cell_line_name>/ based on the tissue information in cell_line_file.
This is useful when:
- You have cell lines from multiple tissues and want organized output
- Existing tissue folders contain additional files (they will be preserved)
- You want to create new tissue/cell-line directories automatically
Example config:
```yaml
paths:
  tissue_dir: "results/tissue_folders"  # Enable tissue-organized output
steps:
  Activity:
    cell_line_file: "data/cell_line_list.csv"  # Must contain 'tissue', 'SIDM', and 'cell_line_name' columns
```

The CSV file must include columns for `tissue`, `SIDM` (unique identifier), and `cell_line_name` (folder names) when using tissue-organized output. For legacy mode (`cellfiles_dir`), no specific columns are required.
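As an illustration of the layout described above, the following sketch checks that a cell line CSV has the three required columns and derives the `tissue_dir/<Tissue>/<cell_line_name>/` folder for each row. It is not CELIOS's own implementation: `plan_tissue_dirs` is a hypothetical helper, and CELIOS may normalize tissue or cell line names differently (for example, capitalization) before creating folders.

```python
import csv
from pathlib import Path

REQUIRED_COLUMNS = {"tissue", "SIDM", "cell_line_name"}


def plan_tissue_dirs(cell_line_file, tissue_dir):
    """Return {SIDM: output directory} following tissue_dir/<Tissue>/<cell_line_name>/."""
    with open(cell_line_file, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Fail early if the file cannot support tissue-organized output.
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"cell_line_file is missing columns: {sorted(missing)}")
        return {
            row["SIDM"]: Path(tissue_dir) / row["tissue"] / row["cell_line_name"]
            for row in reader
        }
```

Running a check like this before the pipeline gives a quick preview of which folders would be created or reused.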
If you already have a pre-built node dictionary file, you can skip STEP 1 (Node dictionary generation) by:
- Omitting the `"Node"` section from `steps` in your config
- Adding `"node_dic"` to the `Activity` section, pointing to your CSV file
Example:
```python
from celios.core import run_celios

config = {
    "paths": {
        "base": ".",
        "input": "consensus",
        "output": "consensus/hgsoc_results",
    },
    "steps": {
        # No "Node" section, so STEP 1 is skipped
        "Activity": {
            "node_dic": "hgsoc_net/NODE_HGNC_equivalences.csv",  # Pre-built node dictionary
            "activity_file": "activity_input/rnaseq_tpm_20220624.csv",
            "cell_line_file": "activity_input/cell_line_list.csv",
            "tf_activity_file": "activity_input/ccle_tf_activities.csv",
            "mutations_file": "activity_input/CCLE_muts_binary.csv",
            "cnv_file": "activity_input/CCLE_CNV_binary.csv",
            "directory_output": "consensus/hgsoc_results",
            "data_sources": ["mutations", "cnv", "TF"],
        },
    },
}

artifacts = run_celios(config=config, verbose=True)
```

The pipeline will:
- Skip STEP 1 (Node dictionary generation) since `"Node"` is not defined
- Load your CSV file as the node dictionary
- Proceed to STEP 2 (Activity extraction)
Your CSV file should have at least two columns:
- First column: node names
- Second column: symbols (comma-separated list of gene symbols)
See tests/celios_hgsoc.py for a real example using a pre-built node dictionary.
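To make the expected file layout concrete, here is a minimal sketch that reads such a CSV into a dictionary of node names to symbol lists. `read_node_dictionary` is a hypothetical helper, not part of the CELIOS API, and it assumes the first row is a header and that the comma-separated symbol list is quoted so it stays in a single CSV field.

```python
import csv


def read_node_dictionary(path):
    """Map each node name (column 1) to its list of gene symbols (column 2)."""
    nodes = {}
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumption: the first row is a header
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or incomplete rows
            node, symbols = row[0].strip(), row[1]
            # Split the quoted field on commas and drop empty entries.
            nodes[node] = [s.strip() for s in symbols.split(",") if s.strip()]
    return nodes
```

Any columns beyond the second are ignored here, which matches the "at least two columns" requirement.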
Scenario 1: "Node" step is defined in config
- STEP 1 will run (Node dictionary generation), even if `node_dic` is also in Activity
- The generated node dictionary will be used for Activity extraction
Scenario 2: "Node" step is NOT defined, but node_dic is in Activity config
- STEP 1 is skipped
- The provided CSV file is loaded and used directly
Scenario 3: Neither "Node" step nor node_dic is provided
- The pipeline will raise an error (no node dictionary source available)
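The three scenarios amount to a simple precedence rule, sketched below. This is an illustration of the documented behavior rather than the pipeline's actual code; `node_dictionary_source` is a hypothetical function name, and the exception type CELIOS raises in Scenario 3 is not specified here, so `ValueError` is an assumption.

```python
def node_dictionary_source(config):
    """Return 'generate' or 'load' per the three scenarios, or raise if neither applies."""
    steps = config.get("steps", {})
    if "Node" in steps:
        # Scenario 1: STEP 1 runs and its output is used, even if node_dic is set.
        return "generate"
    if "node_dic" in steps.get("Activity", {}):
        # Scenario 2: STEP 1 is skipped and the provided CSV is loaded directly.
        return "load"
    # Scenario 3: no node dictionary source is available.
    raise ValueError("No node dictionary source: define steps.Node or steps.Activity.node_dic")
```

Checking a config with a function like this before launching a long run makes the chosen scenario explicit.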
Call Node helpers directly with CLI subcommands:
```bash
python -m celios.cli node-from-sif --sif examples\DNAdamage.sif --hgnc examples\hgnc_complete_set.txt --out results\node_dict.csv
python -m celios.cli node-from-object --input "TP53,BRCA1,EGFR" --hgnc examples\hgnc_complete_set.txt --out results\node_dict.csv --include_alias_prev
```

For a quick setup with minimal steps:

```bash
pip install -e .
```

For detailed installation instructions, troubleshooting, and virtual environment setup, see INSTALL.md.
Alternatively, run CLI commands from the repository root without installing, using the `python -m celios.cli` invocations shown above.
### Pipeline Overview
CELIOS is a two-step pipeline:
1. **Node Extraction** - Extract nodes from a biological network (SIF format) and map them to standardized gene symbols (HGNC)
2. **Activity Calculation** - Integrate multi-omics data (mutations, CNV, TF activity, gene expression) into activity matrices by cell line
Each step can be skipped or customized via configuration.
### Configuration-Driven Design
The pipeline is entirely controlled via JSON or YAML configuration files, making it easy to:
- Define data sources and output locations
- Skip pipeline steps
- Reproduce analyses
- Scale to multiple datasets
See [PROJECT_STRUCTURE.md](PROJECT_STRUCTURE.md) for detailed code organization and module descriptions.
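As a rough sketch of what configuration-driven loading can look like, the function below picks a parser from the file extension and imports `yaml` only when a YAML file is requested, so JSON configs need nothing beyond the standard library. `load_config` is a hypothetical name and this is not CELIOS's own loader; the extension-based dispatch is an assumption.

```python
import json
from pathlib import Path


def load_config(path):
    """Load a pipeline config from JSON or YAML, chosen by file extension."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    suffix = path.suffix.lower()
    if suffix == ".json":
        return json.loads(text)
    if suffix in {".yaml", ".yml"}:
        try:
            import yaml  # provided by the optional pyyaml package
        except ImportError as exc:
            raise RuntimeError("YAML configs require pyyaml: pip install pyyaml") from exc
        return yaml.safe_load(text)
    raise ValueError(f"Unsupported config extension: {path.suffix}")
```

Deferring the `yaml` import keeps the optional dependency truly optional: users who only work with JSON never trigger it.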
---
## Notes
- **YAML support** - YAML config files require `pyyaml` in your environment (optional). JSON configs work without extra packages.
  ```bash
  pip install pyyaml
  ```
- **Example configurations** - See `tests/test_run_celios.py`, `tests/celios_consensus.py`, and `tests/celios_hgsoc.py` for real-world examples
- **Interactive tutorials** - See `notebooks/1_select_visualize.ipynb` for a step-by-step walkthrough
- **Project structure** - See [PROJECT_STRUCTURE.md](PROJECT_STRUCTURE.md) for code organization

Running CLI commands without installing works as long as you're in the repository directory.