VVIRAL

The primary functions of this repository include:

Data Preprocessing: Parsing raw AKTA result files (including .res, .zip, .csv, and .xlsx formats) to extract relevant experimental parameters and chromatogram data.
Feature Engineering: Transforming the extracted data into structured feature sets suitable for machine learning, including calculating peak metrics and collating run information.
Predictive Modeling: Building and training machine learning models to predict purification outcomes (e.g., "Total Capsids") based on process parameters.
Bayesian Optimization: Using the trained model as a surrogate within the BayBE framework to recommend new experimental conditions aimed at optimizing purification yield and purity.

Environment Setup

$ conda env create --name viral --file=viral.yml

$ conda activate viral

Project Structure

`data_extraction`

This folder contains scripts used to parse and extract relevant data from historical chromatography runs, primarily from AKTA systems. Key scripts include:

extract_matrix.py: Extracts features from .csv files (likely exported from AKTA results).
extract_xlsx_data.py: Extracts features from .xlsx files.
extract_peaks.py & extract_xlsx_peaks.py: Extract peak-specific metrics from .csv and .xlsx files, respectively.
pycorn-bin.py: A utility (likely based on the PyCORN library) for processing AKTA .res or .zip result files, potentially converting them to .xlsx or .csv. See extract_zips.sh for a usage example.
utils.py & utils_xlsx.py: Contain helper functions for data loading, parsing filenames (e.g., get_resin_and_serotype, get_column_volume), calculating metrics, and potentially plotting (show_peaks).
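To make "peak metrics" concrete, here is a minimal numpy sketch of one way such metrics can be computed from a single UV trace. It is not the code in extract_peaks.py. The helper names (`peak_metrics`, `_peak_bounds`), the focus on the tallest peak, the half-height width, the 5 % integration window, and the conversion of retention volume to column volumes (CV) are all assumptions for illustration.

```python
import numpy as np


def _peak_bounds(signal, apex, level):
    """Walk outward from the apex until the signal drops to `level` or below."""
    left = apex
    while left > 0 and signal[left] > level:
        left -= 1
    right = apex
    while right < len(signal) - 1 and signal[right] > level:
        right += 1
    return left, right


def peak_metrics(volume_ml, uv_mau, column_volume_ml, window_frac=0.05):
    """Hypothetical helper: metrics for the tallest peak in one chromatogram.

    volume_ml: elution volume of each sample (mL), increasing.
    uv_mau: UV absorbance (mAU) at the same points.
    column_volume_ml: column volume used to express retention in CV.
    """
    v = np.asarray(volume_ml, dtype=float)
    a = np.asarray(uv_mau, dtype=float)
    apex = int(np.argmax(a))
    height = a[apex]

    # Full width at half maximum, resolved only to the sampling grid.
    hl, hr = _peak_bounds(a, apex, 0.5 * height)
    fwhm_ml = v[hr] - v[hl]

    # Trapezoid-rule area over the region above window_frac * height.
    wl, wr = _peak_bounds(a, apex, window_frac * height)
    seg_v, seg_a = v[wl:wr + 1], a[wl:wr + 1]
    area = float(np.sum(0.5 * (seg_a[1:] + seg_a[:-1]) * np.diff(seg_v)))

    return {
        "height_mau": float(height),
        "retention_ml": float(v[apex]),
        "retention_cv": float(v[apex] / column_volume_ml),
        "fwhm_ml": float(fwhm_ml),
        "area_mau_ml": area,
    }


# Synthetic triangular peak centred at 5 mL on a hypothetical 2 mL column.
volume = np.linspace(0, 10, 101)
uv = np.maximum(0.0, 100 - 50 * np.abs(volume - 5))
metrics = peak_metrics(volume, uv, column_volume_ml=2.0)
print(metrics)
```

Real AKTA traces usually need baseline correction and handling of overlapping peaks before metrics like these are meaningful. The sketch skips both.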

The goal of these scripts is to generate structured datasets suitable for machine learning.

`Modeling`

This folder contains scripts for building predictive models based on the data extracted in the data_extraction phase.

train.py: Trains Gaussian Process models. It loads the processed data, splits it, optionally log-transforms the features (log_transform), trains a model (train_gp_model), makes predictions (gp_predict), and evaluates performance (get_metrics).
models.py: Likely defines the model architectures and the training and prediction functions (e.g., Gaussian Process-related functions).
utils.py: Contains helper functions for modeling, such as transformations (log_transform, inverse_log_transform) and metric calculation (get_metrics).
ML_analysis.ipynb: A Jupyter Notebook for exploratory data analysis and potentially model experimentation.
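The train/predict/evaluate loop described above can be sketched in plain numpy. The sketch log-transforms a positive target, fits an exact Gaussian Process regressor, predicts with uncertainty, and scores the predictions on held-out points. The function names are hypothetical stand-ins, not the repository's log_transform, train_gp_model, gp_predict, or get_metrics. The base-10 log, the RBF kernel, and the fixed, unoptimized hyperparameters are assumptions chosen to keep the example short.

```python
import numpy as np


def log10_transform(y):
    # Assumption: the target (e.g. a capsid titer) is strictly positive and
    # spans orders of magnitude, so it is modelled on a log10 scale.
    return np.log10(np.asarray(y, dtype=float))


def inverse_log10_transform(z):
    return 10.0 ** np.asarray(z, dtype=float)


def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of A and B.
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def fit_gp(X, y, length_scale=1.0, variance=1.0, noise=1e-6):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    y_mean = y.mean()  # constant prior mean equal to the training mean
    K = rbf_kernel(X, X, length_scale, variance) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    return {"X": X, "L": L, "alpha": alpha, "y_mean": y_mean,
            "length_scale": length_scale, "variance": variance}


def predict_gp(model, X_new):
    X_new = np.asarray(X_new, dtype=float)
    Ks = rbf_kernel(X_new, model["X"], model["length_scale"], model["variance"])
    mean = Ks @ model["alpha"] + model["y_mean"]
    v = np.linalg.solve(model["L"], Ks.T)
    var = model["variance"] - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def regression_metrics(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid**2)))
    r2 = float(1.0 - np.sum(resid**2) / np.sum((y_true - y_true.mean()) ** 2))
    return {"rmse": rmse, "r2": r2}


# Synthetic data: one process parameter, positive target near 10**10.
X = np.linspace(0.0, 5.0, 40)[:, None]
y = 10.0 ** (10.0 + 0.3 * np.sin(X[:, 0]))
test = np.arange(40) % 4 == 1  # hold out every fourth point
train = ~test
model = fit_gp(X[train], log10_transform(y[train]))
mean_log, std_log = predict_gp(model, X[test])
scores = regression_metrics(log10_transform(y[test]), mean_log)
print(scores)
```

A practical workflow would normally fit the kernel hyperparameters, for example by maximizing the marginal likelihood, instead of fixing them as done here.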

`Optimization campaigns`

This folder contains scripts related to using the trained models for Bayesian optimization campaigns to suggest new experimental conditions.

surrogate_model.py: Defines the Gaussian Process surrogate model (gp_model) used within the BayBE optimization framework. It specifies the kernel structure, built from DotProductKernel, RQKernel, and MaternKernel.
Each subfolder corresponds to a serotype purified in the downstream optimization campaigns (AAV2, AAV5, AAV9).
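The kernel in surrogate_model.py combines dot-product, rational quadratic, and Matern components. This README does not say how they are combined (sum or product, scales, priors). The numpy sketch below therefore assumes a plain sum with illustrative hyperparameters and uses the standard closed form of each kernel. It is not BayBE's kernel API. It only shows what such a composite covariance computes.

```python
import numpy as np


def dot_product_kernel(A, B, sigma0=1.0):
    # Linear kernel sigma0^2 + <a, b>: captures global linear trends.
    return sigma0**2 + A @ B.T


def _sq_dist(A, B):
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)


def rq_kernel(A, B, length_scale=1.0, alpha=1.0):
    # Rational quadratic: a scale mixture of RBF kernels.
    return (1.0 + _sq_dist(A, B) / (2.0 * alpha * length_scale**2)) ** (-alpha)


def matern52_kernel(A, B, length_scale=1.0):
    # Matern kernel with smoothness nu = 5/2 (a common default).
    r = np.sqrt(_sq_dist(A, B)) / length_scale
    return (1.0 + np.sqrt(5.0) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5.0) * r)


def composite_kernel(A, B):
    # Assumption: the three components are summed with unit weights.
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return dot_product_kernel(A, B) + rq_kernel(A, B) + matern52_kernel(A, B)


# Gram matrix for five hypothetical conditions with two scaled parameters each.
X = np.array([[0.0, 0.0], [0.2, 0.5], [0.4, 1.0], [0.6, 0.1], [1.0, 0.9]])
K = composite_kernel(X, X)
print(np.round(K, 3))
```

Summing kernels keeps the result a valid (positive semi-definite) covariance. The linear term lets the surrogate follow broad trends in a parameter, while the stationary RQ and Matern terms model local, nonlinear structure.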

Workflow

Data Extraction: Raw AKTA result files (e.g., .res, .zip) are converted into intermediate .csv or .xlsx files by pycorn-bin.py, invoked via extract_zips.sh in data_extraction.
Feature Engineering: extract_matrix.py, extract_xlsx_data.py, and extract_peaks.py parse these intermediate files to create feature matrices and target variable datasets.
Model Training: The Modeling/train.py script uses the generated datasets to train models.
Optimization: The Gaussian Process model defined in surrogate_model.py serves as the surrogate in the campaign scripts under Optimization campaigns (e.g., AAV2_AAVA3_campaign.py), which recommend new experimental conditions expected to optimize the target ("Total Capsids").
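BayBE handles the acquisition step itself, so the campaign scripts do not need to implement it. As an illustration only, the sketch below shows one common way a surrogate's predictions can become a recommendation. It scores each candidate condition by expected improvement (EI) over the best result so far and returns the top scorers. EI is a swapped-in example acquisition function and may differ from what the campaigns use. The candidate names and numbers are made up.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def expected_improvement(mean, std, best_observed, xi=0.01):
    """Illustrative EI for maximisation under a Gaussian posterior.

    mean, std: surrogate posterior mean and standard deviation per candidate.
    best_observed: best target value measured so far (same scale as mean).
    xi: small margin that nudges the search toward exploration.
    """
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    improvement = mean - best_observed - xi
    safe_std = np.where(std > 0, std, 1.0)
    z = improvement / safe_std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    ei = improvement * cdf + std * pdf
    # With zero uncertainty, EI reduces to the non-negative improvement.
    return np.where(std > 0, ei, np.maximum(improvement, 0.0))


def recommend(candidates, mean, std, best_observed, batch_size=1):
    # Return the batch_size candidates with the highest EI, best first.
    ei = expected_improvement(mean, std, best_observed)
    order = np.argsort(-ei)[:batch_size]
    return [candidates[i] for i in order], ei[order]


# Hypothetical surrogate predictions on a log10(capsid) scale.
candidates = ["condition A", "condition B", "condition C"]
mean = np.array([9.8, 10.1, 10.0])
std = np.array([0.05, 0.02, 0.30])
picks, ei_scores = recommend(candidates, mean, std, best_observed=10.05)
print(picks, ei_scores)
```

Here condition C is chosen even though condition B has the higher predicted mean. C's larger uncertainty gives it a better expected chance of beating the current best. This exploration/exploitation trade-off is what makes Bayesian optimization useful for choosing the next runs.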

Top-level contents

Modeling
Optimization campaigns
PyCORN
data_extraction
figures
.gitignore
README.md
viral.yml
