Protein Structure Landscape Visualization

This project provides an interactive visualization tool for exploring large protein databases, revealing structural complementarity and functional locality across different protein sources.

The webserver is available at https://protein-structure-landscape.sano.science/

Overview

The Protein Structure Landscape Visualization tool allows researchers and bioinformaticians to explore and analyze protein structures from various databases, including:

AlphaFold Protein Structure Database (AFDB)
ESMAtlas
Microbiome Immunity Project (MIP)

The tool presents a unified, low-dimensional representation of the protein space, enabling users to investigate the relationships between protein structure, function, and origin.

Features

Interactive 2D scatter plot visualization of protein structures
Filtering options for protein length, pLDDT scores, and superCOG annotations
Search functionality to find specific proteins by name
Finding proteins by their origin and function
Detailed information display for selected proteins
3D protein structure viewer for individual proteins
Publication details and contact information
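
The filtering feature listed above can be illustrated with a small sketch. This is not the tool's implementation: the `Protein` record, its field names, and the `filter_proteins` helper are hypothetical, and the real data schema may differ. It only shows how length, pLDDT, and superCOG criteria combine.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    # Hypothetical record; the real data schema may use other names.
    name: str
    length: int
    plddt: float
    supercog: str


def filter_proteins(proteins, min_length=0, max_length=None,
                    min_plddt=0.0, supercogs=None):
    """Keep proteins that satisfy every active criterion.

    A criterion left at its default (None or 0) does not exclude anything.
    """
    kept = []
    for p in proteins:
        if p.length < min_length:
            continue
        if max_length is not None and p.length > max_length:
            continue
        if p.plddt < min_plddt:
            continue
        if supercogs is not None and p.supercog not in supercogs:
            continue
        kept.append(p)
    return kept
```

For example, `filter_proteins(proteins, min_length=100, min_plddt=80)` would keep only proteins of at least 100 residues with a pLDDT of 80 or higher.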

Technology Stack

Frontend: React.js with Material-UI for the user interface
Charting: CosmoGL for high-performance 2D plotting
3D Visualization: PDBe Molstar for protein structure rendering
Backend: FastAPI

Automatic deployment (recommended)

This repo contains two deployment scripts: deploy-staging.sh for staging deployment (without SSL) and deploy-prod.sh for production deployment (with SSL). All prerequisites from the sections below are required to run automatic deployment successfully. Production deployment additionally requires the environment variables DOMAIN, LOCAL_DOMAIN (which can be the same as DOMAIN, or localhost), EXTERNAL_PORT, and EMAIL.
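
Before running the production script, it can help to confirm that the required variables are set. The following is a minimal sketch, not part of the repository; the helper name `missing_prod_vars` is hypothetical, and the variable list is taken from the paragraph above.

```python
import os

# Variables the paragraph above lists as required for production deployment.
REQUIRED_PROD_VARS = ("DOMAIN", "LOCAL_DOMAIN", "EXTERNAL_PORT", "EMAIL")


def missing_prod_vars(env=os.environ):
    """Return the required production variables that are unset or empty."""
    return [name for name in REQUIRED_PROD_VARS if not env.get(name)]
```

An empty list from `missing_prod_vars()` means all four variables are present in the current environment.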

Running in development mode

Prerequisites

Webserver data fetched locally (including data.parquet, the counts CSV files, etc.). The environment variable DATA_WEBSERVER_PATH must point to this directory.
(Optional) MIP follow-up clusters fetched locally, with the environment variable DATA_PATH pointing to that directory. WARNING: although this data is optional, DATA_PATH must still be set (it may point to an empty directory).
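
The prerequisites above can be checked before starting the backend. The following is a minimal sketch, not part of the repository: the helper name `check_data_dirs` is hypothetical, and it assumes only what is stated above, namely that both variables must be set, that the webserver data directory contains data.parquet, and that DATA_PATH may point to an empty directory.

```python
import os
from pathlib import Path


def check_data_dirs(env=os.environ):
    """Return a list of problems with the data variables (empty if none).

    Hypothetical helper: it encodes only the prerequisites listed above.
    """
    problems = []
    for name in ("DATA_WEBSERVER_PATH", "DATA_PATH"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} does not point to a directory: {value}")

    # data.parquet is the one file the prerequisites name explicitly.
    web = env.get("DATA_WEBSERVER_PATH")
    if web and Path(web).is_dir() and not (Path(web) / "data.parquet").is_file():
        problems.append("data.parquet not found in DATA_WEBSERVER_PATH")
    return problems
```

Calling `check_data_dirs()` with no arguments inspects the current environment; an empty list means the prerequisites look satisfied.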

Clone the repository
(Once) Install frontend dependencies:
```
cd frontend
yarn install
cd ..
```

(Once) Install backend dependencies:

```
cd backend
pip3 install -r requirements.txt
cd ..
```

(Once) Set the environment variables for backend:

```
export DATA_WEBSERVER_PATH="{path to data for webserver}"
export DATA_PATH="{path to mip follow up clusters}"
```

Run backend
```
python backend/server.py
```
(In a separate terminal) Set the environment variable for the frontend:
```
export VITE_DJANGO_HOST="http://localhost:8000"
```
Run frontend:
```
cd frontend
npm run dev
```

Running in production

Prerequisites

Webserver data fetched locally (including data.parquet, the counts CSV files, etc.). The environment variable DATA_WEBSERVER_PATH must point to this directory.
MIP follow-up clusters fetched locally, with the environment variable DATA_PATH pointing to that directory.
Docker and Docker Compose installed.

No SSL Mode

In this mode, the application runs on port 8080.

Clone the repository
(Once) Install frontend dependencies:
```
cd frontend
yarn install
```
Build frontend artifacts
```
npm run build
cd ..
```

(Once) Set the environment variables for backend:

```
export DATA_WEBSERVER_PATH="{path to data for webserver}"
export DATA_PATH="{path to mip follow up clusters}"
```

(Once) Set the environment variables for nginx:

```
export EXTERNAL_PORT="{external port to your VM, defaults to 8081}"
export USE_SSL="0"
```

If the application is already running in Docker, bring it down:
```
sudo docker compose down
```

Run the application

```
sudo --preserve-env=DATA_PATH,DATA_WEBSERVER_PATH,EXTERNAL_PORT,USE_SSL docker compose up --build -d
```

SSL Mode

Clone the repository
(Once) Install frontend dependencies:
```
cd frontend
yarn install
```
Build frontend artifacts
```
npm run build
cd ..
```

(Once) Set the environment variables for backend:

```
export DATA_WEBSERVER_PATH="{path to data for webserver}"
export DATA_PATH="{path to mip follow up clusters}"
```

(Once) Set the environment variables for nginx:

```
export EXTERNAL_PORT="{external port to your VM, defaults to 8081}"
export USE_SSL="1"
```

Generate a certificate if it's not present:

```
sudo snap install --classic certbot
sudo ln -s /snap/bin/certbot /usr/local/bin/certbot
sudo certbot certonly --webroot -w "$(pwd)/certbot/www" -d "$DOMAIN" --email "$EMAIL" --agree-tos --no-eff-email
```

Here EMAIL is an environment variable holding your email address and DOMAIN is one holding your domain (currently protein-structure-landscape.sano.science).
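
To decide whether the certificate step is needed, one can check whether certbot has already stored a certificate for the domain. The following is a minimal sketch under stated assumptions: the helper name `certificate_present` is hypothetical, and it assumes certbot's default storage layout (`/etc/letsencrypt/live/<domain>/`), which this project's nginx configuration may or may not use.

```python
from pathlib import Path

# Default directory where certbot keeps issued certificates (assumed here).
LETSENCRYPT_LIVE = Path("/etc/letsencrypt/live")


def certificate_present(domain, live_dir=LETSENCRYPT_LIVE):
    """Return True if both certificate files for `domain` exist under live_dir."""
    cert_dir = Path(live_dir) / domain
    return all(
        (cert_dir / name).is_file()
        for name in ("fullchain.pem", "privkey.pem")
    )
```

Reading the default directory typically requires root privileges, so the check may need to run under sudo. It only tests for presence; it does not verify that the certificate is still valid.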

If the application is already running in Docker, bring it down:
```
sudo docker compose down
```

Run the application

```
sudo --preserve-env=DATA_PATH,DATA_WEBSERVER_PATH,EXTERNAL_PORT,USE_SSL docker compose up --build -d
```

Usage

Use the scatter plot to explore the protein landscape
Apply filters to focus on specific subsets of proteins
Click on data points to view detailed information and 3D structures
Use the search bar to find proteins by name

Citing

Szczerbiak P, Szydlowski L, Wydmański W, Renfrew PD, Koehler Leman J, Kosciolek T. Large protein databases reveal structural complementarity and functional locality. bioRxiv.

DOI: https://doi.org/10.1101/2024.08.14.607935

```
@article{szczerbiak_large_2024,
	title = {Large protein databases reveal structural complementarity and functional locality},
	url = {https://www.biorxiv.org/content/early/2024/08/17/2024.08.14.607935},
	doi = {10.1101/2024.08.14.607935},
	abstract = {Recent breakthroughs in protein structure prediction have led to an unprecedented surge in high-quality 3D models, highlighting the need for efficient computational solutions to manage and analyze this wealth of structural data. In our work, we comprehensively examine the structural clusters obtained from the AlphaFold Protein Structure Database (AFDB), a high-quality subset of ESMAtlas, and the Microbiome Immunity Project (MIP). We create a single cohesive low-dimensional representation of the resulting protein space. Our results show that, while each database occupies distinct regions within the protein structure space, they collectively exhibit significant overlap in their functional potential. High-level biological functions tend to cluster in particular regions, revealing a shared functional landscape despite the diverse sources of data. To facilitate exploration and improve access to our data, we developed an open-access web server. Our findings lay the groundwork for more in-depth studies concerning protein sequence-structure-function relationships, where various biological questions can be asked about taxonomic assignments, environmental factors, or functional specificity. Competing Interest Statement: The authors have declared no competing interest.},
	journal = {bioRxiv},
	author = {Szczerbiak, Paweł and Szydlowski, Lukasz and Wydmański, Witold and Renfrew, P. Douglas and Koehler Leman, Julia and Kosciolek, Tomasz},
	year = {2024},
	publisher = {Cold Spring Harbor Laboratory},
	eprint = {https://www.biorxiv.org/content/early/2024/08/17/2024.08.14.607935.full.pdf},
}
```

Contact

For questions or support, please contact: wwydmanski@gmail.com
