
ESMDynamic


This is the code repository for ESMDynamic: Fast and Accurate Prediction of Protein Dynamic Contact Maps from Single Sequences. This repository is based on Evolutionary Scale Modeling, which has been archived.



Usage

Quick Start

If you only need predictions for a small number of sequences, we recommend simply using our Google Colab Notebook with manual sequence entry.

Otherwise, building a Docker image from the Dockerfile is the simplest way to get started. Within the container, run_esmdynamic can predict sequences in batches from a FASTA or CSV file via the --fasta or --csv flags.

Installation

We recommend using the Dockerfile method to create an image with all required packages. Due to package deprecations, it may be difficult to install all requirements in a plain Python (e.g., Conda) environment; in addition, the Docker setup conveniently downloads the model weights. The only downside is that the Docker image requires relatively more disk space (~20 GB).

Docker

First, make sure you have installed Docker.

Since a GPU is recommended to run the model, you should have installed the NVIDIA Container Toolkit as well.

Next, follow the commands:

git clone https://github.com/ShuklaGroup/esmdynamic.git # Clone repo
cd esmdynamic
docker build -t esmdynamic .
docker run --rm -it --gpus all -v "$PWD":/workspace esmdynamic # Run container in current dir w/GPU access
run_esmdynamic -h # Print help for prediction script 

Conda

Install Conda if it is not available. Create an environment and install the packages (tested with Python 3.11, CUDA 12.9, and torch 2.8.0):

conda create -n esmdynamic python=3.11.13
conda activate esmdynamic
conda install -c nvidia cuda-nvcc=12.9.86 cuda-toolkit=12.9.1
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip3 install mdtraj scipy omegaconf pytorch_lightning biopython ml_collections einops py3Dmol modelcif matplotlib plotly[express] dm-tree tensorboard
pip3 install git+https://github.com/NVIDIA/dllogger.git
pip3 install --no-build-isolation 'git+https://github.com/sokrypton/openfold.git' # Use the ColabFold fork!
pip install git+https://github.com/ShuklaGroup/esmdynamic.git

You can then run the run_esmdynamic script for inference:

run_esmdynamic -h # Print docs, will download weights when needed

Bulk Prediction

The predict.py script implements the run_esmdynamic executable. Its usage is:

usage: run_esmdynamic [-h] (--sequence SEQUENCE | --fasta FASTA | --csv CSV) [--batch_size BATCH_SIZE] [--chunk_size CHUNK_SIZE] [--device {cpu,cuda}] [--output_dir OUTPUT_DIR]
                      [--chain_ids CHAIN_IDS] [--low_memory] [--save_html] [--save_png] [--save_txt] [--save_raw_pt] [--num_recycles NUM_RECYCLES]

Predict dynamic contacts, frequency, and kinetics using ESMDynamic.

options:
  -h, --help            show this help message and exit
  --sequence SEQUENCE   Single sequence string.
  --fasta FASTA         Path to FASTA file with sequences.
  --csv CSV             CSV file with sequences (first column ID, second column sequence).
  --batch_size BATCH_SIZE
                        Batch size.
  --chunk_size CHUNK_SIZE
                        Model chunk size.
  --device {cpu,cuda}   Device to use.
  --output_dir OUTPUT_DIR
                        Directory where outputs will be written.
  --chain_ids CHAIN_IDS
                        Chain IDs to use for labels (e.g. ABCDEF). Default: A-Z.
  --low_memory          Use low-memory inference mode.
  --save_html           Also save interactive HTML heatmaps.
  --save_png            Save PNG heatmaps/plots.
  --save_txt            Save text/CSV outputs.
  --save_raw_pt         Save a .pt bundle with all cropped outputs for each sequence.
  --num_recycles NUM_RECYCLES
                        Optional number of recycles to pass to the model.

With FASTA input, the headers are used as protein IDs. With CSV input, the first row contains column headers, the first column contains protein IDs, and the second column contains the protein sequences.
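As a reference for the expected FASTA input, here is a minimal reader (a hypothetical helper for illustration, not part of the repository) that recovers the (ID, sequence) pairs described above:

```python
# Minimal FASTA reader: each ">" header becomes a protein ID, and the
# following lines are concatenated into its sequence.
def read_fasta(path):
    records = []
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```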

Use : to separate chains in multi-chain sequences (in the Colab Notebook, use / instead).
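Scripting the CSV input is straightforward: one header row, then (ID, sequence) rows, with chains joined by ":". The file name, IDs, and sequences below are placeholders, not examples from the repository:

```python
import csv

# Build an input CSV for run_esmdynamic: header row, then
# (protein ID, sequence) rows; chains are joined with ":".
records = [
    ("monomer_example", "MKTAYIAKQRQISFVK"),
    ("dimer_example", ":".join(["MKTAYIAKQRQISFVK", "GHMASLE"])),
]
with open("input.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["id", "sequence"])
    writer.writerows(records)
```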

To recreate the dynamic contact maps in our publication, use either of the files in the examples directory:

run_esmdynamic --csv example.csv --output_dir example

To interpret the output, please see the next section.

Depending on your system's memory, you can adjust the defaults for --batch_size or --chunk_size to trade speed against VRAM. You can also try the --low_memory flag, which runs each head sequentially instead of in parallel, at a considerable cost in speed.

Output Interpretation

For a detailed breakdown of model outputs, please read our accompanying documentation: ESMDynamic Output Interpretation

Visualization

If you use the run_esmdynamic script or the Colab Notebook, you will obtain interactive HTML files that make visualization easier. Open them in a browser; functionality includes zooming and taking screen captures.


Available Models and Datasets

Pretrained Model

The ESMDynamic model weights are available at the Illinois Data Bank under DOI:10.13012/B2IDB-3773897_V2. Note that you must still obtain the ESMFold weights to run the model. A simple way to download the weights is:

import esm
model = esm.pretrained.esmdynamic()

Weights will be found in the path given by torch.hub.get_dir().
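If you prefer not to import torch just to locate that directory, its documented defaults can be reproduced with the standard library (a sketch that mirrors torch.hub.get_dir() under default settings: $TORCH_HOME/hub if set, otherwise ~/.cache/torch/hub):

```python
import os

def default_torch_hub_dir():
    # Mirrors torch.hub.get_dir() defaults: $TORCH_HOME/hub if set,
    # otherwise $XDG_CACHE_HOME/torch/hub, falling back to
    # ~/.cache/torch/hub.
    cache_root = os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
    torch_home = os.environ.get("TORCH_HOME", os.path.join(cache_root, "torch"))
    return os.path.join(torch_home, "hub")

print(default_torch_hub_dir())
```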

Datasets

Three datasets are available at DOI:10.13012/B2IDB-3773897_V2. Follow the instructions in the README at the Data Bank to convert the files to the format needed for training. Each directory contains information about the data splits (list of identifiers in CSV format).

Dataset Name     | Original Data Source | Related Publication
ATLAS (Test Set) | ATLAS Database       | ATLAS
mdCATH           | mdCATH Dataset       | mdCATH
RCSB Clusters    | RCSB                 | RCSB
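The split files mentioned above are plain CSV lists of identifiers; a loading sketch (whether a given file has a header row is dataset-specific, so this is an assumption to check against the Data Bank README):

```python
import csv

def load_split_ids(path):
    # Read a data-split file: each row's first field is taken as an
    # identifier. If the file has a header row, drop the first entry.
    with open(path) as fh:
        return [row[0] for row in csv.reader(fh) if row]
```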

Warning: datasets expand into large directories (>20 GB).

Human Proteome

You can access predictions for most proteins in the human proteome (UniProt Proteome ID UP000005640) in the data repository. See this table to find which archive fragment contains the predictions you need.

Training

These instructions apply to training on mdCATH. First, download and convert the required dataset from DOI:10.13012/B2IDB-3773897_V2, following the README from the Data Bank. Then use the train.py script from this repository. You will need to write a file with training parameters, e.g., train_params.txt; the example below fits the kinetics heads only:

--loss_heads=kinetic_logits,kinetic_confidence
--kin_class_weights=kinetic_weights.pt # Class weights, bundled with dataset
--train_identifiers_file=train.csv
--val_identifiers_file=val.csv
--data_dir=./mdcath/
--outpath=./training_data_kinetics
--batch_size=4
--batch_accum=16
--epochs=100
--train_samples_per_epoch=1000
--val_samples_per_epoch=100
--alpha=0.85
--gamma=2
--device=cuda
--lr=0.001
--pretrained=previous_weights_kinetics.pt # Path to a full state dict

Then, training can be run with:

python esm/esmdynamic/training/train.py @train_params.txt
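The @train_params.txt syntax suggests argparse's standard arguments-from-file mechanism, so the parameter file can also be generated programmatically. The values below simply mirror the example above; the writer itself is a convenience sketch, not part of the repository:

```python
# Generate train_params.txt from a dict of training options,
# one "--key=value" flag per line.
params = {
    "loss_heads": "kinetic_logits,kinetic_confidence",
    "train_identifiers_file": "train.csv",
    "val_identifiers_file": "val.csv",
    "data_dir": "./mdcath/",
    "outpath": "./training_data_kinetics",
    "batch_size": 4,
    "batch_accum": 16,
    "epochs": 100,
    "device": "cuda",
    "lr": 0.001,
}
with open("train_params.txt", "w") as fh:
    for key, value in params.items():
        fh.write(f"--{key}={value}\n")
```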

Citations

If you use this code or its related datasets, please cite:

@article {Kleiman2025.08.20.671365,
	author = {Kleiman, Diego E and Feng, Jiangyan and Xue, Zhengyuan and Shukla, Diwakar},
	title = {ESMDynamic: A Fast and Accurate Prediction of Protein Dynamic Contact Maps from Single Sequences},
	elocation-id = {2025.08.20.671365},
	year = {2025},
	doi = {10.1101/2025.08.20.671365},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2025/08/24/2025.08.20.671365},
	eprint = {https://www.biorxiv.org/content/early/2025/08/24/2025.08.20.671365.full.pdf},
	journal = {bioRxiv}
}

You should also cite the related publications (listed in the Datasets table above) where appropriate.

License

Code is shared under the MIT License.

Code from ESM is also shared under the MIT License (see THIRD_PARTY_NOTICES.txt).
