This is the code repository for ESMDynamic: Fast and Accurate Prediction of Protein Dynamic Contact Maps from Single Sequences. This repository is based on Evolutionary Scale Modeling, which has been archived.
If you wish to use the model to predict a small number of sequences, we recommend you simply use our Google Colab Notebook with manual sequence entry.
Otherwise, building a Docker image from the Dockerfile is the simplest way to get started. Within the container, run_esmdynamic can predict sequences in batches from a FASTA or CSV file via the `--fasta` or `--csv` flags.
We recommend using the Dockerfile method to create an image with all required packages. Due to package deprecations, it may be difficult to install all requirements in a Python (e.g., Conda) environment. Additionally, the Docker setup process conveniently downloads the model weights. The only downside is that the Docker image is relatively large (~20 GB).
First, make sure you have installed Docker.
Since a GPU is recommended to run the model, you should have installed the NVIDIA Container Toolkit as well.
Next, run the following commands:

```shell
git clone https://github.com/ShuklaGroup/esmdynamic.git  # Clone repo
cd esmdynamic
docker build -t esmdynamic .
docker run --rm -it --gpus all -v "$PWD":/workspace esmdynamic  # Run container in current dir with GPU access
run_esmdynamic -h  # Print help for prediction script
```

Alternatively, to set up an environment manually, install Conda if it is not already available. Then create an environment and install the required packages (this setup uses Python 3.11, CUDA 12.9, and torch 2.8.0):
```shell
conda create -n esmdynamic python=3.11.13
conda activate esmdynamic
conda install -c nvidia cuda-nvcc=12.9.86 cuda-toolkit=12.9.1
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip install mdtraj scipy omegaconf pytorch_lightning biopython ml_collections einops py3Dmol modelcif matplotlib plotly[express] dm-tree tensorboard
pip install git+https://github.com/NVIDIA/dllogger.git
pip install --no-build-isolation 'git+https://github.com/sokrypton/openfold.git'  # Use the ColabFold fork!
pip install git+https://github.com/ShuklaGroup/esmdynamic.git
```

You can then run the run_esmdynamic script for inference:

```shell
run_esmdynamic -h  # Print docs; weights are downloaded when needed
```

The predict.py script is the implementation behind the run_esmdynamic executable. These are its docs:
```text
usage: run_esmdynamic [-h] (--sequence SEQUENCE | --fasta FASTA | --csv CSV) [--batch_size BATCH_SIZE] [--chunk_size CHUNK_SIZE]
                      [--device {cpu,cuda}] [--output_dir OUTPUT_DIR] [--chain_ids CHAIN_IDS] [--low_memory] [--save_html]
                      [--save_png] [--save_txt] [--save_raw_pt] [--num_recycles NUM_RECYCLES]

Predict dynamic contacts, frequency, and kinetics using ESMDynamic.

options:
  -h, --help            show this help message and exit
  --sequence SEQUENCE   Single sequence string.
  --fasta FASTA         Path to FASTA file with sequences.
  --csv CSV             CSV file with sequences (first column ID, second column sequence).
  --batch_size BATCH_SIZE
                        Batch size.
  --chunk_size CHUNK_SIZE
                        Model chunk size.
  --device {cpu,cuda}   Device to use.
  --output_dir OUTPUT_DIR
                        Directory where outputs will be written.
  --chain_ids CHAIN_IDS
                        Chain IDs to use for labels (e.g. ABCDEF). Default: A-Z.
  --low_memory          Use low-memory inference mode.
  --save_html           Also save interactive HTML heatmaps.
  --save_png            Save PNG heatmaps/plots.
  --save_txt            Save text/CSV outputs.
  --save_raw_pt         Save a .pt bundle with all cropped outputs for each sequence.
  --num_recycles NUM_RECYCLES
                        Optional number of recycles to pass to the model.
```
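The `--sequence`, `--fasta`, and `--csv` options form a mutually exclusive group: exactly one must be given. As an illustration only (this is not the repo's actual predict.py, and the default values shown are assumptions), the skeleton of such an interface in argparse looks like:

```python
import argparse

# Minimal reconstruction of the run_esmdynamic interface (illustrative only;
# defaults here are placeholders, not the script's real defaults).
parser = argparse.ArgumentParser(prog="run_esmdynamic")
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--sequence", help="Single sequence string.")
group.add_argument("--fasta", help="Path to FASTA file with sequences.")
group.add_argument("--csv", help="CSV file with sequences.")
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--device", choices=["cpu", "cuda"], default="cuda")

args = parser.parse_args(["--sequence", "MKTAYIAKQR", "--device", "cpu"])
print(args.sequence, args.device)
```

Passing two input options at once (e.g. both `--sequence` and `--csv`) is rejected with a usage error.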
With FASTA input, the record headers are used as protein IDs. With CSV input, the first row contains column headers, the first column contains protein IDs, and the second column contains the protein sequences.
Use `:` to separate the chains of a multi-chain sequence (unless you are using the Colab Notebook, in which case use `/`).
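For example, a CSV input with one single-chain entry and one two-chain entry can be written as below (the IDs and sequences are placeholders, not examples from the paper):

```python
import csv, os, tempfile

# Hypothetical IDs and placeholder sequences.
rows = [
    ("id", "sequence"),                                  # header row (first row)
    ("protA", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),      # single chain
    ("dimerB", "MKTAYIAKQR:GLIEVQAPILS"),                # two chains separated by ':'
]
path = os.path.join(tempfile.mkdtemp(), "inputs.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Read it back to check the layout.
with open(path, newline="") as f:
    parsed = list(csv.reader(f))
print(parsed[2][1].split(":"))  # chains of the multi-chain entry
```

Such a file would then be passed with `run_esmdynamic --csv inputs.csv`.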
To recreate the dynamic contact maps in our publication, use either of the files in examples:

```shell
run_esmdynamic --csv example.csv --output_dir example
```

To interpret the output, please see the next section.
Depending on your system's memory, you can adjust the default values of `--batch_size` or `--chunk_size` to trade speed against VRAM usage. You can also try the `--low_memory` flag, which runs each head sequentially instead of in parallel, at a considerable speed cost.
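For instance, a memory-constrained invocation could combine all three options. The sketch below only assembles the command line (the values 1 and 64 are illustrative, not the script's defaults; the right settings depend on your GPU):

```python
import shlex

# Illustrative low-VRAM settings; tune for your hardware.
cmd = [
    "run_esmdynamic",
    "--csv", "inputs.csv",
    "--batch_size", "1",   # fewer sequences per forward pass
    "--chunk_size", "64",  # smaller chunks inside the model
    "--low_memory",        # run heads sequentially (slower, less VRAM)
    "--output_dir", "out",
]
print(shlex.join(cmd))
```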
For a detailed breakdown of model outputs, please read our accompanying documentation: ESMDynamic Output Interpretation
If you use the run_esmdynamic script or the Colab Notebook, you will obtain interactive HTML files that make visualization easier. Open the file(s) in a browser; features include zooming in and taking screen captures.
The ESMDynamic model weights are available at the Illinois Data Bank under DOI:10.13012/B2IDB-3773897_V2. Note that you must still obtain the ESMFold weights to run the model. A simple way to download the weights is with:

```python
import esm

model = esm.pretrained.esmdynamic()
```

The weights will be stored in the path given by torch.hub.get_dir().
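If you want to know where the weights will land without importing torch, torch.hub.get_dir() resolves by default to $TORCH_HOME/hub, falling back to $XDG_CACHE_HOME/torch/hub and then ~/.cache/torch/hub. The sketch below reimplements that default resolution with the standard library only (it will not match an install that has customized the hub directory via torch.hub.set_dir()):

```python
import os

def default_torch_hub_dir() -> str:
    # Mirrors torch.hub.get_dir()'s documented default resolution:
    # $TORCH_HOME/hub, else $XDG_CACHE_HOME/torch/hub, else ~/.cache/torch/hub
    torch_home = os.environ.get("TORCH_HOME")
    if torch_home is None:
        cache = os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
        torch_home = os.path.join(cache, "torch")
    return os.path.join(torch_home, "hub")

print(default_torch_hub_dir())
```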
Three datasets are available at DOI:10.13012/B2IDB-3773897_V2. Follow the instructions in the README at the Data Bank to convert the files to the format needed for training. Each directory contains information about the data splits (list of identifiers in CSV format).
| Dataset Name | Original Data Source | Related Publication |
|---|---|---|
| ATLAS (Test Set) | ATLAS Database | ATLAS |
| mdCATH | mdCATH Dataset | mdCATH |
| RCSB Clusters | RCSB | RCSB |
**Warning:** Datasets expand into large directories (>20 GB).
You can access predictions for most proteins in the human proteome (UniProt Proteome ID UP000005640) in the data repository. See this table to find which archive fragment contains the predictions you need.
These instructions apply to training on mdCATH. First, download and convert the required dataset from DOI:10.13012/B2IDB-3773897_V2 following the README from the Data Bank. Then you can use the train.py script from this repository. You will need to write a file with training parameters, named something like train_params.txt, for example (to fit the kinetics heads only):
```text
--loss_heads=kinetic_logits,kinetic_confidence
--kin_class_weights=kinetic_weights.pt # Class weights, bundled with dataset
--train_identifiers_file=train.csv
--val_identifiers_file=val.csv
--data_dir=./mdcath/
--outpath=./training_data_kinetics
--batch_size=4
--batch_accum=16
--epochs=100
--train_samples_per_epoch=1000
--val_samples_per_epoch=100
--alpha=0.85
--gamma=2
--device=cuda
--lr=0.001
--pretrained=previous_weights_kinetics.pt # Path to a full state dict
```
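This one-flag-per-line layout is the @file convention of argparse. A minimal, self-contained illustration of the mechanism is shown below; the flags here are a toy subset (see train.py for the real options), and note that plain argparse does not strip inline `#` comments, so the training script presumably handles those itself:

```python
import argparse, os, tempfile

# Toy parameter file mirroring the one-flag-per-line layout above.
path = os.path.join(tempfile.mkdtemp(), "train_params.txt")
with open(path, "w") as f:
    f.write("--batch_size=4\n--epochs=100\n--lr=0.001\n")

# fromfile_prefix_chars='@' lets '@file' expand into the file's lines.
parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
parser.add_argument("--batch_size", type=int)
parser.add_argument("--epochs", type=int)
parser.add_argument("--lr", type=float)
args = parser.parse_args(["@" + path])
print(args.batch_size, args.epochs, args.lr)
```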
Then, training can be run with:
```shell
python esm/esmdynamic/training/train.py @train_params.txt
```

If you use this code or its related datasets, please cite:
```bibtex
@article{Kleiman2025.08.20.671365,
  author = {Kleiman, Diego E and Feng, Jiangyan and Xue, Zhengyuan and Shukla, Diwakar},
  title = {ESMDynamic: A Fast and Accurate Prediction of Protein Dynamic Contact Maps from Single Sequences},
  elocation-id = {2025.08.20.671365},
  year = {2025},
  doi = {10.1101/2025.08.20.671365},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2025/08/24/2025.08.20.671365},
  eprint = {https://www.biorxiv.org/content/early/2025/08/24/2025.08.20.671365.full.pdf},
  journal = {bioRxiv}
}
```

You should also include citations to the related publications if appropriate:
Code is shared under the MIT License.
Code from ESM is also shared under the MIT License (see THIRD_PARTY_NOTICES.txt).

