generatebio/lock_gp

Flexible Kernels for Protein Property Prediction

Martin Jankowiak ⋅ Yerdos Ordabayev ⋅ Rudraksh Tuwani ⋅ Henry N. Ward ⋅ Hunter Nisonoff ⋅ James M. McFarland ⋅ Gevorg Grigoryan

arXiv preprint


Paper Abstract

Despite its importance to applications in protein design, predicting protein properties like binding affinity and thermostability from sparse experimental data remains a significant challenge. Accordingly, we introduce a class of sequence kernels that exploit evolutionary substitution matrices as well as local linearity and demonstrate that the resulting Gaussian processes provide data-efficient models of protein property landscapes, frequently outperforming alternatives that rely on foundation model embeddings. Furthermore—by learning what are in effect structure-aware substitution matrices—we show that our kernels can readily incorporate structural information from foundation models. We demonstrate that these structure-conditioned kernels are well suited to multi-task learning across multiple protein property landscapes and can decisively outperform local supervised learning methods.

Repo contents

This repo contains a GPyTorch implementation of the LOCK GP kernel as well as a demo on CR6261-H1 antibody fitness data.
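
To give a rough feel for what a substitution-aware sequence kernel is, here is a minimal pure-Python sketch. It is not the LOCK GP kernel and does not use the repo's GPyTorch code: the residue features, the four-letter alphabet, and the function names `residue_similarity`, `sequence_kernel`, and `gram_matrix` are all hypothetical and chosen only for illustration. The sketch assumes aligned sequences of equal length and scores a pair by averaging a per-site residue similarity.

```python
import math

# Toy residue features (made-up illustration values, NOT a real substitution
# matrix such as BLOSUM and NOT what LOCK GP learns).
FEATURES = {
    "A": (0.6, 0.0),
    "V": (1.0, 0.0),
    "D": (-1.0, -1.0),
    "E": (-0.9, -1.0),
}


def residue_similarity(a: str, b: str) -> float:
    """RBF similarity between two residues based on their toy features."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(FEATURES[a], FEATURES[b]))
    return math.exp(-0.5 * sq_dist)


def sequence_kernel(x: str, y: str) -> float:
    """Average per-site residue similarity for two aligned sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(residue_similarity(a, b) for a, b in zip(x, y)) / len(x)


def gram_matrix(seqs):
    """Kernel matrix over a list of sequences, as a list of lists."""
    return [[sequence_kernel(x, y) for y in seqs] for x in seqs]


if __name__ == "__main__":
    wild_type = "AVDE"
    conservative = "VVDE"  # A -> V: similar residues under the toy features
    radical = "DVDE"       # A -> D: dissimilar residues under the toy features
    print(sequence_kernel(wild_type, conservative))
    print(sequence_kernel(wild_type, radical))
```

Because each per-site similarity is an RBF kernel on residue features and the sequence kernel is an average of such terms, the resulting Gram matrix is positive semi-definite and could in principle be used as a Gaussian process covariance. Under these toy features, a conservative substitution leaves a variant closer to the wild type than a radical one.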

Setup

Python 3.10 or later is required. We recommend uv for easy and reproducible installation.

git clone git@github.com:generatebio/lock_gp.git
cd lock_gp
uv python install
uv sync

This repo was tested with PyTorch 2.6.

Citation

If you use LOCK GP, please consider citing our paper:

@inproceedings{jankowiak2026flexible,
  title={Flexible Kernels for Protein Property Prediction},
  author={Jankowiak, Martin and Ordabayev, Yerdos and Tuwani, Rudraksh and Ward, Henry N. and Nisonoff, Hunter and McFarland, James M. and Grigoryan, Gevorg},
  booktitle={Proceedings of the 43rd International Conference on Machine Learning},
  year={2026},
  series={Proceedings of Machine Learning Research},
  publisher={PMLR}
}
