Generalized biomolecular modeling and design with RoseTTAFold all-atom
Data files: version of Mar 11, 2024 (23.20 MB)
Abstract
Although AlphaFold2 (AF2) and RoseTTAFold (RF) have transformed structural biology by enabling high-accuracy protein structure modeling, they are unable to model covalent modifications or interactions with small molecules and other non-protein molecules that can play key roles in biological function. Here, we describe RoseTTAFold All-Atom (RFAA), a deep network capable of modeling full biological assemblies containing proteins, nucleic acids, small molecules, metals, and covalent modifications given the sequences of the polymers and the atomic bonded geometry of the small molecules and covalent modifications. Following training on structures of full biological assemblies in the Protein Data Bank (PDB), RFAA has comparable protein structure prediction accuracy to AF2, excellent performance in CAMEO for flexible backbone small molecule docking, and reasonable prediction accuracy for protein covalent modifications and assemblies of proteins with multiple nucleic acid chains and small molecules which, to our knowledge, no existing method can model simultaneously. By fine-tuning on diffusive denoising tasks, we develop RFdiffusion All-Atom (RFdiffusionAA), which generates binding pockets by directly building protein structures around small molecules and other non-protein molecules. Starting from random distributions of amino acid residues surrounding target small molecules, we design and experimentally validate proteins that bind the cardiac disease therapeutic digoxigenin, the enzymatic cofactor heme, and optically active bilin molecules with potential for expanding the range of wavelengths captured by photosynthesis. We anticipate that RFAA and RFdiffusionAA will be widely useful for modeling and designing complex biomolecular systems.
README: Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom
https://doi.org/10.5061/dryad.mcvdnck6v
Description of the data and file structure
This is a deposition of data for the manuscript "Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom", which describes two methods:
- A structure prediction method that can accept proteins, nucleic acids, small molecules, and covalent modifications (RoseTTAFold All-Atom).
- A generative model that builds protein structures around small molecules (RFdiffusion All-Atom).
This data deposition contains the following folders:
A. fig2_structures, which contains the PyMOL sessions used to make the images in Figure 2 of the manuscript. These are examples of predicted complexes of proteins with small molecules.
B. fig3_structures, which contains the PyMOL sessions used to make the images in Figure 3 of the manuscript. These are examples of predicted covalent modifications to proteins.
C. posebusters, which contains the per-target performance of RFAA on the PoseBusters benchmark set, which was used to compare RFAA with other methods. The CSV file in the posebusters directory was generated by running the PoseBusters suite on our predicted structures (as detailed in the manuscript). We chose not to edit this file to remove the empty cells created by the automated software, in order to maintain the integrity of the results.
D. training_set_sequences, which contains all sequences that could potentially have been seen during training.
E. dig_designs, which contains PDB files of the design models for all experimentally characterized digoxigenin binders designed by RFdiffusion All-Atom.
F. heme_designs, which contains PDB files of the design models for all experimentally characterized heme binders designed by RFdiffusion All-Atom.
G. bilin_designs, which contains PDB files of the design models for all experimentally characterized bilin binders designed by RFdiffusion All-Atom.
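To see which non-protein components a design model in folders E, F, and G contains, the HETATM records of its PDB file can be scanned. The following is a minimal sketch, not part of this deposition: `ligand_residues` is a hypothetical helper that relies only on the standard fixed-column PDB layout. It assumes that ligands are written as HETATM records (not verified for every file here) and makes no assumption about the ligand residue names used.

```python
from collections import Counter
from pathlib import Path


def ligand_residues(pdb_text: str, skip=("HOH",)) -> Counter:
    """Count distinct HETATM residues by name in PDB-format text.

    Hypothetical helper. Assumes ligands are stored as HETATM records and
    uses fixed PDB columns: residue name in columns 18-20, chain ID in
    column 22, residue number in columns 23-26 (1-based).
    """
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()
        if res_name in skip:  # ignore waters by default
            continue
        chain = line[21:22]
        res_seq = line[22:26].strip()
        seen.add((res_name, chain, res_seq))
    # One count per (name, chain, number) residue, not one per atom
    return Counter(name for name, _, _ in seen)


# Example over one design folder; does nothing if the folder is absent.
design_dir = Path("dig_designs")
if design_dir.is_dir():
    for pdb_path in sorted(design_dir.glob("*.pdb")):
        print(pdb_path.name, dict(ligand_residues(pdb_path.read_text())))
```

Counting unique (name, chain, number) triples rather than atoms means a multi-atom ligand such as heme is reported once per copy.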
During the revision process, we also included the PDB validation files confirming that the solved crystal structure had been deposited in the PDB; we have now removed them because the structure is publicly accessible.
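Because the PoseBusters CSV in the posebusters folder was left unedited, some of its cells are empty. The following is a minimal sketch, not part of this deposition: `summarize_check` is a hypothetical helper that tallies one check column and counts empty cells as missing rather than as failures. It assumes that boolean checks are written as `True`/`False`. The column names in any example are placeholders, so read the file's real header first. The last lines print that header if the folder is present.

```python
import csv
import io
from pathlib import Path


def summarize_check(csv_text: str, column: str) -> dict:
    """Tally pass/fail/missing values in one boolean column of a CSV.

    Hypothetical helper. Assumes checks are encoded as "True"/"False";
    any other value (including empty cells or an absent column) is
    counted as missing, so the file can be read without editing it.
    """
    counts = {"pass": 0, "fail": 0, "missing": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # row.get returns None for short rows or an unknown column
        value = (row.get(column) or "").strip().lower()
        if value == "true":
            counts["pass"] += 1
        elif value == "false":
            counts["fail"] += 1
        else:
            counts["missing"] += 1
    return counts


# Print the header of each CSV so that real column names can be chosen.
results_dir = Path("posebusters")
if results_dir.is_dir():
    for csv_path in sorted(results_dir.glob("*.csv")):
        with csv_path.open(newline="") as fh:
            print(csv_path.name, next(csv.reader(fh), []))
```

Keeping empty cells as a separate "missing" count leaves it to the reader whether to exclude those targets or treat them as failures.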