NGS data from: Deploying synthetic coevolution and machine learning to engineer protein-protein interactions
Data files
Jul 21, 2023 version files 17.48 MB
- HL1_naive.csv
- HL1_R2.csv
- HL1_R3.csv
- HL1_R4.csv
- HL2_naive.csv
- HL2_R2.csv
- HL2_R3.csv
- HL2_R4.csv
- HL2_R5.csv
- LL1_Naive.csv
- LL1_R2.csv
- LL1_R4.csv
- LL1_R5.csv
- LL1_R6.csv
- LL1_R7.csv
- LL1_R8.csv
- LL2_Naive.csv
- LL2_R2.csv
- LL2_R4.csv
- LL2_R5.csv
- LL2_R6.csv
- LL2_R7.csv
- LL2_R8.csv
- README.md
Abstract
Fine-tuning of protein-protein interactions occurs naturally through coevolution, but this process is difficult to recapitulate in the laboratory. We describe a synthetic platform for protein-protein coevolution that can isolate matched pairs of interacting muteins from complex libraries. This large dataset of coevolved complexes drove a systems-level analysis of molecular recognition between Z domain-affibody pairs spanning a wide range of structures, affinities, cross-reactivities, and orthogonalities, and captured a broad spectrum of coevolutionary networks. Furthermore, we harnessed pre-trained protein language models to expand, in silico, the amino acid diversity of our coevolution screen, predicting remodeled interfaces beyond the reach of the experimental library. The integration of these approaches provides a means of generating protein complexes with diverse molecular recognition properties as tools for biotechnology and synthetic biology.
Methods
The Z domain-affibody coevolution library data from the paper were obtained on an Illumina MiSeq sequencer. The raw next-generation sequencing (NGS) reads were parsed to extract the sequences at the library positions of each Z-A and Z-B pair. In each uploaded CSV file, the first column holds the Z-A sequence, the second column holds the Z-B sequence, and the third column holds the corresponding read count. The files therefore list the specific sequence pairs present in the coevolution library together with their abundance, measured as read counts. The study used this dataset to investigate coevolution patterns between Z-A and Z-B and to examine how the interplay of mutations affects protein-protein interactions.
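The three-column layout described above can be read with a short script. The following is a minimal sketch and not the authors' pipeline: the function names `read_pair_counts` and `to_frequencies` are hypothetical, the example sequences are made up, and because this record does not say whether the files carry a header row, the sketch assumes that any row whose third field is not an integer can be skipped.

```python
import csv
import io
from collections import Counter


def read_pair_counts(handle):
    """Return a Counter mapping (Z-A sequence, Z-B sequence) to read count.

    Assumes the column order described in Methods: Z-A, Z-B, read count.
    Rows whose third field is not an integer (for example a header row,
    if one is present) are skipped; this is an assumption of the sketch.
    """
    counts = Counter()
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        z_a, z_b, raw_count = (field.strip() for field in row[:3])
        try:
            count = int(raw_count)
        except ValueError:
            continue  # assumed header row or non-numeric count
        counts[(z_a, z_b)] += count
    return counts


def to_frequencies(counts):
    """Convert raw read counts to fractions of the total reads in one file."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


# Made-up example data standing in for one selection-round file.
example = io.StringIO("seq_a,seq_b,count\nAAKL,EEQR,30\nAAKL,DDQR,10\n")
pair_counts = read_pair_counts(example)
pair_freqs = to_frequencies(pair_counts)
```

For a real file, the same function can be given an open file handle, for example `open("HL1_R2.csv", newline="")`. Normalizing to frequencies is shown only because read depth may differ between files; the filtering and analysis actually used are those described in the paper.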
Usage notes
All further data processing, filtering, and analysis steps were performed as described in the paper. Data and code for the MI and coevolution analyses and for the deep learning model are available at https://github.com/akds/CoevolveML.