Structural models and Sort-seq data for: Packing of apolar amino acids is not a strong stabilizing force in transmembrane helix dimerization
Data files
Sep 03, 2025 version (12.32 MB total)
- Count_selection.csv (859.24 KB)
- Count_sort.csv (572.52 KB)
- Percentage_selection.csv (1.85 MB)
- Percentage_sort.csv (1.23 MB)
- README.md (11.42 KB)
- refSeqs.csv (700.13 KB)
- structural_models.zip (7.11 MB)
Abstract
The factors that stabilize the folding and oligomerization of membrane proteins are still not well understood. In particular, it remains unclear how the tight and complementary packing between apolar side chains observed in the core of membrane proteins contributes to their stability. Complementary packing is a necessary feature since packing defects are generally destabilizing for membrane proteins. The question is the extent to which packing of apolar side chains – and the resulting van der Waals interactions – is a sufficient driving force for stabilizing the interaction between transmembrane helices in the absence of hydrogen bonding and polar interactions. We addressed this question with an approach based on high-throughput protein design and the homodimerization of single-pass helices as the model system. We designed hundreds of transmembrane helix dimers mediated by apolar packing in the backbone configurations that are most commonly found in membrane proteins. We assessed the association propensity of the designs in the membrane of Escherichia coli and found that they were most often monomeric or, at best, weakly dimeric. Conversely, a set of controls designed in the backbone configuration of the GASright motif, which is mediated by weak hydrogen bonds, displayed significantly higher dimerization propensity. The data suggest that packing of apolar side chains and van der Waals interactions are a relatively weak force in driving transmembrane helix dimerization. They also confirm that GASright is a special configuration for achieving stability in membrane proteins.
This repository contains data related to the bioRxiv article by Loiseau and Senes, https://doi.org/10.1101/2025.04.26.649789:
- Structural model of designed dimers (PDB files)
- Sort-seq Data
Structural model of dimers (PDB files)
The compressed zip file contains the structural models of the designed transmembrane dimers. Each file name corresponds to a construct listed in Table S1 and has the form [G/R/L]_NNN.pdb, where
- G identifies the GASright dimers
- L identifies the Left dimers
- R identifies the Right dimers
- NNN is an integer serial number
File
structural_models.zip
Code/software
PDB files can be viewed with PyMOL or any other software that reads Protein Data Bank coordinate files.
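The file naming convention above can be checked programmatically before the models are loaded into a viewer. The following is a minimal sketch, not part of the deposited data: it assumes that names follow the [G/R/L]_NNN.pdb pattern exactly, makes no assumption about the number of digits, and uses made-up file names for illustration.

```python
import re
from collections import defaultdict

# One-letter prefix mapped to the design family named in this README.
FAMILIES = {"G": "GASright", "L": "Left", "R": "Right"}

# Matches names such as "G_012.pdb"; the number of digits is not assumed.
MODEL_NAME = re.compile(r"^([GLR])_(\d+)\.pdb$")


def group_models(names):
    """Group model file names by design family, skipping names that do not match."""
    groups = defaultdict(list)
    for name in names:
        base = name.rsplit("/", 1)[-1]  # drop any folder prefix inside the archive
        match = MODEL_NAME.match(base)
        if match:
            prefix, serial = match.groups()
            groups[FAMILIES[prefix]].append(int(serial))
    return {family: sorted(serials) for family, serials in groups.items()}


# Hypothetical names, for illustration only.
example = ["models/G_001.pdb", "L_010.pdb", "R_002.pdb", "G_003.pdb", "notes.txt"]
print(group_models(example))
```

With the real archive, the list returned by `zipfile.ZipFile("structural_models.zip").namelist()` could be passed to `group_models` in place of the example list.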
NGS Data
Raw data from Next Generation Sequencing utilizing the TOXGREEN sort-seq methods as in Anderson et al., 2025 doi: https://doi.org/10.1101/2025.04.22.650048.
Oligo pool segment data
File refSeqs.csv
This CSV file contains the segment in the oligo pool library, the protein sequence, and the DNA sequence of all constructs. These include the designs (supplementary Table S3), their mutants, and the control sequences used to calibrate the TOXGREEN sort-seq runs.
| Column Name | Content |
|---|---|
| Segment number | Number of the segment in which the protein was coded (C=calibration control, P=Positive Maltose Control, N=Negative Maltose Control) |
| Protein sequence | Protein sequence |
| DNA sequence | DNA sequence ordered |
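As a quick sanity check on refSeqs.csv, rows can be tallied by segment type. The sketch below assumes that the CSV header matches the column names in the table above and that control rows carry the single letters C, P, or N in the segment column; both are assumptions that should be verified against the actual file. The sample rows are made up and use placeholder sequences.

```python
import csv
import io
from collections import Counter

# Control codes as described in the column table.
CONTROL_KINDS = {
    "C": "calibration control",
    "P": "positive maltose control",
    "N": "negative maltose control",
}


def segment_kind(label):
    """Return the control kind for C/P/N; treat any other label as a library segment."""
    return CONTROL_KINDS.get(label.strip().upper(), "library segment")


def count_by_kind(handle, column="Segment number"):
    """Count the rows of a refSeqs-style CSV by segment kind."""
    return Counter(segment_kind(row[column]) for row in csv.DictReader(handle))


# Made-up rows with placeholder sequences, for illustration only.
sample = io.StringIO(
    "Segment number,Protein sequence,DNA sequence\n"
    "1,LLLAAALLL,CTGCTGCTG\n"
    "2,LLLGGGLLL,CTGCTGCTG\n"
    "C,AAAAAAAAA,GCGGCGGCG\n"
    "P,GGGGGGGGG,GGCGGCGGC\n"
)
print(count_by_kind(sample))
```

For the real file, `open("refSeqs.csv", newline="")` would replace the in-memory sample.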
Raw TOXGREEN Sort-seq NGS Data
Raw NGS read counts and the corresponding percentages of total reads are provided in the following files:
- Count_sort.csv
- Percentage_sort.csv
These data were used to reconstruct the fluorescence profiles for all design and mutant sequences ("sort" files, Fig. 2C-D and Fig. 3).
File Count_sort.csv
Read counts for each construct in each sort-seq bin
| Column Name | Content |
|---|---|
| Sequence | Protein Sequence |
| Segment | Number of the segment in which the protein was coded (C=Calibration control, P=Positive Maltose Control, N=Negative Maltose Control) |
| G1-Rep1 | Counts for GASright constructs sorted into bin 1, replicate 1 |
| G1-Rep2 | Counts for GASright constructs sorted into bin 1, replicate 2 |
| … | |
| L1-Rep1 | Counts for Left constructs sorted into bin 1, replicate 1 |
| L1-Rep2 | Counts for Left constructs sorted into bin 1, replicate 2 |
| … | |
| R1-Rep1 | Counts for Right constructs sorted into bin 1, replicate 1 |
| R1-Rep2 | Counts for Right constructs sorted into bin 1, replicate 2 |
| … | |
File Percentage_sort.csv
Percent of total reads for each construct in each sort-seq bin
| Column Name | Content |
|---|---|
| Sequence | Protein Sequence |
| Segment | Number of the segment in which the protein was coded (C=Calibration control, P=Positive Maltose Control, N=Negative Maltose Control) |
| G1-Rep1 | Percent total reads for GASright constructs sorted into bin 1, replicate 1 |
| G1-Rep2 | Percent total reads for GASright constructs sorted into bin 1, replicate 2 |
| … | |
| L1-Rep1 | Percent total reads for Left constructs sorted into bin 1, replicate 1 |
| L1-Rep2 | Percent total reads for Left constructs sorted into bin 1, replicate 2 |
| … | |
| R1-Rep1 | Percent total reads for Right constructs sorted into bin 1, replicate 1 |
| R1-Rep2 | Percent total reads for Right constructs sorted into bin 1, replicate 2 |
| … | |
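The relationship between the count and percentage files can be illustrated with a short calculation. The sketch below assumes that each percentage is a construct's read count divided by the sum of the counts in the same column (one bin and replicate), multiplied by 100. Whether the deposited percentages were normalized over exactly the constructs listed in the file is an assumption, so small differences from Percentage_sort.csv are possible. The counts are made up.

```python
def percent_of_total(counts):
    """Convert per-construct read counts for one column into percent of total reads."""
    total = sum(counts.values())
    if total == 0:
        return {seq: 0.0 for seq in counts}  # avoid dividing by zero for empty samples
    return {seq: 100.0 * n / total for seq, n in counts.items()}


# Hypothetical counts for one column (for example G1-Rep1); sequences are placeholders.
column = {"SEQ_A": 50, "SEQ_B": 30, "SEQ_C": 20}
print(percent_of_total(column))  # → {'SEQ_A': 50.0, 'SEQ_B': 30.0, 'SEQ_C': 20.0}
```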
- G# = GASright constructs, sort-seq bin #
- L# = Left constructs, sort-seq bin #
- R# = Right constructs, sort-seq bin #
- Rep# = replicate number
Bin numbers in ascending order correspond to increasing fluorescence intensity.
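Because bin numbers increase with fluorescence intensity, the sort column labels can be parsed and a construct's distribution across bins summarized in one number. The sketch below computes a read-weighted mean bin index. It is only a crude illustration: it ignores the calibration controls and is not the fluorescence reconstruction used for the figures. The column labels follow the pattern described above, while the row values and the number of bins (four here) are made up.

```python
import re

# Matches labels such as "G1-Rep2": family letter, bin number, replicate number.
SORT_COLUMN = re.compile(r"^([GLR])(\d+)-Rep(\d+)$")


def parse_sort_column(name):
    """Split a label such as 'G1-Rep2' into (family letter, bin number, replicate number)."""
    match = SORT_COLUMN.match(name.strip())
    if match is None:
        return None  # for example the Sequence and Segment columns
    family, bin_number, replicate = match.groups()
    return family, int(bin_number), int(replicate)


def mean_bin(row, family, replicate):
    """Read-weighted mean bin index of one construct for one family and replicate."""
    weighted = total = 0.0
    for name, value in row.items():
        parsed = parse_sort_column(name)
        if parsed and parsed[0] == family and parsed[2] == replicate:
            weighted += parsed[1] * float(value)
            total += float(value)
    return weighted / total if total else None


# Hypothetical row with made-up values and an assumed four bins.
row = {"Sequence": "SEQ_A", "Segment": "1",
       "G1-Rep1": "10", "G2-Rep1": "20", "G3-Rep1": "50", "G4-Rep1": "20"}
print(parse_sort_column("G1-Rep2"))  # → ('G', 1, 2)
print(mean_bin(row, "G", 1))         # → 2.8
```

A row read with `csv.DictReader` from Count_sort.csv or Percentage_sort.csv could be passed to `mean_bin` directly, provided the header uses these labels.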
NGS data from MalE Complementation Assay
Raw data from Next Generation Sequencing for the MalE Complementation assay, utilizing the TOXGREEN sort-seq methods as in Anderson et al., 2025 doi: https://doi.org/10.1101/2025.04.22.650048.
Raw read counts and the corresponding relative frequencies (percent of total reads) for each construct are provided in the following files:
- Count_selection.csv
- Percentage_selection.csv
These data were used to select the final set of design constructs ("selection" files, Fig. 2C-D and Fig. 3).
File Count_selection.csv
Counts of constructs from the Maltose test, used for selection
| Column Name | Content |
|---|---|
| Sequence | Protein Sequence |
| Segment | Number of the segment in which the protein was coded (C=Calibration control, P=Positive Maltose Control, N=Negative Maltose Control) |
| G1-LB-0H | Counts for GASright constructs grown in LB media, replicate 1 at 0 hours |
| G1-LB-12H | Counts for GASright constructs grown in LB media, replicate 1 at 12 hours |
| … | |
| G1-M9-30H | Counts for GASright constructs grown in Maltose media, replicate 1 at 30 hours |
| … | |
| L1-LB-0H | Counts for Left constructs grown in LB media, replicate 1 at 0 hours |
| R1-LB-0H | Counts for Right constructs grown in LB media, replicate 1 at 0 hours |
File Percentage_selection.csv
Percent total reads of constructs from the Maltose test, used for selection
| Column Name | Content |
|---|---|
| Sequence | Protein Sequence |
| Segment | Number of the segment in which the protein was coded (C=Calibration control, P=Positive Maltose Control, N=Negative Maltose Control) |
| G1-LB-0H | Percent total reads for GASright constructs grown in LB media, replicate 1 at 0 hours |
| G1-LB-12H | Percent total reads for GASright constructs grown in LB media, replicate 1 at 12 hours |
| … | |
| G1-M9-30H | Percent total reads for GASright constructs grown in Maltose media, replicate 1 at 30 hours |
| … | |
| L1-LB-0H | Percent total reads for Left constructs grown in LB media, replicate 1 at 0 hours |
| R1-LB-0H | Percent total reads for Right constructs grown in LB media, replicate 1 at 0 hours |
| … | |
- G# = GASright replicate number
- L# = Left replicate number
- R# = Right replicate number
- LB = Luria Broth Growth sample
- M9 = Maltose Broth Growth sample
- #H = number of hours of growth
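Note that in the selection files the number after the family letter is the replicate, whereas in the sort files it is the bin. A small parser makes this explicit. The sketch below assumes that every data column follows the pattern of the examples in the tables (family letter, replicate number, LB or M9, hours followed by H); any other column is returned as `None`.

```python
import re

# Matches labels such as "G1-M9-30H": family, replicate, medium, hours of growth.
SELECTION_COLUMN = re.compile(r"^([GLR])(\d+)-(LB|M9)-(\d+)H$")


def parse_selection_column(name):
    """Split a label such as 'G1-LB-12H' into its family, replicate, medium, and hours."""
    match = SELECTION_COLUMN.match(name.strip())
    if match is None:
        return None  # for example the Sequence and Segment columns
    family, replicate, medium, hours = match.groups()
    return {
        "family": family,
        "replicate": int(replicate),
        "medium": medium,
        "hours": int(hours),
    }


for label in ["G1-LB-0H", "G1-M9-30H", "Sequence"]:
    print(label, parse_selection_column(label))
```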
Contact
Alessandro Senes, senes@wisc.edu
