Epistasis plays a limited role in driving entrenchment during neutral protein evolution
Data files
Jun 05, 2026 version files 447.98 KB
- NxE.diff (882 B)
- README.md (3.25 KB)
- SI_NxE-Orthodomain-Alignments-2.zip (443.86 KB)
Abstract
This dataset contains curated orthologous domain (“orthodomain”) multiple sequence alignments generated for large-scale analyses of protein evolution, epistasis, and evolutionary constraint. The alignments were derived from the protein domain family datasets analyzed in de la Paz et al. (2020). For each domain family, human domain sequences were identified through BLAST searches against human protein sequences obtained from the UCSC 100-vertebrate alignments. The corresponding vertebrate alignments were then trimmed to retain only the regions homologous to each query domain while preserving the complete domain length. Each resulting orthodomain alignment comprises a set of homologous vertebrate sequences for an individual protein domain. These alignments were used to investigate evolutionary dynamics, sequence constraints, and epistatic interactions across proteins.
Dataset DOI: 10.5061/dryad.sj3tx96kk
Description of the data and file structure
This dataset includes the curated orthologous domain alignments (FASTA) used to calculate the NTME neighborhoods described in our manuscript. The alignments serve as input to the Generalhamiltonian.m script from the original SEEC model (https://github.com/AlbertodelaPaz/SEEC); the Hamiltonian energies it calculates are then used in the NxE simulations. Additional information can be found here: https://github.com/Sarah-Chung/NxE.
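For orientation, a Potts Hamiltonian of the kind used in SEEC-style models assigns an energy to a sequence from per-site fields and pairwise couplings. The sketch below is a minimal Python illustration under the common sign convention $H = -\sum_i h_i(a_i) - \sum_{i<j} e_{ij}(a_i, a_j)$. It is not a port of Generalhamiltonian.m. The function name `potts_energy`, the nested-list layout of `h` and `e`, and the toy parameter values are all assumptions made for this example.

```python
def potts_energy(seq, h, e):
    """Energy of an integer-encoded sequence under a Potts model.

    Assumed (hypothetical) layout:
      h[i][a]        -> local field for state a at position i
      e[i][j][a][b]  -> coupling between state a at i and state b at j (i < j)
    Sign convention assumed here: lower energy means more favorable.
    """
    length = len(seq)
    field_term = sum(h[i][seq[i]] for i in range(length))
    coupling_term = sum(
        e[i][j][seq[i]][seq[j]]
        for i in range(length)
        for j in range(i + 1, length)
    )
    return -(field_term + coupling_term)


# Toy model: 3 positions, 2 states per position (illustrative values only).
L_toy, q_toy = 3, 2
h_toy = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.25]]
e_toy = [
    [[[0.0] * q_toy for _ in range(q_toy)] for _ in range(L_toy)]
    for _ in range(L_toy)
]
e_toy[0][1][1][0] = 2.0  # a single non-zero coupling between positions 0 and 1

wild_type = [1, 0, 1]
mutant = [0, 0, 1]  # substitution at position 0
delta = potts_energy(mutant, h_toy, e_toy) - potts_energy(wild_type, h_toy, e_toy)
print(f"Energy change of the substitution: {delta}")
```

The energy difference between a mutant and its background sequence depends on the coupling terms as well as the fields, which is how the same substitution can have different effects in different sequence contexts.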
Files and variables
File: SI_NxE-Orthodomain-Alignments-2.zip
Description:
FASTA files for orthodomains from the following domain families are included:
| Domain family | Domain length (L) | Number of orthodomains |
|---|---|---|
| PF00001 (7tm_1) | 268 | 80 |
| PF00004 (AAA) | 132 | 42 |
| PF00005 (ABC_tran) | 137 | 46 |
| PF00041 (fn3) | 85 | 34 |
| PF00153 (Mito_carr) | 94 | 29 |
| PF00271 (Helicase_C) | 111 | 79 |
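After unpacking the archive, a quick sanity check is to confirm that every sequence in an alignment has the domain length listed in the table. The following is a minimal sketch in plain Python. It assumes that the length L in the table equals the number of alignment columns, including gap characters. The function names are hypothetical, and the toy alignment stands in for a real file because the file names inside the archive are not listed here.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def length_problems(records, expected_length):
    """List the sequences whose aligned length differs from expected_length."""
    return [
        f"{header}: length {len(seq)}, expected {expected_length}"
        for header, seq in records
        if len(seq) != expected_length
    ]


# Toy alignment with 5 columns; replace with open("<your file>.fasta") for real data.
toy_alignment = StringIO(">human\nACDE-\n>mouse\nACDEF\n>chicken\nAC-EF\n")
toy_records = read_fasta(toy_alignment)
print(len(toy_records), "sequences;", length_problems(toy_records, 5) or "all lengths match")
```

For a real alignment, pass the value from the length column for that family (for example 268 for PF00001) as `expected_length`.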
File: NxE.diff
Relevant code modifications to the original SEEC model (https://github.com/AlbertodelaPaz/SEEC) for NxE simulations. More information about these modifications can be found here: https://github.com/Sarah-Chung/NxE.
NxE simulations were performed in MATLAB R2024a with the following toolboxes:
- Bioinformatics Toolbox
- Statistics and Machine Learning Toolbox
Additional datasets
Potts Hamiltonian parameters for NxE simulations were obtained from the following source:
- de la Paz, J. A., Nartey, C. M., Yuvaraj, M., & Morcos, F. Epistatic contributions promote the unification of incompatible models of neutral molecular evolution. Dryad. doi: 10.5061/dryad.2ngf1vhj8 (2020).
References
SEEC paper
de la Paz, J. A., Nartey, C. M., Yuvaraj, M., & Morcos, F. Epistatic contributions promote the unification of incompatible models of neutral molecular evolution. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1913071117 (2020).
SEEC repository
de la Paz, J. A., Nartey, C. M., Yuvaraj, M., & Morcos, F. Epistatic contributions promote the unification of incompatible models of neutral molecular evolution. GitHub. https://github.com/AlbertodelaPaz/SEEC (2020).
Potts Hamiltonian parameters
de la Paz, J. A., Nartey, C. M., Yuvaraj, M., & Morcos, F. Epistatic contributions promote the unification of incompatible models of neutral molecular evolution. Dryad. doi: 10.5061/dryad.2ngf1vhj8 (2020).
