
Experimental evolution of ancestrally reconstructed BCL2 family proteins

Cite this dataset

Metzger, Brian et al. (2021). Experimental evolution of ancestrally reconstructed BCL2 family proteins [Dataset]. Dryad.


The roles of chance, contingency, and necessity in evolution are unresolved, because they have never been assessed in a single system or on timescales relevant to historical evolution. We combined ancestral protein reconstruction and a new continuous evolution technology to mutate and select B-cell-lymphoma-2-family proteins to acquire protein-protein-interaction specificities that occurred during animal evolution. By replicating evolutionary trajectories from multiple ancestral proteins, we found that contingency generated over long historical timescales steadily erased necessity and overwhelmed chance as the primary cause of acquired sequence variation; trajectories launched from phylogenetically distant proteins yielded virtually no common mutations, even under strong and identical selection pressures. Chance arose because many sets of mutations could alter specificity at any timepoint; contingency arose because historical substitutions changed these sets. Our results suggest that patterns of variation in BCL-2 sequences – and likely other proteins, too – are idiosyncratic products of a particular, unpredictable course of historical events.


Illumina sequencing yielded 22 million total and 13 million identified reads. The number of reads for each library is listed in Supplemental Table 13. We used Trim Galore to trim the primary sequences. BBMerge was then used to merge the paired sequences. Next, we used the Clumpify script in the BBMap package to remove repeated sequences. After this, Seal was used to separate the three fragments of each library based on the library sequences. Finally, we used BBDuk to remove the sequences of the library construction primers. Reads were then binned by experiment and aligned to the appropriate WT sequence using Geneious (low sensitivity, 5 iterations, gaps allowed). The aligned sequences were then processed in R to remove any that contained Ns or were not full length. Insertions found in less than 1% of the population and sites that extended outside of the coding region were also removed from all sequences. Remaining gaps were then standardized among replicates and within an experiment. Finally, allele frequencies were calculated for each site and amino acid, as well as for the remaining insertions and deletions.
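
The final filtering and frequency steps can be illustrated with a minimal sketch. This is not the authors' R code: it is written in Python, the function names `filter_reads` and `allele_frequencies` are hypothetical, and it assumes that reads have already been aligned and translated, with `-` marking a deletion.

```python
from collections import Counter


def filter_reads(reads, expected_length):
    """Keep reads that are full length and contain no ambiguous 'N' bases.

    Hypothetical helper: expected_length is assumed to be the length of
    the aligned WT coding sequence.
    """
    return [
        read for read in reads
        if len(read) == expected_length and "N" not in read.upper()
    ]


def allele_frequencies(aligned_seqs):
    """Return one dict per alignment column mapping state -> frequency.

    Assumes all sequences have equal length (already aligned) and that
    '-' marks a deletion, so deletions are counted like any other state.
    """
    if not aligned_seqs:
        return []
    total = len(aligned_seqs)
    frequencies = []
    for site in range(len(aligned_seqs[0])):
        counts = Counter(seq[site] for seq in aligned_seqs)
        frequencies.append({state: n / total for state, n in counts.items()})
    return frequencies


# Toy data for illustration only; not taken from the dataset.
nucleotide_reads = ["ATGGCC", "ATGNCC", "ATGGC", "ATGGCA"]
kept = filter_reads(nucleotide_reads, expected_length=6)

protein_alignment = ["MAK", "MAK", "MTK", "M-K"]
freqs = allele_frequencies(protein_alignment)
```

In this toy example, `kept` retains only the two full-length reads without Ns, and `freqs[1]` reports the frequency of each amino acid (and of the deletion) at the second site.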


National Institute of General Medical Sciences, Award: R01GM131128

National Science Foundation, Award: DGE-1746045