Dryad

Data from: Detecting macroevolutionary genotype-phenotype associations using error-corrected rates of protein convergence

Cite this dataset

Fukushima, Kenji; Pollock, David (2022). Data from: Detecting macroevolutionary genotype-phenotype associations using error-corrected rates of protein convergence [Dataset]. Dryad. https://doi.org/10.5061/dryad.tx95x6b0v

Abstract

On macroevolutionary timescales, extensive mutations and phylogenetic uncertainty mask the signals of genotype-phenotype associations underlying convergent evolution. To overcome this problem, we extended the widely used framework of nonsynonymous-to-synonymous substitution rate ratios and developed a novel metric, ωC, which measures the error-corrected convergence rate of protein evolution. ωC distinguishes natural selection from genetic noise and phylogenetic errors in both simulations and real examples, and its accuracy allows an exploratory genome-wide search for adaptive molecular convergence without a prior phenotypic hypothesis or candidate genes. Using gene expression data, we explored over 20 million branch combinations in vertebrate genes and identified joint convergence of expression patterns and protein sequences, with amino acid substitutions at functionally important sites, providing hypotheses on undiscovered phenotypes. We further extended our method with a heuristic algorithm to detect highly repetitive convergence among higher-order phylogenetic combinations, which are computationally nontrivial to search. Our approach allows bidirectional searches for genotype-phenotype associations, even in lineages that have been diverging for hundreds of millions of years.

Funding

Japan Society for the Promotion of Science, Award: 18J00178

Alexander von Humboldt Foundation, Award: Sofja Kovalevskaja programme

International Human Frontier Science Program Organization, Award: RGY0082/2021

National Institute of General Medical Sciences, Award: GM083127