Data from: Detecting macroevolutionary genotype-phenotype associations using error-corrected rates of protein convergence
Data files
Oct 10, 2022 version files (11.54 GB total):
- animal_genomes.zip: 7.97 GB
- animal_highly_repetitive_convergence.zip: 3.28 GB
- empirical_convergence.zip: 238.34 MB
- highly_repetitive_C4_convergence.zip: 2.32 MB
- mitochondrial_genome_selection.zip: 1.03 MB
- README.txt: 20.12 KB
- simulations.zip: 7.56 MB
- transcriptome_assembly_cds.zip: 39.98 MB
Abstract
On macroevolutionary timescales, extensive mutations and phylogenetic uncertainty mask the signals of genotype-phenotype associations underlying convergent evolution. To overcome this problem, we extended the widely used framework of nonsynonymous-to-synonymous substitution rate ratios and developed a novel metric, ωC, which measures the error-corrected convergence rate of protein evolution. Because ωC distinguishes natural selection from genetic noise and phylogenetic errors in both simulated and real examples, it is accurate enough to permit an exploratory genome-wide search for adaptive molecular convergence without a phenotypic hypothesis or candidate genes. Using gene expression data, we explored over 20 million branch combinations in vertebrate genes and identified joint convergence of expression patterns and protein sequences, with amino acid substitutions at functionally important sites, providing hypotheses about undiscovered phenotypes. We further extended our method with a heuristic algorithm that detects highly repetitive convergence among higher-order phylogenetic combinations, whose exhaustive search is computationally nontrivial. Our approach allows bidirectional searches for genotype-phenotype associations, even in lineages that diverged hundreds of millions of years ago.
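The abstract describes ωC only as an extension of the nonsynonymous-to-synonymous rate-ratio framework. The sketch below illustrates that idea under a stated assumption: by analogy with dN/dS, ωC is treated as dNC / dSC, where each term is an observed count of convergent substitutions divided by its expected count for one branch combination. The function names `omega_c` and `n_branch_combinations` are hypothetical, the expected counts are simply passed in (the actual method derives them from a substitution model), and none of this is the authors' code. The second helper shows why higher-order combinations become expensive: the number of k-branch combinations among B branches is the binomial coefficient C(B, k).

```python
from math import comb


def omega_c(obs_nonsyn: float, exp_nonsyn: float,
            obs_syn: float, exp_syn: float) -> float:
    """Hypothetical helper: omega_C = dN_C / dS_C for one branch combination.

    Assumed definitions (analogous to dN/dS, not taken from this dataset):
      dN_C = observed / expected convergent nonsynonymous substitutions
      dS_C = observed / expected convergent synonymous substitutions
    Expected counts are supplied by the caller.
    """
    if exp_nonsyn <= 0 or exp_syn <= 0:
        raise ValueError("expected counts must be positive")
    dn_c = obs_nonsyn / exp_nonsyn
    ds_c = obs_syn / exp_syn
    if ds_c == 0:
        # No synonymous convergence observed: ratio is unbounded,
        # or undefined if nonsynonymous convergence is also zero.
        return float("inf") if dn_c > 0 else float("nan")
    return dn_c / ds_c


def n_branch_combinations(n_branches: int, order: int) -> int:
    """Number of distinct k-branch combinations (k-subsets of branches)."""
    return comb(n_branches, order)


if __name__ == "__main__":
    # Toy numbers invented for illustration only.
    print(omega_c(obs_nonsyn=6, exp_nonsyn=2.0, obs_syn=1, exp_syn=1.0))  # 3.0
    for k in (2, 3, 4):
        print(k, n_branch_combinations(1000, k))
```

With 1,000 branches, pairs number about 5 × 10^5, but fourth-order combinations already exceed 4 × 10^10, which is why exhaustive enumeration of higher orders calls for a heuristic rather than a brute-force scan.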