Simulated trait data for convergence analyses
Data files
May 31, 2024 version; files: 24.38 MB
Abstract
Tests of phenotypic convergence can provide evidence of adaptive evolution, and the popularity of such studies has grown in recent years due to the development of novel, quantitative methods for identifying and measuring convergence. These methods include the commonly applied C1–C4 measures of Stayton (2015), which measure morphological distances between lineages, and Ornstein-Uhlenbeck (OU) model-fitting analyses, which test whether lineages converged on shared adaptive peaks. We test the performance of C-measures and other convergence measures under various evolutionary scenarios and reveal a critical issue with C-measures: they often misidentify divergent lineages as convergent. We address this issue by developing novel convergence measures, the Ct1–Ct4-measures, that calculate distances between lineages at specific points in time, minimizing the possibility of misidentifying divergent taxa as convergent. Ct-measures are most appropriate when focal lineages are of the same or similar geologic ages (e.g., extant taxa), meaning that the lineages’ evolutionary histories include considerable overlap in time. Beyond C-measures, we find that all convergence measures are influenced by the position of focal taxa in phenotypic space, with morphological outliers often statistically more likely to be measured as strongly convergent. Further, we mimic scenarios in which researchers assess convergence using OU models with a priori regime assignments (e.g., classifying taxa by ecological traits) and find that multiple-regime OU models with phenotypically divergent lineages assigned to a shared selective regime often outperform simpler models. This highlights that model support for these multiple-regime OU models should not be assumed to always reflect convergence among focal lineages of a shared regime.
Our new Ct1–Ct4-measures provide researchers with an improved comparative tool, but we emphasize that all available convergence measures are imperfect, and researchers should recognize the limitations of these methods and use multiple lines of evidence to test convergence hypotheses.
README: Simulated trait data for convergence & divergence analyses
https://doi.org/10.5061/dryad.34tmpg4sh
This Dryad submission includes many simulated datasets and a Wolfram Mathematica notebook with a custom script used to produce the simulations. Each dataset includes six simulated phenotypic trait values for 201 taxa. For 13 focal taxa (representing five monophyletic clades) for which we simulated convergence or divergence, some or all traits (between two and six of the six total) were evolved to trait optima (targets) via an Ornstein-Uhlenbeck (OU) process (alpha = 0.1, sigma = 0.1). For convergence simulations, the 13 focal taxa were evolved to the same trait optimum. For divergence simulations (file names start with "divBase"), the five clades were evolved to different optima. See our included Mathematica notebook and our associated publication for more details on how the traits were simulated.
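As a rough illustration of the OU process described above, the following Python sketch evolves a single trait along a single lineage toward an optimum using a simple Euler-Maruyama discretization. It is not the authors' Mathematica script and does not model a phylogeny. The function names, the time span, the step size, and the reading of sigma as the standard deviation of the diffusion term are all assumptions made for this example; only alpha = 0.1, sigma = 0.1, and the ancestral trait value of zero come from the description.

```python
import random


def simulate_ou(theta, alpha=0.1, sigma=0.1, x0=0.0,
                total_time=100.0, dt=0.1, rng=None):
    """Simulate one trait under an OU process: dX = alpha*(theta - X)dt + sigma dW.

    theta is the trait optimum (target); total_time and dt are
    illustrative assumptions, not values taken from the dataset.
    """
    rng = rng or random.Random()
    x = x0
    path = [x]
    n_steps = int(round(total_time / dt))
    for _ in range(n_steps):
        pull = alpha * (theta - x) * dt                    # attraction toward the optimum
        noise = sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)  # random diffusion
        x += pull + noise
        path.append(x)
    return path


def simulate_bm(sigma=0.1, x0=0.0, total_time=100.0, dt=0.1, rng=None):
    """Brownian motion is the same update with no pull toward an optimum."""
    return simulate_ou(theta=0.0, alpha=0.0, sigma=sigma, x0=x0,
                       total_time=total_time, dt=dt, rng=rng)


if __name__ == "__main__":
    rng = random.Random(1)
    ou_path = simulate_ou(theta=50.0, rng=rng)
    bm_path = simulate_bm(rng=rng)
    print("OU trait at end of lineage:", ou_path[-1])
    print("BM trait at end of lineage:", bm_path[-1])
```

Under these assumptions the OU trait ends close to its optimum, while the BM trait wanders around the ancestral value. Lineages evolved to the same optimum therefore end up near one another in trait space, and lineages evolved to different optima end up apart.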
Description of the data and file structure
We produced 15 simulated 'base' datasets in which the six traits of most taxa were evolved by Brownian motion (BM). For each base dataset, we produced 45 spreadsheets (nine target distances by five trait counts) in which the traits of 13 focal taxa were sequentially altered. The base dataset number is indicated in the spreadsheet file names by "Base=XX". The traits of the focal taxa vary in two ways. First, they evolved to trait optima (targets) of varying distances away from the ancestral traits (zero). The target distance is indicated in the spreadsheet file name by "Target=XX", with targets varying between 0 and 80 in increments of 10. Second, we altered the number of traits (of six total) that were OU-evolved toward trait optima. This varies between two and six traits, with the value indicated in the file name by "Traits=XX". Any traits that were not OU-evolved were evolved by BM. Each spreadsheet has a column of taxon names (representing 201 mammalian species) and six columns of simulated trait values.
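The file-name convention above can be read programmatically. The Python sketch below pulls the "Base=XX", "Target=XX", and "Traits=XX" values out of a spreadsheet file name and flags divergence files by their "divBase" prefix. The function name, the example file name, its separators, and its extension are hypothetical; only the three key=value tokens and the "divBase" prefix are taken from the description.

```python
import re

# Matches the three documented tokens, e.g. "Target=40".
_FIELD = re.compile(r"(Base|Target|Traits)=(\d+)")


def parse_dataset_name(filename):
    """Return the simulation settings encoded in a spreadsheet file name.

    Raises ValueError if any of the three documented tokens is absent.
    """
    fields = {key: int(value) for key, value in _FIELD.findall(filename)}
    missing = {"Base", "Target", "Traits"} - fields.keys()
    if missing:
        raise ValueError(f"{filename!r} lacks: {', '.join(sorted(missing))}")
    # Divergence simulations are identified by the "divBase" prefix.
    fields["divergence"] = filename.startswith("divBase")
    return fields


if __name__ == "__main__":
    # Hypothetical file name; the real separators and extension may differ.
    print(parse_dataset_name("Base=03_Target=40_Traits=5.csv"))
```

Because the pattern searches for the tokens anywhere in the name, it does not depend on the exact separators or the file extension used in the deposited files.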
Sharing/Access information
Contact the corresponding authors of the associated publication for more information.
Code/Software
For scripts to produce similar simulated datasets, see our supplemental Wolfram Mathematica notebook and the sample R script in the Supplemental Information file.
Methods
This Dryad submission includes many simulated datasets and a Wolfram Mathematica notebook with a custom script used to produce the simulations. See the README file, included Mathematica file, and associated publication for more information.