Data from: The comparative analysis of lineage-pair traits
Data files
Sep 09, 2025 version files 458.94 KB
- andersonetal.analysis.data.and.code.zip (454.61 KB)
- README.md (4.33 KB)
Abstract
For many questions in ecology and evolution, the most relevant data to consider are attributes of lineage pairs. Comparative tests for causal relationships among traits like ‘diet niche overlap’, ‘divergence time’, and ‘strength of reproductive isolation (RI)’ – measured for pairwise combinations of related species or populations – have led to several groundbreaking insights, but the correct statistical approach for these analyses has never been clear. Lineage-pair traits are non-independent, but unlike the expected covariance among species’ traits, which is captured by a phylogenetic covariance matrix arising from a given model, the expected covariance among lineage-pair traits has not been explicitly formulated. Analyses of pairwise-defined data have thus employed untested workarounds for non-independence rather than direct models of lineage-pair covariance, with consequences that are unexplored. Here, we consider how evolutionary relatedness among taxa translates into non-independence among taxonomic pairs. We develop models by which phylogenetic signal in an underlying character generates covariance among pairs in a lineage-pair trait. We incorporate the resulting lineage-pair covariance matrices into modified versions of phylogenetic generalized least squares and a new phylogenetic beta regression for bounded response variables. Both outperform previous approaches in simulation tests. We find that a common heuristic method, node averaging, imparts a greater cost to model performance than does the non-independence it was designed to correct. We re-analyze two empirical datasets to find dramatic improvements in model fit and, in the case of avian hybridization data, an even stronger relationship between pair age and RI than is revealed from uncorrected analysis. 
We finally present a new tool, the R package phylopairs, that allows empiricists to test relationships among pairwise-defined variables in a way that is statistically robust and more straightforward to implement.
https://doi.org/10.5061/dryad.q83bk3jt7
Description of the data and file structure
This folder contains code and data files for conducting the analyses in Anderson et al., Systematic Biology, 10.1093/sysbio/syaf061.
Data files include the raw data files, in the formats in which they were downloaded, as well as the analysis-ready files that were created from them. Code for converting raw data to analysis-ready formats is given in anderson.etal.datawrangling.R.
Files and variables
File: andersonetal.analysis.data.and.code.zip
Description: Within this zip file, find the following:
Downloaded Data Files:
- full_scaled_4d.tree: Drosophila tree from Kim et al. 2024 PLoS Biology
- birds_pulidoSC2016_MaxCladeCredTree.txt: All-birds phylogeny from Pulido-Santacruz and Weir 2016 Evolution.
- bouvier_price.xls: Avian hybridization data from Price and Bouvier (2002)
- price_bouvier.csv: Avian hybridization data from Price and Bouvier (2002) converted to .csv format
- yukilevich.dros.csv: Fly hybridization data converted into .csv format
Analysis-Ready Files:
- yukilevich.dros.intree.csv: Fly hybridization dataset for which subspecies and geographic races have been removed, among other processing details (see the data wrangling R file for details).
- dros.ultra.rdata: Ultrametric, relative-time version of the Kim et al. (2024) Drosophila tree.
- pb.in.tree.csv: Price-Bouvier data trimmed to contain only the crosses for which both species are included in the Pulido-Santacruz and Weir (2016) phylogenetic tree.
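The trimming step described for pb.in.tree.csv (keeping only crosses in which both species occur in the tree) can be sketched in base R as follows. This is a minimal illustration and not the code in anderson.etal.datawrangling.R: the function name `keep_pairs_in_tree`, the column names `species1` and `species2`, and the tip labels are hypothetical stand-ins.

```r
# Toy stand-ins (assumptions): in practice, `tree_tips` would be the tip labels
# of the phylogeny and `crosses` the table of experimental crosses.
tree_tips <- c("sp_A", "sp_B", "sp_C")

crosses <- data.frame(
  species1 = c("sp_A", "sp_B", "sp_A"),
  species2 = c("sp_B", "sp_D", "sp_C"),
  stringsAsFactors = FALSE
)

# Keep a cross only when both of its species appear among the tree tips.
keep_pairs_in_tree <- function(pairs, tips, col1 = "species1", col2 = "species2") {
  in_tree <- pairs[[col1]] %in% tips & pairs[[col2]] %in% tips
  pairs[in_tree, , drop = FALSE]
}

crosses_in_tree <- keep_pairs_in_tree(crosses, tree_tips)
print(crosses_in_tree)
```

In this toy table, the cross involving `sp_D` is dropped because `sp_D` is not a tip in the tree.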
Code files:
- anderson.etal.datawrangling.R: Code for converting downloaded data files to analysis-ready files
- anderson.etal.analyses.R: Code for conducting model performance tests and re-analyses of empirical bird and Drosophila datasets.
Other files:
- README.txt: a readme file that contains the information present in this Dryad readme description.
Data Contained in Each File:
For all datasets, a cell filled with "NULL" indicates that data were unmeasured or were not relevant for a given pair.
- full_scaled_4d.tree: a 'phylo' object containing species names and branch lengths for the Drosophila phylogenetic tree, first published by Kim et al. (2024) in PLoS Biology.
- birds_pulidoSC2016_MaxCladeCredTree.txt: a 'phylo' object containing species names and branch lengths for the all-birds phylogenetic tree, first published by Pulido-Santacruz and Weir (2016) in Evolution.
- bouvier_price.xls: data and metadata for experimental crosses in birds, compiled by Price and Bouvier (2002). Many of the variables those authors generated are not analyzed in the present study. For the analyses of this study, the relevant data columns in the dataset are as follows:
  - "Species 1": the genus and species of species 1 in each cross
  - "Species 2": the genus and species of species 2 in each cross
  - "Fertility Category": a categorical measure of fertility in the cross, with values ranging from 1 to 5.
- price_bouvier.csv: data contained in this file are the same as above, but they have been reformatted for easier use in downstream analyses.
- yukilevich.dros.csv: data from experimental crosses of drosophilid flies, downloaded from https://life2.bio.sunysb.edu/drosophila/datatables.html, a resource curated by Roman Yukilevich and Fumio Aoki. This dataset contains numerous columns that are not analyzed in the present work; we include the full dataset so that users can trace how we went from the downloaded data to our final results (via the code contained in the data wrangling file). The relevant data columns for the analyses in the present work are:
  - "Sp.1": the name of species 1 in each cross
  - "Sp2": the name of species 2 in each cross
  - "sexual.isolation": a metric of reproductive isolation as described in Sobel and Chen (2014) Evolution.
  - "Ecological.Distance": a categorical measure of ecological similarity / distinctiveness as measured and defined by Funk et al. (2006) PNAS.
  - "allop....0...symp....1": a categorical variable defining whether a given pair is geographically sympatric (value = 1) or allopatric (value = 0).
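Because unmeasured or irrelevant cells are coded as "NULL" in all datasets, it may help to convert them to `NA` on import. The sketch below shows one way to do this with base R's `read.csv` and its `na.strings` argument. The inline table is an invented toy example (placeholder species names and made-up values, not data from yukilevich.dros.csv); only the column names follow the descriptions above.

```r
# Invented toy table (assumption): placeholder species and made-up values,
# using the column names described for yukilevich.dros.csv.
csv_text <- 'Sp.1,Sp2,sexual.isolation,allop....0...symp....1
sp_A,sp_B,0.85,1
sp_C,sp_D,NULL,0
sp_E,sp_F,0.62,NULL'

# Treat "NULL" cells as missing so numeric columns are parsed as numbers.
crosses <- read.csv(text = csv_text,
                    na.strings = c("NULL", "NA"),
                    stringsAsFactors = FALSE)

# Columns needed for a given analysis (here: isolation and sympatry status).
needed <- c("sexual.isolation", "allop....0...symp....1")

# Keep only the crosses with no missing values in the needed columns.
complete <- crosses[complete.cases(crosses[, needed]), , drop = FALSE]
print(complete)
```

For the real files, the `text = csv_text` argument would be replaced by the path to the .csv file; the other arguments would stay the same.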
Code/software
R