Data from: Uneven missing data skew phylogenomic relationships within the lories and lorikeets

Cite this dataset

Smith, Brian; Mauck III, William M.; Benz, Brett W.; Andersen, Michael J. (2022). Data from: Uneven missing data skew phylogenomic relationships within the lories and lorikeets [Dataset]. Dryad. https://doi.org/10.5061/dryad.n5tb2rbsp

Abstract

Included are the supplementary data for Smith, B. T., Mauck, W. M., Benz, B., & Andersen, M. J. (2018). Uneven missing data skews phylogenomic relationships within the lories and lorikeets. BioRxiv, 398297.

The resolution of the Tree of Life has accelerated with advances in DNA sequencing technology. To achieve dense taxon sampling, it is often necessary to obtain DNA from historical museum specimens to supplement modern genetic samples. However, DNA from historical material is generally degraded, which presents various challenges. In this study, we evaluated how the coverage at variant sites and missing data among historical and modern samples impact phylogenomic inference. We explored these patterns in the brush-tongued parrots (lories and lorikeets) of Australasia by sampling ultraconserved elements in 105 taxa. Trees estimated with low coverage characters had several clades where relationships appeared to be influenced by whether the sample came from historical or modern specimens; these patterns were not observed when more stringent filtering was applied. To assess whether the topologies were affected by missing data, we performed an outlier analysis of sites and loci, and a data reduction approach where we excluded sites based on data completeness. Depending on the outlier test, 0.15% of total sites or 38% of loci were driving the topological differences among trees, and at these sites, historical samples had 10.9x more missing data than modern ones. In contrast, 70% data completeness was necessary to avoid spurious relationships. Predictive modeling found that outlier analysis scores were correlated with parsimony informative sites in the clades whose topologies changed the most by filtering. After accounting for biased loci and understanding the stability of relationships, we inferred a more robust phylogenetic hypothesis for lories and lorikeets.

Usage notes

Included in the compressed file is supplementary data with four main directories: Filtered, Low_Coverage, final_consensus_trees_ and scripts. The Filtered and Low_Coverage directories contain the concatenated alignments and partition files for each likelihood threshold score. These directories also include the likelihood scores for each of the six clades. The final_consensus_trees directory contains the consensus trees from the major phylogenetic analyses in the paper. The scripts directory contains two scripts for formatting concatenated alignments. Readme files are included in each subdirectory.
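For readers who want to experiment with the data reduction idea described in the abstract (excluding sites based on data completeness), the following is a minimal Python sketch. It is not one of the scripts shipped in this dataset. The function names, the set of characters treated as missing, and the toy sequences are assumptions made for illustration; only the 70% threshold comes from the abstract.

```python
from typing import Dict, List

# Assumption: these characters denote missing data in the alignment.
MISSING = set("-?Nn")


def site_completeness(alignment: Dict[str, str]) -> List[float]:
    """Return, for each column, the fraction of samples with a called base."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    n = len(seqs)
    return [sum(s[i] not in MISSING for s in seqs) / n for i in range(length)]


def filter_sites(alignment: Dict[str, str],
                 min_completeness: float = 0.70) -> Dict[str, str]:
    """Keep only columns whose completeness is at least min_completeness."""
    comp = site_completeness(alignment)
    keep = [i for i, c in enumerate(comp) if c >= min_completeness]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Hypothetical toy alignment: three modern and two historical samples.
toy = {
    "modern_1":     "ACGTAC",
    "modern_2":     "ACGTAC",
    "modern_3":     "ACGTTC",
    "historical_1": "AC-T?C",
    "historical_2": "A--TNC",
}

filtered = filter_sites(toy, min_completeness=0.70)
for name, seq in filtered.items():
    print(name, seq)
```

In this toy example, the two columns where only three of five samples have a called base fall below the threshold and are dropped. Real alignments would normally be read from and written to files with an alignment library rather than held in a dictionary.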

Funding

National Science Foundation, Award: DEB-1655736

National Science Foundation, Award: DEB-1557051