Phylogenomic datasets are illuminating many areas of the Tree of Life. However, the large size of these datasets alone may be insufficient to resolve problematic nodes in the most rapid evolutionary radiations, because inferences in zones of extraordinarily low phylogenetic signal can be sensitive to the model and method of inference, as well as the information content of loci employed. We used a dataset of >3,950 ultraconserved element (UCE) loci from a classic mammalian radiation, ground-dwelling squirrels of the tribe Marmotini (Sciuridae: Xerinae), to assess sensitivity of phylogenetic estimates to varying per-locus information content across 4 different inference methods (RAxML, ASTRAL, NJst, SVDquartets). Persistent discordance was found in topology and bootstrap support between concatenation- and coalescent-based inferences; among methods within the coalescent framework; and within all methods in response to different filtering scenarios. Contrary to some recent empirical UCE-based studies, filtering by information content did not promote complete among-method concordance. Nevertheless, filtering did improve concordance relative to randomly selected locus sets, largely via improved consistency of two-step summary methods (particularly NJst) under conditions of higher average per-locus variation (and thus increasing gene tree precision). The benefits of dataset filtering are notably variable among classes of inference methods and across different evolutionary scenarios, reiterating the complexities of resolving rapid radiations, even with robust taxon and character sampling.

README

Supplementary Appendix S1

.csv file containing taxonomic identities, institutional IDs, collection year, and number of UCE contigs assembled for all samples sequenced in this study. Data fields are synonymized with standard Darwin Core terms (https://github.com/tdwg/dwc) where possible.

Supplementary Fig. S1

Barplots displaying changes across empirically filtered (‘Information-Filtered’) and randomly selected (‘Randomly Selected 1-5’) datasets in 3 different metrics: a) total numbers of sites analyzed, b) total numbers of variable sites analyzed, and c) total numbers of parsimony-informative sites analyzed. The grey box at far left represents the baseline UCE dataset. Boxes of the same color contain identical numbers of loci. Cumulatively, the plots show that changes in our metric of information content (proportion of variable sites) are due to changes in absolute numbers of variable sites (i.e., panel b) and not to changes in locus length or other properties.

Supplementary Fig. S2

Relationships of UCE variability (proportion of variable sites per locus, i.e., our metric) with alternative metrics of information content: a) the proportion of parsimony-informative sites per locus, b) the mean bootstrap support found in corresponding gene trees, and c) the proportion of gene tree nodes resolved with >70% bootstrap support.

Supplementary Fig. S3

Pairwise changes in Robinson-Foulds distances among 4 inference methods across the filtering regimes employed in this study. Base = the original baseline UCE dataset.
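
For readers who want to compare the deposited trees themselves, the following is a minimal Python sketch of a Robinson-Foulds distance. It is an illustration only, not the software used in the study: it assumes both trees are Newick strings with simple unquoted taxon labels and the same taxon set, it treats the trees as unrooted, and the function names (leaf_sets, splits, robinson_foulds) and the example trees are hypothetical.

```python
import re


def leaf_sets(newick):
    """Return (all taxa, list of leaf sets, one per parenthesised clade)."""
    raw = re.findall(r"[(),;]|[^(),;]+", newick)
    tokens = [t.strip() for t in raw if t.strip()]
    stack, clades, taxa = [], [], set()
    prev = None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            clades.append(frozenset(done))
            if stack:
                stack[-1] |= done
        elif tok in ",;":
            pass
        elif prev != ")":
            # A label that does not follow ")" is a leaf; drop any branch length.
            name = tok.split(":")[0].strip()
            taxa.add(name)
            stack[-1].add(name)
        # A label that follows ")" is a support value or branch length: ignored.
        prev = tok
    return frozenset(taxa), clades


def splits(newick):
    """Non-trivial bipartitions, each stored as the side lacking the anchor taxon."""
    taxa, clades = leaf_sets(newick)
    anchor = min(taxa)
    found = set()
    for clade in clades:
        side = taxa - clade if anchor in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
    return taxa, found


def robinson_foulds(newick_a, newick_b):
    """Count bipartitions present in exactly one of the two trees."""
    taxa_a, splits_a = splits(newick_a)
    taxa_b, splits_b = splits(newick_b)
    if taxa_a != taxa_b:
        raise ValueError("trees must share the same taxon set")
    return len(splits_a ^ splits_b)


# Hypothetical five-taxon trees; t1_annotated adds support values and branch lengths.
t1 = "((A,B),(C,D),E);"
t1_annotated = "((A:0.1,B:0.2)95:0.05,(C:0.1,D:0.1)80:0.05,E:0.3);"
t2 = "((A,C),(B,D),E);"
rf_same = robinson_foulds(t1, t1_annotated)
rf_diff = robinson_foulds(t1, t2)
```

Because splits are compared on unrooted trees, the position of the outgroup root does not change the distance, and support values and branch lengths are ignored.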

UCE alignments - baseline dataset

Trimmed UCE alignments in FASTA format. This is the “baseline” dataset as described in the text. These same data are available on GenBank.

UCE_alignments_baseline_dataset.zip
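
The filtering levels used to name the tree files below (e.g., 5var) refer to the percentage of variable sites per locus. The following Python sketch shows one way such a value could be computed from a single FASTA alignment. It is a sketch under stated assumptions, not the authors' script: a site is counted as variable when it shows more than one distinct unambiguous base (A, C, G, T), gaps and ambiguity codes are ignored, and the function names and example sequences are hypothetical.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into {name: sequence}; sequences may span several lines."""
    chunks, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {key: "".join(parts) for key, parts in chunks.items()}


def proportion_variable(seqs):
    """Fraction of alignment columns with more than one distinct A/C/G/T state."""
    rows = list(seqs.values())
    if not rows:
        raise ValueError("alignment contains no sequences")
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("sequences are not aligned to a common length")
    variable = sum(
        1 for column in zip(*rows)
        if len({base for base in column if base in "ACGT"}) > 1
    )
    return variable / length if length else 0.0


def passes_filter(seqs, min_percent):
    """True if the locus has at least min_percent variable sites (assumed rule)."""
    return 100.0 * proportion_variable(seqs) >= min_percent


# Hypothetical three-taxon alignment: only the last column is variable.
example = StringIO(">taxonA\nACGTACGTAC\n>taxonB\nACGTACGTAT\n>taxonC\nAC-TACGNAC\n")
alignment = read_fasta(example)
prop = proportion_variable(alignment)
keep_at_5 = passes_filter(alignment, 5)
keep_at_20 = passes_filter(alignment, 20)
```

To apply this to the deposited data, each alignment file in the archive would be opened and passed to read_fasta in place of the in-memory example.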

raxml_empirical_trees

RAxML trees from empirically filtered datasets with support values from 100 bootstrap replicates and rooted on Aplodontia rufa. Trees are named with the filtering level used (e.g., 5var = at least 5 percent variable sites per locus) followed by the number of resulting UCE loci in each dataset.

astral_empirical_trees

ASTRAL species trees from empirically filtered datasets with support values from 100 bootstrap replicates and rooted on Aplodontia rufa. Trees are named with the filtering level used (e.g., 5var = at least 5 percent variable sites per locus) followed by the number of resulting UCE loci in each dataset.

njst_empirical_trees

NJst species trees from empirically filtered datasets with support values from 100 bootstrap replicates and rooted on Aplodontia rufa. Trees are named with the filtering level used (e.g., 5var = at least 5 percent variable sites per locus) followed by the number of resulting UCE loci in each dataset.

svdquartets_empirical_trees

SVDquartets species trees from empirically filtered datasets with support values from 100 bootstrap replicates and rooted on Aplodontia rufa. Trees are named with the filtering level used (e.g., 5var = at least 5 percent variable sites per locus) followed by the number of resulting UCE loci in each dataset.

Data from: Impacts of inference method and dataset filtering on phylogenomic resolution in a rapid radiation of ground squirrels (Xerinae: Marmotini)
