Phylogenomic datasets are illuminating many areas of the Tree of Life. However, the large size of these datasets alone may be insufficient to resolve problematic nodes in the most rapid evolutionary radiations, because inferences in zones of extraordinarily low phylogenetic signal can be sensitive to the model and method of inference, as well as the information content of loci employed. We used a dataset of >3,950 ultraconserved element (UCE) loci from a classic mammalian radiation, ground-dwelling squirrels of the tribe Marmotini (Sciuridae: Xerinae), to assess sensitivity of phylogenetic estimates to varying per-locus information content across 4 different inference methods (RAxML, ASTRAL, NJst, SVDquartets). Persistent discordance was found in topology and bootstrap support between concatenation- and coalescent-based inferences; among methods within the coalescent framework; and within all methods in response to different filtering scenarios. Contrary to some recent empirical UCE-based studies, filtering by information content did not promote complete among-method concordance. Nevertheless, filtering did improve concordance relative to randomly selected locus sets, largely via improved consistency of two-step summary methods (particularly NJst) under conditions of higher average per-locus variation (and thus increasing gene tree precision). The benefits of dataset filtering are notably variable among classes of inference methods and across different evolutionary scenarios, reiterating the complexities of resolving rapid radiations, even with robust taxon and character sampling.
Supplementary Appendix S1
.csv file containing taxonomic identities, institutional IDs, collection year, and number of UCE contigs assembled for all samples sequenced in this study. Data fields are synonymized with standard Darwin Core terms (https://github.com/tdwg/dwc) where possible.
Supplementary Fig. S1
Barplots displaying changes across empirically filtered (‘Information-Filtered’) and randomly selected (‘Randomly Selected 1-5’) datasets in 3 different metrics: a) total numbers of sites analyzed, b) total numbers of variable sites analyzed, and c) total numbers of parsimony-informative sites analyzed. The grey box at far left represents the baseline UCE dataset. Boxes of the same color contain identical numbers of loci. Cumulatively, the plots show that changes in our metric of information content (proportion of variable sites) are due to changes in absolute numbers of variable sites (i.e., panel b) and not to changes in locus length or other properties.
Supplementary Fig. S2
Relationships of UCE variability (our metric: the proportion of variable sites per locus) with alternative metrics of information content: a) the proportion of parsimony-informative sites per locus, b) the mean bootstrap support found in corresponding gene trees, and c) the proportion of gene tree nodes resolved with >70% bootstrap support.
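As an illustration of the per-locus metrics named above, the following is a minimal Python sketch, not the pipeline used in the study. It computes the proportion of variable sites and the proportion of parsimony-informative sites for one aligned locus in FASTA format, and applies a threshold filter of the "at least N percent variable sites" kind. The function names (read_fasta, site_metrics, passes_filter) are hypothetical, and counting only unambiguous A/C/G/T bases (ignoring gaps and ambiguity codes) is an assumption of this sketch.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {sequence name: sequence}."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs


def site_metrics(seqs):
    """Return alignment length and proportions of variable and
    parsimony-informative sites for one aligned locus."""
    rows = list(seqs.values())
    if not rows or len(rows[0]) == 0:
        raise ValueError("empty alignment")
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("sequences are not aligned to equal length")
    variable = 0
    informative = 0
    for column in zip(*rows):
        # Assumption: gaps and ambiguity codes are ignored.
        counts = Counter(b for b in column if b in "ACGT")
        if len(counts) >= 2:
            variable += 1
        # Informative: at least two bases, each seen in at least two samples.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return {
        "length": length,
        "prop_variable": variable / length,
        "prop_informative": informative / length,
    }


def passes_filter(seqs, min_percent_variable):
    """True if the locus has at least `min_percent_variable` percent variable sites."""
    return 100 * site_metrics(seqs)["prop_variable"] >= min_percent_variable


# Toy alignment (invented for illustration, not real UCE data).
toy = read_fasta(">a\nACGTAC\n>b\nACGTAT\n>c\nACTTAT\n>d\nGCTTAC\n")
toy_metrics = site_metrics(toy)
```

In the toy alignment, three of six columns vary, but the first column differs in only one sample, so it counts as variable without being parsimony-informative.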
Supplementary Fig. S3
Pairwise changes in Robinson-Foulds distances among 4 inference methods across the filtering regimes employed in this study. Base = the original baseline UCE dataset.
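For readers unfamiliar with the Robinson-Foulds distance, the following is a minimal Python sketch of the unweighted, unrooted version: the number of non-trivial bipartitions present in one tree but not the other. It is illustrative only and is not the software used to produce the figure. Representing trees as nested tuples of taxon names, and the function names, are assumptions of this sketch; both trees are assumed to contain the same taxa.

```python
def leaf_sets(tree, out):
    """Return the frozenset of leaves under `tree`, adding every
    internal clade encountered to the set `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.add(leaves)
    return leaves


def splits(tree):
    """Return the non-trivial bipartitions of a tree, each stored as the
    side that does not contain a fixed reference taxon."""
    clades = set()
    all_taxa = leaf_sets(tree, clades)
    ref = min(all_taxa)  # arbitrary but consistent reference taxon
    result = set()
    for clade in clades:
        side = all_taxa - clade if ref in clade else clade
        # Skip trivial splits (a single taxon versus all the rest).
        if 1 < len(side) < len(all_taxa) - 1:
            result.add(side)
    return result


def rf_distance(tree_a, tree_b):
    """Unweighted Robinson-Foulds distance between two trees on the same taxa."""
    return len(splits(tree_a) ^ splits(tree_b))


# Toy trees (invented taxon labels, for illustration only).
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
toy_rf = rf_distance(t1, t2)
```

The two toy trees share the split separating D and E from the other taxa and differ in one split each, giving a distance of 2.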
UCE alignments - baseline dataset
Trimmed UCE alignments in FASTA format. This is the “baseline” dataset as described in the text. These same data are available on GenBank.
UCE_alignments_baseline_dataset.zip
raxml_empirical_trees
RAxML trees from empirically filtered datasets, with support values from 100 bootstrap replicates and rooted on Aplodontia rufa. Trees are named with the filtering level used (e.g., 5var = at least 5 percent variable sites per locus), followed by the number of resulting UCE loci in each dataset.
astral_empirical_trees
ASTRAL species trees from empirically filtered datasets, with support values from 100 bootstrap replicates and rooted on Aplodontia rufa. Trees are named with the filtering level used (e.g., 5var = at least 5 percent variable sites per locus), followed by the number of resulting UCE loci in each dataset.
njst_empirical_trees
NJst species trees from empirically filtered datasets, with support values from 100 bootstrap replicates and rooted on Aplodontia rufa. Trees are named with the filtering level used (e.g., 5var = at least 5 percent variable sites per locus), followed by the number of resulting UCE loci in each dataset.
svdquartets_empirical_trees
SVDquartets species trees from empirically filtered datasets, with support values from 100 bootstrap replicates and rooted on Aplodontia rufa. Trees are named with the filtering level used (e.g., 5var = at least 5 percent variable sites per locus), followed by the number of resulting UCE loci in each dataset.