Ultraconserved elements support the elevation of a new avian family, Eurocephalidae, the white-crowned shrikes
Data files
May 10, 2023 version files 19.23 MB
- avian-uce-baits.v2.tissues.STRICT.fasta
- Eurocephalus47_complete_RAxML.tre
- Eurocephalus47_complete_SVD.tre
- Eurocephalus47_mafft_trim_complete_clean_nexus_loci.zip
- Eurocephalus47_timetree.tre
- index-map.xlsx
- README.md
- TableS1_Sampling.xlsx
- TableS2_BEASTdatasets.xlsx
- TableS3_Pdist.xlsx
Abstract
In this study, we infer genus-level relationships within shrikes (Laniidae), crows (Corvidae), and their allies using ultraconserved elements (UCEs). We confirm previous findings that the Crested Shrikejay (Platylophus galericulatus) represents its own taxonomic family, and we find strong support for its sister relationship to laniid shrikes. We also find strong support that the two allopatric species of the African-endemic genus Eurocephalus (E. ruppelli and E. anguitimens) are not shrikes. We propose elevating the white-crowned shrikes to their own family, Eurocephalidae.
Methods
Sampling and laboratory methods
Following the taxonomy of Gill et al. (2022), we collected genus-level UCE data for Corvidae, Laniidae, and their allies (Supplementary Material Table 1). In total, our dataset comprised 47 samples and included every genus within Laniidae and Corvidae. We downloaded raw reads from recent, higher-level UCE studies of passerines (n=23; Moyle et al. 2016, Oliveros et al. 2019, McCullough et al. 2022). For all other samples (n=24), we extracted DNA from ethanol-preserved or frozen tissue samples loaned from natural history collections. For some rarely collected taxa, however, we relied on specimen toepad clippings (14% of samples, denoted in Supplementary Material Table 1). Because DNA from toepads is well known to be degraded, we processed tissue-derived samples (frozen or ethanol-preserved) differently from those derived from museum toepad clippings. For tissue-derived samples, we used the Qiagen DNeasy kit to extract genomic DNA and estimated fragment size using gel electrophoresis. For toepad-derived samples, we used a phenol-chloroform DNA extraction with gel phase-lock tubes. This method has been shown to produce higher yields of DNA than the silica columns of Qiagen kits (Tsai et al. 2019). Prior to library preparation, we quantified DNA concentrations for all samples with a Qubit 3.0 Fluorometer (ThermoFisher Scientific).
We followed established protocols for library preparation and target capture of UCEs (Faircloth et al. 2012, McCormack et al. 2016). We made toepad-specific modifications to improve yields, following McCormack et al. (2016); these included increasing the AMPure bead (Beckman Coulter) concentration in cleanups to 3X, extending ligation times, and using Eppendorf Lo-Bind tubes to increase retention of DNA. We pooled tissue-derived (8 libraries per pool) and toepad-derived (5–6 libraries per pool) libraries separately for UCE enrichment. We used an updated version of the Arbor Biosciences MYbaits kit for the Tetrapods UCE-5Kv2b probe set, which includes a larger number of baits per locus designed from both chicken and Zebra Finch genomes. We hybridized probes at either 65°C (tissues) or 62°C (toepads) for 24 hours. We sequenced samples on either an Illumina HiSeq 2500 System at the University of Kansas Genome Sequencing Core or an Illumina NovaSeq 6000 at the Oklahoma Medical Research Foundation (sequencing technology is denoted in Supplementary Material Table 1).
Data processing and phylogenetic analyses
To process UCE data, we used the Phyluce v1.7.0 Python package (Faircloth 2016; described in full at https://github.com/faircloth-lab/phyluce) on the University of New Mexico’s Center for Advanced Research Computing (CARC) cluster. We trimmed adaptor sequences and low-quality sites from demultiplexed raw reads with Illumiprocessor v2.1 (Faircloth 2013, Bolger et al. 2014), assembled clean reads into contigs with SPAdes v1.7 (Prjibelski et al. 2020), and extracted UCE loci with the updated probe set described above.
During initial phylogenetic exploration, we identified five toepad-derived samples (two Temnurus and three Zavattariornis) that had extraordinarily long branches in concatenated RAxML analyses (see below for more detail on these phylogenetic methods and Supplementary Material Table 1 for these problematic samples). These long branches are biologically unlikely, and this problem has been shown to be an artifact of poor trimming and ‘dirty ends’ of UCE loci from degraded toepad-sourced samples (Smith et al. 2020). To remove the artifacts that contribute to artificially long branches, we followed the bioinformatic pipeline of Smith et al. (2020). In detail, we exploded the fasta files by sample (with phyluce_assembly_explode_get_fastas_file) and chose a closely related, tissue-derived sample as the reference for each problematic genus. Based on an initial concatenated RAxML analysis (Kozlov et al. 2019), we chose Platysmurus aterrimus (LSU B58660) as the reference for Temnurus samples and Podoces hendersoni (KU 20444) for Zavattariornis samples. With bwa, SAMtools, and GNU parallel (Li and Durbin 2009, Li et al. 2009, Tange 2021), we indexed the reference samples (UCE contigs) and aligned cleaned reads of the problematic toepad-derived samples to these references. To remove the low-quality data in the flanking regions that were contributing to spurious inferences, we dropped sites with less than 5x coverage and quality scores < 20. We incorporated these cleaned samples back into the pipeline with the other samples by manually adding 1) the nucleotide data to the combined, unaligned fasta file, 2) the names of the samples to the .conf file, and 3) the list of loci for each sample to the incomplete matrix conf file; all of these files are originally produced by the phyluce_assembly_get_fastas_from_match_counts phyluce script. Together with these cleaned toepad-based samples, we aligned all 47 samples with MAFFT (Katoh and Standley 2013) without trimming at the alignment step. Instead, we used trimAl v1.4.rev15 (Capella-Gutiérrez et al. 2009) to trim UCE loci with the “-automated1” flag. Finally, we produced a 100%-complete matrix, in which all samples are present at each UCE locus.
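The 100%-complete matrix criterion can be illustrated with a minimal Python sketch. This is not the Phyluce implementation; the function name `complete_loci`, the locus names, and the sample names are hypothetical, and the sketch assumes the taxa present in each trimmed alignment have already been read into a dictionary.

```python
def complete_loci(locus_to_taxa, all_taxa):
    """Return the loci in which every sample in `all_taxa` is present.

    locus_to_taxa: dict mapping a locus name to an iterable of sample names
                   found in that locus alignment (hypothetical input layout).
    all_taxa:      iterable of all sample names expected in the dataset.
    """
    required = set(all_taxa)
    # A locus is kept only if its set of samples contains every required sample.
    return sorted(
        locus
        for locus, taxa in locus_to_taxa.items()
        if required <= set(taxa)
    )


if __name__ == "__main__":
    # Toy example with three samples instead of 47.
    samples = ["sampleA", "sampleB", "sampleC"]
    alignments = {
        "uce-1": ["sampleA", "sampleB", "sampleC"],
        "uce-2": ["sampleA", "sampleC"],  # sampleB missing, so this locus is dropped
        "uce-3": ["sampleC", "sampleB", "sampleA"],
    }
    print(complete_loci(alignments, samples))
```

With 47 samples, the same test would retain only those loci in which all 47 sample names appear.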
We implemented both concatenated maximum likelihood (ML) and species tree methods. We estimated the maximum likelihood tree with RAxML-NG v1.0.1 (Kozlov et al. 2019) and evaluated support with bootstrap replicates using the autoMRE function (set to 100 bootstrap replicates). We accounted for gene tree heterogeneity with SVDquartets (Chifman and Kubatko 2014), implemented in PAUP* v4.0a166 (Swofford 2003). SVDquartets is a concatenated quartet method that does not rely on individual gene trees and has recently been shown to perform better for large multilocus datasets than other coalescent-based tree-building programs (Wascher and Kubatko 2021). We analyzed all possible quartets (n=169,661 quartets) and performed 100 bootstrap replicates to assess nodal support.
Time calibration
To infer a time-calibrated tree, we used BEAST v2.6.7 (Bouckaert et al. 2014). We created six randomized subsets of 50 UCE loci each, drawn without replacement (a total of 300 loci), from our complete matrix (Supplementary Material Table 2). We ran two independent MCMC chains per dataset for 10 million generations, sampling every 5,000 generations. We used a relaxed lognormal clock and a birth-death tree prior, and we assigned the HKY+G sequence model to each UCE locus. We constrained the BEAST topology to the RAxML-inferred topology using a multi-monophyletic constraint prior. We used two secondary calibrations from Oliveros et al. (2019) to date the phylogeny. That study used 13 fossil calibrations to date a family-level UCE tree of all songbirds. For this study, we assigned a normal distribution to the split between Rhipidura and the rest of our sampled taxa, with a mean date of 18.93 Ma (confidence interval = 22.0–15.9, standard deviation (sigma value) of 1.5). We assigned the second calibration point to the split between Dicrurus and all other taxa, with a normal distribution and a mean date of 18.46 Ma (CI = 21.4–15.6, sigma value of 1.5). We visualized posterior estimates in Tracer v1.7.1 (Rambaut et al. 2018) to assess convergence of the chains and to confirm that ESS values were >200. We discarded the first 25% of trees as burn-in, combined the two runs from each dataset, and then combined the resulting six tree files into one final maximum clade credibility (MCC) tree using TreeAnnotator v2.6.7 (Bouckaert et al. 2014).
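As a rough illustration of two steps in this section, the sketch below draws non-overlapping random subsets of loci and computes the central interval implied by a normal calibration prior. It is only a sketch under stated assumptions: the function names, the locus names, and the random seed are hypothetical and are not taken from the authors' scripts, and the 95% interval mass is an assumption (the text does not state how the reported intervals were derived).

```python
import random
from statistics import NormalDist


def draw_locus_subsets(loci, n_subsets=6, subset_size=50, seed=0):
    """Draw `n_subsets` disjoint random subsets of `subset_size` loci each."""
    needed = n_subsets * subset_size
    loci = list(loci)
    if len(loci) < needed:
        raise ValueError(f"need at least {needed} loci, got {len(loci)}")
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    # random.sample draws without replacement, so no locus is reused.
    chosen = rng.sample(loci, needed)
    return [
        chosen[i * subset_size:(i + 1) * subset_size]
        for i in range(n_subsets)
    ]


def normal_prior_interval(mean, sigma, mass=0.95):
    """Return the central interval (lower, upper) of a normal prior."""
    dist = NormalDist(mu=mean, sigma=sigma)
    tail = (1.0 - mass) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


if __name__ == "__main__":
    # Hypothetical locus names standing in for the complete matrix.
    all_loci = [f"uce-{i}" for i in range(1, 1001)]
    subsets = draw_locus_subsets(all_loci)
    print([len(s) for s in subsets])

    # Calibration parameters as given in the text (mean in Ma, sigma = 1.5).
    for label, mean in [("Rhipidura split", 18.93), ("Dicrurus split", 18.46)]:
        lower, upper = normal_prior_interval(mean, 1.5)
        print(f"{label}: {lower:.2f} to {upper:.2f} Ma")
```

Drawing all 300 loci in one call and then slicing guarantees that the six subsets do not share any locus, which is what sampling without replacement across subsets requires.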
ND2 P-distances
To assess relative divergences between major clades within our dataset, we compared uncorrected p-distances in the mitochondrial ND2 gene for representative taxa. We used MitoFinder v1.4 (Allio et al. 2020) to extract mitochondrial genomes from cleaned UCE reads, using a complete mitochondrial genome of Corvus corax as a reference (PRJNA321255; Johnsen et al. 2017). Next, we extracted ND2 from these mitochondrial genomes and used PAUP* v4a168 (Swofford 2003) to generate p-distances.
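For readers unfamiliar with the measure, an uncorrected p-distance is the proportion of compared sites at which two aligned sequences differ. The sketch below is a minimal Python illustration, not the PAUP* implementation; the function name `p_distance` is hypothetical, and skipping sites with gaps or ambiguity codes is an assumption that may differ from how PAUP* handles missing data.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or a non-ACGT character are
    skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # ignore gaps and ambiguous bases
        compared += 1
        if a != b:
            differences += 1

    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real ND2 data: 1 difference over 8 sites.
    print(p_distance("ACGTACGT", "ACGTACGA"))
```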
Usage notes
FASTA and NEXUS files can be opened in a text editor such as BBEdit. Files with the ".tre" extension can be viewed in FigTree.