
Data sets and analyses for genomics of cryptic speciation in Catharus thrushes

Cite this dataset

Edwards, Scott; Termignoni-Garcia, Flavia; Kirchman, Jeremy (2021). Data sets and analyses for genomics of cryptic speciation in Catharus thrushes [Dataset]. Dryad.


Cryptic speciation may occur when reproductive isolation is recent or when stabilizing selection slows the accumulation of morphological differences between sister lineages. In North America, Bicknell’s Thrush (Catharus bicknelli) and its sister species, the Gray-cheeked Thrush (Catharus minimus), are parapatrically breeding migratory songbirds that are distinguishable in nature only by subtle differences in song and coloration; they were recognized as distinct species only in the 1990s. Previous molecular studies estimated that the species diverged ~120–420 thousand years before present (YBP) and found very low levels of introgression despite their similarity and their sympatry during spring (prebreeding) migration. To further clarify the history, genetic divergence, genomic structure, and adaptive processes in C. bicknelli and C. minimus, we sequenced and assembled high-coverage reference genomes of both species and resequenced genomes from population samples of C. bicknelli and C. minimus, along with two individuals of Swainson’s Thrush (C. ustulatus). The genome of C. bicknelli exhibits markedly higher abundances of transposable elements than the genomes of other Catharus species and the chicken. Demographic and admixture analyses confirm moderate genome-wide differentiation (FST = 0.10) and limited gene flow between C. bicknelli and C. minimus, but suggest a more recent divergence than estimates based on mtDNA. We find evidence of rapid evolution of the Z chromosome and of elevated divergence, consistent with natural selection, in genomic regions near genes involved in neuronal processes in C. bicknelli. These genomes are a useful resource for future investigations of speciation, migration, and adaptation in Catharus thrushes.
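The genome-wide FST of 0.10 reported above is not tied to a named estimator on this page. As one common choice, the following is a minimal sketch of Hudson's FST computed as a ratio of averages across SNPs (Bhatia et al. 2013); it is an illustrative assumption, not necessarily the estimator behind the reported value. The function name and its inputs (per-SNP sample allele frequencies and the number of sampled chromosomes in each population) are hypothetical.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson's FST as a ratio of averages over SNPs (hypothetical helper).

    p1, p2: sample allele frequencies per SNP in populations 1 and 2.
    n1, n2: number of sampled chromosomes (2 x individuals for diploids).
    """
    if len(p1) != len(p2):
        raise ValueError("frequency vectors must have the same length")
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Expected between-population heterozygosity.
        den += a * (1 - b) + b * (1 - a)
    if den == 0.0:
        raise ValueError("all SNPs are monomorphic in both populations")
    # Summing before dividing (ratio of averages) keeps rare SNPs from dominating.
    return num / den


# Two SNPs fixed for opposite alleles give complete differentiation.
print(hudson_fst([0.0, 1.0], [1.0, 0.0], n1=20, n2=20))  # 1.0
```

Because of the sample-size correction, populations with identical allele frequencies can yield slightly negative values.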


For all individuals, we sheared each DNA sample to a length of 300 bp by sonication (Covaris 2200), prepared paired-end libraries on the automated Apollo 324 NGS library preparation system using the PrepX ILM 32i protocol, and sequenced them to ~15X coverage on four lanes of the Illumina HiSeq platform. Read quality was assessed with FastQC (Andrews 2012), and adapters were trimmed with Trimmomatic. The assembly was produced with the ALLPATHS-LG assembler using default settings, except that the HAPLOIDIFY flag was set to True.

To confirm the authenticity and phylogenetic position of the two reference genomes, we used PHYLUCE (Faircloth 2016) to extract the 2,111 ultraconserved elements (UCEs) used by Everson et al. (2019) and compared them with the same loci those authors obtained from 10 species of Catharus, including multiple individuals of C. bicknelli and C. minimus. We aligned UCEs with MAFFT v7.4 (Katoh and Standley 2013) using the flags --maxiterate 1000 --localpair --adjustdirection, trimmed the alignments with trimAl v1.2 using the flag -automated1 (Capella-Gutierrez et al. 2009), and checked for outlier sequences with OD-seq (Jehl et al. 2015) using default settings. We concatenated the UCE alignments with the Perl script v0.9 (…) using default settings and reconstructed the phylogeny with RAxML v8.2 (Stamatakis 2014) under a GTR+Γ model with 1000 rapid bootstrap replicates.

We evaluated the repetitive fraction of the reference genomes of our two focal species and compared them with publicly available reference genomes of two other closely related congeners: a Swainson’s Thrush (C. ustulatus) sequenced by the Vertebrate Genomes Project (GenBank accession GCA_009819885.2) and a Veery (C. fuscescens) sequenced by the B10K consortium (GCA_013398975.1; Fang et al. 2020). We estimated transposable element (TE) content with RepeatMasker v4.1.1 as implemented in RepeatModeler2 (Flynn et al. 2020), which requires a genome assembly, and with dnaPipeTE (Goubert et al. 2015), which works on unassembled sequence reads. We used the RepBase-RepeatMasker database version 20181026, available on the Genetic Information Research Institute (GIRI) website (Jurka 2000). We augmented this library with improved annotations of repeat content (including olfactory receptor (OR) genes), using CENSOR (Kohany et al. 2006) and TEclass (Abrusán et al. 2009) to annotate TEs left unannotated by dnaPipeTE.

We performed a principal components analysis (PCA) with SNPRelate (Zheng et al. 2012) and estimated the ancestry proportions shared among individuals with ADMIXTURE (Alexander & Lange 2011). We used IMcoalHMM (Mailund et al. 2012) to estimate divergence time, effective population size, migration interval, and recombination rate under both an isolation model and an isolation-with-migration model, comparing ~10 Mbp regions from each pseudochromosome of both reference genomes.
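The ~15X sequencing target can be related to sequencing effort with the Lander-Waterman relation C = NL/G, where N is the number of reads, L the read length, and G the genome size. This is a minimal sketch; the genome size and read length below are hypothetical placeholders, not values reported for this dataset.

```python
import math


def reads_needed(target_coverage: int, genome_size_bp: int, read_length_bp: int) -> int:
    """Reads required for a target mean coverage: N = C * G / L, rounded up."""
    # Integer ceiling division avoids floating-point rounding on large genomes.
    return -(-target_coverage * genome_size_bp // read_length_bp)


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson approximation: probability that a base is covered by zero reads."""
    return math.exp(-coverage)


GENOME_SIZE_BP = 1_200_000_000  # hypothetical genome size
READ_LENGTH_BP = 150            # hypothetical read length

print(reads_needed(15, GENOME_SIZE_BP, READ_LENGTH_BP))  # 120000000 reads
print(expected_uncovered_fraction(15))  # roughly 3e-07
```

Real libraries deviate from the Poisson expectation (GC bias, duplicates, repeats), so the uncovered fraction is a lower bound rather than a prediction.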
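The concatenation step joins per-locus UCE alignments into a single supermatrix for RAxML. Because the script used here is not fully identified, the following is only a hypothetical sketch of the general technique: taxa absent from a locus are padded with gap characters, and each locus's start and end columns are recorded so a partitioned analysis remains possible. The function name and input layout (a list of locus names paired with taxon-to-sequence mappings) are assumptions.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def concatenate(
    loci: List[Tuple[str, Alignment]],
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Build a supermatrix and 1-based inclusive partition coordinates."""
    taxa = sorted({taxon for _, aln in loci for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for name, aln in loci:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {name}: expected one alignment length, found {sorted(lengths)}")
        length = lengths.pop()
        for taxon in taxa:
            # Missing taxa are coded as gaps, which RAxML treats as undetermined.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((name, start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}, partitions


matrix, parts = concatenate([
    ("uce-1", {"A": "ACGT", "B": "AC-T"}),
    ("uce-2", {"A": "GG", "C": "GA"}),
])
print(matrix)  # {'A': 'ACGTGG', 'B': 'AC-T--', 'C': '----GA'}
print(parts)   # [('uce-1', 1, 4), ('uce-2', 5, 6)]
```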
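SNPRelate's PCA operates on a genotype matrix; as we understand its documentation, it follows the eigenanalysis approach of Patterson et al. (2006), in which each SNP is centered by twice its allele frequency and scaled before the individual-by-individual relationship matrix is eigendecomposed. The pure-Python sketch below illustrates that idea for the first principal component only, using power iteration. It is not SNPRelate's code, all function names are hypothetical, and missing genotypes are not handled.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def standardize(genotypes: List[List[int]]) -> Matrix:
    """Center and scale each SNP: z = (g - 2p) / sqrt(2p(1 - p)).

    genotypes: rows are individuals, columns are SNPs coded 0/1/2
    (copies of the alternate allele). Monomorphic SNPs are dropped.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [genotypes[i][j] for i in range(n)]
        p = sum(col) / (2 * n)  # sample alternate-allele frequency
        if p == 0.0 or p == 1.0:
            continue  # zero variance, carries no information
        sd = math.sqrt(2 * p * (1 - p))
        columns.append([(g - 2 * p) / sd for g in col])
    if not columns:
        raise ValueError("no polymorphic SNPs")
    # Transpose back to individuals x retained SNPs.
    return [[c[i] for c in columns] for i in range(n)]


def relationship_matrix(z: Matrix) -> Matrix:
    """Genetic relationship matrix Z Z^T / M over the M retained SNPs."""
    n, m = len(z), len(z[0])
    return [[sum(z[i][k] * z[j][k] for k in range(m)) / m for j in range(n)]
            for i in range(n)]


def leading_eigenvector(a: Matrix, iters: int = 500, seed: int = 0) -> List[float]:
    """Power iteration; suitable because the matrix is positive semidefinite."""
    rng = random.Random(seed)
    v = [rng.random() for _ in range(len(a))]
    for _ in range(iters):
        w = [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("power iteration collapsed to zero; try another seed")
        v = [x / norm for x in w]
    return v


def pc1_coordinates(genotypes: List[List[int]]) -> List[float]:
    """Each individual's position on PC1 (defined up to sign and scale)."""
    return leading_eigenvector(relationship_matrix(standardize(genotypes)))


# Six individuals from two differentiated groups, four SNPs.
geno = [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1],
        [2, 2, 1, 2], [2, 1, 2, 2], [2, 2, 2, 1]]
print(pc1_coordinates(geno))
```

In this toy example the first three and last three individuals receive PC1 coordinates of opposite sign; the sign of a principal component itself is arbitrary.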


New York State Museum

Harvard University

Dean's Competitive Fund for Promising Scholarship, Harvard University

Center for Forest Science Innovation

American Ornithological Society

American Museum of Natural History