Skip to main content

Speciation and gene flow across an elevational gradient in New Guinea kingfishers

Cite this dataset

Linck, Ethan; Freeman, Benjamin; Dumbacher, John (2021). Speciation and gene flow across an elevational gradient in New Guinea kingfishers [Dataset]. Dryad.


Closely related species with parapatric elevational ranges are ubiquitous in tropical mountains worldwide. The gradient speciation hypothesis proposes that these series are the result of in situ ecological speciation driven by divergent selection across elevation. Direct tests of this scenario have been hampered by the difficulty inferring the geographic arrangement of populations at the time of divergence. In cichlids, sticklebacks, and Timema stick insects, support for ecological speciation driven by other selective pressures has come from demonstrating parallel speciation, where divergence proceeds independently across replicated environmental gradients. Here, we take advantage of the unique geography of the island of New Guinea to test for parallel gradient speciation in replicated populations of Syma kingfishers that show extremely subtle differentiation across elevation and between historically isolated mountain ranges. We find that currently described high elevation and low elevation species have reciprocally monophyletic gene trees and form nuclear DNA clusters, rejecting this hypothesis. However, demographic modeling suggests selection has likely maintained species boundaries in the face of gene flow following secondary contact. We compile evidence from the published literature to show that while in situ gradient speciation in labile organisms such as birds appears rare, divergent selection and post-speciation gene flow may be an underappreciated force in the origin of elevational series and tropical beta diversity along mountain slopes.


Morphological data: We measured bill length, bill depth, tarsus, wing chord, and tail length from 72 museum specimens of Syma torotoro (n=30), S. (t.) ochracea (n=10), and S. megarhyncha (n=32) at the American Museum of Natural History, collected from 1894-1965 and including only individuals of known sex as originally identified by the preparator. 

Bioacoustic data: We downloaded all available vocalizations from S. torotoro (n=34) and S. megarhyncha (n=14) from xeno-canto and Cornell’s Macaulay Library. We filtered these data for quality and quantified 36 standard bioacoustic variables using the warbleR package v. 1.1.14 in R (Araya-Salas & Smith-Vidaurre, 2017), resulting in output representing 278 distinct vocalizations from S. torotoro and 106 from S. megarhyncha. 

Genomic data: We extracted DNA from fresh tissues (n=6) and toepad samples from historical museum specimens (n=34) from 28 individuals of S. torotoro, 2 individuals of S. (t.) ochracea, and 10 individuals of S. megarhyncha (n=10), including 3 individuals collected in 1928 by Ernst Mayr himself. Though only partially overlapping with samples used in morphometric analyses, these individuals represented the full extent of both species’ ranges in New Guinea and Australia and included all described subspecies. We extracted DNA from fresh tissues using a Qiagen DNAeasy kit and the manufacturer’s recommended protocol. For historical toepad samples (collected 1877-1973), we extracted DNA using either a using a phenol-chloroform and centrifugal dialysis method (Dumbacher & Fleischer, 2001) (for reduced representation sequencing) or a standard salt extraction protocol (for whole genome sequencing). Due to constraints of cost and time, we employed two complementary sequencing approaches. On a subset of samples (n=20), we performed reduced representation genome sequencing using a hybridization capture with RAD probes (hyRAD) approach, described in detail elsewhere (Linck, Hanna, Sellas, & Dumbacher, 2017). We sent the remaining samples to the UC Berkeley’s Vincent J. Coates Genomic Sequencing Laboratory, where laboratory staff prepared genomic libraries for low coverage whole genome sequencing (WGS) using Illumina TruSeq Nano kits and a modified protocol that enzymatically repaired fragments with RNAse and skipped sonication. They then pooled (n=20) and sequenced these samples with 150 base pair paired end reads on a single lane of an Illumina HiSeq 4000. We processed demultiplexed reads from both sequencing strategies together with a custom bioinformatic pipeline optimized for handling degraded DNA data and available at Briefly, we trimmed raw reads for adapters and low-quality bases using bbduk from BBTools version 38.06 suite of bioinformatics tools. We aligned these reads to an unpublished draft genome of Woodland Kingfisher Halcyon senegalensis from the Bird 10K Genome Project using bbmap with a k-mer value of 12, a maximum indel length of 200 bp, and a minimum sequence identity of 0.65. These highly sensitive alignment parameters were necessary given the clade containing Syma diverged from the clade containing Halcyon approximately 15 mya (Andersen, McCullough, Mauck, Smith, & Moyle, 2018). We used PicardTools v. 2.17.8 and GATK v. 3.6.0 (McKenna et al., 2010) to append read groups and perform local realignment on .bam files. We then used mapDamage 2.0.9 to account for postmortem damage to DNA from historical museum specimens by rescaling quality scores (Jónsson, Ginolhac, Schubert, Johnson, & Orlando, 2013). We performed multisample variant calling using the UnifiedGenotyper tool in GATK v. 3.6.0, and filtered our variant calls for missing data, coverage, and quality with VCFtools 0.1.16 (Danecek et al., 2011). We generated multiple SNP datasets by implementing different filters to suit the requirements and goals of different analyses; these settings and their underlying rationale are described in detail below. To complement our nuclear DNA sequences, we assembled near-complete and fully annotated mitochondrial genomes from a majority of individuals using mitofinder v. 1.0.2 (Allo et al., 2020) and a complete mtDNA genome from close relative Todiramphus sanctus as a reference (Andersen et al., 2015). We then extracted NADH dehydrogenase 2 (ND2) from these genomes and performed multiple sequence alignment using MAFFT v7.407 under its “–auto” parameter setting package (Katoh & Standley, 2013). 

Phylogenetic data: We evaluated phylogenetic relationships among using our mitochondrial DNA (ND2) alignment, which was near-complete and did not feature correlations between missing data and library preparation method or tissue type. We inferred a time-calibrated phylogeny in BEAST 2.6.0 (Bouckaert et al., 2019) from the 37 samples of Syma torotoro (n=24), S. (t.) ochracea (n=3), and S. megarhyncha (n=10) with sufficiently high quality ND2 sequence data, including Sacred Kingfisher Todiramphus sanctus as an outgroup (Andersen et al., 2015). We used a strict molecular clock with a rate of 2.9x10−8 following Lerner, Meyer, James, Hofreiter, & Fleischer (2011), a GTR+GAMMA model of nucleotide evolution, and a Yule species tree prior. We ran BEAST for 50 million generations, storing trees every 1000 generations, and assessed mixing, likelihood stationarity and adequate effective sample sizes (ESS) above 200 for all estimated parameters using Tracer v.1.7.1 (Rambaut, Drummond, Xie, Baele, & Suchard, 2018). We then generated a maximum clade credibility tree using TreeAnotator v.2.6.0, discarding the first 10% of trees as burnin (Bouckaert et al., 2019). 

Usage notes

Uploaded data can be used in conjunction with scripts in the publication's GitHub repository ( to recreate all figures and results; the GitHub repository also features additional intermediary data outputs not included here. Zipped archive of all cleaned morphological (morphology.csv), bioacoustic (params.csv), and specimen locality data (syma_spp_master.csv). Compressed VCF file used for PCA and clustering analyses. 

syma_ND2.tree: Ultrametric NADH dehydrogenase 2 (ND2) gene tree from BEAST. 

syma_ND2_final.xml: BEAST XML document with ND2 sequence alignment and parameter choices.

syma_ND2_final.xml: BEAST XML document with ND2 sequence alignment and parameter choices.

syma_ND2_full_msa.fasta: Multiple sequence alignment of ND2 data used to infer the ultrametric gene tree. 

SRA_accessions.txt: NCBI Sequence Read Archive accession numbers for sequence data. 


National Science Foundation, Award: 1701224

National Science Foundation, Award: 0108247

United States Department of Defense, Award: NDSEG Fellowship