Data from: Speciation with gene flow in an island endemic hummingbird
Data files
Apr 24, 2025 version files: 7.63 MB
- mtcr_museumID_label.fasta (71.91 KB)
- museumID_raw.analysis.SNPs.csv (7.18 MB)
- README.md (1.36 KB)
- SNP_Loci.fasta (382.57 KB)
Abstract
We find evidence for speciation in streamertail hummingbirds (Trochilus polytmus and T. scitulus), Jamaican endemic taxa thought to represent an exception to the rule that, for birds, speciation cannot progress in situ on small islands. Our analysis shows that divergent selection acting on male bill color, a sexual ornament that is red in polytmus and black in scitulus, plays a pivotal role as a reproductive barrier. We sequenced 6,451 single nucleotide polymorphisms (SNPs) and the mitochondrial control region. Low genomic divergence was detected across the two datasets, consistent with a demographic history of recent speciation or extensive gene flow between previously diverged taxa. Such low background divergence offers little support for the presence of post-mating reproductive incompatibilities. Yet, a geographic cline analysis reveals narrow clines in a handful of traits likely to be under selection. In particular, the cline width for male bill color is only 2.3 km, marking it as one of the narrowest phenotypic clines observed in an avian hybrid zone. Notably, cline centers for individual genomic and phenotypic traits converged in the Rio Grande Valley, suggesting that this landscape feature played a role in stabilizing the hybrid zone’s position despite the fact that it does not impose a physical dispersal barrier to these highly volant hummingbirds. Pre-mating selection for bill color is the most likely driver of divergence in Jamaican streamertails.
Authors of the Dataset:
Caroline D. Judy, PhD, Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution
caroline.duffie@gmail.com
Gary R. Graves, John McCormack, Katherine Faust Stryjewski, Robb T. Brumfield
==========================================
Description of the data files
Three genomic data files are archived in this repository: one .csv table and two .fasta files.
museumID_raw.analysis.SNPs.csv
A data matrix containing 6,451 individual SNPs in 145 individuals.
Variable names and descriptions:
- Column 1: MuseumID - a unique identifier associated with each vouchered specimen, which can be cross-referenced in Table S1.
- Column 2: phenotype - description of bill color based on morphological assessment.
- Row 1, columns 3 to 6453: labels for the individual SNP IDs.
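As a minimal sketch of how the layout described above could be read, assuming the file is comma-delimited with a single header row: the function name `read_snp_matrix` and the small in-memory table are hypothetical and only illustrate the column positions. The genotype and phenotype values shown are made up, since the encoding used in the archived file is not specified here.

```python
import csv
import io

def read_snp_matrix(handle):
    """Split a SNP table into SNP IDs, specimen IDs, phenotypes, and genotype rows.

    Assumes column 1 is MuseumID, column 2 is phenotype, and every
    remaining column holds the calls for one SNP.
    """
    reader = csv.reader(handle)
    header = next(reader)
    snp_ids = header[2:]              # row 1, columns 3 onward
    museum_ids, phenotypes, genotypes = [], [], []
    for row in reader:
        if not row:                   # skip blank lines
            continue
        museum_ids.append(row[0])
        phenotypes.append(row[1])
        genotypes.append(row[2:])
    return snp_ids, museum_ids, phenotypes, genotypes

# Tiny made-up table with the same layout (values are illustrative only).
demo = io.StringIO(
    "MuseumID,phenotype,snp_a,snp_b\n"
    "000001,red,A,N\n"
    "000002,black,G,C\n"
)
snp_ids, museum_ids, phenotypes, genotypes = read_snp_matrix(demo)
```

For the archived file, the same function could be called on `open("museumID_raw.analysis.SNPs.csv", newline="")`. If the layout matches the description, `len(snp_ids)` should be 6,451 and `len(museum_ids)` should be 145.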
SNP_Loci.fasta
A fasta file containing SNP loci.
Variable names and descriptions:
- Loci (rows) are labeled with their SNP IDs.
mtcr_museumID_label.fasta
A fasta file containing sequenced haplotypes from the mitochondrial control region.
Variable names and descriptions:
- Individuals (rows) are labeled with a MuseumID, a unique identifier associated with each vouchered specimen that can be cross-referenced in Table S1.
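Both FASTA files can be read with a few lines of standard-library code. The sketch below assumes plain FASTA records (a `>` header line followed by one or more sequence lines). The name `read_fasta` and the demo records are hypothetical.

```python
import io

def read_fasta(handle):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []        # a repeated header would overwrite the earlier record
        elif name is not None:
            records[name].append(line)
    # Join wrapped sequence lines into a single string per record.
    return {key: "".join(parts) for key, parts in records.items()}

# Made-up records: headers stand in for a MuseumID or a SNP ID.
demo = io.StringIO(">000001\nACGT\nTTGA\n>000002\nACGA\n")
records = read_fasta(demo)
```

Because records are keyed by header, the keys from `mtcr_museumID_label.fasta` could be matched against the MuseumID column of the SNP table, provided the labels are formatted identically in both files.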
Laboratory Methods and Bioinformatics Processing:
Genotyping-by-sequencing
SNPs were generated using a genotyping-by-sequencing (GBS) method by the Cornell Institute of Genomic Diversity (IGD). Briefly, DNA extracts for 160 Trochilus individuals were digested with PstI (CTGCAG), and the resulting fragments were ligated to sample-specific adapters and common adapters. Fragments were then pooled, cleaned, and amplified using an 18-cycle PCR. Final libraries were sequenced as single-end, 100-bp reads in one lane of an Illumina HiSeq 2000 (Illumina, San Diego, CA, USA).
SNP calling was performed using UNEAK, a reference-free SNP calling pipeline specific to GBS data that is an extension of the Java program TASSEL (54). UNEAK first removes reads that lack barcodes or cut sites, or that have ‘N’s present in the first part of the sequence following the barcode. High-quality reads were then clustered into tags. Unique tags were merged, and their counts in the pooled sample of individuals were stored. Pairwise alignments of tags with a 1-bp mismatch were considered as candidate SNPs. Further filtering was performed to remove rare or singleton tags that likely result from sequencing error. We performed additional filtering steps using custom Perl scripts to remove loci that are likely reverse complements of other loci in the dataset and to remove paralogous loci.
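The tag-pairing idea described above can be illustrated with a short sketch. This is not UNEAK's implementation. It is a simplified stand-in that compares all tag pairs by Hamming distance and then drops pairs involving low-count tags. The function names, the `min_count` threshold, and the demo tags are all hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatched positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))

def candidate_snp_pairs(tag_counts, min_count=2):
    """Return pairs of tags that differ at exactly one base.

    tag_counts maps tag sequence -> read count across all individuals.
    Pairs that include a tag seen fewer than min_count times are removed
    afterwards (assumed threshold, standing in for the rare-tag filter).
    """
    tags = sorted(tag_counts)
    pairs = [(a, b) for a, b in combinations(tags, 2)
             if len(a) == len(b) and hamming(a, b) == 1]
    return [(a, b) for a, b in pairs
            if tag_counts[a] >= min_count and tag_counts[b] >= min_count]

# Made-up tags and counts, for illustration only.
demo_counts = {"ACGTAC": 40, "ACGTTC": 35, "TTGTAC": 12, "ACGTAG": 1}
pairs = candidate_snp_pairs(demo_counts)
```

The all-pairs comparison is quadratic in the number of tags, so it is only practical for small examples like this one.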
Running the initial UNEAK pipeline yielded 97,432 SNPs. Filtering for missing data (> 20%) reduced the number of SNPs to 32,949. We then used custom Perl scripts to filter potential reverse complements and loci that presumably contained paralogous reads, which reduced the total to 6,451 SNPs. Two individuals (635683, 636151) had ambiguous calls (‘N’s) for the majority of SNPs (> 90%) and were removed from the dataset. In addition, 14 outliers in the multidimensional scaling and PCA plots were pruned because they were potentially biased by sequencing error: 633563, 633565, 633590, 633591, 633600, 633603, 633623, 635683, 635691, 636149, 636157, 636159, 636163, and 63617. Because 635683 appears in both lists, 15 unique individuals were removed from the 160 genotyped. The final dataset contains 6,451 high-quality SNPs in 145 individuals.
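Two of the filters above lend themselves to a short sketch: the missing-data thresholds and the reverse-complement check. The code is not the authors' Perl scripts. It assumes that a missing call is written as `N`, that the 20% threshold applies per locus, and that the 90% threshold applies per individual. All function names and demo values are hypothetical.

```python
def missing_fraction(calls, missing="N"):
    """Fraction of calls equal to the missing-data symbol."""
    calls = list(calls)
    return sum(c == missing for c in calls) / len(calls)

def keep_loci(genotypes, max_missing=0.20):
    """Indices of loci (columns) with at most max_missing missing calls."""
    n_loci = len(genotypes[0])
    return [j for j in range(n_loci)
            if missing_fraction(row[j] for row in genotypes) <= max_missing]

def keep_individuals(genotypes, max_missing=0.90):
    """Indices of individuals (rows) with at most max_missing missing calls."""
    return [i for i, row in enumerate(genotypes)
            if missing_fraction(row) <= max_missing]

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T, N."""
    return seq.translate(_COMPLEMENT)[::-1]

def reverse_complement_pairs(loci):
    """Pairs of locus IDs whose sequences are exact reverse complements."""
    by_seq = {seq: locus_id for locus_id, seq in loci.items()}
    pairs = []
    for locus_id, seq in loci.items():
        partner = by_seq.get(reverse_complement(seq))
        # Report each pair once and ignore self-matches (palindromic sequences).
        if partner is not None and partner > locus_id:
            pairs.append((locus_id, partner))
    return pairs

# Made-up genotype rows (individuals x loci) and locus sequences.
demo_genotypes = [
    ["A", "N", "C"],
    ["A", "N", "C"],
    ["G", "T", "C"],
    ["G", "T", "C"],
    ["A", "T", "C"],
    ["N", "N", "N"],
]
loci_kept = keep_loci(demo_genotypes)
individuals_kept = keep_individuals(demo_genotypes)
rc_pairs = reverse_complement_pairs({"L1": "AACG", "L2": "CGTT", "L3": "AAAA"})
```

In the demo, the second locus is missing in half of the individuals and is dropped, and the last individual has no calls and is dropped. Only exact reverse complements are detected. A real filter would likely need to tolerate mismatches between the two strands.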
MtCR haplotypes
Total DNA was extracted from approximately 25 mg of muscle tissue from 168 individuals using a DNeasy tissue kit (Qiagen, Valencia, California). A segment of the mitochondrial control region was amplified using a standard polymerase chain reaction protocol (primers: H1251, 5'-TCTTGGCATCTTCAGTGCCRTGC-3', and TrochL-CR1, 5'-TGGCTAACGCGGAGCATACAATCTC-3'). Sequencing products were purified with Sephadex and loaded on an ABI Prism 3100 Genetic Analyzer. Sequences were aligned in Sequencher 4.1 (Gene Codes, Ann Arbor, Michigan). When direct sequencing revealed more than one heterozygous site within a sequence, haplotypes were resolved probabilistically using PHASE.
- Judy, Caroline Duffie; Graves, Gary R; McCormack, John E et al. (2025). Speciation with gene flow in an island endemic hummingbird. PNAS Nexus. https://doi.org/10.1093/pnasnexus/pgaf095
