A complex genomic architecture underlies reproductive isolation in a North American Oriole hybrid zone
Data files
Jul 14, 2023 version files 5.56 GB
-
GWAS_lmm_ears_forfig.txt.zip
400.44 MB
-
GWAS_lmm_forehead_forfig.txt.zip
359.75 MB
-
GWAS_lmm_greatercoverts_forfig.txt.zip
400.43 MB
-
GWAS_lmm_lessercoverts_forfig.txt.zip
400.26 MB
-
GWAS_lmm_neck_forfig.txt.zip
400.35 MB
-
GWAS_lmm_tailbase_forfig.txt.zip
401.12 MB
-
GWAS_lmm_tailtip_forfig.txt.zip
400.38 MB
-
GWAS_lmm_throat_forfig.txt.zip
355.13 MB
-
Orioles_filtered_final_093020_55individuals.recode.vcf.gz
2.44 GB
-
README.MD
5.56 KB
Abstract
Natural hybrid zones provide powerful opportunities for identifying the mechanisms that facilitate and inhibit speciation. Documenting the extent of genomic admixture allows us to discern the architecture of reproductive isolation through the identification of isolating barriers. This approach is particularly powerful for characterizing the accumulation of isolating barriers in systems exhibiting varying levels of genomic divergence. Here, we use a hybrid zone between two species--the Baltimore (Icterus galbula) and Bullock’s (I. bullockii) orioles--to investigate this architecture of reproductive isolation. We combine whole genome re-sequencing with data from an additional 313 individuals amplityped at ancestry-informative markers to characterize fine-scale patterns of admixture, and to quantify links between genes and the plumage traits. On a genome-wide scale, we document several putative barriers to reproduction, including elevated peaks of divergence above a generally high genomic baseline, a large putative inversion on the Z chromosome, and complex interactions between melanogenesis-pathway candidate genes. Concordant and coincident clines for these different genomic regions further suggest the coupling of pre- and post-mating barriers. Our findings of complex and coupled interactions between pre- and post-mating barriers suggest a relatively rapid accumulation of barriers between these species, and they demonstrate the complexities of the speciation process.
Methods
Sample Collection. Baltimore and Bullock’s orioles were sampled during the spring/summer (May 19 through June 27) of 2016-2018 from 12 localities along the Platte River in Colorado and Nebraska, USA. This intensive sampling effort resulted in the collection of 313 vouchered specimens of phenotypic Bullock’s Orioles, phenotypic Baltimore Orioles, and phenotypic hybrids from within the contact zone. From this collection, we sequenced the genomes of 60 male orioles from five genotypic classes, including allopatric Baltimore Orioles (n=10), allopatric Bullock’s Orioles (n=10), backcrossed Baltimore Orioles (n=10), backcrossed Bullock’s Orioles (n=10), and recent generation hybrids (n=20).
Whole Genome Sequencing. Genomic DNA was extracted from all samples using the DNeasy blood and tissue kit (Qiagen, CA, USA). DNA concentrations were quantified using the Qubit dsDNA High Sensitivity Assay Kit (Life Technologies). For whole genome re-sequencing, individual libraries were prepared using the NEBNext Ultra II FS kit (New England BioLabs, MA, USA). We followed manufacturer protocol for concentrations >=100 ng with the following conditions: fragmentation for 12.5 minutes and bead-based size selection of 400-600 bp (insert size of 275-475 bp). Libraries were sequenced on three Illumina NextSeq lanes at the Cornell Institute for Biotechnology core facility. Library quality was assessed using FastQC version 0.11.8 (http://www.bioinformatics.babraham. ac.uk/projects/fastqc). An average number of 24.5 million reads were obtained per individual. Library preparation for four samples (two backcrossed individuals and two recent generation hybrids) failed (i.e. a low number of reads). These four individuals were removed from the data set for all subsequent analyses.
Data Filtering & Variant Discovery. For the whole-genome data set, we used AdapterRemoval version 2.1.1 for sequence trimming, adapter removal, and quality filtering, requiring a minimum Phred quality score of 20 and merged overlapping paired-end reads. We aligned filtered reads to the Myrtle Warbler Reference genome (Setophaga coronata coronata) using the default settings in BWA 0.7.4 and obtained alignment statistics from Qualimap version 2.2.1. We used Samtools version 1.9 to convert all resulting BAM files to SAM files and to sort and index files. We used Picard Tools v.2.19.2 (https://broadinstitute.github.io/picard/) to add index groups and mark duplicates. We used the Haplotype Caller module in GATK version 3.8.1 for SNP variant discovery and genotyping for the 56 orioles and used the following filtering parameters to remove variants: QD < 2, FS > 60.0, MQ < 30.0, and ReadPosRankSum < -8.0. We additionally filtered out variants that were not biallelic, had minor allele counts less than 4, mean coverage less than 2X or more than 50X, and more than 20% missing data. This resulted in a total of 11,651,297 SNPs across the five ancestry categories (parental Baltimore oriole, parental Bullock’s oriole, F1/F2, backcrossed Baltimore oriole, and backcrossed Bullock’s oriole). We removed one additional individual (backcrossed Bullock’s Oriole) due to high relatedness with another male in the data set (r = 0.53) resulting in a final data set containing 55 orioles.
Admixture Mapping. To identify the genomic underpinnings of plumage variation in Bullock’s and Baltimore orioles, we employed genome-wide association analyses implemented in the package gemma. gemma detects SNPs that are associated with specified traits while controlling for population structure by including a relatedness matrix in the model. Because gemma requires a complete SNP data set, we imputed missing data using beagle v 4.1. We conducted separate univariate linear mixed models for each of the nine plumage traits and used the Wald test (-lmm 1) with a significance threshold of ⍺-value of –log10 = 7 to identify associations between genotype and phenotypes. Models for supercilium failed to converge, and thus we present results for 8 out of the 9 plumage traits. We visualized results using the manhattan function in the qqman package in R.