
Ancient introgression between distantly related white oaks (Quercus sect Quercus) shows evidence of climate-associated asymmetric gene exchange

Cite this dataset

O'Donnell, Scott (2021). Ancient introgression between distantly related white oaks (Quercus sect Quercus) shows evidence of climate-associated asymmetric gene exchange [Dataset]. Dryad.


Ancient introgression can be an important source of genetic variation that shapes the evolution and diversification of many taxa. Here, we estimate the timing, direction, and extent of gene flow between two distantly related oak species in the same section (Quercus sect. Quercus). We estimated these demographic events using genotyping-by-sequencing (GBS) data, which yielded 25,702 single nucleotide polymorphisms (SNPs) for 24 individuals of California scrub oak (Quercus berberidifolia) and 23 individuals of Engelmann oak (Q. engelmannii). We tested several scenarios involving gene flow between these species using the diffusion approximation-based population genetic inference framework and model-testing approach of the Python package DaDi. We found that the most likely demographic scenario includes a bottleneck in Q. engelmannii that coincides with asymmetric gene flow from Q. berberidifolia into Q. engelmannii. Given that the timing of this gene flow coincides with the advent of a Mediterranean-type climate in the California Floristic Province, we propose that changing precipitation patterns and seasonality may have favored the introgression of climate-associated genes from the endemic into the non-endemic California oak.


Leaf tissue was collected from mature individuals of both Quercus engelmannii and Q. berberidifolia during the summer of 2017. Tissue was collected from two individuals per site at 12 unique sites per species (24 total sites, 48 total individuals). Leaf tissue was flash frozen in liquid nitrogen in the lab and ground into a fine powder with a mortar and pestle. DNA was extracted following a modified Qiagen DNeasy Plant Kit protocol. DNA was prepared using GBS protocols (Elshire et al., 2011), including a bead clean-up step to ensure properly sized fragments. Libraries were sequenced at the Technology Center for Genomics and Bioinformatics at the University of California, Los Angeles, on an Illumina HiSeq 3000 (Illumina Inc., 5200 Illumina Way, San Diego, CA) in August 2017. To equalize coverage between species, samples were re-sequenced in July 2018. Samples were sequenced using two lanes (48 samples per lane), and the combined sequence data for each individual was used for this project. Each individual library was sequenced 2-3 times. Sequences were filtered for quality and sequencing depth and aligned to the Quercus lobata v.3.0 reference genome. SNPs were called from the aligned reads using GATK.

Usage notes

Raw sequence reads are included in the folder GBS1_2017 for the initial sequencing run and in GBS2_2018 for any subsequent resequencing performed on the same libraries. Each individual has up to three raw sequence files, which were mapped to the Quercus lobata v.3.0 reference genome and then combined to call SNPs for the final analysis.

Called variants for each individual are compiled in GBS_Variants.vcf.

Final variants used in the analysis, after filtering for linkage disequilibrium, are compiled in PW_Filter_2018.vcf.
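
As a quick sanity check after downloading, the two VCF files can be summarized with a few lines of Python. The sketch below is not part of the original analysis pipeline: the function name summarize_vcf is hypothetical, and it assumes only that the files follow the standard tab-delimited VCF layout (nine fixed columns followed by one column per sample) and may optionally be gzip-compressed.

```python
import gzip
from pathlib import Path


def summarize_vcf(path):
    """Return (sample_names, n_records, n_biallelic_snps) for a VCF file.

    Hypothetical helper for inspecting the deposited files; it is not the
    authors' code and performs no filtering of its own.
    """
    path = Path(path)
    # Assumption: the file is plain text unless its name ends in ".gz".
    opener = gzip.open if path.suffix == ".gz" else open
    samples = []
    n_records = 0
    n_biallelic_snps = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                # The first nine columns are fixed by the VCF format;
                # sample identifiers follow.
                samples = fields[9:]
                continue
            if len(fields) < 8:
                continue  # skip blank or malformed lines
            n_records += 1
            ref, alt = fields[3], fields[4]
            # Biallelic SNP: single-base REF and exactly one single-base ALT.
            if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT":
                n_biallelic_snps += 1
    return samples, n_records, n_biallelic_snps
```

Calling summarize_vcf("PW_Filter_2018.vcf") returns the sample identifiers and record counts, which can be compared against the sample and SNP numbers reported in the abstract. How samples are named in the header, and whether the filtered file contains only biallelic SNPs, should be checked against the files themselves.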


National Science Foundation, Award: IOS-1444661