
High-quality SNPs from genic regions highlight introgression patterns among European white oaks (Quercus petraea and Q. robur)

Cite this dataset

Lang, Tiange et al. (2020). High-quality SNPs from genic regions highlight introgression patterns among European white oaks (Quercus petraea and Q. robur) [Dataset]. Dryad.


In the post-genomics era, non-model species like most Fagaceae still lack operational diversity resources for population genomics studies. Sanger sequences were produced from over 800 gene fragments covering ~530 kb across the genic partition of European oaks in a range-wide sampling of 25 individuals (11 Quercus petraea, 13 Q. robur, one Q. ilex as an outgroup). The targeted regions represented broad functional categories potentially involved in species' ecological preferences, together with a random set of genes. Using a high-quality dedicated pipeline, we provide a detailed characterization of these genic regions, which included over 14500 polymorphisms: ~12500 SNPs (218 of them triallelic), over 1500 insertion-deletions, and ~200 novel di- and tri-nucleotide SSR loci. This catalog also provides various summary statistics within and among species, gene ontology information, and standard formats to assist the choice of loci for genotyping projects. The distributions of nucleotide diversity and differentiation across genic regions are also described for the first time in these species (mean nucleotide diversity ~0.0049 in Q. petraea and ~0.0045 in Q. robur across random regions, and mean FST ~0.13 across SNPs), with an estimated 41 to 51 million SNPs expected across the genome in both species. We observed robust patterns of slightly but significantly higher diversity in Q. petraea, across a random gene set and in the abiotic stress functional category, and a heterogeneous landscape of both diversity and differentiation. These patterns are discussed in the context of both species' documented introgression history despite strong reproductive barriers. The quality, representativity in terms of species genomic diversity, and usefulness of the resources provided are discussed for possible applications in medium-scale landscape ecology projects, and as a reference resource for validation purposes in larger-scale re-sequencing projects. The latter are preferentially recommended in oaks over SNP array development, given the large nucleotide variation and low levels of linkage disequilibrium revealed.


The data here are the original Sanger sequences (*.ab1 trace files) obtained from a discovery panel of 25 Quercus individuals sampled across a large part of both species' geographic ranges (13 from Quercus robur, 11 from Quercus petraea, 1 from Quercus ilex). These sequences represent amplicons for gene fragments associated with 759 reference contigs from the assembly provided in Appendix S2 of the Lang et al. manuscript in BioRxiv (

Each subfolder can contain more than one fragment from the same reference contig, and most, but not all, fragments overlap. More than 85% of the fragments yielded at least 12 high-quality sequences. All subfolders contain at least one sequence of very good Sanger quality.

Leaves from individuals were sampled and stored in silica gel. DNA extraction was performed following Guichoux et al. (2013, DOI: 10.1111/mec.12125). DNA quality and concentration were assessed with a Nanodrop spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). Extractions were repeated until we obtained at least 20 micrograms of genomic DNA per sample, which was needed for a few thousand individual PCRs.

More information on the overall bioinformatic strategy is in Figure 1 of the Lang et al. manuscript, and the original list of amplicons with primer sequences and functional annotations is in Tables S1 and S2 of the supporting information for the manuscript.

All the sequencing work was performed on ABI3730 capillary sequencers (Applied Biosystems). Data quality steps were included throughout the process in order to maximize the amount and quality of the sequences finally obtained.

Usage notes

The additional file lists the names of the 50608 *.ab1 trace files distributed across the 759 folders.
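After downloading and unpacking the archive, it can be useful to check that the local copy matches the figures given above. The following is a minimal Python sketch, assuming the archive unpacks into a root directory with one subfolder per reference contig; that layout, the function names, and the example path are illustrative assumptions, not part of the dataset documentation.

```python
from collections import Counter
from pathlib import Path


def count_traces_per_folder(root):
    """Count *.ab1 files under each top-level subfolder of `root`.

    Assumes (hypothetically) one top-level subfolder per reference contig.
    Matching is on the lowercase '.ab1' extension, as named in the text.
    """
    root = Path(root)
    counts = Counter()
    for trace in root.rglob("*.ab1"):
        parts = trace.relative_to(root).parts
        if len(parts) < 2:
            # Trace file sitting directly in the root, not in a subfolder: skip.
            continue
        counts[parts[0]] += 1
    return dict(counts)


def summarize(counts):
    """Return (number of folders with traces, total number of trace files)."""
    return len(counts), sum(counts.values())


if __name__ == "__main__":
    # "oak_traces" is a placeholder path for the unpacked archive.
    n_folders, n_traces = summarize(count_traces_per_folder("oak_traces"))
    print(f"{n_folders} folders, {n_traces} trace files")
```

For a complete copy, the two numbers printed should agree with the 759 folders and 50608 trace files stated above; a mismatch would suggest an incomplete download or a different folder layout than the one assumed here.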


EVOLTREE Network of Excellence, Award: 016322

Agence Nationale de la Recherche, TRANSBIODIV project, Award: 06-BDIV-003-04

Biodiversa LINKTREE project, Award: 2008-966

Agence Nationale de la Recherche, REALTIME project, Award: 59000256
