Large‐scale genomic SNP dataset for central and southeast European Turkey oak (Quercus cerris L.) populations generated by ddRAD‐seq method
Data files
Jul 30, 2025 version files (1.76 GB in total)
- Qcer_SNPs_mapped_cork_oak2.0.map (7.14 MB)
- Qcer_SNPs_mapped_cork_oak2.0.ped (259.16 MB)
- Qcer_SNPs_mapped_dhQueCerr2.1.map (7.60 MB)
- Qcer_SNPs_mapped_dhQueCerr2.1.ped (294.08 MB)
- README.md (8.95 KB)
- Sample_information.txt (22.21 KB)
- Soil_data.txt (107.82 KB)
- Tree-ring_samples.zip (1.20 GB)
Abstract
In Central and Southeast Europe, oak-dominated forests constitute the backbone of temperate forest ecosystems. Therefore, their adaptability is crucial for maintaining these ecosystems’ biodiversity under ongoing climate change. In this study, we investigated Turkey oak (Quercus cerris L.) populations in the Carpathian Basin and the Balkan Peninsula, as it is one of the most abundant species in these regions. The goal of our study was to build a genomic SNP dataset to enable detailed analyses of the species' biogeography, genetic diversity, population structure, and local selection and adaptation. To this end, 32 natural populations were sampled in the study regions, covering most of the natural range of the species. To obtain a large amount of highly variable genomic information for the sampled populations, we genotyped the 321 sampled individuals using double digest restriction site-associated DNA sequencing. SNP calling was performed by reference mapping, with the Turkey oak and the cork oak 2.0 genomes used as references. With this approach, two datasets were generated, allowing future users to obtain information on the markers’ genomic positions, which is required, e.g., for functional genomic analyses. The dataset mapped to the Turkey oak genome comprises 229 026 highly variable genome-wide SNP loci, whereas the dataset mapped to the cork oak genome comprises 201 829. Thanks to their scale and sampling design, these datasets offer great opportunities to survey the species’ genetic diversity, population structure, and gene flow, and to investigate the biogeography of Turkey oak in the Central and Southeast European region. In addition, tree-ring and soil data are included, which can be used for more complex analyses, such as genotype-phenotype or genotype-environment associations, as well as dendrochronological studies.
https://doi.org/10.5061/dryad.pk0p2ngzz
Description of the data and file structure
Our study aimed to build a genomic SNP (single nucleotide polymorphism) dataset to enable detailed analyses of the biogeography, genetic diversity, population structure, and local selection and adaptation of the Turkey oak (Quercus cerris L.) in the Central and Southeast European regions. To this end, 32 natural populations were sampled in the study regions, covering most of the natural range of the species. To obtain a large amount of highly variable genomic information for the sampled populations, we genotyped the 321 sampled individuals using double digest restriction site-associated DNA sequencing (ddRAD-seq). SNP calling produced two reference-mapped datasets, one using the Turkey oak genome and one using the cork oak genome as the reference. The dataset mapped to the Turkey oak genome comprises 229 026 highly variable genome-wide SNP loci, whereas the dataset mapped to the cork oak genome comprises 201 829. Genotype data files were uploaded in the standard PLINK ".ped" and ".map" formats. The "Sample_information.txt" file additionally contains GPS coordinates, height, and diameter data for almost all the sampled individuals. Thanks to their scale and sampling design, these datasets offer great opportunities to survey the species’ genetic diversity, population structure, and gene flow, and to investigate the biogeography of Turkey oak in the Central and Southeast European regions. Furthermore, soil and tree-ring data are provided in the "Soil_data.txt" and "Tree-ring_samples.zip" files, respectively, which can be used for more complex analyses, such as genotype-phenotype or genotype-environment associations, as well as dendrochronological studies.
Files and variables
File: Qcer_SNPs_mapped_cork_oak2.0.map
Description: Standard PLINK ".map" formatted file, containing genomic position information for SNP loci. This file is part of the dataset that was mapped to the cork oak genome.
File: Qcer_SNPs_mapped_dhQueCerr2.1.map
Description: Standard PLINK ".map" formatted file, containing genomic position information for SNP loci. This file is part of the dataset that was mapped to the Turkey oak genome.
File: Qcer_SNPs_mapped_cork_oak2.0.ped
Description: Standard PLINK ".ped" formatted file, containing the genotype information for sampled individuals. This file is part of the dataset that was mapped to the cork oak genome.
File: Qcer_SNPs_mapped_dhQueCerr2.1.ped
Description: Standard PLINK ".ped" formatted file, containing the genotype information for sampled individuals. This file is part of the dataset that was mapped to the Turkey oak genome.
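As an illustration of the layout of these files, the following minimal Python sketch parses single lines of a ".map" and a ".ped" file. It assumes the common four-column ".map" layout (chromosome, variant ID, genetic distance, base-pair position) and the usual six leading ".ped" columns followed by two allele columns per SNP, with "0" marking a missing allele. The function names and the toy lines are hypothetical and are not taken from the dataset.

```python
from typing import List, NamedTuple, Tuple


class MapRecord(NamedTuple):
    chrom: str
    snp_id: str
    genetic_distance: float
    position: int


def parse_map_line(line: str) -> MapRecord:
    """Parse one line of a four-column PLINK .map file (assumed layout)."""
    chrom, snp_id, cm, bp = line.split()
    return MapRecord(chrom, snp_id, float(cm), int(bp))


def parse_ped_line(line: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split a .ped line into its six leading columns and per-SNP allele pairs."""
    fields = line.split()
    meta, alleles = fields[:6], fields[6:]
    if len(alleles) % 2:
        raise ValueError("odd number of allele columns")
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return meta, genotypes


def missing_rate(genotypes: List[Tuple[str, str]]) -> float:
    """Fraction of genotypes in which at least one allele is coded as '0'."""
    if not genotypes:
        return 0.0
    missing = sum(1 for a, b in genotypes if a == "0" or b == "0")
    return missing / len(genotypes)


# Toy lines for illustration only (not real records from the dataset).
toy_map = "scaffold_1\tsnp_1\t0\t12345"
toy_ped = "POP1 IND1 0 0 0 -9 A A G T 0 0"

record = parse_map_line(toy_map)
meta, genotypes = parse_ped_line(toy_ped)
print(record.snp_id, record.position, meta[1], missing_rate(genotypes))
```

For a complete file, the number of genotype pairs on each ".ped" line should equal the number of lines in the matching ".map" file, which gives a quick consistency check after download.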
File: Sample_information.txt
Description: Tab-delimited text file containing GPS coordinates, country, municipality, height, and diameter (measured in two perpendicular directions) data for almost all the sampled individuals.
Variables
- POPID: Population identifier
- IID: Individual identifier
- Latitude: Latitude (WGS84)
- Longitude: Longitude (WGS84)
- Elevation_a.s.l_(m): Elevation of the sampling site above sea level
- H_(m): Height of the sampled tree
- d1.3_1_(cm): Breast height diameter in the first direction
- d1.3_2_(cm): Breast height diameter in the second direction
- Country: Country of the sampled populations
- Municipality: Municipality of the sampled populations
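The two perpendicular diameter measurements are often combined into a single value per tree. The sketch below shows one possible way to read the tab-delimited file and average the two diameters, skipping empty or non-numeric cells, since data are available for almost all but not necessarily all individuals. It assumes that the header names match the variable names listed above exactly; the helper names and the toy rows are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def to_float(value: Optional[str]) -> Optional[float]:
    """Return a float, or None for empty, missing, or non-numeric cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def mean_diameter(row: Dict[str, str]) -> Optional[float]:
    """Average the two perpendicular diameters that are present in a row."""
    values = [to_float(row.get("d1.3_1_(cm)")), to_float(row.get("d1.3_2_(cm)"))]
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def read_samples(handle: Iterable[str]) -> List[Dict[str, str]]:
    """Read tab-delimited rows into dictionaries keyed by the header names."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Toy rows for illustration only; header names are assumed from the list above.
TOY = (
    "POPID\tIID\tH_(m)\td1.3_1_(cm)\td1.3_2_(cm)\n"
    "POP1\tPOP1-1\t18.5\t30.0\t32.0\n"
    "POP1\tPOP1-2\t20.0\t41.0\t\n"
    "POP2\tPOP2-1\t\t\t\n"
)

for row in read_samples(io.StringIO(TOY)):
    print(row["IID"], mean_diameter(row))
```

With the real file, `read_samples(open("Sample_information.txt", newline=""))` would replace the in-memory toy table.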
File: Soil_data.txt
Description: Tab-delimited text file containing soil data from 28 populations. Soil samples were collected using a hand soil borer to a maximum depth of 1 meter, depending on soil rockiness. Multiple samples were taken within each stand, typically from at least two or three locations (more in cases where rocky conditions made sampling difficult). Within each core, laboratory samples were taken approximately every 15 cm.
The dataset includes measurements of: pH(H₂O), pH(KCl), hydrolytic acidity, exchangeable acidity, total water-soluble salts (%), particle size distribution, total carbon (% dry weight), total organic carbon (% dry weight), total inorganic carbon (% dry weight), total nitrogen (% dry weight).
pH(H₂O) and pH(KCl) (1 M KCl) were determined at a soil–solvent ratio of 1:2.5 (w/v) with a pH meter using a glass electrode, following the protocol of MSZ-08-0206-2:1978; Evaluation of Some Chemical Properties of the Soil. Laboratory Tests. (pH Value, Phenolphtaleine Alkalinity Expressed in Soda, All Water Soluble Salts, Hydrolite (y1-Value) and Exchanging Acidity (y2-Value)). Hungarian Standards Institution: Budapest, Hungary, 1978.
Hydrolytic and exchangeable acidity (y1 and y2) and the percentage of total soluble salts were measured following the same standard (MSZ-08-0206-2:1978).
Particle size distributions (Coarse sand (%), Fine sand (%), Clay (%), Silt (%)) were measured using the pipetting method following 0.5 M sodium pyrophosphate (Na4P2O7) treatment in the range of 0.25–0.002 mm according to the protocol of MSZ-08-0205:1978; Determination of Physical and Hydrophysical Properties of Soils. Hungarian Standards Institution: Budapest, Hungary, 1978.
The total carbon (TC (%)), total organic carbon (TOC (%)), and total inorganic carbon (TIC (%)) contents were analyzed by dry combustion in an RC612 analyzer (LECO, St. Joseph, MI, USA) following the protocol of ISO 10694:1995; Soil Quality—Determination of Organic and Total Carbon after Dry Combustion (Elementary Analysis). International Organization for Standardization: Geneva, Switzerland, 1995.
A dry combustion method was also used to determine the total nitrogen (TN (%)) content using a CN628 analyzer (LECO, St. Joseph, MI, USA) following the protocol of ISO 13878:1998; Soil Quality—Determination of Total Nitrogen Content by Dry Combustion (“elemental Analysis”). International Organization for Standardization: Geneva, Switzerland, 1998.
Variables
- ID: Laboratory ID of the soil sample
- COREID: ID of the soil core
- POPID: ID of the population
- Latitude: Latitude (WGS84)
- Longitude: Longitude (WGS84)
- Elevation (m): Elevation in m
- Depth (cm): Range of depth in cm from where a given sample was taken
- pH(H2O): pH measured in water
- pH(KCl): pH measured in a 1M KCl solution
- y1: Hydrolytic acidity
- y2: Exchangeable acidity
- TSS (%): Percentage of total soluble salts
- Coarse sand (%): Percentage of coarse sand particles in the sample
- Fine sand (%): Percentage of fine sand particles in the sample
- Clay (%): Percentage of clay particles in the sample
- Silt (%): Percentage of silt particles in the sample
- TC (%): Total carbon content (dry weight basis)
- TOC (%): Total organic carbon content (dry weight basis)
- TIC (%): Total inorganic carbon content (dry weight basis)
- TN (%): Total nitrogen content (dry weight basis)
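Because each population is represented by several cores and several depth intervals, a per-population summary is a common first step before linking soil variables to genotypes. The following sketch averages one soil variable per POPID. It assumes that the column headers match the variable names listed above and that values use a decimal point; the function name and the toy rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def population_means(rows: Iterable[Dict[str, str]], column: str) -> Dict[str, float]:
    """Average one numeric column per POPID, skipping missing or non-numeric cells."""
    values: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue
        values[row["POPID"]].append(value)
    return {pop: sum(v) / len(v) for pop, v in values.items()}


# Toy rows for illustration only; they are not taken from Soil_data.txt.
TOY_SOIL = (
    "ID\tCOREID\tPOPID\tTOC (%)\n"
    "1\tC1\tPOP1\t2.0\n"
    "2\tC1\tPOP1\t4.0\n"
    "3\tC2\tPOP2\t1.5\n"
    "4\tC2\tPOP2\t\n"
)

rows = list(csv.DictReader(io.StringIO(TOY_SOIL), delimiter="\t"))
print(population_means(rows, "TOC (%)"))
```

A simple mean weights every laboratory sample equally; depending on the analysis, restricting the rows to particular depth intervals before averaging may be more appropriate.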
File: Tree-ring_samples.zip
Description: Scanned images of tree-ring samples in JPG format. Each image contains samples from multiple individuals belonging to a given population (a total of 270 individuals across 27 populations). File names include the population identifier and the range of sample numbers shown in the image. For example:
The file "QC-RO3-1-6" contains scanned images of the samples QC-RO3-1, QC-RO3-2, QC-RO3-3, QC-RO3-4, QC-RO3-5, and QC-RO3-6.
Sample names are written above the corresponding tree-ring samples on the scanned images. In addition, the abbreviation in parentheses at the end of each label indicates the side of the tree from which the sample was taken (N for north or E for east).
For example, the label "RO_3 QCERR_1E (N)" in the image QC-RO3-1-6 refers to the individual QC-RO3-1 and indicates that the sample was taken from the north side.
The "E" and "K" characters directly after the sample ID numbers are the Hungarian abbreviations for north and east (Észak and Kelet, respectively).
These data are suitable for multiple purposes, including the determination of individual tree age, dendrochronological analyses, and integration with genotype data for genotype–phenotype association studies.
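To link the scanned images to individuals in the genotype and sample files, the naming convention described above can be expanded programmatically. The sketch below is a minimal example that assumes every image name follows the pattern of the single documented example ("QC-RO3-1-6") and every label follows the pattern "RO_3 QCERR_1E (N)"; other names or labels in the archive may differ. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# <population>-<first sample>-<last sample>, with an optional .jpg/.jpeg extension.
IMAGE_NAME = re.compile(r"^(?P<pop>.+)-(?P<first>\d+)-(?P<last>\d+)(?:\.jpe?g)?$", re.IGNORECASE)

# Sample number followed by the Hungarian direction letter (E = Észak, K = Kelet).
LABEL = re.compile(r"(?P<num>\d+)(?P<hu>[EK])\s*\((?P<en>[NE])\)\s*$")

HUNGARIAN_DIRECTION = {"E": "north", "K": "east"}


def samples_in_image(name: str) -> List[str]:
    """Expand an image name such as 'QC-RO3-1-6' into individual sample IDs."""
    match = IMAGE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected image name: {name!r}")
    first, last = int(match["first"]), int(match["last"])
    return [f"{match['pop']}-{i}" for i in range(first, last + 1)]


def parse_label(label: str) -> Tuple[int, str]:
    """Return (sample number, direction) from a label such as 'RO_3 QCERR_1E (N)'."""
    match = LABEL.search(label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    return int(match["num"]), HUNGARIAN_DIRECTION[match["hu"]]


print(samples_in_image("QC-RO3-1-6"))
print(parse_label("RO_3 QCERR_1E (N)"))
```

Both functions raise a `ValueError` on names that do not fit the assumed pattern, so exceptions in the archive surface immediately instead of being silently mis-assigned.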
Code/software
These datasets can be analyzed using the freely available PLINK software (see: https://www.cog-genomics.org/plink/). With PLINK, the datasets can be analyzed directly, or filtered, manipulated, and converted to the other genomic formats required by specific software.
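As a starting point, the sketch below builds a PLINK 1.9 command line that converts one of the text filesets to binary format and optionally applies a minor allele frequency filter. It assumes that a `plink` executable is on the PATH and that the chromosome or scaffold codes in the ".map" files require the `--allow-extra-chr` flag, which may not be the case. The output prefix, the filter threshold, and the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def plink_command(prefix: str, out_prefix: str,
                  maf: Optional[float] = None, plink: str = "plink") -> List[str]:
    """Build a PLINK 1.9 command converting <prefix>.ped/.map to binary format."""
    cmd = [plink, "--file", prefix, "--allow-extra-chr", "--make-bed", "--out", out_prefix]
    if maf is not None:
        # Optional minor allele frequency filter; the threshold is the user's choice.
        cmd += ["--maf", str(maf)]
    return cmd


def run_plink(cmd: List[str]) -> None:
    """Run the command, failing early if the executable cannot be found."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


# Hypothetical output prefix and threshold; only the input prefix comes from the dataset.
command = plink_command("Qcer_SNPs_mapped_dhQueCerr2.1", "Qcer_dhQueCerr2.1_bed", maf=0.05)
print(" ".join(command))
```

Calling `run_plink(command)` in the directory that holds the downloaded ".ped" and ".map" files would execute the conversion; the binary fileset is considerably faster to load in subsequent PLINK runs.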