Data from: The late Quaternary climate impact on the genome of the woodland strawberry (Fragaria vesca), a perennial herb
Data files
Jan 15, 2026 version files 1.03 GB
- 202.samples.renamed.vcf.gz (67.12 MB)
- fig_S10.tar.gz (1.08 MB)
- fig_S11-14.tar.gz (399.85 KB)
- fig_S16C.tar.gz (2.20 MB)
- Fig.1B_S1_ABC_Principal_component_analyses.csv (31.67 KB)
- Fig.1E_S3B_isolation_by_distance_105_pairs_corrected.xlsx (13.56 KB)
- Fig.3A-B.tar.gz (148.73 KB)
- Fig.3C-D.tar.gz (65.52 KB)
- Fig.3E-F.tar.gz (66.06 KB)
- Fig.3G.tar.gz (154.43 KB)
- Fig.3H.tar.gz (1.76 MB)
- Fig.S18A.E.tar.gz (543.10 KB)
- Fig.S18B.F.tar.gz (134.07 KB)
- Fig.S18C.G.tar.gz (549.58 KB)
- Fig.S18D.H.tar.gz (552.39 KB)
- Fig.S19A.E.tar.gz (558.47 KB)
- Fig.S19B.F.tar.gz (549.72 KB)
- Fig.S19C.G.tar.gz (557.11 KB)
- Fig.S19D.H.tar.gz (555.95 KB)
- Fig.S1G.admixture.zip (10.69 KB)
- Fig.S2.PCA.xlsx (36.66 KB)
- Fig.S20__alta_kofjord.zip (32.94 KB)
- Fig.S21.tar.gz (98.43 KB)
- Fig.S8_nucleotide_diversity.xlsx (9.95 KB)
- input_files_peripheral_ne_decline.during.MIS2.tar.gz (2.67 MB)
- mask_files.tar.gz (953.86 MB)
- README.md (6.56 KB)
Abstract
Genomes record past climatic impacts on species’ range shifts, admixture, refugial isolation, and adaptive evolution, yet these processes remain poorly understood in perennial herbaceous species, a dominant group of temperate flora. We present a demographic history of the perennial herb woodland strawberry (Fragaria vesca L.) reconstructed from 200 genomes spanning most of its European range. Temporal population structure reveals a strong division into western and eastern genetic clusters along a longitudinal climatic gradient, with eastern core populations showing greater resilience during glaciations. Divergence patterns indicate that postglacial recolonization of western and eastern Europe occurred from distinct refugia in multiple waves. The current largest, admixed populations from the Mediterranean to northern Europe form a continuous chain maintained by east–west gene flow through Central Europe, with historical migration patterns indicating comparable connections during earlier interglacials. Our reconstruction of woodland strawberry’s climatic history with high temporal resolution reveals how the late Pleistocene core-periphery dynamics shaped its survival and population structure under climate change. These data highlight populations that are crucial for maintaining long-term genetic diversity and establish a framework for linking genomic regions shaped by distinct climatic periods to genome evolution and climatic adaptation in temperate flora.
Dataset DOI: 10.5061/dryad.8cz8w9h43
Description of the data and file structure
The deposited data include all source data and numerical results presented in the article (Communications Biology: https://doi.org/10.1038/s42003-026-09539-5) that are not available in the supplemental data files.
The data include the raw files used to plot the demographic trajectories estimated by MSMC-IM (Wang et al., 2020), as shown in the article (Figs. 3, S10–S14, S16, and S18–S21). The variables in those tables are described below under the file Fig.3A-B.tar.gz. In addition, the dataset contains the source data for the principal component analyses presented in the article (Figs. 1B, S1, and S2), the Admixture analysis (Fig. S1G), and the nucleotide diversity estimates (Fig. S8). It also includes a biallelic variant file (VCF) comprising all 202 samples, as well as mask files generated with the bamCaller.py script (https://github.com/stschiff/msmc-tools/blob/master/bamCaller.py) for each individual sample, and universal mask files for each chromosome.
Files and variables
File: Fig.3A-B.tar.gz
Description: Demographic trajectories showing the core pattern in Figures 3A and 3B. Files are MSMC-IM output files. The variables below are estimated over time, with time measured in generations.
Variables
- M = cumulative migration probability
- m = symmetric migration rate
- im_N1 = effective population size of the N1 population
- im_N2 = effective population size of the N2 population
- Generation time: 2 years
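As a minimal sketch of how these tables might be read, the Python snippet below parses an MSMC-IM estimates table and converts time from generations to years using the 2-year generation time stated above. The tab delimiter and the `left_time_boundary` column name are assumptions based on the usual MSMC-IM output layout, and the two example rows are invented for illustration; check both against the files in the archive.

```python
import csv
import io

GENERATION_TIME_YEARS = 2  # generation time stated in this README


def read_msmc_im(text):
    """Parse a tab-separated MSMC-IM estimates table into a list of dicts.

    Assumes a header row with left_time_boundary, im_N1, im_N2, m and M,
    and that left_time_boundary is given in generations.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        row = {key: float(value) for key, value in record.items()}
        # Convert the interval start from generations to calendar years.
        row["years"] = row["left_time_boundary"] * GENERATION_TIME_YEARS
        rows.append(row)
    return rows


# Invented two-row example, not taken from the deposited files.
example = (
    "left_time_boundary\tim_N1\tim_N2\tm\tM\n"
    "0\t10000\t12000\t0.0\t0.0\n"
    "5000\t8000\t9000\t1e-05\t0.05\n"
)
rows = read_msmc_im(example)
print(rows[1]["years"])
```

For a real file, the same function can be called on the result of `open(path).read()` after extracting the archive.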
File: Fig.3C-D.tar.gz
Description: Demographic trajectories showing the peripheral pattern, with a strong bottleneck during the PGP glaciation. Files are MSMC-IM output files. Variables are as described for Fig.3A-B.tar.gz.
File: Fig.3E-F.tar.gz
Description: Demographic trajectories showing the peripheral pattern, with a strong bottleneck during the MIS 8 glaciation. Files are MSMC-IM output files. Variables are as described for Fig.3A-B.tar.gz and are estimated over time in generations.
File: Fig.3G.tar.gz
Description: MSMC-IM output files used to plot the midpoints of isolation events (time intervals when m < 1e-7) for the peripheral pattern in Figure 3G. Variables are as described for Fig.3A-B.tar.gz and are estimated over time in generations.
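One way to locate isolation events from the migration rate column is sketched below in Python. It treats each maximal run of consecutive time intervals with m < 1e-7 as one event and reports the midpoint between the start of the run and the start of the next interval above the threshold. This run-based definition, the handling of open-ended runs, and the function name are assumptions; the exact procedure used for Figure 3G is in the authors' code repository.

```python
def isolation_midpoints(left_boundaries, migration_rates, threshold=1e-7):
    """Return midpoints of runs of intervals where the migration rate is
    below the threshold.

    left_boundaries: start time of each interval (for example, generations)
    migration_rates: symmetric migration rate m for each interval
    A run that reaches the last interval has no finite right edge here,
    so it is skipped (an illustrative choice).
    """
    midpoints = []
    run_start = None
    for time, rate in zip(left_boundaries, migration_rates):
        if rate < threshold:
            if run_start is None:
                run_start = time  # a new isolation run begins
        elif run_start is not None:
            # The run ends where the first interval above threshold starts.
            midpoints.append((run_start + time) / 2)
            run_start = None
    return midpoints


# Invented values for illustration only.
times = [0, 100, 200, 300, 400, 500]
rates = [1e-5, 1e-9, 1e-9, 1e-5, 1e-8, 1e-8]
print(isolation_midpoints(times, rates))
```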
File: Fig.3H.tar.gz
Description: MSMC-IM output files used to plot isolation event midpoints across all bootstrap replicates (N=400), showing the peripheral pattern in Figure 3H. The archive includes four subfolders, each containing 100 bootstrap replicates.
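If a single summary of the bootstrap replicates is wanted, one option is a median with a percentile interval, sketched below in Python. The nearest-rank percentile, the 2.5 and 97.5 cut-offs, and the function names are illustrative choices rather than the method used in the article, and the input values are invented.

```python
import math
import statistics


def percentile(values, q):
    """Nearest-rank percentile of values for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_midpoints(midpoints):
    """Median and a 95% percentile interval across bootstrap replicates."""
    return {
        "median": statistics.median(midpoints),
        "low": percentile(midpoints, 2.5),
        "high": percentile(midpoints, 97.5),
    }


# Invented stand-in for 400 replicate midpoints.
replicate_midpoints = list(range(1, 401))
print(summarize_midpoints(replicate_midpoints))
```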
File: Fig.S8_nucleotide_diversity.xlsx
Description: Intergenic nucleotide diversity for each region. Nucleotide diversity was estimated with vcftools v.0.1.16 with no missing data per site (imputed).
Variables
- pi=average pairwise intergenic nucleotide diversity across regions
File: Fig.1E_S3B_isolation_by_distance_105_pairs_corrected.xlsx
Description: Data for the isolation-by-distance analysis shown in Figures 1E and S3B.
Variables
- Genetic differentiation between regions (Fst), based on Table S1. Fst was calculated with vcftools across whole genomes using the Weir and Cockerham method (Weir and Cockerham 1984).
- Geographic distance (kilometers) between regions, based on the average GPS coordinates of each region
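The geographic distances can be illustrated with a short Python sketch that averages sample coordinates per region and computes great-circle distances between all region pairs. The haversine formula, the Earth radius of 6371 km, and the arithmetic mean of latitude and longitude are assumptions, since the README does not state how distances were computed, and the coordinates are hypothetical. The 105 pairs in the file name are consistent with 15 regions, because 15 × 14 / 2 = 105.

```python
import math
from itertools import combinations


def mean_coordinate(points):
    """Arithmetic mean of (latitude, longitude) pairs in degrees."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (sum(lats) / len(lats), sum(lons) / len(lons))


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical regions with hypothetical sample coordinates.
regions = {
    "region_A": [(60.0, 24.0), (62.0, 26.0)],
    "region_B": [(45.0, 10.0), (46.0, 11.0)],
    "region_C": [(69.5, 23.0), (70.0, 23.5)],
}
centers = {name: mean_coordinate(points) for name, points in regions.items()}
for a, b in combinations(sorted(centers), 2):
    print(a, b, round(haversine_km(*centers[a], *centers[b]), 1))
```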
File: fig_S10.tar.gz
Description: MSMC-IM output files used to plot Figure S10. They were created by analyzing MSMC2 output files with two mutation rates: Arabidopsis (7.1e-9 per nucleotide per generation) and Fragaria (5.6e-9 per nucleotide per generation).
File: fig_S11-14.tar.gz
Description: MSMC-IM output files used to plot Figures S11-14. They were produced by analyzing MSMC2 output files with two mutation rates: Arabidopsis (7.1e-9 per nucleotide per generation) and Fragaria (5.6e-9 per nucleotide per generation).
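To see why the choice of mutation rate matters, the Python sketch below applies the standard scaling for raw MSMC2 output, in which scaled time divided by the mutation rate gives generations and 1 / (2 × mu × lambda) gives effective population size. It is meant only to illustrate the effect of the two rates listed above. The deposited files are MSMC-IM outputs in which the rate has already been applied, and the input values and function name are invented.

```python
MU_ARABIDOPSIS = 7.1e-9  # per nucleotide per generation (from this README)
MU_FRAGARIA = 5.6e-9     # per nucleotide per generation (from this README)


def scale_msmc2(scaled_time, coalescence_rate, mu):
    """Convert one MSMC2 row to generations and effective population size.

    scaled_time: a time boundary in mutation-scaled units
    coalescence_rate: the scaled coalescence rate (lambda) for that interval
    """
    generations = scaled_time / mu
    effective_size = 1.0 / (2.0 * mu * coalescence_rate)
    return generations, effective_size


# Invented example row, not taken from the deposited files.
for label, mu in (("Arabidopsis", MU_ARABIDOPSIS), ("Fragaria", MU_FRAGARIA)):
    generations, effective_size = scale_msmc2(1e-4, 2000.0, mu)
    print(label, round(generations), round(effective_size))
```

Under this scaling, the lower Fragaria rate stretches both time and effective population size by the same factor, 7.1 / 5.6, relative to the Arabidopsis rate.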
File: fig_S16C.tar.gz
Description: MSMC-IM output files from bootstrap replicates (N=400) with the core pattern, used to plot Figure S16C. The archive includes four subfolders, each containing 100 bootstrap replicates.
File: Fig.S18x.x.tar.gz
Description: MSMC-IM output files from bootstrap replicates with the peripheral pattern, used to plot Figure S18x. Each archive includes four subfolders, each containing 100 bootstrap replicates and a whole-genome run.
File: Fig.S19x.x.tar.gz
Description: MSMC-IM output files from bootstrap replicates with the core pattern, used to plot Figure S19x. Each archive includes four subfolders, each containing 100 bootstrap replicates and a whole-genome run.
File: Fig.S20__alta_kofjord.zip
Description: MSMC-IM output files used to plot Figure S20.
File: Fig.S21.tar.gz
Description: MSMC-IM output files used to plot Figure S21.
File: mask_files.tar.gz
Description: Sample-specific masks and universal masks for the MSMC2 method.
File: Fig.1B_S1_ABC_Principal_component_analyses.csv
Description: Numerical source data to plot PCA plots in Fig. 1B and S1B-C.
File: Fig.S2.PCA.xlsx
Description: Principal component analysis results used to plot Figure S2. A total of 342 woodland strawberry samples from northern Italy, southern Finland, central Sweden, Alta, Kåfjord and Tromso were genotyped using genotyping-by-sequencing (GBS).
File: Fig.S1G.admixture.zip
Description: Admixture data presented in Fig. S1G. Admixture proportions are given for K = 2, 4, 6 and 8. Cross-validation errors are in 199_CV.txt.
File: input_files_peripheral_ne_decline.during.MIS2.tar.gz
Description: Input files (MSMC-IM output files) used to test whether peripheral populations experienced significant declines in effective population size during the MIS 2 glaciation relative to southern European reference populations (Supplementary Data 4).
File: 202.samples.renamed.vcf.gz
Description: VCF file containing all SNPs and indels across all samples. The data are imputed and phased into haplotypes.
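A quick check after download is to list the sample names in the VCF header, sketched below in Python with the standard library only. It relies on the VCF convention that sample columns follow the nine fixed columns on the `#CHROM` line. The function names are illustrative, and the file is assumed to be gzip- or bgzip-compressed, both of which `gzip.open` can read.

```python
import gzip


def vcf_samples(lines):
    """Return sample names from the #CHROM header line of a VCF."""
    for line in lines:
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed VCF fields; samples start at column 10.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def vcf_samples_from_path(path):
    """Read sample names from a gzip-compressed VCF file."""
    with gzip.open(path, "rt") as handle:
        return vcf_samples(handle)


# Invented header for illustration; real sample names will differ.
header = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n",
]
print(vcf_samples(header))
```

Calling `len(vcf_samples_from_path("202.samples.renamed.vcf.gz"))` should return the number of samples in the deposited file.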
Code/software
The code used for data analysis is available on GitHub (https://github.com/tuomas64/strawberry).
References:
Wang, K., Mathieson, I., O’Connell, J. & Schiffels, S. Tracking human population structure through time from whole genome sequences. PLoS Genet. 16, e1008552 (2020).
Weir, B. S. & Cockerham, C. C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984).
