Data and code from: Contrasting drivers of genetic diversity in plants across previously glaciated Northern hemisphere landscapes
Data files
May 14, 2026 version (1.02 MB in total):
- Final_model_data.csv (900.46 KB)
- README.md (6.68 KB)
- Script_BayesianPhyloModel.R (52.08 KB)
- Script_EcologicalDistance.R (3.30 KB)
- Script_Figures.R (11.31 KB)
- Script_GeneratePhylogeny.R (471 B)
- Script_Range_refugia_Distance.R (4.07 KB)
- SDM.csv (44.63 KB)
Abstract
Understanding how global biogeographical and evolutionary processes have shaped continental-scale patterns of plant genetic diversity is increasingly tractable given the proliferation of population genetic data. Predominant theories include the geographical central–marginal hypothesis (CMH), ecological CMH, and historical CMH, which predict decreasing genetic diversity from range centres to margins, from suitable to marginal environments, and from refugia to newly colonised areas, respectively. In addition, the latitudinal (LG) and longitudinal (LoG) gradient hypotheses predict a decrease in within-population genetic diversity along dispersal routes across species ranges. Here, these hypotheses were tested across North America, Europe, and East Asia, regions that experienced contrasting patterns of post-glacial landscape fragmentation from the last glacial maximum to the present. Data were collated from 8,333 populations representing 435 plant species; for each population, distances to range margins, climatic niche margins, and refugia were calculated, and latitude and longitude were recorded. Bayesian phylogenetic mixed-effects models were applied to assess relationships between these variables and genetic diversity, with all possible combinations of the five variables evaluated (31 candidate models) and the best-supported model identified through model comparison. Results indicate that geographical CMH, ecological CMH, historical CMH, LG, and LoG have all shaped patterns of genetic diversity across the Northern Hemisphere, although their effects varied substantially by region and between woody and herbaceous species. The geographical CMH primarily influenced herbaceous species in Europe and East Asia, whereas the ecological CMH mainly affected woody species in North America, and the historical CMH had limited effects in East Asia.
Overall, the findings support the view that patterns of genetic diversity are shaped by interacting geographical, ecological, and historical factors, with contrasting continental drivers reflecting differences in glaciation history as well as life history traits. These results underscore the value of geographical, historical, and ecological variables as proxies for within-population genetic diversity and their utility in identifying populations for conservation prioritisation in previously glaciated regions of the Northern Hemisphere.
Dataset DOI: 10.5061/dryad.z34tmpgtp
Description of the data and file structure
This dataset contains all data and scripts used to investigate the geographic, ecological, and historical drivers of genetic diversity across 435 plant species distributed in East Asia, North America, and Europe. The dataset includes population-level genetic diversity metrics and the distances from each population to its species' range margin (DRange), glacial refugia (DRefugia), and climatic niche margin (DNiche). The accompanying scripts include code for calculating DRange, DRefugia, and DNiche, as well as code for Bayesian phylogenetic modeling and figure generation.
Files and variables
File: Script_Range_refugia_Distance.R
Description: This script estimates two spatial metrics for each population in the dataset: the distance to the geographical range margin (DRange) and the distance to glacial refugia (DRefugia). The workflow has two main components. The first part of the code builds species distribution models (SDMs) to estimate the contemporary and historical potential distributions of each species. The SDM analyses use species occurrence data downloaded from GBIF, with the download link for each species provided in the SDM.csv file. Environmental variables for the SDMs are obtained from WorldClim 1.4 and the ENVIREM dataset (Title & Bemmels 2018). The second part of the code calculates DRange and DRefugia from the SDM results generated in the first part of the workflow.
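Both DRange and DRefugia are, in essence, distances from a population to the nearest of a set of target locations. The Python sketch below illustrates that final step under stated assumptions; it is not the code in Script_Range_refugia_Distance.R, which is written in R. It assumes coordinates are (longitude, latitude) pairs in decimal degrees and a spherical Earth (haversine formula, mean radius 6371.0088 km), and that the SDM outputs have already been reduced to point sets: hypothetical margin cells (cells on the edge of the predicted contemporary range) for DRange and hypothetical refugium cells (cells predicted suitable under historical climate) for DRefugia.

```python
import math
from typing import Iterable, Tuple

# Assumption: coordinates are (longitude, latitude) pairs in decimal degrees.
Point = Tuple[float, float]

# Mean Earth radius in kilometers; a spherical Earth is assumed.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lon, lat) points, in kilometers."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp guards against rounding pushing h slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def distance_to_nearest(population: Point, targets: Iterable[Point]) -> float:
    """Distance from a population to the closest target point, in kilometers.

    Hypothetical inputs: for DRange, `targets` would be cells on the edge of the
    SDM-predicted current range; for DRefugia, cells predicted suitable under
    past climate. Raises ValueError if `targets` is empty.
    """
    return min(haversine_km(population, t) for t in targets)
```

For one population, calling `distance_to_nearest((lon, lat), margin_cells)` would give a DRange-like value, and the same call with `refugium_cells` a DRefugia-like value. A brute-force minimum is fine for a few thousand cells per species; a spatial index would only matter for much larger rasters.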
File: Script_EcologicalDistance.R
Description: This script computes the distance from each population to the climatic niche margin (DNiche) of its species. The species occurrence coordinates used in this analysis are the same as those used in the SDM analyses; they were downloaded from GBIF, with the download link for each species provided in the SDM.csv file.
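This README does not spell out how the niche margin is defined. As one possible illustration only, the Python sketch below treats the niche as the convex hull of a species' occurrence points in a two-dimensional, standardized climate space (for example, the first two principal components of the climate variables) and returns the Euclidean distance from a population's climate point to the hull boundary, positive inside the hull and negative outside. The convex-hull definition, the two climate axes, and the sign convention are assumptions, not necessarily what Script_EcologicalDistance.R does.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

# Assumption: each point is a population or occurrence on two climate axes.
Point = Tuple[float, float]


def standardize(points: Sequence[Point], reference: Sequence[Point]) -> List[Point]:
    """Z-score both axes using the mean and SD of the reference (occurrence) points."""
    xs = [p[0] for p in reference]
    ys = [p[1] for p in reference]
    mx, sx = mean(xs), pstdev(xs)
    my, sy = mean(ys), pstdev(ys)
    return [((x - mx) / sx, (y - my) / sy) for x, y in points]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        raise ValueError("need at least three distinct points")
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length2 = dx * dx + dy * dy
    t = 0.0 if length2 == 0 else max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance_to_niche_margin(population: Point, occurrences: Sequence[Point]) -> float:
    """Signed distance to the hull boundary: positive inside the hull, negative outside."""
    hull = convex_hull(occurrences)
    edges = list(zip(hull, hull[1:] + hull[:1]))
    d = min(point_segment_distance(population, a, b) for a, b in edges)
    # With a counter-clockwise hull, an interior point lies left of (or on) every edge.
    inside = all(cross(a, b, population) >= 0 for a, b in edges)
    return d if inside else -d
```

In use, both the occurrences and the population would first be standardized against the occurrence points, e.g. `occ_z = standardize(occ, occ)` and `pop_z = standardize([pop], occ)[0]`, before calling `distance_to_niche_margin(pop_z, occ_z)`. Standardizing keeps one climate axis with large units from dominating the distance.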
File: Script_GeneratePhylogeny.R
Description: This script generates a phylogenetic tree for all study species using the V.PhyloMaker2 package. It standardizes species names, integrates them into the megatree backbone, and outputs a fully resolved phylogeny for downstream Bayesian phylogenetic modeling. All species names are consistent with those in the Final_model_data.csv file, ensuring compatibility with the dataset used for the analyses. The resulting phylogeny is used as input for the Bayesian phylogenetic models in Script_BayesianPhyloModel.R.
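The species list passed to V.PhyloMaker2 can be assembled from the Species, Genus, and Family columns of Final_model_data.csv. The Python sketch below deduplicates those columns into a three-column table with lowercase headers species, genus, and family, which is the layout V.PhyloMaker2's phylo.maker() documents for its species list; check the package documentation for the exact name formatting it requires. The function names are hypothetical, and the sketch assumes the CSV headers match the variable names listed for Final_model_data.csv.

```python
import csv
from typing import Dict, Iterable, List, TextIO


def build_species_list(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """One row per species, with the lowercase columns species, genus, family.

    Assumes the input headers are Species, Genus, and Family; the first
    genus/family seen for a species is kept.
    """
    by_species: Dict[str, Dict[str, str]] = {}
    for row in rows:
        species = row["Species"].strip()
        if species and species not in by_species:
            by_species[species] = {
                "species": species,
                "genus": row["Genus"].strip(),
                "family": row["Family"].strip(),
            }
    # Sort by family, genus, then species for a stable, readable output file.
    return sorted(by_species.values(), key=lambda r: (r["family"], r["genus"], r["species"]))


def write_species_list(rows: List[Dict[str, str]], out: TextIO) -> None:
    """Write the species list as CSV with a species,genus,family header."""
    writer = csv.DictWriter(out, fieldnames=["species", "genus", "family"])
    writer.writeheader()
    writer.writerows(rows)
```

Typical use would be `with open("Final_model_data.csv", newline="") as fh: sp = build_species_list(csv.DictReader(fh))`, followed by writing `sp` to a file (a name such as species_list.csv is hypothetical) and reading it into R for V.PhyloMaker2.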
File: Script_BayesianPhyloModel.R
Description: This script contains the full workflow for fitting the Bayesian phylogenetic models used in the analysis. It prepares the input data, incorporates phylogenetic covariance structures, specifies the model formulas, runs the brms models, and produces the associated outputs. The input files for this script are generated by filtering the full dataset (Final_model_data.csv) to select the relevant genetic diversity metric (Ar, Ho, or He) and the species or populations of interest. Users can create their own CSV files from the full dataset to include specific genetic diversity metrics or species, and then run this script to reproduce the analyses.
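With five predictors (DRange, DNiche, DRefugia, Latitude, and Longitude), every non-empty subset gives 2^5 - 1 = 31 candidate models, matching the number stated in the abstract. The Python sketch below enumerates those candidates as brms-style formula strings. The phylogenetic random-effect term `(1 | gr(Species, cov = A))` follows the syntax of the brms phylogenetics vignette, where A is a phylogenetic covariance matrix supplied through `data2`; the grouping column and matrix names are assumptions, and the formulas in Script_BayesianPhyloModel.R may include additional terms.

```python
from itertools import combinations
from typing import List, Sequence

# Predictors named in Final_model_data.csv; Latitude and Longitude stand for the LG and LoG hypotheses.
PREDICTORS = ["DRange", "DNiche", "DRefugia", "Latitude", "Longitude"]

# Assumed brms phylogenetic random effect: A would be the phylogenetic covariance
# matrix supplied through data2 = list(A = A); Species is the grouping column.
PHYLO_TERM = "(1 | gr(Species, cov = A))"


def candidate_formulas(response: str,
                       predictors: Sequence[str] = PREDICTORS,
                       phylo_term: str = PHYLO_TERM) -> List[str]:
    """Every non-empty subset of predictors as a brms-style formula string."""
    formulas = []
    for k in range(1, len(predictors) + 1):
        for subset in combinations(predictors, k):
            formulas.append(f"{response} ~ {' + '.join(subset)} + {phylo_term}")
    return formulas
```

For example, `candidate_formulas("He")` yields 31 strings, from `He ~ DRange + (1 | gr(Species, cov = A))` up to the full five-predictor model. Each string could be passed to `brms::brm()` as the model formula, after which the candidates would be compared as described in the abstract.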
File: Script_Figures.R
Description: This script contains the code used to plot the figures in the present study: Figure 2, Figure 3, Supplementary Figure 2, and Supplementary Figure 3. Figure 2 and Supplementary Figure 2 are plotted using population genetic data and coordinates from Final_model_data.csv. Figure 3 is plotted using the effect estimates obtained from Script_BayesianPhyloModel.R. Supplementary Figure 3 is plotted using the SDM AUC values in SDM.csv.
File: Final_model_data.csv
Description: This file contains the complete dataset used in the study, integrating species information, population-level genetic diversity metrics, distance to the geographical range margin (DRange), distance to the climatic niche margin (DNiche), distance to glacial refugia (DRefugia), and references for each population. Missing values for variables that were not calculated or were unavailable are indicated as NA.
Variables
- Region_ID: Region coding (East Asia, North America, and Europe)
- Literature_ID: Each reference included in the meta-analysis was assigned a unique numerical code, which is used consistently across Supplementary Tables 1, 2, and 3 to identify and cross-reference individual studies.
- Species: Species name standardized against the World Flora Online Plant List.
- Genus: Genus name standardized against the World Flora Online Plant List.
- Family: Family name standardized against the World Flora Online Plant List.
- Life history: Life history of the species: woody, annual herb, or perennial herb.
- Longitude: Population longitude.
- Latitude: Population latitude.
- Ar: Allelic richness.
- Ho: Observed heterozygosity.
- He: Expected heterozygosity.
- DRange: Distance to range margins.
- DNiche: Distance to climatic niche margins.
- DRefugia: Distance to refugia.
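As described for Script_BayesianPhyloModel.R, a model input file can be created by subsetting Final_model_data.csv. The Python sketch below keeps rows whose required columns are not missing (empty or NA) and whose filter columns match given values, then writes the subset with the original header. The function names, the output file name, and the example filter values are hypothetical; check the exact strings used in Region_ID and Life history in the file before filtering.

```python
import csv
from typing import Dict, Iterable, List, Optional, Sequence, TextIO

# Missing values in Final_model_data.csv are written as NA; empty cells are treated the same way.
MISSING = {"", "NA"}


def select_rows(rows: Iterable[Dict[str, str]],
                required: Sequence[str],
                filters: Optional[Dict[str, str]] = None) -> List[Dict[str, str]]:
    """Keep rows whose required columns are non-missing and whose filter columns match exactly."""
    filters = filters or {}
    selected = []
    for row in rows:
        if any((row.get(col) or "").strip() in MISSING for col in required):
            continue
        if all(row.get(col) == value for col, value in filters.items()):
            selected.append(row)
    return selected


def write_rows(rows: List[Dict[str, str]], fieldnames: Sequence[str], out: TextIO) -> None:
    """Write the selected rows as CSV, keeping the original column order."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

For example, reading the file with `csv.DictReader`, calling `select_rows(reader, ["He", "DRange", "DNiche", "DRefugia"], {"Region_ID": "Europe"})`, and writing the result with `reader.fieldnames` would give an He-only subset for one region with all three distance predictors present (assuming "Europe" is the value stored in Region_ID).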
File: SDM.csv
Description: The file SDM.csv contains information related to species distribution modeling (SDM) for all study species. This dataset allows users to track the origin and quality of occurrence data used in the SDM analyses and reproduce SDM results if desired.
Variables
For each species, it includes:
- Region_ID: Region coding (East Asia, North America, and Europe).
- species_names: Species name standardized against the World Flora Online Plant List.
- GBIF DOI: GBIF download link (DOI) for occurrence points used in SDM.
- Unique occurrences: Number of occurrence points available for modeling.
- AUC_Domain/AUC_GLM/AUC_Maxent/AUC_mean: SDM model performance (AUC values), used for Supplementary Figure 3.
- References: Literature sources documenting species distribution data.
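Supplementary Figure 3 is based on the AUC columns of SDM.csv. As a quick check before plotting, the Python sketch below averages one AUC column (AUC_mean by default) per Region_ID, skipping empty or NA cells. The function name is hypothetical, and the sketch assumes AUC values are stored as plain decimal numbers.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def mean_auc_by_region(rows: Iterable[Dict[str, str]], column: str = "AUC_mean") -> Dict[str, float]:
    """Average one AUC column of SDM.csv per Region_ID, skipping empty or NA cells."""
    values: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        raw = (row.get(column) or "").strip()
        if raw in {"", "NA"}:
            continue
        values[row["Region_ID"]].append(float(raw))
    return {region: mean(v) for region, v in values.items()}
```

With `csv.DictReader` over SDM.csv as input, the same function can summarize AUC_Domain, AUC_GLM, or AUC_Maxent by passing the column name.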
Code/software
All analyses were conducted in R (version 4.1.1). The scripts load the required R packages internally (e.g., brms, ape, V.PhyloMaker2, raster, sp); users should ensure that these packages are installed before running the scripts.
Run the scripts in the following general order:
- Distance calculations (Script_Range_refugia_Distance.R and Script_EcologicalDistance.R)
- Phylogenetic tree generation (Script_GeneratePhylogeny.R)
- Bayesian phylogenetic modeling (Script_BayesianPhyloModel.R)
- Figure generation (Script_Figures.R)
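The run order above can be scripted. The Python sketch below calls each script with Rscript in sequence and stops at the first failure; by default it only prints the commands. It assumes Rscript is on the PATH and that the scripts can run non-interactively from the directory that contains them. In practice, input and output paths inside the scripts may need editing first, and Script_BayesianPhyloModel.R expects user-prepared input CSV files, so an unattended run is not guaranteed to work.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Run order given in this README.
SCRIPT_ORDER = [
    "Script_Range_refugia_Distance.R",
    "Script_EcologicalDistance.R",
    "Script_GeneratePhylogeny.R",
    "Script_BayesianPhyloModel.R",
    "Script_Figures.R",
]


def build_commands() -> List[List[str]]:
    """One Rscript command per script, relative to the script directory."""
    return [["Rscript", name] for name in SCRIPT_ORDER]


def run_pipeline(script_dir: Path = Path("."), dry_run: bool = True) -> None:
    """Print each command; unless dry_run, execute it and stop at the first failure."""
    if not dry_run and shutil.which("Rscript") is None:
        raise RuntimeError("Rscript was not found on PATH")
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a script exits with an error.
            subprocess.run(cmd, cwd=script_dir, check=True)
```

Calling `run_pipeline(Path("path/to/scripts"), dry_run=False)` would execute the scripts; keeping `dry_run=True` first is a cheap way to confirm the order and paths.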
Access information
Other publicly accessible locations of the data:
- None
Data was derived from the following sources:
- None
