
Global patterns of population genetic differentiation in seed plants

Cite this dataset

Gamba, Diana; Muchhala, Nathan (2020). Global patterns of population genetic differentiation in seed plants [Dataset]. Dryad.


Evaluating the factors that drive patterns of population differentiation in plants is critical for understanding several biological processes such as local adaptation and incipient speciation. Previous studies have given conflicting results regarding the significance of pollination mode, seed dispersal mode, mating system, growth form, and latitudinal region in shaping patterns of genetic structure, as estimated by FST values, and no study to date has tested their relative importance together across a broad scale. Here we assembled a 337-species dataset for seed plants from publications with data on FST from nuclear markers and species traits, including variables pertaining to the sampling scheme of each study. We used species traits, while accounting for sampling variables, to perform phylogenetic multiple regressions. Results demonstrated that FST values were higher for tropical, mixed-mating, non-woody species pollinated by small insects, indicating greater population differentiation, and lower for temperate, outcrossing trees pollinated by wind. Among the factors we tested, latitudinal region explained the largest portion of variance, followed by pollination mode, mating system and growth form, while seed dispersal mode did not significantly relate to FST. Our analyses provide the most robust and comprehensive evaluation to date of the main ecological factors predicted to drive population differentiation in seed plants, with important implications for understanding the basis of their genetic divergence. Our study supports previous findings showing greater population differentiation in tropical regions and is the first that we are aware of to robustly demonstrate greater population differentiation in species pollinated by small insects.


The main dataset was compiled through a systematic literature review using Google Scholar. From the manuscript:

"We constructed an FST dataset through a systematic search in google scholar (key words: “plant” AND —the following words, each in a separate search— “genetic structure”, “population differentiation”, “population genetics”, “genetic diversity”, “population gene flow”) for articles published up until June 2018. The search yielded thousands of studies, which we reduced to 356 peer-reviewed publications on seed plants that determined population genetic structure (FST) based on nuclear markers. When multiple studies reported FST values for the same species, we recorded the FST from the study with the largest geographic range, as this may better represent the genetic diversity found in the species (Cavers et al., 2005). By this criterion, we compiled a dataset that included 337 unique species. We extracted information for the predictor variables directly from the publications, and infrequently complemented this, where necessary, with information from peer-reviewed literature on the studied species (see Appendix S1 and Table S1 in Supporting Information). Predictor variables were included in multiple regressions to explain variation in FST values (see section FST models). We also included three factors that pertained to the sampling scheme of each study and that can potentially affect FST (Nybom, 2004; Nybom & Bartish, 2000): genetic marker used, maximum distance between populations, mean sample size per population. We used them to construct a null model to be compared against models with our factors of interest. Factors of interest consisted of five categorical variables with 2–4 levels: mating system (outcrossing, mixed-mating), growth form (non-woody, shrub, tree), pollination mode (large insects, small insects, vertebrates, wind), seed dispersal mode (animal, gravity, wind), and latitudinal region (tropics, sub-tropics, temperate)."
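The selection rule in the quoted passage (one FST value per species, taken from the study with the largest geographic range) can be sketched as below. This is a minimal illustration in Python, not the authors' code; their analysis is in Fst_script.R. The field names `species`, `fst`, and `max_distance_km` and the example records are hypothetical. Maximum distance between populations stands in for geographic range here, which is an assumption.

```python
from typing import Dict, List, TypedDict


class StudyRecord(TypedDict):
    """One FST estimate from one publication (hypothetical field names)."""
    species: str
    fst: float
    max_distance_km: float  # assumed proxy for the study's geographic range


def keep_widest_study(records: List[StudyRecord]) -> Dict[str, StudyRecord]:
    """Return one record per species: the one with the largest range.

    If two studies of a species tie on range, the first one seen is kept.
    """
    best: Dict[str, StudyRecord] = {}
    for rec in records:
        current = best.get(rec["species"])
        if current is None or rec["max_distance_km"] > current["max_distance_km"]:
            best[rec["species"]] = rec
    return best


# Made-up records for illustration only; these are not values from Fst_data.csv.
records: List[StudyRecord] = [
    {"species": "Species A", "fst": 0.12, "max_distance_km": 150.0},
    {"species": "Species A", "fst": 0.20, "max_distance_km": 900.0},
    {"species": "Species B", "fst": 0.05, "max_distance_km": 40.0},
]

selected = keep_widest_study(records)
for name, rec in sorted(selected.items()):
    print(name, rec["fst"])
```

With these made-up records, the second "Species A" entry is retained because it spans the larger distance, so each species contributes exactly one FST value.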

Usage notes

Fst_data.csv: dataset analyzed in this study

Fst_spp.csv: species list of this study (used to produce phylogeny)

Fst_script.R: R script to reproduce all results (preliminary and final) of the manuscript 

FST.Phylo.tre: phylogeny of studied species

All other .csv files (20 in total) correspond to 10 reduced datasets and their corresponding species lists.