
VCF files of common grassland plants from wild collected seeds of 19 common European grassland species with up to 4 consecutive generations grown in monoculture for seed production for restoration

Cite this dataset

Conrady, Malte et al. (2022). VCF files of common grassland plants from wild collected seeds of 19 common European grassland species with up to 4 consecutive generations grown in monoculture for seed production for restoration [Dataset]. Dryad.


A growing number of restoration projects require large amounts of seeds. As harvesting natural populations cannot cover the demand, wild plants are often propagated in large-scale monocultures. There are concerns that this cultivation process may cause genetic drift and unintended selection, which would alter the genetic properties of the cultivated populations and reduce their genetic diversity. Such changes could reduce the pre-existing adaptation of restored populations, and limit their adaptability to environmental change.

We used single nucleotide polymorphism (SNP) markers and a pool-sequencing approach to test for genetic differentiation and changes in gene diversity during cultivation in 19 wild grassland species, comparing the source populations and up to four consecutive cultivation generations. We then linked the magnitudes of genetic changes to the species’ breeding systems and seed dormancy, to understand the roles of these traits in genetic change.

The propagation changed the genetic composition of the cultivated generations only moderately. The genetic differentiation we observed as a consequence of cultivation was much lower than the natural genetic differentiation between different source regions. The propagated generations harbored even higher gene diversity than wild-collected seeds. Genetic change was stronger in self-compatible than in self-incompatible species, probably as a result of increased outcrossing in the monocultures.

Synthesis and applications: Our study indicates that large-scale seed production maintains the genetic integrity of natural populations. Increased genetic diversity may indicate increased adaptive potential of propagated seeds, which would make them especially suitable for ecological restoration. Yet it remains to be tested whether these patterns, observed at the level of molecular markers, are also mirrored in plant phenotypes. Further, we used seeds produced in Germany and Austria, where seed production is regulated and certified. Whether other seed production systems perform equally well remains to be tested.


We obtained seeds of grassland plants from two producers of seeds for ecological restoration, one in Austria and one in Germany, which we call Producer 1 and Producer 2 in the remainder of this paper. The seed propagation process starts with the collection of seeds in the wild. Producer 1 bases its propagation on a mixture of seeds from multiple wild populations, whereas Producer 2 uses seeds from a single population. The wild-collected seeds (F0) thus consist of a mixture of five populations for Producer 1, or a single population for Producer 2. The F0 seeds are often first germinated in a greenhouse to produce plugs, which are then planted into an agricultural field. The seeds of this first cultivated generation (F1) are then harvested and used to establish the next cultivation (F2, Figure 1). The F2 seeds are mostly sold, but some are kept to establish the F3 generation. When a farm propagates multiple cultivation lines of the same species, the production fields must be at least 500 m apart (more for grasses) to minimize gene flow between them (Prasse et al. 2010). This process is repeated until F5; new seeds must then be collected from the wild to prevent the plants from (presumably) adapting to their propagation environment or losing genetic diversity (Figure 1) (Espeland et al. 2017). The seeds are usually harvested mechanically, using agricultural machinery.

For our study, we obtained wild-collected and cultivated seeds of 19 different plant species. Because the seed producers carefully stored the seeds from both the wild collection and almost every consecutive generation in cultivation, we were able to test for possible genetic changes from generation to generation, up to the fourth cultivated generation. Four of the species were provided by both producers, and one was provided from two regions by the same producer (Table 1). In total, we obtained 24 independent cultivation lines (19 species, 5 of them from two regions), which resulted in 83 accessions. To obtain material for the genetic analysis, we sowed seeds on seeding substrate and sampled leaves from 18 random individuals per species and generation once the plants had become large enough.

Molecular Analysis

Because of the large sample size, we used a population pool approach (Futschik & Schlötterer 2010), where we pooled 18 individuals per generation and cultivation line into one sample. We used a reduced-representation sequencing approach for SNP (single nucleotide polymorphism) detection and genotyping and followed the ddRAD protocol (Peterson et al. 2012) with slight deviations (see Supplementary Information).
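As a rough illustration of what a pooled sample provides, the following Python sketch estimates an allele frequency from the allelic read counts of one pool and derives gene diversity for a biallelic locus as 1 - (p² + q²). The function names and the example read counts are hypothetical, and the sketch applies no correction for pool size or sequencing depth; it is not the estimator used in the study, whose details are in the supplemental R code.

```python
def allele_frequency(ref_reads, alt_reads):
    """Frequency of the alternative allele, estimated from pooled read counts."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("locus has no reads in this pool")
    return alt_reads / depth


def gene_diversity(p):
    """Gene diversity of a biallelic locus: 1 - (p^2 + q^2), with q = 1 - p."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


def mean_gene_diversity(read_counts):
    """Average gene diversity over loci; read_counts holds (ref, alt) pairs."""
    values = [gene_diversity(allele_frequency(ref, alt)) for ref, alt in read_counts]
    return sum(values) / len(values)


# Hypothetical read counts for three loci in one pool (not data from the study).
example_pool = [(18, 18), (27, 9), (36, 0)]
mean_h = mean_gene_diversity(example_pool)
```

A locus with equal read counts for both alleles reaches the maximum of 0.5, and a locus fixed for one allele contributes 0.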

We used process_radtags from the Stacks 2.0 pipeline (Catchen et al. 2013; Rochette et al. 2019) to demultiplex reads. We then used the dDocent 2.6.0 pipeline (Puritz et al. 2014; O’Leary et al. 2018) for contig assembly, SNP detection and assessment of allelic read counts for each cultivation line. SNP filtering of the resulting VCF file with vcftools removed indels, kept only biallelic loci with a minimum Phred score of 30, and kept only one SNP per contig. To allow a comparison of genetic diversity between generations that is not biased by different sequencing depths, we further filtered the data using R. We used only markers with a minor allele frequency of at least 0.05, and genotypes that had a minimum read depth of 36. We corrected for unequal sequencing depth of the same locus across pools of the same cultivation line by rarefaction, i.e. by drawing the minimum number of reads from each pool to assess allelic read counts. Pools with fewer than 500 genotyped SNPs were removed (Table 1). See Supplementary Information, Table S2, and the supplemental R code for details. The final data sets consisted of 657 to 9721 (average 5137) biallelic SNP loci per pool across the 19 species and 24 cultivation lines, with between 0% and 14.6% missing data.
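The depth filter, minor allele frequency filter and rarefaction described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' supplemental R code. The function names, the input layout (a dict of per-pool read counts) and the example counts are hypothetical. Two simplifications are assumptions: a locus is dropped entirely if any pool falls below the depth threshold, and the minor allele frequency is computed from reads summed across the pools of a cultivation line.

```python
import random

MIN_DEPTH = 36  # minimum read depth per genotype, as stated in the text
MIN_MAF = 0.05  # minimum minor allele frequency, as stated in the text


def rarefy_locus(pool_counts, rng):
    """Subsample each pool's (ref, alt) reads at one locus, without
    replacement, down to the smallest depth found among the pools."""
    target = min(ref + alt for ref, alt in pool_counts)
    rarefied = []
    for ref, alt in pool_counts:
        reads = [0] * ref + [1] * alt      # 0 = reference read, 1 = alternative read
        n_alt = sum(rng.sample(reads, target))
        rarefied.append((target - n_alt, n_alt))
    return rarefied


def filter_and_rarefy(loci, seed=0):
    """loci maps a locus id to a list of (ref, alt) read counts, one per pool."""
    rng = random.Random(seed)
    kept = {}
    for locus_id, pool_counts in loci.items():
        # Simplification: drop the locus if any pool is below the depth threshold.
        if any(ref + alt < MIN_DEPTH for ref, alt in pool_counts):
            continue
        total_ref = sum(ref for ref, _ in pool_counts)
        total_alt = sum(alt for _, alt in pool_counts)
        maf = min(total_ref, total_alt) / (total_ref + total_alt)
        if maf < MIN_MAF:
            continue
        kept[locus_id] = rarefy_locus(pool_counts, rng)
    return kept


# Hypothetical read counts for three loci in three pools (not data from the study).
loci = {
    "contig_1": [(40, 10), (60, 40), (30, 20)],  # passes both filters
    "contig_2": [(20, 5), (50, 50), (40, 40)],   # first pool has depth 25 < 36
    "contig_3": [(99, 1), (80, 0), (60, 0)],     # minor allele frequency below 0.05
}
kept = filter_and_rarefy(loci)
```

After rarefaction, every pool at a retained locus has the same depth, so differences in allelic read counts between generations no longer reflect differences in sequencing effort.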

Usage notes

Generations 3 and 5 of Anthoxanthum odoratum and generation 2 of Salvia pratensis need to be discarded, as they are most likely not part of the cultivation lines and are therefore not consecutive generations.
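One way to apply this note before analysis is to drop the flagged accessions from a sample list, as in the Python sketch below. The (species, generation) tuples, the function name and the example accession list are hypothetical. How generation numbers map onto the sample names inside the VCF files is not stated here, so that mapping must be checked against the files themselves.

```python
# Accessions flagged in the usage note, as (species in lower case, generation).
EXCLUDED = {
    ("anthoxanthum odoratum", 3),
    ("anthoxanthum odoratum", 5),
    ("salvia pratensis", 2),
}


def keep_accession(species, generation):
    """Return True unless the accession is flagged for removal."""
    return (species.strip().lower(), generation) not in EXCLUDED


# Hypothetical accession list (not the actual sample names of the dataset).
accessions = [
    ("Anthoxanthum odoratum", 1),
    ("Anthoxanthum odoratum", 3),
    ("Anthoxanthum odoratum", 5),
    ("Salvia pratensis", 1),
    ("Salvia pratensis", 2),
]
usable = [acc for acc in accessions if keep_accession(*acc)]
```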


Deutsche Forschungsgemeinschaft, Award: LA4038/2-1

Deutsche Forschungsgemeinschaft, Award: DU404/14-1