Data from: Genetic diversity unveiled: Cost-effective methods for grassland species
Data files
Mar 13, 2026 version files (42.06 GB in total)
- 01_multispecies_MSAS.zip (7.20 GB)
- 02_loper_MSAS.zip (6.24 GB)
- 03_loper_GBS.zip (28.62 GB)
- README.md (19.47 KB)
Abstract
Permanent grasslands are the basis for sustainable ruminant livestock production and provide various ecosystem services. They are mainly composed of outcrossing plant species, leading to populations with high genetic diversity (i.e., intraspecific diversity). Grasslands of high plant genetic diversity (PGD) can better cope with environmental stress and have stabilised biomass productivity. Additionally, they are valuable reservoirs of genetic resources used for forage plant breeding. To detect undesired changes and intervene accordingly, monitoring PGD in these grasslands is key. Despite the availability of various molecular genetic approaches, PGD monitoring is often neglected in biodiversity reports, which is attributed to a lack of standardised and affordable indicators of genetic diversity in natural populations.
To assess PGD of agronomically relevant grassland species, we applied multispecies amplicon sequencing (MSAS) and genotyping-by-sequencing (GBS), resulting in three data sets. Using MSAS, we analysed 39 samples based on five species (Dactylis glomerata L., Festuca pratensis Huds., Lolium perenne L., Trifolium pratense L., Trifolium repens L.). The sample set contained 30 single-accession (SA) seedling samples (five species, two accessions (A and B) per species, three replicates) and nine mixed-species (MS) seedling samples (three compositions, three replicates). The latter were prepared by pooling DNA from SA samples in three different compositions: 1) MS-A100, containing accession A of each species in equal amounts, 2) MS-B100, containing accession B of each species in equal amounts, and 3) MS-AB50, containing equal amounts of MS-A100 and MS-B100.
Furthermore, we prepared an extended L. perenne sample set consisting of 42 samples based on six cultivars. This sample set contained 18 single-cultivar samples (six cultivars, three replicates) and 18 mixtures of two cultivars (three mixtures, two mixing ratios (50:50 and 75:25), three replicates). In addition to these samples, which were based on a greenhouse experiment, the sample set contained six 50:50-ratio mixtures based on a field experiment (one mixture, two locations, three replicates). Subsequently, the sample set was analysed using MSAS and GBS.
01_multispecies_MSAS.zip
Illumina MiSeq raw reads (2 x 300 bp) from the multispecies amplicon sequencing (MSAS) experiment based on the five-species sample set (39 samples in total).
The following table shows the sample name, which is part of the file name, and the corresponding species and accession of the single-accession (SA) seedling samples.
For each accession, 60 plants were analysed. These were grouped into pools of 20 plants before DNA extraction.
| Sample name | Species | Accession (origin) |
|---|---|---|
| Dg-A-1 | Dactylis glomerata L. | 'DG1525' (Agroscope, CH) |
| Dg-A-2 | Dactylis glomerata L. | 'DG1525' (Agroscope, CH) |
| Dg-A-3 | Dactylis glomerata L. | 'DG1525' (Agroscope, CH) |
| Dg-B-1 | Dactylis glomerata L. | 'Bremgarten AG Folenweid' (Agroscope, CH) |
| Dg-B-2 | Dactylis glomerata L. | 'Bremgarten AG Folenweid' (Agroscope, CH) |
| Dg-B-3 | Dactylis glomerata L. | 'Bremgarten AG Folenweid' (Agroscope, CH) |
| Fp-A-1 | Festuca pratensis Huds. | 'FP1515' (Agroscope, CH) |
| Fp-A-2 | Festuca pratensis Huds. | 'FP1515' (Agroscope, CH) |
| Fp-A-3 | Festuca pratensis Huds. | 'FP1515' (Agroscope, CH) |
| Fp-B-1 | Festuca pratensis Huds. | 'Schleitheim SH Babental 05' (Agroscope, CH) |
| Fp-B-2 | Festuca pratensis Huds. | 'Schleitheim SH Babental 05' (Agroscope, CH) |
| Fp-B-3 | Festuca pratensis Huds. | 'Schleitheim SH Babental 05' (Agroscope, CH) |
| Lp-A-1 | Lolium perenne L. | 'LP1715' (Agroscope, CH) |
| Lp-A-2 | Lolium perenne L. | 'LP1715' (Agroscope, CH) |
| Lp-A-3 | Lolium perenne L. | 'LP1715' (Agroscope, CH) |
| Lp-B-1 | Lolium perenne L. | 'Kirchberg SG Tuttifrutti 99' (Agroscope, CH) |
| Lp-B-2 | Lolium perenne L. | 'Kirchberg SG Tuttifrutti 99' (Agroscope, CH) |
| Lp-B-3 | Lolium perenne L. | 'Kirchberg SG Tuttifrutti 99' (Agroscope, CH) |
| Tp-A-1 | Trifolium pratense L. | 'Crossway' (PGG Wrightson, NZ) |
| Tp-A-2 | Trifolium pratense L. | 'Crossway' (PGG Wrightson, NZ) |
| Tp-A-3 | Trifolium pratense L. | 'Crossway' (PGG Wrightson, NZ) |
| Tp-B-1 | Trifolium pratense L. | 'Belpberg 225' (Agroscope, CH) |
| Tp-B-2 | Trifolium pratense L. | 'Belpberg 225' (Agroscope, CH) |
| Tp-B-3 | Trifolium pratense L. | 'Belpberg 225' (Agroscope, CH) |
| Tr-A-1 | Trifolium repens L. | 'Beaumont' (Barenbrug, NL) |
| Tr-A-2 | Trifolium repens L. | 'Beaumont' (Barenbrug, NL) |
| Tr-A-3 | Trifolium repens L. | 'Beaumont' (Barenbrug, NL) |
| Tr-B-1 | Trifolium repens L. | 'TR1205' (Agroscope, CH) |
| Tr-B-2 | Trifolium repens L. | 'TR1205' (Agroscope, CH) |
| Tr-B-3 | Trifolium repens L. | 'TR1205' (Agroscope, CH) |
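
The sample names in the table above follow the pattern species code, accession, replicate (for example `Dg-A-1`). As a minimal sketch, assuming this pattern holds for all 30 SA samples, the names can be split into their parts as follows. The function and class names are hypothetical helpers introduced for illustration and are not part of the data set.

```python
from typing import NamedTuple

# Species codes as they appear in the sample names of the table above.
SPECIES = {
    "Dg": "Dactylis glomerata L.",
    "Fp": "Festuca pratensis Huds.",
    "Lp": "Lolium perenne L.",
    "Tp": "Trifolium pratense L.",
    "Tr": "Trifolium repens L.",
}


class SingleAccessionSample(NamedTuple):
    species: str
    accession: str  # "A" or "B"
    replicate: int  # 1, 2 or 3


def parse_sa_name(name: str) -> SingleAccessionSample:
    """Split a name such as 'Dg-A-1' into species, accession and replicate."""
    code, accession, replicate = name.split("-")
    if code not in SPECIES or accession not in ("A", "B"):
        raise ValueError(f"not a single-accession sample name: {name!r}")
    return SingleAccessionSample(SPECIES[code], accession, int(replicate))


def all_sa_names() -> list[str]:
    """Enumerate the expected names: 5 species x 2 accessions x 3 replicates."""
    return [f"{c}-{a}-{r}" for c in SPECIES for a in "AB" for r in (1, 2, 3)]
```

Calling `parse_sa_name("Tr-B-2")` yields the species `Trifolium repens L.`, accession `B`, and replicate `2`. How the sample name is embedded in the actual file names is not specified here, so matching against file names is left to the user.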
The following table shows the sample name used in the manuscript, the sample name used in the file name, and the composition of the DNA mixture of the mixed-species (MS) seedling samples.
These samples were prepared by pooling DNA from SA samples in three different compositions: 1) MS-A100, containing accession A of each species in equal amounts, 2) MS-B100, containing accession B of each species in equal amounts, and 3) MS-AB50, containing equal amounts of MS-A100 and MS-B100.
| Sample name in manuscript | Sample name in file name | Sample composition (proportion) |
|---|---|---|
| MS-A100-1 | Mx-A100-1 | Dg-A-1 (20%), Fp-A-1 (20%), Lp-A-1 (20%), Tp-A-1 (20%), Tr-A-1 (20%) |
| MS-A100-2 | Mx-A100-2 | Dg-A-2 (20%), Fp-A-2 (20%), Lp-A-2 (20%), Tp-A-2 (20%), Tr-A-2 (20%) |
| MS-A100-3 | Mx-A100-3 | Dg-A-3 (20%), Fp-A-3 (20%), Lp-A-3 (20%), Tp-A-3 (20%), Tr-A-3 (20%) |
| MS-B100-1 | Mx-B100-1 | Dg-B-1 (20%), Fp-B-1 (20%), Lp-B-1 (20%), Tp-B-1 (20%), Tr-B-1 (20%) |
| MS-B100-2 | Mx-B100-2 | Dg-B-2 (20%), Fp-B-2 (20%), Lp-B-2 (20%), Tp-B-2 (20%), Tr-B-2 (20%) |
| MS-B100-3 | Mx-B100-3 | Dg-B-3 (20%), Fp-B-3 (20%), Lp-B-3 (20%), Tp-B-3 (20%), Tr-B-3 (20%) |
| MS-AB50-1 | Mx-AB50-1 | MS-A100-1 (50%), MS-B100-1 (50%) |
| MS-AB50-2 | Mx-AB50-2 | MS-A100-2 (50%), MS-B100-2 (50%) |
| MS-AB50-3 | Mx-AB50-3 | MS-A100-3 (50%), MS-B100-3 (50%) |
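
Because MS-AB50 is an equal mixture of MS-A100 and MS-B100, each of the ten underlying SA samples is expected to contribute 10% of the DNA. The following sketch works through this arithmetic from the proportions in the table above. It describes nominal mixing proportions only, not measured read shares, and the function names are hypothetical.

```python
from fractions import Fraction

SPECIES_CODES = ["Dg", "Fp", "Lp", "Tp", "Tr"]


def ms_single_accession(accession: str, replicate: int) -> dict[str, Fraction]:
    """MS-A100 or MS-B100: the five SA samples of one accession, 20% each."""
    share = Fraction(1, len(SPECIES_CODES))
    return {f"{code}-{accession}-{replicate}": share for code in SPECIES_CODES}


def ms_ab50(replicate: int) -> dict[str, Fraction]:
    """MS-AB50: 50% MS-A100 plus 50% MS-B100, expanded to SA samples."""
    half = Fraction(1, 2)
    mix: dict[str, Fraction] = {}
    for accession in ("A", "B"):
        for name, share in ms_single_accession(accession, replicate).items():
            mix[name] = half * share
    return mix


mix = ms_ab50(1)
for name, share in mix.items():
    print(f"{name}: {float(share):.0%}")  # each of the ten lines shows 10%
```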
02_loper_MSAS.zip
Illumina MiSeq raw reads (2 x 300 bp) from the MSAS experiment based on the extended L. perenne sample set (42 samples in total).
Samples M-2-75-1 and Rep-2 did not produce any sequencing reads, and a sampling error occurred for LT-1. Therefore, these three samples were not considered further.
The following table shows the sample name used in the manuscript, the sample name used in the file name, and the corresponding cultivar composition in percent.
The single-cultivar and the mixed samples contained 30 and 60 plants, respectively. For the mixtures, each of the two cultivars was represented by 30 plants.
| Sample name in manuscript | Sample name in file name | 'Arara' | 'Araias' | 'Repentinia' | 'Artonis' | 'Arcturus' | 'Algira' |
|---|---|---|---|---|---|---|---|
| Ara-1 | Ara-1 | 100 | 0 | 0 | 0 | 0 | 0 |
| Ara-2 | Ara-2 | 100 | 0 | 0 | 0 | 0 | 0 |
| Ara-3 | Ara-3 | 100 | 0 | 0 | 0 | 0 | 0 |
| Ari-1 | Ari-1 | 0 | 100 | 0 | 0 | 0 | 0 |
| Ari-2 | Ari-2 | 0 | 100 | 0 | 0 | 0 | 0 |
| Ari-3 | Ari-3 | 0 | 100 | 0 | 0 | 0 | 0 |
| Rep-1 | Rep-1 | 0 | 0 | 100 | 0 | 0 | 0 |
| Rep-2 | Rep-3 | 0 | 0 | 100 | 0 | 0 | 0 |
| Art-1 | Art-1 | 0 | 0 | 0 | 100 | 0 | 0 |
| Art-2 | Art-2 | 0 | 0 | 0 | 100 | 0 | 0 |
| Art-3 | Art-3 | 0 | 0 | 0 | 100 | 0 | 0 |
| Arc-1 | Arc-1 | 0 | 0 | 0 | 0 | 100 | 0 |
| Arc-2 | Arc-2 | 0 | 0 | 0 | 0 | 100 | 0 |
| Arc-3 | Arc-3 | 0 | 0 | 0 | 0 | 100 | 0 |
| Alg-1 | Alg-1 | 0 | 0 | 0 | 0 | 0 | 100 |
| Alg-2 | Alg-2 | 0 | 0 | 0 | 0 | 0 | 100 |
| Alg-3 | Alg-3 | 0 | 0 | 0 | 0 | 0 | 100 |
| Rep50-1 | M-1-50-1 | 0 | 50 | 50 | 0 | 0 | 0 |
| Rep50-2 | M-1-50-2 | 0 | 50 | 50 | 0 | 0 | 0 |
| Rep50-3 | M-1-50-3 | 0 | 50 | 50 | 0 | 0 | 0 |
| EB-Rep50-1 | EB-1 | 0 | 50 | 50 | 0 | 0 | 0 |
| EB-Rep50-2 | EB-2 | 0 | 50 | 50 | 0 | 0 | 0 |
| EB-Rep50-3 | EB-3 | 0 | 50 | 50 | 0 | 0 | 0 |
| LT-Rep50-1 | LT-2 | 0 | 50 | 50 | 0 | 0 | 0 |
| LT-Rep50-2 | LT-3 | 0 | 50 | 50 | 0 | 0 | 0 |
| Art50-1 | M-2-50-1 | 0 | 50 | 0 | 50 | 0 | 0 |
| Art50-2 | M-2-50-2 | 0 | 50 | 0 | 50 | 0 | 0 |
| Art50-3 | M-2-50-3 | 0 | 50 | 0 | 50 | 0 | 0 |
| Alg50-1 | M-3-50-1 | 0 | 50 | 0 | 0 | 0 | 50 |
| Alg50-2 | M-3-50-2 | 0 | 50 | 0 | 0 | 0 | 50 |
| Alg50-3 | M-3-50-3 | 0 | 50 | 0 | 0 | 0 | 50 |
| Rep25-1 | M-1-75-1 | 0 | 75 | 25 | 0 | 0 | 0 |
| Rep25-2 | M-1-75-2 | 0 | 75 | 25 | 0 | 0 | 0 |
| Rep25-3 | M-1-75-3 | 0 | 75 | 25 | 0 | 0 | 0 |
| Art25-1 | M-2-75-2 | 0 | 75 | 0 | 25 | 0 | 0 |
| Art25-2 | M-2-75-3 | 0 | 75 | 0 | 25 | 0 | 0 |
| Alg25-1 | M-3-75-1 | 0 | 75 | 0 | 0 | 0 | 25 |
| Alg25-2 | M-3-75-2 | 0 | 75 | 0 | 0 | 0 | 25 |
| Alg25-3 | M-3-75-3 | 0 | 75 | 0 | 0 | 0 | 25 |
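
In the table above, the manuscript names of the affected groups appear to be numbered consecutively after the three excluded samples were dropped (for example, file `Rep-3` is listed as `Rep-2`). The sketch below reproduces this mapping. The renumbering rule is an inference from the table, not a rule stated by the authors, and the helper name is hypothetical.

```python
# File-name prefix -> manuscript prefix, taken from the table above.
PREFIXES = {"Rep": "Rep", "LT": "LT-Rep50", "M-2-75": "Art25"}

# Samples (file-name notation) excluded from the MSAS data set.
EXCLUDED = {"M-2-75-1", "Rep-2", "LT-1"}


def manuscript_names(file_prefix, replicates=(1, 2, 3), excluded=EXCLUDED):
    """Map retained file names to manuscript names numbered from 1."""
    kept = [
        f"{file_prefix}-{r}"
        for r in replicates
        if f"{file_prefix}-{r}" not in excluded
    ]
    manuscript_prefix = PREFIXES[file_prefix]
    return {
        name: f"{manuscript_prefix}-{i}" for i, name in enumerate(kept, start=1)
    }


print(manuscript_names("Rep"))     # {'Rep-1': 'Rep-1', 'Rep-3': 'Rep-2'}
print(manuscript_names("M-2-75"))  # {'M-2-75-2': 'Art25-1', 'M-2-75-3': 'Art25-2'}
```

Passing `excluded={"LT-1"}` instead gives the numbering of a data set in which only LT-1 was dropped.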
03_loper_GBS.zip
Illumina NextSeq raw reads (1 x 75 bp) from the genotyping-by-sequencing (GBS) experiment based on the extended L. perenne sample set (42 samples in total). Sample LT-1 was not considered further due to a sampling error.
The following table shows the sample name used in the manuscript, the sample name used in the file name, and the corresponding cultivar composition in percent.
The single-cultivar and the mixed samples contained 30 and 60 plants, respectively. For the mixtures, each of the two cultivars was represented by 30 plants.
Cultivar Composition Table
| Sample name in manuscript | Sample name in file name | 'Arara' | 'Araias' | 'Repentinia' | 'Artonis' | 'Arcturus' | 'Algira' |
|---|---|---|---|---|---|---|---|
| Ara-1 | Ara-1 | 100 | 0 | 0 | 0 | 0 | 0 |
| Ara-2 | Ara-2 | 100 | 0 | 0 | 0 | 0 | 0 |
| Ara-3 | Ara-3 | 100 | 0 | 0 | 0 | 0 | 0 |
| Ari-1 | Ari-1 | 0 | 100 | 0 | 0 | 0 | 0 |
| Ari-2 | Ari-2 | 0 | 100 | 0 | 0 | 0 | 0 |
| Ari-3 | Ari-3 | 0 | 100 | 0 | 0 | 0 | 0 |
| Rep-1 | Rep-1 | 0 | 0 | 100 | 0 | 0 | 0 |
| Rep-2 | Rep-2 | 0 | 0 | 100 | 0 | 0 | 0 |
| Rep-3 | Rep-3 | 0 | 0 | 100 | 0 | 0 | 0 |
| Art-1 | Art-1 | 0 | 0 | 0 | 100 | 0 | 0 |
| Art-2 | Art-2 | 0 | 0 | 0 | 100 | 0 | 0 |
| Art-3 | Art-3 | 0 | 0 | 0 | 100 | 0 | 0 |
| Arc-1 | Arc-1 | 0 | 0 | 0 | 0 | 100 | 0 |
| Arc-2 | Arc-2 | 0 | 0 | 0 | 0 | 100 | 0 |
| Arc-3 | Arc-3 | 0 | 0 | 0 | 0 | 100 | 0 |
| Alg-1 | Alg-1 | 0 | 0 | 0 | 0 | 0 | 100 |
| Alg-2 | Alg-2 | 0 | 0 | 0 | 0 | 0 | 100 |
| Alg-3 | Alg-3 | 0 | 0 | 0 | 0 | 0 | 100 |
| Rep50-1 | M-1-50-1 | 0 | 50 | 50 | 0 | 0 | 0 |
| Rep50-2 | M-1-50-2 | 0 | 50 | 50 | 0 | 0 | 0 |
| Rep50-3 | M-1-50-3 | 0 | 50 | 50 | 0 | 0 | 0 |
| EB-Rep50-1 | EB-1 | 0 | 50 | 50 | 0 | 0 | 0 |
| EB-Rep50-2 | EB-2 | 0 | 50 | 50 | 0 | 0 | 0 |
| EB-Rep50-3 | EB-3 | 0 | 50 | 50 | 0 | 0 | 0 |
| LT-Rep50-1 | LT-2 | 0 | 50 | 50 | 0 | 0 | 0 |
| LT-Rep50-2 | LT-3 | 0 | 50 | 50 | 0 | 0 | 0 |
| Art50-1 | M-2-50-1 | 0 | 50 | 0 | 50 | 0 | 0 |
| Art50-2 | M-2-50-2 | 0 | 50 | 0 | 50 | 0 | 0 |
| Art50-3 | M-2-50-3 | 0 | 50 | 0 | 50 | 0 | 0 |
| Alg50-1 | M-3-50-1 | 0 | 50 | 0 | 0 | 0 | 50 |
| Alg50-2 | M-3-50-2 | 0 | 50 | 0 | 0 | 0 | 50 |
| Alg50-3 | M-3-50-3 | 0 | 50 | 0 | 0 | 0 | 50 |
| Rep25-1 | M-1-75-1 | 0 | 75 | 25 | 0 | 0 | 0 |
| Rep25-2 | M-1-75-2 | 0 | 75 | 25 | 0 | 0 | 0 |
| Rep25-3 | M-1-75-3 | 0 | 75 | 25 | 0 | 0 | 0 |
| Art25-1 | M-2-75-1 | 0 | 75 | 0 | 25 | 0 | 0 |
| Art25-2 | M-2-75-2 | 0 | 75 | 0 | 25 | 0 | 0 |
| Art25-3 | M-2-75-3 | 0 | 75 | 0 | 25 | 0 | 0 |
| Alg25-1 | M-3-75-1 | 0 | 75 | 0 | 0 | 0 | 25 |
| Alg25-2 | M-3-75-2 | 0 | 75 | 0 | 0 | 0 | 25 |
| Alg25-3 | M-3-75-3 | 0 | 75 | 0 | 0 | 0 | 25 |
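
The file-name sample names in the table above encode the cultivar composition. Judging from the table rows, mixture names read as `M-<mixture>-<'Araias' share>-<replicate>`, and the `EB` and `LT` samples are 50:50 mixtures of 'Araias' and 'Repentinia'. The sketch below rebuilds one table row from a file-name sample name under these assumptions. The function name and lookup tables are hypothetical and derived only from the table.

```python
CULTIVARS = ["Arara", "Araias", "Repentinia", "Artonis", "Arcturus", "Algira"]

# Abbreviations of single-cultivar samples, read off the table above.
SINGLE = {
    "Ara": "Arara",
    "Ari": "Araias",
    "Rep": "Repentinia",
    "Art": "Artonis",
    "Arc": "Arcturus",
    "Alg": "Algira",
}

# Mixture number -> cultivar mixed with 'Araias', read off the table above.
PARTNER = {"1": "Repentinia", "2": "Artonis", "3": "Algira"}


def composition(file_sample_name: str) -> dict[str, int]:
    """Return the cultivar composition in percent for a file-name sample name."""
    row = dict.fromkeys(CULTIVARS, 0)
    parts = file_sample_name.split("-")
    if parts[0] == "M":
        # Greenhouse mixtures: M-<mixture>-<'Araias' share>-<replicate>
        _, mixture, araias_share, _replicate = parts
        row["Araias"] = int(araias_share)
        row[PARTNER[mixture]] = 100 - int(araias_share)
    elif parts[0] in ("EB", "LT"):
        # Field samples: 50:50 mixtures of 'Araias' and 'Repentinia'.
        row["Araias"] = 50
        row["Repentinia"] = 50
    else:
        row[SINGLE[parts[0]]] = 100
    return row
```

For example, `composition("M-3-75-2")` gives 75 for 'Araias', 25 for 'Algira', and 0 for the other cultivars, matching the row for Alg25-2.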
File processing
Files in 01_multispecies_MSAS.zip and 02_loper_MSAS.zip are in fastq.gz format, whereas the files for 03_loper_GBS.zip were received from the sequencing provider in fastq.bz2 format. The latter were decompressed in parallel with GNU parallel (Tange 2011) and standard Unix utilities, using the command `find *bz2 | parallel bzip2 -d {}`.
The files of all three archives (03_loper_GBS.zip after decompression) can be processed with software such as FastQC (Andrews 2010), MultiQC (Ewels et al. 2016), cutadapt (Martin 2011), minimap2 (Li 2018), SAMtools (Danecek et al. 2021), and Picard (Broad Institute 2019), as described in the associated article.
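
Where GNU parallel is not available, a similar result can be obtained with the Python standard library. The following is a minimal sketch, not the procedure used for this data set. The directory argument, the worker count, and the function names are assumptions. Unlike `bzip2 -d`, this version keeps the compressed originals.

```python
import bz2
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def decompress_one(src: Path) -> Path:
    """Write a decompressed copy of `src` next to it, dropping the .bz2 suffix."""
    dst = src.with_suffix("")  # e.g. sample.fastq.bz2 -> sample.fastq
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams the data in chunks
    return dst


def decompress_all(directory: str, workers: int = 4) -> list[Path]:
    """Decompress every *.bz2 file in `directory` using a pool of threads."""
    files = sorted(Path(directory).glob("*.bz2"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decompress_one, files))
```

A call such as `decompress_all("03_loper_GBS")` would process all `.bz2` files in a folder of that name; the folder name is only an example.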
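
Before running the tools listed above, it can be useful to check how many reads each fastq.gz file contains, for instance to spot samples with very few or no reads. The sketch below assumes standard FASTQ files with four lines per record. It is a quick sanity check and is not part of the workflow described in the article.

```python
import gzip


def count_reads(path: str) -> int:
    """Count records in a gzip-compressed FASTQ file (four lines per record)."""
    with gzip.open(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```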
References:
Andrews, S. 2010. “FastQC: A Quality Control Tool for High Throughput Sequence Data.” https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Broad Institute. 2019. “Picard Toolkit, GitHub Repository.” http://broadinstitute.github.io/picard/.
Danecek, P., J. Bonfield, J. Liddle, et al. 2021. “Twelve Years of SAMtools and BCFtools.” GigaScience 10, no. 2: 1–4. https://doi.org/10.1093/gigascience/giab008.
Ewels, P., M. Magnusson, S. Lundin, and M. Käller. 2016. “MultiQC: Summarize Analysis Results for Multiple Tools and Samples in a Single Report.” Bioinformatics 32, no. 19: 3047–3048. https://doi.org/10.1093/bioinformatics/btw354.
Li, H. 2018. “Minimap2: Pairwise Alignment for Nucleotide Sequences.” Bioinformatics 34, no. 18: 3094–3100. https://doi.org/10.1093/bioinformatics/bty191.
Martin, M. 2011. “Cutadapt Removes Adapter Sequences From High-Throughput Sequencing Reads.” EMBnet.Journal 17, no. 1: 10–12. https://doi.org/10.14806/ej.17.1.200.
Tange, O. 2011. “GNU Parallel—The Command-Line Power Tool.” Usenix Magazine 36, no. 1: 42–47. https://doi.org/10.5281/zenodo.16303.
