
Raw data: multispecies amplicon sequencing (Loera, Studer, and Kölliker, 2021, Molecular Ecology Resources)

Cite this dataset

Loera-Sánchez, Miguel; Studer, Bruno; Kölliker, Roland (2022). Raw data: multispecies amplicon sequencing (Loera, Studer, and Kölliker, 2021, Molecular Ecology Resources) [Dataset]. Dryad.


Grasslands cover close to two fifths of Earth's land. They provide many ecosystem services related to the maintenance of soil integrity, and the regulation of water, carbon and nitrogen flows. Grasslands constitute the basis for sustainable roughage production for ruminant feeding. In Switzerland, grasslands cover more than 70% of the total agricultural land, which highlights their importance in the domestic food production chains.

Plant genetic diversity (PGD), a component of biodiversity, influences ecosystem functioning in grasslands. High levels of grassland PGD are related to resistance against invasive plants and yield stabilization during environmental stress (e.g., drought or frost). The PGD of grasses and legumes, the two most economically relevant plant families found in grasslands, which grow naturally across a wide range of climates, harbors valuable genetic resources for forage breeding. Nevertheless, most PGD studies of natural or semi-natural grasslands (i.e., grasslands that are not sown) focus on a single species or a few related species. Traditional PGD monitoring methods (e.g., simple sequence repeats, or SSRs) are ill-suited for large-scale, multispecies assessments. This limits our ability to study the ecological effects of grassland PGD, its spatiotemporal patterns, and its significance for grassland management.

Looking to provide cost-effective tools for multispecies PGD monitoring in grasslands, we performed a sequence capture assay targeting 611 single-copy nuclear loci, followed by multispecies amplicon sequencing (i.e., amplicon sequencing using primer pairs that can be used in multiple species) on eleven selected loci.

Our results indicate that multispecies amplicon sequencing is a cost-effective tool for genetic diversity assessment in grassland plant species. Furthermore, the sequence capture data provides the means to extend the number of multispecies amplicons for further research.


We surveyed the genetic diversity of 611 single-copy nuclear loci that are shared across grass and legume species. We followed a sequence capture approach to perform targeted sequencing of these loci in five genotypes of each of 16 grass and legume species of economic relevance in temperate grasslands: Alopecurus pratensis L., Arrhenatherum elatius L., Cynosurus cristatus L., Dactylis glomerata L., Festuca pratensis Huds., Festuca rubra L., Lolium perenne L., Lolium multiflorum Lam., Lotus corniculatus L., Medicago sativa L., Onobrychis viciifolia Scop., Phleum pratense L., Poa pratensis L., Trifolium pratense L., Trifolium repens L., and Trisetum flavescens L.

We then ranked the target loci according to their nucleotide diversity (π) and normalized k-mer richness (NKR) within each species. We finally selected eleven loci for amplicon sequencing. Those loci showed a combination of moderate to high genetic diversity estimates (i.e., π and NKR) and suitability for multispecies primer design. The selected loci were amplified and sequenced in test populations of D. glomerata, F. pratensis, L. perenne, T. pratense, and T. repens. The test populations consisted of 16 genotypes per species (three cultivars per species) and included single- and pooled-plant samples.
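The ranking step can be illustrated with a minimal sketch. It assumes π is computed as the mean proportion of differing sites over all pairs of aligned sequences, which is the textbook definition and not necessarily the exact procedure used for this dataset. The locus names and sequences are hypothetical, and NKR is left out because its definition is not given here.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites among aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption of this sketch).
    """
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    total = 0.0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        compared = 0
        diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in "ACGT" and y in "ACGT":
                compared += 1
                diffs += x != y
        if compared:
            total += diffs / compared
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Hypothetical aligned haplotypes for two loci within one species.
loci = {
    "locus_A": ["ACGT", "ACGA", "ACGT"],
    "locus_B": ["ACGT", "ACGT", "ACGT"],
}

# Rank loci from most to least diverse.
pi_by_locus = {name: nucleotide_diversity(seqs) for name, seqs in loci.items()}
ranked = sorted(pi_by_locus, key=pi_by_locus.get, reverse=True)
```

In this toy input, `locus_A` has one differing site in two of its three pairwise comparisons, so it ranks above the invariant `locus_B`.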

Usage notes

File: Description
01 FORAGE-611: Contains a FASTA file with the probe sequences for the sequence capture assay.
02 Discovery phase raw: Contains the FASTQ files with the raw reads of the sequence capture assay.
03 Sequence capture loci: Contains the FASTA files of the assemblies produced from the data of the sequence capture assay. The assemblies are grouped by locus.
04 Validation phase raw: Contains the FASTQ files with the raw reads of the multispecies amplicon sequencing assay.
05 Haplotyping: Contains the scripts used to determine SNP-based haplotypes with data from the multispecies amplicon sequencing assay. Contains a detailed description of all the previous files, including sample information (i.e., species and cultivars), sample names, locus names, loci orthologues in reference genomes, and the naming pattern of the loci assemblies.
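
As a starting point for working with the FASTA files listed above, the following sketch reads FASTA records into a dictionary using only the Python standard library. The record names and sequences are hypothetical placeholders, not taken from the dataset, and the actual file names and header format should be checked against the dataset's own documentation. A dedicated parser such as Biopython's `Bio.SeqIO.parse` is an alternative.

```python
import io


def read_fasta(handle):
    """Parse FASTA text from an open file-like object into {name: sequence}.

    The record name is the first whitespace-delimited token after '>'.
    """
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Hypothetical in-memory example; with a real file, pass open(path) instead.
demo = io.StringIO(">probe_1 hypothetical description\nACGT\nTTGA\n>probe_2\nGGCC\n")
probes = read_fasta(demo)
probe_lengths = {name: len(seq) for name, seq in probes.items()}
```

Multi-line sequences are joined into a single string, so `probe_lengths` gives the full length of each record.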