Genomic insights into Lolium multiflorum diversity for forage breeding in Andean livestock systems
Data files
Jul 24, 2025 version files 5.21 MB
- Identification_of_experimental_lines.csv (2.55 KB)
- Passport_information_of_64_Lolium_multiflorum_genotypes.csv (6.68 KB)
- README.md (3.12 KB)
- SNP_matrix_coded_ryegrass.csv (5.20 MB)
Mar 06, 2026 version files 276.02 KB
- final_matriz_Lm27.csv (266.41 KB)
- Passport_information_of_Lolium_multiflorum.csv (6.19 KB)
- README.md (3.42 KB)
Abstract
Understanding the genetic diversity of Lolium multiflorum is essential for strengthening breeding programs and enhancing the sustainability of Andean livestock systems. This study evaluated the genomic variability and population structure of 27 accessions of Lolium multiflorum from the Cajamarca region and the INIA Amazonas germplasm bank (Peru), using the genotyping-by-sequencing (GBS) technique. DNA extracted from young leaves was sequenced on an Illumina NovaSeq 6000 platform, and reads were processed with bioinformatic tools (FastQC, BWA, GATK, VCFtools, BCFtools, and scikit-allel). After stringent filtering, 2,070 SNPs were retained, distributed unevenly across the seven chromosomes. Principal Coordinates Analysis (PCoA) and phylogenetic analysis (UPGMA) revealed two distinct genetic groups, indicating a complex structure shaped by gene flow and local selection. AMOVA showed that 90.01% of the genetic variation occurs within populations, whereas 9.99% corresponds to interregional differences (PhiPT = 0.099, p < 0.006). The negative FIS values (Cajamarca = -0.2312; Amazonas = -0.5489) indicate an excess of heterozygotes, a pattern typically associated with predominantly allogamous species. Moreover, the high levels of observed heterozygosity (Ho > 0.57) point to possible hybrid vigor and suggest that these populations may be maintaining a stable genetic equilibrium. These results confirm that Peruvian L. multiflorum maintains a broad genetic base, shaped by historical germplasm exchange and local environmental adaptation. This genetic diversity provides essential insights for conservation planning and supports breeding initiatives to improve forage resilience and productivity in high-Andean ecosystems.
This dataset supports the genomic analyses presented in the manuscript "Genomic insights into Lolium multiflorum diversity for forage breeding in Andean livestock systems", which investigates genetic diversity and population structure of ryegrass germplasm from the northern Peruvian Andes using genotyping-by-sequencing (GBS).
The dataset comprises a curated single nucleotide polymorphism (SNP) matrix derived from 27 unique Lolium multiflorum accessions, representing native ecotypes from Cajamarca and commercial cultivars evaluated in the Amazonas region (Peru).
Dataset overview
- Species: Lolium multiflorum Lam.
- Number of accessions: 27 (non-redundant genotypes)
- Sequencing method: Genotyping-by-sequencing (GBS)
- Sequencing platform: Illumina NovaSeq 6000 (paired-end, 100 bp)
- Reference genome: Lolium multiflorum cv. Rabiosa (GCA_030979885.1)
- Final SNP dataset: 2,070 high-quality SNPs across seven chromosomes
Biological replicates were merged to retain a single representative genotype per accession, avoiding pseudo-replication.
Data structure and file description
final_matriz_Lm27.csv
Final SNP genotype matrix used for population genetic analyses.
- Format: Comma-delimited CSV file
- Rows: SNP loci (2,070 total)
- Columns: 27 Lolium multiflorum accessions
Genotype encoding:
- 0/0 = homozygous reference
- 0/1 = heterozygous
- 1/1 = homozygous alternate
- ./. = missing data
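As an illustration of the encoding above, the following minimal sketch converts genotype strings into alternate-allele counts (0, 1, 2) with `None` for missing calls. It assumes the first row of the matrix is a header of accession names and the first column holds a SNP identifier; that layout, the function name `read_snp_matrix`, and the demo values are assumptions for illustration, so check the actual file before relying on it.

```python
import csv
import io

# Map each genotype string to the number of alternate alleles; None marks missing data.
GENOTYPE_CODES = {"0/0": 0, "0/1": 1, "1/1": 2, "./.": None}


def read_snp_matrix(handle):
    """Return (accession_names, {snp_id: [alt-allele counts]}).

    Assumes a header row and a SNP identifier in the first column
    (an assumption, not confirmed by the dataset description).
    """
    reader = csv.reader(handle)
    header = next(reader)
    accessions = header[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # A genotype string outside the four documented codes raises KeyError.
        matrix[row[0]] = [GENOTYPE_CODES[g.strip()] for g in row[1:]]
    return accessions, matrix


# Tiny made-up example with three accessions and two SNPs (not real data).
demo = io.StringIO(
    "snp_id,acc_A,acc_B,acc_C\n"
    "chr1_100,0/0,0/1,1/1\n"
    "chr2_250,./.,0/1,0/1\n"
)
accessions, matrix = read_snp_matrix(demo)
print(accessions)
print(matrix)
```

To apply this to the real file, pass an open file handle (for example `open("final_matriz_Lm27.csv", newline="")`) instead of the in-memory demo.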
Passport_information_of_Lolium_multiflorum.csv
Passport and geographic metadata for the 27 Lolium multiflorum accessions.
- Format: Semicolon-delimited CSV file
- Content includes:
  - Collection site (district, province, department)
  - Geographic coordinates (latitude and longitude)
  - Germplasm source (INIA Cajamarca or Amazonas experimental station)
  - Accession classification (native ecotype or commercial cultivar)
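Because the passport file is semicolon-delimited rather than comma-delimited, a default CSV reader will not split its columns. A minimal sketch of reading it follows; the column names and values in the demo are hypothetical placeholders, not the actual header or data of the file.

```python
import csv
import io


def read_passport(handle):
    """Read semicolon-delimited passport records into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle, delimiter=";"))


# Hypothetical column names and placeholder values, for illustration only.
demo = io.StringIO(
    "accession;department;latitude;longitude\n"
    "acc_A;Cajamarca;-7.0;-78.0\n"
)
records = read_passport(demo)
print(records[0]["department"])
```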
SNP calling and filtering criteria
- Alignment: BWA v0.7.18
- Variant calling: GATK v4.6.2.0
Retained variants:
- Biallelic SNPs only
- QUAL ≥ 30
- 10 ≤ depth (DP) ≤ 100
- QD ≥ 2.0
- FS ≤ 60.0
- MQ ≥ 40.0
- Minor allele frequency (MAF) > 0.10
- Missing data per SNP ≤ 10%
- Minor allele count (MAC) ≥ 3
- Linkage disequilibrium pruning: r² threshold of 0.85
After filtering, 2,070 SNPs were retained.
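The per-site thresholds listed above can be summarized as a single predicate. This is only a sketch of the stated criteria, not the pipeline actually run (which used GATK, VCFtools, and BCFtools): the `SiteStats` container and its field names are hypothetical, the description does not say whether the depth bound applies per site or per genotype, and linkage disequilibrium pruning is not covered because it depends on pairs of SNPs.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Hypothetical per-site summary; field names are illustrative assumptions."""
    is_snp: bool
    n_alleles: int           # REF plus ALT alleles; 2 means biallelic
    qual: float
    depth: float             # interpretation (site vs. genotype depth) is an assumption
    qd: float
    fs: float
    mq: float
    maf: float
    missing_fraction: float  # fraction of accessions with a missing call
    mac: int


def passes_filters(s):
    """Return True if the site meets every threshold listed in the README."""
    return (
        s.is_snp and s.n_alleles == 2
        and s.qual >= 30
        and 10 <= s.depth <= 100
        and s.qd >= 2.0
        and s.fs <= 60.0
        and s.mq >= 40.0
        and s.maf > 0.10
        and s.missing_fraction <= 0.10
        and s.mac >= 3
    )


# Made-up sites: one that satisfies all thresholds, one that fails on MAF.
good = SiteStats(True, 2, 55.0, 32.0, 12.5, 3.1, 58.0, 0.25, 0.04, 13)
low_maf = SiteStats(True, 2, 55.0, 32.0, 12.5, 3.1, 58.0, 0.10, 0.04, 5)
print(passes_filters(good), passes_filters(low_maf))
```

Note that the MAF bound is strict (> 0.10) while the other bounds are inclusive, which is why the second site is rejected.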
Associated analyses
The dataset was used for:
- Population structure analysis (ADMIXTURE)
- Principal Coordinates Analysis (PCoA)
- Phylogenetic analysis (UPGMA)
- Genetic diversity metrics (Ho, He, FIS, Shannon index)
- Analysis of Molecular Variance (AMOVA)
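To make the diversity metrics concrete, the sketch below computes observed heterozygosity (Ho), expected heterozygosity (He = 2pq), and FIS = 1 - Ho/He for a single biallelic locus from alternate-allele counts. It is a simplified textbook version without small-sample correction; the R packages used in the study (for example hierfstat) apply their own estimators, so values will not necessarily match the published ones. The function name and input values are illustrative.

```python
def locus_diversity(counts):
    """Return (Ho, He, FIS) for one biallelic locus.

    counts: alternate-allele count per accession (0, 1, or 2), None for missing.
    Uses He = 2pq with no small-sample correction (a simplifying assumption).
    """
    called = [c for c in counts if c is not None]
    n = len(called)
    if n == 0:
        raise ValueError("no called genotypes at this locus")
    ho = sum(1 for c in called if c == 1) / n   # fraction of heterozygotes
    p = sum(called) / (2 * n)                   # alternate-allele frequency
    he = 2 * p * (1 - p)
    fis = 1 - ho / he if he > 0 else float("nan")
    return ho, he, fis


# Made-up genotypes: three heterozygotes, one of each homozygote, one missing call.
ho, he, fis = locus_diversity([1, 1, 1, 0, 2, None])
print(ho, he, fis)
```

In this made-up example Ho (0.6) exceeds He (0.5), so FIS comes out at about -0.2, the same sign as the heterozygote excess reported for both populations.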
Code and software
FastQC v0.11.7, fastp v0.22.0, BWA v0.7.18, SAMtools v1.22.1, Picard v3.3.0, GATK v4.6.2.0, BCFtools v1.22, VCFtools v0.1.17, cyvcf2 v0.31.2, scikit-allel v1.3.13, PLINK v1.9, R v4.5.1 (vcfR, adegenet, hierfstat, ape, ggplot2, pheatmap, ggExtra)
Data reuse
This dataset provides a genomic baseline for Lolium multiflorum from the Peruvian Andes and can be reused for:
- Population genomics
- Forage breeding and conservation
- Genomic-assisted selection
- Studies of adaptation to high-altitude environments
Users are encouraged to cite the associated manuscript when using these data.
Leaf tissue samples from Lolium multiflorum accessions collected in the northern Peruvian Andes were used for genomic analysis. Genomic DNA was extracted from young leaves using the NucleoSpin® Plant II kit following the manufacturer’s protocol. DNA quality and concentration were assessed by agarose gel electrophoresis and fluorometric quantification. Genotyping-by-sequencing (GBS) libraries were prepared using the ApeKI restriction enzyme and sequenced on an Illumina NovaSeq 6000 platform to generate paired-end reads.
Raw sequencing reads were quality-checked using FastQC and filtered with fastp. Reads were aligned to the Lolium multiflorum reference genome (cultivar Rabiosa) using BWA, and SNP calling was performed with GATK following best-practice pipelines. Only high-quality biallelic SNPs were retained after filtering for read depth, quality metrics, minor allele frequency, missing data, and linkage disequilibrium. Biological replicates were merged to obtain a final non-redundant dataset of unique accessions. Population structure, genetic diversity, and phylogenetic relationships were analyzed using ADMIXTURE, PLINK, VCFtools, scikit-allel, and R-based statistical packages.
Changes after Jul 24, 2025:
- Updated and corrected passport information fields for Lolium multiflorum accessions.
- Minor corrections in accession identifiers and associated metadata.
- No changes were made to the original experimental measurements or raw data values.
