Data from: Wheat genotypic and phenotypic data for multivariate genomic prediction

Published Jul 29, 2025 on Dryad. https://doi.org/10.5061/dryad.6wwpzgn71

Data files

Jul 29, 2025 version files (23.39 MB total):

Genotypic_data.csv: 23.35 MB
Phenotypic_data.txt: 33.46 KB
README.md: 3.37 KB

Abstract

The water absorption capacity (WAC) of hard wheat flour affects end-use quality characteristics, including loaf volume, bread yield, and shelf life. Despite its importance, improving WAC through phenotypic selection is challenging. Phenotyping for WAC is time-consuming and is therefore often limited to the later stages of the breeding process, so suboptimal lines are retained longer than desired. This study investigates the potential of univariate and multivariate genomic prediction as an alternative to phenotypic selection for improving WAC. A total of 497 hard winter wheat genotypes were evaluated in multi-environment advanced yield and elite trials over eight years (2014-2021). WAC was phenotyped via the solvent retention capacity (SRC) test using water as the solvent (SRC-W). Traits that were significantly correlated with SRC-W (r ≥ 0.3) and evaluated earlier than SRC-W were included in the multivariate genomic prediction models. These traits were kernel hardness and diameter, obtained using the single kernel characterization system (SKCS), break flour yield (B-Flour), and total flour yield (T-Flour). In cross-validation, the mean univariate genomic prediction accuracy for SRC-W was r = 0.69 ± 0.005, while bivariate and multivariate models improved prediction accuracy to r = 0.82 ± 0.003. Forward validation showed a prediction accuracy of up to r = 0.81 for a multivariate model that included SRC-W and all other traits (SKCS hardness and diameter, B-Flour, and T-Flour). These results suggest that incorporating correlated traits into genomic prediction models can improve early-generation prediction accuracy.

We have submitted our raw phenotypic data (Phenotypic_data.txt), genotypic data (Genotypic_data.csv) and R-script used to analyze the data.

Files Submitted

Phenotypic_data.txt: Contains the phenotypic data.
Genotypic_data.csv: Contains the genotypic data.
R-script_MVGS: Contains the R script used to analyze the data.

Descriptions

Genotypic Data

Filename: Genotypic_data.csv, obtained via genotyping by sequencing (GBS).

rs#: Identifier for each SNP.
Chrom: Chromosome number where the SNP is located.
Pos: Physical position of the SNP on the chromosome.
GID1, GID2, ..., GIDN: Genotype information for each sample.
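
As an illustration of the layout described above, the following Python sketch splits a marker-by-sample table into SNP metadata and per-sample genotype calls. It is not part of R-script_MVGS. It assumes a comma-delimited file whose header holds rs#, Chrom, and Pos plus one column per sample. The function name split_genotype_table and the toy rows are hypothetical, and genotype calls are kept as strings because the coding scheme is not described in this README.

```python
import csv
import io


def split_genotype_table(handle):
    """Split a marker-by-sample table into marker metadata and per-sample calls."""
    reader = csv.reader(handle)
    header = next(reader)

    # Assumption: these three metadata columns are present under these exact names.
    meta_cols = ["rs#", "Chrom", "Pos"]
    meta_idx = [header.index(name) for name in meta_cols]

    # Every remaining column is treated as one genotyped sample.
    sample_idx = [i for i, name in enumerate(header) if name not in meta_cols]
    sample_ids = [header[i] for i in sample_idx]

    markers = []                              # one (rs#, Chrom, Pos) tuple per SNP
    calls = {sid: [] for sid in sample_ids}   # sample ID -> calls in marker order
    for row in reader:
        markers.append(tuple(row[i] for i in meta_idx))
        for i, sid in zip(sample_idx, sample_ids):
            calls[sid].append(row[i])
    return markers, calls


# Made-up toy rows for illustration; not taken from Genotypic_data.csv.
toy = io.StringIO(
    "rs#,Chrom,Pos,GID1,GID2\n"
    "snp_a,1A,1000,0,2\n"
    "snp_b,1B,2500,1,0\n"
)
markers, calls = split_genotype_table(toy)
print(markers)        # [('snp_a', '1A', '1000'), ('snp_b', '1B', '2500')]
print(calls["GID2"])  # ['2', '0']
```

To try the same function on the real file, pass a handle opened with open("Genotypic_data.csv", newline=""). Rows are read one at a time, which should be adequate for a file of about 23 MB.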

Phenotypic Data

Filename: Phenotypic_data.txt

Sample_ID: Unique identifier for each sample.
SRC-water: Phenotypic value for water absorption capacity, measured by solvent retention capacity test using water as a solvent.
Diameter: Phenotypic value for grain diameter and hardness, measured by single kernel characterization system.
B-Flour: Phenotypic value for break flour yield, the amount of flour obtained from the initial breaking of the grain in the milling process.
T-Flour: Phenotypic value for total flour yield, the total amount of flour produced from the entire milling process.

Note: The phenotypic values are Best Linear Unbiased Estimates (BLUEs) calculated across the trial-location-year interaction.
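
A minimal Python sketch of how the phenotype table could be read and how each trait's Pearson correlation with SRC-water could be checked. It is an independent illustration, not the procedure in R-script_MVGS. The tab delimiter is an assumption (the README does not state the delimiter of Phenotypic_data.txt), the helper names read_phenotypes and pearson are hypothetical, and the toy values are made up.

```python
import csv
import io
import math


def read_phenotypes(handle, delimiter="\t"):
    """Return {Sample_ID: {trait: float}} from a delimited table with a header row."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    table = {}
    for row in reader:
        sample_id = row.pop("Sample_ID")
        # Assumption: every remaining column holds a numeric trait value.
        table[sample_id] = {trait: float(value) for trait, value in row.items()}
    return table


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values for illustration; not taken from Phenotypic_data.txt.
toy = io.StringIO(
    "Sample_ID\tSRC-water\tDiameter\tB-Flour\tT-Flour\n"
    "GID1\t60.0\t2.5\t10.0\t70.0\n"
    "GID2\t62.0\t2.7\t11.0\t69.0\n"
    "GID3\t64.0\t2.9\t12.0\t71.0\n"
)
phenotypes = read_phenotypes(toy)

# Correlate each other trait with SRC-water across samples.
src = [traits["SRC-water"] for traits in phenotypes.values()]
correlations = {
    trait: pearson(src, [traits[trait] for traits in phenotypes.values()])
    for trait in ("Diameter", "B-Flour", "T-Flour")
}
```

If the real file uses a different separator, pass it through the delimiter argument, for example delimiter="," or delimiter=" ".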

Relationships Between Data Files

The Genotypic_data.csv and Phenotypic_data.txt files are linked through the sample identifiers. Each Sample_ID in the phenotypic data corresponds to one of the sample columns (GID1, GID2, ..., GIDN) in the genotypic data. The dataset contains no missing data.
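
The link between the two files can be checked before any analysis. The sketch below is a hedged illustration in Python, assuming the Sample_ID values and the genotype column names are the same strings. The function name match_samples and the toy identifiers are hypothetical.

```python
def match_samples(pheno_ids, geno_ids):
    """Compare two lists of sample identifiers, preserving input order."""
    geno_set = set(geno_ids)
    pheno_set = set(pheno_ids)
    shared = [sid for sid in pheno_ids if sid in geno_set]
    pheno_only = [sid for sid in pheno_ids if sid not in geno_set]
    geno_only = [sid for sid in geno_ids if sid not in pheno_set]
    return shared, pheno_only, geno_only


# Toy identifiers for illustration only.
pheno_ids = ["GID1", "GID2", "GID4"]
geno_ids = ["GID1", "GID2", "GID3"]
shared, pheno_only, geno_only = match_samples(pheno_ids, geno_ids)
print(shared)      # ['GID1', 'GID2']
print(pheno_only)  # ['GID4']
print(geno_only)   # ['GID3']
```

With the real files, both leftover lists would be expected to come back empty if every phenotyped sample has a matching genotype column and vice versa.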

Dataset Information

This dataset comprises genotypic and phenotypic data of 337 hard winter wheat (Triticum aestivum L.) genotypes collected for a multivariate genomic selection (GS) study focusing on wheat end-use quality traits. The study involved the genotyping of wheat samples using Genotyping-by-Sequencing (GBS) and the measurement of various phenotypic traits related to wheat quality, including water absorption capacity, grain diameter and hardness, break flour yield, and total flour yield. The experimental procedures were designed to identify genetic markers associated with these key traits to enhance wheat breeding programs.

Key Information Sources

Genotyping Data: Generated via Genotyping-by-Sequencing (GBS) and processed using the TASSEL software.
Phenotypic Measurements: Conducted using standard protocols for solvent retention capacity, single kernel characterization, and milling processes.

Code/Software

R is required to run R-script_MVGS; the script was created using version 4.2.3. Annotations are provided throughout the script for:

Library loading
Dataset loading and cleaning
Analyses
Figure creation

Usage Notes

Phenotypic_data.txt and Genotypic_data.csv can be viewed using any text editor or spreadsheet software like Microsoft Excel. R is required to run the analysis script provided in the repository.

Additional Information

The data and script used in this study can be found in the GitHub repository: https://github.com/MeseretAW/MVGS.

Methods