Data from: Wheat genotypic and phenotypic data for multivariate genomic prediction
Data files
This dataset is embargoed and will be released when the associated article is published. Contact help@datadryad.org to notify us of article publication.
Lists of files and downloads will become available to the public when released.
Abstract
The water absorption capacity (WAC) of hard wheat flour affects end-use quality characteristics, including loaf volume, bread yield, and shelf life. Despite its importance, improving WAC through phenotypic selection is challenging. Phenotyping for WAC is time-consuming and, as such, is often limited to the later stages of the breeding process, so suboptimal lines are retained longer than desired. This study investigates the potential of univariate and multivariate genomic prediction as an alternative to phenotypic selection for improving WAC. A total of 497 hard winter wheat genotypes were evaluated in multi-environment advanced yield and elite trials over eight years (2014-2021). Phenotyping for WAC was done via the solvent retention capacity (SRC) test using water as the solvent (SRC-W). Traits that exhibited a significant correlation (r ≥ 0.3) with SRC-W and were evaluated earlier than SRC-W were included in the multivariate genomic prediction models. These traits were kernel hardness and diameter, obtained using the single kernel characterization system (SKCS), along with break flour yield (B-Flour) and total flour yield (T-Flour). Cross-validation showed the mean univariate genomic prediction accuracy of SRC-W to be r = 0.69 ± 0.005, while bivariate and multivariate models showed an improved prediction accuracy of r = 0.82 ± 0.003. Forward validation showed a prediction accuracy of up to r = 0.81 for a multivariate model that included SRC-W and all correlated traits (SKCS hardness and diameter, B-Flour, and T-Flour). These results suggest that incorporating correlated traits into genomic prediction models can improve early-generation prediction accuracy.
We have submitted our raw phenotypic data (Phenotypic_data.txt), genotypic data (Genotypic_data.csv), and the R script used to analyze the data (R-script_MVGS).
Files Submitted
- Phenotypic_data.txt: Contains the phenotypic data.
- Genotypic_data.csv: Contains the genotypic data.
- R-script_MVGS: Contains the R script used to analyze the data.
Descriptions
Genotypic Data
Filename: Genotypic_data.csv, obtained via genotyping by sequencing (GBS).
- rs#: Identifier for each SNP.
- Chrom: Chromosome number where the SNP is located.
- Pos: Physical position of the SNP on the chromosome.
- GID1, GID2, …, GIDN: Genotype information for each sample.
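As a rough illustration of the layout described above, the following Python sketch splits a marker-by-sample table into marker metadata and per-sample genotype calls. It is not part of the submitted R script. It assumes a comma-delimited file with a header row, an integer Pos column, and one column per sample; the function name `read_genotypes` and the toy rows are invented for the example, and the genotype encoding in the real file may differ.

```python
import csv
import io

# Marker-level columns named in this README; all other columns are samples.
META_COLUMNS = ("rs#", "Chrom", "Pos")


def read_genotypes(handle):
    """Return (markers, calls) from a marker-by-sample CSV.

    markers: list of (snp_id, chromosome, position) tuples, one per row.
    calls:   dict mapping each sample column to its list of genotype calls.
    """
    reader = csv.DictReader(handle)
    sample_ids = [c for c in reader.fieldnames if c not in META_COLUMNS]
    markers = []
    calls = {sid: [] for sid in sample_ids}
    for row in reader:
        # Assumption: Pos holds an integer physical position.
        markers.append((row["rs#"], row["Chrom"], int(row["Pos"])))
        for sid in sample_ids:
            calls[sid].append(row[sid])
    return markers, calls


# Toy stand-in for Genotypic_data.csv; SNP names and calls are invented.
toy_csv = io.StringIO(
    "rs#,Chrom,Pos,GID1,GID2\n"
    "S1A_1000,1A,1000,A,G\n"
    "S1A_2500,1A,2500,C,C\n"
)
markers, calls = read_genotypes(toy_csv)
print(len(markers), "markers;", "samples:", sorted(calls))
```

For the real file, the same function could be called on `open("Genotypic_data.csv", newline="")`, provided the assumptions above hold.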
Phenotypic Data
Filename: Phenotypic_data.txt
- Sample_ID: Unique identifier for each sample.
- SRC-water: Phenotypic value for water absorption capacity, measured by the solvent retention capacity test using water as the solvent.
- Diameter: Phenotypic value for grain diameter and hardness, measured by the single kernel characterization system (SKCS).
- B-Flour: Phenotypic value for break flour yield, the amount of flour obtained from the initial breaking of the grain in the milling process.
- T-Flour: Phenotypic value for total flour yield, the total amount of flour produced from the entire milling process.
Note: The phenotypic values are Best Linear Unbiased Estimates (BLUEs) calculated across trial-location-year interactions.
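The abstract states that traits correlated with SRC-W at r ≥ 0.3 were carried into the multivariate models. The Python sketch below shows one way such a screen could be applied to trait columns like those listed above. It is an illustration, not the authors' R code: the helper names, the trait names `trait_a` to `trait_c`, and all values are invented, the threshold is applied to the signed Pearson coefficient exactly as written in the abstract, and no significance test is performed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_correlated(target, traits, threshold=0.3):
    """Names of traits whose correlation with target meets the threshold."""
    return [name for name, values in traits.items()
            if pearson_r(target, values) >= threshold]


# Invented values for illustration only; not taken from Phenotypic_data.txt.
src_water = [1.0, 2.0, 3.0, 4.0, 5.0]
traits = {
    "trait_a": [2.0, 4.0, 6.0, 8.0, 10.0],   # rises with the target
    "trait_b": [1.0, -1.0, 0.0, -1.0, 1.0],  # unrelated to the target
    "trait_c": [5.0, 4.0, 3.0, 2.0, 1.0],    # falls as the target rises
}
selected = select_correlated(src_water, traits)
print(selected)
```

If strongly negative correlations should also qualify, the comparison could use `abs(pearson_r(...))` instead; the source does not say which convention was used.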
Relationships Between Data Files
The Genotypic_data.csv and Phenotypic_data.txt files are linked through the sample identifiers: each Sample_ID in the phenotypic data corresponds to a specific sample in the genotypic data. The dataset contains no missing data.
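Before any joint analysis, the link between the two files can be checked with a small sketch like the one below. It assumes the sample column headers in the genotypic file carry the same identifiers as Sample_ID in the phenotypic file, which this README implies but does not state outright. The function name `match_samples` and the example identifiers are hypothetical.

```python
def match_samples(pheno_ids, geno_ids):
    """Compare sample identifiers from the two files.

    Returns (shared, pheno_only, geno_only), each preserving input order.
    """
    geno_set = set(geno_ids)
    pheno_set = set(pheno_ids)
    shared = [sid for sid in pheno_ids if sid in geno_set]
    pheno_only = [sid for sid in pheno_ids if sid not in geno_set]
    geno_only = [sid for sid in geno_ids if sid not in pheno_set]
    return shared, pheno_only, geno_only


# Hypothetical identifiers for illustration; the real IDs may look different.
pheno_ids = ["GID1", "GID2", "GID3"]
geno_ids = ["GID3", "GID1", "GID4"]
shared, pheno_only, geno_only = match_samples(pheno_ids, geno_ids)
print("shared:", shared)
print("phenotype only:", pheno_only)
print("genotype only:", geno_only)
```

Empty `pheno_only` and `geno_only` lists would indicate that every sample appears in both files.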
Dataset Information
This dataset comprises genotypic and phenotypic data of 337 hard winter wheat (Triticum aestivum L.) genotypes collected for a multivariate genomic selection (GS) study focusing on wheat end-use quality traits. The study involved the genotyping of wheat samples using Genotyping-by-Sequencing (GBS) and the measurement of various phenotypic traits related to wheat quality, including water absorption capacity, grain diameter and hardness, break flour yield, and total flour yield. The experimental procedures were designed to identify genetic markers associated with these key traits to enhance wheat breeding programs.
Key Information Sources
- Genotyping Data: Generated via Genotyping-by-Sequencing (GBS) and processed using the TASSEL software.
- Phenotypic Measurements: Conducted using standard protocols for solvent retention capacity, single kernel characterization, and milling processes.
Code/Software
R is required to run R-script_MVGS; the script was created using version 4.2.3. Annotations are provided throughout the script for:
- Library loading
- Dataset loading and cleaning
- Analyses
- Figure creation
Usage Notes
Phenotypic_data.txt and Genotypic_data.csv can be viewed using any text editor or spreadsheet software like Microsoft Excel. R is required to run the analysis script provided in the repository.
Additional Information
The data and script used in this study can be found in the GitHub repository: https://github.com/MeseretAW/MVGS.