SNP genotype and hyperspectral reflectance data from: Ensembles of genomic and hyperspectral imaging-based prediction enable selection for reduced deoxynivalenol content in wheat grains
Data files
Jul 29, 2025 version files 110.29 MB
- 2021_Phenomic_Data_VIS-NIR.csv (729.38 KB)
- 2021-2022_Across_Years_Phenomic_Data_VIS-NIR.csv (1.37 MB)
- 2022_Phenomic_Data_VIS-NIR.csv (697.61 KB)
- AMAT_IMPUTED_NUMERIC_3117ind_15456snp_0.05MAF_0.70coverage_0.10het.csv (107.49 MB)
- README.md (3.85 KB)
Abstract
Breeding for low deoxynivalenol (DON) mycotoxin content in wheat is challenging due to the complexity of the trait and phenotyping limitations. Since phenomic prediction relies on non-additive effects and genomic prediction on additive effects, their complementarity can improve selection accuracy. In this study, DON-infected wheat kernels were imaged using a hyperspectral camera to generate reflectance values across the spectrum of visible and near-infrared light that were used in phenomic predictions. Five Bayesian generalized linear regression models and two machine learning models were trained using phenomic and genomic predictions from advanced soft winter wheat breeding lines evaluated in 2021 and 2022. Across all training sets and models, phenomic predictions using wavebands in the visible light spectrum (400-700 nm) had higher predictive ability than genomic predictions or phenomic predictions using the full waveband range (400-1000 nm). Forward prediction was performed using model ensembles on two sets of F4:5 selection candidates evaluated independently in 2022 and 2023. The phenotypic and genetic correlations, as well as indirect selection accuracies, of the model averages of phenomic predictions and combined phenomic and genomic predictions were higher than those of genomic predictions alone. Accuracies depended on the combination of training set and selection candidates. Unsupervised K-Means clustering using the ensembles of predicted values partitioned selection candidates into two groups with high and low mean observed DON content. This study demonstrates the potential of hyperspectral imaging-based phenomic prediction to complement genomic prediction and highlights considerations for prediction-based selection of low DON in wheat.
https://doi.org/10.5061/dryad.d2547d8bx
Description of the SNP data and file structure
Eric Olson, Michigan State University, eolson@msu.edu
File name: AMAT_IMPUTED_NUMERIC_3117ind_15456snp_0.05MAF_0.70coverage_0.10het.csv
This is a .csv file of genotypic data used in genomic prediction of the mycotoxin DON in the publication "Ensembles of genomic and hyperspectral imaging-based prediction enable selection for reduced deoxynivalenol content in wheat grains".
SNPs were developed using a double-digest RAD-seq method. SNPs were initially filtered for at least 70% coverage across 3,117 individuals, no more than 10% heterozygosity (as F4-derived and inbred lines were genotyped), and at least 5% minor allele frequency. Nucleotides were then converted to numeric format, with 1 = major allele, -1 = minor allele, and 0 = heterozygous. All filtering steps and the conversion to numeric format were done in TASSEL 5.
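The three filters can be illustrated on the numeric coding itself. The following is a minimal sketch only: the original filtering was done in TASSEL 5 on nucleotide calls, and the function names, the use of `None` for a missing call, and the reading of 1 and -1 as homozygous major and minor genotypes are assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence

# Assumed coding: 1 = homozygous major, 0 = heterozygous,
# -1 = homozygous minor, None = missing call (hypothetical convention).
Genotype = Optional[int]


def marker_stats(calls: Sequence[Genotype]) -> Dict[str, float]:
    """Coverage, heterozygosity, and minor allele frequency for one SNP."""
    called = [g for g in calls if g is not None]
    n = len(called)
    if n == 0:
        return {"coverage": 0.0, "het": 0.0, "maf": 0.0}
    n_het = sum(1 for g in called if g == 0)
    n_minor_hom = sum(1 for g in called if g == -1)
    # Each individual carries two alleles; heterozygotes carry one minor allele.
    freq = (2 * n_minor_hom + n_het) / (2 * n)
    return {
        "coverage": n / len(calls),
        "het": n_het / n,
        "maf": min(freq, 1.0 - freq),
    }


def passes_filters(
    calls: Sequence[Genotype],
    min_coverage: float = 0.70,
    max_het: float = 0.10,
    min_maf: float = 0.05,
) -> bool:
    """Apply the coverage, heterozygosity, and MAF thresholds to one SNP."""
    s = marker_stats(calls)
    return (
        s["coverage"] >= min_coverage
        and s["het"] <= max_het
        and s["maf"] >= min_maf
    )
```

For example, `passes_filters([1] * 7 + [-1] * 2 + [None])` keeps the marker, while a marker called in only half of the individuals is dropped.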
Missing numeric genotypes were imputed with the population mean using the A.mat function in the R package rrBLUP (Endelman, 2011).
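Mean imputation explains why the file can contain non-integer values. The sketch below does not call rrBLUP; it assumes that "population mean" refers to the mean of the observed calls for each marker across individuals, and the function name and the `None` placeholder for missing calls are hypothetical.

```python
from statistics import fmean
from typing import List, Optional


def impute_marker_means(
    genotypes: List[List[Optional[float]]],
) -> List[List[float]]:
    """Replace missing calls (None) with the mean of the observed calls
    for the same marker. Rows are individuals, columns are SNP markers."""
    n_markers = len(genotypes[0])
    means = []
    for j in range(n_markers):
        observed = [row[j] for row in genotypes if row[j] is not None]
        # A marker with no observed calls falls back to 0.0 (assumption).
        means.append(fmean(observed) if observed else 0.0)
    return [
        [means[j] if row[j] is None else float(row[j]) for j in range(n_markers)]
        for row in genotypes
    ]
```

With three individuals and two markers, `impute_marker_means([[1, None], [-1, 0], [1, 1]])` fills the missing call with 0.5, the mean of the two observed calls for that marker.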
Description of hyperspectral reflectance data and file structure
------
Ensembles of Genomic and Hyperspectral Imaging-Based Prediction Enable Selection for Reduced Deoxynivalenol Content in Wheat Grains
------
Hyperspectral Imaging (Phenomic) Data
------
Hyperspectral Imaging Sensor: Specim IQ (Specim, Oulu, Finland)
Image Processing: Regions of interest (seeds) were separated from the background in QGIS 3.10.2. The processed image was saved as an ESRI shapefile (.shp).
Spectral reflectance value generation: The mean for each of the 204 wavebands in the visible to near-infrared region was generated by averaging the spectral reflectance values of all pixels in the region of interest. Mean spectral reflectance values were calculated using the "raster" package in R.
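The per-waveband averaging can be sketched as follows. This is an illustration in plain Python rather than the "raster" workflow used for the dataset; the function name and the representation of each pixel as a list of reflectance values (one per waveband) are assumptions.

```python
from statistics import fmean
from typing import List, Sequence


def mean_spectrum(roi_pixels: Sequence[Sequence[float]]) -> List[float]:
    """Average reflectance per waveband over all pixels in a region of interest."""
    if not roi_pixels:
        raise ValueError("region of interest contains no pixels")
    n_bands = len(roi_pixels[0])
    if any(len(pixel) != n_bands for pixel in roi_pixels):
        raise ValueError("all pixels must have the same number of wavebands")
    # One mean per waveband, taken across every pixel in the region.
    return [fmean([pixel[b] for pixel in roi_pixels]) for b in range(n_bands)]
```

Each image therefore contributes one row of mean reflectance values, one value per waveband.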
Number of Trials: 2 (2021, 2022)
Total Number of Genotypes: 558 (2021: n=298, 2022: n=285)
Number of replicates (images) per genotype: two to three
------
For individual trials, BLUEs (best linear unbiased estimates) across replicates (images) of the spectral reflectance values of each genotype
were generated using "lsmeans" from a linear model fitted with genotype as a fixed effect.
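When genotype is the only term in the linear model, the least-squares mean of a genotype equals the arithmetic mean of its replicates. The sketch below illustrates that equivalence for a single waveband; it assumes the model contains no other terms, and the function name and record layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Tuple


def genotype_means(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Per-genotype mean reflectance at one waveband.

    records holds (genotype_id, reflectance) pairs, one per image.
    """
    by_genotype: Dict[str, List[float]] = defaultdict(list)
    for genotype, value in records:
        by_genotype[genotype].append(value)
    return {g: fmean(values) for g, values in by_genotype.items()}
```

A genotype imaged two times and another imaged three times each receive a single value, which is why the replicate count can differ between genotypes.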
------
Across trials (years), BLUEs across replicates (images) of the spectral reflectance values of each genotype were generated using "lsmeans" from
a mixed linear model fitted using "lme4", with genotype as a fixed effect and year (trial) and genotype-by-year interaction as random effects.
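The across-years model described above can be written out for a single waveband. The notation is an assumption introduced here for illustration, as are the normality assumptions on the random terms, which follow the usual linear mixed model convention rather than anything stated in this README.

```latex
y_{ijk} = \mu + g_i + t_j + (gt)_{ij} + \varepsilon_{ijk},
\qquad
t_j \sim N(0, \sigma^2_t), \quad
(gt)_{ij} \sim N(0, \sigma^2_{gt}), \quad
\varepsilon_{ijk} \sim N(0, \sigma^2_{\varepsilon})
```

Here y_{ijk} is the reflectance of image k of genotype i in year j, \mu is the overall mean, g_i is the fixed genotype effect, t_j is the random year effect, (gt)_{ij} is the random genotype-by-year interaction, and \varepsilon_{ijk} is the residual. The BLUE of genotype i corresponds to the estimate of \mu + g_i.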
-----
2021_Phenomic_Data_VIS-NIR.csv : Spectral reflectance values in the VIS/NIR (visible to near-infrared) region, 397.2 nm to 1003.58 nm, of the 298 genotypes evaluated in 2021 for deoxynivalenol content. Values are the BLUEs (means) of the spectral reflectance values across two to three images (replicates) per genotype.
2022_Phenomic_Data_VIS-NIR.csv : Spectral reflectance values in the VIS/NIR (visible to near-infrared) region, 397.2 nm to 1003.58 nm, of the 285 genotypes evaluated in 2022 for deoxynivalenol content. Values are the BLUEs (means) of the spectral reflectance values across two to three images (replicates) per genotype.
2021-2022_Across_Years_Phenomic_Data_VIS-NIR.csv : Spectral reflectance values in the VIS/NIR (visible to near-infrared) region, 397.2 nm to 1003.58 nm, of the 558 genotypes evaluated in 2021 and 2022 for deoxynivalenol content. Values are the BLUEs (best linear unbiased estimates) of the genotypes across years, with genotype as a fixed effect, year as a random effect, and genotype-by-year interaction as a random effect.
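The abstract compares predictions from the visible range (400-700 nm) with those from the full range, so a common first step with these files is to subset the waveband columns. The sketch below assumes a layout that this README does not confirm: a first column of genotype identifiers followed by columns whose headers are wavelengths in nm. The column names, the `visible_band_columns` helper, and the sample values are all made up for illustration.

```python
import csv
import io
from typing import List, Sequence


def visible_band_columns(
    header: Sequence[str], low_nm: float = 400.0, high_nm: float = 700.0
) -> List[str]:
    """Return the header labels that parse as wavelengths in [low_nm, high_nm]."""
    selected = []
    for label in header:
        try:
            wavelength = float(label)
        except ValueError:
            continue  # non-numeric labels, such as an identifier column
        if low_nm <= wavelength <= high_nm:
            selected.append(label)
    return selected


# Tiny in-memory stand-in for a phenomic CSV file (hypothetical layout and values).
sample = io.StringIO(
    "genotype,397.2,550.10,699.85,850.40,1003.58\n"
    "LINE_A,0.11,0.25,0.31,0.42,0.40\n"
)
reader = csv.DictReader(sample)
visible = visible_band_columns(reader.fieldnames)
rows = [
    {"genotype": r["genotype"], **{c: float(r[c]) for c in visible}}
    for r in reader
]
```

If the real headers carry a prefix or suffix, the parsing step would need to be adjusted before the range check.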