Dryad

Dataset to evaluate the impact of environmental kernels in genomic prediction models

Data files

Feb 10, 2026 version files 7.05 MB


Abstract

Integrating genomic and environmental information can enhance the predictive power of genomic prediction models that account for genotype-by-environment interactions, so the way environmental covariates (ECs) are incorporated into these models can substantially influence their predictive accuracy. In this study, we used 1379 genotypes from the SoyNAM dataset, evaluated across four environments and genotyped with 4611 single-nucleotide polymorphism markers, to compare models incorporating genotype-by-environment and genotype-by-environmental covariate interactions using different covariance matrices. We evaluated four approaches to handling ECs: summarizing ECs by averaging (AVG), filtering ECs based on a coefficient of determination criterion (FILT), segmenting ECs by crop phenology (STG), and a naïve approach that used all available information (ALL). Predictive ability was assessed as the Pearson correlation between the genomic estimated breeding values and the adjusted phenotypes, considering 10 replicates of three cross-validation scenarios (CV2: predicting tested genotypes in observed environments; CV1: untested genotypes in observed environments; CV0: tested genotypes in novel environments). Incorporating EC information into the models increased average predictive ability from 0.42 to 0.56 for CV1 and CV2. In these two scenarios, predictive ability was lower when EC information was averaged to compute the environmental kinship matrix, and differed only slightly among the other approaches. Under CV0, the model incorporating only genotype-by-environment information performed better (predictive ability of 0.33). The naïve method, which used all available EC information (ALL), proved to be a promising approach, as it effectively improved the results in these scenarios while eliminating the need for additional variable-selection steps.
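The workflow summarized in the abstract can be illustrated with a minimal Python sketch on simulated toy data. This is an assumption-laden illustration, not the study's code: the linear kernel (standardized matrix times its transpose, divided by the number of columns) is one common way to build genomic and environmental kinship matrices and may differ from the covariance matrices actually compared; the function names `linear_kernel`, `interaction_kernel`, `cv_mask`, and `predictive_ability`, the toy dimensions, and the 20% test fraction are all hypothetical.

```python
import numpy as np

def linear_kernel(X):
    """Kinship from a (rows x features) matrix: standardize columns, then Z Z' / p."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    keep = sd > 0                      # constant columns carry no information
    Z = (X[:, keep] - X[:, keep].mean(axis=0)) / sd[keep]
    return Z @ Z.T / Z.shape[1]

def interaction_kernel(G, Omega, geno_idx, env_idx):
    """Record-level genotype-by-EC kernel: Hadamard product of the expanded kernels."""
    K_g = G[np.ix_(geno_idx, geno_idx)]        # genomic kinship between records
    K_e = Omega[np.ix_(env_idx, env_idx)]      # environmental kinship between records
    return K_g * K_e

def cv_mask(geno_idx, env_idx, scheme, rng, test_frac=0.2):
    """Boolean mask of records to hide, for the three schemes named in the abstract."""
    if scheme == "CV2":                # random records hidden (assumed simplification)
        return rng.random(len(geno_idx)) < test_frac
    if scheme == "CV1":                # whole genotypes hidden in every environment
        genos = np.unique(geno_idx)
        n_held = max(1, int(test_frac * len(genos)))
        return np.isin(geno_idx, rng.choice(genos, size=n_held, replace=False))
    if scheme == "CV0":                # one whole environment hidden
        return env_idx == rng.choice(np.unique(env_idx))
    raise ValueError(f"unknown scheme: {scheme}")

def predictive_ability(predicted, observed):
    """Pearson correlation between predictions and adjusted phenotypes."""
    return float(np.corrcoef(predicted, observed)[0, 1])

# Toy data (simulated; dimensions are arbitrary, not those of SoyNAM).
rng = np.random.default_rng(0)
n_geno, n_env, n_snp, n_ec = 6, 4, 50, 8
markers = rng.integers(0, 3, size=(n_geno, n_snp))   # SNP codes 0/1/2
ecs = rng.normal(size=(n_env, n_ec))                 # one row of ECs per environment

G = linear_kernel(markers)       # genomic kinship, n_geno x n_geno
Omega = linear_kernel(ecs)       # environmental kinship, n_env x n_env

# Balanced design, records ordered environment by environment.
env_idx = np.repeat(np.arange(n_env), n_geno)
geno_idx = np.tile(np.arange(n_geno), n_env)
K_ge = interaction_kernel(G, Omega, geno_idx, env_idx)

test_cv1 = cv_mask(geno_idx, env_idx, "CV1", rng)
test_cv0 = cv_mask(geno_idx, env_idx, "CV0", rng)
```

In a sketch like this, the EC-handling approaches would differ only in which columns of `ecs` enter `linear_kernel` (for example, per-environment averages versus all covariates). The resulting kernels would then be passed to a mixed-model or kernel-regression fitter, and `predictive_ability` would be computed on the masked records of each replicate.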