
Rice grain weight and candidate gene marker dataset


C., Anilkumar (2023), Rice grain weight and candidate gene marker dataset, Dryad, Dataset,


Genome-wide genic markers are hypothesized to increase the prediction accuracy of genomic selection for quantitative traits. To test this hypothesis, a set of candidate gene-based markers was custom-designed from cloned yield- and grain-trait-related genes distributed across the rice genome. A multi-model, multi-locus genome-wide association study (GWAS) was then performed with the newly developed genic markers to test their effectiveness for gene discovery. Two multi-locus models, FarmCPU and mrMLM, together with a single-locus mixed linear model (MLM), identified 28 significant marker-trait associations. These associations revealed novel causative alleles for grain weight as well as pleiotropic associations with other traits. For instance, the marker YD91, derived from the gene OsAAP3 on chromosome 1, was consistently associated with grain weight, and the same gene also has a significant effect on grain yield. Furthermore, nine genomic selection methods, including regression-based and machine learning-based models, were used to predict grain weight with a leave-one-out five-fold cross-validation approach in order to optimize the genomic selection model with genic markers. Among the nine prediction models, Reproducing Kernel Hilbert Space (RKHS) regression performed best among the regression-based models, and Random Forest Regression (RFR) performed best among the machine learning-based models. Genomic prediction accuracies with and without the GWAS-significant markers were compared to assess the effectiveness of the markers. The sharp decrease in prediction accuracy when the GWAS-significant markers were dropped indicates that the new genic markers are effective in genomic selection. In addition, the candidate gene-based markers were found to be more effective in genomic selection programs, yielding better prediction accuracy.
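The comparison of prediction accuracy with and without GWAS-significant markers can be illustrated with a small sketch. The abstract does not state the kernel, its bandwidth, the regularization strength, or how accuracy was scored, so everything below is an assumption for illustration only: RKHS regression is implemented as Gaussian-kernel ridge regression, `lam` and `bandwidth` are arbitrary illustrative values, "leave-one-out five-fold" is read as each of five folds being held out once, accuracy is taken as the Pearson correlation between observed phenotypes and pooled out-of-fold predictions, and genotypes are synthetic 0/2-coded inbred lines. All function names and the simulated data are hypothetical and are not taken from this dataset.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel exp(-||a - b||^2 / bandwidth) between two marker vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / bandwidth)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def rkhs_predict(x_train, y_train, x_test, lam=1.0, bandwidth=None):
    """Kernel ridge regression: solve (K + lam*I) alpha = y - mean(y), then predict.

    lam and the default bandwidth (number of markers) are illustrative assumptions.
    """
    bw = bandwidth if bandwidth is not None else float(len(x_train[0]))
    mean_y = sum(y_train) / len(y_train)
    n = len(x_train)
    k = [[gaussian_kernel(x_train[i], x_train[j], bw) for j in range(n)] for i in range(n)]
    for i in range(n):
        k[i][i] += lam
    alpha = solve(k, [y - mean_y for y in y_train])
    return [mean_y + sum(a * gaussian_kernel(xt, xi, bw) for a, xi in zip(alpha, x_train))
            for xt in x_test]


def pearson(a, b):
    """Pearson correlation, used here (by assumption) as the prediction accuracy."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def five_fold_accuracy(genotypes, phenotypes, folds=5, seed=1):
    """Hold out each fold once; score the pooled out-of-fold predictions."""
    idx = list(range(len(phenotypes)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(phenotypes)
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        yhat = rkhs_predict([genotypes[i] for i in train],
                            [phenotypes[i] for i in train],
                            [genotypes[i] for i in test])
        for i, y in zip(test, yhat):
            preds[i] = y
    return pearson(preds, phenotypes)


def drop_markers(genotypes, columns):
    """Remove the given marker columns (e.g. hypothetical GWAS-significant ones)."""
    drop = set(columns)
    return [[g for j, g in enumerate(row) if j not in drop] for row in genotypes]


if __name__ == "__main__":
    # Synthetic, hypothetical data: 100 inbred lines, 40 markers coded 0/2,
    # phenotype driven by markers 0-2 plus Gaussian noise.
    rng = random.Random(42)
    geno = [[rng.choice([0, 2]) for _ in range(40)] for _ in range(100)]
    significant = [0, 1, 2]
    pheno = [sum(row[j] for j in significant) + rng.gauss(0, 1) for row in geno]
    print("all markers:", round(five_fold_accuracy(geno, pheno), 3))
    print("significant markers dropped:",
          round(five_fold_accuracy(drop_markers(geno, significant), pheno), 3))
```

The sketch is kept dependency-free for readability; in practice a library implementation such as scikit-learn's `KernelRidge` with an RBF kernel would normally replace the hand-written solver. Pooling out-of-fold predictions before computing the correlation is one common scoring choice; averaging per-fold correlations is another, and the two can give slightly different numbers.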