Data from: Across population genomic prediction scenarios in which Bayesian variable selection outperforms GBLUP

van den Berg, Sanne, Wageningen University & Research

Calus, Mario P. L., Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Wageningen, The Netherlands

Meuwissen, Theo H. E., Norwegian University of Life Sciences

Wientjes, Yvonne C. J., Wageningen University & Research

Published Dec 04, 2016 on Dryad. https://doi.org/10.5061/dryad.rq80k

Cite this dataset

van den Berg, Sanne; Calus, Mario P. L.; Meuwissen, Theo H. E.; Wientjes, Yvonne C. J. (2016). Data from: Across population genomic prediction scenarios in which Bayesian variable selection outperforms GBLUP [Dataset]. Dryad. https://doi.org/10.5061/dryad.rq80k

Abstract

Background: The use of information across populations is an attractive approach to increase the accuracy of genomic prediction for numerically small populations. However, accuracies of across population genomic prediction, in which reference and selection individuals are from different populations, are currently disappointing. It has been shown for within population genomic prediction that Bayesian variable selection models outperform GBLUP models when the number of QTL underlying the trait is low. Therefore, our objective was to identify across population genomic prediction scenarios in which Bayesian variable selection models outperform GBLUP in terms of prediction accuracy. In this study, high density genotype information of 1033 Holstein Friesian, 105 Groningen White Headed, and 147 Meuse-Rhine-Yssel cows was used. Phenotypes were simulated while varying two parameters: (1) the number of QTL underlying the trait (3000, 300, 30, 3), and (2) the correlation between allele substitution effects of QTL across populations, i.e. the genetic correlation of the simulated trait between the populations (1.0, 0.8, 0.4).
Results: The accuracy obtained by the Bayesian variable selection model depended on the number of QTL underlying the trait, with a higher accuracy when the number of QTL was lower. This trend was more pronounced for across population genomic prediction than for within population genomic prediction. Bayesian variable selection models had an advantage over GBLUP when the number of QTL underlying the simulated trait was small. This advantage disappeared when the number of QTL underlying the simulated trait was large. The accuracies of Bayesian variable selection and GBLUP became similar at approximately the point where the number of QTL was equal to the number of independent chromosome segments (M_e) across the populations.
Conclusion: Bayesian variable selection models outperform GBLUP when the number of QTL underlying the trait is smaller than M_e. Across populations, M_e is considerably larger than within populations. So, the number of QTL underlying a trait is more likely to be smaller than M_e across populations than within a population. Therefore, Bayesian variable selection models can help to improve the accuracy of across population genomic prediction.

Usage notes

Genotypes_26503SNPs

File with ID and genotypes for 26503 SNPs for 1285 animals (from three breeds).

ID_Breed

File with the breeds for each of the 1285 animals. File contains: ID, breedcode (HF=Holstein Friesian, GWH=Groningen White Headed or MRY=Meuse-Rhine-Yssel).
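
As an illustration of how the ID_Breed file could be read, the following Python sketch builds a lookup from animal ID to breed code and counts animals per breed. The on-disk layout is not specified in these notes, so the sketch assumes whitespace-separated columns and no header line; the function name `read_breeds` and the sample IDs are hypothetical and the data shown are made up for the example.

```python
import io
from collections import Counter

# Breed codes as listed in the usage notes.
BREED_NAMES = {
    "HF": "Holstein Friesian",
    "GWH": "Groningen White Headed",
    "MRY": "Meuse-Rhine-Yssel",
}

def read_breeds(handle):
    """Return {animal_id: breed_code} from lines of 'ID breedcode'.

    Assumption: whitespace-separated columns, no header line.
    """
    breeds = {}
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2 or fields[1] not in BREED_NAMES:
            raise ValueError(f"line {line_no}: unexpected content {line!r}")
        breeds[fields[0]] = fields[1]
    return breeds

# Made-up sample standing in for the real file.
sample = io.StringIO("1001 HF\n1002 GWH\n1003 MRY\n1004 HF\n")
breeds = read_breeds(sample)
counts = Counter(breeds.values())
for code, n in sorted(counts.items()):
    print(f"{BREED_NAMES[code]}: {n}")
```

With the real file, the handle would come from `open(...)` on the downloaded ID_Breed file, and the per-breed counts could be checked against the 1033, 105, and 147 animals reported in the abstract.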

Phenotypes_GenCor_1

Files with the simulated phenotypes (100 replicates) for the scenario using a genetic correlation of 1 between the breeds. Each of the files contains: ID, Phenotype with 3000 QTL, Phenotype with 300 QTL, Phenotype with 30 QTL, Phenotype with 3 QTL.

Phenotypes_GenCor_0.8

Files with the simulated phenotypes (100 replicates) for the scenario using a genetic correlation of 0.8 between the breeds. Each of the files contains: ID, Phenotype with 3000 QTL, Phenotype with 300 QTL, Phenotype with 30 QTL, Phenotype with 3 QTL.

Phenotypes_GenCor_0.4

Files with the simulated phenotypes (100 replicates) for the scenario using a genetic correlation of 0.4 between the breeds. Each of the files contains: ID, Phenotype with 3000 QTL, Phenotype with 300 QTL, Phenotype with 30 QTL, Phenotype with 3 QTL.
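
The phenotype files in all three scenarios share the same column order, so one reader can serve all of them. The Python sketch below maps each animal ID to its four simulated phenotypes, keyed by the number of QTL. It assumes whitespace-separated columns and no header line, which these notes do not confirm; `read_phenotypes` and the sample values are hypothetical.

```python
import io

# Column order after the ID column, as given in the usage notes.
QTL_COUNTS = (3000, 300, 30, 3)

def read_phenotypes(handle):
    """Return {animal_id: {n_qtl: phenotype}} for one replicate file.

    Assumption: whitespace-separated columns, no header line.
    """
    records = {}
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 1 + len(QTL_COUNTS):
            raise ValueError(f"line {line_no}: expected 5 columns, got {len(fields)}")
        values = [float(x) for x in fields[1:]]
        records[fields[0]] = dict(zip(QTL_COUNTS, values))
    return records

# Made-up sample standing in for one replicate file.
sample = io.StringIO("1001 0.5 -1.2 0.3 2.0\n1002 1.1 0.0 -0.7 0.4\n")
phenos = read_phenotypes(sample)
print(phenos["1001"][30])  # phenotype of animal 1001 for the 30-QTL trait
```

Keying the inner dictionary by QTL count keeps the scenario explicit when the same animal is looked up across the 100 replicates and three genetic correlations.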