Subspecies-specific BayesR posterior probabilities of inclusion for a composite population of tropically adapted beef heifers
Data files
Mar 27, 2024 version files 291.45 MB

README.md

warburton_control_bayesr.txt

warburton_subspecies_bayesr.txt
Abstract
Many of the world’s agriculturally important plant and animal populations consist of hybrids of subspecies. Cattle in tropical and subtropical regions, for example, originate from two subspecies, Bos taurus indicus (Bos indicus) and Bos taurus taurus (Bos taurus). Methods to derive the underlying genetic architecture for these two subspecies are essential to develop accurate genomic predictions in these hybrid populations. We propose a novel method to achieve this. First, we use haplotypes to assign SNP alleles to ancestral subspecies of origin in a multibreed and multisubspecies population. Then we use a BayesR framework to allow SNP alleles originating from the different subspecies to have different effects. Applying this method in a composite population of B. indicus and B. taurus hybrids, our results show that there are underlying genomic differences between the two subspecies, and that these differences are not identified in multibreed genomic evaluations that do not account for subspecies of origin effects. The method slightly improved the accuracy of genomic prediction. More significantly, by allocating SNP alleles to ancestral subspecies of origin, we were able to identify four SNP with high posterior probabilities of inclusion that have not been previously associated with cattle fertility and were close to genes associated with fertility in other species. These results show that haplotypes can be used to trace subspecies of origin through the genome of this hybrid population and, in conjunction with our novel Bayesian analysis, subspecies SNP allele allocation can be used to increase the accuracy of QTL association mapping in genetically diverse populations.
README: Subspecies-specific BayesR Posterior Probability of Inclusion
https://doi.org/10.5061/dryad.q2bvq83rk
The warburton_subspecies_bayesr dataset contains the BayesR posterior probabilities for subspecies-specific SNP from the subspecies BayesR analysis, which used a customised Xmatrix. This analysis was designed for tropical beef populations, which consist of two subspecies, Bos indicus and Bos taurus. Haplotypes are assigned to a subspecies of origin, and each SNP that falls within a haplotype window is assigned to that subspecies of origin. As there are two haplotypes per haplotype window, each SNP can be classified as homozygous Bos indicus (Bi), homozygous Bos taurus (Bt) or composite Bos indicus x Bos taurus (Bx), depending on the ancestral origins of both haplotypes within the haplotype window. The customised Xmatrix has the dimensions n_anim x 3n_snp, where n_anim is the number of animals and 3n_snp is three times the number of SNP on the marker array. This Xmatrix was used in a BayesR analysis to estimate the posterior probabilities of all subspecies-specific SNP simultaneously.
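To make the n_anim x 3n_snp shape concrete, the following Python sketch builds a toy matrix with three columns per SNP. It is an illustration only, not the authors' implementation (their Julia code is linked under Code/Software). It assumes that genotypes are coded as allele counts (0, 1 or 2), that each animal has one ancestry class (Bi, Bt or Bx) per SNP, that the genotype is written into the column for that class with the other two columns left at zero, and that the three columns for a SNP sit next to each other. The function name `build_xmatrix` and all values are hypothetical.

```python
# Column order within each SNP: an assumption chosen to mirror the file's
# naming (no suffix = Bi, "_A" = Bt, "_B" = Bx).
CLASSES = ("Bi", "Bt", "Bx")


def build_xmatrix(genotypes, ancestry):
    """Expand an n_anim x n_snp genotype table into n_anim x 3*n_snp.

    genotypes[i][j] is the allele count of animal i at SNP j (assumed coding).
    ancestry[i][j] is the ancestry class of animal i at SNP j, one of CLASSES.
    """
    n_anim = len(genotypes)
    n_snp = len(genotypes[0]) if n_anim else 0
    x = [[0] * (3 * n_snp) for _ in range(n_anim)]
    for i in range(n_anim):
        for j in range(n_snp):
            k = CLASSES.index(ancestry[i][j])
            # Only the column matching the ancestry class receives the genotype.
            x[i][3 * j + k] = genotypes[i][j]
    return x


# Toy example: 2 animals, 2 SNP (made-up values, not from the dataset).
genotypes = [[2, 1], [0, 1]]
ancestry = [["Bi", "Bx"], ["Bt", "Bt"]]
x = build_xmatrix(genotypes, ancestry)
for row in x:
    print(row)
```

Each row has 3 x 2 = 6 columns, matching the n_anim x 3n_snp description above.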
The posterior probability of inclusion can be calculated for each SNP in this file. This is the probability that the effect of the SNP is non-zero, i.e. the probability that the SNP has an effect on the trait. The posterior probability of inclusion can be calculated by adding posterior probability 2 (PIP2), posterior probability 3 (PIP3) and posterior probability 4 (PIP4) together for each SNP. Alternatively, it can be calculated as 1 - PIP1, where PIP1 is posterior probability 1.
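The two calculations can be sketched in Python as follows. The function name `posterior_inclusion` and the example values are hypothetical and are not taken from the dataset. The two results agree only when PIP1 to PIP4 sum to 1 for a SNP, which is assumed here.

```python
def posterior_inclusion(pip1, pip2, pip3, pip4):
    """Return the posterior probability of inclusion computed two ways.

    by_sum        = PIP2 + PIP3 + PIP4
    by_complement = 1 - PIP1
    """
    by_sum = pip2 + pip3 + pip4
    by_complement = 1.0 - pip1
    return by_sum, by_complement


# Made-up values for a single SNP.
by_sum, by_complement = posterior_inclusion(0.90, 0.06, 0.03, 0.01)
print(round(by_sum, 6), round(by_complement, 6))
```

Comparing the two values row by row is also a quick check that a file was read with the columns in the expected order.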
The warburton_control_bayesr dataset contains the posterior probabilities of the BovineHD markers from the control analysis, which does not account for subspecies of origin.
Description of the data and file structure
The warburton_subspecies_bayesr dataset contains the data from the subspecies BayesR analysis using the 728,785 SNP on the BovineHD array. The column headings are Subspecies_SNP, PIP1, PIP2, PIP3, PIP4 and beta. To use this file to replicate the results discussed in the article, the posterior probability of inclusion needs to be calculated for each SNP. This can be done as described above, for example as 1 - PIP1 for each SNP. There are three copies of each BovineHD array SNP in this file. A Subspecies_SNP name with no suffix is the Bos indicus subspecies SNP, a Subspecies_SNP name with the "_A" suffix is the Bos taurus subspecies SNP and a Subspecies_SNP name with the "_B" suffix is the Bos indicus x Bos taurus subspecies SNP. After calculating the posterior probability of inclusion for each Subspecies_SNP, the results can be graphed using software of choice.
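A minimal Python sketch of this workflow is given below. It assumes the file is whitespace-delimited text with a single header row using the headings listed above; the actual delimiter is not stated here and should be checked against the file. The function names and the three example rows are hypothetical. The suffix rule would misclassify any marker whose original array name already ends in "_A" or "_B", so it is worth checking the marker names before relying on it.

```python
def split_subspecies(name):
    """Split a Subspecies_SNP name into (base SNP name, subspecies class)."""
    if name.endswith("_A"):
        return name[:-2], "Bt"  # Bos taurus
    if name.endswith("_B"):
        return name[:-2], "Bx"  # Bos indicus x Bos taurus
    return name, "Bi"           # no suffix: Bos indicus


def read_subspecies_pips(lines):
    """Yield (snp, subspecies, pip) with pip computed as 1 - PIP1.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumes whitespace-delimited columns and a header row.
    """
    it = iter(lines)
    header = next(it).split()
    for line in it:
        if not line.strip():
            continue  # skip blank lines
        rec = dict(zip(header, line.split()))
        snp, subspecies = split_subspecies(rec["Subspecies_SNP"])
        yield snp, subspecies, 1.0 - float(rec["PIP1"])


# Made-up rows standing in for the real file.
example = [
    "Subspecies_SNP PIP1 PIP2 PIP3 PIP4 beta",
    "snp1 0.95 0.03 0.01 0.01 0.0001",
    "snp1_A 0.20 0.30 0.30 0.20 0.0150",
    "snp1_B 0.99 0.01 0.00 0.00 0.0000",
]
results = list(read_subspecies_pips(example))
for snp, subspecies, pip in results:
    print(snp, subspecies, round(pip, 4))
```

For the real data, `example` could be replaced by an open handle on warburton_subspecies_bayesr.txt, and the resulting tuples passed to any plotting tool.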
The warburton_control_bayesr dataset contains posterior probabilities of the 728,785 SNP on the BovineHD array. It has the headings SNP, PIP1, PIP2, PIP3, PIP4 and beta. As for the subspecies BayesR dataset, posterior probability of inclusion needs to be calculated for each SNP and may be graphed using software of choice.
Code/Software
The Xmatrix used in the subspecies BayesR analysis was created using Julia code, which is publicly available on GitHub:
https://github.com/cwarburton85/Subspecies_Xmatrix
Methods
This dataset contains the posterior probabilities of inclusion from the subspecies-specific BayesR analysis, which used the subspecies-specific Xmatrix built from the BovineHD array. Each BovineHD array SNP is repeated three times in the file. The subspecies SNP can be differentiated by SNP name: the SNP with no suffix is the Bos indicus subspecies SNP (Bi), the "_A" SNP is the Bos taurus subspecies SNP (Bt) and the "_B" SNP is the Bos indicus x Bos taurus subspecies SNP (Bx).