SNP data set of the Peruvian Creole cattle from southern Peru
Data files
May 27, 2024 version files 7.80 MB
- Peruvian_Creole_cattle_Figueroa_et_al.vcf (7.80 MB)
- README.md (1.90 KB)
Abstract
The Peruvian creole cattle (PCC) originated after the introduction of cattle into the American continent about five centuries ago, and is an important source of meat, milk, and power for agriculture in the Peruvian highlands, as well as part of cultural traditions. However, little is known about the genetics of the PCC. To determine the genetic diversity and structure of the PCC, 69 DNA samples from four southern regions of Peru (Apurimac, Ayacucho, Cusco and Puno) were genotyped using a 100K SNP bead chip. After quality control and LD pruning, 24,200 SNPs were retained for further analysis. Animals were grouped into two clusters (C1: Apurimac, Ayacucho and Cusco; C2: Puno) using principal component analysis and a UPGMA dendrogram. STRUCTURE analysis showed that individuals from Puno grouped in one cluster. Expected heterozygosity ranged from 0.399 (Apurimac) to 0.418 (Ayacucho). Negative inbreeding coefficient (FIS) values were also found for PCC from Puno and Ayacucho, possibly due to admixture. The lowest FST (0.005) was estimated between the Ayacucho and Cusco cattle populations, and the highest FST (0.028) between the Puno and Apurimac cattle populations. AMOVA showed small genetic variation among populations (3.65%) and higher variation within populations. To the best of our knowledge, this is the first study employing SNP markers in PCC, and we hope it helps to pave the way towards its genetic improvement and the urgent sustainable management of creole animals in Peru.
We submitted our raw data (“Peruvian_Creole_cattle_Figueroa_et_al.vcf”).
Hair samples from 74 individuals were genotyped using the Illumina GGP Bovine 100K SNP array by a commercial genotyping service provider, following the manufacturer's standard procedures. SNP quality control was performed using PLINK v1.9 (Purcell et al., 2007). Only SNPs located on autosomes and with known genomic positions were considered for analysis. Individuals with more than 10% missing genotypes, SNPs with a missing rate higher than 10%, and SNPs with a minor allele frequency (MAF) lower than 0.05 were excluded. Additionally, for the diversity and population structure analyses, linkage disequilibrium pruning was done using the PLINK parameter --indep 50 5 2.
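The filtering steps described above can be sketched as PLINK 1.9 command lines. This is only an illustration assembled from the thresholds stated in the text, not the exact commands that were run. The helper name `build_qc_commands`, the output prefixes, the use of `--cow` to select the cattle chromosome set, and the split into two PLINK runs are all assumptions. The function only builds the argument lists; it does not execute PLINK.

```python
import shlex


def build_qc_commands(vcf_path, out_prefix="pcc"):
    """Return two PLINK 1.9 argument lists: QC filtering, then LD pruning.

    Hypothetical helper: prefixes and the two-step layout are assumptions.
    """
    qc_prefix = f"{out_prefix}_qc"
    qc = [
        "plink", "--vcf", vcf_path,
        "--cow",           # assumption: cattle chromosome set (29 autosomes)
        "--autosome",      # keep autosomal SNPs; drops sex-linked and unplaced ones
        "--mind", "0.1",   # exclude individuals with >10% missing genotypes
        "--geno", "0.1",   # exclude SNPs with >10% missing calls
        "--maf", "0.05",   # exclude SNPs with MAF below 0.05
        "--make-bed", "--out", qc_prefix,
    ]
    prune = [
        "plink", "--bfile", qc_prefix, "--cow",
        "--indep", "50", "5", "2",  # window of 50 SNPs, step of 5, VIF threshold 2
        "--out", f"{out_prefix}_ld",
    ]
    return qc, prune


# Print the commands so they can be reviewed before being run by hand.
for cmd in build_qc_commands("Peruvian_Creole_cattle_Figueroa_et_al.vcf"):
    print(shlex.join(cmd))
```

With `--indep`, PLINK writes the retained SNP IDs to a `.prune.in` file under the given output prefix, which can then be passed to `--extract` to build the pruned data set.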
Quality control started from an initial set of 95,256 SNPs and 74 Creole individuals. Removing SNPs assigned to sex chromosomes and those without genomic locations left 90,349 SNPs. Then, 5,732 SNPs were removed due to low call rate or low minor allele frequency (MAF), and five Creole individuals were removed due to low genotyping rate. As a last step, linkage disequilibrium pruning was performed. The final set comprised 24,200 SNPs and 69 Creole individuals from Cusco (n=17), Apurimac (n=18), Puno (n=23) and Ayacucho (n=11).
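The counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text. The intermediate values (SNPs removed by the position filter, SNPs entering LD pruning, SNPs removed by pruning) are derived here and assume the filters were applied in the order described; the variable names are illustrative.

```python
# Counts stated in the text.
initial_snps = 95_256
initial_animals = 74
after_position_filter = 90_349   # autosomal SNPs with known positions
removed_call_rate_maf = 5_732    # SNPs failing call rate or MAF thresholds
removed_animals = 5              # individuals with low genotyping rate
final_snps = 24_200              # SNPs retained after LD pruning
region_counts = {"Cusco": 17, "Apurimac": 18, "Puno": 23, "Ayacucho": 11}

# Derived values (assumes the filters were applied sequentially).
removed_by_position = initial_snps - after_position_filter
before_pruning = after_position_filter - removed_call_rate_maf
removed_by_pruning = before_pruning - final_snps
final_animals = initial_animals - removed_animals

# The per-region sample sizes should add up to the retained individuals.
assert sum(region_counts.values()) == final_animals

print("SNPs removed by position filter:", removed_by_position)
print("SNPs entering LD pruning:", before_pruning)
print("SNPs removed by LD pruning:", removed_by_pruning)
print("Individuals retained:", final_animals)
```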
References:
- Sempéré, G., Moazami-Goudarzi, K., Eggen, A., Laloë, D., Gautier, M., & Flori, L. (2015). WIDDE: a Web-Interfaced next generation Database for genetic Diversity Exploration, with a first application in cattle. BMC Genomics 16, 940.
- Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A. R., Bender, D., Maller, J., Sklar, P., de Bakker, P. I. W., Daly, M. J., & Sham, P. C. (2007). PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575. doi:10.1086/519795.