
Population structure and genetic diversity of sheep breeds in the Kyrgyzstan


Deniskova, Tatiana et al. (2020), Population structure and genetic diversity of sheep breeds in the Kyrgyzstan, Dryad, Dataset,


Sheep are a major livestock species of Kyrgyzstan, a Central Asian country with predominantly mountainous terrain. The current gene pool of local sheep resources has formed under diverse climate conditions, from the era of the trading caravans of the Great Silk Road, through the Soviet period of large-scale livestock improvements and the deep crisis that followed at the end of the 20th century, up to the present. However, little is known about the genetic background and variability of the local sheep populations. Therefore, our aims were to characterize the population structure and genetic relations within the Kyrgyz sheep breeds and to study their genetic connections with global sheep breeds using SNP analysis. Samples of the Alai (n=31), Gissar (n=30), Kyrgyz coarse wool (n=13), Aykol (n=31), and Tien-Shan (n=24) breeds were genotyped with the OvineSNP50 BeadChip or the Ovine Infinium HD BeadChip (Illumina Inc., USA). The measure of inbreeding based on runs of homozygosity showed a minimum value in the Aykol breed (FROH = 0.034), while the maximum was found in the Alai breed (FROH = 0.071). Short ROH segments (ROH≤4Mb) were predominant in all breeds. Long ROH segments (ROH>16Mb) were absent in the Gissar breed. The Gissar and Aykol breeds had the highest effective population sizes estimated for five generations ago (Ne5 = 660 and 563, respectively), whereas the Alai and Kyrgyz coarse wool breeds displayed lower values (Ne5 = 176 and 128, respectively). The synthetic origin of the Aykol breed was clearly evidenced by all analyses applied. Based on the network and admixture analyses of the Kyrgyz and global sheep breeds, the Tien-Shan and the Russian semi-fine wool breeds demonstrated a common ancestry that is most likely due to a contribution of the Lincoln breed. The Gissar, Aykol and Kyrgyz coarse wool breeds showed a genetic background that predominates in sheep populations from Iran and China, whereas the Alai demonstrated a different ancestry type. The revealed admixture patterns probably resulted from exchange and trade during the era of the Great Silk Road, and they partly agree with historical and archeological findings.


Data collection

This study does not involve any endangered or protected species. The animal tissue samples were collected by trained personnel under strict veterinary rules. Sampling was performed in accordance with the ethical guidelines of the L.K. Ernst Federal Science Center for Animal Husbandry. The protocol was approved by the Commission on the Ethics of Animal Experiments of the L.K. Ernst Federal Science Center for Animal Husbandry.



Preparation of the genomic DNA

DNA was extracted from ear tissue samples using Nexttec columns (Nexttec Biotechnology GmbH, Germany) according to the manufacturer's instructions. DNA quality was checked by 1% agarose gel electrophoresis. The concentrations of the dsDNA solutions were measured with a Qubit 3.0 fluorimeter (Life Technologies, USA). The OD260/OD280 ratio of the DNA solutions was determined with a NanoDrop-2000 (Thermo Fisher Scientific, Wilmington, DE, USA).

SNP genotyping and quality control

SNP genotyping was performed using the OvineSNP50 BeadChip (Illumina, San Diego, CA, United States) or the Ovine Infinium HD BeadChip (Illumina, San Diego, CA, United States) (Kijas et al., 2014). Genotype quality control (QC) procedures were performed using PLINK v1.90 (Chang et al., 2015). To account for the accuracy and efficiency of SNP genotyping, valid genotypes for each SNP were determined by setting a cut-off of 0.5 for the GenCall (GC) and GenTrain (GT) scores (Fan et al., 2003). Samples that did not pass the quality criterion (missing genotype call rate above 0.1) were excluded from the analysis.

After merging the genotypic data from the 600K and 50K arrays, a total of 42 230 autosomal SNPs that overlapped between the two arrays were left in the analysis. SNPs with a call rate below 0.90, a minor allele frequency (MAF) lower than 0.05, or that were located on sex chromosomes were discarded.
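The SNP-level filters described above were applied in PLINK. As an illustration only, the same two thresholds can be sketched in Python. The function name filter_snps and the genotype coding (0/1/2 copies of one allele, NaN for a missing call) are assumptions of this sketch and are not taken from the data set.

```python
import numpy as np

def filter_snps(genotypes, min_call_rate=0.90, min_maf=0.05):
    """Return a boolean mask of SNPs (columns) that pass both filters.

    genotypes: animals x SNPs array coded 0/1/2, np.nan = missing call
    (this coding is an assumption of the sketch).
    """
    g = np.asarray(genotypes, dtype=float)
    n_called = (~np.isnan(g)).sum(axis=0)        # called genotypes per SNP
    call_rate = n_called / g.shape[0]
    with np.errstate(divide="ignore", invalid="ignore"):
        # Allele frequency over called genotypes only
        freq = np.nansum(g, axis=0) / (2 * n_called)
        maf = np.minimum(freq, 1 - freq)
        # Discard SNPs with call rate < 0.90 or MAF < 0.05
        return (call_rate >= min_call_rate) & (maf >= min_maf)

# Toy example: 4 animals x 4 SNPs
geno = np.array([
    [0, 1, 2, np.nan],
    [0, 1, 1, np.nan],
    [0, 2, 0, 1],
    [0, 0, 1, 1],
])
print(filter_snps(geno).tolist())  # [False, True, True, False]
```

In the toy matrix the first SNP is monomorphic (MAF = 0) and the last SNP has a call rate of 0.5, so both are discarded.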

Genetic diversity

The observed heterozygosity (Ho), unbiased expected heterozygosity (HE(u)) (Nei, 1978), rarefied allelic richness (AR) and the inbreeding coefficient (FIS) based on the unbiased expected heterozygosity were calculated using the R package “diveRsity” (Keenan et al., 2013).
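For biallelic SNPs these statistics reduce to simple per-locus formulas. The sketch below shows one way to compute Ho, the unbiased expected heterozygosity with the 2n/(2n-1) small-sample correction, and FIS as 1 - Ho/HE(u). The function name, the genotype coding (0/1/2, NaN for missing) and the choice to average over loci before taking the ratio are assumptions of this sketch; the diveRsity package may differ in detail.

```python
import numpy as np

def heterozygosity(genotypes):
    """Return (Ho, unbiased He, FIS) averaged over SNPs for one breed.

    genotypes: animals x SNPs array coded 0/1/2, np.nan = missing call
    (an assumed coding). Every SNP needs at least one called animal.
    """
    g = np.asarray(genotypes, dtype=float)
    n = (~np.isnan(g)).sum(axis=0)               # called animals per SNP
    ho = (g == 1).sum(axis=0) / n                # share of heterozygotes
    p = np.nansum(g, axis=0) / (2 * n)           # frequency of the counted allele
    he_u = 2 * n / (2 * n - 1) * 2 * p * (1 - p) # small-sample correction
    mean_ho, mean_he = ho.mean(), he_u.mean()
    return mean_ho, mean_he, 1 - mean_ho / mean_he

# Toy example: 4 animals x 2 SNPs
ho, he, fis = heterozygosity([[0, 1], [1, 1], [1, 1], [2, 1]])
print(round(ho, 3), round(he, 3), round(fis, 3))
```

A negative FIS, as in this toy example, indicates an excess of heterozygotes relative to the expectation.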

Runs of homozygosity (ROH) and genomic inbreeding (FROH)

A window-free method for consecutive SNP-based detection (the consecutive runs method (Marras et al., 2015)) implemented in the R package “detectRUNS” (Biscarini et al., 2018) was used. Up to one SNP with a missing genotype and up to one heterozygous genotype were allowed in a run. The minimum ROH length was 1000 kb.
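The idea of a consecutive scan can be illustrated with a simplified Python sketch for one animal and one chromosome. It is not the detectRUNS implementation: the function name, the genotype coding (0 and 2 homozygous, 1 heterozygous, None missing) and the rule that a run starts and ends on a homozygous SNP are assumptions of this sketch, and any further detectRUNS settings (which are not reported here) are ignored.

```python
def consecutive_runs(genotypes, positions, max_het=1, max_missing=1,
                     min_length_bp=1_000_000):
    """Return (start_bp, end_bp) tuples of runs of homozygosity.

    genotypes: calls ordered by position, 0/2 = homozygous,
    1 = heterozygous, None = missing (assumed coding).
    positions: base-pair position of each SNP.
    """
    runs = []
    start = end = None       # first / last homozygous SNP of the open run
    n_het = n_miss = 0

    def close_run():
        # Keep the open run only if it spans at least min_length_bp
        if start is not None and positions[end] - positions[start] >= min_length_bp:
            runs.append((positions[start], positions[end]))

    for i, g in enumerate(genotypes):
        if g in (0, 2):                      # homozygous call extends the run
            if start is None:
                start, n_het, n_miss = i, 0, 0
            end = i
            continue
        if start is None:
            continue                         # nothing open to interrupt
        if g == 1:
            n_het += 1
        else:
            n_miss += 1
        if n_het > max_het or n_miss > max_missing:
            close_run()                      # tolerance exceeded: run ends
            start = end = None
    close_run()
    return runs

# Toy example: 10 SNPs spaced 300 kb apart
pos = [i * 300_000 for i in range(10)]
print(consecutive_runs([0, 0, 1, 2, 2, 1, 0, 0, 0, 1], pos))  # [(0, 1200000)]
```

In the toy example the first run tolerates one heterozygous call and spans 1.2 Mb; the second run spans only 0.6 Mb and is dropped by the 1000 kb minimum length.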

The genomic inbreeding coefficient based on ROH (FROH) was computed as the sum of the length of all ROH per animal as a proportion of the total autosomal SNP coverage (2.44 Gb).
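As a worked illustration of this ratio, the following Python sketch sums the ROH lengths of one animal and divides by the 2.44 Gb coverage given above. The function name and the representation of ROH as (start, end) base-pair tuples are assumptions of the sketch, and the example segments are invented for the arithmetic only.

```python
AUTOSOME_LENGTH_BP = 2.44e9   # total autosomal SNP coverage stated in the text

def froh(runs, genome_length_bp=AUTOSOME_LENGTH_BP):
    """FROH = summed ROH length / autosomal length covered by SNPs.

    runs: iterable of (start_bp, end_bp) tuples for one animal.
    """
    return sum(end - start for start, end in runs) / genome_length_bp

# Invented segments of 1.2 Mb, 5 Mb and 18.2 Mb (24.4 Mb in total)
example = [(0, 1_200_000), (10_000_000, 15_000_000), (40_000_000, 58_200_000)]
print(round(froh(example), 4))  # 0.01
```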

Effective population sizes

Trends of effective population size (Ne) were estimated from linkage disequilibrium (LD) as implemented in SNeP (Barbato et al., 2015).
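The principle behind LD-based estimates can be sketched as follows. The sketch assumes the widely used approximation E[r²] ≈ 1/(1 + 4·Ne·c), where c is the recombination distance in Morgans, and the convention that markers at distance c reflect Ne about 1/(2c) generations ago. The function name, the default of 1 cM per Mb and the absence of sample-size or mutation corrections are assumptions of this sketch; the SNeP settings used for this data set are not reported here and may differ.

```python
def ne_from_ld(mean_r2, distance_bp, cm_per_mb=1.0):
    """Return (Ne, generations_ago) from the mean r^2 of one distance bin.

    mean_r2: average r^2 between SNP pairs in the bin.
    distance_bp: physical distance of the bin in base pairs.
    cm_per_mb: assumed recombination rate (1 cM/Mb is a rough default).
    """
    c = distance_bp / 1e6 * cm_per_mb / 100.0   # distance in Morgans
    ne = (1.0 / mean_r2 - 1.0) / (4.0 * c)      # solve E[r^2] = 1 / (1 + 4 Ne c)
    generations_ago = 1.0 / (2.0 * c)
    return ne, generations_ago

# Invented values: mean r^2 of 0.01 between SNPs about 10 Mb apart
ne, t = ne_from_ld(0.01, 10_000_000)
print(round(ne, 1), round(t, 1))  # 247.5 5.0
```

Under these assumptions, widely spaced marker pairs inform recent generations, while closely spaced pairs inform the more distant past.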

Genetic relationships and population structure

Pairwise genetic differentiation (fixation index, FST) (Weir and Cockerham, 1984) between all pairs of sheep breeds was calculated using the R package “diveRsity” (Keenan et al., 2013). Neighbor-Net graphs based on the matrix of pairwise FST values were constructed using SplitsTree 4.14.5 software (Huson and Bryant, 2006).

A multidimensional scaling analysis (MDS) based on pairwise identical-by-state (IBS) distances and Principal Component Analysis (PCA) were performed with PLINK v1.90 and visualized with the R package “ggplot2” (Wickham, 2009).
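To make the MDS step concrete, the sketch below computes a pairwise IBS distance (the share of alleles not shared between two animals) and applies classical multidimensional scaling by eigendecomposition. It is an illustration rather than the PLINK implementation; the function names and the genotype coding (0/1/2, NaN for missing) are assumptions of this sketch.

```python
import numpy as np

def ibs_distance(genotypes):
    """Pairwise 1 - IBS distance between animals (rows).

    genotypes: animals x SNPs array coded 0/1/2, np.nan = missing call
    (assumed coding). Each pair must share at least one called SNP.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = ~np.isnan(g[i]) & ~np.isnan(g[j])
            # Each SNP contributes 0, 1 or 2 allele differences
            diff = np.abs(g[i, both] - g[j, both]).sum()
            d[i, j] = d[j, i] = diff / (2 * both.sum())
    return d

def classical_mds(dist, k=2):
    """Classical (metric) MDS: coordinates from a distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering   # double-centred matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]               # k largest eigenvalues
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))

# Toy example: animals 0 and 1 are identical, animal 2 shares no alleles
geno = [[0, 0, 2, 2], [0, 0, 2, 2], [2, 2, 0, 0]]
coords = classical_mds(ibs_distance(geno))
print(coords.shape)  # (3, 2)
```

The resulting coordinates can then be plotted per breed, as was done here with ggplot2.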

Genetic admixture was inferred using Admixture v1.3 software (Alexander et al., 2009) and plotted with the R package “pophelper” (Francis, 2017).

Inference of population splits and mixtures was performed using the TreeMix program (Pickrell and Pritchard, 2012).

R version 3.3.2 was used to create the input files (R Core Team, 2018).

Usage Notes

The uploaded data include a data set of 129 animals representing five sheep breeds of Kyrgyzstan: Alai (n=31), Aykol (n=31), Gissar (n=30), Kyrgyz coarse wool (n=13), and Tien-Shan (n=24). The SNP array data .bed, .bim and .map files include a total of 42 230 autosomal SNPs that overlapped between the 600K and 50K Illumina arrays.


Russian Scientific Foundation (RSF), Award: 19-16-00070

Ministry of Science and Higher Education of Russia, Award: 0445-2019-0026 (АААА-А18-118021590138-1)