Single nucleotide polymorphism (SNP) genotypes of Cashmere goat (Capra hircus) populations from Mongolia
Data files
Sep 05, 2024 version files (165.26 MB):
- Mongolian_Goat_Axiom_FinalReport_SNP_Genotype_Data_02_05_2022.txt
- README.md
Abstract
README: Single nucleotide polymorphism (SNP) genotypes of Cashmere goat (Capra hircus) populations from Mongolia
Description of the data and file structure
The descriptions of the columns are as follows:
- probeset_id: The Affymetrix unique identifier for the set of probes used to detect a particular Single Nucleotide Polymorphism (SNP).
- Columns starting with "MNG_GAU": Individual identifiers of the genotyped goats; each such column holds the genotypes of one animal.
- Affy_SNP_ID: The Affymetrix unique identifier for the set of probes used to detect a particular SNP.
- Chr_id: The chromosome on which the SNP is located.
- Start: The nucleotide base start position where the SNP is found. The genomic coordinates given are in relation to the current genome version and may shift as subsequent genome builds are released.
- Stop: The nucleotide base stop position where the SNP is found. The genomic coordinates given are in relation to the current genome version and may shift as subsequent genome builds are released.
- Strand: Genomic strand that the SNP resides on.
- dbSNP_RS_ID: The dbSNP ID that corresponds to this probe set or SNP.
- Flank: The nucleotide sequence surrounding the SNP. This is a 33-mer sequence with 16 nucleotides on either side of the SNP position. The alleles at the SNP position are given in brackets.
- Allele_A: Allele A following the naming convention (see details below).
- Allele_B: Allele B following the naming convention (see details below).
- Associated_Gene: Associations (if any) with human genes, determined by comparing the genomic locations of the SNPs to genomic alignments of human mRNA sequences.
- Ordered_Alleles: A list of alleles alphabetically ordered by abstract allele code.
- affy_snp_id: The Affymetrix unique identifier for the set of probes used to detect a particular SNP.
- CR: Call rate (CR) is the percentage of samples with a genotype call other than "No Call" for the SNP.
- FLD: Fisher's Linear Discriminant (FLD) is a measure of the cluster quality of a probeset.
- HomFLD: HomFLD is a version of FLD computed for the homozygous genotype clusters. HomFLD is undefined for probesets without two homozygous clusters.
- HetSO: Heterozygous Strength Offset measures how far the heterozygous cluster center sits above the homozygous cluster centers in the Size dimension (Y position).
- HomRO: Homozygote Ratio Offset is the distance to zero in the Contrast dimension (X position) from the center of the homozygous cluster that is closest to zero.
- Nclus: The number of genotype clusters.
- n_AA: The number of AA calls.
- n_AB: The number of AB calls.
- n_BB: The number of BB calls.
- n_NC: The number of NoCall calls, including NoCall_1 (haploid).
- hemizygous: Hemizygous flag is 1 if the probeset measures chromosome Y or mitochondrial DNA, indicating that diploid genotypes are not possible. Otherwise the flag is 0.
- gender_metrics: Also known as "sex metrics". The metrics are calculated separately by sex depending on the chromosome, and the values displayed are restricted to the sex given in the sex_metrics column. Chromosomes MT and CP: all samples, no splitting by sex. Chromosome X: all metrics on females, some metrics on males. Chromosome Y: all metrics on males, a small number of metrics on females.
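- ConversionType: Probeset classification (conversion type) assigned by Axiom Analysis Suite, e.g. PolyHighResolution, NoMinorHom, MonoHighResolution or OTV.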
- BestProbeset: BestProbeset flag is available when multiple probesets are mapped to the same SNP (affy_snp_id) by a ps2snp file. A probeset is selected based on the priority order of the conversion types. BestProbeset flag is 1 when it is the best or only probeset for a SNP. Otherwise the flag is 0.
- BestandRecommended: BestandRecommended flag is 1 if BestProbeset is 1 and the ConversionType belongs to the Recommended set of conversion types. Otherwise the flag is 0.
- HomHet: HomHet flag is 1 if, when two diploid genotype clusters are present, one cluster is homozygous and the other is heterozygous. Otherwise the flag is 0.
- MinorAlleleFrequency: The proportion of the less frequent allele.
- H.W.p-Value: The p-value of the test for Hardy-Weinberg equilibrium.
- Call Modified: Call Modified flag is True if any calls for this probeset have been changed since the batch results were first created; otherwise the flag is False.
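As a rough illustration of how the count and summary columns above relate to the genotype calls, the Python sketch below recomputes n_AA, n_AB, n_BB, n_NC, CR and MinorAlleleFrequency for a single SNP from its nucleotide genotypes. It assumes single-base alleles, genotypes written either as "AG" or "A/G" (the exact format in the file is an assumption), "---" for missing calls, and the Allele_A/Allele_B columns for mapping bases to the abstract codes. The Hardy-Weinberg p-value uses a simple 1-degree-of-freedom chi-square test, which is not necessarily the test Axiom Analysis Suite applies, so it may differ from the H.W.p-Value column. The function name `snp_summary` and the key `hwe_p_chi2` are hypothetical.

```python
import math
from collections import Counter


def snp_summary(genotypes, allele_a, allele_b):
    """Sketch: recompute per-SNP summary metrics from nucleotide genotype calls.

    Assumes single-base alleles and "---" for missing calls. The HWE p-value is a
    chi-square approximation (1 df), not necessarily Axiom's own method.
    """
    counts = Counter()
    for g in genotypes:
        if g == "---":
            counts["NC"] += 1
            continue
        bases = g.replace("/", "")  # accept both "AG" and "A/G" (assumed formats)
        if len(bases) != 2 or any(b not in (allele_a, allele_b) for b in bases):
            raise ValueError(f"unexpected genotype {g!r}")
        # Number of Allele_B copies: 0 -> AA, 1 -> AB, 2 -> BB
        counts[("AA", "AB", "BB")[bases.count(allele_b)]] += 1

    n_aa, n_ab, n_bb, n_nc = counts["AA"], counts["AB"], counts["BB"], counts["NC"]
    n_called = n_aa + n_ab + n_bb
    total = n_called + n_nc
    summary = {
        "n_AA": n_aa, "n_AB": n_ab, "n_BB": n_bb, "n_NC": n_nc,
        # Call rate as a percentage of all samples
        "CR": 100.0 * n_called / total if total else None,
        "MinorAlleleFrequency": None,
        "hwe_p_chi2": None,
    }
    if n_called == 0:
        return summary

    q = (2 * n_bb + n_ab) / (2 * n_called)  # frequency of Allele_B among called samples
    p = 1.0 - q
    summary["MinorAlleleFrequency"] = min(p, q)

    if 0.0 < q < 1.0:  # the HWE test is undefined for monomorphic SNPs
        expected = (n_called * p * p, 2 * n_called * p * q, n_called * q * q)
        observed = (n_aa, n_ab, n_bb)
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
        summary["hwe_p_chi2"] = math.erfc(math.sqrt(chi2 / 2))
    return summary
```

For example, `snp_summary(["A/A", "A/G", "G/G", "---"], "A", "G")` gives one call in each genotype class plus one no call, a CR of 75.0 and a MinorAlleleFrequency of 0.5.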
A-B SNP naming convention
The following naming convention is used to assign allele nucleotide bases to the "Abstract" allele codes "A" and "B":
- SNPs are fixed on the forward strand of the design-time reference genome.
- For AT or CG SNPs (SNP alleles are A/T or C/G), the alleles are named in alphabetical order, so A and C, respectively, are the "A" alleles;
- For non-AT and non-CG SNPs, allele A is A or T, and allele B is C or G;
- For indels, allele A is "-" (the deletion) and allele B is the insertion;
- For multi-base alleles, the alleles are named in alphabetical order (for [AGT/TTA], AGT would be "Allele A"; for [GGT/TTA], GGT would be "Allele A").
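The rules above can be expressed as a small Python function, sketched below. It assumes the two alleles are already given on the forward strand of the design-time reference, as the convention requires, and that the Flank column writes the allele pair as "[X/Y]". Both `parse_flank` and `assign_ab` are hypothetical helpers, not part of Axiom Analysis Suite.

```python
import re


def parse_flank(flank):
    """Extract the two alleles from a Flank string such as 'ACG...[A/G]...TTC'.

    Assumes the bracketed alleles are separated by '/', as in the README examples.
    """
    match = re.search(r"\[([^\[\]/]+)/([^\[\]/]+)\]", flank)
    if match is None:
        raise ValueError(f"no [X/Y] allele pair found in {flank!r}")
    return match.group(1), match.group(2)


def assign_ab(allele1, allele2):
    """Return (Allele_A, Allele_B) following the A-B naming convention."""
    if allele1 == allele2:
        raise ValueError("alleles must differ")
    pair = {allele1, allele2}
    # Indels: the deletion "-" is Allele A, the insertion is Allele B
    if "-" in pair:
        return "-", (pair - {"-"}).pop()
    # Multi-base alleles: alphabetical order
    if len(allele1) > 1 or len(allele2) > 1:
        return tuple(sorted(pair))
    if not pair <= set("ACGT"):
        raise ValueError(f"unexpected alleles {allele1}/{allele2}")
    # A/T and C/G SNPs: alphabetical order
    if pair in ({"A", "T"}, {"C", "G"}):
        return tuple(sorted(pair))
    # Remaining SNPs: A or T is Allele A, C or G is Allele B
    a = next(x for x in pair if x in {"A", "T"})
    b = next(x for x in pair if x in {"C", "G"})
    return a, b
```

For instance, `assign_ab(*parse_flank("ACGTACGTACGTACGT[G/T]TTGCAACGTTGCAACG"))` returns `("T", "G")`, because T takes precedence as Allele A for a non-AT, non-CG SNP.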
More information is available in the "Probeset summary table definitions" chapter of the Axiom Analysis Suite User Guide.
Code/Software
The data was prepared using the Axiom Analysis Suite.
Methods
The data set contains raw Affymetrix (Axiom Goat 60K SNP array) single nucleotide polymorphism (SNP) genotypes, used for the genetic characterization of 14 cashmere goat populations in Mongolia. It comprises genotypes at 41,007 SNPs for 1,256 goats in nucleotide (ACGT) coding. Missing genotypes are denoted as "---".
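Many downstream analyses expect genotypes as allele dosages rather than nucleotide strings. The Python sketch below converts the "MNG_GAU" sample columns of one row into counts of Allele_B (0, 1 or 2), with `None` for "---". It assumes single-base alleles, genotypes written as "AG" or "A/G", and, in `read_rows`, a tab-delimited file whose header lines (if any) start with "#"; these format details are assumptions, not documented here. The function names and the sample IDs in the usage note are hypothetical.

```python
import csv


def to_dosage(genotype, allele_a, allele_b):
    """Count copies of Allele_B in one nucleotide genotype; None for missing '---'."""
    if genotype == "---":
        return None
    bases = genotype.replace("/", "")  # accept both "AG" and "A/G" (assumed formats)
    if len(bases) != 2 or any(b not in (allele_a, allele_b) for b in bases):
        raise ValueError(f"unexpected genotype {genotype!r} for {allele_a}/{allele_b}")
    return bases.count(allele_b)


def row_to_dosages(row, sample_prefix="MNG_GAU"):
    """Map each sample column of one SNP row to its Allele_B dosage."""
    a, b = row["Allele_A"], row["Allele_B"]
    return {col: to_dosage(value, a, b)
            for col, value in row.items() if col.startswith(sample_prefix)}


def read_rows(path):
    """Yield rows as dicts, assuming tab-delimited text with '#'-prefixed header lines."""
    with open(path, newline="") as fh:
        lines = (line for line in fh if not line.startswith("#"))
        yield from csv.DictReader(lines, delimiter="\t")
```

With a made-up row such as `{"Allele_A": "A", "Allele_B": "G", "MNG_GAU_001": "AG", "MNG_GAU_002": "---"}`, `row_to_dosages` returns `{"MNG_GAU_001": 1, "MNG_GAU_002": None}`. Coding dosage as copies of Allele_B keeps the result tied to the A-B convention rather than to whichever allele happens to be rarer in this sample.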
The following goat populations of Mongolia are represented:
Breeds
Altai Ulaan (ATU), Ulgiin Ulaan (ULU), Zavkhan Buural (ZBL), Erchim Khar (ERK), Zalaa Jinst Edren (ZJE), Bayandelger Ulaan (BDU), Govi Gurvan Saikhan (GGS) and Uuliin Bor (UBR); lines: Galshar Ulaan (GLU) and Bumbugur Ulaan (BUU).
Local populations
Mungun Sort Khar (MSK), Khuvchiin Ulaan (KHU), Tsagaan Ovoo Khar (TSK) and Teeliin Ulaan (TEU).