Data from: Improving the resolution of canine genome-wide association studies using genotype imputation: a study of two breeds
Data files
May 17, 2021 version (60.72 MB)
- AHT_WGS_Array_Markers_Extracted.7z
- Border_Collie_Axiom_Dataset.7z
- Italian_Spinone_Axiom_Dataset.7z
Abstract
Genotype imputation using a reference panel that combines high-density array data and publicly available whole genome sequence consortium variant data is potentially a cost-effective method to increase the density of extant lower-density array datasets. In this study, three datasets (two Border Collie; one Italian Spinone) generated using a legacy array (Illumina CanineHD, 173,662 SNPs) were utilised to assess the feasibility and accuracy of this approach and to gather additional evidence for the efficacy of canine genotype imputation. The cosmopolitan reference panels used to impute genotypes comprised dogs of 158 breeds, mixed breed dogs, wolves, and Chinese indigenous dogs, as well as breed-specific individuals genotyped using the Axiom Canine HD array. The two Border Collie reference panels each comprised 808 individuals, including 79 Border Collies, and contained 426,326 and 426,332 SNPs, respectively; the Italian Spinone reference panel comprised 807 individuals, including 38 Italian Spinoni, and contained 476,313 SNPs. Imputation accuracy was high: the lowest accuracy was observed for one of the Border Collie datasets (mean R² = 0.94) and the highest for the Italian Spinone dataset (mean R² = 0.97). These findings demonstrate that imputing a legacy array study set using a reference panel comprising both breed-specific array data and multi-breed variant data derived from whole genomes is effective and accurate. The process of canine genotype imputation, using the valuable and growing resource of publicly available canine genome variant datasets alongside breed-specific data, is described in detail to facilitate and encourage use of this technique in canine genetics.
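Accuracy above is reported as mean R². The exact statistic used in the study is not stated in this record; a common definition, assumed in the minimal sketch below, is the squared Pearson correlation between the true genotype (0, 1 or 2 copies of an allele) and the imputed allele dosage at each SNP, averaged over SNPs. The function names are hypothetical.

```python
import math
from statistics import mean


def squared_correlation(true_genotypes, imputed_dosages):
    """Squared Pearson correlation (r^2) between true allele counts and imputed dosages at one SNP."""
    if len(true_genotypes) != len(imputed_dosages) or len(true_genotypes) < 2:
        raise ValueError("need two equal-length sequences covering at least two dogs")
    mx, my = mean(true_genotypes), mean(imputed_dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, imputed_dosages))
    sxx = sum((x - mx) ** 2 for x in true_genotypes)
    syy = sum((y - my) ** 2 for y in imputed_dosages)
    if sxx == 0 or syy == 0:
        # r^2 is undefined when either vector has no variance (e.g. a monomorphic SNP)
        return math.nan
    return (sxy * sxy) / (sxx * syy)


def mean_r2(per_snp):
    """Average r^2 over SNPs; per_snp maps SNP ID -> (true genotypes, imputed dosages).

    SNPs whose r^2 is undefined are skipped rather than counted as zero.
    """
    values = [squared_correlation(t, d) for t, d in per_snp.values()]
    defined = [v for v in values if not math.isnan(v)]
    if not defined:
        raise ValueError("no SNP with a defined r^2")
    return mean(defined)
```

Skipping monomorphic SNPs is a design choice of this sketch; other tools may handle them differently, so results are only comparable when the same convention is used.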
Methods
This record contains three genotype datasets for use in a reference panel for genome-wide imputation of dog genotype data. Two are single-breed datasets (Border Collie and Italian Spinone) generated using the Axiom Canine HD array (>710k markers). The third contains genotype data for dogs of multiple breeds: genotypes for the markers on the Axiom array were extracted from whole genome sequence variant data and converted to ped and map format using VCFtools.
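The conversion step used VCFtools. As an illustration only (not VCFtools itself), the minimal sketch below shows what the VCF-to-ped/map conversion involves for a set of array marker positions: each GT call is translated into a pair of allele letters per dog, and each retained site becomes one map line. It assumes a VCF with a `#CHROM` header line and a `GT` FORMAT field; the function name, the `chrom:bp` marker-ID scheme, and the choice to treat non-diploid calls as missing are hypothetical choices for this sketch.

```python
def vcf_to_ped_map(vcf_lines, keep_positions=None):
    """Convert VCF text lines to PLINK-style ped rows and map lines.

    keep_positions: optional set of (chrom, bp) tuples, e.g. the array marker sites.
    Returns (ped_rows, map_lines). Sex and phenotype are written as unknown (0, -9).
    """
    samples, map_rows, genotypes = [], [], {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # dog IDs follow the FORMAT column
            genotypes = {s: [] for s in samples}
            continue
        chrom, pos, _vid, ref, alt = fields[:5]
        if keep_positions is not None and (chrom, int(pos)) not in keep_positions:
            continue  # not one of the requested marker sites
        alleles = [ref] + alt.split(",")
        gt_index = fields[8].split(":").index("GT")
        # Hypothetical marker-ID scheme: chromosome and position joined by ':'
        map_rows.append((chrom, f"{chrom}:{pos}", 0, int(pos)))
        for sample, call in zip(samples, fields[9:]):
            pair = call.split(":")[gt_index].replace("|", "/").split("/")
            if len(pair) != 2 or "." in pair:
                genotypes[sample].append(("0", "0"))  # missing or non-diploid call
            else:
                genotypes[sample].append((alleles[int(pair[0])], alleles[int(pair[1])]))
    ped_rows = []
    for sample in samples:
        cols = [sample, sample, "0", "0", "0", "-9"]  # FID IID PAT MAT SEX PHENO
        for a1, a2 in genotypes[sample]:
            cols += [a1, a2]
        ped_rows.append(" ".join(cols))
    map_lines = ["\t".join(map(str, row)) for row in map_rows]
    return ped_rows, map_lines
```

Passing the array marker positions as `keep_positions` mirrors the idea of extracting only markers present on the Axiom array from whole genome variant data.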
Usage notes
The three compressed folders were created using 7-Zip 19.00. Each contains genotype data in PLINK ped and map format. Italian_Spinone_Axiom_Dataset.7z and Border_Collie_Axiom_Dataset.7z contain data generated on the Axiom CanineHD array. The ped and map files in AHT_WGS_Array_Markers_Extracted.7z consist of data extracted from whole genome sequence VCF files for the markers present on the Axiom array; this folder also contains AHT_WGS_Array_Markers_Extracted_Breed_Info.txt, which states the breed of each dog in the dataset. The markers in AHT_WGS_Array_Markers_Extracted.map are identified by their chromosome and position, whereas those in Italian_Spinone_Axiom_Dataset.map and Border_Collie_Axiom_Dataset.map are identified by their Axiom Probe Set ID.
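Because the two kinds of map file use different marker identifiers, the IDs must be harmonised before the datasets can be merged into a single reference panel. The minimal sketch below rewrites the ID column (column 2) of PLINK map lines as `chrom:bp`, built from columns 1 and 4, and then finds markers shared between two maps. It assumes that both map files use the same reference assembly and chromosome coding, which should be checked before relying on it; the function names and the `chrom:bp` scheme are hypothetical.

```python
def rename_to_position_ids(map_lines):
    """Rewrite the variant-ID column of PLINK .map lines as 'chrom:bp'.

    Returns (new_lines, id_lookup), where id_lookup maps old ID -> new ID.
    Note: markers sharing a position receive the same new ID, so duplicates
    would still need to be resolved in a real pipeline.
    """
    new_lines, lookup = [], {}
    for line in map_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, old_id, cm, bp = fields[:4]
        new_id = f"{chrom}:{bp}"
        lookup[old_id] = new_id
        new_lines.append("\t".join([chrom, new_id, cm, bp]))
    return new_lines, lookup


def shared_ids(map_a, map_b):
    """Return the set of position-based IDs present in both renamed maps."""
    ids_a = {line.split("\t")[1] for line in map_a}
    ids_b = {line.split("\t")[1] for line in map_b}
    return ids_a & ids_b
```

The returned lookup could, for instance, be written out as an old-ID/new-ID table and applied to the Axiom datasets with PLINK 1.9's `--update-name` option.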