Five genetic variants explain over 70% of hair coat pheomelanin intensity variation in purebred and mixed breed domestic dogs - Supporting information
Slavney, Andrea et al. (2021), Five genetic variants explain over 70% of hair coat pheomelanin intensity variation in purebred and mixed breed domestic dogs - Supporting information, Dryad, Dataset, https://doi.org/10.5061/dryad.ttdz08kxt
In mammals, the pigment molecule pheomelanin confers red and yellow color to hair, and the intensity of this coloration is caused by variation in the amount of pheomelanin. Domestic dogs exhibit a wide range of pheomelanin intensity, ranging from the white coat of the Samoyed to the deep red coat of the Irish Setter. While several genetic variants have been associated with specific coat intensity phenotypes in certain dog breeds, they do not explain the majority of phenotypic variation across breeds. In order to gain further insight into the extent of multigenicity and epistatic interactions underlying coat pheomelanin intensity in dogs, we leveraged a large dataset obtained via a direct-to-consumer canine genetic testing service. This consisted of genome-wide single nucleotide polymorphism (SNP) genotype data and owner-provided photos for 3,057 pheomelanic mixed breed and purebred dogs from 63 breeds and varieties spanning the full range of canine coat pheomelanin intensity. We first performed a genome-wide association study (GWAS) on 2,149 of these dogs to search for additional genetic variants that underlie intensity variation. GWAS identified five loci significantly associated with intensity, of which two (CFA15 29.8 Mb and CFA20 55.8 Mb) replicate previous findings and three (CFA2 74.7 Mb, CFA18 12.9 Mb, CFA21 10.9 Mb) have not previously been reported. In order to assess the combined predictive power of these loci across dog breeds, we used our GWAS data set to fit a linear model, which explained over 70% of variation in coat pheomelanin intensity in an independent validation dataset of 908 dogs. These results introduce three novel pheomelanin intensity loci, and further demonstrate the multigenic nature of coat pheomelanin intensity determination in domestic dogs.
Cheek cell samples were collected by dog owners with buccal swabs, and DNA was extracted by Illumina, Inc. and genotyped at 221,188 biallelic autosomal and X chromosome markers on the Embark Veterinary custom Illumina CanineHD SNP array. Several filtering steps were performed to remove individuals and markers that did not meet study requirements (these steps are detailed in the linked manuscript). After these steps, the total genotyping rate was 99.9% across 204,571 markers in 3,057 dogs from 63 different breeds and varieties. Dogs were phenotyped from owner-provided photographs (see linked manuscript for details). Genotype and phenotype data were used as input to a series of genome-wide association studies to identify genomic regions associated with pheomelanin intensity variation, which were run using GEMMA v.0.98.
Descriptions of Supplementary Information files:
S1 Figure: Phenotyping validation on 350 randomly selected dogs: Strip plot showing original versus re-scored 6 point phenotypes for a random sample of 350 dogs from the discovery sample. The correlation coefficient (Pearson’s Rho) between the original and new phenotype scores is shown in the upper left hand corner of the plot.
S2 Figure: Manhattan plots for additional GWAS: A. 6pt phenotype, no covariates. B. Binary phenotype, with covariates. C. Binary phenotype, no covariates. The data used to generate these plots are available in S1 File and S3 File.
S3 Figure: Detailed view of regions surrounding the top CFA2 (A), 15 (B), 18 (C), 20 (D), and 21 (e) GWAS markers. Each panel shows the genomic region defined by the positions of the first upstream marker and last downstream marker with r2 ≥ 0.2 with the most significant GWAS marker on the chromosome (indicated by a red “x”). The top panel of each figure shows the GWAS -log10(p-value) and physical position of all GWAS markers in the region, colored by their r2 with the top GWAS marker. The bottom panel of each figure shows the canFam3.1 locations of known dog transcripts in this region. Transcription ranges are shown as dark blue rectangles, each of which is labelled with its Ensembl Genes (version 95)  transcript or gene name and its strand orientation (“>” = plus strand, “<” = minus strand).
S4 Figure: CFA15 top marker genotype correlates with sequencing coverage in known CNV: A. Boxplots overlaid with strip plots show the distribution of mean normalized depth of coverage across the CFA15 CNV characterized in Weich et al. 2020  (CFA15: 29,821,450-29,832,950 bp) for dogs with each possible BICF2G630433130 genotype. Each point represents a single dog. Kruskal Wallis test p-values are shown for each pair of genotypes. B. SRA run ID and sample name, breed, BICF2G630433130 genotype (coded as number of red-associated alleles), and CFA15 CNV mean normalized depth of coverage for all dogs shown in A.
S1 Table (TSV): Full breed ancestry breakdown in discovery and validation datasets.
S2 Table (XLXS): Summary of replication of top associations across GWAS using different phenotype encoding, with or without covariates. Columns show, for each top SNP in each GWAS, the marker ID, physical position, gene (if applicable), and the GWAS -log10(p-value), with the red-associated allele and the GWAS beta in parentheses.
S3 Table (TSV): Wild canid WGS SRA accession info and genotypes at top GWAS SNPs used to generate Fig3A.
S4 Table (XLXS): Data used for Fig 4. The “Fig4A” tab contains genetic dosage values at the top five GWAS SNPs for all dogs in the discovery dataset, as well as their six point phenotype values. The “Fig4B” tab includes the following for each top GWAS SNP: number of dogs genotyped (N), allele 1 frequency (p), allele 0 frequency (q), observed mean standardized 6pt phenotype values for each genotype class, the observed homozygote midpoint phenotype, additive effect (a), dominance effect (d), and the allelic substitution effect.
S5 Table (TSV): Data shown in S4 Figure B, including SRA run ID and sample name, breed, BICF2G630433130 genotype, and CFA15 CNV mean normalized depth of coverage for 23 previously published whole genome sequence datasets.
S1 File (TSV): Table with phenotype, phenotyping method (photo or breed average), E locus genotype, A locus genotype, K locus genotype, furnishings genotype, and CFA20:55855406 genotype for all dogs in the discovery and validation samples. These data were used to generate Fig3B.
S2 File (TSV): Phenotyping validation data used to generate S1 Fig.
S3 File (ZIP): Plink binary dataset containing genome-wide genotypes for all 3,057 dogs.
S4 File (ZIP): GWAS outputs for discovery sample: GEMMA .assoc and .log files (.txt) for all GWAS using quantitative and binary phenotype encoding.
S5 File (TSV): Model coefficients and performance metrics for all predictive models incorporating dominance and epistasis.
Embark Veterinary, Inc.
Embark Veterinary, Inc.