
Associated data files and scripts for 'Ancestry-inclusive dog genomics challenges popular breed stereotypes'

Cite this dataset

Morrill, Kathleen et al. (2022). Associated data files and scripts for 'Ancestry-inclusive dog genomics challenges popular breed stereotypes' [Dataset]. Dryad.


Behavioral genetics in dogs has focused on modern breeds, isolated subgroups <200 years old with distinctive physical, and, purportedly, behavioral characteristics. We interrogated breed stereotypes by surveying owners of 18,371 purebred and mixed-breed dogs, and densely genotyping (~45 million markers) a subset of 2,155 dogs. Most behavioral traits are heritable (h2>25%), and admixture patterns in mixed-breed dogs can reveal breed propensities. However, breed poorly predicts an individual purebred dog’s behavioral phenotype, explaining just 9% of variation. Using genome-wide association, we identify 11 loci significantly associated with howling and other behaviors, and show characteristic breed behaviors are genetically complex. Behavior-associated loci are not unusually differentiated in modern breeds, but breed propensities do align, albeit weakly, with ancestral function. We propose behaviors now perceived as characteristic of modern breeds likely derive from thousands of years of polygenic adaptation predating breed formation, with modern breeds distinguished primarily by aesthetic, not behavioral, traits.


Survey Data. Upon enrollment in Darwin’s Ark, owners were asked to provide consent for participation and information about their dog’s approximate birth date, sex and spay/neuter status, suspected or known breed(s), purebred registration, and/or photograph. We presented owners with 22 surveys composed of 10-12 questions each, which could be answered in any number or order. The majority offered response choices on a 5-point Likert scale, rating either agreement with the statements presented or the frequency of the behavior in question. Among these surveys, 123 questions were sourced from published and validated canine behavioral and health surveys, including the Dog Personality Questionnaire (DPQ / DPQL), the Canine Health-related Quality of Life Survey (CHQLS), the Dog Impulsivity Assessment Scale (DIAS), the Canine Cognitive Dysfunction Rating scale (CCDR), the Certified Dog Trainer Test (CDTT, International Association of Canine Professionals), and the Dog Obesity Risk and Appetite questionnaire (DORA). The remaining surveys were developed in collaboration with the International Association of Animal Behavior Consultants (IAABC) (MBT) or include original questions about personality (NEO Five-Factor Inventory), allergies, environment, and physical traits or morphology. We performed a data freeze on surveys obtained until November 15th, 2019.

Genetic Data. We sent owners saliva collection kits (DNA Genotek PG-100 saliva swabs) to sample their dogs. We preferentially sampled dogs with complete survey data as well as dogs from several underrepresented breeds to expand the breed calling panel to include the 100 most common breeds in the US. 159 samples (7.4% of 2,155 dogs included in the genetic data set) had sequencing funded by owner donations to the Darwin’s Ark Foundation. 1,715 dogs were sequenced at coverages of 0.5x to 1.1x depth on the Gencove sequencing platform. Sequencing reads were processed into imputed autosomal (chromosomes 1-38) variant calls through Gencove’s loimpute software and an imputation reference panel containing publicly available whole genome sequence data (mean coverage 22.9x (SD: 14.2x)) for 435 canids and representing 287 dogs of known pure breed ancestry, 6 dogs of unknown ancestry, 100 worldwide indigenous or village dogs, 36 wolves, and 6 other wild canids provided by Elaine Ostrander. 440 dogs underwent genotyping on the Axiom Canine Genotyping Array Set A & B for 1,268,920 variant call sites (1,267,416 SNPs and 1,504 indels), and had genotypes imputed using the same method. For each sample processed by low-pass sequencing or genotyping array with imputation, genotypes with genotype probability below 70% were removed. Then, all VCFs were merged by BCFtools and converted to a PLINK data set. SNPs below a minor allele frequency of 2% and missing in over 20% of individuals were filtered out. Only biallelic SNPs with extreme deviation from Hardy-Weinberg equilibrium, given p-values below 1e-20 in the exact test with mid-p adjustment and at observed/expected heterozygosity ratios under 0.25 or above 1.0, were excluded. After filtering data, 8,518,951 SNPs and 2,155 dogs remained with a total genotyping rate of 97.5%.
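The site-level filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used for this data set (which relied on BCFtools and PLINK); the class and function names are hypothetical, the Hardy-Weinberg p-value is taken as an input rather than computed with the mid-p exact test, and the sketch assumes that failing either the allele-frequency or the missingness threshold is enough to drop a SNP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteCounts:
    """Genotype counts for one biallelic SNP across all dogs (hypothetical container)."""
    hom_ref: int
    het: int
    hom_alt: int
    missing: int

    @property
    def called(self) -> int:
        # Number of dogs with a non-missing genotype at this site.
        return self.hom_ref + self.het + self.hom_alt

    @property
    def alt_freq(self) -> float:
        # Alternate allele frequency among called genotypes.
        return (self.het + 2 * self.hom_alt) / (2 * self.called) if self.called else 0.0


def minor_allele_frequency(c: SiteCounts) -> float:
    return min(c.alt_freq, 1.0 - c.alt_freq)


def missing_rate(c: SiteCounts) -> float:
    total = c.called + c.missing
    return c.missing / total if total else 1.0


def het_ratio(c: SiteCounts) -> Optional[float]:
    # Observed / expected heterozygote count under Hardy-Weinberg proportions.
    expected = 2 * c.alt_freq * (1 - c.alt_freq) * c.called
    return c.het / expected if expected > 0 else None


def keep_site(c: SiteCounts, hwe_p: float) -> bool:
    """Apply the thresholds from the text: MAF 2%, missingness 20%, HWE p < 1e-20."""
    if minor_allele_frequency(c) < 0.02:
        return False
    if missing_rate(c) > 0.20:
        return False
    ratio = het_ratio(c)
    extreme_ratio = ratio is not None and (ratio < 0.25 or ratio > 1.0)
    # Assumption: a site is excluded only when both the p-value and the ratio are extreme.
    if hwe_p < 1e-20 and extreme_ratio:
        return False
    return True
```

For example, `keep_site(SiteCounts(800, 180, 20, 0), hwe_p=0.5)` keeps a common, well-genotyped SNP, while a site with no heterozygotes and a p-value of 1e-30 is dropped.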
Owner-reported sexes were encoded in the sample information file, confirmed by relative X-chromosome coverage for sequencing data and the autosomal genotypes of X for genotyping data; in total, 1,084 males and 1,071 females. Variant IDs were assigned to include chromosome, position, reference allele, and alternate allele.
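As a small illustration of the variant ID scheme, the sketch below builds an ID from the four fields named above. The function name, the colon separator, and the field order are assumptions made for this example; the text does not specify the exact format used in the data files.

```python
def make_variant_id(chrom: int, pos: int, ref: str, alt: str, sep: str = ":") -> str:
    """Build a variant ID from chromosome, position, reference and alternate allele.

    The separator and ordering are illustrative assumptions, not the documented format.
    """
    # The imputed call set covers autosomes 1-38 only.
    if not 1 <= chrom <= 38:
        raise ValueError(f"expected an autosome in 1-38, got {chrom}")
    return sep.join([str(chrom), str(pos), ref.upper(), alt.upper()])


variant_id = make_variant_id(5, 1234567, "a", "G")
```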

MuttMix Data. In order to assess perceptions of breed ancestry in mixed-breed dogs by non-owner observers, we designed an online survey in which participants guess the three breeds detected at the largest percentage in each dog. Images and videos were collected from the owners of 30 mixed-breed dogs and one undeclared purebred dog with ancestry assignments. In addition to visual aids, participants were provided with each dog’s size relative to an average person and other physical descriptors such as coat texture or markings. Participants indicated whether they belong to the general public or are a dog professional and/or breeder. Participants were provided with 59 breed options to select as well as a “no choice” option for the third breed slot. Dogs were displayed in random order to participants. Participants were permitted to exit the survey at any time, return later, or leave the survey incomplete but could not skip dogs. The survey launched on April 16th, 2018 and closed on June 16th, 2018, collecting responses from 26,639 people over a two-month period. For genetic breed ancestry, any call below 5% was removed before analysis against survey data, and only breeds offered as survey options were examined.
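The ancestry filtering in the last sentence can be sketched as follows. This is an illustrative example only: the function name is hypothetical, ancestry is assumed to be stored as fractions between 0 and 1, and the breed names and values are made up for demonstration rather than taken from the data set.

```python
from typing import Dict, List, Set


def top_breeds_for_survey(
    ancestry: Dict[str, float],
    survey_options: Set[str],
    min_fraction: float = 0.05,
    n: int = 3,
) -> List[str]:
    """Return up to n breeds with the largest ancestry fractions.

    Calls below min_fraction are dropped, as are breeds that were not
    offered as survey options. Ties are broken alphabetically.
    """
    kept = {
        breed: fraction
        for breed, fraction in ancestry.items()
        if fraction >= min_fraction and breed in survey_options
    }
    return sorted(kept, key=lambda breed: (-kept[breed], breed))[:n]


# Made-up example dog; "Unlisted Breed" stands in for a breed not offered in the survey.
example_ancestry = {
    "Labrador Retriever": 0.42,
    "Poodle": 0.30,
    "Unlisted Breed": 0.12,
    "Boxer": 0.12,
    "Beagle": 0.04,
}
example_options = {"Labrador Retriever", "Poodle", "Boxer", "Beagle"}
answer_key = top_breeds_for_survey(example_ancestry, example_options)
```

Here the 4% Beagle call falls below the 5% threshold and the unlisted breed is ignored, so the answer key holds the three remaining breeds in descending order of ancestry.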


National Cancer Institute, Award: R01CA255319

National Cancer Institute, Award: R37CA218570

National Human Genome Research Institute, Award: R01HG008742

National Institute on Aging, Award: U19AG057377

National Institute of Mental Health, Award: R21MH109938

Office of the Director, Award: R24OD018250

National Cancer Institute, Award: F32CA247088

National Science Foundation, Award: EF-2022007

Broad Institute, Award: BroadIgnite

Broad Institute, Award: Next10