Genetic testing predicts appearance but not behavior in dogs
Data files
Aug 11, 2025 version files 19.49 GB
- darwins_dogs_genetic_set.zip (2.67 GB)
- darwins_dogs_gwas_input_files.zip (424.30 KB)
- darwins_dogs_gwas_output_files.zip (14.60 GB)
- darwins_dogs_heritability_output_files.zip (116.34 KB)
- README.md (6.12 KB)
- supplemental_information_gwas.zip (2.22 GB)
Abstract
Genetic tests for behavioral and personality traits in dogs are now being marketed to pet owners, but their predictive accuracy has not been validated. To evaluate the reliability of such tests, we analyzed data from Darwin’s Ark, a community science initiative that includes over 3,000 dogs with both genetic data and individual-level behavioral phenotypes. None of the candidate variants had significant associations or predictive power for behavioral traits as previously reported. However, we found strong associations with aesthetic traits that differentiate breeds, such as height, leg length, and ear shape. Our results suggest that earlier studies using breed-average phenotypes, rather than individually measured phenotypes, were confounded by population structure. Behavior in dogs is polygenic and complex, and thus cannot be accurately predicted using tests that consider only a few genetic variants. Furthermore, behavior in dogs is only moderately heritable, and environmental influences inherently limit the potential accuracy of genomic predictions. Developing meaningful, accurate genetic predictions for complex traits that can improve dog health and welfare will require very large cohorts of individually phenotyped dogs.
The following archives contain data files associated with the manuscript:
Kathryn A. Lord, Vista Sohrab, Kasia Bryc, Michelle E. White, Brittney Kenney, Kathleen Morrill Pirovich, Frances L. Chen, Elinor K. Karlsson, Genetic testing predicts appearance but not behavior in dogs. PNAS. 2025.
Data Files
Genetic Data:
darwins_dogs_genetic_set.zip contains a subset of the genetic dataset.
Darwin's Ark Cohort
A PLINK1 bfile set (.bed/.bim/.fam) with sample IDs encoded as dog IDs matching individuals from the Darwin's Ark survey data files, including genotypes from 3,277 dog whole genomes sequenced under BioProject PRJNA675863.
DarwinsDogs_2024_N-3277_canfam4_gp-0.70_biallelic.bed
DarwinsDogs_2024_N-3277_canfam4_gp-0.70_biallelic.bim
DarwinsDogs_2024_N-3277_canfam4_gp-0.70_biallelic.fam
The PLINK set above is further filtered using minor allele frequency and Hardy-Weinberg p-value thresholds before genetic analysis. See the Methods section of the manuscript for more details.
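As a rough illustration of the filtering step, the sketch below reads sample IDs from a PLINK `.fam` file and assembles (without running) a PLINK 1.9 command that applies minor allele frequency and Hardy-Weinberg filters. It assumes the dog IDs sit in the second `.fam` column (the within-family ID). The function names are hypothetical, and the thresholds are deliberately left as required arguments because the actual values are given only in the Methods section of the manuscript.

```python
def read_fam_ids(lines):
    """Return the within-family IDs (column 2) from the lines of a PLINK .fam file.

    Assumption: the dog IDs are stored in the second column.
    """
    ids = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # ignore blank or malformed lines
            ids.append(fields[1])
    return ids


def plink_filter_command(bfile_prefix, out_prefix, maf, hwe):
    """Build, but do not run, a PLINK 1.9 command applying MAF and HWE filters.

    maf and hwe must be supplied by the caller; take them from the Methods.
    """
    return [
        "plink",
        "--dog",                  # dog chromosome set
        "--bfile", bfile_prefix,  # prefix shared by .bed/.bim/.fam
        "--maf", str(maf),        # drop variants below this minor allele frequency
        "--hwe", str(hwe),        # drop variants below this HWE test p-value
        "--make-bed",
        "--out", out_prefix,
    ]
```

With the archive unpacked, `read_fam_ids(open("DarwinsDogs_2024_N-3277_canfam4_gp-0.70_biallelic.fam"))` would be expected to return 3,277 IDs, and the list returned by `plink_filter_command` can be passed to `subprocess.run` once real thresholds are filled in.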
Genetic analysis input files
darwins_dogs_gwas_input_files.zip contains the phenotype input files and covariate input files used to run the analysis.
phenotype_input_files
This folder contains the following files:
- DarwinsArk_4Q_Morphology_N-3277_rerun_20241107.tsv: contains individual-level phenotypic data for morphological questions that were recoded differently from the raw survey questions.
- DarwinsArk_13Q_Q243_coat_color_N-1930.tsv: contains individual-level phenotypic data for the coat color question.
- DarwinsArk_34Q_Priorities1-3_Behavior_9Q_Morphology_N-3277.tsv: contains individual-level phenotypic data for behavioral questions and morphological questions (raw data, with no recoding of the morphological questions).
- DarwinsArk_8Factors_N-3277.tsv: contains individual-level behavioral factor scores.
Each phenotype input file requires a dog_id column, referring to the dog_id used in both our phenotypic and genetic datasets, followed by columns for the phenotypes of interest.
The workflow used to run our genetic analysis creates an individual phenotype file for each phenotype from the input files above.
Workflow for analysis can be found at https://github.com/VistaSohrab/dog-gwas-heritability-nextflow
For the Darwin's Ark question key from abbreviation to full text (e.g., Q121 refers to dog height), see Supplemental Table 5 (Table S5) of the manuscript.
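The per-phenotype split described above can be sketched as follows. This is not the workflow's own code (that lives in the linked Nextflow repository); it is a minimal illustration that assumes the input is a tab-separated table with a header, that missing answers are empty cells, and that family ID and individual ID are both set to the dog_id. The output rows follow the headerless `FID IID value` layout that GCTA expects for `--pheno`.

```python
import csv


def phenotype_rows(tsv_lines, phenotype, missing="NA"):
    """Return (FID, IID, value) rows for one phenotype column of a phenotype TSV.

    Assumptions: the table is tab-separated with a header row, and
    FID and IID are both the dog_id.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    if phenotype not in (reader.fieldnames or []):
        raise KeyError(f"column {phenotype!r} not found")
    rows = []
    for record in reader:
        value = record.get(phenotype)
        if value is None or value.strip() == "":
            value = missing  # GCTA treats NA as a missing phenotype
        rows.append((record["dog_id"], record["dog_id"], value))
    return rows


def write_pheno_file(rows, path):
    """Write rows as a headerless, tab-separated GCTA phenotype file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)
```

For example, `phenotype_rows(open("DarwinsArk_34Q_Priorities1-3_Behavior_9Q_Morphology_N-3277.tsv"), "Q121")` would extract the height question, provided the column is actually named `Q121` in that file.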
covariate_input_files
This folder contains the following files:
- DarwinsArk_AgeComposed_Height_N-3277.tsv: contains individual-level ages and heights used as quantitative covariates in all GWAS analyses except the height GWAS.
- DarwinsArk_AgeComposed_N-3277.tsv: contains individual-level ages used as a quantitative covariate in the height GWAS.
- DarwinsArk_Sex_NeuterStatus_N-3277.tsv: contains individual-level sex status (sex and neuter status) as a single discrete covariate used in all the GWAS.
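The covariate rule above (age plus height for every GWAS except height, which uses age alone, with sex status as a discrete covariate throughout) can be written down as a small helper. The sketch below also assembles an illustrative GCTA command. The executable name `gcta64`, the function names, and the `is_height_gwas` flag are assumptions, and the covariate files may need converting to GCTA's headerless `FID IID covariate` layout first. The published workflow remains the authoritative reference.

```python
AGE_HEIGHT = "DarwinsArk_AgeComposed_Height_N-3277.tsv"
AGE_ONLY = "DarwinsArk_AgeComposed_N-3277.tsv"
SEX_STATUS = "DarwinsArk_Sex_NeuterStatus_N-3277.tsv"


def covariate_files(is_height_gwas):
    """Pick covariate files: height is dropped as a covariate for the height GWAS."""
    qcovar = AGE_ONLY if is_height_gwas else AGE_HEIGHT
    return {"qcovar": qcovar, "covar": SEX_STATUS}


def gcta_mlma_loco_command(bfile, pheno_file, out_prefix, is_height_gwas=False):
    """Build, but do not run, a GCTA leave-one-chromosome-out MLMA command.

    Assumption: the GCTA binary is available as `gcta64`.
    """
    covs = covariate_files(is_height_gwas)
    return [
        "gcta64",
        "--mlma-loco",
        "--bfile", bfile,
        "--autosome-num", "38",      # dogs have 38 autosomes
        "--pheno", pheno_file,
        "--qcovar", covs["qcovar"],  # quantitative covariates
        "--covar", covs["covar"],    # discrete covariate
        "--out", out_prefix,
    ]
```

Keeping the rule in one function means the height exception is stated once rather than repeated for each phenotype.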
Genetic analysis output files (GWAS)
darwins_dogs_gwas_output_files.zip contains all GCTA leave-one-chromosome-out mixed linear model association output files (.loco.mlma)
This folder contains GWAS output files for behavioral and morphological questions, as well as behavioral factors.
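A `.loco.mlma` file can be inspected with a few lines of Python. The sketch below assumes the standard GCTA MLMA layout (a tab-separated table with the header `Chr SNP bp A1 A2 Freq b se p`) and uses 5e-8 purely as an illustrative cutoff. It is not the significance threshold used in the manuscript, and the function name is hypothetical.

```python
import csv


def significant_hits(mlma_lines, p_threshold=5e-8):
    """Return (Chr, SNP, bp, p) tuples with p below p_threshold, sorted by p.

    Assumption: input follows the standard GCTA .mlma column layout.
    The default threshold is illustrative only.
    """
    reader = csv.DictReader(mlma_lines, delimiter="\t")
    hits = []
    for row in reader:
        try:
            p_value = float(row["p"])
        except (TypeError, ValueError):
            continue  # skip rows without a numeric p-value
        if p_value < p_threshold:  # NaN compares False and is skipped
            hits.append((row["Chr"], row["SNP"], int(row["bp"]), p_value))
    return sorted(hits, key=lambda hit: hit[3])
```

Because the output archive is large (14.60 GB), passing an open file handle lets the function stream each file line by line instead of loading it whole.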
Heritability output files (GCTA-GREML)
darwins_dogs_heritability_output_files.zip contains heritability output files from GCTA-GREML
This folder contains GCTA-GREML output files for each combination of LD correction (LD-corrected or non-LD-corrected) and constraint (constrained or unconstrained).
- .lds.hsq refers to LD-corrected and constrained (most reliable and reported in the manuscript)
- .lds.no-constraint.hsq refers to LD-corrected and no constraint
- .no-lds.hsq refers to non-LD-corrected and constrained (can refer to this file in case the LD-corrected and constrained run was unsuccessful)
- .no-lds.no-constraint.hsq refers to non-LD-corrected and no constraint
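The fallback described above (prefer the LD-corrected constrained result, and use the non-LD-corrected constrained result only when the former is unavailable) can be sketched as below, together with a small parser for `.hsq` files. The parser assumes the usual GCTA layout of tab-separated `Source`, `Variance`, `SE` rows, with summary rows such as `logL` carrying no standard error. Row labels in the LD-corrected files may differ from the single-GRM labels, so inspect the parsed keys rather than assuming them. The function names and the file prefix are hypothetical.

```python
import os

# Constrained results in order of preference, per the list above.
HSQ_SUFFIX_PRIORITY = [".lds.hsq", ".no-lds.hsq"]


def pick_constrained_hsq(prefix, exists=os.path.exists):
    """Return the preferred existing constrained .hsq path for prefix, or None."""
    for suffix in HSQ_SUFFIX_PRIORITY:
        path = prefix + suffix
        if exists(path):
            return path
    return None


def _to_float(text):
    """Convert text to float, returning None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_hsq(lines):
    """Parse .hsq lines into {source: (value, se)}; se is None when absent."""
    results = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0] == "Source":
            continue  # skip the header and blank lines
        value = _to_float(fields[1])
        if value is None:
            continue
        se = _to_float(fields[2]) if len(fields) > 2 else None
        results[fields[0]] = (value, se)
    return results
```

Injecting `exists` as a parameter keeps the selection rule testable without touching the file system.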
Supplemental information: GWAS analysis
supplemental_information_gwas.zip contains GCTA leave-one-chromosome-out mixed linear model association output files (.loco.mlma) for 5 phenotypes (2 behavioral questions, 2 behavioral factors, and height as a morphological question), performed on mixed-breed dogs and on a random sample of mixed-breed and single-breed dogs from the full dataset.
- random_dogs: in the GWAS output file name or the phenotype input file column name, refers to dogs randomly sampled from the full dataset
- mixed_dogs: in the GWAS output file name or the phenotype input file column name, refers to mixed-breed dogs
The phenotype input files can be found in the supp_gwas_phenotype_input_files folder. Covariate files are the same as for the other GWAS: age, height, and sex status, or age and sex status for the height GWAS.
Data availability
FASTQ files for all genetic information in this study are available in the NIH Sequence Read Archive (project PRJNA675863) at https://www.ncbi.nlm.nih.gov/sra?term=PRJNA675863. The genetic data, as well as the GWAS and heritability input and output files, are available on Dryad (DOI: 10.5061/dryad.83bk3jb4r). The code for curating the phenotypes and predicting SNP effects on phenotype is available on GitHub at https://github.com/broadinstitute/dog_behavioral_gwas_paper. The pipeline for GWAS and heritability is available at https://github.com/VistaSohrab/dog-gwas-heritability-nextflow/ with the necessary software packaged in a Docker container at https://hub.docker.com/r/vistasohrab/terra-dogagingproject-gwas.
Contact
If you have further questions about the data file archive, contact the authors, Kathryn A. Lord (kathryn.lord@umassmed.edu) and Elinor Karlsson (elinor.karlsson@umassmed.edu), and we will resolve them as soon as possible.
