Data from: Blood biomarkers and breed genetics of aging in pet dogs
Data files
Mar 30, 2026 version files 68.04 GB
- baseline_phenotype_datasets.zip (54.89 MB)
- baseline_phenotype_kurtosis_survey_response_distribution_files.zip (44.62 KB)
- dap_genetic_plinkset.zip (347.83 MB)
- dap_gwas_input_files.zip (8.68 MB)
- dap_gwas_output_files.zip (67.61 GB)
- DAP_supp_data.rds (1.63 MB)
- longitudinal_Rdata_files_for_survival_analysis.zip (7.89 MB)
- README.md (12.35 KB)
- S_GWAS_RAW_OVERLAP.tsv (6.13 MB)
- S_HUMAN_TRAITS.tsv (1.13 MB)
- S_TRAIT_MATCH.tsv (1.10 MB)
Abstract
Aging trajectories vary widely among individuals, yet how genetic variation shapes lifespan remains poorly understood. Pet dogs share human-like environments while aging on a compressed timescale, making them a powerful translational model. Using genomic and phenotypic data from 7,627 dogs in the Dog Aging Project, including 976 profiled for 159 blood metabolites and clinical analytes, we generated the first genome-wide association study (GWAS) catalog in dogs. Most blood traits map to orthologous loci in dogs and humans, indicating conserved pathways. Breed explains substantial variance in blood traits, and selection on visible characteristics, such as fur type, has pleiotropic metabolic effects. Leveraging mosaic ancestry in mixed-breed dogs and longitudinal mortality data, we identify blood traits elevated in short-lived breeds that predict mortality risk, including globulin and potassium, and protective traits enriched in long-lived breeds, such as ethanolamine. Although some aging-associated traits relate to growth hormone pathways, many do not, indicating that aging in dogs is multifactorial. Pet dogs uniquely combine genetic and environmental advantages for identifying blood-based biomarkers of aging and testing interventions.
The following archives contain data files associated with the manuscript:
Vista Sohrab, Michelle E. White, Benjamin R. Harrison, Rob Bierman, Abbey Marye, Kathleen Morrill Pirovich, Diane Genereux, Kate Megquier, Xue Li, Brittney Kenney, Cindy Reichel, Dog Aging Project Consortium, Noah Snyder-Mackler, Joshua M. Akey, Daniel E.L. Promislow, Frances Chen, Elinor K. Karlsson, Blood biomarkers and breed genetics of aging in pet dogs. bioRxiv. 2026.
Data Files
Genetic Data:
dap_genetic_plinkset.zip contains the 2023 genetic dataset of the Dog Aging Project in PLINK1 format. This genetic dataset was aligned to the canfam4 reference genome and imputed using the Dog10K imputation reference panel with GLIMPSE2.
Darwin's Ark Cohort
A PLINK1 bfile set (.bed/.bim/.fam) with sample IDs encoded as the dog IDs used by the Dog Aging Project. It includes genotypes from 7,627 dogs enrolled in the Dog Aging Project (raw low-pass whole-genome sequencing results are available on SRA under BioProject PRJNA800779):
DogAgingProject_2023_N-7627_canfam4_gp-0.70_biallelic.bed
DogAgingProject_2023_N-7627_canfam4_gp-0.70_biallelic.bim
DogAgingProject_2023_N-7627_canfam4_gp-0.70_biallelic.fam
The PLINK set above can be further filtered by minor allele frequency and Hardy-Weinberg equilibrium p-value thresholds before genetic analysis. See the Methods section of the manuscript for more details.
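As a rough illustration of this kind of filtering, the sketch below assembles a PLINK 1.9 command from Python. The thresholds (`--maf 0.01`, `--hwe 1e-6`) and the output prefix `dap_filtered` are placeholder assumptions, not the values used in the manuscript; consult its Methods section for those. The `--dog` flag selects PLINK's canine chromosome set. Depending on how chromosomes are named in the .bim file, additional options may be required.

```python
import os
import shutil
import subprocess

# Prefix shared by the .bed/.bim/.fam files in dap_genetic_plinkset.zip.
BFILE = "DogAgingProject_2023_N-7627_canfam4_gp-0.70_biallelic"


def build_plink_filter_cmd(bfile, out_prefix, maf=0.01, hwe=1e-6):
    """Return a PLINK 1.9 argument list that applies MAF and HWE filters.

    The default thresholds are illustrative assumptions only.
    """
    return [
        "plink",
        "--dog",             # canine chromosome set
        "--bfile", bfile,
        "--maf", str(maf),   # drop variants below this minor allele frequency
        "--hwe", str(hwe),   # drop variants with an HWE p-value below this
        "--make-bed",        # write a new .bed/.bim/.fam fileset
        "--out", out_prefix,
    ]


cmd = build_plink_filter_cmd(BFILE, "dap_filtered")  # hypothetical output prefix
print(" ".join(cmd))

# Run only if PLINK is installed and the fileset is in the working directory.
if shutil.which("plink") and os.path.exists(BFILE + ".bed"):
    subprocess.run(cmd, check=True)
```

Building the command as a list keeps the thresholds easy to change and avoids shell quoting issues.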
Baseline phenotype dataset files
baseline_phenotype_datasets.zip contains the original Rdata files from Dog Aging Project data releases used to generate the baseline phenotypes or covariate files.
This folder contains all the raw original files and intermediate files used to get the baseline phenotypes for genetic studies.
- DAP_2022_AFUS_dog_owner_v1.0.Rdata, DAP_2022_CSLB_v1.0.Rdata, and DAP_2022_HLES_dog_owner_v1.0.Rdata are the 2022 survey Rdata files. Baseline DORA and CSLB phenotypes were extracted from the AFUS and CSLB Rdata files. The baseline 2022 HLES data file was used to extract all other baseline phenotypes.
- DAP_2023_HLES_dog_owner_v1.0.Rdata was used to extract dog weight, and DAP_2023_DogOverview_v1.0.Rdata was used to extract the sex status covariate for dogs.
- DAP_2024_SamplesResults_CBC_v1.0.csv, DAP_2024_SamplesResults_ChemistryPanel_v1.0.csv, and DAP_2024_SamplesResults_Metadata_v1.0.csv were used to extract baseline blood phenotypes. Columns with "Modifier" in their name have responses that are NA, "<", or ">" and were therefore excluded from analysis.
- Precision_techAdjustedData is the original Rdata file containing baseline metabolite phenotypes. To load this data file into R: load("Precision_techAdjustedData")
- HLES_CSLB_DORA_2022_DAP_response_codebook_20231219_191Q_subset_forSurveyDataCleaning.txt is the codebook used to convert DAP_coded_response to coded_response for phenotypes extracted from the original 2022 raw dataset.
- SampledDogCohortKey_20250129.csv was used to identify the dogs specific to the Precision dataset, that is, the subset of dogs included in the blood and metabolite datasets.
- scaled_cleaned_DAP2022_baseline_survey_20231219.tsv is the cleaned dataset of baseline phenotypes from 2022.
- The Dog Aging Project codebook describing the variables can be found at https://github.com/dogagingproject/dataRelease/blob/master/Codebooks/DAP_2024_CODEBOOK_v1.0.csv, with each data release having its own codebook at https://github.com/dogagingproject/dataRelease/tree/master/Codebooks.
- Many of the NAs in the survey questions indicate "not applicable", with many cases arising from nested survey questions that were not shown to dog owners based on their response to a prior question. NAs in clinical lab data indicate "not available", meaning that no measurement was reported for that variable.
- The clinical lab data units and reference intervals for blood chemistry and complete blood count phenotypes can be found at https://github.com/dogagingproject/dataRelease/blob/master/Supporting_Documents/DAP_Clinical_Lab_Reference_Intervals.md
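The handling of "Modifier" columns and NA values described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' cleaning code. The column names in the example (`dog_id`, `analyte_a`, `analyte_b`) are hypothetical stand-ins; the real names are listed in the Dog Aging Project codebook.

```python
import csv
import io


def load_lab_results(handle):
    """Read a clinical lab CSV, dropping "Modifier" columns and mapping NA to None."""
    reader = csv.DictReader(handle)
    keep = [name for name in reader.fieldnames if "Modifier" not in name]
    rows = []
    for record in reader:
        rows.append({
            name: (None if record[name] in ("", "NA") else record[name])
            for name in keep
        })
    return keep, rows


# Tiny made-up table; column names are hypothetical.
example = io.StringIO(
    "dog_id,analyte_a,analyte_a_Modifier,analyte_b\n"
    "1,5.1,NA,NA\n"
    "2,0.2,<,7.4\n"
)
columns, rows = load_lab_results(example)
print(columns)
print(rows[0])
```

To apply the same function to a released file, open it with `open("DAP_2024_SamplesResults_ChemistryPanel_v1.0.csv", newline="")` and pass the handle in place of `example`.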
Kurtosis results of baseline phenotypes and response distributions of survey questions
baseline_phenotype_kurtosis_survey_response_distribution_files.zip includes kurtosis results for all quantitative phenotypes (survey questions and blood traits) as well as the survey response distributions used to exclude ordinal survey questions with low phenotypic variability (i.e., at least 85% of responses on a single option).
- blood_cbc_sqrt_chem_ln_transformed_kurtosis.tsv has kurtosis results for blood phenotypes (complete blood count and blood chemistry traits).
- DAP_N-13_quantitative_survey_question_kurtosis_results.tsv has kurtosis results for 13 quantitative survey questions considered in our genetic analysis.
- DAP_N-185_survey_question_response_stats.tsv contains the response distributions of survey questions being considered for our genetic analysis.
- DAP_retained_quantitative_survey_questions_N-8_kurtosis_results.tsv contains the kurtosis results of the 8 quantitative survey questions included based on our threshold of excluding questions with kurtosis > 7.
- DAP_retained_survey_questions_N-146_response_stats.tsv contains the response distributions of survey questions where no single answer has at least 85% of the responses.
- finalized_survey_questions_in_genetic_analysis_N-154.tsv contains the 154 survey questions retained in our genetic analysis.
- metabolites_excluded_kurtosis.tsv contains the kurtosis results for the 10 metabolites with heavy-tailed distribution (kurtosis > 10) that led to the 123 metabolites analyzed in our study, instead of 133 metabolite phenotypes available in the original dataset.
- metabolites_kurtosis.tsv contains kurtosis results for all 133 metabolites provided in the original dataset.
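The two exclusion rules described here (a kurtosis cutoff and the 85% single-response rule) can be sketched as below. This is an illustration under stated assumptions, not the code used for the study: this README does not say which kurtosis estimator was used, so the sketch assumes the population form m4 / m2², which equals 3 for a normal distribution. The example data are made up.

```python
from collections import Counter


def pearson_kurtosis(values):
    """Population kurtosis m4 / m2**2 (assumed definition; 3 for a normal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def has_low_variability(responses, max_share=0.85):
    """True if a single answer accounts for at least max_share of responses."""
    counts = Counter(responses)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(responses) >= max_share


# Made-up data for illustration.
heavy_tailed = [0.0] * 50 + [100.0]   # one extreme outlier
balanced = [1, 2, 3, 4, 5] * 10       # evenly spread ordinal answers

print(pearson_kurtosis(heavy_tailed) > 10)        # would fail the metabolite cutoff
print(has_low_variability(["no"] * 9 + ["yes"]))  # 90% of answers on one option
print(has_low_variability(balanced))              # no dominant option
```

The cutoffs stated above (kurtosis > 7 for quantitative survey questions, > 10 for metabolites, and 85% for single-response share) would be passed in or compared against at the call site.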
Genome-wide association study (GWAS) input files
dap_gwas_input_files.zip contains the annotation file, phenotype input files, and covariate input files used to run genetic analysis.
The phenotype input files were used as input for GWAS, ANOVA, and LMER (linear mixed effects regression) analyses.
UU_Cfam_GSD_1.0_ROSY.refSeq.ensformat.genes.validchr.bed is the annotation file provided to the Nextflow GWAS and heritability pipeline.
This directory is divided into 3 subdirectories for the broad category of phenotype analyzed:
- blood_cbc_chem contains input files for complete blood count and blood chemistry phenotypes.
  - Sex_Class_at_HLES_972_precision_dogs.tsv is the discrete covariate input file for complete blood count and blood chemistry phenotypes.
  - age_at_sample_collection_weight_kg_hours_fasted_cbc_935_precision_dogs.tsv is the quantitative covariate input file for complete blood count phenotypes.
  - age_at_sample_collection_weight_kg_hours_fasted_chem_966_precision_dogs.tsv is the quantitative covariate input file for blood chemistry phenotypes.
  - blood_chemistry_21pheno_971_precision_dogs.tsv is the blood chemistry phenotype input file provided to the Nextflow GWAS and heritability pipeline.
  - complete_blood_count_21pheno_962_precision_dogs.tsv is the complete blood count phenotype input file provided to the Nextflow GWAS and heritability pipeline.
- metabolites contains input files for metabolite phenotypes.
  - age_weight_hoursFasted_937_precision_dogs.tsv is the quantitative covariate file for all metabolite phenotypes.
  - Sex_Class_at_HLES_937_precision_dogs.tsv is the discrete covariate file for all metabolite phenotypes.
  - There are 7 metabolite phenotype input files. Each phenotype file was submitted one at a time so that the metabolite GWAS ran in batches, ensuring that all jobs for a batch completed successfully.
  - The 123 metabolite phenotypes are also present across 2 files (used as input files for ANOVA and LMER) called precision2024_120metabolites_937dogs_phenotypes.tsv and precision2024_3metabolites_937dogs_phenotypes.tsv.
- survey_questions contains input files for survey-based phenotypes.
  - baseline_AFUS_DORA2022_33questions_N-43517.tsv is the phenotype file for baseline eating behavior (DORA) survey questions.
  - baseline_AFUS_MDORS2022_25questions_N-43517.tsv is the phenotype file for baseline owner-dog relationship (MDORS) survey questions.
  - baseline_AFUS2022_GeneticSet2023_Estimated_Age_Years_at_baseline_AFUS_afus_dd_dog_weight_lbs_N-7627.tsv is the quantitative covariate file used for baseline questions extracted from AFUS surveys (DORA and MDORS).
  - baseline_CSLB2022_4questions_N-43517.tsv is the phenotype file for baseline cognitive (CSLB) survey questions.
  - baseline_CSLB2022_GeneticSet2023_Estimated_Age_Years_at_baseline_CSLB_dd_weight_lbs_N-7627.tsv is the quantitative covariate file for the baseline cognitive (CSLB) survey question GWAS.
  - There are 7 survey question input files from the baseline HLES survey, submitted in batches by survey section. Their names start with the prefix "HLES2022_" followed by the corresponding survey section abbreviation.
  - HLES2022_GeneticSet2023_Estimated_Age_Years_at_HLES_dd_weight_lbs_N-7627.tsv is the quantitative covariate file for all baseline HLES survey question GWAS (except for the dog weight GWAS).
  - HLES2022_GeneticSet2023_Estimated_Age_Years_at_HLES_N-7627.tsv is the quantitative covariate file for the baseline dog weight GWAS.
  - HLES2022_GeneticSet2023_Sex_Class_at_HLES_N-7627.tsv is the discrete covariate file used for all survey-based questions.
Each phenotype input file is required to have a dog_id column referring to the dog_id used in both our phenotypic and genetic datasets followed by individual columns for phenotypes of interest (v1.0.0).
In future versions of the workflow, the term dog_id will be changed to IID to account for species flexibility of the pipeline.
The workflow used to run our genetic analysis creates an individual phenotype file for each phenotype from the input files above.
Workflow for analysis can be found at https://github.com/VistaSohrab/dog-gwas-heritability-nextflow
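To make the phenotype file layout concrete, the sketch below splits a wide phenotype table (a dog_id column followed by one column per phenotype) into per-phenotype records. It is only an illustration of the idea and is not taken from the Nextflow workflow. The function name, the reuse of the dog ID as both family and individual ID, and the example trait names are all assumptions.

```python
import csv
import io


def split_phenotypes(handle, id_column="dog_id", missing=("", "NA")):
    """Return {phenotype: [(id, id, value), ...]} from a wide phenotype TSV.

    Each tuple mimics an FID/IID/value row; reusing the dog ID as FID is an
    assumption made for this sketch.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or reader.fieldnames[0] != id_column:
        raise ValueError(f"first column must be {id_column!r}")
    phenotypes = reader.fieldnames[1:]
    per_pheno = {name: [] for name in phenotypes}
    for record in reader:
        for name in phenotypes:
            value = record[name]
            if value not in missing:  # skip dogs with no value for this trait
                per_pheno[name].append((record[id_column], record[id_column], value))
    return per_pheno


# Made-up example with hypothetical trait names.
example = io.StringIO(
    "dog_id\ttrait_a\ttrait_b\n"
    "101\t1.5\tNA\n"
    "102\t2.0\t0.3\n"
)
per_pheno = split_phenotypes(example)
for name, rows in per_pheno.items():
    print(name, rows)
```

Skipping missing values per phenotype means each trait keeps every dog that has a measurement for it, rather than only the dogs measured for all traits.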
GWAS output files
dap_gwas_output_files.zip contains all GCTA leave-one-chromosome-out mixed linear model association output files (.loco.mlma).
This folder contains all 313 GWAS output files for blood traits and survey questions.
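Because the output archive is large, it can help to stream each .loco.mlma file rather than load it whole. The sketch below does this with the Python standard library, assuming the standard tab-delimited GCTA MLMA column layout (Chr, SNP, bp, A1, A2, Freq, b, se, p). The significance threshold of 5e-8 is a placeholder assumption, not the threshold used in the manuscript, and the example rows are made up.

```python
import csv
import io


def significant_hits(handle, p_threshold=5e-8):
    """Stream a GCTA .mlma table and return hits with p below the threshold.

    Returns (Chr, bp, SNP, p) tuples sorted by p-value. Rows whose p-value is
    missing or not a number are skipped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for record in reader:
        try:
            p_value = float(record["p"])
        except ValueError:
            continue
        if p_value < p_threshold:  # comparisons with nan are False, so nan is skipped
            hits.append((record["Chr"], int(record["bp"]), record["SNP"], p_value))
    return sorted(hits, key=lambda hit: hit[3])


# Made-up rows in the assumed column layout.
example = io.StringIO(
    "Chr\tSNP\tbp\tA1\tA2\tFreq\tb\tse\tp\n"
    "1\tsnp_a\t1000\tA\tG\t0.25\t0.40\t0.05\t1.2e-15\n"
    "1\tsnp_b\t2000\tC\tT\t0.10\t0.01\t0.05\t0.84\n"
    "2\tsnp_c\t3000\tG\tA\t0.30\tnan\tnan\tnan\n"
)
hits = significant_hits(example)
print(hits)
```

For a real file, pass a handle opened with `open(path, newline="")`, where `path` points to one of the extracted .loco.mlma files.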
Longitudinal RData files used to conduct survival analysis
longitudinal_Rdata_files_for_survival_analysis.zip contains the longitudinal Rdata files for blood phenotypes and metabolites.
longitudinalData2 contains the longitudinal results for all metabolites. To load this data file into R:
load("longitudinalData2")
addingCBC_chem_forVista_data5 contains the longitudinal results for all complete blood count and blood chemistry traits analyzed in our genetic analysis. To load this data file into R:
load("addingCBC_chem_forVista_data5")
Supplemental data for manuscript
All supplemental data for the manuscript are available in DAP_supp_data.rds. The titles of the tables and data files can be found in the Supplementary Materials and Methods of the manuscript.
Human and dog GWAS overlap
- S_TRAIT_MATCH.tsv contains pairwise trait scores on a 1–5 scale describing the biological relationship between two blood-measured traits, with 1 referring to the same measurement and 5 being unrelated.
- S_GWAS_RAW_OVERLAP.tsv includes all the overlaps identified between dog and human blood traits.
- S_HUMAN_TRAITS.tsv includes the GWAS catalog names and canonical names for human traits included in the dog/human overlap analysis.
Contact
If you have further questions about the data file archive, contact the corresponding authors, Vista Sohrab (vista.sohrab@umassmed.edu), Frances Chen (francesl.chen@umassmed.edu), and Elinor Karlsson (elinor.karlsson@umassmed.edu), and we will respond as soon as possible.
