Canine genome-wide association study identifies DENND1B as an obesity gene in dogs and humans
Data files
Mar 01, 2025 version files 341.95 MB
- FCR_phenotypes.txt (15.81 KB)
- FCR_PRSgenotypes.txt (17.64 KB)
- labrador_GWAS_imputed_genetic_data.zip (55.24 MB)
- labrador_GWASresults_gcta-MLMA-loco-LDM_imputed.loco.mlma (286.61 MB)
- labrador_owner_control.txt (9.84 KB)
- labrador_phenotypes.txt (21.10 KB)
- labrador_replicationset_PRSgenotypes.txt (14.44 KB)
- pug_phenotypes.txt (4.41 KB)
- pug_PRSgenotypes.txt (8.06 KB)
- README.md (12.29 KB)
Abstract
Obesity is a far-reaching, heritable disease, but its genetic basis is incompletely understood. Canine population history facilitates trait mapping. We performed the first successful canine genome-wide association study for body condition score, a measure of obesity, in 241 Labrador retrievers, with a polygenic score replicated in 350 more. Using an innovative cross-species approach, we showed new canine obesity genes are also associated with rare and common forms of obesity in humans. The lead association in dogs was with rs24430444 within the gene DENN domain containing 1B (DENND1B). Each copy of the alternate allele was associated with ~7% greater body fat. We demonstrate a novel role for this gene in regulating signaling and trafficking of melanocortin 4 receptor, a critical controller of energy homeostasis. Thus, canine genetics identified novel obesity genes and mechanisms relevant to both dogs and humans.
https://doi.org/10.5061/dryad.0vt4b8h85
Description of the data and file structure
This README file explains the input data used for the project outlined in the manuscript: Canine genome-wide association study identifies DENND1B as an obesity gene in dogs and humans (Science - ads2145).
Each data file is described below:
labrador_GWAS_imputed_genetic_data.zip
- Genetic data for the population of 241 Labrador retrievers used in the primary BCS GWAS for the study
- The data is in plink binary format (.bim, .bed, .fam)
- This dataset is from analysis in only pet dogs
- This data is imputed, utilising a pipeline which imputes genetic array data up to whole genome level
- The pipeline used to generate this data is available here: https://github.com/GOdogs-Project/Imputation/releases/tag/version-0.1 and is described in the Supplementary Materials and Methods, which are provided alongside the manuscript.
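As a rough illustration of the PLINK binary fileset mentioned above: the `.bim` file is a plain-text variant table with six columns (chromosome, variant ID, genetic position, base-pair position, allele 1, allele 2), so it can be inspected without PLINK. The sketch below reads such a table and counts variants per chromosome. The example rows are invented, and the file names inside the zip archive are not assumed here.

```python
import io
from collections import Counter

# Column order of a PLINK 1.9 .bim file
BIM_COLUMNS = ("chrom", "variant_id", "cm", "bp", "allele1", "allele2")

def read_bim(handle):
    """Read a whitespace-delimited .bim table into a list of dicts."""
    variants = []
    for line in handle:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        record = dict(zip(BIM_COLUMNS, fields))
        record["bp"] = int(record["bp"])
        variants.append(record)
    return variants

# Invented example rows, not taken from the dataset
example = io.StringIO(
    "1\t1:1000\t0\t1000\tA\tG\n"
    "1\t1:2500\t0\t2500\tC\tT\n"
    "2\t2:300\t0\t300\tG\tA\n"
)
variants = read_bim(example)
per_chrom = Counter(v["chrom"] for v in variants)
```

To use it on the real data, pass an open file handle for the extracted `.bim` file in place of `example`. The `.bed` file is binary and is better read with PLINK itself.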
labrador_GWASresults_gcta-MLMA-loco-LDM_imputed.loco.mlma
- Results from the genome-wide mixed linear model association analysis used for the GWAS for BCS in 241 Labrador retrievers
- GCTA mlma-loco is the statistical algorithm used
- Explanation of this output format (including detail on each column) can be found on the GCTA website here: https://yanglab.westlake.edu.cn/software/gcta/
- Importantly, the p values provided have not been adjusted for lambda (the genomic inflation factor); we recommend making this adjustment before using them in further analyses
labrador_replicationset_PRSgenotypes.txt
- Genotypes for 16 SNPs used in the canine polygenic risk score (PRS) - in the replication set of 350 Labrador retrievers
- Format of the genotypes is generated using plink
- This dataset includes both pet dogs and assistance dogs
pug_PRSgenotypes.txt
- Genotypes for 16 SNPs used in the canine polygenic risk score (PRS) - in the replication set of pugs
- This dataset includes only pet dogs
- Format of the genotypes is generated using plink
pug_phenotypes.txt
- Phenotypes for the PRS replication set of pugs
- This dataset includes only pet dogs
labrador_owner_control.txt
- Owner control 'scores' for all Labrador retrievers used in the study.
- Owner control scores are calculated using the Dog Obesity Risk and Appetite (DORA) questionnaire
- Full details are provided in the Supplementary Materials and Methods along with the original DORA paper (E. Raffan, S. P. Smith, S. O’Rahilly, J. Wardle, Development, factor structure and application of the Dog Obesity Risk and Appetite (DORA) questionnaire. PeerJ 3, e1278 (2015))
labrador_phenotypes.txt
- Phenotypes for the Labrador retrievers used in the study, including both the 241 used in the GWAS and the 350 Labradors used as a PRS replication set
- This dataset includes both pet dogs and assistance dogs
FCR_PRSgenotypes.txt
- Genotypes for 16 SNPs used in the canine polygenic risk score (PRS) - in the replication set of flat-coated retrievers (FCR)
- This dataset includes only pet dogs
FCR_phenotypes.txt
- Phenotypes for the PRS replication set of flat-coated retrievers
- This dataset includes only pet dogs
Files and variables
File: FCR_phenotypes.txt
Variables
- DogID_anon: anonymised dog ID
- Age_sample_returned: Age of dog in years
- Professional_Weight: Weight of dog in kg
- Professional_BCS: Body Condition Score of dogs according to the Purina 9 point BCS scale
- F_CombinedGreedFactors: Dog food motivation 'score' calculated using the Dog Obesity Risk and Appetite (DORA) questionnaire. Full details are provided in the Supplementary Materials and Methods along with the original DORA paper (E. Raffan, S. P. Smith, S. O’Rahilly, J. Wardle, Development, factor structure and application of the Dog Obesity Risk and Appetite (DORA) questionnaire. PeerJ 3, e1278 (2015))
- sex: Coded as 0/1 where 0 = female and 1 = male
- neuter: coded as 0/1 where 0 = entire/not neutered and 1 = neutered
File: FCR_PRSgenotypes.txt
Variables
- DogID_anon: anonymised dog ID
- All other columns are in format: chr:base-pair-position_allele
- Where 0/1/2 indicates the count for that allele
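A minimal sketch of how the genotype columns described above could be parsed. It assumes the file is whitespace-delimited with a header row and that missing genotypes are written as "NA"; neither is stated explicitly for this file, so check both before use. The header `1:12345_A` and the example rows are invented for illustration.

```python
import io

def parse_snp_column(name):
    """Split a 'chr:base-pair-position_allele' header into its parts."""
    locus, allele = name.rsplit("_", 1)
    chrom, pos = locus.split(":", 1)
    return chrom, int(pos), allele

def read_prs_genotypes(handle):
    """Return {dog_id: {snp_column: allele count, or None if missing}}.

    Assumes whitespace-delimited text with a header row (an assumption).
    """
    header = handle.readline().split()
    snp_columns = header[1:]  # first column is DogID_anon
    genotypes = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        dog_id, counts = fields[0], fields[1:]
        genotypes[dog_id] = {
            snp: (None if value == "NA" else int(value))
            for snp, value in zip(snp_columns, counts)
        }
    return genotypes

# Invented example rows, not taken from the dataset
example = io.StringIO(
    "DogID_anon 1:12345_A 2:67890_G\n"
    "dog1 0 2\n"
    "dog2 1 NA\n"
)
genotypes = read_prs_genotypes(example)
```

The same reader should apply to the pug and Labrador replication genotype files if they share this layout.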
File: labrador_phenotypes.txt
Variables
- DogID_anon: anonymised dog ID
- sex: Coded as 0/1 where 0 = female and 1 = male
- neuter: coded as 0/1 where 0 = entire/not neutered and 1 = neutered
- Age_sample returned: Age of dog in years
- Professional_Weight: Weight of dog in kg
- Professional_BCS: Body Condition Score of dogs according to the Purina 9 point BCS scale
- FM_score: Dog food motivation 'score' calculated using the Dog Obesity Risk and Appetite (DORA) questionnaire. Full details are provided in the Supplementary Materials and Methods along with the original DORA paper (E. Raffan, S. P. Smith, S. O’Rahilly, J. Wardle, Development, factor structure and application of the Dog Obesity Risk and Appetite (DORA) questionnaire. PeerJ 3, e1278 (2015))
File: labrador_owner_control.txt
Variables
- DogID_anon: anonymised dog ID
- owner_control: Owner control 'scores' per Labrador retriever are calculated using the Dog Obesity Risk and Appetite (DORA) questionnaire. Full details are provided in the original DORA paper (E. Raffan, S. P. Smith, S. O’Rahilly, J. Wardle, Development, factor structure and application of the Dog Obesity Risk and Appetite (DORA) questionnaire. PeerJ 3, e1278 (2015))
File: pug_phenotypes.txt
Description: Phenotypes for the PRS replication set of pugs (pet dogs only)
Variables
- DogID_anon: anonymised dog ID
- Sex: Coded as 0/1 where 0 = female and 1 = male
- Neutered: coded as 0/1 where 0 = entire/not neutered and 1 = neutered
- Age: Age of dog in years
- Weight: Weight of dog in kg
- BCS: Body Condition Score of dogs according to the Purina 9 point BCS scale
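The sketch below shows one way to read a phenotype table with the columns listed above while treating "NA" as missing, as this README describes for missing clinical data. The tab delimiter is an assumption (pass a different `delimiter` if the file uses spaces or commas), and the example rows are invented.

```python
import csv
import io

def read_phenotypes(handle, delimiter="\t"):
    """Read a delimited phenotype table, mapping 'NA' to None."""
    rows = []
    for record in csv.DictReader(handle, delimiter=delimiter):
        rows.append({key: (None if value == "NA" else value)
                     for key, value in record.items()})
    return rows

# Invented example rows, not taken from the dataset
example = io.StringIO(
    "DogID_anon\tSex\tNeutered\tAge\tWeight\tBCS\n"
    "pug1\t0\t1\t4.5\t8.2\t6\n"
    "pug2\t1\t0\tNA\t9.1\tNA\n"
)
rows = read_phenotypes(example)

# Summaries should skip dogs with a missing value for that variable
bcs_values = [float(row["BCS"]) for row in rows if row["BCS"] is not None]
mean_bcs = sum(bcs_values) / len(bcs_values)
```

Values are kept as strings until needed, so the 0/1 codes for sex and neuter status are not silently converted.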
File: pug_PRSgenotypes.txt
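Description: Genotypes for 16 SNPs used in the canine polygenic risk score (PRS) in the replication set of pugs (pet dogs only)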
Variables
- DogID_anon: anonymised dog ID
- All other columns are in format: chr:base-pair-position_allele
- Where 0/1/2 indicates the count for that allele
File: labrador_replicationset_PRSgenotypes.txt
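Description: Genotypes for 16 SNPs used in the canine polygenic risk score (PRS) in the replication set of 350 Labrador retrievers (pet dogs and assistance dogs)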
Variables
- DogID_anon: anonymised dog ID
- All other columns are in format: chr:base-pair-position_allele
- Where 0/1/2 indicates the count for that allele
File: labrador_GWASresults_gcta-MLMA-loco-LDM_imputed.loco.mlma
Description: Results from the genome-wide mixed linear model association analysis used for the GWAS for BCS in 241 Labrador retrievers
- GCTA mlma-loco is the statistical algorithm used
- Explanation of this output format (including detail on each column) can be found on the GCTA website here: https://yanglab.westlake.edu.cn/software/gcta/
- Importantly, the p values provided have not been adjusted for lambda (the genomic inflation factor); we recommend making this adjustment before using them in further analyses
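One common way to adjust for lambda is genomic control: convert each p value to a 1-d.f. chi-square statistic, estimate lambda as the median observed statistic divided by the expected median (about 0.455), then divide every statistic by lambda and convert back. The sketch below illustrates that approach using only the Python standard library. It is not necessarily the procedure the authors used, and the example p values are invented.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455)
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.25) ** 2

def p_to_chi2(p):
    """Convert a two-sided p value in (0, 1] to a 1-d.f. chi-square statistic."""
    if p >= 1.0:
        return 0.0
    return NormalDist().inv_cdf(p / 2.0) ** 2

def chi2_to_p(chi2):
    """Convert a 1-d.f. chi-square statistic back to a p value."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def genomic_lambda(p_values):
    """Median observed chi-square divided by its expected median."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN

def adjust_p_values(p_values):
    """Divide each chi-square statistic by lambda and convert back to p."""
    lam = genomic_lambda(p_values)
    return [chi2_to_p(p_to_chi2(p) / lam) for p in p_values]

# Invented p values; real input would be the p column of the .mlma file,
# with missing values removed first
example_p = [0.2, 0.1, 0.001]
lam = genomic_lambda(example_p)
adjusted = adjust_p_values(example_p)
```

In practice, many analysts apply the correction only when lambda is greater than 1, since dividing by a lambda below 1 makes p values smaller.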
File: labrador_GWAS_imputed_genetic_data.zip
Description: Genetic data for the population of 241 Labrador retrievers used in the primary BCS GWAS for the study
- The data is in plink binary format (.bim, .bed, .fam)
- Plink file format is explained here: https://www.cog-genomics.org/plink/1.9/formats
- This data is imputed, utilising a pipeline which imputes genetic array data up to whole genome level
- The pipeline used to generate this data is available here: https://github.com/GOdogs-Project/Imputation/releases/tag/version-0.1 and is described in the Supplementary Materials and Methods, which are provided alongside the manuscript.
"NA" is used to denote missing data, such as missing clinical data. For example, some dogs have a body condition score but no food motivation score.
Code/software
All data is on the CanFam3.1 reference assembly (GCA_000002285.2)
IMPUTATION:
The in-house pipeline is made up of three wrapper scripts, covering generation of an imputation reference panel, preparation of the genotypic data (to be used for the GWAS), and the imputation process itself. It can be accessed via GitHub (https://github.com/GOdogs-Project/Imputation/releases/tag/version-0.1). The pipeline was developed to run on the University of Cambridge High Performance Computing (HPC) system with a SLURM workload manager, so it is not directly transferable, but all the programs used within the pipeline are freely available to download and adapt.
Software used:
- PLINK v.1.9.
- SHAPEIT v2.r904
- IMPUTE2 v2
- BEDtools v.2.20.1
- BCFtools v.1.9
GWAS:
Genome-wide Efficient Mixed Model Analysis (GEMMA) software v0.98.1 (69) was used to generate a relatedness matrix which was transformed to a distance matrix using R v.4.2.2.
GWAS and heritability analysis were performed using Genome-wide Complex Trait Analysis (GCTA) v.1.93.2.
Access information
Other publicly accessible locations of the data:
- New canine WGS data used in the imputation panel are provided via ENA accession codes (https://www.ebi.ac.uk/ena/) detailed on pg. 4 of the Supplementary materials and methods and below.
- The final imputation panel represented 676 individuals of 91 breeds, including 31 Labrador retrievers (BioProject accession PRJNA648123 and PRJNA726547). The panel was enriched with whole genome sequence from 7 Labradors extracted from the European Nucleotide Archive (accession codes SRR7120183, SRR13340562, SRR13340566, SRR13340565, SRR13340564, SRR13340563, SRR13340570) and 5 from in-house WGS data (BioSample IDs SAMEA115942716, SAMEA115942718, SAMEA115942720, SAMEA115942723, SAMEA115942717, SAMEA115942719, SAMEA115942721, SAMEA115942722).
Data was derived from the following sources:
- Data obtained from the Golden Retriever Lifetime Study (GRLS) can be accessed via their data Commons website: https://datacommons.morrisanimalfoundation.org/.
- Several of the human data are reused from existing studies, all of which are referenced throughout the main and supplementary text files. Data for human analyses (G2G, UKBB, rare exome association, UKB, SCOOP, gnomAD) are publicly available to download directly or through creation of an account. Data can be accessed through the original study sources listed below.
- Human WHR and BMI GWAS data GIANT consortium: https://portals.broadinstitute.org/collaboration/giant/index.php?title=GIANT_consortium_data_files&oldid=1066
- Human HDL and triglyceride GWAS via the Global Lipids Genetics Consortium (GLGC) - https://www.lipidgenetics.org/#data-downloads-title
- Human UKBB data: https://ams.ukbiobank.ac.uk/ams/
- Human SCOOP case data: European Genome-Phenome Archive (EGA; https://ega-archive.org; Study ID: EGAS00001000124; Dataset ID: EGAD00001000432)
- gnomAD webportal dataset: https://gnomad.broadinstitute.org/downloads
- Mouse HypoMap data: https://www.nature.com/articles/s42255-022-00657-y
If you have any questions, please email the corresponding author of the paper Dr Eleanor Raffan er311@cam.ac.uk
To summarise, we studied Labrador retriever dogs kept as pets or working assistance dogs. Only adult dogs (age 1-10 years, mean 6 years) were included, free of known or suspected systemic illness and not being treated with medications likely to affect obesity status. Body fat mass was assessed using a well-validated measure of adiposity, Body Condition Score (BCS), which uses a combination of haptic and visual cues to assign dogs to BCS categories 1-9 according to standardized descriptors.
DNA was extracted from blood or saliva samples, direct genotyping was performed on the CanineHD Genotyping BeadChip (Illumina) array, and data were then imputed to 9.4 million single nucleotide polymorphisms (SNPs). For the GWAS, we retained SNPs that were called with 70% confidence, were called in >95% of dogs, had an allele frequency >5%, and had a Hardy-Weinberg equilibrium test p >0.001%. There were 4.5 million SNPs included in the GWAS.
We performed a GWAS for BCS in 241 Labrador retriever dogs, applying a linear mixed effects model (GCTA MLMA-LOCO). Regression modeling was used to identify factors significantly affecting BCS in the population, which were then included as covariates for the GWAS. These included sex, neuter status, and a sex:neuter status interaction term. We constructed a PRS comprising 16 SNPs weighted for GWAS effect size on BCS using the ‘clumping and thresholding’ technique, and applied the PRS to determine its utility as a predictor of BCS in Labradors and other breeds (FCR, pugs, golden retrievers).
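To make the idea of an effect-size-weighted PRS concrete, the sketch below computes a score as the sum of (weight × allele count) over SNPs. It is an illustration only: the SNP names and weights are invented, missing genotypes are simply skipped (an assumption), and scoring tools may scale or average the sum differently from what is shown here.

```python
def polygenic_score(allele_counts, weights):
    """Sum of effect-size weight x allele count over SNPs with a genotype.

    Each weight must refer to the same allele that is counted in the
    genotype column (the allele named after the underscore).
    """
    score = 0.0
    for snp, beta in weights.items():
        count = allele_counts.get(snp)
        if count is None:
            continue  # assumption: skip missing genotypes
        score += beta * count
    return score

# Invented weights and genotypes, not taken from the study
weights = {"1:12345_A": 0.30, "2:67890_G": -0.15}
dogs = {
    "dog1": {"1:12345_A": 2, "2:67890_G": 0},
    "dog2": {"1:12345_A": 1, "2:67890_G": None},
}
scores = {dog: polygenic_score(counts, weights) for dog, counts in dogs.items()}
```

For the real data, the weights would come from the GWAS effect sizes of the 16 PRS SNPs and the allele counts from the PRS genotype files.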
This data could not have been collected, nor the research conducted, without several collaborative efforts. We are indebted to the owners who volunteered for the study and to the dogs who took part, the Northern England Flatcoated Retriever Association and Kennel Club for helping with recruitment, Guide Dogs UK for sample contributions, and Nai-Cheih Liu for gathering phenotypes in pugs.
We also used data from the Golden Retriever Lifetime Study in this research, data for which is available via their 'Data Commons' website.
