Dryad

Data from: An integrative approach to prioritize candidate causal genes for complex traits in cattle

Data files

Jun 04, 2025 version: 16.75 MB of files

Abstract

Genome-wide association studies (GWAS) have identified many quantitative trait loci (QTL) associated with complex traits. Most of these QTL lie in non-coding regions, which makes it difficult to pinpoint the causal variants and their target genes. Three types of evidence can help identify the gene through which a QTL acts: (1) proximity to the most significant GWAS variant, (2) correlation of gene expression with the trait, and (3) the gene’s physiological role in the trait. However, it remains uncertain how reliably these methods identify the correct genes. Here we test them on a comparatively simple set of traits: the concentrations of polar lipids in milk.

We conducted single-trait GWAS for ~14 million imputed variants and 56 individual milk polar lipid (PL) phenotypes in 336 cows. A multi-trait meta-analysis of these GWAS identified 10,063 significant SNPs at FDR ≤ 10% (P ≤ 7.15E-5). Transcriptome data from blood (~12.5K genes, 143 cows) and mammary tissue (~12.2K genes, 169 cows) were analysed using the genetic score omics regression (GSOR) method, which links observed gene expression to genetically predicted phenotypes, to find associations between gene expression and the 56 PL phenotypes. GSOR identified 2,186 genes in blood and 1,404 in mammary tissue associated with at least one PL phenotype (FDR ≤ 1%). To test for overlap between GSOR-identified genes and GWAS signals, we partitioned the genome into non-overlapping 100 Kb windows. The overlap was significant: GSOR-significant genes were more likely to be located in 100 Kb windows containing GWAS signals than in windows without them (P = 0.01; odds ratio = 1.47). These windows included 70 significant genes expressed in mammary tissue and 95 in blood. Compared with all expressed genes in each tissue, these genes were enriched for lipid metabolism gene ontology (GO) terms: 7 of the 70 significant mammary genes (P < 0.01; odds ratio = 3.98) and 5 of the 95 significant blood genes (P < 0.10; odds ratio = 2.24) were annotated with lipid metabolism GO terms. The candidate causal genes include DGAT1, ACSM5, SERINC5, ABHD3, CYP2U1, PIGL, ARV1, SMPD5, and NPC2, some of which were identified in both tissues.
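The window-based overlap test described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' pipeline. The abstract does not name the statistical test, so a one-sided Fisher's exact test on a 2×2 table is assumed here. Each gene is also assumed to be represented by a single coordinate (for example, its start position). The function names and the toy coordinates are hypothetical.

```python
from math import comb

WINDOW = 100_000  # 100 Kb non-overlapping windows, as in the abstract


def window_key(chrom, pos, size=WINDOW):
    """Map a 1-based base-pair position to a non-overlapping window ID."""
    return (chrom, (pos - 1) // size)


def two_by_two(genes, gwas_snps, size=WINDOW):
    """Cross-tabulate genes by significance and by GWAS-signal window.

    genes: iterable of (chrom, pos, is_significant). One position per gene
           is an ASSUMPTION (e.g. the gene start), not the authors' rule.
    gwas_snps: iterable of (chrom, pos) for significant GWAS SNPs.
    Returns (a, b, c, d):
        a = significant genes in windows with a GWAS signal
        b = significant genes in windows without a signal
        c = non-significant genes in windows with a signal
        d = non-significant genes in windows without a signal
    """
    signal_windows = {window_key(ch, p, size) for ch, p in gwas_snps}
    a = b = c = d = 0
    for ch, p, sig in genes:
        in_signal = window_key(ch, p, size) in signal_windows
        if sig and in_signal:
            a += 1
        elif sig:
            b += 1
        elif in_signal:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); infinite if b*c is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test P-value for enrichment (OR > 1).

    ASSUMED test. Sums hypergeometric probabilities of all tables with
    the same margins and at least `a` significant genes in signal windows.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    upper = min(row1, col1)
    return sum(
        comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1)
    ) / total


# Hypothetical toy data, not from the study.
genes = [
    ("chr14", 150_000, True),   # same window as a GWAS SNP
    ("chr14", 180_000, False),  # same window as a GWAS SNP
    ("chr14", 950_000, True),   # window without a GWAS SNP
    ("chr5", 20_000, False),    # same window as a GWAS SNP
    ("chr5", 250_000, False),   # window without a GWAS SNP
]
snps = [("chr14", 120_500), ("chr5", 40_000)]

table = two_by_two(genes, snps)
print(table, odds_ratio(*table), fisher_greater(*table))
```

The same `odds_ratio` and `fisher_greater` helpers could in principle be reused for the GO comparison (lipid-metabolism genes among significant genes versus among all expressed genes). However, the abstract does not report the background counts, so the published P-values and odds ratios cannot be recomputed from it alone.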

The overlap between the GWAS, GSOR, and GO analyses suggests that, together, these methods can identify genes mediating QTL. However, their power remains limited, as the modest odds ratios show. Larger sample sizes would increase the power of these analyses, but challenges such as linkage disequilibrium would remain.