Data from: An integrative approach to prioritize candidate causal genes for complex traits in cattle
Data files
Jun 04, 2025 version files 16.75 MB
- Polar_Lipids_Phenotypes_HDGenotypes.zip (16.75 MB)
- README.md (1.38 KB)
Abstract
Genome-wide association studies (GWAS) have identified many quantitative trait loci (QTL) associated with complex traits, predominantly in non-coding regions, which makes it difficult to pinpoint the causal variants and their target genes. Three types of evidence can help identify the gene through which a QTL acts: (1) proximity to the most significant GWAS variant, (2) correlation of gene expression with the trait, and (3) the gene’s physiological role in the trait. However, it remains uncertain how well these methods identify the correct genes. Here we test the ability of these methods in a comparatively simple series of traits associated with the concentration of polar lipids in milk.
We conducted single-trait GWAS for ~14 million imputed variants and 56 individual milk polar lipid (PL) phenotypes in 336 cows. A meta-analysis of multi-trait GWAS identified 10,063 significant SNPs at FDR ≤ 10% (P ≤ 7.15E-5). Transcriptome data from blood (~12.5K genes, 143 cows) and mammary tissue (~12.2K genes, 169 cows) were analysed using the genetic score omics regression (GSOR) method. This method links observed gene expression to genetically predicted phenotypes and was used to find associations between gene expression and the 56 PL phenotypes. GSOR identified 2,186 genes in blood and 1,404 in mammary tissue associated with at least one PL phenotype (FDR ≤ 1%). We partitioned the genome into non-overlapping windows of 100 Kb to test for overlap between GSOR-identified genes and GWAS signals. The overlap between the two datasets was significant: GSOR-significant genes were more likely to be located within 100 Kb windows that have GWAS signals than within windows without them (P = 0.01; odds ratio = 1.47). These windows included 70 significant genes expressed in mammary tissue and 95 in blood. Compared to all expressed genes in each tissue, these genes were enriched for lipid metabolism gene ontology (GO) terms. Specifically, 7 of the 70 significant mammary transcriptome genes (P < 0.01; odds ratio = 3.98) and 5 of the 95 significant blood genes (P < 0.10; odds ratio = 2.24) were annotated with lipid metabolism GO terms. The candidate causal genes include DGAT1, ACSM5, SERINC5, ABHD3, CYP2U1, PIGL, ARV1, SMPD5, and NPC2, with some overlap between the two tissues.
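The window-based overlap test described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the choice to represent each gene by a single base-pair position, and the toy inputs are all assumptions made for illustration. The sketch bins positions into non-overlapping 100 Kb windows, builds the 2x2 table of GSOR status against GWAS-signal status, and returns the odds ratio.

```python
WINDOW_SIZE = 100_000  # 100 Kb windows, as stated in the abstract


def window_of(chrom, pos, size=WINDOW_SIZE):
    """Return the (chromosome, window index) key for a 1-based bp position."""
    return (chrom, (pos - 1) // size)


def overlap_odds_ratio(genes, gsor_hits, gwas_snps, size=WINDOW_SIZE):
    """Cross-tabulate genes by GSOR significance and GWAS-signal window.

    genes:     dict of gene name -> (chrom, pos); one position per gene
               is an assumption of this sketch.
    gsor_hits: set of gene names called significant by GSOR.
    gwas_snps: iterable of (chrom, pos) for significant GWAS SNPs.
    Returns ((a, b, c, d), odds_ratio), where a = hit and in a signal
    window, b = hit only, c = signal window only, d = neither.
    """
    signal_windows = {window_of(ch, p, size) for ch, p in gwas_snps}
    a = b = c = d = 0
    for gene, (chrom, pos) in genes.items():
        in_signal = window_of(chrom, pos, size) in signal_windows
        is_hit = gene in gsor_hits
        if is_hit and in_signal:
            a += 1
        elif is_hit:
            b += 1
        elif in_signal:
            c += 1
        else:
            d += 1
    # The odds ratio is undefined when an off-diagonal cell is empty.
    odds = (a * d) / (b * c) if b and c else float("inf")
    return (a, b, c, d), odds
```

An odds ratio above 1 means that GSOR-significant genes fall in GWAS-signal windows more often than the remaining genes do. A significance test on the same table (for example Fisher's exact test) would be a separate step.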
The overlap between GWAS, GSOR, and GO analyses suggests that together these methods can identify genes mediating QTL, though their power remains limited, as reflected by modest odds ratios. Larger sample sizes would enhance the power of these analyses, but issues like linkage disequilibrium would remain.
https://doi.org/10.5061/dryad.bcc2fqzph
Description of the data and file structure
This dataset accompanies the paper “An integrative approach to prioritize candidate causal genes for complex traits in cattle” and becomes publicly available upon the paper's acceptance. A Read_Me.txt file is also included in the zipped data file.
Genotype File
The genotype data consist of three PLINK binary files: GenotypesHD.bed, GenotypesHD.bim, and GenotypesHD.fam.
These files include high-density (HD) imputed genotypes for 336 cows, based on Run7 of the 1000 Bull Genomes Project.
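As a quick sanity check on the three PLINK files, the sketch below parses the text-based .fam and .bim files and computes the size a matching .bed file should have. It relies only on the standard PLINK 1 binary layout (six whitespace-separated columns per .fam and .bim line; a 3-byte header followed by one block of ceil(samples / 4) bytes per variant in the .bed file). The function names are hypothetical, and the sketch assumes the files follow that standard layout.

```python
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 .bed header, variant-major mode


def parse_fam(lines):
    """Return a list of (FID, IID) tuples from .fam lines (6 columns each)."""
    samples = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"expected 6 .fam columns, got {len(fields)}")
        samples.append((fields[0], fields[1]))
    return samples


def parse_bim(lines):
    """Return a list of (chrom, variant_id, bp_position) from .bim lines."""
    variants = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        chrom, variant_id, _cm, bp, _allele1, _allele2 = fields
        variants.append((chrom, variant_id, int(bp)))
    return variants


def expected_bed_size(n_samples, n_variants):
    """Bytes in a variant-major .bed file: 3-byte header + packed genotypes."""
    bytes_per_variant = (n_samples + 3) // 4  # four genotypes per byte
    return len(BED_MAGIC) + n_variants * bytes_per_variant
```

For example, passing an open `GenotypesHD.fam` file handle to `parse_fam` should yield 336 sample entries if the file matches the description above, and the size of `GenotypesHD.bed` on disk can then be compared with `expected_bed_size(336, n_variants)`.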
Phenotype File
The PhenoPLs.csv file contains the phenotypes of 56 milk polar lipid species measured in the 336 cows.
This file includes a header row with information about the group each phenotype belongs to.
The polar lipid groups represented are:
- Phosphatidylcholine (PC)
- Phosphatidylethanolamine (PE)
- Phosphatidylserine (PS)
- Phosphatidylinositol (PI)
- Sphingomyelin (SM)
- Lactosylceramide (LacCer)
- Glucosylceramide (GluCer)
Note: Outliers have not been removed from the phenotype data.
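Because outliers are left in the phenotype file, users may want to screen each phenotype column before analysis. The sketch below shows one common convention, Tukey's fences (values more than 1.5 interquartile ranges outside the quartiles). It is not the authors' procedure: the rule, the threshold `k`, and the helper names are assumptions for illustration, and the exact column layout of PhenoPLs.csv should be checked against the Read_Me.txt file before applying it.

```python
import math
import statistics


def to_float(cell):
    """Convert a CSV cell to float; blanks, text, and NaN become None."""
    try:
        value = float(cell)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(value) else value


def tukey_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    None entries (missing values) are ignored when computing the
    quartiles and are never flagged.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 4:
        return []  # too few observations for meaningful quartiles
    q1, _median, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [
        i for i, v in enumerate(values)
        if v is not None and not (low <= v <= high)
    ]
```

Flagged indices refer to positions in the input list, so they can be mapped back to cow IDs when the column is read in file order.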
Fixed Effects File
The FixedEffectPLs.txt file provides fixed effects data, including the combined effect of year and batch (with 9 levels from B1 to B9) for the 336 cows.
The first two columns represent the Family ID (FID) and Individual ID (IID).
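To use the year-batch effect as a covariate, the nine levels are typically expanded into indicator columns. The sketch below is a minimal illustration under stated assumptions: the file is whitespace-delimited, the third column holds the level label (B1 to B9), and B1 is taken as the reference level. None of these details is confirmed by the description above, and the function names are hypothetical.

```python
LEVELS = [f"B{i}" for i in range(1, 10)]  # B1 .. B9, as described above


def parse_fixed_effects(lines, levels=LEVELS):
    """Map (FID, IID) -> batch level from whitespace-delimited lines.

    Assumes the level label is in the third column. Rows whose third
    field is not a known level (for example a header row) are skipped.
    """
    effects = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[2] not in levels:
            continue
        effects[(fields[0], fields[1])] = fields[2]
    return effects


def dummy_code(level, levels=LEVELS):
    """Reference-code one level: the first level is the baseline (all zeros)."""
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    return [1 if level == other else 0 for other in levels[1:]]
```

Keying the result by (FID, IID) makes it straightforward to reorder the covariate rows to match the sample order in GenotypesHD.fam.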