Supplemental material for: Genome-wide association study and fine-mapping using imputed sequences to prioritize candidate genes for 30 complex traits in 50,309 Holstein bulls
Data files
Sep 12, 2025 version files 114.51 MB
- finemapping_summary_stats_dairy_2025.csv (106.48 MB)
- README.md (3.95 KB)
- Supplemental_Figures.pdf (2.91 MB)
- Supplemental_Tables.xlsx (5.12 MB)
Abstract
Identifying causal genetic variants underlying economically important traits in dairy cattle is essential for understanding their genetic basis and optimizing breeding programs. The growing availability of sequenced reference genomes and individuals with both phenotypic and genotypic data notably enhances our ability to detect genetic associations and further pinpoint causal effects. This comprehensive genome-wide association study of dairy cattle utilized de-regressed breeding values as phenotypes and analyzed 11,292,243 quality-controlled, imputed DNA sequence variants from 50,309 Holstein bulls. The number of bulls with available phenotypes ranged from 23,121 to 50,309 across 30 complex traits categorized into production and yield, type, and longevity and health. We performed GWAS using our SLEMM-GWA approach, which accounts for the varying reliability of de-regressed breeding values across individuals and demonstrates computational efficiency for large sample sizes and sequence data. This analysis identified 381 significant association peaks (P < 5E-8), of which 126 represent novel findings. Subsequent Bayesian fine-mapping provided statistical prioritization by assigning posterior conditional inclusion probabilities to individual variants and genes, yielding a list of credible candidate genes—an advancement over conventional GWAS reporting of all proximal genes. This prioritization offered direct statistical support for previously reported genes, and, more importantly, identified credible candidate genes within the 126 newly discovered peaks, including AOPEP, GC, E2F6, MGST1, VPS13B, ZNF652, ASPH, SFMBT1, and MAPRE2. These findings enhance the understanding of the genetic architecture of these complex dairy traits and provide valuable insights for the refinement of genomic selection strategies and breeding programs in Holstein cattle.
https://doi.org/10.5061/dryad.vmcvdnd3q
Description of the data and file structure
Overview
This repository contains supplementary data and fine-mapping summary statistics from a comprehensive genome-wide association study (GWAS) and Bayesian fine-mapping analysis of 30 complex dairy traits in 50,309 Holstein bulls. The study utilized 11,292,243 quality-controlled, imputed sequence variants to identify genetic associations and prioritize candidate genes through statistical fine-mapping.
File: Supplemental_Tables.xlsx
Description: Supplementary tables referenced in the main manuscript.
Contents:
- Supplemental Table S1: Trait definitions and corresponding trait names from Cattle QTLdb used for comparison with previously reported associations across all 30 analyzed dairy traits.
- Supplemental Table S2: Log-likelihood values showing improved model fit when de-regressed PTA reliability information is incorporated into the GWAS models, compared with models without reliability weighting.
- Supplemental Table S3: Genomic inflation factors (λ) for all 30 traits.
- Supplemental Table S4: Number of genome-wide significant peaks per trait (P < 5×10⁻⁸) and count of high-confidence fine-mapped signals.
- Supplemental Table S5: Detailed comparison of all 381 significant association peaks with Cattle QTLdb and previous studies.
- Supplemental Table S6: Number of candidate regions and number of signals (P < 5×10⁻⁵) identified per trait in the fine-mapping analysis.
- Supplemental Table S7: Candidate regions and corresponding SNP counts subjected to fine-mapping analysis across the 30 traits.
- Supplemental Table S8: High-confidence fine-mapped signals.
- Supplemental Table S9: Complete gene-level prioritization results.
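
Supplemental Table S3 reports genomic inflation factors (λ). For readers who want comparable values from their own GWAS P-values, the following is a minimal sketch that assumes the conventional definition: the median of the 1-df chi-square statistics divided by the median of the chi-square distribution with 1 degree of freedom (about 0.4549). The source does not state which formula was used, so treat this as an illustration only; the function name `genomic_inflation` is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(pvalues):
    """Estimate lambda from two-sided P-values (assumed definition, see text).

    Each P-value is converted to a 1-df chi-square statistic via the
    standard normal quantile: chi2 = z**2 with z = Phi^-1(p / 2).
    P-values must lie in (0, 1]; a value of exactly 0 is not supported.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced P-values mimic a well-calibrated test, so lambda is close to 1.
n = 1001
uniform_p = [(i + 0.5) / n for i in range(n)]
lam = genomic_inflation(uniform_p)
```

Values well above 1 would suggest inflated test statistics, while values near 1 are consistent with a well-calibrated test.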
File: Supplemental_Figures.pdf
Description: Supplemental figures referenced in the main manuscript, including Manhattan plots for model comparisons and for all 30 dairy traits.
Contents:
- Supplemental Figure S1: Manhattan plots comparing GWAS results for milk yield (A-D), foot angle (E-H), and livability (I-L) using genomic relationship matrices (GRMs) constructed with different SNP sets: 30K random SNPs (A, E, I), 50K random SNPs (B, F, J), 70K random SNPs (C, G, K), and the 70K SNP chip panel (D, H, L).
- Supplemental Figure S2: Manhattan plots of GWAS results for 30 complex traits in Holsteins.
File: finemapping_summary_stats_dairy_2025.csv
Description: Complete fine-mapping summary statistics from the BFMAP analysis for all significant signals across the 30 dairy traits.
Variables
- signal: Identifier of an independent association signal
- SNPindex: Zero-based integer index of a variant within an association signal
- SNPname: Variant ID
- Chr: Chromosome number (1-29)
- Pos: Physical position on the chromosome in base pairs
- Allele1: Allele 1
- Allele2: Allele 2
- MAF: Minor allele frequency
- HWE_Pval: Hardy-Weinberg equilibrium P-value
- sample_size: Number of individuals with both genotype and phenotype data
- effect: Effect size estimate of a variant conditional on other association signals
- log_sBF: Logarithm of the scaled Bayes factor (H₀: the tested variant has no effect)
- lambda: A number determining the null distribution of scaled Bayes factor (=1)
- Pval: P-value corresponding to H₀
- logProb: Logarithm of P(Data|Model)
- penalty: Logarithm of P(Model), equal to 0 for equal prior of models
- penalized_logProb: Logarithm of posterior probability
- rel_logProb: penalized_logProb of a model relative to the variants-excluded model
- normedProb: Posterior conditional inclusion probability (PCIP)
- R: Genotype correlation between a variant and the lead variant
- Trait: Trait name
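
As an illustration of how the columns above might be used, the sketch below builds a 95% credible set per signal by ranking variants on normedProb (PCIP) and accumulating until the chosen coverage is reached. It rests on assumptions not stated in the README: that PCIPs within one signal sum to approximately 1, and that a signal is uniquely identified by the (Trait, signal) pair. The rows in `TOY_CSV` are synthetic, the trait label `Milk` and SNP names are made up, and `credible_sets` is a hypothetical helper, not part of BFMAP.

```python
import csv
import io
from collections import defaultdict

# Synthetic toy rows (NOT real data) using a subset of the documented columns.
TOY_CSV = """signal,SNPindex,SNPname,Chr,Pos,normedProb,Trait
1,0,snpA,6,1000,0.70,Milk
1,1,snpB,6,1100,0.20,Milk
1,2,snpC,6,1200,0.06,Milk
1,3,snpD,6,1300,0.04,Milk
2,0,snpE,14,500,0.97,Milk
2,1,snpF,14,600,0.03,Milk
"""


def credible_sets(handle, coverage=0.95):
    """Map (Trait, signal) to the smallest top-ranked variant list whose
    cumulative normedProb reaches `coverage`."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        key = (row["Trait"], row["signal"])  # assumed unique signal key
        groups[key].append((float(row["normedProb"]), row["SNPname"]))

    result = {}
    for key, variants in groups.items():
        # Rank variants from highest to lowest PCIP.
        variants.sort(key=lambda v: v[0], reverse=True)
        chosen, total = [], 0.0
        for pcip, name in variants:
            chosen.append(name)
            total += pcip
            if total >= coverage:
                break
        result[key] = chosen
    return result


sets = credible_sets(io.StringIO(TOY_CSV))
for key, names in sorted(sets.items()):
    print(key, names)
```

To run this against the real file, one could pass `open("finemapping_summary_stats_dairy_2025.csv", newline="")` in place of the in-memory toy data. Reading row by row with the standard `csv` module keeps memory use modest for a file of roughly 106 MB.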
