A multi-ethnic epigenome-wide association study of leukocyte DNA methylation and blood lipids

Cite this dataset

Assimes, Themistocles; Jhun, Min A (2021). A multi-ethnic epigenome-wide association study of leukocyte DNA methylation and blood lipids [Dataset]. Dryad.


We examined the association between DNA methylation in circulating leukocytes and blood lipids in a multi-ethnic sample of 16,265 subjects. We identified 148, 35, and 4 novel associations among Europeans, African Americans, and Hispanics, respectively, and an additional 186 novel associations through a trans-ethnic meta-analysis. We observed a high concordance in the direction of effects across racial/ethnic groups, a high correlation of effect sizes between high-density lipoprotein and triglycerides, a modest overlap of associations with epigenome-wide association studies of other cardio-metabolic traits, and a low overlap with lipid loci identified to date through genome-wide association studies. Thirty CpGs reached significance in at least 2 racial/ethnic groups, including 7 that showed association with the expression of an annotated gene. CpGs annotated to CPT1A showed evidence of being influenced by triglyceride levels. DNA methylation levels of circulating leukocytes show robust and consistent association with blood lipid levels across multiple racial/ethnic groups.


A total of 15 cohorts (N=16,265) from the epigenetics working group in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium participated in this study. These included the Old Order Amish (OOA), Atherosclerosis Risk in Communities (ARIC), Bogalusa Heart Study (BHS), Cardiovascular Health Study (CHS), Framingham Heart Study (FHS), Genetic Epidemiology Network of Arteriopathy (GENOA), Genetics of Lipid Lowering Drugs and Diet Network (GOLDN), Hypertension Genetic Epidemiology Network (HyperGEN), Cooperative health research in the Region of Augsburg (KORA), Normative Aging Study (NAS), Prospective Investigation of Vascularity of Uppsala Elders Study (PIVUS), Rotterdam Study (RS), UK Adult Twin Registry (TwinsUK), Women’s Health Initiative Broad Agency Announcement 23 (WHI-BA23), and the Women’s Health Initiative Epigenetic Mechanisms of PM-Mediated CVD (WHI-EMPC) cohorts. Four cohorts, BHS, CHS, WHI-BA23, and WHI-EMPC, examined more than one racial/ethnic group. The total number of cohorts in the European (EA), African (AA), and Hispanic (HA) study populations is 12 (N=11,114), 7 (N=4,452), and 2 (N=699), respectively.

High-density lipoprotein (HDL, mg/dL) and triglycerides (TG, mg/dL) were measured directly in blood samples taken from participants after a fast of at least 8 hours. Low-density lipoprotein (LDL, mg/dL) was estimated with the Friedewald formula in all cohorts except GOLDN, HyperGEN, and KORA, where LDL was measured directly. We did not estimate LDL in subjects with triglycerides > 400 mg/dL, and we excluded lipid measures from subjects who had not fasted for at least 8 hours. We also excluded outliers, defined as values more than 5 standard deviations from the cohort-specific mean of each blood lipid. To reduce skewness, HDL and triglycerides were natural log-transformed.
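The lipid preprocessing described above can be sketched in a few lines of Python. This is an illustration only, not the cohorts' actual code: the function names are hypothetical, the Friedewald formula is written in its standard form (LDL = total cholesterol - HDL - TG/5, all in mg/dL), and the order in which outlier exclusion and log transformation were applied is an assumption, since the description does not state it.

```python
import math
import statistics


def friedewald_ldl(total_chol, hdl, tg):
    """Estimate LDL (mg/dL) with the standard Friedewald formula.

    Returns None when triglycerides exceed 400 mg/dL, where the
    description says LDL was not inferred.
    """
    if tg > 400:
        return None
    return total_chol - hdl - tg / 5.0


def drop_outliers(values, n_sd=5.0):
    """Keep values within n_sd sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


def log_transform(values):
    """Natural log transform, as applied to HDL and triglycerides."""
    return [math.log(v) for v in values]
```

For example, `friedewald_ldl(200, 50, 150)` gives an estimated LDL of 120 mg/dL, while a triglyceride value of 401 mg/dL yields `None`.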

DNA methylation data were generated independently by investigators from each cohort. Methylation levels were measured in peripheral blood leukocytes isolated from whole blood in all studies except GOLDN, where only CD4+ T cells were examined. The EZ DNA Methylation Gold Kit (Zymo Research, Orange, CA) was used for bisulfite conversion. The Illumina® Infinium HumanMethylation450 BeadChip and the Illumina BeadXpress reader were used to perform the methylation assays. Either the SWAN method in the minfi R package, the Beta Mixture Quantile method (BMIQ), the DASEN method in the wateRmelon R package, or the GenomeStudio® Methylation Module was used for pre-processing and normalization of the data in each cohort. For each CpG site, a beta-value was calculated representing the percent methylation at that CpG site. Any single value with a detection p-value > 0.01 was set to missing. In each cohort, we excluded probes with a detection p-value > 0.01 in more than 5% of samples. In addition, we excluded samples with a detection p-value > 0.01 in more than 5% of probes. To avoid spurious signals in the DNA methylation data, we excluded 29,233 CpGs whose probes co-hybridize to alternate genomic sequences (highly homologous to the intended targets).
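The three detection p-value filters can be illustrated with a small pure-Python sketch. The function name and the list-of-lists layout (rows are probes, columns are samples) are hypothetical, and the sketch assumes that probe and sample failure rates are both computed on the unfiltered matrix; the description does not say whether the cohorts applied the two filters jointly or in sequence.

```python
def qc_filter(betas, detp, p_cut=0.01, max_fail=0.05):
    """Apply detection p-value QC to a probes x samples beta matrix.

    betas, detp: lists of rows (one row per probe, one column per sample).
    Returns (cleaned_betas, keep_probe, keep_sample).
    """
    n_probes = len(detp)
    n_samples = len(detp[0])
    # True where an individual measurement fails the detection threshold.
    fail = [[p > p_cut for p in row] for row in detp]

    # Probes failing in more than max_fail of samples are dropped.
    keep_probe = [sum(row) / n_samples <= max_fail for row in fail]
    # Samples failing in more than max_fail of probes are dropped.
    keep_sample = [
        sum(fail[i][j] for i in range(n_probes)) / n_probes <= max_fail
        for j in range(n_samples)
    ]

    cleaned = []
    for i in range(n_probes):
        if not keep_probe[i]:
            continue
        # Remaining failed values are set to missing (None).
        cleaned.append([
            None if fail[i][j] else betas[i][j]
            for j in range(n_samples)
            if keep_sample[j]
        ])
    return cleaned, keep_probe, keep_sample
```

The returned masks make it easy to report how many probes and samples were removed in each cohort.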

Epigenome-wide association analyses (EWAS) were performed in each cohort, stratified by racial/ethnic group (European, African, and Hispanic). For Model 1, a linear mixed effects model was used to study the association between the DNA methylation level of a CpG (dependent variable) and each of the lipid measures (independent variable; HDL, LDL, or TG), adjusting for age, sex (reference = male), smoking status (never/previous/current, reference = never), lipid medication (Yes or No, reference = No), the top four principal components from genotypes (SNPs), and the proportions of 5 cell types estimated with the Houseman method (CD8 T lymphocytes, CD4 T lymphocytes, natural killer cells, B cells, and monocytes). We added random effects for plate, row, and column. We also included family structure as a random effect in family-based studies. For Model 2, we further adjusted for BMI in addition to the Model 1 covariates. We also ran Models 3 and 4, which were analogous to Models 1 and 2, respectively, in the subset of individuals not taking lipid lowering medication.
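Written out, Model 1 for a single CpG might look like the following. The notation is ours and is only a sketch of the description above: M_i is the methylation beta-value of subject i, the u terms are random effects, and the exact parameterization used by each cohort may differ.

```latex
\begin{aligned}
M_i ={}& \beta_0 + \beta_1\,\mathrm{Lipid}_i + \beta_2\,\mathrm{Age}_i + \beta_3\,\mathrm{Sex}_i
        + \boldsymbol{\beta}_4^{\top}\,\mathrm{Smoking}_i + \beta_5\,\mathrm{LipidMed}_i \\
      & + \sum_{k=1}^{4} \gamma_k\,\mathrm{PC}_{ik} + \sum_{c=1}^{5} \delta_c\,\mathrm{Cell}_{ic}
        + u_{\mathrm{plate}(i)} + u_{\mathrm{row}(i)} + u_{\mathrm{column}(i)} + \varepsilon_i
\end{aligned}
```

Here Lipid is one of HDL, LDL, or TG, and the coefficient of interest is \beta_1. Model 2 adds a BMI term to the fixed effects, and family-based studies add a further random effect for family structure.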

We performed meta-analyses of all participating cohorts (N=16,265) and of each racial/ethnic group separately: European Americans (12 cohorts, N=11,114), African Americans (7 cohorts, N=4,452), and Hispanics (2 cohorts, N=699). These meta-analyses were performed separately for each of the 4 models. We used the random effects meta-analysis implemented in METASOFT, which accounts for heterogeneity of effect sizes across cohorts while achieving statistical power higher than or comparable to that of a fixed effects meta-analysis. To avoid spurious findings from population substructure, we applied genomic control.
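As a point of reference for readers of the results files, the sketch below shows a plain inverse-variance fixed effects meta-analysis together with the usual genomic-control inflation factor. It does not reproduce METASOFT's random effects model. The function names are hypothetical, and applying genomic control by inflating standard errors by the square root of lambda (only when lambda exceeds 1) is an assumption about one common convention, not a statement of how the correction was implemented here.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def genomic_inflation(z_scores):
    """Lambda: median observed chi-square over its expected median."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def gc_adjust_se(se, lam):
    """Inflate a standard error by sqrt(lambda); never deflate (assumed)."""
    return se * math.sqrt(max(lam, 1.0))
```

Two cohorts with effect estimates 0.1 and 0.3 and equal standard errors of 0.1 pool to an estimate of 0.2 with a standard error of about 0.071.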

To identify cis-acting methylation quantitative trait loci (mQTL), we investigated the association between DNA methylation levels and imputed or genotyped SNPs located within 25 kb upstream or downstream of each CpG. For imputed SNP data, we restricted the mQTL analysis to SNPs with good imputation quality (IMPUTE info >= 0.4 or MACH r^2 >= 0.3). Subjects taking lipid lowering medications were excluded from this analysis. Beta-values of DNA methylation levels were inverse-normal transformed and regressed on age, sex, smoking (current/former/never), BMI, at least 4 SNP PCs, cell proportions (WBC count and/or estimated WBC proportions, with granulocytes as the reference), and technical covariates (plate, row, and column as random effects), using a two-sided test. Family information was also included as a random effect if a cohort was a family-based study. We then regressed the residuals on each SNP of interest, stratified by racial/ethnic group. Five of the 15 cohorts provided genetic data for this analysis: ARIC (N_AA=1,717), GOLDN (N_EA=713), KORA (N_EA=1,379), WHI-BA23 (N_EA=790, N_AA=540, and N_HISP=324), and WHI-EMPC (N_EA=494, N_AA=424, and N_HISP=221). We restricted the analysis to SNPs within this 25 kb window that had a minor allele frequency (MAF) > 0.01 in each cohort and implemented a fixed effects meta-analysis within each of the three racial/ethnic groups.
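Two steps of the mQTL workflow lend themselves to a short illustration: the rank-based inverse-normal transformation and the selection of cis SNPs. The sketch below is hypothetical throughout. The function names, the dictionary keys (`chrom`, `pos`, `maf`), and the example coordinates are ours; the rank offset of 0.5 is one of several conventions and is an assumption; and ties are not averaged, which real pipelines usually handle.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Map values to standard normal quantiles by rank.

    Uses (rank - 0.5) / n as the quantile (an assumed offset convention).
    Tied values receive distinct ranks in input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    std_normal = NormalDist()
    for rank, idx in enumerate(order, start=1):
        out[idx] = std_normal.inv_cdf((rank - 0.5) / n)
    return out


def cis_snps(cpg_chrom, cpg_pos, snps, window=25_000, min_maf=0.01):
    """Select SNPs on the CpG's chromosome within +/- window bp, MAF > min_maf."""
    return [
        s for s in snps
        if s["chrom"] == cpg_chrom
        and abs(s["pos"] - cpg_pos) <= window
        and s["maf"] > min_maf
    ]


# Hypothetical example: a CpG at chr11:1,000,000 and three candidate SNPs.
example_snps = [
    {"id": "snpA", "chrom": "11", "pos": 1_010_000, "maf": 0.20},  # kept
    {"id": "snpB", "chrom": "11", "pos": 1_030_000, "maf": 0.20},  # too far
    {"id": "snpC", "chrom": "11", "pos": 995_000, "maf": 0.005},   # too rare
]
selected = cis_snps("11", 1_000_000, example_snps)
```

An imputation-quality filter (IMPUTE info or MACH r^2) would be applied in the same way, as one more condition in the list comprehension.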


Usage notes

Two sets of results files are included. 

Set 1. Txt files provide the results of the epigenome-wide association study for each of 3 lipid traits, 3 racial/ethnic subgroups plus a trans-ethnic meta-analysis, and 4 models. Files are labelled as lipid.model.racial/ethnic_group.GC. The three lipid traits are HDL: high-density lipoprotein; LDL: low-density lipoprotein; TG: triglycerides. The group labels are EA: European ancestry; AA: African ancestry; HA: Hispanic ancestry; ALL_wGOLDN: trans-ethnic. The suffix "GC" indicates that a genomic-control correction has been applied. Each file includes fixed effect and random effect meta-analysis results (p-value, beta, standard error) where more than 1 cohort is available for analysis. Each file also includes heterogeneity statistics where more than 1 cohort is meta-analyzed. Lastly, P values for each study included in the meta-analysis are provided. Please note these P values are followed by M values labelled as NA.
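When working with many Set 1 files at once, it can help to split each file name into its components. The sketch below assumes the four dot-separated fields described above, optionally followed by a file extension. The exact spelling of the model field is not specified here, so it is passed through unchanged, and the example file name is hypothetical.

```python
LIPIDS = {
    "HDL": "high-density lipoprotein",
    "LDL": "low-density lipoprotein",
    "TG": "triglycerides",
}
GROUPS = {
    "EA": "European ancestry",
    "AA": "African ancestry",
    "HA": "Hispanic ancestry",
    "ALL_wGOLDN": "trans-ethnic",
}


def parse_result_name(filename):
    """Split 'lipid.model.group.GC[.ext]' into a dictionary of fields."""
    parts = filename.split(".")
    if len(parts) < 4:
        raise ValueError(f"unexpected file name: {filename}")
    lipid, model, group, gc = parts[:4]
    if lipid not in LIPIDS or group not in GROUPS or gc != "GC":
        raise ValueError(f"unexpected file name: {filename}")
    return {
        "lipid": lipid,
        "model": model,  # kept as-is; its format is not documented here
        "group": group,
        "group_label": GROUPS[group],
        "genomic_control": True,
    }


# Hypothetical file name, for illustration only.
info = parse_result_name("TG.model2.ALL_wGOLDN.GC.txt")
```

Names that do not match the expected pattern raise a `ValueError` rather than being silently mislabelled.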

Set 2. CSV files provide the results of the methylation QTL association meta-analyses for 30 CpGs found to be associated with one of three lipid traits in more than one racial/ethnic group. SNPs tested are those within 25 kb upstream or downstream of each CpG. Both fixed effect and random effect meta-analysis results are provided, as well as heterogeneity results when more than 1 study contributes to the meta-analysis. Each file is labelled by CpG followed by the racial/ethnic group analysed: EA: European ancestry; AA: African ancestry; HA: Hispanic ancestry.



National Heart Lung and Blood Institute, Award: HHSN268201300006C

National Heart Lung and Blood Institute, Award: R01HL105756

National Heart Lung and Blood Institute, Award: R01HL104135

National Institute of Environmental Health Sciences, Award: R01ES020836