Data from: Correcting for cell-type effects in DNA methylation studies: reference-based method outperforms latent variable approaches in empirical studies
Data files
Version of Jan 04, 2018: 8 files, 4.57 GB in total
- CellType_Means.csv.gz (1.08 GB)
- README_for_CellType_Means.csv.xlsx (24.56 KB)
- Sample1.tar.gz (647.80 MB)
- Sample2.tar.gz (680.40 MB)
- Sample3.tar.gz (587.37 MB)
- Sample4.tar.gz (490.20 MB)
- Sample5.tar.gz (515.02 MB)
- Sample6.tar.gz (563 MB)
Abstract
Based on an extensive simulation study, McGregor and colleagues recently recommended the use of surrogate variable analysis (SVA) to control for the confounding effects of cell-type heterogeneity in DNA methylation association studies in scenarios where no cell-type proportions are available. As their recommendation was mainly based on simulated data, we sought to replicate their findings in two large-scale empirical studies. In our empirical data, SVA did not fully correct for cell-type effects, its performance was somewhat unstable, and it risked missing true signals because it can remove variation that may be linked to actual disease processes. By contrast, a reference-based correction method performed well and did not show these limitations. A disadvantage of this approach is that if reference methylomes are not (publicly) available, they will need to be generated once for a small set of samples. However, given the notable risk of cell-type confounding we observed, we argue that this investment may well be worth making to avoid introducing false-positive findings into the literature.
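To illustrate the general idea behind reference-based correction, the sketch below estimates the cell-type proportions of one bulk sample from a matrix of per-cell-type mean methylation values (the kind of information a reference such as CellType_Means.csv.gz provides). It is a generic, simplified formulation, not necessarily the exact method used in this study: it finds proportions that are non-negative and sum to one by projected gradient descent on the least-squares fit. The function names `project_to_simplex` and `estimate_proportions` are hypothetical, and the reference matrix here is synthetic. Loading the real file is left out because its column layout (documented in README_for_CellType_Means.csv.xlsx) is not reproduced on this page.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]                      # values sorted in descending order
    css = np.cumsum(u)                        # running sums of the sorted values
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]  # last index where the condition holds
    theta = (css[rho] - 1) / (rho + 1)        # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)


def estimate_proportions(reference: np.ndarray, sample: np.ndarray,
                         n_iter: int = 5000) -> np.ndarray:
    """Hypothetical helper: proportions w minimising ||reference @ w - sample||^2
    subject to w >= 0 and sum(w) = 1.

    reference: (n_cpgs, n_cell_types) mean methylation per cell type
    sample:    (n_cpgs,) methylation values of one bulk sample at the same CpGs
    """
    gram = reference.T @ reference
    step = 1.0 / np.linalg.eigvalsh(gram).max()  # 1/L, L = Lipschitz constant of the gradient
    n_types = reference.shape[1]
    w = np.full(n_types, 1.0 / n_types)          # start from equal proportions
    for _ in range(n_iter):
        grad = reference.T @ (reference @ w - sample)  # gradient of 0.5 * squared residual
        w = project_to_simplex(w - step * grad)        # step, then enforce the constraints
    return w


# Synthetic example (assumed data, not from this dataset): 200 CpGs, 3 cell types.
rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 1.0, size=(200, 3))
true_w = np.array([0.6, 0.3, 0.1])
bulk = reference @ true_w                        # noise-free mixture of the cell types
print(np.round(estimate_proportions(reference, bulk), 3))
```

In a typical reference-based workflow, proportions estimated this way for every sample are then added as covariates to the per-CpG association model, so that differences in cell-type composition are not mistaken for disease-related methylation differences.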