Data from: Estimating in-silico causal effects of DNA methylation on gene expression through genetic anchors in airway epithelium from youth with and without asthma
Data files
Mar 25, 2026 version files (1.21 GB total):
- cis-eQTL.nocelltypes.fdr0.01.Original.TPM.csv (6.49 MB)
- cis-eQTM.nocelltypes.fdr0.01.Original.TPM.csv (3.39 MB)
- cis-meQTL.nocelltypes.fdr0.01.Original.TPM.csv (53.99 MB)
- README.md (4.73 KB)
- TableS12rev.csv (362.81 KB)
- TableS13rev.csv (546.81 KB)
- trans-eQTL.nocelltypes.fdr0.01.Original.TPM.csv (235.47 KB)
- trans-eQTM.nocelltypes.fdr0.01.Original.TPM.csv (1.14 GB)
- trans-meQTL.nocelltypes.fdr0.01.Original.TPM.csv (2.43 MB)
Abstract
Dataset DOI: 10.5061/dryad.rxwdbrvpq
Description of the data and file structure
The study used a subset of the participants from the Epigenetic Variation and Childhood Asthma in Puerto Ricans Study (EVA-PR), originally designed to study methylation profiles associated with atopy in asthmatic children [1].
In the current study, we employed Expression Quantitative Trait Methylation (eQTM), Expression Quantitative Trait Locus (eQTL), and Methylation Quantitative Trait Locus (meQTL) analyses to identify potential causal relationships between DNA methylation and gene expression in pediatric asthma, using SNPs as genetic anchors. We tested both cis- and trans-associations and compared our cis-eQTL and cis-meQTL results with those previously reported by Soliai et al. [2] as a replication analysis.
1- E. Forno et al., DNA methylation in nasal epithelium, atopy, and atopic asthma in children: a genome-wide study. The Lancet Respiratory Medicine 7, 336-346 (2019).
2- M. M. Soliai et al., Multi-omics colocalization with genome-wide association studies reveals a context-specific genetic mechanism at a childhood onset asthma risk locus. Genome Med 13, 157 (2021).
Files and variables
Files included in this repository are the full cis- and trans- eQTM, eQTL, and meQTL results with FDR<0.01. Additionally, the repository includes the replication results for our cis-eQTL and cis-meQTL analyses against Soliai et al., which used an FDR threshold of <0.05.
Expression Quantitative Trait Methylation (eQTM) with FDR<0.01:
"cis-eQTM.nocelltypes.fdr0.01.Original.TPM.csv" and "trans-eQTM.nocelltypes.fdr0.01.Original.TPM.csv".
Expression Quantitative Trait Locus (eQTL) with FDR<0.01:
"cis-eQTL.nocelltypes.fdr0.01.Original.TPM.csv" and "trans-eQTL.nocelltypes.fdr0.01.Original.TPM.csv".
Methylation Quantitative Trait Locus (meQTL) with FDR<0.01:
"cis-meQTL.nocelltypes.fdr0.01.Original.TPM.csv" and "trans-meQTL.nocelltypes.fdr0.01.Original.TPM.csv".
Cis-meQTL replication in Soliai et al. with FDR <0.05:
"TableS12rev.csv"
Cis-eQTL replication in Soliai et al. with FDR <0.05:
"TableS13rev.csv"
Description of variables:
"snp" or "snpschr": Genomic position of each SNP in "chromosome:position_assessed.allele" format (hg19).
"gene" or "Gene": Gene symbol from HGNC nomenclature.
"cpg" or "CpG": Methylation probe identifier from the Illumina platform.
"rs" or "Snp": SNP rsID from dbSNP.
"beta": the effect size estimate.
"beta_se": the standard error of the effect size estimate.
"MAF": Minor allele frequency calculated within the study population.
"TSS": Genomic position of the gene's transcription start site (hg19).
"fdr": False discovery rate calculated using all tested pairs in cis- or trans-associations separately.
"statistic": t-statistic.
"dist" or "Distance": Genomic distance between features.
For eQTL: gene (TSS) to SNP distance.
For meQTL: methylation probe to SNP distance.
For eQTM: gene (TSS) to methylation probe distance.
"analysis": indicates whether the results are cis- or trans-associations.
"Ensembl.id": Ensembl id of genes.
"Beta.corrected.EVA.PR": the effect size estimate from our analysis using the EVA-PR cohort.
"slope.Soliai": The effect size estimate from the Soliai et al. study.
"pvalue.EVA.PR": P-value of the association analysis using the EVA-PR cohort.
"pvalue.Soliai": P-value of the association analysis from the Soliai et al. study.
"fdr.EVA.PR": False discovery rate calculated using all tested pairs in cis-eQTL using the EVA-PR cohort.
"fdr.Soliai": False discovery rate reported in Soliai et al.
Missing data are coded as NA.
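As a quick consistency check on the variables above, the t-statistic of a linear regression coefficient is expected to equal the effect size divided by its standard error ("statistic" = "beta" / "beta_se"). The following is a minimal sketch of that check; the function name and the numeric values are illustrative assumptions and are not taken from the data files.

```python
import math

def t_statistic(beta, beta_se):
    """Return beta / beta_se, or None if a value is missing (NA) or the SE is zero."""
    if beta is None or beta_se is None or beta_se == 0:
        return None
    return beta / beta_se

# Illustrative values only (hypothetical row, not from the repository files).
row = {"beta": 0.50, "beta_se": 0.10, "statistic": 5.0}

recomputed = t_statistic(row["beta"], row["beta_se"])
# Allow for rounding in the stored values.
consistent = math.isclose(recomputed, row["statistic"], rel_tol=1e-6)
```

Small differences between the recomputed and stored values would be expected if the files store rounded numbers.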
Code/software
Any software that can read CSV files can be used. Examples include text editors such as Notepad (Windows) and TextEdit (Mac), and spreadsheet software such as OpenOffice.
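Because some of the files are large, it may be convenient to read them row by row rather than loading them whole. The sketch below uses only the Python standard library and assumes the column names "gene", "cpg", "beta", "beta_se", and "fdr" described above; the function name `iter_rows` and the sample rows are hypothetical and serve only as an illustration.

```python
import csv
import io

def iter_rows(handle, numeric_columns=("beta", "beta_se", "fdr")):
    """Yield one dict per CSV row, mapping "NA" to None and parsing numeric columns."""
    for row in csv.DictReader(handle):
        cleaned = {}
        for key, value in row.items():
            if value is None or value in ("NA", ""):
                cleaned[key] = None          # missing data are coded as NA
            elif key in numeric_columns:
                cleaned[key] = float(value)
            else:
                cleaned[key] = value
        yield cleaned

# Made-up rows standing in for a real file; the values are illustrative only.
sample = io.StringIO(
    "gene,cpg,beta,beta_se,fdr\n"
    "GENE1,cg00000001,0.25,0.05,0.001\n"
    "GENE2,cg00000002,NA,NA,0.009\n"
)

rows = list(iter_rows(sample))
# Keep only rows below a stricter, arbitrary FDR cut-off.
strong = [r for r in rows if r["fdr"] is not None and r["fdr"] < 0.005]
```

For a real file, the `io.StringIO` object would be replaced by a file handle, for example `open("cis-eQTM.nocelltypes.fdr0.01.Original.TPM.csv", newline="")`, and the rows consumed lazily instead of collected into a list.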
Human subjects data
All data shared in this repository consist of summary-level statistics and do not contain any individual-level or directly identifiable participant information. The reported results are aggregated across participants and include only statistical measures, thereby minimizing the risk of re-identification.
The human samples were derived from the Epigenetic Variation and Childhood Asthma in Puerto Ricans (EVA-PR) study, which was approved by the institutional review boards of the University of Puerto Rico (Protocol #0160713) and the University of Pittsburgh (Protocol #20050135). Written parental consent and participant assent were obtained for participants younger than 18 years, and written consent was obtained from participants 18 years or older.
Epigenetic Variation and Childhood Asthma in Puerto Ricans Study (EVA-PR) is a case-control study of asthma in subjects aged 9 to 20 years [1]. From February 2014 through May 2017, 543 subjects with (cases, n=269) and without (controls, n=274) asthma were recruited from households in the metropolitan area of San Juan (Puerto Rico) using a multistage probability sampling design. The study protocol included questionnaires on respiratory health, measurement of serum allergen-specific IgEs, and collection of nasal epithelial samples for DNA and RNA extraction.
Asthma was defined as physician-diagnosed asthma and wheeze in the previous year. Control subjects had neither physician-diagnosed asthma nor wheeze in the previous year. Levels of IgEs specific to common allergens were measured in serum using the UniCAP 100 system (Pharmacia & Upjohn, Kalamazoo, MI). For each allergen, an IgE ≥0.35 IU/mL was considered positive. Atopy was defined as having at least one positive specific IgE test result. Atopic asthma was defined as asthma and atopy. Non-atopic control subjects had neither asthma nor atopy.
DNA and RNA were extracted from nasal specimens collected from the inferior turbinate. To account for the potential effects of different cell types, we implemented a protocol in a subset of nasal samples (n=31) to select CD326+ nasal epithelial cells before DNA and RNA extraction. Whole-genome methylation assays were done with HumanMethylation450 BeadChips (Illumina), while RNA-Seq was conducted with the Illumina NextSeq 500 platform (Illumina). Reads were aligned to the reference human genome (hg19).
We defined cis-associations as pairs in which the methylation probe (eQTM) or SNP (eQTL/meQTL) was located within 1 Mb of the gene's transcription start site (eQTM/eQTL) or of the methylation probe (meQTL). Trans-associations included all pairs separated by a distance >1 Mb or located on different chromosomes.
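The cis/trans definition above can be sketched as a small function. This is an illustration under stated assumptions, not the code used in the study: the function name, argument names, and example coordinates are hypothetical, and a pair at exactly 1 Mb is treated as cis here because the text describes cis as "within 1 Mb" and trans as ">1 Mb".

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb window, as stated in the text

def classify_pair(chrom_a, pos_a, chrom_b, pos_b, window=CIS_WINDOW_BP):
    """Label a feature pair as "cis" or "trans" from chromosome and position.

    For eQTL and eQTM, one feature would be the gene TSS; for meQTL, the
    two features would be the SNP and the methylation probe.
    """
    if chrom_a != chrom_b:
        return "trans"  # different chromosomes are always trans
    return "cis" if abs(pos_a - pos_b) <= window else "trans"

# Hypothetical coordinates for illustration.
near = classify_pair("1", 1_500_000, "1", 1_000_000)
far = classify_pair("1", 3_500_000, "1", 1_000_000)
other_chrom = classify_pair("1", 1_500_000, "2", 1_500_000)
```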
For eQTM, we analyzed 14,578 genes and 407,478 CpGs from 455 subjects (219 asthma cases and 236 controls) for whom DNAm and gene expression data were available. We fitted a multiple linear regression model with gene expression as the dependent variable and DNA methylation as the independent variable. The model was adjusted for age, sex, atopic asthma status, the top five principal components from genotypic data, RNA sample sorting protocol (i.e., whole-cells or CD326+ nasal epithelial cells), methylation and RNA-Seq batch, and latent factors that capture data heterogeneity from methylation and RNA-Seq (estimated with the R package sva).
For eQTL, both SNP and gene expression data were available for 469 subjects (228 with asthma and 241 controls). We tested 2,155,144 SNPs and 14,564 eGenes across 23,237. We fitted a multiple linear regression model where gene expression was the dependent variable and SNPs were the independent variable. The covariates included were asthma status, atopy status, age, sex, RNA sample sorting protocol, RNA batch effects, and a latent factor of gene expression estimated by the sva R package.
For meQTL, genome-wide DNA methylation data were available for 455 subjects (219 subjects with asthma and 236 controls). We tested associations between 2,155,144 SNPs and 103,473 eCpGs. We fitted a multiple linear regression model where methylation was the dependent variable and SNPs were the independent variable. The covariates included were asthma status, atopy status, age, sex, methylation batch effect, and a latent factor of methylation estimated using the sva R package.
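To illustrate what a covariate-adjusted effect size means in the models above, the following sketch fits an ordinary least squares regression of a toy outcome on one predictor of interest plus one covariate, using only the Python standard library. It is not the study's implementation: the function names, the single covariate, and all data values are assumptions made for illustration, and the real models include many more covariates.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(y, columns):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + ...; returns [b0, b1, b2, ...]."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y

# Toy data (hypothetical): outcome built exactly as 2 + 3*methylation + 0.1*age.
methylation = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
age = [9.0, 12.0, 15.0, 11.0, 18.0, 20.0]
expression = [2.0 + 3.0 * m + 0.1 * a for m, a in zip(methylation, age)]

intercept, beta_methylation, beta_age = ols(expression, [methylation, age])
```

In this sketch, `beta_methylation` plays the role of the "beta" column: the change in the outcome per unit change in the predictor, holding the covariate fixed.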
All quantitative trait analyses were run with the Matrix eQTL package. False discovery rates (FDRs) were calculated separately for cis- and trans-pairs, which is the default in the Matrix eQTL package.
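The idea of computing the FDR separately for cis- and trans-pairs can be sketched as follows, assuming a Benjamini-Hochberg adjustment applied within each group. The function names and p-values are hypothetical, and this is a standalone illustration rather than the package's own code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def fdr_by_group(pvalues, groups):
    """Apply the adjustment within each group label (e.g. "cis", "trans") separately."""
    result = [None] * len(pvalues)
    for label in set(groups):
        idx = [i for i, g in enumerate(groups) if g == label]
        for i, q in zip(idx, benjamini_hochberg([pvalues[i] for i in idx])):
            result[i] = q
    return result

# Hypothetical p-values for illustration.
pooled = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
split = fdr_by_group([0.01, 0.02, 0.01, 0.02], ["cis", "cis", "trans", "trans"])
```

Note that the "fdr" column in the shared files was calculated using all tested pairs, while the files contain only pairs passing the FDR threshold, so rerunning an adjustment on the file contents alone would not reproduce the reported values.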
cis-eQTL and cis-meQTL replication
For replication, we used the results reported by Soliai et al. [2]. Their study involved 104 adults, of whom 49 had Chronic Rhinosinusitis (CRS) (27 had an asthma diagnosis) and 55 did not have CRS (16 had asthma). They used an upper airway epithelial cell (AEC) culture model to assess the transcriptional and epigenetic responses to rhinovirus (RV), an asthma-promoting pathogen, and conducted cis-eQTL and cis-meQTL analyses. They provided both vehicle and RV results; we used the vehicle results for our replication analysis.
For replication, we checked whether their significant findings were replicated in our EVA-PR cohort. First, we identified all overlapping pairs between the significant (FDR<0.05) pairs from Soliai et al. and the pairs tested in EVA-PR. Because we pruned SNPs and used the 450K DNAm array, not all SNPs and CpGs were tested in EVA-PR. Second, using only the overlapping pairs, we recalculated the FDR of our results and identified significant pairs at FDR-P <0.05. We then checked how many were significant in the same direction of association. To ensure consistency of effect directions, we harmonized the summary statistics by aligning the effect alleles in EVA-PR with the reference alleles reported in Soliai et al. When the effect allele was inverted, we reversed the sign of the effect size in the EVA-PR data before assessing directional concordance.
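The overlap, allele harmonization, and directional concordance steps described above can be sketched as follows. This is a simplified illustration, not the study's code: the record layout, the field names ("effect", "other", "allele"), and all identifiers and values are hypothetical, the FDR recalculation step is omitted, and strand-ambiguous SNPs are not handled.

```python
def harmonize_beta(beta, effect_allele, other_allele, reference_allele):
    """Express beta relative to reference_allele; return None if alleles do not match."""
    if effect_allele == reference_allele:
        return beta
    if other_allele == reference_allele:
        return -beta  # effect allele inverted: reverse the sign
    return None

def same_direction(beta_a, beta_b):
    """True when both effect sizes are non-zero and share the same sign."""
    return beta_a * beta_b > 0

# Hypothetical summary statistics keyed by (SNP, gene) pair.
eva_pr = {
    ("rs0001", "GENE1"): {"beta": 0.30, "effect": "A", "other": "G"},
    ("rs0002", "GENE2"): {"beta": 0.20, "effect": "C", "other": "T"},
    ("rs0003", "GENE3"): {"beta": 0.10, "effect": "G", "other": "A"},
}
soliai = {
    ("rs0001", "GENE1"): {"slope": 0.25, "allele": "A"},
    ("rs0002", "GENE2"): {"slope": 0.15, "allele": "T"},
    ("rs0004", "GENE4"): {"slope": 0.40, "allele": "C"},
}

# Step 1: keep only pairs present in both result sets.
overlap = sorted(set(eva_pr) & set(soliai))

# Step 2: align alleles, then compare the direction of effect.
concordant = []
for key in overlap:
    e, s = eva_pr[key], soliai[key]
    beta = harmonize_beta(e["beta"], e["effect"], e["other"], s["allele"])
    if beta is not None and same_direction(beta, s["slope"]):
        concordant.append(key)
```

In this toy example, the second pair has inverted alleles, so its sign is reversed before comparison and it ends up discordant with the reference slope.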
Citations:
1- E. Forno et al., DNA methylation in nasal epithelium, atopy, and atopic asthma in children: a genome-wide study. The Lancet Respiratory Medicine 7, 336-346 (2019).
2- M. M. Soliai et al., Multi-omics colocalization with genome-wide association studies reveals a context-specific genetic mechanism at a childhood onset asthma risk locus. Genome Med 13, 157 (2021).
