Genetic architecture of alcohol consumption identified by a genotype-stratified GWAS, and impact on esophageal cancer risk in Japanese people
Data files
Oct 24, 2023 version files 1.32 GB
- 1_Alcohol_intake_Unstratified.tsv.gz
- 2_Alcohol_intake_rs671_GA.tsv.gz
- 3_Alcohol_intake_rs671_GG.tsv.gz
- 4_Alcohol_intake_rs671_Interaction.tsv.gz
- 5_Drinking_Unstratified.tsv.gz
- 6_Drinking_rs671_GA.tsv.gz
- 7_Drinking_rs671_GG.tsv.gz
- 8_Drinking_rs671_Interaction.tsv.gz
- header_definition_20230615.xlsx
- README.md
Dec 04, 2023 version files 14.17 GB
- 1_Alcohol_intake_Unstratified.tsv.gz
- 2_Alcohol_intake_rs671_GA.tsv.gz
- 3_Alcohol_intake_rs671_GG.tsv.gz
- 4_Alcohol_intake_rs671_Interaction.tsv.gz
- 5_Alcohol_intake_rs671_JMA.tsv.gz
- 6_Drinking_Unstratified.tsv.gz
- 7_Drinking_rs671_GA.tsv.gz
- 8_Drinking_rs671_GG.tsv.gz
- 9_Drinking_rs671_Interaction.tsv.gz
- 10_Drinking_rs671_JMA.tsv.gz
- 11_EC_rs671_StratifiedAnalyses.csv
- 12_EC_rs671_RERI.csv
- ade2780_header_definition.xlsx
- README.md
Jan 02, 2024 version files 14.17 GB
- 1_Alcohol_intake_Unstratified.tsv.gz
- 2_Alcohol_intake_rs671_GA.tsv.gz
- 3_Alcohol_intake_rs671_GG.tsv.gz
- 4_Alcohol_intake_rs671_Interaction.tsv.gz
- 5_Alcohol_intake_rs671_JMA.tsv.gz
- 6_Drinking_Unstratified.tsv.gz
- 7_Drinking_rs671_GA.tsv.gz
- 8_Drinking_rs671_GG.tsv.gz
- 9_Drinking_rs671_Interaction.tsv.gz
- 10_Drinking_rs671_JMA.tsv.gz
- 11_EC_rs671_StratifiedAnalyses.csv
- 12_EC_rs671_RERI.csv
- ade2780_header_definition.xlsx
- README.md
Abstract
An East Asian-specific variant on aldehyde dehydrogenase 2 (ALDH2 rs671, G>A) is the major genetic determinant of alcohol consumption. We performed an rs671 genotype-stratified genome-wide association study meta-analysis of alcohol consumption in 175,672 Japanese individuals to explore gene-gene interactions with rs671 behind drinking behavior. The analysis identified three genome-wide significant loci (GCKR, KLB, and ADH1B) in wild-type homozygotes and six (GCKR, ADH1B, ALDH1B1, ALDH1A1, ALDH2, and GOT2) in heterozygotes, with five showing genome-wide significant interaction with rs671. Genetic correlation analyses revealed ancestry-specific genetic architecture in heterozygotes. Of the discovered loci, four (GCKR, ADH1B, ALDH1A1, and ALDH2) were suggested to interact with rs671 in the risk of esophageal cancer, a representative alcohol-related disease. Our results identify the genotype-specific genetic architecture of alcohol consumption and reveal its potential impact on alcohol-related disease risk.
README: Title of Dataset
Full summary statistics of GWAS meta-analysis and JMA for daily alcohol intake and drinking status, alongside aggregated data from individual studies for GWAS, JMA, and esophageal cancer case-control study
We performed an rs671 genotype-stratified GWAS meta-analysis of alcohol consumption based on the Japanese Consortium of Genetic Epidemiology studies (J-CGE), the Nagahama Study, and the BBJ Study. The J-CGE consisted of the following Japanese population-based and hospital-based studies: the HERPACC Study, the J-MICC Study, the JPHC Study, and the TMM Study. Individual study descriptions and an overview of the characteristics of the study populations are provided in the Supplementary Information and table S1. To validate the stratified approach, we applied a joint meta-analysis (JMA). Further, to validate the impact of the discovered loci on alcohol-related disease, we performed a meta-analysis of two esophageal cancer case-control studies within the HERPACC Study and the BBJ Study.
Description of the data and file structure
12 files listed below are included.
├── 1_Alcohol_intake_Unstratified.tsv.gz
├── 2_Alcohol_intake_rs671_GA.tsv.gz
├── 3_Alcohol_intake_rs671_GG.tsv.gz
├── 4_Alcohol_intake_rs671_Interaction.tsv.gz
├── 5_Alcohol_intake_rs671_JMA.tsv.gz
├── 6_Drinking_Unstratified.tsv.gz
├── 7_Drinking_rs671_GA.tsv.gz
├── 8_Drinking_rs671_GG.tsv.gz
├── 9_Drinking_rs671_Interaction.tsv.gz
├── 10_Drinking_rs671_JMA.tsv.gz
├── 11_EC_rs671_StratifiedAnalyses.csv
├── 12_EC_rs671_RERI.csv
Each file is tab-delimited, with header information described in ade2780_header_definition.xlsx.
Files including 'Alcohol_intake' in their filename contain meta-analysis summary statistics and aggregated data from individual studies on daily alcohol intake. Files with 'Drinking' in their filename include meta-analysis summary statistics and aggregated data from individual studies on drinking status (never vs. ever). Each set is categorized into 'Unstratified', 'rs671_GA', 'rs671_GG', 'Interaction', and 'JMA'. 'Unstratified' refers to the overall analysis not stratified by the rs671 genotype. 'rs671_GA' and 'rs671_GG' are analyses restricted to subjects with rs671 GA and GG genotypes, respectively. 'Interaction' involves evaluating the interaction between each SNP and rs671, excluding subjects with the rs671 AA genotype. 'JMA' pertains to the Joint Meta-Analysis. The values under the header names (SNP, CHR, POS, EA, NEA, EAF, BETA, SE, P, HetP, and N for 'Unstratified', 'rs671_GA', 'rs671_GG', and 'Interaction'; SNP, CHR, POS, EA, NEA, BETA_SNP, SE_SNP, BETA_Int, SE_Int, COV_SNP_Int, Pc, HetP, and N for 'JMA') represent meta-analysis summary statistics. Values under any other header names are aggregated data.
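As a rough illustration of the layout described above, the sketch below reads a tab-delimited summary-statistics table and keeps the rows that reach genome-wide significance. The column names come from the header list above; the function names, the numeric casting, the handling of non-numeric cells, and the placeholder SNP IDs in the sample are assumptions made for the example, not part of the dataset.

```python
import csv
import gzip
import io

# Columns cast to float; names follow the README header list.
NUMERIC_COLUMNS = ("EAF", "BETA", "SE", "P", "HetP")


def open_sumstats(path):
    """Open a gzip-compressed summary-statistics file in text mode."""
    return gzip.open(path, "rt", newline="")


def significant_rows(handle, p_threshold=5e-8):
    """Yield rows (as dicts) whose P value is below p_threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        for key in NUMERIC_COLUMNS:
            if key in row:
                try:
                    row[key] = float(row[key])
                except ValueError:
                    row[key] = None  # assumption: non-numeric cells mean "missing"
        if row.get("P") is not None and row["P"] < p_threshold:
            yield row


# Tiny in-memory sample with placeholder SNP IDs (not real results).
SAMPLE = (
    "SNP\tCHR\tPOS\tEA\tNEA\tEAF\tBETA\tSE\tP\tHetP\tN\n"
    "snp_a\t1\t1000\tA\tG\t0.30\t0.05\t0.008\t4e-10\t0.52\t150000\n"
    "snp_b\t2\t2000\tT\tC\t0.12\t0.01\t0.009\t0.27\t0.81\t150000\n"
)
hits = list(significant_rows(io.StringIO(SAMPLE)))
print([row["SNP"] for row in hits])
```

For a downloaded file, the same generator could be fed with `open_sumstats("1_Alcohol_intake_Unstratified.tsv.gz")`; streaming row by row avoids loading a multi-gigabyte table into memory.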
Files with 'EC' in their filename consist of aggregated data from individual studies for esophageal cancer case-control studies. Patterns are denoted as 'StratifiedAnalyses' and 'RERI'. 'StratifiedAnalyses' includes unstratified and rs671-stratified analyses, while 'RERI' examines the interaction between each SNP and rs671, excluding subjects with the rs671 AA genotype.
Sharing/Access information
Data and materials availability: The full meta-analysis summary statistics for GWAS and JMA, as well as aggregated data from individual studies for GWAS, JMA, and the esophageal cancer case-control study, are accessible at Dryad (DOI: 10.5061/dryad.tmpg4f546). The individual-level genotype or phenotype data cannot be made available due to restrictions imposed by the ethics approval.
Code/Software
GWAS
Quality control and genotype imputation: Quality control for samples and SNPs was performed based on study-specific criteria (table S2). Genotype data in each study were imputed separately based on the 1000 Genomes Project reference panel (Phase 3, all ethnicities) (1). Phasing was performed with the use of SHAPEIT (v2) (2) and Eagle (3), and imputation was performed using minimac3 (4), minimac4, or IMPUTE2 (5). Information on the study-specific genotyping, imputation, quality control, and analysis tools is provided in table S2. After genotype imputation, further quality control was applied to each study. SNPs with an imputation quality of r² < 0.3 for minimac3 or minimac4, an info score < 0.4 for IMPUTE2, or an MAF < 0.01 were excluded.
Association analysis of SNPs with daily alcohol intake and drinking status: Association analysis of SNPs with daily alcohol intake and drinking status was performed on three different subject groups: the entire population, subjects with the rs671 GG genotype only, and subjects with the rs671 GA genotype only. Because the number of ever drinkers with the rs671 AA genotype was too small (table S3), association analysis restricted to subjects with the rs671 AA genotype was not conducted. Daily alcohol intake was base-2 log-transformed (log2(grams/day + 1)). The association of daily alcohol intake with SNP allele dose for each study was assessed by linear regression analysis with adjustment for age, age², sex, and the first 10 principal components. For the BBJ Study, the affection status of 47 diseases was further added as covariates. The association of drinking status with SNP allele dose for each study was assessed by logistic regression analysis with adjustment for age, age², sex, the first 10 principal components, and disease affection status of 47 diseases (for the BBJ Study). The effect sizes and standard errors estimated in the association analysis were used in the subsequent meta-analysis. The association analysis was conducted using EPACTS (http://genome.sph.umich.edu/wiki/EPACTS), SNPTEST (6), or PLINK2 (7). Association analysis, including interaction terms, was performed to evaluate the differential effects of each SNP on daily alcohol intake and drinking status between the GG and GA genotypes of rs671. Carriers of the AA genotype were excluded from the analysis. The effect size of the interaction term, β_interaction, and its standard error estimated in the association analysis were used in the subsequent meta-analysis. The association analysis, including the interaction term, was conducted using PLINK2 (7). To identify studies with inflated GWAS significance, which can result from population stratification, we computed the intercept from LDSC (8).
Before the meta-analysis, all study-specific results in the association analysis were corrected by multiplying the standard error of the effect size by the value of intercept from LDSC if the intercept of that study was greater than 1.
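The correction rule stated above can be sketched in a few lines. This is a minimal illustration that follows the wording of the text (the standard error is multiplied by the intercept itself, and only when the intercept exceeds 1); the function name and the example values are hypothetical.

```python
def correct_standard_errors(standard_errors, ldsc_intercept):
    """Inflate per-SNP standard errors of one study by its LDSC intercept.

    The correction is applied only when the intercept is greater than 1,
    as described in the text; otherwise the values are returned unchanged.
    """
    factor = ldsc_intercept if ldsc_intercept > 1.0 else 1.0
    return [se * factor for se in standard_errors]


# Hypothetical study with an intercept of 1.05: standard errors grow by 5%.
print(correct_standard_errors([0.010, 0.020], 1.05))
# Hypothetical study with an intercept of 0.98: nothing changes.
print(correct_standard_errors([0.010, 0.020], 0.98))
```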
Meta-analysis: The meta-analysis was performed with all Japanese subjects in the six cohorts (table S1). The results of association analyses for each SNP across the studies were combined with METAL software (9) by the fixed-effects inverse-variance-weighted method. Heterogeneity of effect sizes was assessed by I² and Cochran’s Q statistic. The meta-analysis included SNPs for which genotype data were available from at least three studies with a total sample size of at least 20,000 individuals for unstratified GWAS or interaction GWAS, or at least 10,000 individuals for rs671-stratified GWAS. The genome-wide significance level α was set to a P value < 5 × 10⁻⁸. P values below 1.0 × 10⁻³⁰⁰ were calculated with the R package Rmpfr. To assess the inflation of the test statistics for the meta-analysis, we computed the genomic inflation factor and intercept from LDSC (10).
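The fixed-effects inverse-variance-weighted combination, together with Cochran's Q and I², can be written out as follows. This is a generic textbook sketch and not METAL's implementation; the function names and input values are invented for illustration, and the inclusion check simply encodes the thresholds quoted above.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance-weighted meta-analysis of one SNP."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    # Cochran's Q and I-squared (in percent) for between-study heterogeneity.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "q": q, "i2": i2}


def passes_inclusion(n_studies, total_n, min_total_n):
    """At least three studies and a minimum total sample size."""
    return n_studies >= 3 and total_n >= min_total_n


# Made-up per-study estimates for a single SNP.
result = ivw_meta([0.10, 0.12, 0.08], [0.02, 0.03, 0.025])
print(result)
print(passes_inclusion(n_studies=3, total_n=25000, min_total_n=20000))
```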
JMA
We used the JMA approach (11,12). The JMA jointly tests the SNP main effect (β_SNP) and the SNP × rs671 interaction effect (β_interaction) on daily alcohol intake and drinking status with a fixed-effects model, using β_SNP, β_interaction, and their covariance matrix from each study. To perform the JMA, the same model as the interaction analysis for each study described above was analyzed using GEM v1.4 (13), which is capable of obtaining robust covariance matrices for β_SNP and β_interaction. To control false positives, only SNPs with MAF ≥ 0.05 were analyzed by GEM for each study.
The JMA was conducted with the fixed-effects method using METAL software (version 2010-02-08) (9) and patch source code provided by Manning et al. (11). A Wald statistic, following a χ² distribution with two degrees of freedom (d.f.), was used to test the joint significance of β_SNP and β_interaction. Cochran’s Q test was used to assess the heterogeneity of the β coefficients across studies for β_SNP and β_interaction. The cor value was calculated as cor = IntCov / (StdErr × IntStdErr), where IntCov is the covariance between β_SNP and β_interaction estimated by the JMA, and StdErr and IntStdErr are the standard errors of β_SNP and β_interaction estimated by the JMA, respectively. The JMA included SNPs for which genotype data were available from at least three studies with a total sample size of at least 20,000 individuals for interaction GWAS. To control false positives, SNPs with evidence of between-study heterogeneity (HetP < 0.001) and cor < 0.7 were excluded (fig. S7). Genomic control correction was applied by calculating λ as the ratio of the observed and expected (2 d.f.) median χ² statistics and dividing the observed χ² statistics by λ. The genome-wide significance level α for the JMA test was set to a P value < 5 × 10⁻⁸.
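To make the joint test concrete, the sketch below computes a 2 d.f. Wald statistic and the cor value from the five JMA quantities listed in the file description (BETA_SNP, SE_SNP, BETA_Int, SE_Int, COV_SNP_Int). It is a generic illustration of the formulas and does not reproduce the patched METAL code; the function name and example numbers are assumptions.

```python
import math


def jma_joint_test(beta_snp, se_snp, beta_int, se_int, cov_snp_int):
    """Joint 2 d.f. Wald test of the SNP main effect and the interaction.

    The statistic is b' V^-1 b, where b = (beta_snp, beta_int) and V is
    their 2x2 covariance matrix, written out in closed form below.
    """
    var_snp, var_int = se_snp ** 2, se_int ** 2
    det = var_snp * var_int - cov_snp_int ** 2
    wald = (
        beta_snp ** 2 * var_int
        - 2.0 * beta_snp * beta_int * cov_snp_int
        + beta_int ** 2 * var_snp
    ) / det
    # Survival function of a chi-square with 2 d.f. is exp(-x / 2).
    p_joint = math.exp(-wald / 2.0)
    # Correlation between the two estimates: covariance over product of SEs.
    cor = cov_snp_int / (se_snp * se_int)
    return {"wald": wald, "p": p_joint, "cor": cor}


# Made-up values for a single SNP.
print(jma_joint_test(0.10, 0.02, 0.05, 0.03, -0.00048))
```

With zero covariance the statistic reduces to the sum of the two squared z-scores, which gives a quick sanity check on the closed form.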
Esophageal cancer case-control study
ORs for esophageal cancer per 1-allele change in eight SNPs (rs1260326, rs28712821, rs1229984, rs2228093, rs8187929, rs4648328, rs671, rs73550818) were estimated on three different subject groups (entire population, subjects with the rs671 GG genotype only, and subjects with the rs671 GA genotype only) using a logistic regression model adjusted for sex, age, the first 10 principal components (for the BBJ Study), and the study version (for the HERPACC Study). Study-specific ORs were then pooled using a random-effects model (14). We evaluated the extent of between-study heterogeneity by the Cochran Q statistic and I² statistic (15). The interaction between rs671 and each of the variants under study was evaluated on both additive and multiplicative scales. Carriers of the rs671 AA genotype were excluded from this analysis. First, we estimated the study-specific coefficients of each SNP (per 1-allele change), rs671 (GA vs. GG), and the product term of each SNP and rs671 using the logistic regression model. For the HERPACC Study, each SNP was coded as follows: the Ref/Ref genotype was coded as 0, the Ref/Alt genotype as 1, and the Alt/Alt genotype as 2. For the BBJ Study, each SNP was coded as the imputed genotype dosage in the range [0, 2]. For rs671, the GG genotype was coded as 0, and the GA genotype was coded as 1. We then obtained pooled estimates β_SNP, β_rs671, and β_interaction of the model coefficients corresponding respectively to the effect of the SNP, rs671, and their interaction term (the product term for each SNP and rs671) using multivariate meta-analysis (16) to account for the fact that coefficients estimated from the same study are correlated. Specifically, we conducted random-effects multivariate analyses based on likelihood maximization using the study-specific coefficients β_SNP, β_rs671, and β_interaction as well as their covariance matrix.
Multiplicative interaction was measured by the summary OR associated with the interaction term with its corresponding 95% confidence interval. As the measure of additive interaction, we estimated the relative excess risk due to interaction (RERI) (17). Confidence intervals for RERI were estimated by the Delta Method (18) and P-values were based on the Wald test. Statistical significance was set at the Bonferroni corrected threshold of P < 0.05/8 (= 0.00625) and suggestive significance was set at P <0.05. RERI was considered to achieve suggestive significance (P < 0.05) when its confidence interval did not include 0. Analyses were performed with R version 4.1.2 (The R Foundation for Statistical Computing) or STATA version 17.0 (Stata Corporation, College Station, TX, USA).
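The additive-interaction measure can be illustrated with the standard RERI formula for a logistic model with a product term, RERI = OR11 - OR10 - OR01 + 1, and a delta-method standard error. The sketch assumes the pooled coefficients and their 3 × 3 covariance matrix are already available and contrasts one additional SNP allele combined with rs671 GA against the reference; the function name and inputs are hypothetical, and this is not the authors' R or STATA code.

```python
import math


def reri_delta(b_snp, b_rs671, b_int, cov):
    """RERI with a delta-method 95% CI and Wald P value.

    cov is the 3x3 covariance matrix of the coefficients, ordered
    (SNP, rs671, interaction).
    """
    or11 = math.exp(b_snp + b_rs671 + b_int)  # both exposures present
    or10 = math.exp(b_snp)                    # SNP allele only
    or01 = math.exp(b_rs671)                  # rs671 GA only
    reri = or11 - or10 - or01 + 1.0
    # Gradient of RERI with respect to (b_snp, b_rs671, b_int).
    grad = [or11 - or10, or11 - or01, or11]
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(3) for j in range(3))
    se = math.sqrt(var)
    z = reri / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "reri": reri,
        "se": se,
        "ci": (reri - 1.96 * se, reri + 1.96 * se),
        "p": p,
    }


# Made-up coefficients and a diagonal covariance matrix.
example_cov = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
print(reri_delta(0.2, 0.9, 0.3, example_cov))
```

A RERI of 0 corresponds to no interaction on the additive scale, which is why a confidence interval excluding 0 is used as the significance criterion in the text.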
References
1. The 1000 Genomes Project Consortium, A. Auton, L. D. Brooks, R. M. Durbin, E. P. Garrison, H. M. Kang, J. O. Korbel, J. L. Marchini, S. McCarthy, G. A. McVean, G. R. Abecasis, A global reference for human genetic variation. Nature 2015;526:68-74
2. Delaneau O, Zagury JF, Marchini J. Improved whole-chromosome phasing for disease and population genetic studies. Nature methods 2013;10:5-6
3. Loh PR, Palamara PF, Price AL. Fast and accurate long-range phasing in a UK Biobank cohort. Nature genetics 2016;48:811-6
4. Das S, Forer L, Schonherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nature genetics 2016;48:1284-7
5. Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS genetics 2009;5:e1000529
6. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nature genetics 2007;39:906-13
7. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 2015;4:7
8. B. K. Bulik-Sullivan, P. R. Loh, H. K. Finucane, S. Ripke, J. Yang, Schizophrenia Working Group of the Psychiatric Genomics Consortium, N. Patterson, M. J. Daly, A. L. Price, B. M. Neale, LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature genetics 2015;47:291-5
9. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010;26:2190-1
10. Freedman ML, Reich D, Penney KL, McDonald GJ, Mignault AA, Patterson N, et al. Assessing the impact of population stratification on genetic association studies. Nature genetics 2004;36:388-93
11. Manning AK, LaValley M, Liu CT, Rice K, An P, Liu Y, et al. Meta-analysis of gene-environment interaction: joint estimation of SNP and SNP x environment regression coefficients. Genet Epidemiol 2011;35:11-8
12. Aschard H, Hancock DB, London SJ, Kraft P. Genome-wide meta-analysis of joint tests for genetic and gene-environment interaction effects. Hum Hered 2010;70:292-300
13. Westerman KE, Pham DT, Hong L, Chen Y, Sevilla-Gonzalez M, Sung YJ, et al. GEM: scalable and flexible gene-environment interaction analysis in millions of samples. Bioinformatics 2021;37:3514-20
14. DerSimonian R, Laird N. Meta-analysis in clinical trials revisited. Contemp Clin Trials 2015;45:139-45
15. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21:1539-58
16. White IR. Multivariate Random-effects Meta-analysis. Stata J 2009;9:40-56
17. VanderWeele TJ, Knol MJ. A Tutorial on Interaction. Epidemiol Methods 2014;3:33-72
18. Hosmer DW, Lemeshow S. Confidence interval estimation of interaction. Epidemiology 1992;3:452-6
Methods
We performed an rs671 genotype-stratified GWAS meta-analysis of alcohol consumption within six Japanese cohorts. To validate the stratified approach, we applied a joint meta-analysis (JMA). Further, to validate the impact of the discovered loci on alcohol-related disease, we performed a meta-analysis of two esophageal cancer case-control studies.
GWAS meta-analysis
Study subjects and genotyping: We performed a genome-wide meta-analysis based on the Japanese Consortium of Genetic Epidemiology studies (J-CGE) (1), the Nagahama Study (2), and the BBJ Study (3,4). The J-CGE consisted of the following Japanese population-based and hospital-based studies: the HERPACC Study (5), the J-MICC Study (6,7), the JPHC Study (8), and the TMM Study (9). Individual study descriptions and an overview of the characteristics of the study populations are provided in the Supplementary Information and table S1. Data and sample collection for the participating cohorts were approved by the respective research ethics committees. All participating studies obtained informed consent from all participants by following the protocols approved by their institutional ethical committees.
Phenotype: Information on alcohol consumption was collected by questionnaire in each study. Because the questionnaires were not homogeneous across the studies, we harmonized the two alcohol consumption phenotypes of drinking status (never versus ever drinker) and daily alcohol intake (g/day) in accordance with each study’s criterion. Details are provided in the Supplementary Information.
Quality control and genotype imputation: Quality control for samples and SNPs was performed based on study-specific criteria (table S2). Genotype data in each study were imputed separately based on the 1000 Genomes Project reference panel (Phase 3, all ethnicities) (10). Phasing was performed with the use of SHAPEIT (v2) (11) and Eagle (12), and imputation was performed using minimac3 (13), minimac4, or IMPUTE2 (14). Information on the study-specific genotyping, imputation, quality control, and analysis tools is provided in table S2. After genotype imputation, further quality control was applied to each study. SNPs with an imputation quality of r² < 0.3 for minimac3 or minimac4, an info score < 0.4 for IMPUTE2, or an MAF < 0.01 were excluded.
Association analysis of SNPs with daily alcohol intake and drinking status: Association analysis of SNPs with daily alcohol intake and drinking status was performed on three different subject groups: the entire population, subjects with the rs671 GG genotype only, and subjects with the rs671 GA genotype only. Because the number of ever drinkers with the rs671 AA genotype was too small (table S3), association analysis restricted to subjects with the rs671 AA genotype was not conducted. Daily alcohol intake was base-2 log-transformed (log2(grams/day + 1)). The association of daily alcohol intake with SNP allele dose for each study was assessed by linear regression analysis with adjustment for age, age², sex, and the first 10 principal components. For the BBJ Study, the affection status of 47 diseases was further added as covariates. The association of drinking status with SNP allele dose for each study was assessed by logistic regression analysis with adjustment for age, age², sex, the first 10 principal components, and disease affection status of 47 diseases (for the BBJ Study). The effect sizes and standard errors estimated in the association analysis were used in the subsequent meta-analysis. The association analysis was conducted using EPACTS (http://genome.sph.umich.edu/wiki/EPACTS), SNPTEST (15), or PLINK2 (16). Association analysis, including interaction terms, was performed to evaluate the differential effects of each SNP on daily alcohol intake and drinking status between the GG and GA genotypes of rs671. Carriers of the AA genotype were excluded from the analysis. The effect size of the interaction term, β_interaction, and its standard error estimated in the association analysis were used in the subsequent meta-analysis. The association analysis, including the interaction term, was conducted using PLINK2 (16). To identify studies with inflated GWAS significance, which can result from population stratification, we computed the intercept from LDSC (17).
Before the meta-analysis, all study-specific results in the association analysis were corrected by multiplying the standard error of the effect size by the value of intercept from LDSC if the intercept of that study was greater than 1.
Meta-analysis: The meta-analysis was performed with all Japanese subjects in the six cohorts (table S1). The results of association analyses for each SNP across the studies were combined with METAL software (18) by the fixed-effects inverse-variance-weighted method. Heterogeneity of effect sizes was assessed by I² and Cochran’s Q statistic. The meta-analysis included SNPs for which genotype data were available from at least three studies with a total sample size of at least 20,000 individuals for unstratified GWAS or interaction GWAS, or at least 10,000 individuals for rs671-stratified GWAS. The genome-wide significance level α was set to a P value < 5 × 10⁻⁸. P values below 1.0 × 10⁻³⁰⁰ were calculated with the R package Rmpfr. To assess the inflation of the test statistics for the meta-analysis, we computed the genomic inflation factor, λ, and intercept from LDSC (19).
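P values this small underflow ordinary double-precision arithmetic, which is why an arbitrary-precision package was needed. As an alternative illustration (not the Rmpfr approach used in the study), the sketch below returns log10 of the two-sided P value for a z-score, switching to an asymptotic expansion of the normal tail for large |z|. The function name and the switch-over point of 30 are choices made for this example.

```python
import math


def log10_two_sided_p(z):
    """log10 of the two-sided normal P value, stable for very large |z|."""
    z = abs(z)
    if z < 30.0:
        # erfc is still well inside the double-precision range here.
        return math.log10(math.erfc(z / math.sqrt(2.0)))
    # Asymptotic tail: Q(z) ~ phi(z) / z * (1 - 1/z^2 + 3/z^4).
    log_tail = (
        -0.5 * z * z
        - math.log(z)
        - 0.5 * math.log(2.0 * math.pi)
        + math.log(1.0 - 1.0 / z ** 2 + 3.0 / z ** 4)
    )
    return (math.log(2.0) + log_tail) / math.log(10.0)


print(log10_two_sided_p(5.45))  # close to the genome-wide threshold
print(log10_two_sided_p(40.0))  # far below 1e-300
```

Working on the log scale keeps extreme association signals comparable and rankable even when the P value itself cannot be stored as a float.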
JMA
We used the JMA approach (20,21). The JMA jointly tests the SNP main effect (β_SNP) and the SNP × rs671 interaction effect (β_interaction) on daily alcohol intake and drinking status with a fixed-effects model, using β_SNP, β_interaction, and their covariance matrix from each study. To perform the JMA, the same model as the interaction analysis for each study described above was analyzed using GEM v1.4 (22), which is capable of obtaining robust covariance matrices for β_SNP and β_interaction. To control false positives, only SNPs with MAF ≥ 0.05 were analyzed by GEM for each study.
The JMA was conducted with the fixed-effects method using METAL software (version 2010-02-08) (18) and patch source code provided by Manning et al. (20). A Wald statistic, following a χ² distribution with two degrees of freedom (d.f.), was used to test the joint significance of β_SNP and β_interaction. Cochran’s Q test was used to assess the heterogeneity of the β coefficients across studies for β_SNP and β_interaction. The cor value was calculated as cor = IntCov / (StdErr × IntStdErr), where IntCov is the covariance between β_SNP and β_interaction estimated by the JMA, and StdErr and IntStdErr are the standard errors of β_SNP and β_interaction estimated by the JMA, respectively. The JMA included SNPs for which genotype data were available from at least three studies with a total sample size of at least 20,000 individuals for interaction GWAS. To control false positives, SNPs with evidence of between-study heterogeneity (HetP < 0.001) and cor < 0.7 were excluded (fig. S7). Genomic control correction was applied by calculating λ as the ratio of the observed and expected (2 d.f.) median χ² statistics and dividing the observed χ² statistics by λ. The genome-wide significance level α for the JMA test was set to a P value < 5 × 10⁻⁸.
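The genomic control step for a 2 d.f. test can be sketched as below. The expected median of a chi-square distribution with 2 d.f. is 2 ln 2 (about 1.386), so the inflation factor is the observed median divided by that constant. The function name and the toy statistics are assumptions for illustration.

```python
import math
import statistics


def genomic_control_2df(chi2_stats):
    """Return the inflation factor and the corrected 2 d.f. statistics."""
    expected_median = 2.0 * math.log(2.0)  # median of chi-square with 2 d.f.
    inflation = statistics.median(chi2_stats) / expected_median
    corrected = [x / inflation for x in chi2_stats]
    return inflation, corrected


def p_value_2df(chi2_stat):
    """Upper-tail P value of a chi-square statistic with 2 d.f."""
    return math.exp(-chi2_stat / 2.0)


# Toy statistics; a real analysis would use every tested SNP.
inflation, corrected = genomic_control_2df([0.4, 1.1, 1.5, 2.9, 35.0])
print(inflation, [p_value_2df(x) for x in corrected])
```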
Esophageal cancer case-control study
Study sample: In the HERPACC Study, we included 692 cases and 995 age- and sex-matched controls who were selected from participants in the HERPACC-2 (2001–2005) (23) and HERPACC-3 (2005–2013) (24). Cases were first-visit outpatients at Aichi Cancer Center Hospital who were diagnosed with esophageal cancer within -3 to +12 months of the first visit. Controls were first-visit outpatients who were confirmed to have no cancer or history of neoplasm. The BBJ Study included 416 cases and 86,515 controls after excluding (1) outliers from the Japanese cluster, as estimated by principal component analysis with samples of the 1000 Genomes project (10); and (2) closely related individuals estimated by King (25) (specifically, King kinship coefficients > 0.09375). Cases were diagnosed with esophageal cancer within -3 to +12 months from the date of consent. Controls were those confirmed to have no cancer or history of neoplasm. In the HERPACC Study, esophageal cancer cases were identified using the International Classification of Diseases for Oncology, Third Edition (ICD-O-3) (26) topography code C15. As a sensitivity analysis, we performed an additional analysis restricted to cases with squamous cell carcinoma identified using the ICD-O-3 morphology codes of 8050–8078 and 8083–8084, resulting in 636 cases. In the BBJ Study, all participants had been diagnosed with at least one of 47 target diseases, including esophageal cancer, by physicians at the cooperating hospitals. Esophageal cancer histology was determined from excised tissue specimens, and missing histological data were complemented by cytological specimens, resulting in 348 cases of squamous cell carcinoma.
Genotyping and imputation procedure: In the HERPACC Study, genomic DNA was extracted from peripheral blood using a DNA Blood mini kit (Qiagen, Tokyo, Japan), and eight SNPs were genotyped using TaqMan Assays with the 7500 Real-Time PCR System (Applied Biosystems, Foster City, CA, USA) or SNPtype assays with JUNO and EP1 System (Fluidigm, San Francisco, CA, USA). We confirmed a 100% match between rs4648328 and rs79463616 genotypes in the 96 selected HERPACC samples using Sanger sequencing. The genotyping and imputation procedure in the BBJ Study is described in the ‘Details of studies’ section of the Supplementary Information.
Statistical analysis: ORs for esophageal cancer per 1-allele change in eight SNPs were estimated on three different subject groups (entire population, subjects with the rs671 GG genotype only, and subjects with the rs671 GA genotype only) using a logistic regression model adjusted for sex, age, the first 10 principal components (for the BBJ Study), and the study version (for the HERPACC Study). Study-specific ORs were then pooled using a random-effects model (27). We evaluated the extent of between-study heterogeneity by the Cochran Q statistic and I² statistic (28).
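Random-effects pooling of study-specific ORs can be sketched with the DerSimonian-Laird moment estimator, the method of the cited reference. The sketch works on the log-OR scale and is a generic illustration; the function name and the two made-up study estimates are assumptions, and it does not reproduce the authors' R or STATA code.

```python
import math


def dersimonian_laird(log_ors, ses):
    """Pool log odds ratios with a DerSimonian-Laird random-effects model."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_ors)) / total
    # Cochran's Q around the fixed-effects estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    c = total - sum(w ** 2 for w in weights) / total
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    re_total = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, log_ors)) / re_total
    se = math.sqrt(1.0 / re_total)
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "se": se,
        "tau2": tau2,
        "q": q,
    }


# Two made-up study estimates (log OR and standard error).
print(dersimonian_laird([0.35, 0.50], [0.10, 0.15]))
```

When the studies agree (Q no larger than its degrees of freedom), the between-study variance is set to zero and the result coincides with the fixed-effects estimate.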
The interaction between rs671 and each of the variants under study was evaluated on both additive and multiplicative scales. Carriers of the rs671 AA genotype were excluded from this analysis. First, we estimated the study-specific coefficients of each SNP (per 1-allele change), rs671 (GA vs. GG), and the product term of each SNP and rs671 using the logistic regression model. For the HERPACC Study, each SNP was coded as follows: the Ref/Ref genotype was coded as 0, the Ref/Alt genotype as 1, and the Alt/Alt genotype as 2. For the BBJ Study, each SNP was coded as the imputed genotype dosage in the range [0, 2]. For rs671, the GG genotype was coded as 0, and the GA genotype was coded as 1. We then obtained pooled estimates β_SNP, β_rs671, and β_interaction of the model coefficients corresponding respectively to the effect of the SNP, rs671, and their interaction term (the product term for each SNP and rs671) using multivariate meta-analysis (29) to account for the fact that coefficients estimated from the same study are correlated. Specifically, we conducted random-effects multivariate analyses based on likelihood maximization using the study-specific coefficients β_SNP, β_rs671, and β_interaction as well as their covariance matrix. Multiplicative interaction was measured by the summary OR associated with the interaction term with its corresponding 95% confidence interval. As the measure of additive interaction, we estimated the relative excess risk due to interaction (RERI) (30). Confidence intervals for RERI were estimated by the delta method (31) and P values were based on the Wald test. Statistical significance was set at the Bonferroni-corrected threshold of P < 0.05/8 (= 0.00625) and suggestive significance was set at P < 0.05. RERI was considered to achieve suggestive significance (P < 0.05) when its confidence interval did not include 0.
Analyses were performed with R version 4.1.2 (The R Foundation for Statistical Computing) or STATA version 17.0 (Stata Corporation, College Station, TX, USA).
References
1. Suzuki S, Goto A, Nakatochi M, Narita A, Yamaji T, Sawada N, et al. Body mass index and colorectal cancer risk: A Mendelian randomization study. Cancer Sci 2021;112:1579-88
2. Funada S, Kawaguchi T, Terada N, Negoro H, Tabara Y, Kosugi S, et al. Nagahama Study Group, Cross-Sectional Epidemiological Analysis of the Nagahama Study for Correlates of Overactive Bladder: Genetic and Environmental Considerations. J Urol 2018;199:774-8
3. Hirata M, Kamatani Y, Nagai A, Kiyohara Y, Ninomiya T, Tamakoshi A, et al. BioBank Japan Cooperative Hospital Group, K. Matsuda, Cross-sectional analysis of BioBank Japan clinical data: A large cohort of 200,000 patients with 47 common diseases. J Epidemiol 2017;27:S9-S21
4. Nagai A, Hirata M, Kamatani Y, Muto K, Matsuda K, Kiyohara Y, et al. BioBank Japan Cooperative Hospital Group, M. Kubo, Overview of the BioBank Japan Project: Study design and profile. J Epidemiol 2017;27:S2-S8
5. Hamajima N, Matsuo K, Saito T, Hirose K, Inoue M, Takezaki T, et al. Gene-environment Interactions and Polymorphism Studies of Cancer Risk in the Hospital-based Epidemiologic Research Program at Aichi Cancer Center II (HERPACC-II). Asian Pac J Cancer Prev 2001;2:99-107
6. Hamajima N. J-MICC Study Group, The Japan Multi-Institutional Collaborative Cohort Study (J-MICC Study) to detect gene-environment interactions for cancer. Asian Pac J Cancer Prev 2007;8:317-23
7. Wakai K, Hamajima N, Okada R, Naito M, Morita E, Hishida A, et al. J-MICC Study Group, Profile of participants and genotype distributions of 108 polymorphisms in a cross-sectional study of associations of genotypes with lifestyle and clinical factors: a project in the Japan Multi-Institutional Collaborative Cohort (J-MICC) Study. J Epidemiol 2011;21:223-35
8. Tsugane S, Sawada N. The JPHC study: design and some findings on the typical Japanese diet. Jpn J Clin Oncol 2014;44:777-82
9. Hozawa A, Tanno K, Nakaya N, Nakamura T, Tsuchiya N, Hirata T, et al. Study Profile of the Tohoku Medical Megabank Community-Based Cohort Study. J Epidemiol 2021;31:65-76
10. The 1000 Genomes Project Consortium, A. Auton, L. D. Brooks, R. M. Durbin, E. P. Garrison, H. M. Kang, J. O. Korbel, J. L. Marchini, S. McCarthy, G. A. McVean, G. R. Abecasis, A global reference for human genetic variation. Nature 2015;526:68-74
11. Delaneau O, Zagury JF, Marchini J. Improved whole-chromosome phasing for disease and population genetic studies. Nature methods 2013;10:5-6
12. Loh PR, Palamara PF, Price AL. Fast and accurate long-range phasing in a UK Biobank cohort. Nature genetics 2016;48:811-6
13. Das S, Forer L, Schonherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nature genetics 2016;48:1284-7
14. Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS genetics 2009;5:e1000529
15. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nature genetics 2007;39:906-13
16. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 2015;4:7
17. B. K. Bulik-Sullivan, P. R. Loh, H. K. Finucane, S. Ripke, J. Yang, Schizophrenia Working Group of the Psychiatric Genomics Consortium, N. Patterson, M. J. Daly, A. L. Price, B. M. Neale, LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature genetics 2015;47:291-5
18. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010;26:2190-1
19. Freedman ML, Reich D, Penney KL, McDonald GJ, Mignault AA, Patterson N, et al. Assessing the impact of population stratification on genetic association studies. Nature genetics 2004;36:388-93
20. Manning AK, LaValley M, Liu CT, Rice K, An P, Liu Y, et al. Meta-analysis of gene-environment interaction: joint estimation of SNP and SNP x environment regression coefficients. Genet Epidemiol 2011;35:11-8
21. Aschard H, Hancock DB, London SJ, Kraft P. Genome-wide meta-analysis of joint tests for genetic and gene-environment interaction effects. Hum Hered 2010;70:292-300
22. Westerman KE, Pham DT, Hong L, Chen Y, Sevilla-Gonzalez M, Sung YJ, et al. GEM: scalable and flexible gene-environment interaction analysis in millions of samples. Bioinformatics 2021;37:3514-20
23. Ito H, McKay JD, Hosono S, Hida T, Yatabe Y, Mitsudomi T, et al. Association between a genome-wide association study-identified locus and the risk of lung cancer in Japanese population. J Thorac Oncol 2012;7:790-8
24. Koyanagi YN, Ito H, Oze I, Hosono S, Tanaka H, Abe T, et al. Development of a prediction model and estimation of cumulative risk for upper aerodigestive tract cancer on the basis of the aldehyde dehydrogenase 2 genotype and alcohol consumption in a Japanese population. Eur J Cancer Prev 2017;26:38-47
25. Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics 2010;26:2867-73
26. World Health Organization, International classification of diseases for oncology, 3rd ed. Geneva, Switzerland: World Health Organization. 2000
27. DerSimonian R, Laird N. Meta-analysis in clinical trials revisited. Contemp Clin Trials 2015;45:139-45
28. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21:1539-58
29. White IR. Multivariate Random-effects Meta-analysis. Stata J 2009;9:40-56
30. VanderWeele TJ, Knol MJ. A Tutorial on Interaction. Epidemiol Methods 2014;3:33-72
31. Hosmer DW, Lemeshow S. Confidence interval estimation of interaction. Epidemiology 1992;3:452-6
Usage notes
No special software is required to open the files.