Causal inference and risk prediction of gestational diabetes mellitus based on case-control study and Mendel randomization
Data files
Oct 24, 2025 version files 71.92 KB
Abstract
Aim: To identify causal determinants of gestational diabetes mellitus (GDM) in a Chinese population and to evaluate their efficacy for risk prediction.
Methods: Genotyping data for candidate genetic variants were collected from 554 women with GDM and 641 pregnant women with normal glucose tolerance. The associations between these variants and GDM risk were evaluated using odds ratios (ORs) and their corresponding 95% confidence intervals (CIs). Multivariate Mendelian randomization (MVMR) was employed to validate the causal factors for GDM. Subsequently, a nomogram model for early GDM risk prediction was developed based on the key clinical and genetic factors identified.
Results: After adjustment for age and pre-pregnancy BMI (pre-BMI), the rs6127416 variant showed a significant association with susceptibility to GDM. Comparing the AA genotype with the TT genotype, the adjusted OR was 2.20 (95% CI = 1.53-3.18, P < 0.001), and comparing AA with TT/TA genotypes, the adjusted OR was 2.35 (95% CI = 1.68-3.30, P < 0.001). MVMR analysis confirmed positive causal effects of pre-BMI and fasting plasma glucose (FPG) on GDM (pre-BMI: OR_MVMR = 1.80; FPG: OR_MVMR = 12.37; P < 0.001). A nomogram risk prediction model incorporating pre-BMI, FPG, and rs6127416 achieved an area under the ROC curve of 0.808.
Conclusion: Pre-BMI and FPG were identified as causal factors for GDM. The prediction model constructed from key clinical and genetic variables (rs6127416, pre-BMI and FPG) shows promise for personalized risk assessment of GDM in the first trimester of pregnancy, with the potential to support early identification of high-risk women and to facilitate timely lifestyle or clinical interventions during antenatal care.
This dataset includes baseline information for 554 gestational diabetes mellitus (GDM) patients and 641 healthy pregnant women, obtained from a standardized questionnaire and medical records. The genetic loci were genotyped on the Sequenom MassARRAY platform. Clinical indicators and genetic variants that were statistically significantly associated with GDM were used to construct a nomogram model. The nomogram assigns scores based on the regression coefficient (β) of each indicator: each level of an indicator is given a specific score, and the scores of all factors are summed to obtain a total point value, which can be used to predict the probability of GDM occurrence.
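The scoring idea described above can be sketched as follows. This is a minimal illustration of a generic logistic-regression nomogram, not the authors' fitted model: the function names are hypothetical, the point scale (0 to 100 for the most influential predictor) is a common convention assumed here, and any coefficients passed in must come from the user's own regression.

```python
import math

def points_per_unit(betas, ranges):
    """Points awarded per unit of each predictor.

    betas:  {name: regression coefficient (beta)}
    ranges: {name: (min, max)} observed range of each predictor
    Assumption: the predictor with the largest |beta| * range spans 0-100 points.
    """
    span = {k: abs(betas[k]) * (ranges[k][1] - ranges[k][0]) for k in betas}
    scale = 100.0 / max(span.values())
    return {k: abs(betas[k]) * scale for k in betas}

def predicted_probability(intercept, betas, values):
    """Logistic probability from the linear predictor intercept + sum(beta * x)."""
    linear_predictor = intercept + sum(betas[k] * values[k] for k in betas)
    return 1.0 / (1.0 + math.exp(-linear_predictor))
```

The total point value for one subject is the sum of each predictor's points, and it maps monotonically onto the predicted probability because both are linear in the same coefficients.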
File: Causal_inference_and_risk_prediction_of_gestational_diabetes_mellitus_based_on_case-control_study_and_Mendel_randomization.csv
Subjects' baseline data include systolic blood pressure (SBP, mmHg), diastolic blood pressure (DBP, mmHg), fasting plasma glucose (FPG, mmol/L), oral glucose tolerance test 1h plasma glucose (1hPG, mmol/L), oral glucose tolerance test 2h plasma glucose (2hPG, mmol/L), glycated hemoglobin (HbA1c, %), triglyceride (TG, mmol/L), total cholesterol (TC, mmol/L), high-density lipoprotein cholesterol (HDL-c, mmol/L), and low-density lipoprotein cholesterol (LDL-c, mmol/L). The genetic polymorphism data include rs6127416 T>A and its corresponding dominant and recessive genetic models.
The meaning of the different assignments in the dataset:
1) Group: 1=GDM; 0=Control group
2) Sample ID: Identification number of test sample
3) Variants and genotypes assignment
- rs6127416 T>A: 1=TT genotype; 2=TA genotype; 3=AA genotype
- rs6127416 Dominant model (TA/AA vs. TT): 1=TT genotype; 4=TA/AA genotypes
- rs6127416 Recessive model (AA vs. TT/TA): 3=AA genotype; 5=TT/TA genotypes
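The coding scheme above can be expressed as lookup tables. The following is a small sketch for decoding the CSV values; the dictionary and function names are hypothetical and are not column names from the file.

```python
# Code-to-label maps copied from the data dictionary above.
GROUP = {1: "GDM", 0: "Control"}
GENOTYPE = {1: "TT", 2: "TA", 3: "AA"}
DOMINANT = {1: "TT", 4: "TA/AA"}
RECESSIVE = {3: "AA", 5: "TT/TA"}

def genetic_model_codes(genotype_code):
    """Derive (dominant, recessive) model codes from an rs6127416 genotype code."""
    if genotype_code not in GENOTYPE:
        raise ValueError(f"unknown genotype code: {genotype_code!r}")
    dominant = 1 if genotype_code == 1 else 4   # TT vs. TA/AA
    recessive = 3 if genotype_code == 3 else 5  # AA vs. TT/TA
    return dominant, recessive
```

Recomputing the two model columns from the genotype column in this way is a quick consistency check on the file.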
Human subjects data
All participants provided written informed consent. The personal identity information of the subjects has been anonymized.
Study population
In the initial discovery phase, a genome-wide association study (GWAS) was conducted to identify GDM-associated SNPs (GDM-SNPs). The study included 96 GDM patients and 96 age- and pre-BMI-matched healthy pregnant women, all recruited during the same period, using the Infinium Asian Screening Array (ASA, Illumina) BeadChip.
For the validation phase, a total of 1195 women with singleton pregnancies (554 GDM patients and 641 healthy controls) with similar characteristics were enrolled at a hospital over a two-year period for genotyping of candidate SNPs. GDM was diagnosed using a 75 g oral glucose tolerance test (OGTT) conducted at 24-28 weeks of gestation, with fasting plasma glucose (FPG) ≥ 5.1 mmol/L, 1-hour plasma glucose (1hPG) ≥ 10.0 mmol/L, or 2-hour plasma glucose (2hPG) ≥ 8.5 mmol/L, as per the guidelines established by the International Association of Diabetes and Pregnancy Study Groups (IADPSG).(24)
Subjects meeting the following inclusion criteria were enrolled: residency in the study region for over 2 years, singleton pregnancy, and absence of close familial ties. Exclusion criteria encompassed pregnancies with endocrine disorders, severe systemic illnesses, a history of pre-existing type 1 or type 2 diabetes mellitus, or prolonged use of medications affecting glucose metabolism prior to pregnancy. Approval of this research protocol was obtained from the institutional ethics committee.
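The diagnostic rule stated above (any one threshold met) can be written as a one-line check. This is a sketch with a hypothetical function name; the inputs are the three OGTT values in mmol/L.

```python
def meets_gdm_criteria(fpg, pg_1h, pg_2h):
    """True if any OGTT value (mmol/L) reaches the threshold quoted in the text."""
    return fpg >= 5.1 or pg_1h >= 10.0 or pg_2h >= 8.5
```

For example, a woman with FPG 4.8, 1hPG 9.2 and 2hPG 8.6 mmol/L would be classified as GDM on the 2-hour value alone.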
Clinical and biochemical characteristics
Clinical data of the subjects were collected using a standardized questionnaire and medical records, encompassing age, height, weight, systolic blood pressure (SBP), diastolic blood pressure (DBP), FPG, 1hPG, 2hPG, glycated hemoglobin (HbA1c), triglyceride (TG), total cholesterol (TC), high-density lipoprotein cholesterol (HDL-c) and low-density lipoprotein cholesterol (LDL-c). In addition, the pre-pregnancy body mass index (pre-BMI) was calculated as weight (kg) / height (m)².
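As a worked example of the pre-BMI formula, the sketch below computes weight divided by height squared. The function name is hypothetical, and the values in the usage note are illustrative only, not taken from the dataset.

```python
def pre_pregnancy_bmi(weight_kg, height_m):
    """Body mass index in kg/m^2 from pre-pregnancy weight and height."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2
```

A woman weighing 60 kg at a height of 1.60 m has a pre-BMI of 60 / 2.56 = 23.4375 kg/m².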
Genomic DNA extraction, variants selection and genotyping
Genomic DNA was extracted from EDTA-treated peripheral whole blood using a DNA extraction kit (Aidlab Biotechnologies Co., Ltd, China) and stored at -80℃. Candidate functional SNPs were identified based on our prior analysis using the Infinium Asian Screening Array (ASA) BeadChip, with selection criteria set at a significance level of P ≤ 5×10⁻⁴. Subsequently, the SNP Function Prediction (FuncPred) tool (http://manticore.niehs.nih.gov/snpinfo/snpfunc.html) was employed to screen for potential functional variants in the Chinese Han population in Beijing (CHB) with minor allele frequencies exceeding 0.5.
The SNP was genotyped on the Sequenom MassARRAY platform. The PCR mix consisted of 1.0 μL of template DNA (20-100 ng/μL), 1.850 μL of ddH2O, 0.625 μL of 1.25× PCR buffer (15 mmol/L MgCl2), 0.325 μL of 25 mmol/L MgCl2, 0.1 μL of 25 mmol/L dNTPs, 1 μL of 0.5 μmol/L primer mix, and 0.1 μL of 5 U/μL HotStar Taq polymerase. The PCR cycling conditions included an initial denaturation at 94℃ for 15 min, followed by 45 cycles of denaturation at 94℃ for 20 s, annealing at 56℃ for 30 s, and extension at 72℃ for 1 min, and a final extension at 72℃ for 3 min. Finally, the original data and genotyping plots were generated using TYPER 4.0 software.
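As a quick arithmetic check of the protocol above, the sketch below sums the listed component volumes and the programmed hold times. The variable names are hypothetical, and the time excludes thermocycler ramping, which the text does not specify.

```python
from decimal import Decimal

# Component volumes in microlitres, as listed in the PCR mix above.
volumes_ul = {
    "template DNA": Decimal("1.0"),
    "ddH2O": Decimal("1.850"),
    "PCR buffer": Decimal("0.625"),
    "MgCl2": Decimal("0.325"),
    "dNTPs": Decimal("0.1"),
    "primer mix": Decimal("1.0"),
    "HotStar Taq": Decimal("0.1"),
}
total_volume_ul = sum(volumes_ul.values())

# Hold times in seconds: initial denaturation, 45 cycles, final extension.
cycle_seconds = 20 + 30 + 60
total_seconds = 15 * 60 + 45 * cycle_seconds + 3 * 60
total_minutes = total_seconds / 60
```

The components add up to a 5 μL reaction, and the programmed holds come to 100.5 minutes.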
Statistical analysis
The data analysis was conducted using IBM SPSS Statistics 28 for Windows (IBM Corp., Armonk, NY, USA) and R 4.3.1 software. Hardy-Weinberg equilibrium was tested with a χ² goodness-of-fit test. Independent-samples t-tests were employed to compare clinical and biochemical variables between cases and controls, with values presented as the mean ± SD. The odds ratios (ORs) and their corresponding 95% confidence intervals (CIs) were calculated to evaluate the association between variants and GDM risk. Statistical significance was set at a two-sided P < 0.05. Stratified analysis was conducted to evaluate the association between the positive SNP and GDM risk among different subgroups, categorized by the mean value of each variable. Additionally, false-positive reporting probability (FPRP) analysis was used to assess whether the observed associations could be chance findings.
Given the observed interaction between clinical factors (pre-BMI, DBP, FPG and HbA1c) and genetic variants in the study, we hypothesized that they potentially influence the risk of GDM. Consequently, both univariate and multivariate MR analyses were performed. Exposure data were extracted from the IEU Open GWAS project (https://gwas.mrcieu.ac.uk/) with the following GWAS IDs: BMI (ukb-b-19953), DBP (ebi-a-GCST90018952), FPG (ebi-a-GCST90002232), and HbA1c (ieu-b-4842), while outcome data were obtained from the FinnGen Consortium (GWAS ID: finngen_R10_GEST_DIABETES). Detailed information regarding the data sources is provided in Supplement Table 1.
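Two of the calculations named above can be sketched with the standard library. This is an illustration under stated assumptions, not the authors' SPSS or R workflow: the odds ratio here is the crude (unadjusted) estimate from a 2×2 table with a Woolf-type confidence interval, whereas the reported ORs were adjusted for age and pre-BMI, and all function names are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Crude OR and 95% CI from a 2x2 table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

def hwe_chi_square(n_tt, n_ta, n_aa):
    """Chi-square goodness-of-fit statistic and P value (1 df) for Hardy-Weinberg."""
    n = n_tt + n_ta + n_aa
    p = (2 * n_tt + n_ta) / (2 * n)  # frequency of the T allele
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_tt, n_ta, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p_value
```

The Hardy-Weinberg check is usually run on the control group only, which is an assumption here since the text does not say which group was tested.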
We selected independent single-nucleotide polymorphisms with low linkage disequilibrium (r² < 0.001) that were significantly associated with the exposure factors (P < 5×10⁻⁸) and calculated the F-statistic as F = beta²/se²,(19) where F > 10 indicated adequate instrument strength. The primary MR analysis was conducted using the inverse-variance weighted (IVW) method. Additional sensitivity analyses were performed using the MR-Egger, weighted median, simple mode and weighted mode methods.(25) Heterogeneity and pleiotropy were assessed with Cochran's Q statistic and the MR Pleiotropy RESidual Sum and Outlier (MR-PRESSO) method. The directional validity of the causal relationships between exposure and outcome was assessed using the MR-Steiger test.(26) Additionally, multivariate MR (MVMR) was conducted to assess whether confounding factors influenced the causal relationship between the validated exposures and the outcome. All statistical analyses were carried out using R (v4.3.1) with the R packages “TwoSampleMR”, “MRPRESSO”, and “MendelianRandomization”.
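The instrument-strength formula and the IVW estimate can be illustrated in a few lines. This is a simplified fixed-effect sketch with hypothetical function names; it is not the TwoSampleMR implementation, which also handles allele harmonisation and heterogeneity-adjusted standard errors.

```python
import math

def f_statistic(beta, se):
    """Per-SNP instrument strength, F = beta^2 / se^2."""
    return (beta / se) ** 2

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate and its SE.

    Each SNP contributes the Wald ratio beta_outcome / beta_exposure,
    weighted by beta_exposure^2 / se_outcome^2.
    """
    weights = [bx ** 2 / so ** 2 for bx, so in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1 / total_weight)
    return estimate, standard_error

def keep_strong_instruments(betas, ses, threshold=10.0):
    """Indices of SNPs whose F-statistic exceeds the threshold."""
    return [i for i, (b, s) in enumerate(zip(betas, ses)) if f_statistic(b, s) > threshold]
```

For a binary outcome such as GDM, exponentiating the IVW estimate gives the odds ratio per unit increase in the exposure.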
A predictive nomogram integrating clinical risk factors and positive SNPs was developed to assess the risk of GDM. Scores corresponding to each risk factor were aggregated to derive a total score, facilitating risk evaluation. The subjects were randomly divided into training and validation sets at a ratio of 7:3. Receiver operating characteristic (ROC) curves and calibration plots were generated, and sensitivity and specificity were calculated to evaluate the predictive ability of the nomogram. Meanwhile, decision curve analysis (DCA) was performed to evaluate the clinical utility and benefit of the nomogram. The analyses were conducted using R packages (v4.3.3).
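The 7:3 split and the ROC area can be sketched as follows. This is a minimal standard-library illustration with hypothetical function names and an arbitrary seed; the authors' R code and random seed are not given in the text.

```python
import random

def split_7_3(sample_ids, seed=0):
    """Randomly split IDs into training (70%) and validation (30%) sets."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    labels: 1 = GDM, 0 = control; scores: predicted risk.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

With 1195 subjects this split yields 836 training and 359 validation samples. The pairwise AUC equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control.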
