Data from: The association between Vapor Pressure Deficit and arthritis: Evidence from a 10-year longitudinal study of middle-aged and elderly Chinese adults
Abstract
Objective: To investigate the association between vapor pressure deficit (VPD) and arthritis in middle-aged and older Chinese adults.
Methods: This study used data from the China Health and Retirement Longitudinal Study (CHARLS) spanning 2011 to 2020. Participants without arthritis in 2011 formed the study population, with VPD as the primary exposure and newly diagnosed arthritis as the outcome. Logistic regression models were used to estimate the association between VPD and incident arthritis. Restricted cubic spline (RCS) analyses were conducted to assess potential nonlinearity. Subgroup analyses were performed to examine effect heterogeneity across population subgroups.
Results: A total of 4,615 participants were included, of whom 1,317 reported a new arthritis diagnosis during approximately 10 years of follow-up (2011–2020). Mean VPD was lower in the arthritis group than in the non-arthritis group (5.184 ± 0.828 vs 5.291 ± 0.818, p<0.001). All logistic regression models showed a linear association between VPD and incident arthritis, and the association remained consistent when VPD was analyzed as a categorical variable. RCS analysis showed that the incidence of arthritis decreased significantly with increasing VPD (p<0.05), particularly at VPD values below 5.28. Subgroup analysis indicated a stronger protective association between VPD and arthritis among rural residents (p for interaction = 0.006).
Conclusion: VPD was negatively associated with the incidence of arthritis among middle-aged and elderly adults, with a stronger association in rural residents. These findings identify VPD as an environmental factor associated with arthritis and may improve understanding of environmental influences on arthritis development.
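As a quick arithmetic check on the group comparison in the Results, the sketch below recomputes a test statistic from the reported means, standard deviations, and group sizes (1,317 participants with incident arthritis; 4,615 - 1,317 = 3,298 without). It assumes a Welch two-sample t-test with a normal approximation for the p-value; the test actually used in the manuscript is not stated here, so this is illustrative only.

```python
import math
from statistics import NormalDist


def welch_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch two-sample t statistic, Welch-Satterthwaite df, and an approximate
    two-sided p-value (normal approximation, adequate for large df)."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


# Arthritis group vs non-arthritis group, VPD mean ± SD from the Results
t, df, p = welch_t_from_summary(5.184, 0.828, 1317, 5.291, 0.818, 3298)
print(f"t = {t:.2f}, df = {df:.0f}, p = {p:.1e}")
```

The statistic comes out at roughly t ≈ -4 with p well below 0.001, which is consistent with the reported p<0.001.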
Dataset DOI: 10.5061/dryad.bzkh189q9
Corresponding authors:
- Yongyang Wu (wuyyfj@126.com)
- Chaoyan Xu (charmyarn@sina.com)
Dataset overview
This Dryad package contains the primary data used to generate the results reported in the manuscript, including:
- The analytic dataset of 4,615 CHARLS participants (baseline 2011, followed through 2020) who were free of arthritis at baseline and had complete covariates after exclusions described in the manuscript.
- Individual-level outcome (incident physician-diagnosed arthritis during follow-up), demographic variables, lifestyle factors, comorbidities, sleep duration, and atmospheric humidity indices (including VPD) linked to participants' county of residence.
Description of the data and file structure
This dataset was generated through a longitudinal observational study of the association between atmospheric moisture conditions (particularly vapor pressure deficit) and arthritis incidence in middle-aged and elderly Chinese adults. The study involved:
1. Prospective Cohort Design
- Followed participants from 2011 to 2020 using the China Health and Retirement Longitudinal Study (CHARLS)
- Started from 17,705 CHARLS baseline participants in 2011 and, after excluding those with arthritis at baseline and applying the other exclusions described in the manuscript, retained 4,615 participants
- Used the follow-up waves of 2013, 2015, 2018, and 2020 to track arthritis incidence
2. Multi-source Data Integration
- Collected individual-level health, demographic, and lifestyle data through standardized CHARLS questionnaires
- Obtained high-resolution (1 km × 1 km) monthly atmospheric humidity data from the HiMIC-Monthly dataset
- Linked environmental data to participants' counties of residence using geographic information systems
3. Rigorous Quality Control
- Implemented strict inclusion/exclusion criteria to ensure data integrity
- Applied multiple imputation techniques for missing covariate data
- Validated environmental data against ground meteorological stations (R² > 0.96)
4. Advanced Statistical Analysis
- Employed logistic regression models to assess VPD-arthritis associations, with restricted cubic spline (RCS) analyses to assess nonlinearity
- Conducted comprehensive sensitivity analyses and subgroup assessments
- Utilized machine learning (XGBoost with SHAP) to identify key predictive factors and non-linear relationships
This design allowed examination of environmental determinants of arthritis while adjusting for numerous potential confounders in the statistical models.
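The outcome definition used for the `arth` variable can be illustrated with a small helper. This is a hypothetical sketch, not the authors' code: it assumes per-wave answers to the CHARLS doctor-diagnosis question are available as a mapping from survey year to True / False / None (missing), and it treats any positive report in 2013, 2015, 2018, or 2020 as incident arthritis. How the manuscript handled missing waves or attrition is not described in this README, so participants without a positive report are simply coded "no" here.

```python
from typing import Mapping, Optional

# Follow-up waves listed in this README
FOLLOW_UP_WAVES = (2013, 2015, 2018, 2020)


def incident_arthritis(responses: Mapping[int, Optional[bool]]) -> Optional[str]:
    """Derive a hypothetical 'arth' value from per-wave diagnosis answers.

    Returns None when the participant is not eligible for the incident
    cohort (arthritis at the 2011 baseline, or baseline status unknown).
    """
    if responses.get(2011) is not False:
        return None  # prevalent or unknown at baseline: excluded
    for year in FOLLOW_UP_WAVES:
        if responses.get(year):
            return "yes"  # first reported diagnosis during follow-up
    return "no"  # no positive report in any follow-up wave (assumption)
```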
g.xlsx
- Description: Main analytic dataset used for all statistical analyses in the manuscript.
- Format: Microsoft Excel (.xlsx)
- Sheet(s): Sheet1 (one row per participant; n = 4,615; columns = 22)
- Notes on confidentiality: The variable 'id' is a de-identified participant identifier. No direct personal identifiers are included. Age and BMI have been binned into categorical ranges to protect participant privacy and mask outliers, in compliance with PLOS ONE Human Subjects guidelines.
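A minimal sketch for checking that a loaded copy of g.xlsx contains the 22 variables documented in this README. The names `EXPECTED_COLUMNS` and `check_header` are hypothetical; because the column order in the file is not documented, the check compares names only.

```python
# Column names as documented in the Variables list of this README
EXPECTED_COLUMNS = [
    "id", "arth", "Sex", "Age", "MaritalStatus", "EducationalLevel",
    "ResidencePlace", "Race", "BMI", "Smoke", "Alcohol", "Hypertension",
    "DM", "CVD", "night", "RH", "AVP", "VPD", "DPT", "MR", "SH", "Agegroup",
]


def check_header(header):
    """Return (missing, unexpected) column names relative to the documented schema."""
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    unexpected = [c for c in header if c not in EXPECTED_COLUMNS]
    return missing, unexpected
```

For example, with pandas and openpyxl installed, `df = pandas.read_excel("g.xlsx", sheet_name="Sheet1")` followed by `check_header(list(df.columns))` should return two empty lists, and `len(df)` should equal 4615.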
Variables
- id: De-identified participant identifier.
- arth: Incident arthritis during follow-up (2013/2015/2018/2020) among participants without arthritis at baseline (2011), based on the CHARLS question: "Did the doctor tell you that you have arthritis?" Values: yes / no
- Sex: Biological sex. Values: male / female
- Age: Age at baseline (2011), binned into 5-year intervals to protect participant privacy. Values: 45-49 / 50-54 / 55-59 / 60-64 / 65-69 / 70-74 / 75-79 / ≥80
- MaritalStatus: Marital status at baseline. Values: Married / Non-Married
- EducationalLevel: Highest educational attainment at baseline. Values: Elementary school and below / Middle school / High school and higher
- ResidencePlace: Place of residence at baseline. Values: urban / rural
- Race: Ethnicity. Values: Han / others
- BMI: Body mass index at baseline (kg/m^2), binned into categories based on the Working Group on Obesity in China (WGOC) criteria to protect participant privacy. Values: <18.5 (underweight) / 18.5-23.9 (normal weight) / 24.0-27.9 (overweight) / ≥28.0 (obese)
- Smoke: Smoking status at baseline. Values: yes / no
- Alcohol: Alcohol drinking status at baseline. Values: yes / no
- Hypertension: Self-reported physician-diagnosed hypertension at baseline. Values: yes / no
- DM: Self-reported physician-diagnosed diabetes mellitus at baseline. Values: yes / no
- CVD: Self-reported cardiovascular disease at baseline. Values: yes / no
- night: Nighttime sleep duration at baseline (hours per night).
- RH: Relative humidity (annual average for January–December 2011), extracted from HiMIC-Monthly and linked to the participant's county of residence.
- AVP: Actual vapor pressure (annual average for January–December 2011), from HiMIC-Monthly.
- VPD: Vapor pressure deficit (annual average for January–December 2011), from HiMIC-Monthly. Definition (as used in the manuscript): VPD is the difference between saturated vapor pressure and actual vapor pressure; larger values indicate drier air.
- DPT: Dew point temperature (annual average for January–December 2011), from HiMIC-Monthly.
- MR: Mixing ratio (annual average for January–December 2011), from HiMIC-Monthly.
- SH: Specific humidity (annual average for January–December 2011), from HiMIC-Monthly.
- Agegroup: Age category at baseline. Values: < 60 / ≥60
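The humidity indices above are physically linked. The sketch below shows one standard set of relationships, assuming air temperature in °C, pressure in hPa, and a commonly used Magnus-type approximation for saturation vapor pressure (coefficients 6.112 hPa, 17.67, 243.5 °C). HiMIC-Monthly's exact formulas and the units of the deposited columns are not stated in this README, so this function is not expected to reproduce the dataset values. Note also that an annual mean of monthly VPD is generally not equal to VPD computed from annual mean temperature and humidity, because the saturation curve is nonlinear.

```python
import math

# Magnus-type coefficients (assumed): es in hPa, temperature in deg C
A_HPA, B, C_DEGC = 6.112, 17.67, 243.5
EPSILON = 0.622  # ratio of molar masses, water vapor / dry air


def saturation_vapor_pressure(temp_c):
    """Saturation vapor pressure (hPa) over water at temp_c (deg C)."""
    return A_HPA * math.exp(B * temp_c / (temp_c + C_DEGC))


def humidity_indices(temp_c, rh_percent, pressure_hpa=1013.25):
    """Derive AVP, VPD, DPT, MR and SH from temperature and relative humidity."""
    if not 0.0 < rh_percent <= 100.0:
        raise ValueError("relative humidity must be in (0, 100] percent")
    es = saturation_vapor_pressure(temp_c)
    ea = rh_percent / 100.0 * es        # actual vapor pressure (AVP), hPa
    vpd = es - ea                       # VPD, hPa: larger values mean drier air
    gamma = math.log(ea / A_HPA)        # invert the Magnus formula
    dpt = C_DEGC * gamma / (B - gamma)  # dew point temperature, deg C
    # Mixing ratio and specific humidity, both in g/kg
    mr = 1000.0 * EPSILON * ea / (pressure_hpa - ea)
    sh = 1000.0 * EPSILON * ea / (pressure_hpa - (1 - EPSILON) * ea)
    return {"RH": rh_percent, "AVP": ea, "VPD": vpd, "DPT": dpt, "MR": mr, "SH": sh}
```

For example, `humidity_indices(25.0, 60.0)` returns all six indices for air at 25 °C and 60 % relative humidity at standard sea-level pressure; at 100 % relative humidity VPD is zero and the dew point equals the air temperature.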
Data sources (as described in the manuscript)
- Health and covariate data: China Health and Retirement Longitudinal Study (CHARLS), baseline 2011 with follow-ups in 2013, 2015, 2018, and 2020.
- Atmospheric humidity indices (RH, AVP, VPD, DPT, MR, SH): HiMIC-Monthly (1 km × 1 km, monthly, 2003–2020). In this study, annual averages for January–December 2011 were calculated and linked to CHARLS participants at the county level using spatial overlay/county code matching as described in the manuscript.
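A hypothetical sketch of the temporal averaging and county-level linkage described above. It assumes the 1 km grid has already been summarized to one value per county and month (the spatial overlay step is not shown), weights the 12 months of 2011 equally, and uses a hypothetical `county` key on each participant record. The deposited g.xlsx does not include a county identifier, so this step cannot be rerun from the Dryad file alone.

```python
from collections import defaultdict
from statistics import fmean


def annual_county_means(monthly_rows, year=2011):
    """Average monthly county values over one calendar year.

    monthly_rows: iterable of (county_code, year, month, value) tuples.
    Counties lacking any of the 12 months are skipped (an assumption),
    rather than averaged over a partial year.
    """
    by_county = defaultdict(dict)
    for county, yr, month, value in monthly_rows:
        if yr == year:
            by_county[county][month] = value
    return {
        county: fmean(months.values())
        for county, months in by_county.items()
        if len(months) == 12
    }


def link_to_participants(participants, county_means, name="VPD"):
    """Attach the county annual mean to each participant dict via its 'county' key."""
    linked = []
    for person in participants:
        value = county_means.get(person["county"])
        if value is not None:  # drop participants whose county has no exposure value
            linked.append({**person, name: value})
    return linked
```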
License/reuse
This dataset is published under the CC0 1.0 Universal Public Domain Dedication. Please cite the associated PLOS ONE manuscript when using these data. Users are also encouraged to comply with the data use requirements of the original data sources (CHARLS: https://charls.pku.edu.cn; HiMIC-Monthly: http://data.tpdc.ac.cn).
CHARLS data use and CC0 license compatibility
The original CHARLS microdata are publicly available for academic use upon registration at https://charls.pku.edu.cn. The dataset deposited here is NOT the original CHARLS microdata; it is a derived, de-identified, minimal analytic dataset created specifically for reproducing the analyses reported in the associated manuscript. Key transformations include: (1) extensive sample exclusions (from 17,705 to 4,615 participants); (2) removal of all direct identifiers; (3) binning of continuous quasi-identifiers (Age and BMI) into categorical ranges; (4) merging with externally derived atmospheric humidity indices from HiMIC-Monthly; and (5) retention of only the minimal set of variables (22 columns) required for replication. Users of this dataset should also comply with the CHARLS Data Use Agreement where applicable and cite the CHARLS study accordingly.
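Transformation (3) can be sketched as simple mapping functions that produce the category labels documented for `Age`, `Agegroup`, and `BMI`. These helpers are hypothetical; in particular, assigning a raw BMI between 23.9 and 24.0 to "18.5-23.9" (that is, using a cut-off of 24.0) is an assumption about how boundary values were handled.

```python
def age_band(age):
    """Map age in years to the 5-year bands of the 'Age' column."""
    if age < 45:
        raise ValueError("age is below the lowest documented band (45-49)")
    if age >= 80:
        return "≥80"
    lower = int(age // 5) * 5
    return f"{lower}-{lower + 4}"


def age_group(age):
    """Map age in years to the two 'Agegroup' categories."""
    return "< 60" if age < 60 else "≥60"


def bmi_category(bmi):
    """Map BMI (kg/m^2) to the WGOC-based bins of the 'BMI' column."""
    if bmi < 18.5:
        return "<18.5"      # underweight
    if bmi < 24.0:
        return "18.5-23.9"  # normal weight
    if bmi < 28.0:
        return "24.0-27.9"  # overweight
    return "≥28.0"          # obese
```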
Software and analysis
All statistical methods (logistic regression, restricted cubic spline analyses, and subgroup analyses) are described in the manuscript. The Dryad package provides the input data needed to reproduce these analyses.
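For readers reproducing the RCS analysis, the sketch below builds a restricted cubic spline basis in Harrell's parameterization, with knots placed at his commonly recommended percentiles. The number of knots, their locations, and the software used in the manuscript are not given in this README, so `n_knots=4` and the percentile table are assumptions. The returned columns would enter a logistic regression as predictors in place of a single linear VPD term.

```python
import math

# Harrell's commonly recommended knot percentiles (assumed, not from the manuscript)
HARRELL_PERCENTILES = {
    3: (0.10, 0.50, 0.90),
    4: (0.05, 0.35, 0.65, 0.95),
    5: (0.05, 0.275, 0.50, 0.725, 0.95),
}


def quantile(sorted_x, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_x) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def rcs_knots(values, n_knots=4):
    """Place knots at the assumed percentiles of the observed values."""
    xs = sorted(values)
    return [quantile(xs, q) for q in HARRELL_PERCENTILES[n_knots]]


def rcs_basis(x, knots, normalize=True):
    """Return [x, S1, ..., S_(k-2)]: restricted cubic spline terms at x.

    Each S_j is cubic between knots and linear beyond the outer knots.
    With normalize=True the spline terms are divided by (t_k - t_1)^2.
    """
    k = len(knots)
    t_k, t_km1 = knots[-1], knots[-2]
    scale = (t_k - knots[0]) ** 2 if normalize else 1.0

    def pos3(u):
        return max(u, 0.0) ** 3  # truncated cubic (u)_+^3

    terms = [x]
    for j in range(k - 2):
        t_j = knots[j]
        s = (pos3(x - t_j)
             - pos3(x - t_km1) * (t_k - t_j) / (t_k - t_km1)
             + pos3(x - t_k) * (t_km1 - t_j) / (t_k - t_km1))
        terms.append(s / scale)
    return terms
```

The overall p-value for nonlinearity is then typically obtained by comparing the model with the spline terms against the model with only the linear term, for example with a likelihood ratio test.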
Human subjects data
This study was performed in line with the principles of the Declaration of Helsinki. The Peking University Biomedical Ethics Review Committee (approval number: IRB00001052-11015) granted ethical approval for CHARLS, and each participant provided written informed consent before enrollment. Because the data are publicly available, this study did not require further ethical approval. Data and documentation can be downloaded from the CHARLS project website (http://charls.pku.edu.cn).
