Data from: Evaluating lipid-driven insulin resistance via TyG index in breast cancer patients: Toward effective secondary prevention
Abstract
Breast cancer is the most commonly diagnosed malignancy worldwide. Insulin resistance (IR) plays a key role in its progression by activating oncogenic signaling pathways. The triglyceride-glucose (TyG) index is a validated, cost-effective surrogate marker for IR. This study aims to evaluate the prevalence of IR in female breast cancer patients using the TyG index and to identify lipid parameters associated with increased IR, thereby supporting strategies for secondary prevention.
A cross-sectional study was conducted among non-diabetic, histopathologically confirmed female breast cancer patients. Demographic data, lipid profiles, and fasting glucose levels were collected. Participants were stratified into high-risk (TyG ≥ 8.87) and low-risk (TyG < 8.87) groups based on their TyG index. Logistic regression analysis was performed to identify significant predictors of elevated TyG index.
Among 122 patients, 44.3% demonstrated elevated insulin resistance. Triglycerides (TG), total cholesterol (TC), VLDL-C, and the TC/HDL-C ratio were significantly higher in the high-risk group. Logistic regression identified TC, TC/HDL-C ratio, and LDL-C as significant predictors of elevated IR (p < 0.05). The model is represented as: Logit(P) = −13.941 + 0.145X₁ + 1.558X₂ − 0.178X₃, where X₁, X₂, and X₃ correspond to TC, TC/HDL-C ratio, and LDL-C, respectively. The predictive model achieved 90.2% accuracy with an area under the ROC curve (AUROC) of 0.927.
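The reported model can be turned into a predicted probability by applying the logistic function to the logit. The following is a minimal sketch, not the authors' code: it assumes that P is the probability of belonging to the high-risk group (TyG ≥ 8.87) and that TC and LDL-C are entered in mg/dL, as in the data file. The function names and the example input values are hypothetical.

```python
import math

# Coefficients as reported in the abstract.
INTERCEPT = -13.941
COEF_TC = 0.145        # X1: total cholesterol
COEF_TC_HDL = 1.558    # X2: TC/HDL-C ratio
COEF_LDL = -0.178      # X3: LDL-C


def logit_elevated_tyg(tc, tc_hdl_ratio, ldl):
    """Linear predictor Logit(P) of the reported model.

    Assumption: tc and ldl are in mg/dL (the units used in data.csv).
    """
    return INTERCEPT + COEF_TC * tc + COEF_TC_HDL * tc_hdl_ratio + COEF_LDL * ldl


def probability_elevated_tyg(tc, tc_hdl_ratio, ldl):
    """Convert the logit to a probability with the logistic function."""
    z = logit_elevated_tyg(tc, tc_hdl_ratio, ldl)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient values, for illustration only.
example_logit = logit_elevated_tyg(tc=200, tc_hdl_ratio=4.5, ldl=120)
example_prob = probability_elevated_tyg(tc=200, tc_hdl_ratio=4.5, ldl=120)
print(f"logit = {example_logit:.3f}, P = {example_prob:.3f}")
```

A probability above 0.5 corresponds to a positive logit; whether 0.5 was the classification threshold behind the reported accuracy is not stated here.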
Monitoring lipid parameters and managing insulin resistance are crucial for enhancing breast cancer prognosis and potentially reducing progression.
Dataset DOI: 10.5061/dryad.kd51c5bjn
Description of the data and file structure
This dataset contains clinical, biochemical, and lifestyle parameters relevant to cardiometabolic health. The variables are structured in a tabular format, where each row represents a single individual and each column corresponds to a clinical, biochemical, or derived parameter. The dataset is intended for research purposes in the fields of cardiovascular disease risk assessment, metabolic health, epidemiology, and machine learning–based health prediction.
Variables and Definitions
File: data.csv
- Age (years) – Chronological age of the participant.
- Cholesterol (mg/dL) – Total serum cholesterol level.
- Triglyceride (mg/dL) – Serum triglyceride concentration.
- HDL (mg/dL) – High-density lipoprotein cholesterol ("good cholesterol").
- LDL (mg/dL) – Low-density lipoprotein cholesterol ("bad cholesterol"). Calculated or measured.
- VLDL (mg/dL) – Very low-density lipoprotein cholesterol.
- CH/HDL ratio – Total cholesterol to HDL ratio, an indicator of cardiovascular risk.
- HDL/CH ratio – HDL to total cholesterol ratio, often used as a protective marker.
- LDL/HDL ratio – Ratio of LDL to HDL cholesterol, reflecting atherogenic balance.
- VLDL/HDL ratio – Ratio of VLDL to HDL cholesterol.
- HDL/VLDL ratio – Ratio of HDL to VLDL cholesterol.
- TG/HDL ratio – Triglyceride-to-HDL ratio, associated with insulin resistance and cardiometabolic risk.
- Fasting blood glucose (mg/dL) – Plasma glucose level after overnight fasting.
- TyG index – Triglyceride-glucose index, a logarithmic function combining fasting glucose and triglycerides, used as a surrogate marker for insulin resistance.
- TyG category (low/medium/high) – Risk category classification based on TyG index cut-off values.
- Hypertension (Yes/No) – Presence of hypertension (self-reported or clinically diagnosed).
- Smoking history (Yes/No) – History of tobacco smoking.
- Family history (Yes/No) – Positive family history of cardiovascular or metabolic disease.
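The derived variables above can in principle be recomputed from the measured ones. The sketch below is an illustration under stated assumptions, not the authors' procedure: it uses the commonly cited definition TyG = ln(TG × fasting glucose / 2) with both values in mg/dL, and the two-group split at 8.87 described in the abstract. The cut-offs behind the low/medium/high category column are not given in this file, so that column is not reproduced. All function names and example values are hypothetical.

```python
import math

TYG_CUTOFF = 8.87  # high-risk threshold stated in the abstract


def tyg_index(triglyceride_mg_dl, glucose_mg_dl):
    """TyG index under the assumed definition ln(TG * FBG / 2), units mg/dL."""
    return math.log(triglyceride_mg_dl * glucose_mg_dl / 2)


def tyg_risk_group(tyg):
    """Two-group stratification from the abstract: TyG >= 8.87 is high risk."""
    return "high" if tyg >= TYG_CUTOFF else "low"


def lipid_ratios(cholesterol, hdl, ldl, vldl, triglyceride):
    """Recompute the ratio columns from the measured lipid values (mg/dL)."""
    return {
        "CH/HDL": cholesterol / hdl,
        "HDL/CH": hdl / cholesterol,
        "LDL/HDL": ldl / hdl,
        "VLDL/HDL": vldl / hdl,
        "HDL/VLDL": hdl / vldl,
        "TG/HDL": triglyceride / hdl,
    }


# Hypothetical values, for illustration only.
tyg = tyg_index(triglyceride_mg_dl=150, glucose_mg_dl=100)
print(f"TyG = {tyg:.2f} -> {tyg_risk_group(tyg)} risk")
print(lipid_ratios(cholesterol=200, hdl=50, ldl=120, vldl=30, triglyceride=150))
```

Comparing recomputed values against the stored columns is a quick consistency check; small differences may reflect rounding in the file.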
Code/software
The dataset is provided as a comma-separated values (CSV) file. It can be opened directly in:
- Microsoft Excel (any recent version)
- Free alternatives such as LibreOffice Calc (≥7.0) or Google Sheets
No proprietary or additional software is required to view the file.
No scripts or code are included with this submission.
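For users who prefer to read the file programmatically, the following is a minimal sketch using only the Python standard library. It is not part of the submission. The helper names are hypothetical, and the exact header strings in data.csv should be checked against the printed column list before selecting a column by name.

```python
import csv
from pathlib import Path


def read_table(lines):
    """Parse CSV text (any iterable of lines) into a list of dicts keyed by header."""
    return list(csv.DictReader(lines))


def numeric_column(rows, name):
    """Return one column as floats, skipping blank or missing cells."""
    values = []
    for row in rows:
        cell = (row.get(name) or "").strip()
        if cell:
            values.append(float(cell))
    return values


# Runs only if data.csv is present in the working directory.
path = Path("data.csv")
if path.exists():
    with path.open(newline="", encoding="utf-8-sig") as f:
        rows = read_table(f)
    print(len(rows), "rows")
    print("columns:", list(rows[0].keys()) if rows else [])
```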
Access information
Other publicly accessible locations of the data:
- Not applicable
Data was derived from the following sources:
- Not applicable
Human subjects data
We obtained explicit consent from the participants to publish the de-identified data in the public domain. Only three indirect identifiers were retained, and all personal details of the participants were removed.
