Data from: Urinary metabolomics and proteomics for early detection of gastric cancer: Insights from a two-center multicenter study
Data files
Mar 30, 2026 version: 54.07 MB in total
- Metabolomics_processed_data.xlsx.xlsx (3.98 MB)
- Metabolomics_raw_data.xlsx.xlsx (47.57 MB)
- metadata.xlsx (12.87 KB)
- Proteomics_processed_data.xlsx.xlsx (213.54 KB)
- Proteomics_raw_data.xlsx.xlsx (2.29 MB)
- README.md (4.23 KB)
Abstract
This study explored a non-invasive strategy for detecting gastric cancer by integrating urinary metabolomics and proteomics, with the aims of uncovering biomarkers and elucidating the molecular mechanisms underlying disease progression. Urine samples were collected from 30 advanced gastric cancer (AGC) patients, 30 early gastric cancer (EGC) patients, and 30 healthy controls across two centers. Using UHPLC-MS, 350 differential metabolites were identified in AGC versus controls and 285 in EGC versus controls, mainly involved in amino acid, bile acid, and energy metabolism. Key metabolites, including butyrate, indolelactic acid, D-ribose-5-phosphate, and serine, were selected with the Random Forest and Boruta algorithms for diagnostic modeling. Proteomic profiling with TMT labeling revealed 376 differentially abundant proteins in AGC and 191 in EGC, enriched in immune response, cell adhesion, and proteolysis pathways. Proteins including TNFRSF12A, ITGB3, HSPA8, and FTL were significantly altered: TNFRSF12A was upregulated and HSPA8 downregulated in AGC, while ITGB3 and FTL were upregulated in EGC. These proteins were linked to pathways including cell adhesion molecules, ECM–receptor interaction, platelet activation, HIF-1 signaling, glycolysis/gluconeogenesis, and antigen processing and presentation. Integrated KEGG analysis highlighted 43 enriched pathways in AGC and 30 in EGC, spanning amino acid metabolism, the TCA cycle, PI3K-Akt signaling, and immune response mechanisms. Overall, combining urinary metabolomics and proteomics showed potential for the non-invasive detection of gastric cancer and identified biomarkers and pathways of diagnostic and clinical relevance; further validation is needed before translation into clinical practice.
Dataset DOI: 10.5061/dryad.zkh1893rf
Description of the data and file structure
1. Data Description
This dataset supports the study entitled “Urinary Metabolomics and Proteomics for Early Detection of Gastric Cancer: Insights from a Two-Center Multicenter Study.”
The dataset contains urinary metabolomics and proteomics data collected from three groups: advanced gastric cancer (AGC), early gastric cancer (EGC), and healthy controls (CG). Metabolomics data were generated using UHPLC-MS, and proteomics data were obtained using a TMT-based quantitative proteomics approach.
The dataset includes both raw and processed data for the metabolomics and proteomics analyses, which were used for differential analysis and biomarker identification.
All human-derived data have been fully de-identified, and no personally identifiable information (PII) is included.
Files and variables
This dataset includes the following files:
- Metabolomics_raw_data.xlsx: Raw metabolomics data acquired using UHPLC-MS. Each row represents a metabolite feature, and each column contains either sample intensity values or feature annotations.
- Proteomics_processed_data.xlsx: Processed proteomics data, including differential expression analysis results.
- metadata.xlsx: Description of the variables, abbreviations, and units used in the dataset.
Metabolomics data variables include:
- MetaboName: Name of metabolite
- Alignment ID: Alignment identifier
- KEGGID: KEGG database identifier
- HMDBID: Human Metabolome Database identifier
- SuperClass, Class, SubClass: Chemical classification
- RT(min): Retention time, in minutes
- m/z: Mass-to-charge ratio
- Adduct type: Adduct ion type detected
- Formula: Molecular formula
- Ontology: Functional classification
Sample columns include:
- AGC-1 to AGC-30: Advanced gastric cancer samples
- EGC-1 to EGC-30: Early gastric cancer samples
- CG-1 to CG-30: Control samples
- QC1–QC10: Quality control samples
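Because annotation columns and sample columns share one sheet, it can help to separate them by name before any analysis. The sketch below assumes the column headers in the spreadsheet match the patterns listed above exactly (AGC-1, EGC-1, CG-1, QC1); the helper names `group_of` and `split_columns` are hypothetical and not part of the dataset.

```python
import re
from collections import defaultdict

# Patterns taken from the README: AGC-1..AGC-30, EGC-1..EGC-30, CG-1..CG-30, QC1..QC10.
# Assumption: the actual headers follow these patterns exactly.
SAMPLE_PATTERN = re.compile(r"^(AGC|EGC|CG)-(\d+)$|^(QC)(\d+)$")


def group_of(column):
    """Return the group label of a sample column, or None for annotation columns."""
    match = SAMPLE_PATTERN.match(column.strip())
    if match is None:
        return None
    # Capture group 1 holds AGC/EGC/CG; capture group 3 holds QC.
    return match.group(1) or match.group(3)


def split_columns(header):
    """Partition a header row into annotation columns and per-group sample columns."""
    annotations = []
    samples = defaultdict(list)
    for column in header:
        group = group_of(column)
        if group is None:
            annotations.append(column)
        else:
            samples[group].append(column)
    return annotations, dict(samples)


header = ["MetaboName", "KEGGID", "m/z", "AGC-1", "AGC-2", "EGC-1", "CG-1", "QC1", "QC10"]
annotations, samples = split_columns(header)
print(annotations)  # ['MetaboName', 'KEGGID', 'm/z']
print(samples)      # {'AGC': ['AGC-1', 'AGC-2'], 'EGC': ['EGC-1'], 'CG': ['CG-1'], 'QC': ['QC1', 'QC10']}
```

Matching on names rather than column positions keeps the split robust if annotation columns are reordered.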
Proteomics data variables include:
- Accession: Protein accession identifier
- Gene Symbol: Gene symbol
- Protein Name: Protein name
- ENTREZID: NCBI Entrez Gene identifier
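Statistical variables include:
- ttestPvalue: P-value from the t-test comparing the test and control groups
- ControlMean, ControlSD, ControlSE, ControlCV: Mean, standard deviation (SD), standard error (SE), and coefficient of variation (CV) of the control group
- TestMean, TestSD, TestSE, TestCV: The same statistics for the test group
- FC: Fold change between the test and control groups
- Log2FC: Log2 fold change
- fdr: False discovery rate
- threshold: Differential expression classification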
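The summary columns above can, in principle, be recomputed from per-sample protein abundances. The sketch below is a hypothetical reconstruction, not the authors' pipeline: it assumes FC = TestMean / ControlMean, reports CV as a fraction, and uses illustrative cutoffs (`fc_cutoff=1.2`, `p_cutoff=0.05`) and labels ("up", "down", "not significant") that the README does not document. The p-value is passed in rather than computed.

```python
import math
import statistics


def describe(values):
    """Mean, sample SD, standard error and coefficient of variation for one group."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(len(values))   # standard error of the mean
    cv = sd / mean                     # CV as a fraction; multiply by 100 for percent
    return {"Mean": mean, "SD": sd, "SE": se, "CV": cv}


def summarize_protein(test_values, control_values, p_value,
                      fc_cutoff=1.2, p_cutoff=0.05):
    """Hypothetical reconstruction of the per-protein statistics columns.

    Assumes FC = TestMean / ControlMean; fc_cutoff and p_cutoff are
    illustrative values, not the study's documented thresholds.
    """
    control = describe(control_values)
    test = describe(test_values)
    fc = test["Mean"] / control["Mean"]
    significant = p_value < p_cutoff
    if significant and fc >= fc_cutoff:
        threshold = "up"
    elif significant and fc <= 1 / fc_cutoff:
        threshold = "down"
    else:
        threshold = "not significant"
    # Prefix group statistics so keys mirror ControlMean, TestSD, and so on.
    row = {"Control" + key: value for key, value in control.items()}
    row.update({"Test" + key: value for key, value in test.items()})
    row.update({"ttestPvalue": p_value, "FC": fc,
                "Log2FC": math.log2(fc), "threshold": threshold})
    return row


row = summarize_protein([4.0, 4.0, 4.0], [1.0, 2.0, 3.0], p_value=0.01)
print(row["FC"], row["Log2FC"], row["threshold"])  # 2.0 1.0 up
```

Whether the study's `threshold` column was based on `ttestPvalue` or on `fdr` is not stated; swapping in the adjusted value only changes the `p_value` argument.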
Code/software
Metabolomics data were processed using standard UHPLC-MS data processing workflows, including peak alignment, normalization, and filtering.
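The README names normalization and filtering but not the specific methods. As a minimal sketch only, the code below assumes two common choices: total-intensity normalization (scaling each sample to the median column total) and removal of features whose relative standard deviation across QC injections exceeds 30%. Both methods, the 30% cutoff, and the toy intensities are assumptions for illustration; peak alignment happens upstream in the acquisition software and is not shown.

```python
import statistics


def total_intensity_normalize(table, sample_columns):
    """Scale each sample column so its summed intensity equals the median column total."""
    totals = {c: sum(row[c] for row in table.values()) for c in sample_columns}
    target = statistics.median(totals.values())
    return {feature: {c: row[c] * target / totals[c] for c in sample_columns}
            for feature, row in table.items()}


def filter_by_qc_rsd(table, qc_columns, max_rsd=0.30):
    """Keep features whose relative SD across QC injections is at most max_rsd."""
    kept = {}
    for feature, row in table.items():
        qc_values = [row[c] for c in qc_columns]
        mean = statistics.mean(qc_values)
        if mean > 0 and statistics.stdev(qc_values) / mean <= max_rsd:
            kept[feature] = row
    return kept


# Toy intensities (invented for illustration): three features, two QCs, one sample.
table = {
    "feature_1": {"QC1": 10.0, "QC2": 12.0, "AGC-1": 10.0},
    "feature_2": {"QC1": 10.0, "QC2": 11.0, "AGC-1": 30.0},
    "feature_3": {"QC1": 4.0, "QC2": 1.0, "AGC-1": 8.0},
}
columns = ["QC1", "QC2", "AGC-1"]
normalized = total_intensity_normalize(table, columns)
filtered = filter_by_qc_rsd(normalized, ["QC1", "QC2"])
print(sorted(filtered))  # ['feature_1', 'feature_2']
```

Here both QC columns already total 24, so only AGC-1 (total 48) is rescaled, by a factor of 0.5; feature_3 is dropped because its QC intensities vary far more than the cutoff allows.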
Proteomics data were analyzed using TMT-based quantification pipelines.
Statistical analysis and biomarker selection (including the Random Forest and Boruta algorithms) were performed using software such as R and GraphPad Prism. Functional enrichment analyses (GO and KEGG) were conducted using standard bioinformatics tools.
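The README does not state how the `fdr` column was derived. The Benjamini-Hochberg procedure is a widely used way to turn a list of per-feature p-values into false discovery rate estimates, so a minimal sketch of it is shown here as an assumption, not as the method confirmed for this dataset.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase
    # as the raw p-value decreases (step-up monotonicity), capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))  # approximately [0.04, 0.0533, 0.0533, 0.2]
```

In R, `p.adjust(p, method = "BH")` implements the same adjustment and can be used to cross-check the `fdr` values against `ttestPvalue`.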
Detailed procedures are described in the associated manuscript.
Access information
This dataset is publicly available at: https://doi.org/10.5061/dryad.zkh1893rf
All data are anonymized and comply with ethical standards.
For further information, please contact:
Yadan Wang
wangyadan06@163.com
Human subjects data
All data involving human participants have been fully de-identified prior to submission. No personally identifiable information (PII) is included in this dataset.
All participants provided informed consent for their data to be used for research purposes and to be shared in anonymized form in the public domain.
De-identification procedures included the removal of direct identifiers (such as names, contact information, and identification numbers) and indirect identifiers that could potentially lead to re-identification. Where necessary, data were aggregated or generalized to further protect participant privacy.
This study was conducted in accordance with relevant ethical guidelines and regulations, and was approved by the appropriate institutional review board (IRB) or ethics committee.
