Elevated expression of TUBA1C in breast cancer predicts poor prognosis
Zhao, Yi (2022), Elevated expression of TUBA1C in breast cancer predicts poor prognosis, Dryad, Dataset, https://doi.org/10.5061/dryad.5x69p8d4r
α1C-tubulin (TUBA1C) is a member of the α-tubulin family and has been reported as a potential biomarker in a variety of cancers. In this study, the gene expression profile of TUBA1C was extracted from The Cancer Genome Atlas (TCGA) for analysis, and the prognostic value of TUBA1C in breast cancer was comprehensively evaluated. The Wilcoxon signed-rank test, Kruskal-Wallis test, and logistic regression analysis were performed to assess the correlations between TUBA1C expression and the clinical characteristics of breast cancer patients. The effect of TUBA1C expression on the survival of breast cancer patients was assessed using Kaplan-Meier curves, Cox regression analysis, and the Kaplan-Meier plotter (an online database). Gene Set Enrichment Analysis (GSEA) was performed on the TCGA data set. The results showed that high TUBA1C expression in breast cancer was closely correlated with survival time, survival status, and tumor size. In addition, elevated TUBA1C expression predicted poor overall survival (OS), recurrence-free survival (RFS), and distant metastasis-free survival (DMFS). Univariate and multivariate Cox regression analyses confirmed that TUBA1C was an independent prognostic factor for the OS of breast cancer patients. GSEA showed that the high TUBA1C expression phenotype was differentially enriched in the cell cycle, basal transcription factors, P53 signaling pathway, pathways in cancer, Toll-like receptor signaling pathway, and NOD-like receptor signaling pathway. In summary, high messenger RNA (mRNA) expression of TUBA1C is an independent risk factor for poor prognosis in breast cancer.
2.1 Ethical statement
This study was approved by the Ethics Committee of Qinghai University Affiliated Hospital. All data were derived from public databases, and informed consent had been obtained for all data used in the study.
2.2 RNA sequencing (RNA-seq) gene data for patients and bioinformatics analysis
The gene expression data and the corresponding clinical data used in this study were obtained from the TCGA database (http://gdc.cancer.gov/). After exclusion of incomplete records, RNA-seq gene expression data and the corresponding clinical data were collected for 1085 breast cancer patients. Differential expression analysis, correlation analysis of clinical characteristics, univariate and multivariate Cox analyses, and logistic regression analysis were performed using R software (version 4.0.3).
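The exclusion of incomplete records can be pictured as joining the expression and clinical tables on patient ID and then dropping any patient with a missing value. The sketch below is only an illustration under stated assumptions, not the authors' R pipeline: the helper `merge_complete_records`, the patient IDs, and the field names (`os_time`, `os_status`, `stage`) are hypothetical, and "incomplete" is assumed to mean a missing expression value or a missing required clinical field.

```python
from typing import Dict, List, Optional

def merge_complete_records(
    expression: Dict[str, Optional[float]],
    clinical: Dict[str, Dict[str, object]],
    required_fields: List[str],
) -> Dict[str, Dict[str, object]]:
    """Hypothetical helper: keep patients present in both tables whose TUBA1C
    value and every required clinical field are non-missing (None = missing)."""
    cohort: Dict[str, Dict[str, object]] = {}
    for patient_id, value in expression.items():
        record = clinical.get(patient_id)
        if value is None or record is None:
            continue  # no expression value, or no matching clinical record
        if any(record.get(field) is None for field in required_fields):
            continue  # incomplete clinical annotation
        cohort[patient_id] = {"TUBA1C": value, **record}
    return cohort

# Toy example with hypothetical patient IDs and fields (not study data)
expression = {"P1": 7.2, "P2": None, "P3": 5.1, "P4": 6.0}
clinical = {
    "P1": {"os_time": 1200, "os_status": 0, "stage": "II"},
    "P2": {"os_time": 800, "os_status": 1, "stage": "III"},
    "P3": {"os_time": 300, "os_status": 1, "stage": None},
    "P5": {"os_time": 950, "os_status": 0, "stage": "I"},
}
cohort = merge_complete_records(expression, clinical, ["os_time", "os_status", "stage"])
print(sorted(cohort))  # ['P1']
```

In this toy run, P2 is dropped for a missing expression value, P3 for a missing stage, and P4 and P5 because each appears in only one table.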
2.3 Gene Expression Profiling Interactive Analysis (GEPIA) dataset
GEPIA (http://gepia.cancer-pku.cn/) is an interactive web server for analyzing RNA-seq gene expression data from 9736 tumor samples and 8587 normal samples, all drawn from the TCGA database and the Genotype-Tissue Expression (GTEx) project. GEPIA provides a variety of analytical functions, including differential expression analysis between tumor and normal tissues, survival analysis, analysis by cancer type or pathological stage, and identification of similar genes.
2.4 Kaplan-Meier plotter
The Kaplan-Meier plotter (http://kmplot.com/analysis/) is an online tool for prognostic analysis and was used to evaluate the prognostic value of TUBA1C in breast cancer. To assess overall survival (OS), post-progression survival (PPS), recurrence-free survival (RFS), and distant metastasis-free survival (DMFS), breast cancer samples were divided into high and low expression groups according to the median expression of TUBA1C messenger RNA (mRNA), and the groups were compared using the Kaplan-Meier plotter. P < 0.05 was considered statistically significant.
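The median split and survival comparison described above can be sketched in plain Python. This is an illustration of the general technique, not the Kaplan-Meier plotter's own code: `median_split`, `kaplan_meier`, and `logrank` are hypothetical helpers, samples lying exactly at the median are assigned to the low group, and a two-group log-rank test is assumed as the source of the p value. The toy data are made up.

```python
import math
from statistics import median
from typing import List, Tuple

def median_split(values: List[float]) -> List[str]:
    """Label samples 'high' if above the median, else 'low' (ties go to 'low', an assumption)."""
    cutoff = median(values)
    return ["high" if v > cutoff else "low" for v in values]

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of S(t) at each distinct event time.
    events[i] = 1 for an observed event (e.g. death), 0 for censoring."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times together
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censored cases both leave the risk set
    return curve

def logrank(times: List[float], events: List[int], groups: List[str],
            group: str = "high") -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p value) with 1 df."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        at_risk = [gi for ti, gi in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), at_risk.count(group)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        d1 = sum(1 for ti, ei, gi in zip(times, events, groups) if ti == t and ei and gi == group)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in `group`
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df

expr = [9.1, 8.7, 5.2, 4.8, 7.9, 6.0]   # toy TUBA1C values
times = [10, 14, 40, 52, 22, 35]        # toy follow-up times
events = [1, 1, 0, 1, 1, 0]             # 1 = event, 0 = censored
groups = median_split(expr)             # ['high', 'high', 'low', 'low', 'high', 'low']
high_curve = kaplan_meier([t for t, g in zip(times, groups) if g == "high"],
                          [e for e, g in zip(events, groups) if g == "high"])
chi2, p = logrank(times, events, groups)
```

The chi-square tail for one degree of freedom is computed with `math.erfc`, which avoids any dependency outside the standard library.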
2.5 Gene Set Enrichment Analysis (GSEA)
GSEA is an analysis method for genome-wide expression data that draws on a molecular signature database of gene sets defined by gene location, function, and biological significance. It compares the expression profiles of each gene set between two biological states to determine whether the differences are statistically significant. In this study, the raw data were processed in batches with GSEA to identify the signaling pathways enriched in the TUBA1C high expression group versus the TUBA1C low expression group, with TUBA1C expression status (high or low) used as the phenotype label. Enriched pathways were ranked by nominal p value and normalized enrichment score (NES), with 1000 permutations per analysis.
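The enrichment score, NES, and nominal p value can be illustrated with a simplified sketch of the GSEA procedure. It is not the GSEA software: genes are ranked by a simplified signal-to-noise metric (population standard deviation, no minimum-sigma floor), the running-sum score uses weight 1, and the null distribution comes from shuffling the high/low phenotype labels (1000 permutations by default, matching the text). The function names, toy genes, and gene set are hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Set, Tuple

def signal_to_noise(expr: Dict[str, List[float]], labels: Sequence[int]) -> Dict[str, float]:
    """Simplified signal-to-noise per gene: (mean_high - mean_low) / (sd_high + sd_low).
    labels[k] = 1 for the TUBA1C-high group, 0 for the TUBA1C-low group."""
    scores = {}
    for gene, values in expr.items():
        hi = [v for v, lab in zip(values, labels) if lab == 1]
        lo = [v for v, lab in zip(values, labels) if lab == 0]
        denom = pstdev(hi) + pstdev(lo)
        scores[gene] = (mean(hi) - mean(lo)) / denom if denom > 0 else 0.0
    return scores

def enrichment_score(scores: Dict[str, float], gene_set: Set[str], weight: float = 1.0) -> float:
    """Weighted running sum: step up at gene-set hits, down at misses; return the max deviation."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = [g in gene_set for g in ranked]
    n_hit, n_miss = sum(hits), len(ranked) - sum(hits)
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover part, but not all, of the ranked list")
    norm = sum(abs(scores[g]) ** weight for g, h in zip(ranked, hits) if h)
    running, best = 0.0, 0.0
    for g, h in zip(ranked, hits):
        if h:
            running += abs(scores[g]) ** weight / norm if norm > 0 else 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def gsea_one_set(expr, labels, gene_set, n_perm=1000, seed=0) -> Tuple[float, float, float]:
    """Return (ES, NES, nominal p) using phenotype-label permutations."""
    observed = enrichment_score(signal_to_noise(expr, labels), gene_set)
    rng = random.Random(seed)
    perm, null = list(labels), []
    for _ in range(n_perm):
        rng.shuffle(perm)
        null.append(enrichment_score(signal_to_noise(expr, perm), gene_set))
    # compare only against null scores with the same sign as the observed ES
    same = [e for e in null if (e >= 0) == (observed >= 0)]
    scale = mean(abs(e) for e in same) if same else 0.0
    if scale == 0:
        return observed, float("nan"), float("nan")
    p = sum(1 for e in same if abs(e) >= abs(observed)) / len(same)
    return observed, observed / scale, p

# Toy data: 3 high and 3 low samples; hypothetical gene set {"G1", "G2"}
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "G1": [5, 6, 7, 1, 2, 1], "G2": [8, 9, 8, 2, 3, 2], "G3": [1, 2, 3, 2, 1, 3],
    "G4": [3, 1, 2, 3, 2, 2], "G5": [2, 2, 3, 3, 2, 2], "G6": [1, 1, 1, 4, 5, 6],
}
es, nes, p = gsea_one_set(expr, labels, {"G1", "G2"})
```

Normalizing by the mean magnitude of same-sign null scores keeps the sign of the ES while making scores comparable across gene sets of different sizes.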
2.6 The Human Protein Atlas (HPA)
The HPA database (https://www.proteinatlas.org/) is an open public database that provides information on the distribution of 24,000 human proteins across tissues and cells [19, 20]. The expression and distribution of each protein are verified in normal human tissues, cancer tissues, and cell lines, and all results are reviewed and annotated by experts to ensure that they are representative. In this study, the HPA database was used to analyze the protein expression of TUBA1C across human tissues and to compare TUBA1C expression between normal breast tissue and breast cancer tissue, providing a basis for subsequent experimental validation.
2.7 Statistical analysis
The associations between TUBA1C expression and OS, PPS, DMFS, and RFS were evaluated using the Kaplan-Meier plotter; all other statistical analyses were performed using R software (version 4.0.3). The Wilcoxon signed-rank test, Kruskal-Wallis test, and logistic regression analysis were used to analyze the correlations between TUBA1C expression and the clinical characteristics of patients in the TCGA database. Patients were divided into high and low expression groups according to the median expression of TUBA1C mRNA. Univariate Cox analysis was used to identify potential prognostic factors, and multivariate Cox analysis was used to determine whether TUBA1C expression was associated with survival independently of other clinicopathological features. P < 0.05 was considered statistically significant.
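The univariate Cox analysis can be illustrated with a single-covariate proportional hazards fit. This is a minimal sketch, not the authors' R code (in R this would typically be done with `coxph()` from the survival package). The helper name `univariate_cox` is hypothetical; the sketch assumes the Breslow handling of tied event times, fits the model by Newton-Raphson on the partial likelihood, and reports a Wald p value. The toy data, in which `x = 1` marks a hypothetical TUBA1C-high patient, are made up.

```python
import math
from typing import List, Tuple

def univariate_cox(times: List[float], events: List[int], x: List[float],
                   max_iter: int = 50, tol: float = 1e-9) -> Tuple[float, float, float]:
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson on the Cox partial
    likelihood (Breslow handling of ties). Returns (beta, hazard ratio, Wald p)."""
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for ti, ei, xi in zip(times, events, x):
            if not ei:
                continue  # censored subjects contribute only through risk sets
            # exp(beta * x)-weighted moments of x over the risk set {j : t_j >= t_i}
            risk = [(math.exp(beta * xj), xj) for tj, xj in zip(times, x) if tj >= ti]
            s0 = sum(w for w, _ in risk)
            s1 = sum(w * xj for w, xj in risk)
            s2 = sum(w * xj * xj for w, xj in risk)
            score += xi - s1 / s0                  # first derivative of log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2       # observed information (minus second derivative)
        if info <= 0:
            raise ValueError("covariate has no variation within the risk sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    z = beta * math.sqrt(info)  # Wald z = beta / SE, SE from the final iteration's information
    return beta, math.exp(beta), math.erfc(abs(z) / math.sqrt(2))

# Toy data: all six patients have an observed event; x = 1 for hypothetical TUBA1C-high
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 1, 0, 0]
beta, hr, p = univariate_cox(times, events, x)
```

A hazard ratio above 1 means that the high-expression group has a higher instantaneous risk of the event. A multivariate model would extend `x` to a vector of covariates, with the score and information becoming a vector and a matrix.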