Data from: TIPE drives a cancer stem-like phenotype by promoting glycolysis via PKM2/HIF-1α axis in melanoma
Data files
Dec 13, 2024 version files (15.03 MB)
- Metabolomics_Data.xlsx (3.84 MB)
- README.md (3.12 KB)
- Transcriptomics_Data.xlsx (11.19 MB)
Abstract
TIPE (TNFAIP8) has been identified as an oncogene and participates in tumor biology. However, its role in the metabolism of tumor cells during melanoma development remains unclear. Here, we demonstrated that TIPE promoted glycolysis by interacting with pyruvate kinase M2 (PKM2) in melanoma. We found that TIPE induced PKM2 dimerization, thereby facilitating its translocation from the cytoplasm to the nucleus. TIPE-mediated PKM2 dimerization consequently promoted HIF-1α activation and glycolysis, which contributed to melanoma progression and increased its stemness features. Notably, TIPE specifically phosphorylated PKM2 at Ser37 in an ERK-dependent manner. Consistently, the expression of TIPE was positively correlated with the levels of PKM2 Ser37 phosphorylation and cancer stem cell markers in melanoma tissues from clinical samples and tumor-bearing mice. In summary, our findings indicate that the TIPE/PKM2/HIF-1α signaling pathway plays a pivotal role in promoting cancer stem cell properties by facilitating glycolysis, and may provide a promising therapeutic target for melanoma intervention.
README: Data from: TIPE drives a cancer stem-like phenotype by promoting glycolysis via PKM2/HIF-1α axis in melanoma
https://doi.org/10.5061/dryad.ghx3ffc05
Description of the data and file structure
Transcriptomics Data
We performed RNA sequencing on G361 cells overexpressing TIPE. The dataset comprises three TIPE-overexpressing samples and three control samples.
Metabolomics Data
We performed a metabolomic analysis on A375 cells after TIPE interference (knockdown). The dataset comprises six knockdown samples and six control samples.
Files and variables
File: Metabolomics_Data.xlsx
Description:
ID: The identification number of the metabolite in mass spectrometry detection
m/z: Mass-to-charge ratio
rt (s): Retention time of the metabolite on the chromatography, also known as the peak emergence time, with the unit being seconds
Name: The name of the metabolite
Adduct: Information on the adduct ions of the compound
Score: A scoring value ranging from 0 to 1. The higher the value, the higher the degree of matching and the more reliable the qualitative result
HMDB: The HMDB number can be used to obtain detailed annotation information about the metabolite on the HMDB official website
KEGG: The corresponding C number in KEGG. The C number can be used to query detailed information about the metabolite and the metabolic pathways it participates in on the KEGG official website
Superclass: Parent class, representing the classification of the metabolite
Class: Subclass, further classifying the metabolite
Subclass: Sub-subclass, providing an even more detailed classification of the metabolite
Sample: The relative content of each metabolite in the corresponding sample. The control group includes NC1, NC2, NC3, NC4, NC5, and NC6; the treatment group includes KD1, KD2, KD3, KD4, KD5, and KD6
Missing values are shown as blank cells
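As a minimal sketch of how the sample columns described above might be summarized per group, assuming each spreadsheet row has already been read into a dictionary keyed by the column names listed here (NC1 to NC6, KD1 to KD6) and that blank cells arrive as `None` or an empty string. The helper name `group_mean` and the example row are hypothetical.

```python
from statistics import mean

# Column names taken from the README; the row layout is an assumption.
CONTROL = [f"NC{i}" for i in range(1, 7)]
TREATMENT = [f"KD{i}" for i in range(1, 7)]

def group_mean(row, columns):
    """Mean relative content over the given sample columns.

    Blank cells (None or '') are skipped; returns None if nothing is left.
    """
    values = [float(row[c]) for c in columns if row.get(c) not in (None, "")]
    return mean(values) if values else None

# Hypothetical row with one blank cell in the treatment group.
row = {"Name": "example metabolite"}
row.update({c: 10.0 for c in CONTROL})
row.update({c: 5.0 for c in TREATMENT})
row["KD3"] = None

print(group_mean(row, CONTROL), group_mean(row, TREATMENT))
```

Skipping blank cells rather than treating them as zero is a design choice of this sketch, not something the README prescribes.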
File: Transcriptomics_Data.xlsx
Description:
gene_id: A unique identifier for each gene.
samples: The standardized read count values for each sample. The control group includes N1, N2, and N3. The treatment group (overexpression of TIPE) includes V1, V2, and V3.
log2FoldChange: The base-2 logarithm of the ratio of gene expression levels between the treatment group and the control group.
pvalue: P-value of significance test
padj: Adjusted p-value for multiple hypothesis testing
gene_name: The name assigned to the gene.
gene_chr: Chromosome name where the gene is located.
gene_start: Start position of the gene on the chromosome.
gene_end: End position of the gene on the chromosome.
gene_strand: Positive or negative strand information of the chromosome where the gene is located.
gene_length: The sum of all non-overlapping exon regions from the start to the end of the gene.
gene_biotype: Gene type, such as protein-coding gene, long non-coding RNA, etc.
gene_description: Functional description of the gene.
gene_tf_family: Transcription factor family annotation of the gene.
Missing values are indicated by "NA".
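The following sketch illustrates how the `log2FoldChange`, `padj`, and "NA" conventions above could be handled when filtering rows. It assumes each row is a dictionary of cell values keyed by the column names in this README. The function names are hypothetical, and the plain ratio in `simple_log2_ratio` only illustrates the definition; the values in the file come from the differential expression software and need not equal a raw ratio of means.

```python
import math

def parse_value(cell):
    """Convert a cell to float; the "NA" missing-value marker becomes None."""
    return None if cell in ("NA", "", None) else float(cell)

def simple_log2_ratio(treatment_mean, control_mean):
    """Base-2 logarithm of treatment/control (illustration of the definition only)."""
    if treatment_mean <= 0 or control_mean <= 0:
        raise ValueError("both means must be positive")
    return math.log2(treatment_mean / control_mean)

def is_significant(row, padj_cutoff=0.05, min_abs_log2fc=0.0):
    """True if padj is below the cutoff and |log2FoldChange| meets the minimum.

    Rows with a missing padj or log2FoldChange are treated as not significant.
    """
    padj = parse_value(row["padj"])
    lfc = parse_value(row["log2FoldChange"])
    if padj is None or lfc is None:
        return False
    return padj < padj_cutoff and abs(lfc) >= min_abs_log2fc

print(simple_log2_ratio(200, 50))  # a four-fold increase
print(is_significant({"padj": "0.01", "log2FoldChange": "1.5"}))
print(is_significant({"padj": "NA", "log2FoldChange": "1.5"}))
```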
Methods
Transcriptomics
Sample collection and preparation
RNA quantification and qualification
① RNA degradation and contamination were monitored on 1% agarose gels.
② RNA purity was checked using the NanoPhotometer® spectrophotometer (IMPLEN, CA, USA).
③ RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, CA, USA).
Library preparation for Transcriptome sequencing
A total of 1 μg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer’s recommendations, and index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). First-strand cDNA was synthesized using random hexamer primers and M-MuLV Reverse Transcriptase (RNase H-). Second-strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of the 3’ ends of the DNA fragments, NEBNext Adaptors with a hairpin loop structure were ligated to prepare for hybridization. To select cDNA fragments preferentially 250 to 300 bp in length, the library fragments were purified with the AMPure XP system (Beckman Coulter, Beverly, USA). Then 3 μl USER Enzyme (NEB, USA) was incubated with the size-selected, adaptor-ligated cDNA at 37 °C for 15 min followed by 5 min at 95 °C before PCR. PCR was then performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer. Finally, PCR products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system.
Clustering and sequencing (Novogene Experimental Department)
The clustering of the index-coded samples was performed on a cBot Cluster Generation System using the TruSeq PE Cluster Kit v3-cBot-HS (Illumina) according to the manufacturer’s instructions. After cluster generation, the library preparations were sequenced on an Illumina NovaSeq platform and 150 bp paired-end reads were generated.
Data Analysis
Quality control
Raw data (raw reads) in fastq format were first processed through in-house Perl scripts. In this step, clean data (clean reads) were obtained by removing reads containing adapters, reads containing poly-N, and low-quality reads from the raw data. At the same time, the Q20, Q30, and GC content of the clean data were calculated. All downstream analyses were based on the high-quality clean data.
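The in-house scripts are not part of this dataset, so the following is only a sketch of how Q20, Q30, and GC content are conventionally computed. It assumes Phred+33 quality encoding, defines Q20/Q30 as the fraction of bases with a quality score of at least 20/30, and uses a hypothetical function name and input format (pairs of sequence and quality strings of equal length).

```python
def read_quality_summary(records, phred_offset=33):
    """Summarize reads given as (sequence, quality_string) pairs.

    Returns the fractions of bases with Phred score >= 20 (Q20) and >= 30 (Q30),
    and the fraction of G/C bases (GC). Phred+33 encoding is assumed by default.
    """
    total_bases = gc = q20 = q30 = 0
    for seq, qual in records:
        total_bases += len(seq)
        gc += sum(base in "GCgc" for base in seq)
        for ch in qual:
            score = ord(ch) - phred_offset
            q20 += score >= 20
            q30 += score >= 30
    if total_bases == 0:
        raise ValueError("no bases to summarize")
    return {
        "Q20": q20 / total_bases,
        "Q30": q30 / total_bases,
        "GC": gc / total_bases,
    }

# Hypothetical 4-base read: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2.
print(read_quality_summary([("GCAT", "II5#")]))
```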
Reads mapping to the reference genome
Reference genome and gene model annotation files were downloaded directly from the genome website. The index of the reference genome was built using Hisat2 v2.0.5, and paired-end clean reads were aligned to the reference genome using Hisat2 v2.0.5. We selected Hisat2 as the mapping tool because it can generate a database of splice junctions based on the gene model annotation file and thus produce a better mapping result than non-splice-aware mapping tools.
Novel transcripts prediction
The mapped reads of each sample were assembled by StringTie (v1.3.3b) (Pertea et al., 2015) using a reference-based approach. StringTie uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus.
Quantification of gene expression level
featureCounts v1.5.0-p3 was used to count the number of reads mapped to each gene. The FPKM of each gene was then calculated based on the length of the gene and the read count mapped to it. FPKM (expected number of Fragments Per Kilobase of transcript sequence per Million base pairs sequenced) accounts for the effects of both sequencing depth and gene length on the read count, and is a commonly used method for estimating gene expression levels.
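To make the normalization concrete, here is a small sketch of the conventional FPKM formula, which scales the count by gene length in kilobases and by the total number of mapped fragments in millions. The exact denominator used by the pipeline is not stated here, so the use of total mapped fragments is an assumption, as are the function name and the example numbers.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Conventional FPKM: count * 1e9 / (gene length in bp * total mapped fragments)."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and total mapped fragments must be positive")
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)

# Hypothetical gene: 500 fragments, 2,000 bp long, in a library of 10 million mapped fragments.
print(fpkm(500, 2000, 10_000_000))
```

With these hypothetical inputs, 500 fragments on a 2 kb gene is 250 fragments per kilobase, and dividing by 10 (million fragments) gives an FPKM of 25.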
Differential expression analysis
(For DESeq2 with biological replicates) Differential expression analysis of two conditions/groups (three biological replicates per condition) was performed using the DESeq2 R package (1.16.1). DESeq2 provides statistical routines for determining differential expression in digital gene expression data using a model based on the negative binomial distribution. The resulting P-values were adjusted using Benjamini and Hochberg’s approach for controlling the false discovery rate. Genes with an adjusted P-value <0.05 found by DESeq2 were assigned as differentially expressed.
(For edgeR without biological replicates) Prior to differential gene expression analysis, the read counts of each sequenced library were adjusted with the edgeR package using one scaling normalization factor. Differential expression analysis of two conditions was performed using the edgeR R package (3.18.1). The P-values were adjusted using the Benjamini and Hochberg method. A corrected P-value of 0.05 and an absolute fold change of 2 were set as the thresholds for significantly differential expression.
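The Benjamini and Hochberg adjustment mentioned above can be sketched as follows. This is a plain re-implementation of the step-up procedure for illustration, not the code used by DESeq2 or edgeR (DESeq2, for example, may additionally apply independent filtering, which is one way a `padj` value can end up as NA). The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four genes.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each p-value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw p-values increase.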
GO and KEGG enrichment analysis of differentially expressed genes
Gene Ontology (GO) enrichment analysis of differentially expressed genes was implemented by the clusterProfiler R package, in which gene length bias was corrected. GO terms with a corrected P-value less than 0.05 were considered significantly enriched by differentially expressed genes. KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (http://www.genome.jp/kegg/). We used the clusterProfiler R package to test the statistical enrichment of differentially expressed genes in KEGG pathways.
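Over-representation tests of this kind are commonly based on the hypergeometric distribution. The sketch below shows that calculation for a single term or pathway, on the assumption that a one-sided hypergeometric test is what is being applied. It does not include any gene length bias correction or multiple-testing adjustment, and the function and parameter names are hypothetical.

```python
from math import comb

def enrichment_pvalue(n_background, n_in_term, n_de, n_de_in_term):
    """One-sided hypergeometric p-value P(X >= n_de_in_term).

    n_background: number of background genes
    n_in_term:    background genes annotated to the term or pathway
    n_de:         differentially expressed genes (subset of the background)
    n_de_in_term: differentially expressed genes annotated to the term
    """
    upper = min(n_in_term, n_de)
    total = comb(n_background, n_de)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_de - k)
        for k in range(n_de_in_term, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 10 background genes, 5 in the term, 5 DE genes, all 5 in the term.
print(enrichment_pvalue(10, 5, 5, 5))
```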
Metabolomics
Sample Extraction Method
After slowly thawing the sample at 4°C, an appropriate amount of the sample was taken and added to a pre-cooled methanol/acetonitrile/water solution (2:2:1, v/v). The mixture was vortexed, followed by low-temperature ultrasonic treatment for 30 minutes. The sample was then allowed to stand at -20°C for 10 minutes and centrifuged at 14,000 × g at 4°C for 20 minutes. The supernatant was collected and vacuum-dried. For mass spectrometry analysis, 100 μL of acetonitrile/water solution (acetonitrile:water = 1:1, v/v) was added to redissolve the dried sample, which was then vortexed. The sample was centrifuged again at 14,000 × g at 4°C for 15 minutes, and the supernatant was taken for injection and analysis.
Chromatographic Conditions
The samples were separated using an Agilent 1290 Infinity LC Ultra-High-Performance Liquid Chromatography (UHPLC) system with a HILIC column. The column temperature was maintained at 25°C, and the flow rate was set to 0.5 mL/min. The injection volume was 2 μL. The mobile phase composition was as follows: A = water + 25 mM ammonium acetate + 25 mM ammonia solution, and B = acetonitrile. The gradient elution program was as follows: 0 to 0.5 min, 95% B; 0.5 to 7 min, a linear decrease from 95% B to 65% B; 7 to 8 min, a linear decrease from 65% B to 40% B; 8 to 9 min, B maintained at 40%; 9 to 9.1 min, a linear increase from 40% B to 95% B; and 9.1 to 12 min, B maintained at 95%. During the entire analysis, the samples were kept in a 4°C autosampler. To avoid the impact of fluctuations in instrument detection signals, the samples were analyzed consecutively in a random order. Quality control (QC) samples were inserted into the sample queue to monitor and evaluate the stability of the system and the reliability of the experimental data.
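The gradient elution program above is piecewise linear in time, so the mobile phase composition at any moment can be read off by interpolation. The sketch below encodes the stated breakpoints and returns %B for a given time in minutes. The names `GRADIENT` and `percent_b` are hypothetical.

```python
# (time in min, %B) breakpoints taken from the gradient program described above.
GRADIENT = [
    (0.0, 95.0),
    (0.5, 95.0),
    (7.0, 65.0),
    (8.0, 40.0),
    (9.0, 40.0),
    (9.1, 95.0),
    (12.0, 95.0),
]

def percent_b(t_min):
    """Percentage of mobile phase B at time t_min, by linear interpolation."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0 to 12 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

for t in (0.25, 3.75, 7.5, 10.0):
    print(t, percent_b(t))
```

The percentage of mobile phase A at the same time is simply 100 minus the returned value.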
Q-TOF Mass Spectrometry Conditions
The AB Triple TOF 6600 mass spectrometer was used to collect the first-order and second-order spectra of the samples.
The ESI source conditions after HILIC chromatographic separation were as follows: Ion Source Gas1 (Gas1): 60, Ion Source Gas2 (Gas2): 60, Curtain gas (CUR): 30, source temperature: 600°C, IonSpray Voltage Floating (ISVF): ±5500 V (both positive and negative modes); TOF MS scan m/z range: 60-1000 Da, product ion scan m/z range: 25-1000 Da, TOF MS scan accumulation time: 0.20 s/spectrum, product ion scan accumulation time: 0.05 s/spectrum. The second-order mass spectra were acquired using information-dependent acquisition (IDA) in high sensitivity mode. The Declustering Potential (DP) was set to ±60 V (both positive and negative modes), and the Collision Energy was 35 ± 15 eV. The IDA settings were as follows: isotopes within 4 Da were excluded, and 10 candidate ions were monitored per cycle.
Data Analysis Workflow
The raw data in .wiff format were converted to .mzXML format using ProteoWizard. XCMS software was then used for peak alignment, retention time correction, and peak area extraction. The data extracted by XCMS first underwent metabolite structure identification and data preprocessing. The quality of the experimental data was then evaluated, and finally the data analysis was conducted. The data analysis encompassed univariate statistical analysis, multivariate statistical analysis, differential metabolite screening, correlation analysis of differential metabolites, and KEGG pathway analysis.