RNA-seq of bone marrow samples from acute myeloid leukemia patients and healthy individuals
Data files
Dec 10, 2024 version, 16.54 MB in total
- AML.txt: 4.41 MB
- Raw_data.zip: 12.13 MB
- README.md: 3.89 KB
- sample.txt: 240 B
Abstract
Acute myeloid leukemia (AML) is the most common type of acute leukemia in adults, characterized by the malignant transformation and uncontrolled proliferation of abnormal myeloid hematopoietic progenitor cells in the bone marrow. This study aims to analyze the pathogenesis and prognosis of AML by performing RNA-seq on bone marrow samples from both AML patients and healthy individuals. We collected a total of 20 bone marrow samples, 10 from AML patients and 10 from healthy controls, all of which were stored in liquid nitrogen. The study received ethical approval (Ethical Approval Number: KY2024070), and all participants signed informed consent forms. This dataset contains the RNA-seq data from these samples. By analyzing the gene expression characteristics of myeloid progenitor cells, we attempt to construct a prognostic model to guide clinical treatment. The data can be used to explore the molecular mechanisms of AML, to characterize gene expression differences between AML and healthy bone marrow, and to identify new biomarkers. Data sharing for this study adheres to ethical guidelines: all sample collection and use were approved by the ethics committee and consented to by the participants.
README: RNA-seq of bone marrow samples from acute myeloid leukemia patients and healthy individuals
https://doi.org/10.5061/dryad.h9w0vt4t2
Description of the data and file structure
Files and variables
File: AML.txt
Description: RNA-seq expression matrix. Rows are genes and columns are samples.
Variables
- Gene_Name: gene name
- Patient1: Bone marrow sample from patient 1 with acute myeloid leukemia
- Patient2: Bone marrow sample from patient 2 with acute myeloid leukemia
- Patient3: Bone marrow sample from patient 3 with acute myeloid leukemia
- Patient4: Bone marrow sample from patient 4 with acute myeloid leukemia
- Patient5: Bone marrow sample from patient 5 with acute myeloid leukemia
- Patient6: Bone marrow sample from patient 6 with acute myeloid leukemia
- Patient7: Bone marrow sample from patient 7 with acute myeloid leukemia
- Patient8: Bone marrow sample from patient 8 with acute myeloid leukemia
- Patient9: Bone marrow sample from patient 9 with acute myeloid leukemia
- Patient10: Bone marrow sample from patient 10 with acute myeloid leukemia
- Control1: Bone marrow sample from healthy person 1
- Control2: Bone marrow sample from healthy person 2
- Control3: Bone marrow sample from healthy person 3
- Control4: Bone marrow sample from healthy person 4
- Control5: Bone marrow sample from healthy person 5
- Control6: Bone marrow sample from healthy person 6
- Control7: Bone marrow sample from healthy person 7
- Control8: Bone marrow sample from healthy person 8
- Control9: Bone marrow sample from healthy person 9
- Control10: Bone marrow sample from healthy person 10
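A minimal sketch of reading this matrix with the Python standard library. It assumes, following the variable list above, that the file has a header row with a `Gene_Name` column followed by the 20 sample columns, and that it is tab-delimited; the README does not state the delimiter, so `delimiter` is a parameter. The function name `load_expression_matrix` and the demo values are hypothetical.

```python
import csv
import io

# Sample column names as listed in the README.
EXPECTED_SAMPLES = [f"Patient{i}" for i in range(1, 11)] + [
    f"Control{i}" for i in range(1, 11)
]


def load_expression_matrix(handle, delimiter="\t"):
    """Return {gene_name: {sample_name: value}} from an open text handle.

    Assumption: one row per gene, a header row, tab-delimited by default.
    If a gene name occurs more than once, the last row wins.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [c for c in ["Gene_Name", *EXPECTED_SAMPLES] if c not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    matrix = {}
    for row in reader:
        matrix[row["Gene_Name"]] = {s: float(row[s]) for s in EXPECTED_SAMPLES}
    return matrix


# Tiny synthetic stand-in for AML.txt (gene names and values are made up).
demo = "Gene_Name\t" + "\t".join(EXPECTED_SAMPLES) + "\n"
demo += "GENE_A\t" + "\t".join(["1.5"] * 20) + "\n"
demo += "GENE_B\t" + "\t".join(["0.0"] * 20) + "\n"
matrix = load_expression_matrix(io.StringIO(demo))
print(len(matrix), len(matrix["GENE_A"]))
```

For the real file, the same function could be called as `load_expression_matrix(open("AML.txt", newline=""))`, adjusting `delimiter` if the file turns out not to be tab-delimited.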
File: sample.txt
Description: Sample Description Form
Variables
- P: Bone marrow sample from a patient with acute myeloid leukemia
- C: Bone marrow sample from a healthy person
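The README does not describe the layout of sample.txt, so the sketch below does not parse that file. Instead it derives the same `P`/`C` group codes from the sample column names used in AML.txt, assuming the `Patient`/`Control` prefixes are reliable. The helper name `sample_group` is hypothetical.

```python
from collections import Counter


def sample_group(sample_name):
    """Map a column name such as 'Patient3' or 'Control10' to 'P' or 'C'."""
    if sample_name.startswith("Patient"):
        return "P"
    if sample_name.startswith("Control"):
        return "C"
    raise ValueError(f"unrecognized sample name: {sample_name!r}")


samples = [f"Patient{i}" for i in range(1, 11)] + [
    f"Control{i}" for i in range(1, 11)
]
groups = {name: sample_group(name) for name in samples}
group_counts = Counter(groups.values())
print(group_counts)
```

The resulting mapping can be cross-checked by eye against sample.txt before it is used for any group comparison.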
File: Raw_data.zip
Description: The archive contains two gene expression matrices in TPM units: "gene.TPM.matrix.annot.xls" includes the annotation of gene identifiers to gene names, together with the functional annotation columns listed below, while "gene.tpm.matrix.xls" contains the unannotated matrix.
Variables
- gene_id: Unique identifiers for genes
- gene_name: Name of the gene
- description: Description or functional annotation of the gene
- length: Length of the gene
- Patient1 to Patient10: Expression level of the gene in each of the 10 patient samples, in TPM (Transcripts Per Million). The number in the column name is the patient number.
- Control1 to Control10: Expression level of the gene in each of the 10 healthy control samples, in TPM. The number in the column name is the control individual number.
- Control: The average expression level across all normal samples.
- Patient: The average expression level across all patient samples.
- nr: Likely refers to the identifier from the Non-Redundant Protein Sequence Database.
- go: Gene Ontology terms describing the function, process, and location of the gene.
- KO_id: KEGG Orthology ID used for classification in the KEGG database for enzymes and biological pathways.
- KO_name: The name corresponding to the KEGG Orthology.
- paths: The biological pathways in which the gene is involved.
- cog: Likely an abbreviation for Cluster of Orthologous Groups, used for classification and functional annotation.
- cog_description: The description of the cog.
- pfam: Identifier from the Pfam database, a database of protein families.
- uniprot: Protein identifier from the UniProt database, which provides protein sequence and functional information.
- entrez: Entrez Gene ID, a gene identifier in the NCBI (National Center for Biotechnology Information) gene database.
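Two sanity checks follow from the column descriptions above and can be sketched as below. The first recomputes the per-group means and compares them with the `Patient` and `Control` summary columns. The second sums a sample column over all genes, which should be close to 1,000,000 for a TPM matrix that has not been filtered. The function names, tolerances, and demo values are assumptions for illustration, and each row is assumed to be already parsed into a dict of floats.

```python
import math

PATIENTS = [f"Patient{i}" for i in range(1, 11)]
CONTROLS = [f"Control{i}" for i in range(1, 11)]


def group_means_match(row, rel_tol=1e-3, abs_tol=1e-6):
    """True if the row's `Patient` and `Control` values equal the group means.

    The tolerances are loose on purpose, since the stored averages may
    have been rounded when the file was written.
    """
    patient_mean = sum(row[s] for s in PATIENTS) / len(PATIENTS)
    control_mean = sum(row[s] for s in CONTROLS) / len(CONTROLS)
    return math.isclose(
        row["Patient"], patient_mean, rel_tol=rel_tol, abs_tol=abs_tol
    ) and math.isclose(
        row["Control"], control_mean, rel_tol=rel_tol, abs_tol=abs_tol
    )


def tpm_total(rows, sample):
    """Sum one sample column over all genes (rows)."""
    return sum(row[sample] for row in rows)


# Synthetic row: patient values 1..10 (mean 5.5), control values all 2.0.
demo_row = {f"Patient{i}": float(i) for i in range(1, 11)}
demo_row.update({f"Control{i}": 2.0 for i in range(1, 11)})
demo_row.update({"Patient": 5.5, "Control": 2.0})

print(group_means_match(demo_row))
print(tpm_total([demo_row, demo_row], "Patient1"))
```

A column total far from 1,000,000 would not necessarily indicate an error; it may simply mean that genes were filtered out after TPM normalization.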
Methods
A total of 20 bone marrow samples were collected for this study, consisting of 10 bone marrow samples from patients with acute myeloid leukemia (AML) (AML group), and 10 bone marrow samples from healthy individuals (control group). All AML patients were diagnosed by bone marrow smear morphology and cytogenetic testing, and individuals in the healthy control group underwent a thorough physical examination to exclude any history of blood disorders and tumors. The samples were stored in liquid nitrogen for further processing. All subjects signed an informed consent form, and the study was approved by the Ethics Committee of the Affiliated Hospital of Southwest Medical University (Ethics Approval No. KY2024070).
Total RNA was extracted from frozen bone marrow samples using TRIzol reagent (Thermo Fisher Scientific, USA) according to the manufacturer's instructions. The quality of the extracted RNA was assessed with an Agilent 2100 Bioanalyzer to ensure that the RNA Integrity Number (RIN) was greater than 7.0. RNA concentration was quantified using Qubit 2.0 (Thermo Fisher Scientific) to ensure that sequencing requirements were met. RNA libraries were constructed using the Illumina TruSeq RNA Library Prep Kit (Illumina, USA), and library quality control was performed with Qubit and Bioanalyzer. All samples were paired-end sequenced on the Illumina NovaSeq 6000 platform with a read length of 150 bp and a target sequencing depth of 50M reads per sample.
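As a rough worked example of what the stated depth implies, the sketch below converts the target of 50M reads at 150 bp into sequenced bases per sample. The text does not say whether the 50M figure counts individual reads or read pairs, so both readings are computed; the function name `yield_gb` is hypothetical.

```python
READ_LENGTH_BP = 150        # read length stated in the Methods
TARGET_READS = 50_000_000   # target depth per sample stated in the Methods


def yield_gb(reads, read_length_bp, reads_per_fragment=1):
    """Approximate sequenced bases per sample, in gigabases (1 Gb = 1e9 bases)."""
    return reads * reads_per_fragment * read_length_bp / 1e9


# If "50M reads" counts individual reads:
single_count_gb = yield_gb(TARGET_READS, READ_LENGTH_BP)
# If "50M reads" counts read pairs (two 150 bp reads per fragment):
pair_count_gb = yield_gb(TARGET_READS, READ_LENGTH_BP, reads_per_fragment=2)

print(single_count_gb, pair_count_gb)  # 7.5 15.0
```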