T-cell acute lymphoblastic leukemia (T-ALL) is a heterogeneous disease characterized by a high relapse rate. Using single-cell transcriptome analysis, we characterized the bone marrow immune microenvironment of T-ALL patients and identified 13 major cell clusters. These patients showed abnormal expansion of HSCs and GMPs, as well as immunosuppressive traits in CD4⁺ T, CD8⁺ T, and NK cells. Subclustering of CD4⁺ T cells revealed two subsets transitioning between Th1 and Th2 states: ANXA1⁻GATA3⁻ CD4⁺ T cells and ANXA1⁺GATA3⁺ CD4⁺ T cells. NK cells in the tumor microenvironment of relapsed T-ALL patients showed exhaustion, and JUN was identified as a critical factor. JUN was also highly expressed in T-ALL cells and was crucial for maintaining their proliferation. A JUN inhibitor effectively killed leukemia cells and ameliorated NK cell exhaustion in a relapsed T-ALL cell line, as well as in CDX, PDX, and NOTCH1-mutant mouse models. In summary, our findings improve the understanding of the mechanisms of T-ALL relapse and support the development of innovative immunotherapies for patients with relapsed T-ALL.

Description of the data and file structure

This dataset includes single-cell transcriptome sequencing data from 3 normal donors, 3 primary T-ALL patients, and 3 relapsed T-ALL patients. It also includes single-cell transcriptome sequencing data from T-ALL PDX model mice collected on days 7, 12, and 21, representing the primary, remission, and relapse disease states, respectively. Finally, it includes RNA-seq data from the JNK-IN-8 treatment group and the control group of the NOTCH1-mutant T-ALL mouse model.

Code/software

Data quality control

Raw BCL files were converted to FASTQ format with 10x Genomics Cell Ranger (version 3.1.0), and the FASTQ files were then used for alignment and UMI count quantification. Before mapping to the reference genome, reads were screened and those with low-quality cell barcodes or unique molecular identifiers (UMIs) were filtered out. Only reads that mapped uniquely to the transcriptome and had at least 50% of their length overlapping an exon were counted, with counts collapsed by UMI.
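The UMI counting rule can be illustrated with a simplified Python sketch. This is not Cell Ranger's implementation: the read tuple layout and the `count_umis` helper are hypothetical, and real pipelines also correct sequencing errors in barcodes and UMIs, which this sketch omits.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into per-cell, per-gene UMI counts (illustrative only).

    Each read is a hypothetical tuple:
    (barcode, umi, gene, exon_overlap_fraction, uniquely_mapped).
    """
    seen = defaultdict(set)  # (barcode, gene) -> set of distinct UMIs
    for barcode, umi, gene, exon_overlap, uniquely_mapped in reads:
        # Keep only uniquely mapped reads with >= 50% of the read on an exon.
        if not uniquely_mapped or exon_overlap < 0.5:
            continue
        seen[(barcode, gene)].add(umi)
    # PCR duplicates share a UMI, so each distinct UMI is counted once.
    return {key: len(umis) for key, umis in seen.items()}
```

Counting distinct UMIs rather than reads is what removes PCR amplification bias: several reads carrying the same UMI came from one original molecule.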

In addition, the following cell-level filters were applied:

- Number of detected genes per cell between 200 and 4,300. The number of expressed genes in a cell normally falls within a characteristic range; an unusually high value suggests that a single GEM captured more than one cell, so such barcodes were excluded.
- Total UMIs per cell below 18,000. Because the mRNA content of a single cell is limited, an unusually high UMI count also suggests that two or more cells entered the same GEM, and these barcodes were excluded.
- Mitochondrial gene expression below 10% of the cell's total. Elevated mitochondrial expression is typically associated with apoptosis and indicates cells that were damaged during the experiment; such low-quality cells can compromise downstream analysis and were excluded.
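The three thresholds above can be expressed as a small per-cell predicate. This is a minimal sketch, assuming inclusive bounds for the gene-count range (the README does not say whether 200 and 4,300 themselves pass) and human gene symbols in which mitochondrial genes start with `MT-`; the function names are hypothetical.

```python
def percent_mito(gene_counts):
    """Percentage of a cell's UMIs coming from mitochondrial genes.

    Assumes human gene symbols, where mitochondrial genes start with "MT-".
    """
    total = sum(gene_counts.values())
    mito = sum(c for gene, c in gene_counts.items() if gene.startswith("MT-"))
    return 100.0 * mito / total if total else 0.0


def passes_qc(n_genes, n_umis, pct_mito,
              min_genes=200, max_genes=4300, max_umis=18000, max_mito=10.0):
    """Return True if a cell passes all three QC thresholds."""
    return (min_genes <= n_genes <= max_genes  # assumed inclusive range
            and n_umis < max_umis             # guards against multiplets
            and pct_mito < max_mito)          # guards against dying cells
```

For example, a cell with 1,500 genes, 6,000 UMIs, and 4.2% mitochondrial reads passes, while the same cell with 12% mitochondrial reads is removed.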

Principal component analysis and UMAP clustering

The expression matrix of the filtered cells was log-normalized using the "NormalizeData" function in the R package Seurat (v3.0.0), and samples from different patients were combined using the "merge" function. Highly variable genes (HVGs) were identified with the "FindVariableFeatures" function, selecting the top 5,000 HVGs with otherwise default settings, and the expression matrix was then scaled with the "ScaleData" function. To mitigate batch effects across patients and platforms, the "RunHarmony" function of Harmony v1.0 [50] was applied with group.by.vars = c("orig.ident", "plate") and theta = c(2, 2). UMAP dimensionality reduction was performed with the "RunUMAP" function, and clustering was performed with the "FindClusters" function on the first 30 principal components (PCs). Cell types were annotated using a combination of SingleR v1.4.1, markers from the CellMarker database, and previously published literature [51,52].
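The normalization and scaling steps can be sketched in plain Python. This assumes Seurat's default "LogNormalize" behavior (natural log of 1 plus counts divided by the cell's total and multiplied by a scale factor of 10,000) and a ScaleData-style per-gene z-score clipped at 10; the helper names are hypothetical and this is not Seurat's code.

```python
import math
import statistics


def log_normalize(cell_counts, scale_factor=10_000):
    """LogNormalize one cell: ln(1 + count / total * scale_factor) per gene."""
    total = sum(cell_counts)
    if total == 0:  # empty cells should already have been removed by QC
        return [0.0] * len(cell_counts)
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


def scale_gene(values, max_value=10.0):
    """Z-score one gene across cells, clipping extreme values (assumed cap 10)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [max(-max_value, min(max_value, (v - mean) / sd)) for v in values]
```

Normalizing by each cell's total removes differences in sequencing depth, and scaling puts every gene on a comparable range so that highly expressed genes do not dominate PCA.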

Analysis of cell differentiation trajectories

Single-cell trajectories were reconstructed from the cell-by-gene expression matrix using Monocle (version 2.10.1). Monocle reduced the data to two dimensions and ordered the cells in pseudotime (sigma = 0.001, lambda = NULL, param.gamma = 10, tol = 0.001), after which the trajectories were visualized in the reduced-dimensional space. The resulting trajectories have a tree-like structure with tips and branches. Monocle also identifies genes that are differentially expressed between groups of cells and evaluates the statistical significance of these changes. Key genes associated with development and differentiation were identified at FDR < 1e-5, and genes with similar expression patterns were grouped; genes within a group were inferred to share biological functions and regulators. Branches arise when cells follow alternative gene expression programs and, during development, represent fate decisions in which one lineage follows one path while another diverges along a different path. Branch-dependent gene expression was tested with Monocle's BEAM method, which compares two negative binomial generalized linear models, one with and one without a branch term.
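The FDR < 1e-5 cutoff is applied to multiple-testing-adjusted p-values. The sketch below assumes Benjamini-Hochberg adjustment (what R's `p.adjust(method = "BH")` computes); the helper names are hypothetical and this is not Monocle's code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def significant_genes(gene_pvalues, fdr=1e-5):
    """Keep genes whose BH-adjusted q-value is below the FDR cutoff."""
    genes = list(gene_pvalues)
    q = benjamini_hochberg([gene_pvalues[g] for g in genes])
    return [g for g, qv in zip(genes, q) if qv < fdr]
```

With thousands of genes tested, such a stringent cutoff keeps only genes whose pseudotime dependence is very unlikely to be a chance finding.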

Single-cell CNV estimation

inferCNV v1.6.0 was used to identify malignant T-ALL cells that may have been intermixed with the HSC cluster. inferCNV infers large-scale chromosomal copy number variations (CNVs) by comparing the average expression of genes across each chromosomal region in query cells with that in reference (control) cells; consistent gains or losses relative to the reference indicate amplifications or deletions. HSCs from T-ALL samples formed the query set, and non-malignant B cells from the same T-ALL samples served as the reference.
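The core idea behind inferCNV can be sketched as "subtract the reference, then smooth along the chromosome." This is a heavily simplified illustration, not inferCNV's algorithm: it assumes log-transformed expression already ordered by genomic position on a single chromosome, uses a tiny window (inferCNV's own default window is far larger), and omits its denoising and HMM steps; the helper names are hypothetical.

```python
import statistics


def relative_expression(query, reference_cells):
    """Subtract each gene's mean reference expression from one query cell.

    Genes must already be log-transformed and ordered by genomic position.
    """
    ref_means = [statistics.fmean(gene) for gene in zip(*reference_cells)]
    return [q - r for q, r in zip(query, ref_means)]


def smooth(values, window=3):
    """Centered moving average along the chromosome (window should be odd)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]  # truncated at the ends
        out.append(sum(chunk) / len(chunk))
    return out
```

Averaging over neighboring genes is what lets a chromosomal gain or loss stand out: a single gene's expression is noisy, but a whole segment shifted in the same direction is consistent with a copy number change.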

Analysis of gene expression data

Differentially expressed gene (DEG) analysis was used to examine how T-ALL alters the transcriptional profiles of cell populations, identifying upregulated and downregulated genes and the associated biological processes as the disease progresses.
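The README does not name the DEG test. As an assumption, the sketch below uses the Wilcoxon rank-sum test, which is the default of Seurat's `FindMarkers`. It uses a normal approximation without tie correction, so it is illustrative only; the helper names are hypothetical.

```python
import math


def rank_with_ties(values):
    """1-based ranks, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(group1, group2):
    """Two-sided rank-sum test (normal approximation, no tie correction).

    Returns (U statistic for group1, p-value).
    """
    n1, n2 = len(group1), len(group2)
    ranks = rank_with_ties(list(group1) + list(group2))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

A rank-based test is a common choice for single-cell data because expression values are zero-inflated and far from normally distributed.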

T-cell clusters from healthy donors and T-ALL patients were compared in PCA space using Pearson correlation coefficients. Each cluster was represented by its centroid, obtained by averaging the PCA vectors of its constituent cells, and the Pearson correlation coefficient was then computed between centroids.
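The centroid comparison described above can be sketched as follows. The inputs are assumed to be per-cell PCA coordinates (one list per cell); the helper names and toy values are hypothetical.

```python
import math


def centroid(vectors):
    """Average a cluster's cells in PCA space (one vector per cell)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical example: two clusters, each with two cells in 3-PC space.
healthy_cluster = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
tall_cluster = [[2.0, 4.0, 6.0], [4.0, 6.0, 8.0]]
r = pearson(centroid(healthy_cluster), centroid(tall_cluster))
```

Comparing centroids rather than individual cells summarizes each cluster by its average profile, so the correlation reflects overall similarity between cluster states rather than cell-to-cell noise.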

Gene functional enrichment analysis

Gene set enrichment and differential expression analyses were used to identify gene expression differences across cell clusters, allowing distinct cell subtypes or states to be distinguished. Because genes act through interacting biological pathways rather than in isolation, pathway-level analysis helps clarify their functional roles. Enrichment analysis against the KEGG database was therefore used to identify metabolic and signal transduction pathways over-represented among the differentially expressed genes.
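Pathway over-representation is commonly scored with a one-sided hypergeometric test, which is the approach used by tools such as clusterProfiler's `enrichKEGG`; the README does not state which tool or test was used, so this is an assumption. The helper name below is hypothetical, and in practice the resulting p-values across all pathways would also be adjusted for multiple testing.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: DEGs in the pathway; n: total DEGs;
    K: genes in the pathway; N: background genes.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of observing k or more pathway genes among the DEGs.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total
```

A small p-value means the pathway contains more DEGs than would be expected if the DEGs had been drawn at random from the background.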

TCR profiling analysis

TCR sequencing data from 3 relapsed T-ALL patients were analyzed using the Scirpy software tool. T-cell receptors composed of α and β chains (αβ TCRs) or γ and δ chains (γδ TCRs) are highly diverse because of V(D)J (variable, diversity, joining) gene segment recombination. The complementarity-determining region 3 (CDR3) is especially important because it interacts directly with the antigen and therefore defines T-cell antigen specificity.
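Because CDR3 defines antigen specificity, cells sharing identical CDR3 sequences are usually grouped into a clonotype, and large clonotypes indicate clonal expansion. The sketch below assumes a clonotype is defined by exact identity of paired TRA and TRB CDR3 strings; Scirpy's own clonotype definition is configurable and is not reproduced here. The helper names and CDR3 strings are made up for illustration.

```python
from collections import Counter


def define_clonotypes(cells):
    """Group cells whose TRA and TRB CDR3 sequences are identical.

    cells maps cell barcode -> (tra_cdr3, trb_cdr3); None marks a missing chain.
    """
    clonotypes = {}
    for barcode, chains in cells.items():
        if None in chains:
            continue  # this sketch requires a paired alpha/beta receptor
        clonotypes.setdefault(chains, []).append(barcode)
    return clonotypes


def expanded_clonotypes(clonotypes, min_size=2):
    """Clonotypes shared by at least min_size cells, largest first."""
    sizes = Counter({chains: len(bcs) for chains, bcs in clonotypes.items()})
    return [chains for chains, size in sizes.most_common() if size >= min_size]
```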

scRNA-seq reveals an immune microenvironment and JUN-mediated NK cell exhaustion in relapsed T-ALL
