Human memory CD4+ T cells recognize Mycobacterium tuberculosis-infected macrophages amid broader pathogen-specific responses

Carpenter, Stephen 1 ; Schauner, Robert 1 ; Gail, Daniel1

Published Sep 17, 2025 on Dryad. https://doi.org/10.5061/dryad.vmcvdnd53

Data files

Sep 17, 2025 version files 1.16 GB

bulk_tcr.tar.gz

55.81 MB
files.md5

226 B
gene_expression.tar.gz

1.05 GB
GLIPH2_output_cleveland_cohort.csv

2.55 MB
GLIPH2_output_combined_cohorts.csv

4.97 MB
pbmc_stimulation.tar.gz

29.62 MB
README.md

16.41 KB
t_cell_stimulations.tar.gz

17.23 MB

Abstract

CD4+ T cell-mediated control of tuberculosis (TB) requires recognition of macrophages infected with Mycobacterium tuberculosis (Mtb). Yet not all Mtb-specific T cells recognize infected macrophages. Using infected monocyte-derived macrophages (MDMs) and autologous memory CD4+ T cells from individuals with latent Mycobacterium tuberculosis (Mtb) infection (LTBI), we isolate and quantify CD4+ T cells activated in response to infected macrophages. We use T cell antigen receptor (TCR) sequencing to determine the total and unique Mtb-specific TCR clonotypes linked to recognition of infected macrophages, and find that a subset required exogenous antigen exposure, suggesting incomplete recognition. Clonotypes specific for multiple Mtb antigens and other pathogens were also identified. We use bulk TCRb deep sequencing from total CD4+ T cells isolated from 10x10⁶ PBMCs from each participant to determine the natural circulating frequencies of relevant TCR clonotypes. We used single-cell RNA sequencing (scRNAseq) to examine the effector functions of CD4+ T cells in response to infected macrophages. Mtb-specific clonotypes expressed signature effector functions dominated by IFNg, TNF, IL-2, and GM-CSF or chemokine production and signaling. We propose TB vaccines that elicit T cells specific for T7SS substrates, recognize infected macrophages, and express canonical effector functions will offer protection against TB.

Dataset DOI: 10.5061/dryad.vmcvdnd53

Description of the data and file structure

Sequencing of mRNA from single T cells was performed using the 10x Genomics Chromium Next GEM Single Cell 5’ V2 platform (10X Genomics, Pleasanton, CA, USA). Activated CD4+ were resuspended in 0.04% BSA in PBS and loaded onto the 10X Chromium Controller. RNA was reverse-transcribed, and cDNA was isolated and PCR-amplified according to the 10X Genomics NextGEM 5’ V2 scRNAseq User Guide (Rev D). Quality control and quantification of cDNA and final libraries was performed on either the Agilent fragment analyzer or Agilent 2100 Bioanalyzer (Agilent technologies, Santa Clara, CA, USA). V(D)J amplification and library preparation, and gene expression library construction were performed using 10X Genomics kits according to the User Guide. Paired-end sequencing was performed by the Cleveland Clinic Lerner Research Institute Genomics Core (Cleveland, OH, USA) using an Illumina NovaSeq 6000 sequencer (Illumina, San Diego, CA, USA) according to the 10X Genomics User Guide. Demultiplexed FASTQ files were mapped to the human genome and TCR sequences, simultaneously, using Cell Ranger Multi v7.1 (10X Genomics) on the 10X Cloud server and the GRCh38.p14 reference genome.

Files and variables

Condition	Description
Infected	Infection with Mtb
Lysate	Infection with Mtb and treated with Mtb Lysate
MTB300	Infection with Mtb and treated with 300 Mtb peptides
ControlPeptide	Infection with Mtb and treated with viral peptides

Cell Ranger Outputs

FastQ files above have been processed with Cell Ranger. The MarketMatrix directories correspond with the filtered feature-barcode matrix output of Cell Ranger (ending in -filtered_feature_bc_matrix.tar.gz). The CSV files correspond to the TCR sequencing filtered_contig_annotations.csv output from Cell Ranger. The TSV files are the AIRR formatted (-airr_rearrangement.tsv) TCR sequencing outputs. More information on the AIRR format here. Each filename has the structure (Donor ID)-(Condition)-(file type). These can be found in gene_expression.tar.gz (54 files), t_cell_stimulations.tar.gz (30 files) and pbmc_stimulation.tar.gz (32 files). Cells containing "-" indicate no reads were sequenced.

AIRR formatted files

cell_id: Cell barcode defining the cell for the query sequence.
clone_id: Clonotype ID/clonotype assignment.
sequence_id: The name of the contig associated with the rearrangement.
sequence: The nucleotide sequence of the rearrangement.
sequence_aa: The amino acid sequence of the rearrangement.
productive: Whether or not the rearrangement is productive.
rev_comp: Set to false by default (10x Genomics V(D)J sequences are not reverse complemented).
v_call: The name of the aligned V gene for the rearrangement.
v_cigar: The CIGAR string of the V gene alignment.
d_call: The name of the aligned D gene for the rearrangement.
d_cigar: The CIGAR string of the D gene alignment.
j_call: The name of the aligned J gene for the rearrangement.
j_cigar: The CIGAR string of the J gene alignment.
c_call: The name of the aligned C gene for the rearrangement.
c_cigar: The CIGAR string of the C gene alignment.
sequence_alignment: The aligned sequence of the VDJ rearrangement.
germline_alignment: The assembled, aligned, full-length inferred germline sequence of the aligned sequence.
junction: The nucleotide sequence of the rearrangement's junction (CDR3).
junction_aa: The amino acid sequence of the rearrangement's junction (CDR3).
junction_length: The length of the rearrangement's junction nucleotide sequence.
junction_aa_length: The length of the rearrangement's junction amino acid sequence.
v_sequence_start: 1-based index on the contig of the V region start position.
v_sequence_end: 1-based index on the contig of the V region end position.
d_sequence_start: 1-based index on the contig of the D region start position.
d_sequence_end: 1-based index on the contig of the D region end position.
j_sequence_start: 1-based index on the contig of the J region start position.
j_sequence_end: 1-based index on the contig of the J region end position.
c_sequence_start: 1-based index on the contig of the C region start position.
c_sequence_end: 1-based index on the contig of the C region end position.
consensus_count: The number of reads associated with this rearrangement.
duplicate_count: The number of unique molecular identifiers associated with this rearrangement.
is_cell: Is this rearrangement cell-associated?

Cell Ranger VDJ formatted files

barcode: Cell barcode for this contig.
is_cell: True or False value indicating whether the barcode was called as a cell.
contig_id: Unique identifier for this contig.
high_confidence: True or False value indicating whether the contig was called as high-confidence (unlikely to be a chimeric sequence or other artifact).
length: The contig sequence length in nucleotides.
chain: The chain associated with this contig: TRA, TRB, IGK, IGL, or IGH.
v_gene: The highest-scoring V segment, e.g., TRAV1-1.
d_gene: The highest-scoring D segment, e.g., TRBD1.
j_gene: The highest-scoring J segment, e.g., TRAJ1-1.
c_gene: The highest-scoring C segment, e.g., TRBC2.
full_length: True or False value indicating if the contig was declared as full-length.
productive: True or False value indicating if the contig was declared as productive.
fwr1: The predicted FWR1 amino acid sequence.
fwr1_nt: The predicted FWR1 nucleotide sequence.
cdr1: The predicted CDR1 amino acid sequence.
cdr1_nt: The predicted CDR1 nucleotide sequence.
fwr2: The predicted FWR2 amino acid sequence.
fwr2_nt: The predicted FWR2 nucleotide sequence.
cdr2: The predicted CDR2 amino acid sequence.
cdr2_nt: The predicted CDR2 nucleotide sequence.
fwr3: The predicted FWR3 amino acid sequence.
fwr3_nt: The predicted FWR3 nucleotide sequence.
cdr3: The predicted CDR3 amino acid sequence.
cdr3_nt: The predicted CDR3 nucleotide sequence.
fwr4: The predicted FWR4 amino acid sequence.
fwr4_nt: The predicted FWR4 nucleotide sequence.
reads: The number of reads aligned to this contig.
umis: The number of distinct UMIs aligned to this contig.
raw_clonotype_id: The clonotype ID assigned to this cell barcode.
raw_consensus_id: The ID of the consensus sequence to which this contig was assigned.
exact_subclonotype_id: The ID of the exact subclontype to which this cell barcode was assigned.

Cell Ranger Market Matrix direcectories

barcodes.tsv.gz: Column indicies.
features.tsv.gz: Row indicies. Column 1: Feature Name, Column 2: Feature ID, Column 3: Feature Type (Gene Expression)
matrix.mtx.gz: A sparse matrix representation of the count matrix.

Bulk TCR Sequencing

bulk_tcr.tar.gz contains the background TCR sequencing data for 9 samples (9 files). Each filename has the structure (Donor ID)_CD4-Bulk_TCRB.tsv. The sequencing and analysis was done by Adaptive Biotechnologies. Blank cells indicate there was no available information or sequencing did not capture sufficient reads to generate results. Each file contains the following columns:

nucleotide: The nucleotide sequence of the rearrangement.
aminoAcid: The amino acid sequence of the rearrangement.
count (templates/reads): Number of reads or templates in the sequencing data
frequencyCount (%): The sequence frequency
cdr3Length: CDR3β sequence length
vMaxResolved: Vβ gene and allele used
vFamilyName: Vβ family used
vGeneName: Vβ gene used
vGeneAllele: Vβ allele used
vFamilyTies: Equivalently-scored V families identified during annotation
vGeneNameTies: Equivalently-scored V genes identified during annotation
vGeneAlleleTies: Equivalently-scored V alleles identified during annotation
dMaxResolved: D gene and allele used
dFamilyName: D family used
dGeneName: D gene used
dGeneAllele: D allele used
dFamilyTies: Equivalently-scored D families identified during annotation
dGeneNameTies: Equivalently-scored D genes identified during annotation
dGeneAlleleTies: Equivalently-scored D alleles identified during annotation
jMaxResolved: Jβ gene and allele used
jFamilyName: Jβ family used
jGeneName: Jβ gene used
jGeneAllele: Jβ allele used
jFamilyTies: Equivalently-scored J families identified during annotation
jGeneNameTies: Equivalently-scored J genes identified during annotation
jGeneAlleleTies: Equivalently-scored J alleles identified during annotation
vDeletion: Number of nucleotides deleted from the V gene during recombination
n1Insertion: Number of nucleotides inserted in the N1 (VD) junction during recombination.
d5Deletion: Number of nucleotides deleted from the 5' end of the D gene during recombination
d3Deletion: Number of nucleotides deleted from the 3' end of the D gene during recombination
n2Insertion: Number of nucleotides inserted in the N2 (DJ) junction during recombination.
jDeletion: Number of nucleotides deleted from the J gene during recombination
vIndex: The index within the full nucleotide sequence that denotes the Cysteine beginning the CDR3
n1Index: The index within the full nucleotide sequence that denotes the start of the N1 (VD) region
dIndex: The index within the full nucleotide sequence that denotes the start of the D region
n2Index: The index within the full nucleotide sequence that denotes the start of the N2 (DJ) region
jIndex: index within the full nucleotide sequence that denotes the start of the J region
estimatedNumberGenomes: Estimated number of T/B cell genomes present in the sample
sequenceStatus: The functional state of a rearrangement: in-frame (productive), out-of-frame, or containing a stop codon.
cloneResolved: Which portions of the clone are resolved

MD5sum

files.md5 contain MD5sums for each file included. These can be validated with md5sum -c files.md5.

GLIPH2 results

GLIPH2_output_cleveland_cohort.csv contains all the results from only the Cleveland cohort. GLIPH2_output_combined_cohorts.csv contains all the results from both the Cleveland and South African cohorts. Cells containing "-" indicate no available HLA typing information for the indicated allele. Both contain the following columns:

index: numeric index for cluster
pattern: TCR motif for cluster
Fisher_score: Fisher score comparing the pattern to a background
number_subject: The number of subjects in the cluster
number_unique_cdr3: The number of unique CDR3 sequences in the cluster
final_score: A combination of HLA, Vβ, expansion, length, and cluster size scores as a probability conflation
hla_score: The score quantifying the total HLA associations for the cluster.
vb_score: Probability a random selection of TCR sequences of the same size has a higher Simpson diversity index for Vβ genes
expansion_score: Probability that a random set of equal size from the same data set would have at least this same number of expanded members
length_score: Probability a random selection of TCR sequences of the same size has a higher Simpson diversity index for CDR3β lengths
cluster_size_score: Probability of a cluster this size appearing in a random selection.
type: global pattern contains '%', which indicates position allowing variants; local pattern starts with 'motif-'; and singlet likes global pattern without '%' symbol
ulTcRb: amino acids encoded by codon overlapping with N-addition nucleotides are labeled as lower case, otherwise, it is labeled as upper cases
TcRb: The TCR CDR3β sequence
V: The Vβ gene used
J: The Jβ gene used
TcRa: The TCR CDR3α sequence
Sample: The (Donor ID):(condition)
Freq: The number of cells (single-cell) or reads (bulk) this TCR appears in.
HLA-(X): HLA alleles found to associate with the cluster. Significant associates are denoted with a "!"

Code/software

Cell Ranger is needed to process the FastQ files into single-cell count matrices. To view the MarketMatrix outputs, using various R packages (Seurat, SingleCellExperiment) or Python modules (scanpy) are appropriate. The CSV outputs can be used in R (via scRepertoire) or Python (via scirpy).

The original code reported in this paper is available through the GitHub: carpenter-lab/StetsenkoGail2025.

Single-cell data analysis

Filtered_contig_annotations.csv and clonotypes.csv files from Cell Ranger containing productive, full-length TCR sequences for each sample were used for TCR clonotype analysis. Filtered feature barcode matrices (mRNA data) and contig annotations (CDR3a and b sequences) were linked to individual cell barcodes. Gene expression and TCR annotations were generated for each cell for simultaneous TCR and transcriptomics analysis after quality control and integration were performed using the R-based Seurat package (v 5.1.0) (Satija et al., 2015) with seed 123. High-quality cells and features were selected based on the following parameters: transcript detection in at least 3 cells, 200<nFeature<4,000, and percentage of mitochondrial RNA is less than 20% per cell. TCR information was added to Seurat objects using the scRepertoire package (Borcherding et al., 2020). TCR clonotypes with identical CDR3α/β sequences, present in more than 2 cells (based on unique barcodes) were considered “expanded”. Clonotypes with only an individual CDR3α or CDR3β chain were classified as “non-expanded”. After merging the samples, SCTransform (Hafemeister and Satija, 2019) was used for normalization, the percent mitochondrial and ribosomal counts, and cell-cycle gene scores were regressed out, and TCR genes were removed from the variable feature list using the quietTCRgenes function from the Trex package (Borcherding et al., 2024). Principal component analysis (RunPCA function) was followed by data integration and batch correction with the Harmony algorithm (Korsunsky et al., 2019). A UMAP (via RunUMAP) was calculated using the first 30 dimensions of the Harmony embedding with default parameters except the number of neighbors was set to 50 in addition to the FindNeighbors default parameters. Clustering was done on this shared-nearest neighbor graph using the FindClusters default parameters except the resolution was set to 0.3 using the Louvain algorithm with multilevel refinement. From 19 clusters obtained (from 0 to 18), clusters 17 (27 cells) and 18 (25 cells) were removed due to low cell numbers, and cluster 16 (830 cells) due to monocytic contamination. A total of 157,462 cells split into 16 clusters were left for the downstream analysis.

Human subjects data

Informed consent was obtained from all participants and states that any published data from the study would be de-identified. For each participant's data, including for TCR and single-cell RNA sequencing samples, we assigned a non-identifying subject ID code. The subject ID code contains a combination of letters, based on whether the participant was enrolled in the study or whether an anonymized PBMC sample was purchased, and numbers, based on consecutive enrollment or consecutive sample analysis. Sample IDs are not related to any data or personal information for any participant.

Bulk TCRb deep sequencing

Unstimulated CD4+ T cells were isolated by immunomagnetic positive selection from 10x10⁶ PBMCs from each participant using human anti-CD4 microbeads (Miltenyi Biotec). Genomic DNA was isolated using the QiaAMP DNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. Ultra-deep level TCRβ sequencing was performed by Adaptive Biotechnologies (Seattle, WA, USA). Data was downloaded from the ImmunoSeq Analyzer (Adaptive) and used to estimate the natural circulating frequencies of TCRs. TCRb sequences were identified in the scTCRseq analysis of CD4+ T cell responses to Mtb-infected macrophages, cross-referenced to CDR3b sequences in bulk TCRb deep sequencing data from PBMCs, which were enumerated using Microsoft Excel and graphed in GraphPad Prism.

Flow sorting of activated T cells for scRNAseq

Non-adherent cells were harvested from the T cell-infected MDM co-culture, and immunostaining was performed with monoclonal antibodies used to identify CD4+ T cells that co-expressed activation markers. After 20 min at 4 ^°C, cells were passed through a 40μm filter top tubes (Corning). Approximately 2 min before sorting each sample, 7-AAD viability dye (Invitrogen) was added to each sample. Samples were sorted using the “purity” setting, gating on 7-AADlo CD4+ singles cells, and co-expression of CD69 and either CD40L or CD25. Sorting was performed on a Sony MA900 Multi-Application Cell Sorter (Sony Biotechnology, San Jose, CA, USA). Fluorescent antibodies used in flow sorting experiments include FITC anti-human CD4 (clone RPA-T4), PE anti-CD25 (clone BC96), PE-Dazzle-594 anti-CD40L (clone 24-31), BV605 anti-CD8a (clone RPA-T6), and APC anti-CD69 (clone FN50) from Biolegend.

Exogenous antigen stimulation of CD4+T cells

Exogenous antigens were added to some samples of Mtb-infected MDMs for co-culture with autologous CD4+T cells. H37Rv whole cell lysate (BEI Resources, NR-14822) was added to Mtb-infected MDMs at a final concentration of 10 mg/mL for 24h. Lysate was washed out with cRPMI before the addition of CD4+ T cells. For evaluation of T cell responses to MTB300 peptides, MTB300 was diluted to 1 μg/mL in cRPMI, then added to Mtb-infected MDMs ~ 10 minutes before adding CD4+ T cells in co-culture. For each participant, PBMCs were also separately stimulated with either MTB300 or viral and vaccine “control” peptide megapools to identify TCR clonotypes linked to antigen-specific T cell responses. For Mtb peptide stims, 1μg/mL of the MTB300 peptide megapool was added to 10x10⁶ PBMCs in cRPMI. For “control” peptide stimulations, another 10x10⁶ PBMCs were exposed to a final concentration of 1μg/mL each of Cytomegalovirus (CMV), Epstein-Barr virus (EBV), Pertussis (P), and Tetanus toxoid (TT) peptide megapools, and 1ug/mL of the SARS-CoV-2 USA-WA1/2020 Coronavirus Spike (S) protein overlapping peptides (BEI resources NR-52402). Peptides were added to PBMCs in cRPMI together with anti-CD28/CD49d costimulatory mAbs (BD Biosciences) and anti-CD40 blocking mAb (BioLegend) for 16- 18 hours. Following stimulation, CD4-positive immunomagnetic selection was performed according to the manufacturer’s instructions using human anti-CD4 microbeads (Miltenyi Biotec), followed by flow sorting of the activated CD4+ T cells for downstream scTCRseq.

Sample processing for scRNAseq

Sequencing of mRNA from single T cells was performed using the 10x Genomics Chromium Next GEM Single Cell 5’ V2 platform (10X Genomics, Pleasanton, CA, USA). Activated CD4+ were resuspended in 0.04% BSA in PBS and loaded onto the 10X Chromium Controller. RNA was reverse-transcribed, and cDNA was isolated and PCR-amplified according to the 10X Genomics NextGEM 5’ V2 scRNAseq User Guide (Rev D). Quality control and quantification of cDNA and final libraries were performed on either the Agilent fragment analyzer or Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). V(D)J amplification and library preparation, and gene expression library construction were performed using 10X Genomics kits according to the User Guide. Paired-end sequencing was performed by the Cleveland Clinic Lerner Research Institute Genomics Core (Cleveland, OH, USA) using an Illumina NovaSeq 6000 sequencer (Illumina, San Diego, CA, USA) according to the 10X Genomics User Guide. Demultiplexed FASTQ files were mapped to the human genome and TCR sequences simultaneously, using Cell Ranger Multi V7.1 (10X Genomics) on the 10X Cloud server and the GRCh38.p14 reference genome.

Single-cell data analysis

Filtered_contig_annotations.csv and clonotypes.csv files from Cell Ranger containing productive, full-length TCR sequences for each sample were used for TCR clonotype analysis. Filtered feature barcode matrices (mRNA data) and contig annotations (CDR3a and b sequences) were linked to individual cell barcodes. Gene expression and TCR annotations were generated for each cell for simultaneous TCR and transcriptomics analysis after quality control and integration were performed using the R-based Seurat package (v 5.1.0) with seed 123. High-quality cells and features were selected based on the following parameters: transcript detection in at least 3 cells, 200<nFeature<4,000, and percentage of mitochondrial RNA is less than 20% per cell. TCR information was added to Seurat objects using the scRepertoire package. TCR clonotypes with identical CDR3α/β sequences, present in more than 2 cells (based on unique barcodes), were considered “expanded”. Clonotypes with only an individual CDR3α or CDR3β chain were classified as “non-expanded”. After merging the samples, SCTransform was used for normalization, the percent mitochondrial and ribosomal counts, and cell-cycle gene scores were regressed out, and TCR genes were removed from the variable feature list using the quietTCRgenes function from the Trex package. Principal component analysis (RunPCA function) was followed by data integration and batch correction with the Harmony algorithm. A UMAP (via RunUMAP) was calculated using the first 30 dimensions of the Harmony embedding with default parameters, except that the number of neighbors was set to 50 in addition to the FindNeighbors default parameters. Clustering was done on this shared-nearest neighbor graph using the FindClusters default parameters, except the resolution was set to 0.3 using the Louvain algorithm with multilevel refinement. From 19 clusters obtained (from 0 to 18), clusters 17 (27 cells) and 18 (25 cells) were removed due to low cell numbers, and cluster 16 (830 cells) was removed due to monocytic contamination. A total of 157,462 cells, split into 16 clusters, were left for the downstream analysis.