Single cell multiomic analysis identifies key genes differentially expressed in innate lymphoid cells from COVID-19 patients
Data files
Jul 02, 2024 version files 701.89 KB
-
Patient_MetaData.tsv
874 B
-
ReadCount.zip
602.58 KB
-
README.md
4.45 KB
-
README.txt
4.69 KB
-
Sample_MetaData.tsv
1.27 KB
-
Single_cell_location.csv
88.03 KB
Jul 02, 2024 version files 701.89 KB
-
Patient_MetaData.tsv
874 B
-
ReadCount.zip
602.58 KB
-
README.md
4.45 KB
-
Sample_MetaData.tsv
1.27 KB
-
Single_cell_location.csv
88.03 KB
Abstract
Innate lymphoid cells (ILCs) are enriched at mucosal surfaces where they respond rapidly to environmental stimuli and contribute to both tissue inflammation and healing. To gain insight into the role of ILCs in the pathology and recovery from COVID-19 infection, we employed a multi-omic approach consisting of Abseq and targeted mRNA sequencing to respectively probe the surface marker expression, transcriptional profile and heterogeneity of ILCs in peripheral blood of patients with COVID-19 compared with healthy controls. We found that the frequency of ILC1 and ILC2 cells was significantly increased in COVID-19 patients. Moreover, all ILC subsets displayed a significantly higher frequency of CD69-expressing cells, indicating a heightened state of activation. ILC2s from COVID-19 patients had the highest number of significantly differentially expressed (DE) genes. The most notable genes DE in COVID-19 vs healthy participants included a) genes associated with responses to virus infections and b) genes that support ILC self-proliferation, activation and homeostasis. In addition, differential gene regulatory network analysis revealed ILC-specific regulons and their interactions driving the differential gene expression in each ILC. Overall, this study provides mechanistic insights into the characteristics of ILC subsets activated during COVID-19 infection.
BD AbSeq allows simultaneous measurement of protein and RNA expression at the single-cell level, in combination with the BD Rhapsody high-throughput single-cell capture system. Cells were captured using D Rhapsody cell capture beads. Identity of proteins and mRNA was carried out as per protocol (see manuscript materials & method). We manually gated ILC1, ILC2 and ILCp single cells using FlowJo, enumerated the mRNA count in each cell.
More Information about BD rhapsody: https://www.bdbiosciences.com/content/dam/bdb/marketing-documents/BD-AbSeq-BD-Rhapsody-Simultaneous-mRNA-and-Protein-Quantification-DS.pdf
Description of the data and file structure
The processed dataset is available in Zip file (ReadCount.zip). The zip file contains two directories, each containing several CSV files:
1.) Protein_quantification: Quantified cell surface proteins in each manually gated CD127+ ILC single cell (ILC1, ILC2 and ILCp). Each file is specific for one batch per ILC type. Row represent protein marker, column represent single cell-index. The read count represent number of reads for a given protein. “NA” indicates that expression level of quantified protein is below reliable measurement thresholds, and could not be accurately quantified.
2.) mRNA_quantification: Quantified mRNAs in each manually gated CD127+ ILC (ILC1, ILC2 and ILCp). Each file is specific for one batch. Row represent Gene ID, column represent single cell-index. The read count represent number of reads for a given gene. “NA” indicates that expression level of quantified gene is below reliable measurement thresholds, and could not be accurately quantified.
The file structure in Zip file is shown below.
[ReadCount.zip]
ReadCount
├── Protein_quantification
│ ├── DBEC_ILC1_131a.csv
│ ├── DBEC_ILC1_131b.csv
│ ├── DBEC_ILC1_131c.csv
│ ├── DBEC_ILC1_131d.csv
│ ├── DBEC_ILC2_131a.csv
│ ├── DBEC_ILC2_131b.csv
│ ├── DBEC_ILC2_131c.csv
│ ├── DBEC_ILC2_131d.csv
│ ├── DBEC_ILCp_131a.csv
│ ├── DBEC_ILCp_131b.csv
│ ├── DBEC_ILCp_131c.csv
│ └── DBEC_ILCp_131d.csv
└── mRNA_quantification
├── DBEC_ILC1_131a.csv
├── DBEC_ILC1_131b.csv
├── DBEC_ILC1_131c.csv
├── DBEC_ILC1_131d.csv
├── DBEC_ILC2_131a.csv
├── DBEC_ILC2_131b.csv
├── DBEC_ILC2_131c.csv
├── DBEC_ILC2_131d.csv
├── DBEC_ILCp_131a.csv
├── DBEC_ILCp_131b.csv
├── DBEC_ILCp_131c.csv
└── DBEC_ILCp_131d.csv
Here each CSV filename is structured as DBEC_[ILC number]_[Batch ID] :
The meta-dataset is available in three different description files:
1.) Sample_MetaData.csv: Sample Description to identify “Batch ID” and “Sample Tag” for cell identification in each sample. Any missing value is represented as ‘NA’.
2.) Patient_MetaData.csv: Basic demographics of the COVID-19 & Healthy participants used in this study. The metadata also includes “Sample Tag” for cell identification in each sample. Any missing value is represented as ‘NA’.
3.) Single_cell_location.csv: Two column file. Column-1 : Single cell-index ; Column 2 : Sample Tag.
Batch Information
Entire study was carried in 4 batches, viz.- 131a; 131b; 131c and 131d. Samples used under each batch are available in file- Sample_MetaData.csv.
Column header description
-------------------------------------------------------
| Name | Type | Description
| ---------------- | --------------- | -------------------
| Batch | String | Batch Information
| PPID | String | Unique Deindentified Patient ID
| Sample_Tag | String | Sample barcode to identify cell samples in cell pool.
| COVID_status | String | Covid status (Healthy or COVID-19+).
| COVID_severity | String | Covid status (Mild/Moderate)
| Cell_Index | String | Unique Cell identifier.
-------------------------------------------------------
Study participants, blood draws and processing
Participants were recruited as described previously from adults who had a positive SARS-COV-2 RT-PCR test at Stanford Health Care (NCT04373148). Collection of Covid samples occurred between May to December 2020. The cohort used in this study consisted of asymptomatic (n=2), mild (n=17), and moderate (n=3) COVID-19 infections, some of whom developed long term COVID-19 (n=15). The clinical case severities at the time of diagnosis were defined as asymptomatic, moderate or mild according to the guidelines released by NIH. Long term (LT) COVID was defined as symptoms occurring 30 or more days after infection, consistent with CDC guidelines. Some participants in our study continued to have LT COVID symptoms 90 days after diagnosis (n=12). Exclusion criteria for COVID sample study were NIH severity diagnosis of severe or critical at the time of positive covid test. Samples selected for this study were obtained within 76 days of positive PCR COVID-19 test date. Healthy controls were selected who had sample collection before 2020. Informed consent was obtained from all participants. All protocols were approved by the Stanford Administrative Panel on Human Subjects in Medical Research. Peripheral blood was drawn by venipuncture and using validated and published procedures, peripheral blood mononuclear cells (PBMCs) were isolated by Ficoll-based density gradient centrifugation, frozen in aliquots and stored in liquid nitrogen at -80°C , until thawing. A summary of participant demographics is presented in Supp. Table 1.
ILC Enrichment, single cell captures for Abseq and targeted mRNAseq
Participant PBMCs were thawed, and each sample stained with Sample Tag (BD #633781) at room temperature for 20 minutes. Samples were combined in healthy control or COVID-19 tubes. Cells were surface stained with a panel of fluorochrome-conjugated antibodies (Supp. Table 2) in buffer (PBS with 0.25% BSA and 1mM EDTA) for 20 minutes at room temperature prior to immunomagnetic negative selection for ILCs. Following ILC enrichment using the EasySep human Pan-ILC enrichment kit (StemCell Technologies #17975), cells from healthy and COVID-19 recovered participants were counted and normalized before combining. ILCs were sorted using a BD FACS Aria at the Stanford FACS facility prior to incubation with AbSeq oligo-linked mAbs (Supp. Table 3). Sorted cells were processed by the Stanford Human Immune Monitoring Center (HIMC) using the BD Rhapsody platform. Library was prepared using the BD Immune Response Targeting Panel (BD Kit #633750) with addition of custom gene panel reagents (Supp. Table 4) and sequenced on Illumina NovaSeq 6000 at Stanford Genomics Sequencing Center (SGSC). ILCs were identified as Lineageneg (CD3neg, CD14neg, CD34neg, CD19neg), NKG2Aneg, CD45+ and ILCs further defined as CD127+CD161+ and as subsets: ILC1 (CD117negCRTH2neg), ILC2 (CRTH2+) and ILCp (CD117+CRTH2neg) (Supp. Fig. 1).
Computational data analysis
The above multi-modal setup allowed paired measurements of cellular transcriptome and cell surface protein abundance. The ILC1, ILC2 and ILCp cells were manually gated based on the abundance profile of CD127, CD117, CD161 and CRTH2 (Supp. Fig. 1). Before the integrative analysis, the complete multi-modal single cell dataset containing ILC subsets was converted into single Seurat object. All the subsequent protein-level and gene-level analyses were performed using multimodal data analysis pipeline of Seurat R package version 4.0. The normalized and scaled protein abundance profile was used for estimating the integrated harmony dimensions using runHarmony function in Seurat R package (reduction= ‘apca’ and group.by.vars = ‘batch’) . The batch corrected harmony embeddings were then used for computing the Uniform Manifold Approximation and Projection (UMAP) dimensions to visualize the clusters of ILC subsets. Differential marker analysis of surface proteins, between two groups of cells (COVID-19 and Healthy cohort), from abseq panels was computed with normalized and scaled expression values using FindMarkers function from Seurat R package (test.use=’wilcox’). Similarly, differential gene expression was performed on normalized and scaled gene expression values from between two groups of cells (COVID-19 and Healthy cohort) using the FindMarkers function from Seurat R package (test.use=’MAST’ and latent.vars=’batch’). Genes with log-fold change > 0.5 and adjusted p-value < 0.05 (method: Benjamini-Hochberg) (were considered as significant for further evaluation. The resulting adjusted p-values box-plots were plotted using ggplot2 R package (version 3.4.2) after computing the number of cells expressing a given protein or gene in each sample. Pathway enrichment analysis of DE genes was performed using web-server metascape (version 3.5). The AUCells score and gene regulatory network analysis was performed using pySCENIC pipeline (version 0.12.1). Gene regulatory network was reconstructed using GRNBoost2 algorithm and the list of TFs in humans (genome version: hg38) were obtained from cisTarget database. (https://resources.aertslab.org/cistarget). Cellular enrichment (aka AUCell) analysis that measures the activity of TF or gene signatures across all single cells was performed using aucell function in pySCENIC python library. The ggplot2 R package (version 3.4.2) was used for boxplot visualization. The differential gene co-expression analysis was performed using scSFMnet R package. Circular plots were generated using the R package circlize (version 0.4.15).