Whole blood RNA-seq demonstrates an increased host immune response in individuals with cystic fibrosis who develop nontuberculous mycobacterial pulmonary disease
Data files
Apr 25, 2023 version files 231.28 MB
-
cibersort_impute_absolute.csv
13.64 KB
-
cibersort_impute_percentage.csv
13.30 KB
-
clinical_rnaseq.csv
4.74 KB
-
dictionary_datasets.xlsx
21.33 KB
-
ntm_analysis_dryad.csv
11.72 KB
-
ntm_cbc_dryad.csv
3.66 KB
-
README.md
8.43 KB
-
rsem_counts.gz
231.20 MB
Abstract
Background
Individuals with cystic fibrosis have an elevated lifetime risk of colonization, infection, and disease caused by nontuberculous mycobacteria. A prior study involving non-cystic fibrosis individuals reported a gene expression signature associated with susceptibility to nontuberculous mycobacteria pulmonary disease (NTM-PD). In this study, we determined whether people living with cystic fibrosis who progress to NTM-PD have a gene expression pattern similar to the one seen in the non-cystic fibrosis population.
Methods
We evaluated whole blood transcriptomics using bulk RNA-seq in a cohort of cystic fibrosis patients with samples collected closest in timing to the first isolation of nontuberculous mycobacteria. The study population included patients who did (n = 12) and did not (n = 30) develop NTM-PD following the first mycobacterial growth. Progression to NTM-PD was defined by a consensus of two expert clinicians based on reviewing clinical, microbiological, and radiological information. Differential gene expression was determined by DESeq2.
Results
No differences in demographics or composition of white blood cell populations between groups were identified at baseline. Out of 213 genes associated with NTM-PD in the non-CF population, only two were significantly different in our cystic fibrosis NTM-PD cohort. Gene set enrichment analysis of the differential expression results showed that CF individuals who developed NTM-PD had higher expression levels of genes involved in the interferon (α and γ), tumor necrosis factor, and IL6-STAT3-JAK pathways.
Conclusion
In contrast to the non-cystic fibrosis population, the gene expression signature of patients with cystic fibrosis who develop NTM-PD is characterized by increased innate immune responses.
Study population and clinical data
This study is a secondary data analysis using blood samples and data from the “CF Biomarker” cohort approved by the University of British Columbia-Providence Health Care ethics review board (H12-00835). The local ethics board also reviewed and approved the secondary analysis (H20-00117). Patients in the parent cohort were recruited following informed consent at the St. Paul’s Hospital Adult CF Clinic (Vancouver, Canada) between January 2012 and December 2019. In the current analysis, we included participants who consented to the future use of their samples and data, had at least one positive respiratory culture for NTM, and had a whole blood RNA sample available (PAXgene® stored at -70°C). Lung transplant recipients and subjects without a definite diagnosis of CF were excluded. We preferentially selected blood samples taken during clinically stable periods and closest to the first positive growth of NTM. We did not limit blood sampling to within a specified time frame in relation to NTM infection or the development of NTM-PD as we expected an intrinsic downregulation of gene expression in immune‑related pathways in patients with CF who develop NTM-PD.
Clinical and demographic data analysis
Demographic characteristics, anthropometric measurements, pulmonary function tests, CFTR genotyping results, and microbiology laboratory results were extracted from the clinical charts using a pre-defined case report form. Clinical data were extracted from baseline positive NTM culture until December 2019; participants were censored at the time of NTM-PD diagnosis, lung transplant, or death. The diagnosis of NTM-PD was defined independently by two expert CF clinicians, based on current CF NTM guidelines, and disagreements were resolved by consensus. Lung function measurements (forced expiratory volume in 1 second [FEV1]) were standardized according to the 2012 Global Lung Function Initiative equation.
RNA extraction and RNA sequencing experiments
The PAXgene Blood micro-RNA Kit was used for RNA extraction (in five batches) following the manufacturer’s instructions (QIAGEN) and omitting the DNA depletion steps. The quality of the extracted RNA was evaluated using the 260/280nm ratio in a NanoDrop™ spectrophotometer; the mean concentration was 117.4 ± 73.2 (SD) ng/uL. The RNA Integrity Number (RIN) was evaluated using a 2100 Bioanalyzer instrument from Agilent. A RIN ≥ 7 was required for sequencing; two samples failed bioanalyzer quality control and were re-extracted. Poly-A RNA-seq libraries were prepared in a single batch with a starting concentration of 250 ng per sample. The library preparation was performed using the strand-specific Nextera NEB mRNA kit with adapters AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC and AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT. The NovaSeq 6000 S4 PE100 (Illumina) platform with automatic base calling (RTA3) and an initial concentration of 200 pM per library was used for sequencing. Samples were multiplexed in a single flow cell after dual barcoding. The sequencing run generated 150 bp paired-end reads, the lowest mean Phred+33 score per file was 36/40. The median number of reads per sample was 65 x 106 (range: 36–151 x 106). All sequencing data from this study can be found in NCBI Gene Expression Omnibus with accession GSE205161.
RNA-seq data analysis
Alignment and quantification of raw reads (FASTQ format) were performed in a High-Performance computational cluster (CentOS 7 Linux). Fast-QC was used to evaluate the quality of raw reads before and after alignment. Untrimmed reads were aligned to the primary human assembly GRCh38.p13 v38 (May 21, 2021) from GENCODE using STAR v 2.7. We produced gene‑level quantification (bam format) using RSEM v1.3.3 with default parameters. Quality control of alignment was performed using Picard tools, and quality reports were summarized using MultiQC. The raw count data was imported to R version 4.1.1 using tximport and annotated with the Ensembl database version 104 of Bioconductor. Globin subunits and genes showing abnormally large expression counts (> 7*107) were filtered out before analysis; before filtering, 68–93% of reads per sample were overrepresented. An outlier sample was detected in exploratory principal component analysis (CFB2006) and removed from further analysis. Count data were corrected for extraction batches using the ComBat-seq algorithm. Differential gene expression of the group with vs. without NTM-PD was performed using DESeq2 (v 1.3.2) with a cut-off FDR of 0.3. No minimum fold-change threshold was defined to increase the recovery of differentially expressed genes in this small exploratory cohort. Finally, gene set enrichment analysis of the molecular signatures database (human hallmark pathways) was performed with the fgsea package using the fold change ranked results from DESeq2. The cell-specific population compositions were estimated and explored using gene expression deconvolution in cibersortX against the complete blood cell counts (CBC) measured in patients at the time of sample procurement. The deconvolution process used the LM22 (22 blood immune cell types) signature matrix, bulk mode batch correction, and a hundred iterations.
The data sets are provided as comma-separated values and can be opened with standard statistical software or explored with a spreadsheet program. In our analyses, we employed R and the GUI R studio (v 4.1.1) for analysis. Raw sequencing processing and transcript counting was done in a CentOS high performance cluster, dependencies and commands ran are described inside markdown scripts.