A cohort-based study of host gene expression: tumor suppressor and innate immune/inflammatory pathways associated with the HIV reservoir size
Data files
Nov 20, 2023 version files 30.28 MB
- data_cd4_protein.csv (6.32 KB)
- data_clinical.csv (14.58 KB)
- data_intact_dna.csv (8.08 KB)
- data_plasma_protein.csv (20.76 KB)
- data_rnaseq_count.csv (30.22 MB)
- README.md (4.23 KB)
Abstract
The major barrier to an HIV cure is the HIV reservoir: latently infected cells that persist despite effective antiretroviral therapy (ART). Most prior studies of host genetic predictors of HIV control have focused on “elite controllers,” rare individuals able to control virus in the absence of ART. However, there have been few genetic studies among ART-suppressed non-controllers, who make up the majority of people living with HIV (PLWH). We performed host RNA sequencing and HIV reservoir quantification (total DNA [tDNA], unspliced RNA [usRNA], intact DNA) from peripheral CD4+ T cells from 191 HIV+ ART-suppressed non-controllers. After adjusting for nadir CD4+ count, timing of ART initiation, and genetic ancestry, we identified two host genes for which higher expression was significantly associated with smaller total DNA viral reservoir size, P3H3 and NBL1, both known tumor suppressor genes. We then identified 17 host genes for which lower expression was associated with higher residual transcription (HIV usRNA). These included novel associations with membrane channel (KCNJ2, GJB2), inflammasome (IL1A, CSF3, TNFAIP5, TNFAIP6, TNFAIP9, CXCL3, CXCL10), and innate immunity (TLR7) genes (FDR-adjusted q<0.05). Gene set enrichment analyses further identified significant associations of HIV usRNA with TLR4/microbial translocation (q=0.006), IL-1/NLRP3 inflammasome (q=0.008), and IL-10 (q=0.037) signaling. Protein validation assays using ELISA and multiplex cytokine assays supported these observed inverse host gene correlations, with P3H3, IL-10, and TNF-a protein associations achieving statistical significance (p<0.05). Of note, plasma IL-10 was also significantly inversely associated with HIV DNA (p=0.016). HIV intact DNA was not associated with differential host gene expression, although this may have been due to a large number of undetectable values in our study.
Further data are needed to validate these findings, including functional genomic studies, larger cohorts that include PLWH who are underrepresented in research, and studies with dedicated assays to measure the replication-competent HIV reservoir.
README: A cohort-based study of host gene expression: tumor suppressor and innate immune/inflammatory pathways associated with the HIV reservoir size
https://doi.org/10.5061/dryad.k3j9kd5dw
These data include host bulk RNA-seq data from peripheral CD4+ T cells from 191 adults with HIV who were virally suppressed on antiretroviral therapy (ART). Cryopreserved PBMCs were enriched for CD4+ T cells (StemCell, Vancouver, Canada), and RNA was extracted from CD4+ T cells using the AllPrep Universal Kit (Qiagen, Hilden, Germany). To validate genes encoding intracellular or membrane-associated proteins, we performed ELISAs for five proteins (Kir2.1, connexin 26, P3H3, NBL1, and TLR4, encoded by KCNJ2, GJB2, P3H3, NBL1, and TLR4, respectively) in matched peripheral CD4-enriched T cells from a subset of the participants. To validate the inflammatory pathway genes, protein quantification was performed in a subset of 175 participants, measuring G-CSF, IP-10, TNFAIP5 (Pentraxin-3), IL-1b, IL-10, TNF-a, and sTLR4 in matched plasma samples using a high-sensitivity multiplex cytokine assay (Meso Scale Discovery) and a TLR4 ELISA kit (LSBio).
Description of the data and file structure
Participant data has been split across five .csv files corresponding to provenance: ELISA, plasma cytokines, intact HIV DNA ddPCR, RNAseq, and clinical covariates. De-identified participant IDs are used throughout to link the data files. The data files are described below.
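Because the files share the de-identified `uuid` column, they can be joined on it. The following is a minimal sketch using only the Python standard library; the helper names `read_by_uuid` and `merge_on_uuid` are hypothetical, and the in-memory rows are made-up stand-ins for the real files, not values from the dataset.

```python
import csv
import io


def read_by_uuid(handle):
    """Return {uuid: row_dict} for one CSV file; assumes a 'uuid' column."""
    return {row["uuid"]: row for row in csv.DictReader(handle)}


def merge_on_uuid(*tables):
    """Left-join every table onto the first one, keyed by uuid."""
    base, *others = tables
    merged = {}
    for uuid, row in base.items():
        combined = dict(row)
        for other in others:
            # Participants absent from a subset file simply gain no columns.
            combined.update(other.get(uuid, {}))
        merged[uuid] = combined
    return merged


# Made-up stand-ins for data_clinical.csv and data_cd4_protein.csv.
clinical = read_by_uuid(io.StringIO("uuid,age\np1,45\np2,52\n"))
protein = read_by_uuid(io.StringIO("uuid,P3H3\np1,1.7\n"))
merged = merge_on_uuid(clinical, protein)
```

With the real files, each `io.StringIO(...)` would be replaced by `open("data_clinical.csv", newline="")` and so on. A left join onto the clinical table is one reasonable choice here because the protein assays were run only on subsets of participants.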
data_cd4_protein.csv contains normalized protein expression values from the ELISAs. Each value was calculated as the concentration of the marker (ng/ml or pg/ml) divided by the total protein concentration of the CD4+ T cell lysate (mg/ml).
- uuid: unique participant identifier
- GJB2: normalized protein expression data for GJB2
- KCNJ2: normalized protein expression data for KCNJ2
- NBL1: normalized protein expression data for NBL1
- P3H3: normalized protein expression data for P3H3
- TLR4: normalized protein expression data for TLR4
data_clinical.csv contains clinical data for each patient in our study.
- uuid: unique participant identifier
- gender: one of "Male", "Female", or "Male to Female Transgender"
- age: patient age at baseline visit (years)
- cd4_nadir_lab: lab reported nadir CD4 value (cells/mm^3)
- newvl_preart: pre-ART viral load (copies/mL)
- imp_eddi_arv_days: timing of ART initiation in days (when necessary, missing values are imputed by the median of observed values greater than half a year)
- dna_copies_million: HIV viral DNA (copies/million)
- rna_copies_million: HIV viral RNA (copies/million)
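The imputation rule described for `imp_eddi_arv_days` can be sketched as follows. This is an illustration of the stated rule rather than the authors' code: the function name is hypothetical, "half a year" is assumed to mean 182.5 days, and the example values are made up.

```python
from statistics import median

HALF_YEAR_DAYS = 182.5  # assumption: "half a year" expressed in days


def impute_art_days(values):
    """Replace None with the median of observed values above half a year."""
    late = [v for v in values if v is not None and v > HALF_YEAR_DAYS]
    if not late:
        raise ValueError("no observed values above the half-year threshold")
    fill = median(late)
    return [fill if v is None else v for v in values]


# Made-up example: the observed values above the threshold are 200, 400, 1000.
example = impute_art_days([30, 200, None, 400, 1000])
```

Note that the column in data_clinical.csv is already imputed, so this sketch only documents how such values would have been derived.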
data_intact_dna.csv contains ddPCR measurements of intact HIV DNA.
- uuid: unique participant identifier
- intact_dna: intact HIV DNA count per 1e+6 T cells (minimum over two measurements), corrected by the DNA shearing index (DSI)
- DnaCopies30ul: intact HIV DNA count per 30 uL
data_plasma_protein.csv contains plasma concentrations for seven proteins.
- uuid: unique participant identifier
- IP_10: IP_10 cytokine concentration (pg/mL), gene = CXCL10
- G_CSF: G_CSF cytokine concentration (pg/mL), gene = CSF3
- Pentraxin_3: Pentraxin_3 concentration value (pg/mL), gene = PTX3 (TNFAIP5)
- IL_1beta: IL_1beta cytokine concentration (fg/mL), gene = IL1B
- IL_10: IL_10 cytokine concentration (fg/mL), gene = IL10
- TNF_alpha: TNF_alpha cytokine concentration (fg/mL), gene = TNFA
- sTLR4: sTLR4 concentration (pg/mL), gene = TLR4
data_rnaseq_count.csv contains the RNAseq count data for each of our patients.
- uuid: unique participant identifier
- ENSG00000223972.5: gene counts for the gene with this Ensembl ID
- ENSG00000227232.5: gene counts for the gene with this Ensembl ID
- ...
There are 60720 Ensembl genes in total, each with its counts in its own column.
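Since the count file is wide (one column per gene), it can be convenient to pull out only the genes of interest while streaming the rows. The sketch below assumes the header contains `uuid` plus versioned Ensembl IDs as described above; the function name is hypothetical, the counts in the demo are made up, and values are parsed as floats because the on-disk number formatting is not stated.

```python
import csv
import io


def read_gene_columns(handle, gene_ids):
    """Return {uuid: {gene_id: count}} for the requested Ensembl columns."""
    reader = csv.reader(handle)
    header = next(reader)
    uuid_col = header.index("uuid")
    # header.index raises ValueError if a requested gene ID is absent.
    cols = {gene: header.index(gene) for gene in gene_ids}
    return {
        row[uuid_col]: {gene: float(row[i]) for gene, i in cols.items()}
        for row in reader
    }


# Made-up counts under the two column names shown in the README.
demo = (
    "uuid,ENSG00000223972.5,ENSG00000227232.5\n"
    "p1,0,12\n"
    "p2,1,7\n"
)
selected = read_gene_columns(io.StringIO(demo), ["ENSG00000227232.5"])
```

For the real file, pass `open("data_rnaseq_count.csv", newline="")` instead of the `io.StringIO` demo.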
Methods
HIV+ ART-suppressed non-controllers from the UCSF SCOPE and Options HIV+ cohorts were included in the study.
Cryopreserved PBMCs were enriched for CD4+ T cells (StemCell, Vancouver, Canada), and RNA was extracted from CD4+ T cells using the AllPrep Universal Kit (Qiagen, Hilden, Germany), with one aliquot set aside for HIV reservoir quantification and a second aliquot for host RNA sequencing. Host RNA sequencing reads were pre-processed using the HTStream pipeline (s4hts.github.io/htstream/) to remove PCR duplicates, adapters, N characters, PolyA/T sequences, PhiX contaminants, and poor-quality sequences (quality score <20 with a sliding window of 10 base pairs). The quality of raw reads was assessed using FastQC. All samples had a per base quality score and sequence quality score >30. RNA-seq reads were then mapped to the human genome (GRCh38) with a corresponding annotation file from the GENCODE project. Alignment and gene quantification were performed using the STAR alignment tool and its quantification protocol. Gene expression was converted to counts per million (CPM). The mean-variance trend was estimated to assign observational weights based on predicted variance on log2-counts per million (log2-CPM) using the Limma-Voom pipeline.
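The CPM and log2-CPM conversions mentioned above can be illustrated for a single sample. The log2-CPM form below uses the offsets that limma's `voom` applies (0.5 added to each count, 1 added to the library size). Whether normalization factors such as TMM were applied to the library sizes is not stated in the methods, so this sketch assumes raw column sums; the function names are hypothetical.

```python
import math


def cpm(counts):
    """Counts per million for one sample, using the raw library size."""
    lib_size = sum(counts)
    return [c / lib_size * 1e6 for c in counts]


def voom_log2_cpm(counts):
    """log2-CPM with voom-style offsets: (count + 0.5) / (lib_size + 1)."""
    lib_size = sum(counts)
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]


# Made-up counts for a three-gene sample.
sample_cpm = cpm([10, 30, 60])
sample_log2 = voom_log2_cpm([0, 99])
```

The offsets keep zero counts finite on the log scale, which is why the voom form is used for modeling rather than a plain `log2(CPM)`.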
To validate genes encoding intracellular or membrane-associated proteins, we performed ELISAs on peripheral CD4-enriched T cells. CD4+ T cells were isolated from PBMC by negative selection, using the EasySep Human CD4+ T Cell Isolation Kit (StemCell Technologies, Vancouver, BC, Canada), following the manufacturer’s guidelines. The Muse Human CD4 T Cell kit (Luminex, Austin, TX) in combination with the Guava Muse Cell Analyzer was used to determine the concentration and percentages of CD4+ T cells after the isolation. To generate cellular lysates, purified CD4+ T cells were subjected to three cycles of freezing/thawing, using a dry ice (frozen CO2)/absolute ethanol mixture and a 37˚C water bath. Complete lysis was verified by trypan blue staining and microscopic analysis. Lysates were spun down at 1,500 g for 10 min at 4˚C (to remove cellular debris), and the supernatants were diluted 1:5 with PBS and kept at -80˚C until the time of protein quantification. Total protein concentration of the CD4+ T cell lysates was determined using the Pierce® BCA assay (Thermo Fisher Scientific). The mean value obtained was 0.94 mg/ml (range: 0.66 mg/ml – 1.18 mg/ml). Chemiluminescence or absorbance was read on a SpectraMax® iD5 multi-mode plate reader (Molecular Devices, San Jose, CA) and reported in relative light units (RLU). A standard curve was constructed by plotting the log mean RLU reading for each standard on the y-axis against the log of known concentrations on the x-axis using the SoftMax Pro 7.1 software (Molecular Devices, San Jose, CA). Data were normalized by total protein concentration to accurately reflect the total population of cells (live and dead). Briefly, a 1:5 dilution factor (based on supernatant dilution with PBS at the time of CD4+ T cell isolation) was used to calculate the concentration in the lysate before the dilution.
Each protein marker was then quantified using the marker-specific ELISA (again, taking into account the 1:5 dilution performed before cryopreservation of the lysates). Data normalization was performed by dividing the concentration of each protein in the final lysate (e.g., for P3H3 in pg/ml) by the total protein concentration (mg/ml).
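The quantification steps above (log-log standard curve, dilution correction, normalization by total protein) can be sketched as below. This is an illustration under several assumptions, not the SoftMax Pro procedure: the standard curve is assumed to be a straight least-squares line in log10-log10 space, the total protein concentration is assumed to refer to the undiluted lysate, and the function names and standard values are hypothetical.

```python
import math


def fit_loglog_curve(concs, rlus):
    """Least-squares line: log10(RLU) = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in concs]
    ys = [math.log10(r) for r in rlus]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalized_expression(rlu, slope, intercept, total_protein_mg_ml,
                          dilution=5.0):
    """Marker concentration per mg of total protein in the lysate."""
    # Invert the standard curve to get the concentration in the diluted sample.
    conc_diluted = 10 ** ((math.log10(rlu) - intercept) / slope)
    # Undo the 1:5 PBS dilution (assumed to apply to the marker reading only).
    conc_lysate = conc_diluted * dilution
    return conc_lysate / total_protein_mg_ml


# Made-up standards; 0.94 mg/ml is the mean total protein reported above.
slope, intercept = fit_loglog_curve([1, 10, 100], [10, 100, 1000])
value = normalized_expression(50, slope, intercept, total_protein_mg_ml=0.94)
```

The result carries the units of the marker concentration per mg of total protein (for example pg/mg when the standards are in pg/ml).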
Most of the inflammatory pathway genes identified in association with HIV usRNA encode secreted proteins, so we validated them using high-sensitivity multiplex plasma cytokine quantification (Meso Scale Diagnostics). Plasma levels of IP-10 (the protein encoded by CXCL10), G-CSF (CSF3), and pentraxin 3 (TNFAIP5) were quantified using the electrochemiluminescence-based 3-plex Meso Scale Discovery (MSD) platform (U-Plex, Meso Scale Discovery, Rockville, MA); IL-10 (IL10), IL-1β (IL1B), and TNF-α (TNFA) were measured in a separate 3-plex S-plex Proinflammatory panel kit, and IL‑1α (IL1A) was quantified by a V-plex kit. In all these assays, undiluted samples were run in duplicate following the manufacturer’s instructions, and protein concentrations were determined using MSD Discovery Workbench (version 4.0.13) analysis software. The light intensities from the samples were interpolated using a four-parameter logistic fit (FourPL) to a standard curve of electrochemiluminescence generated from eight calibrators of known concentrations. The lower limit of detection for each marker can be found on the manufacturer’s website (MesoScale Diagnostics, https://www.mesoscale.com/~/media/files/handout/assaylist.pdf).
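For readers unfamiliar with four-parameter logistic interpolation, the sketch below shows one common parameterization and its inverse, which converts a measured signal back to a concentration. The parameter names (`a`, `b`, `c`, `d`) and values are illustrative assumptions; the exact parameterization and weighting used by MSD Discovery Workbench are not stated here, and fitting the parameters to the calibrators is omitted.

```python
def four_pl(x, a, b, c, d):
    """Signal at concentration x: a = response at zero, d = response at
    saturation, c = inflection concentration, b = slope factor."""
    return d + (a - d) / (1.0 + (x / c) ** b)


def inverse_four_pl(y, a, b, c, d):
    """Concentration that yields signal y on the fitted curve."""
    if not min(a, d) < y < max(a, d):
        raise ValueError("signal outside the asymptotes of the curve")
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


# Made-up curve parameters; round-trip a concentration through the curve.
params = dict(a=100.0, b=1.2, c=50.0, d=1.0e6)
signal = four_pl(10.0, **params)
recovered = inverse_four_pl(signal, **params)
```

Signals at or beyond the asymptotes have no finite solution, which is why the inverse raises an error there; in practice such samples fall outside the quantifiable range of the assay.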