Gene expression analysis of alveolar macrophages tolerized to ozone
Data files
Mar 04, 2024 version, files total 3.53 MB
- README.md (3.43 KB)
- TableS3_ReadCountTable_v2.xlsx (10.05 KB)
- TableS4_DifferentialGeneExpressionResults_v2.xlsx (3 MB)
- TableS5_gProfiler_enrichmentanalysis_Down_v2.xlsx (252.48 KB)
- TableS6_gProfiler_enrichmentanalysis_Up_v2.xlsx (261.88 KB)
Abstract
In this study, we examined the effect of repeated exposure to the air pollutant ozone (O3) on the transcriptome of alveolar macrophages (AMs). Using flow cytometry, we purified AMs from the lungs of female C57BL/6NJ mice exposed to filtered air or 0.8 ppm O3 for four days and then collected two days later, with six mice per group. RNA was isolated from the AMs and subjected to RNA-seq analysis. We performed differential gene expression analysis followed by gene ontology enrichment analysis of the differentially expressed genes.
README: Gene expression analysis of alveolar macrophages tolerized to ozone
https://doi.org/10.5061/dryad.pc866t1x9
The datasets here correspond to Supplementary Tables 3-6 of a manuscript submitted to the journal Toxicological Sciences. They include RNA-seq alignment results, differential gene expression analysis, and pathway analysis of differentially expressed genes.
Description of the data and file structure
- Supplementary Table 3 (Table S3) contains RNA-seq alignment results in four columns: the sample name, the number of total reads, the number of uniquely aligned reads, and the percent of mapped reads.
- Supplementary Table 4 (Table S4) contains the complete results of the differential expression analysis in six columns: gene name, expression level (baseMean, normalized counts), fold change (log2FoldChange, in log2 units), standard error of the fold change (lfcSE), p-value, and adjusted p-value.
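As an illustration of how the Table S4 columns might be used, the sketch below converts log2 fold changes to linear fold changes and splits genes into up- and down-regulated sets. It assumes the rows have already been read from the spreadsheet into dictionaries. The key names `gene` and `padj`, the 0.05 cutoff, and the example rows are assumptions made for illustration; they are not taken from the dataset or the manuscript.

```python
def fold_change(log2_fc):
    """Convert a log2 fold change to a linear fold change (2 ** log2FC)."""
    return 2.0 ** log2_fc


def select_degs(rows, padj_cutoff=0.05, min_abs_log2fc=0.0):
    """Split rows into up- and down-regulated gene lists.

    rows: iterable of dicts with keys "gene", "log2FoldChange", "padj"
          (key names and cutoffs are hypothetical, chosen for this sketch).
    Rows with a missing adjusted p-value or fold change are skipped.
    """
    up, down = [], []
    for row in rows:
        padj = row.get("padj")
        lfc = row.get("log2FoldChange")
        if padj is None or lfc is None:
            continue  # no test result reported for this gene
        if padj >= padj_cutoff or abs(lfc) < min_abs_log2fc:
            continue  # not significant at the chosen thresholds
        if lfc > 0:
            up.append(row["gene"])
        elif lfc < 0:
            down.append(row["gene"])
    return up, down


# Made-up placeholder rows, not values from Table S4.
example_rows = [
    {"gene": "GeneA", "log2FoldChange": 1.5, "padj": 0.001},
    {"gene": "GeneB", "log2FoldChange": -2.0, "padj": 0.01},
    {"gene": "GeneC", "log2FoldChange": 3.0, "padj": 0.5},
    {"gene": "GeneD", "log2FoldChange": 0.2, "padj": None},
]
up_genes, down_genes = select_degs(example_rows)
```

Because fold changes are in log2 units, a value of 1 corresponds to a doubling and a value of -1 to a halving of expression.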
Supplementary Tables 5 (Table S5) and 6 (Table S6) contain pathway enrichment analysis of down- and up-regulated genes, respectively, generated using gProfiler. These two tables each have 12 columns (of gProfiler output) with the following information (taken from the gProfiler website):
- p-value: Hypergeometric p-value after correction for multiple testing.
- term_size: The number of genes that are annotated to the term.
- query_size: The size of the gene list used in enrichment analysis (i.e., the number of differentially expressed genes, DEGs).
- intersection_size: The number of DEGs that are annotated to the given pathway.
- precision: The proportion of genes in the input list that are annotated to the function
- recall: The proportion of functionally annotated genes that the query recovers
- term_id: The pathway identifier
- source: The pathway source (e.g. GO, KEGG, etc.)
- term_name: The name of the pathway
- effective_domain_size: The total number of genes "in the universe", which is used as one of the four parameters for the hypergeometric probability function of statistical significance
- source_order: The numeric order for the term within its datasource
- intersection: The list of differentially expressed genes that intersect with the given pathway
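The relationship between the size columns, precision, recall, and the hypergeometric test can be sketched as follows. This is a minimal illustration using only the Python standard library. It computes an uncorrected upper-tail probability, so it is not expected to reproduce the p-value column, which is reported after correction for multiple testing. The function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def enrichment_stats(term_size, query_size, intersection_size,
                     effective_domain_size):
    """Return (precision, recall, uncorrected hypergeometric p-value).

    precision = intersection_size / query_size
    recall    = intersection_size / term_size
    p-value   = P(X >= intersection_size) when query_size genes are drawn
                without replacement from effective_domain_size genes,
                of which term_size are annotated to the term.
    """
    precision = intersection_size / query_size
    recall = intersection_size / term_size

    max_overlap = min(term_size, query_size)
    total = comb(effective_domain_size, query_size)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(term_size, k)
        * comb(effective_domain_size - term_size, query_size - k)
        for k in range(intersection_size, max_overlap + 1)
    )
    return precision, recall, tail / total


# Toy numbers for illustration only (not taken from Tables S5 or S6).
precision, recall, p_raw = enrichment_stats(
    term_size=5, query_size=5, intersection_size=5, effective_domain_size=10
)
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for large gene universes, at the cost of speed for very large terms.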
The gene expression data has also been deposited in NCBI GEO under accession GSE248291.
Code/Software
The results presented in Supplementary Table 3 were generated as follows:
Raw reads were trimmed and filtered of adapter contamination using cutadapt (Martin, 2011), and further filtered such that at least 90% of bases had a quality score of at least 20 using fastx_toolkit v0.0.14. Reads were then aligned to the reference mouse genome (mm10) and GENCODE vM25 transcript annotations using STAR v2.7.7a (Dobin et al., 2013).
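The read-level quality criterion described above (at least 90% of bases with a quality score of at least 20) can be expressed as a short check on a FASTQ quality string. The sketch below is not the fastx_toolkit implementation; it only restates the criterion, and it assumes Phred+33 quality encoding, which the source does not state. The function name is hypothetical.

```python
def passes_quality_filter(quality_string, min_quality=20, min_percent=90.0,
                          phred_offset=33):
    """Return True if enough bases in a read meet the quality threshold.

    quality_string: the quality line of one FASTQ record.
    phred_offset:   ASCII offset of the encoding (33 is assumed here).
    """
    if not quality_string:
        return False  # an empty read cannot satisfy the criterion
    good = sum(
        1 for ch in quality_string if ord(ch) - phred_offset >= min_quality
    )
    return 100.0 * good / len(quality_string) >= min_percent


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
keep = passes_quality_filter("IIIIIIIII#")   # 9 of 10 bases pass
drop = passes_quality_filter("IIIIIIII##")   # 8 of 10 bases pass
```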
The results presented in Supplementary Table 4 were generated using DESeq2 v1.26.0 (Love et al., 2014) in R v3.6.0, using a design that corrected for flow cytometry batch dates. These batch effects were also removed from the VST-normalized expression values using limma v3.42.2 (Ritchie et al., 2015). All log2 fold-changes reported were shrunken using ashr v2.2-47 (Stephens, 2017).
The results presented in Supplementary Tables 5 and 6 were generated using gprofiler2 v0.2.1 (Raudvere et al., 2019), the R interface to g:Profiler; the same analysis can also be run on the g:Profiler web server.
Methods
Alveolar macrophages were isolated from mouse whole lung tissue using flow cytometry. Total RNA was extracted from two batches of flow-sorted AMs using QIAGEN RNeasy kits per the manufacturer's instructions. RNA integrity was analyzed using an Agilent Bioanalyzer. RIN values ranged from 8.9 to 9.7, indicating intact RNA. PolyA+ RNA libraries were prepared with the Roche Kapa mRNA stranded library preparation kit per the manufacturer's instructions. Paired-end sequencing (50 cycles) was performed on an Illumina NovaSeq SP to a depth of >55M read pairs per sample by the UNC High-Throughput Sequencing Facility.
Raw reads were trimmed and filtered of adapter contamination using cutadapt (Martin, 2011), and further filtered such that at least 90% of bases had a quality score of at least 20 using fastx_toolkit v0.0.14. Reads were then aligned to the reference mouse genome (mm10) and GENCODE vM25 transcript annotations using STAR v2.7.7a (Dobin et al., 2013) (Supplementary Table 3), and transcript abundance was estimated using salmon v1.5.2 (Patro et al., 2017). Differential expression between the ozone-exposed and filtered-air groups was then detected using DESeq2 v1.26.0 (Love et al., 2014) in R v3.6.0, using a design that corrected for flow cytometry batch dates. These batch effects were also removed from the VST-normalized expression values using limma v3.42.2 (Ritchie et al., 2015). All log2 fold-changes reported were shrunken using ashr v2.2-47 (Stephens, 2017). Gene ontology enrichments were then assessed using gprofiler2 v0.2.1 (Raudvere et al., 2019).
Dobin, A. et al. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29, 15–21.
Love, M.I. et al. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol., 15, 550.
Martin, M. (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17, 10.
Patro, R. et al. (2017) Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods, 14, 417–419.
Raudvere, U. et al. (2019) g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res., 47, W191–W198.
Ritchie, M.E. et al. (2015) Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res., 43, e47.
Stephens, M. (2017) False discovery rates: a new deal. Biostatistics, 18, 275–294.