Lung distal epithelium ChiaRed and WT epithelial cells
Data files
Oct 16, 2024 version files 41.87 MB
- ChiaRed_and_WT_epithelial_cells.zip (41.87 MB)
- README.md (4.45 KB)
Abstract
Using a method that enriches for alveolar epithelial cells, we compared distal epithelial cells from wild-type (WT) mice with AMCase reporter (ChiaRed; CR)-expressing epithelial cells from heterozygous CR mice by scRNAseq to further verify the cellular identity of AMCase-expressing cells. At steady state, EpCAM+ cells from WT mice consisted primarily of mature AT2 cells (93%) and Tm4sf1-expressing alveolar epithelial progenitor cells (AEP; ~6%); AMCase-expressing CR+ cells matched the transcriptional profile of mature AT2s, as expected based on prior studies. In contrast to AT2s, AEPs were comparatively enriched for transcripts marking transitional alveolar epithelial cell states (Krt8, Krt19, Cldn4, Cdkn1a) that expand in pathological settings such as severe SARS-CoV-2 infection and pulmonary fibrosis. In addition, AEPs and AT2s differentially expressed Epcam, Cdh1, and H2-Ab1, encoding the cell surface markers EpCAM, E-cadherin, and MHC-II, respectively, which were used to distinguish these populations by flow cytometry.
https://doi.org/10.5061/dryad.47d7wm3pc
Description of the data and file structure
Files and variables
File: Archive.zip
Description:
The zip file has two folders containing files processed with the STAR aligner and the Cellranger toolkit (10x Genomics): the WT sample (WT-EP_pos) and the ChiaRed+ sample (CR_pos).
WT sample (WT-EP_pos) refers to epithelial (DAPI lo, CD45-, EpCAM+) cells sorted from lung tissue of naive C57BL/6J mice.
ChiaRed+ sample (CR_pos) refers to AMCase-expressing epithelial (DAPI lo, CD45-, EpCAM+, CR+) cells sorted from lung tissue of naive AMCase knockin/knockout reporter mice, which express tdTomato and humanized Cre recombinase (hCre; iCre) from the mouse Chia1 (AMCase) gene. Endogenous expression of Chia1 is knocked out in this allele.
These two datasets are used to compare AMCase-expressing epithelial cells (CR_pos) with overall epithelial cells (WT-EP_pos) in the lung, in order to analyze the specific role of AMCase-expressing cells.
In each main folder, under the “outs” folder, “filtered_gene_bc_matrices” contains only the barcodes that were called as cells, while “raw_gene_bc_matrices” contains all valid barcodes from the GEMs (Gel Bead-in-Emulsions).
Each matrices folder contains two files, “genes.tsv” and “barcodes.tsv”. The “genes.tsv” file lists all annotated genes, one per row, with the gene_id in the first column and the gene name in the second column. The “barcodes.tsv” file lists the barcode sequences, one per row; each barcode identifies one cell (or, in the raw matrices, one GEM).
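As a minimal sketch of how these two files could be read, the following uses only the Python standard library. The function names are hypothetical, and the gene IDs and barcodes in the example are made-up placeholders rather than values from this dataset; the only assumption about the files is the column layout described above.

```python
import csv
import io


def read_genes(handle):
    """Return (gene_id, gene_name) pairs from a genes.tsv-style file."""
    reader = csv.reader(handle, delimiter="\t")
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def read_barcodes(handle):
    """Return barcode strings, one per non-empty line."""
    return [line.strip() for line in handle if line.strip()]


# Placeholder content standing in for the real files (IDs and barcodes are made up).
genes_example = io.StringIO("ID_0001\tEpcam\nID_0002\tCdh1\n")
barcodes_example = io.StringIO("AAACCTGAGAAACCAT-1\nAAACCTGAGAAACCGC-1\n")

genes = read_genes(genes_example)
barcodes = read_barcodes(barcodes_example)
print(len(genes), "genes;", len(barcodes), "barcodes")
```

With the unzipped archive, the same functions can be applied to file handles opened on the “genes.tsv” and “barcodes.tsv” files in either matrices folder.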
The “analysis” folder contains secondary analysis results in the “clustering” and “diffexp” folders. Under the “clustering” folder, “graphclust” contains a CSV file listing the barcodes assigned to each cluster generated by the Cellranger toolkit (10X Genomics). The “clustering” and “diffexp” folders each contain subfolders for the default number of 10 clusters generated by Cellranger. Each clustering file lists the barcode sequence in the first column and the corresponding cluster number in the second column. Each “diffexp” file is a table indicating which features are differentially expressed in each cluster relative to all other clusters. The table gives the gene_id and gene name in the first and second columns, followed by the mean expression, the log2 fold change, and a p-value denoting the significance of the gene's expression in the cluster relative to cells in the other clusters. P-values within each cluster are adjusted for false discovery rate to account for the number of clusters being tested.
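One simple use of a clustering file is to count how many barcodes fall in each cluster. The sketch below assumes the two-column layout described above (barcode, then cluster number) and assumes the first row is a header; the function name, header text, and example rows are illustrative placeholders, not values from this dataset.

```python
import csv
import io
from collections import Counter


def cluster_sizes(handle, has_header=True):
    """Count barcodes per cluster in a two-column (barcode, cluster) CSV."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the assumed header row
    return Counter(int(row[1]) for row in reader if len(row) >= 2)


# Made-up example rows in the assumed layout.
clusters_example = io.StringIO(
    "Barcode,Cluster\n"
    "AAACCTGAGAAACCAT-1,1\n"
    "AAACCTGAGAAACCGC-1,2\n"
    "AAACCTGAGAAACCTA-1,1\n"
)

sizes = cluster_sizes(clusters_example)
for cluster, n_cells in sorted(sizes.items()):
    print(f"cluster {cluster}: {n_cells} barcodes")
```

If a file turns out to have no header row, pass `has_header=False` so the first barcode is not skipped.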
The “pca” folder contains CSV files for principal component analysis, which is run on the normalized, filtered feature-barcode matrix to reduce the number of gene dimensions. Within this folder, the first file is a projection of each cell onto the default 10 principal components. The second file is a components matrix, which indicates how much each feature contributes to each principal component. The third file contains the Ensembl IDs of the genes with the highest dispersion, which were selected for use in the principal component calculations. The fourth file contains the proportion of the total variance explained by each principal component. The last file shows the normalized dispersion of each feature after grouping genes by their mean expression across the dataset.
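The variance file can be summarized as a running total, which shows how much of the variance the first few components capture together. This is a sketch under assumptions: it expects a two-column CSV (component number, then proportion of variance) with a header row, and the function name, header text, and numbers below are invented for illustration.

```python
import csv
import io
from itertools import accumulate


def cumulative_variance(handle, has_header=True):
    """Return the running sum of the variance proportions in column 2."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the assumed header row
    proportions = [float(row[1]) for row in reader if len(row) >= 2]
    return list(accumulate(proportions))


# Invented proportions, not taken from this dataset.
variance_example = io.StringIO("PC,Proportion\n1,0.40\n2,0.25\n3,0.10\n")

cumulative = cumulative_variance(variance_example)
print([round(value, 2) for value in cumulative])
```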
The “tsne” folder contains a CSV file with the t-distributed Stochastic Neighbor Embedding (t-SNE) coordinates computed after running PCA.
“metrics_summary” and “web_summary” provide the total number of cells collected for sequencing and details of the sequencing output, such as mean reads per cell, median genes per cell, total genes detected, and the fraction of reads mapped confidently to the transcriptome.
The “SC_RNA_COUNTER_CS” folder contains the “SC_RNA_COUNTER” and “CLOUPE_PREPROCESS” folders, which hold the pipeline commands and subfolders with the corresponding commands for each stage of the pipeline. For detailed information on each file, refer to the 10X website, https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/pipestance-structure#header.
Single-cell RNA sequencing (scRNAseq) was performed with WT (DAPI lo, CD45-, EpCAM+) or CR+ (DAPI lo, CD45-, EpCAM+, CR+) epithelial cells, which were sorted into ice-cold 0.5% BSA in PBS and processed through the Chromium Single Cell 3' v2 Library Kit (10X Genomics) per the manufacturer’s protocol. Single-cell libraries from 10,000 cells per sample were sequenced with standard Illumina sequencing primers on an Illumina HiSeq 4000, using paired-end sequencing with single indexing, in which read 1 was 26 cycles and read 2 was 98 cycles. The resulting bcl files were de-multiplexed using bcl2fastq2.1.7v, and the resultant paired-end fastq files were aligned to the mm10 transcriptome (ftp://ftp.ensembl.org/pub/release84/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz and ftp://ftp.ensembl.org/pub/release-84/gtf/mus_musculus/Mus_musculus.GRCm38.84.gtf.gz) using the STAR aligner in the Cellranger toolkit (10X Genomics).
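The 26 cycles of read 1 can be illustrated with a short sketch. It assumes the usual Single Cell 3' v2 layout of a 16-base cell barcode followed by a 10-base UMI; that layout is not stated in this README, and the function name and example sequence are made up. Cellranger performs this step itself, so the code is only meant to show what read 1 carries.

```python
BARCODE_LEN = 16  # assumed cell-barcode length for Single Cell 3' v2 chemistry
UMI_LEN = 10      # assumed UMI length for Single Cell 3' v2 chemistry


def split_read1(read1):
    """Split a read 1 sequence into (cell barcode, UMI)."""
    if len(read1) < BARCODE_LEN + UMI_LEN:
        raise ValueError("read 1 is shorter than barcode + UMI")
    barcode = read1[:BARCODE_LEN]
    umi = read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


# Made-up 26-base read 1 sequence.
example_read1 = "AAACCTGAGAAACCAT" + "GGTTCCAAGG"
barcode, umi = split_read1(example_read1)
print(barcode, umi)
```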