Spatial transcriptomic analysis of human dorsal root ganglia neurons

Published Oct 25, 2024 on Dryad. https://doi.org/10.5061/dryad.gf1vhhmxq

Data files

Oct 25, 2024 version files (12.41 GB total):

- output-XETG00171__0018220__Region_1__20240207__003742.rar (4.87 GB)
- output-XETG00171__0018220__Region_2__20240207__003742.rar (1.97 GB)
- output-XETG00171__0018220__Region_3__20240207__003742.rar (2.68 GB)
- output-XETG00171__0018220__Region_4__20240207__003742.rar (2.89 GB)
- README.md (9.32 KB)

Abstract

The Xenium platform was used for spatial transcriptomic analysis of human DRG (hDRG) neurons. One hundred marker genes were selected as a customized probe panel and hybridized to fresh-frozen hDRG sections. Each neuronal soma was segmented manually, based on expression of the pan-neuronal marker gene PGP9.5, the satellite glial cell marker FABP7, and the corresponding H&E staining. In total, 1340 neurons were identified (excluding 75 regions of interest with poor or unclear neuronal soma morphology in the H&E staining) and clustered into 16 groups. The 16 clusters were assigned to cell types based on marker gene expression.

This dataset is associated with Yu & Nagi 2024 (https://doi.org/10.1038/s41593-024-01794-1). It contains raw 10x Xenium spatial transcriptomics data from human dorsal root ganglia (DRG). In total, four DRG tissue sections from two healthy donors were analyzed. A 100-gene panel (87 neuronal genes from our single-soma sequencing dataset and 13 non-neuronal cell marker genes) was selected for the spatial transcriptomics. The spatial distribution of these genes in neurons and non-neuronal cells was successfully profiled and quantified.

Description of the data and file structure

Overview: Each .rar file contains all of the 10x Xenium spatial transcriptomics raw data for one tissue section, as used for the data analysis and plots in the associated manuscript. Each .rar file contains the following contents.

Xenium experiment file: The experiment.xenium is an experiment manifest file in JSON format that includes experiment metadata and relative file paths to other data files in the output folder needed by Xenium Explorer to visualize results.
Analysis summary: The Xenium onboard analysis pipeline outputs an interactive HTML file named analysis_summary.html. Open it on-instrument, in a web browser, or in Xenium Explorer. It contains summary metrics and automated secondary analysis results.
Morphology images: A series of tissue morphology images are output by the pipeline, which are either nuclei-stained (DAPI) or nuclei and multi-tissue stained (DAPI, cell boundary, interior stains) images in OME-TIFF format. These files include a pyramid of resolutions and tiled chunks of image data, which allows for efficient interactive image visualization (JPEG-2000 compression, 16-bit grayscale, full and downsampled resolutions down to 256 x 256 pixels). These files include: morphology.ome.tif, morphology_focus.ome.tif and morphology_mip.ome.tif. Post-Xenium H&E staining was also performed in this study; the H&E image file is DRG_area_1/2/3/4.ome, and the alignment file between the morphology and H&E images is DRG_area_1/2/3/4_alignment_files.
Cell summary file: The cell summary file (cells.csv.gz) in gzipped CSV format contains data to help QC the transcript counts for each identified cell. The cell summary is also provided in Parquet format (cells.parquet) to enable faster loading and reading of data.
Cell and nucleus segmentation files: The cells.zarr.zip file in zipped Zarr format contains segmentation masks and boundaries for nuclei and cells. These segmentation masks are used for assigning transcripts to cells. The boundary polygons are approximations of the segmentation masks, and are provided for efficient visualization of cell segmentation in Xenium Explorer and other analysis software. The nucleus_boundaries.csv.gz and cell_boundaries.csv.gz files are the CSV representations of the nucleus and cell boundaries, respectively. Each row represents a vertex in the boundary polygon of one cell. The boundary points for each cell appear in clockwise order, and the first and last points are duplicates to indicate a closed polygon. The same nucleus and cell boundary information is also provided in Parquet format (nucleus_boundaries.parquet and cell_boundaries.parquet) to enable faster loading and reading of data.
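The one-vertex-per-row layout described above can be turned back into polygons with a few lines of Python. The following is a minimal sketch, not the pipeline's own code: the column names cell_id, vertex_x and vertex_y are assumptions taken from the usual 10x Xenium output layout and should be checked against the header of the actual file. It groups vertices by cell and computes each polygon's area with the shoelace formula.

```python
import csv
import io
from collections import defaultdict


def read_boundaries(text_stream):
    """Group boundary vertices by cell, preserving the order in the file.

    Assumes columns named cell_id, vertex_x, vertex_y (check the real header).
    """
    polygons = defaultdict(list)
    for row in csv.DictReader(text_stream):
        polygons[row["cell_id"]].append(
            (float(row["vertex_x"]), float(row["vertex_y"]))
        )
    return dict(polygons)


def polygon_area(vertices):
    """Shoelace formula; closes the polygon if the last vertex is not a repeat."""
    if vertices[0] != vertices[-1]:
        vertices = vertices + [vertices[0]]
    twice_area = sum(
        x0 * y1 - x1 * y0
        for (x0, y0), (x1, y1) in zip(vertices, vertices[1:])
    )
    return abs(twice_area) / 2.0


# Synthetic stand-in for cell_boundaries.csv.gz: one 2 x 3 rectangle,
# clockwise, with the first vertex repeated at the end.
demo = (
    "cell_id,vertex_x,vertex_y\n"
    "a,0,0\n"
    "a,0,3\n"
    "a,2,3\n"
    "a,2,0\n"
    "a,0,0\n"
)
demo_polygons = read_boundaries(io.StringIO(demo))
demo_area = polygon_area(demo_polygons["a"])

# For a real file, something like the following should work:
#   import gzip
#   with gzip.open("cell_boundaries.csv.gz", "rt", newline="") as fh:
#       polygons = read_boundaries(fh)
```

Because the coordinates are in microns, the areas come out in square microns. The absolute value makes the result independent of vertex order.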
Transcript data: The transcripts file (transcripts.parquet) is provided in Parquet format to enable faster loading and reading of data. It contains data to evaluate transcript quality and localization. Transcript information is also provided in zipped Zarr format (transcripts.zarr.zip) and in CSV format (transcripts.csv and transcripts.csv.gz).
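As an illustration of how the transcript table can be used for quality filtering, here is a minimal sketch that counts transcripts per gene from the CSV version, keeping only those at or above a quality threshold (Q20 is the default used by the Xenium pipeline). The column names qv and feature_name are assumptions based on the usual Xenium output layout, and the gene names in the demo are placeholders, not genes from this panel.

```python
import csv
import io
from collections import Counter


def count_transcripts(text_stream, min_qv=20.0):
    """Count transcripts per feature, keeping rows with qv >= min_qv.

    Assumes columns named feature_name and qv (check the real header).
    """
    counts = Counter()
    for row in csv.DictReader(text_stream):
        if float(row["qv"]) >= min_qv:
            counts[row["feature_name"]] += 1
    return counts


# Synthetic stand-in for transcripts.csv; GENE_A and GENE_B are placeholders.
demo = (
    "transcript_id,cell_id,feature_name,x_location,y_location,qv\n"
    "1,a,GENE_A,10.0,5.0,40.0\n"
    "2,a,GENE_A,11.0,5.5,12.3\n"
    "3,b,GENE_B,50.0,9.0,20.0\n"
)
demo_counts = count_transcripts(io.StringIO(demo))
```

In the demo, the second GENE_A transcript falls below the threshold and is dropped. The full transcripts.csv is large, so streaming row by row as above avoids loading it into memory; the Parquet file is the faster option when a Parquet reader is available.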
Cell-feature matrix: The Xenium onboard analysis pipeline outputs a cell-feature matrix (cell_feature_matrix) in three file formats: the Market Exchange Format (MEX), the Hierarchical Data Format (HDF5), and the Zarr format. The matrices only include transcripts that pass the default quality value (Q-Score) threshold of Q20. Each matrix in the cell_feature_matrix/ folder is stored in the MEX format for sparse matrices. The folder also contains gzipped TSV files with feature and barcode sequences corresponding to row and column indices, respectively. The cell_feature_matrix/features.tsv.gz file contains a list of pre-designed panel genes (and any custom add-on genes), negative controls, unassigned codewords, and deprecated codewords. The cell-feature matrix is also provided in:
- HDF5 format (cell_feature_matrix.h5), a binary format that compresses and accesses data more efficiently than text formats such as MEX and is useful when analyzing large datasets. H5 files are supported in both R and Python.
- Zipped Zarr format (cell_feature_matrix.zarr.zip). This file can be read by Xenium Explorer.
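To make the MEX layout concrete, the sketch below parses a Matrix Market coordinate file into a dictionary keyed by (feature, barcode), using the row = feature and column = barcode convention described above. It is an illustrative reader under stated assumptions, not the pipeline's code: the matrix file name (commonly matrix.mtx.gz) is an assumption, and the feature and barcode names in the demo are placeholders.

```python
import io


def read_mex(matrix_stream, features, barcodes):
    """Parse a Matrix Market coordinate stream into {(feature, barcode): count}.

    Rows index features and columns index barcodes; indices are 1-based.
    """
    lines = (line.strip() for line in matrix_stream)
    # Skip blank lines and '%' header/comment lines.
    body = (line for line in lines if line and not line.startswith("%"))
    n_rows, n_cols, n_entries = (int(v) for v in next(body).split())
    if n_rows != len(features) or n_cols != len(barcodes):
        raise ValueError("matrix shape does not match features/barcodes")
    counts = {}
    for line in body:
        i, j, value = line.split()
        counts[(features[int(i) - 1], barcodes[int(j) - 1])] = int(value)
    if len(counts) != n_entries:
        raise ValueError("number of entries does not match the size line")
    return counts


# Synthetic 2-feature x 3-cell matrix with three non-zero entries.
demo_matrix = (
    "%%MatrixMarket matrix coordinate integer general\n"
    "%\n"
    "2 3 3\n"
    "1 1 4\n"
    "2 1 1\n"
    "2 3 7\n"
)
demo_features = ["GENE_A", "GENE_B"]      # placeholder feature names
demo_barcodes = ["cell1", "cell2", "cell3"]  # placeholder cell IDs
demo_counts = read_mex(io.StringIO(demo_matrix), demo_features, demo_barcodes)
```

Entries absent from the dictionary are zeros. For real analyses, scipy.io.mmread reads the same format into a sparse matrix, which is far more memory-efficient than a dictionary.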
Gene expression metrics: The Xenium onboard analysis pipeline outputs key metrics in text format as metrics_summary.csv. This file contains metrics that are useful for assessing decoding and cell segmentation quality.
Secondary analysis results: The Xenium onboard analysis pipeline outputs an analysis/ directory with subdirectories containing several CSV files, which store the automated secondary analysis results. A subset of these results is used to render the Analysis tab in the Analysis summary file. The subdirectories correspond to:
- Clustering (clustering/): graph-based and K-means results. Graph-based clustering (under graphclust) is run once, as it does not require a pre-specified number of clusters. K-means (under kmeans) is run for K=2..N, where K corresponds to the number of clusters, and N=10 by default. Each value of K has its own results directory.
- Differential expression (diffexp/): graph-based and K-means results. Under each of the subdirectories are the differential_expression.csv files, which contain the list of cluster-specific features that are differentially expressed in each cluster relative to all the other clusters.
- Principal component analysis (pca/): a total of five files listing the features used in the dimension reduction, i.e., to reduce the feature space. These results are used to perform clustering.
- UMAP (umap/): the Uniform Manifold Approximation and Projection results.
The secondary analysis results are also saved as a zipped Zarr file (analysis.zarr.zip), which can be read by Xenium Explorer for data visualization.
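As a small worked example of the clustering outputs described above, the sketch below lists the K values covered by the default K-means runs and tallies cells per cluster from a cluster-assignment CSV. The two-column layout with headers Barcode and Cluster is an assumption about the clustering CSV files and should be verified against the files in the archive.

```python
import csv
import io
from collections import Counter

# K-means is run for K = 2..N with N = 10 by default, i.e. nine runs.
DEFAULT_N = 10
kmeans_k_values = list(range(2, DEFAULT_N + 1))


def cluster_sizes(text_stream):
    """Count cells per cluster label.

    Assumes columns named Barcode and Cluster (check the real header).
    """
    return Counter(row["Cluster"] for row in csv.DictReader(text_stream))


# Synthetic stand-in for one cluster-assignment file.
demo = "Barcode,Cluster\ncell1,1\ncell2,2\ncell3,1\n"
demo_sizes = cluster_sizes(io.StringIO(demo))
```

Cluster labels are kept as strings here, exactly as they appear in the CSV. Note that these automated clusters are computed on the pipeline's own cell segmentation; they are not the 16 manually curated neuron clusters described in the abstract.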
Panel file: The gene_panel.json file is a copy of the gene panel file used in the experiment on the Xenium Analyzer instrument.
Auxiliary files: The following are provided in aux_outputs/ (see release notes for updates):
- morphology_fov_locations.json: contains the field of view (FOV) name, height, width, and XY positions in the space of the region of interest's (ROI) morphology image. This is the same space used to compute transcript and cell locations, and the units are microns. The FOVs have 3,520 rows and 2,960 columns with 128 pixels of overlap on each edge (this may change in future versions of the Xenium platform). The position information is useful for determining where FOV boundaries are, to assess transcript deduplication and any FOV edge effects.
- overview_scan_fov_locations.json: contains the FOV name, height, width, and approximate XY positions in the space of the overview scan image. This is the space that contains all the ROIs, and the units are pixels. The ROI coordinates have an error of 5-10 µm. This position information is useful for approximating where multiple ROIs are located on an overview scan image.
- per_cycle_channel_images/: contains downsampled 2D RNA images (maximum intensity projection) from each cycle and channel (not the morphology stain images). These images may be helpful for troubleshooting analysis summary alerts or unexpected metrics and analysis results.
- overview_scan.png: the full-resolution (1672 x 3498 pixels) image of the entire sample on the slide.
- background_qc_images/: contains the autofluorescence images (downsampled, TIFF format) that are subtracted from the raw morphology stain images to produce the autofocused images (morphology_focus/) if the Cell Segmentation Staining protocol was used.
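One use of the FOV positions described above is to flag transcripts that fall where neighboring FOVs overlap. The sketch below is hypothetical: it assumes the JSON holds a top-level fov_locations object mapping each FOV name to x, y, width and height in microns, which should be confirmed by inspecting the actual file. A point covered by more than one FOV lies in an overlap region.

```python
import json


def fovs_containing(fovs, x, y):
    """Return the sorted names of all FOVs whose rectangle contains (x, y)."""
    return sorted(
        name
        for name, fov in fovs.items()
        if fov["x"] <= x <= fov["x"] + fov["width"]
        and fov["y"] <= y <= fov["y"] + fov["height"]
    )


# Synthetic stand-in for morphology_fov_locations.json (assumed structure):
# two FOVs side by side that share a 10-micron-wide strip.
demo_json = """
{"fov_locations": {
    "A1": {"x": 0,  "y": 0, "width": 100, "height": 120},
    "A2": {"x": 90, "y": 0, "width": 100, "height": 120}
}}
"""
demo_fovs = json.loads(demo_json)["fov_locations"]

overlap_hit = fovs_containing(demo_fovs, 95.0, 50.0)  # inside the shared strip
single_hit = fovs_containing(demo_fovs, 20.0, 50.0)   # inside A1 only
```

Comparing transcript density inside and outside such overlap strips is one simple way to look for deduplication problems or edge effects.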
File names and brief description:
- 1. File: output-XETG00171__0018220__Region_1__20240207__003742.rar (human DRG section, Xenium, Donor N2, Lumbar 2);
- 2. File: output-XETG00171__0018220__Region_2__20240207__003742.rar (human DRG section, Xenium, Donor N4, Lumbar 3);
- 3. File: output-XETG00171__0018220__Region_3__20240207__003742.rar (human DRG section, Xenium, Donor N4, Thoracic 12);
- 4. File: output-XETG00171__0018220__Region_4__20240207__003742.rar (human DRG section, Xenium, Donor N4, Thoracic 12).

Sharing/Access information

Links to other publicly accessible locations of the data:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE273557

Data was derived from the following sources:

Yu, Huasheng, et al. "Single-Soma Deep RNA Sequencing of Human Dorsal Root Ganglion Neurons Reveals Novel Molecular and Cellular Mechanisms Underlying Somatosensation." bioRxiv (2023): 2023-03.
