Spatial gene expression in the mouse colon during experimental colitis measured with MERFISH
Data files
Apr 02, 2024 version files 108.70 GB
- 061923_D9_m2_Swiss.csv (976.88 MB)
- 062221_D9_m3_2_slice_1.csv (198 MB)
- 062221_D9_m3_2_slice_2.csv (267.36 MB)
- 062221_D9_m3_2_slice_3.csv (270.43 MB)
- 062921_D0_m3a_1_slice_1.csv (416.50 MB)
- 062921_D0_m3a_1_slice_2.csv (435.19 MB)
- 062921_D0_m3a_2_slice_1.csv (400.19 MB)
- 062921_D0_m3a_2_slice_2.csv (397.91 MB)
- 062921_D0_m3a_2_slice_3.csv (349.91 MB)
- 062921_D9_m2a_1_slice_1.csv (260.93 MB)
- 062921_D9_m2a_2_slice_1.csv (300.79 MB)
- 062921_D9_m2a_2_slice_2.csv (335.10 MB)
- 062921_D9_m5_1_slice_1.csv (590.15 MB)
- 062921_D9_m5_1_slice_2.csv (355.42 MB)
- 062921_D9_m5_1_slice_3.csv (318.68 MB)
- 062921_D9_m5_2_slice_1.csv (193.71 MB)
- 062921_D9_m5_2_slice_2.csv (249.10 MB)
- 062921_D9_m5_2_slice_3.csv (177.65 MB)
- 072523_D35_m11_1_slice_1.csv (135.47 MB)
- 072523_D35_m11_1_slice_2.csv (145.75 MB)
- 072523_D35_m11_1_slice_3.csv (134.58 MB)
- 072523_D35_m6_1_slice_1.csv (137.74 MB)
- 072523_D35_m6_1_slice_2.csv (128.93 MB)
- 072523_D35_m6_1_slice_3.csv (109.61 MB)
- 080823_D9_m13_Swiss.csv (809.61 MB)
- 080823_D9_m5_Swiss.csv (405.68 MB)
- 082421_D0_m6_1_slice_1.csv (290.45 MB)
- 082421_D0_m6_1_slice_2.csv (402.24 MB)
- 082421_D0_m6_1_slice_3.csv (183.89 MB)
- 082421_D0_m7_1_slice_1.csv (296.07 MB)
- 082421_D0_m7_1_slice_2.csv (302.37 MB)
- 082421_D21_m1_1_slice_1.csv (565.17 MB)
- 082421_D21_m1_1_slice_2.csv (547.84 MB)
- 082421_D21_m2_1_slice_1.csv (292.50 MB)
- 082421_D21_m2_1_slice_2.csv (139.81 MB)
- 092421_D3_m1_1_slice_1.csv (207.94 MB)
- 092421_D3_m1_1_slice_2.csv (283.15 MB)
- 092421_D3_m1_1_slice_3.csv (298.13 MB)
- 092421_D3_m2_1_slice_1.csv (253.01 MB)
- 092421_D3_m2_1_slice_2.csv (281.96 MB)
- 092421_D3_m2_1_slice_3.csv (276.92 MB)
- 092421_D3_m3_1_slice_1.csv (227.71 MB)
- 092421_D3_m3_1_slice_2.csv (212.39 MB)
- 092421_D3_m3_1_slice_3.csv (257.61 MB)
- 092421_D3_m3_1_slice_4.csv (163.16 MB)
- 092421_D3_m4_1_slice_1.csv (264.93 MB)
- 092421_D3_m4_1_slice_2.csv (303.15 MB)
- 092421_D3_m4_1_slice_3.csv (335.93 MB)
- 100221_D9_m2_1_slice_1.csv (200.24 MB)
- 100221_D9_m2_1_slice_2.csv (211.75 MB)
- 100221_D9_m2_1_slice_3.csv (302.49 MB)
- 100221_D9_m3_1_slice_1.csv (213.42 MB)
- 100221_D9_m3_1_slice_2.csv (220.27 MB)
- 100221_D9_m3_2_slice_1.csv (188.67 MB)
- 100221_D9_m3_2_slice_2.csv (143.61 MB)
- 100221_D9_m5_1_slice_1.csv (275.74 MB)
- 100221_D9_m5_1_slice_2.csv (235.57 MB)
- 100221_D9_m5_1_slice_3.csv (260.70 MB)
- 100221_D9_m5_2_slice_1.csv (219.18 MB)
- 100221_D9_m5_2_slice_2.csv (159.88 MB)
- 100221_D9_m5_2_slice_3.csv (131.55 MB)
- adata_day35.h5ad (1.51 GB)
- adata.h5ad (17.96 GB)
- Brugger_annotations.csv (212.44 KB)
- bulk_rnaseq_FPKM.csv (13.85 KB)
- cell_properties_day35.csv (14.11 MB)
- cell_properties.csv (299.13 MB)
- gene_names_day35.csv (7.06 KB)
- gene_names.csv (7.05 KB)
- Ho_annotations.csv (1 MB)
- Jasso_annotations.csv (1.51 MB)
- Kinchen_M_DSS_annotations.csv (98.60 KB)
- Kinchen_M_healthy_annotations.csv (104.04 KB)
- ligand_receptor_pair_masterlist.csv (16.36 KB)
- README.md (10.47 KB)
- X_day35.csv (2.25 GB)
- X_raw_day35.csv (2.25 GB)
- X_raw.csv (33.38 GB)
- X.csv (33.38 GB)
- Xie_DSS_annotations.csv (705.47 KB)
Abstract
Gut inflammation involves contributions from immune and non-immune cells, whose interactions are shaped by the spatial organization of the healthy gut and its remodeling during inflammation. The crosstalk between stromal and immune cells is an important axis in this process, but our understanding has been challenged by incomplete cell-type definition and biogeography. To address this challenge, we used MERFISH to profile the expression of 940 genes in 1.35 million cells in colon slices collected across the onset and recovery from a mouse colitis model. We identified a large diversity of cell populations; charted their spatial organization; and revealed their polarization or recruitment in inflammation. We found a staged progression of inflammation-associated tissue neighborhoods orchestrated, in part, by multiple inflammation-associated fibroblasts, with unique expression profiles, spatial localization, cell-cell interactions, and healthy fibroblast origins. Similar signatures in ulcerative colitis suggest conserved processes in humans. Broadly, we provide a resource for understanding inflammation-induced remodeling in the gut and other tissues.
README: Spatial gene expression in the mouse colon during experimental colitis measured with MERFISH
Jeffrey R. Moffitt
Paolo Cadinu
Boston Children's Hospital, 2024
File organization
Anndata structure for Day 0 - Day 21 MERFISH data
adata.h5ad
This file was generated with the scanpy pipeline and can be loaded in Python with scanpy or the anndata package.
The adata object contains a measure of the normalized counts observed in each cell for each of the genes measured with MERFISH.
The values stored in the X matrix are normalized by the total counts per cell, scaled to a uniform value, and then converted to logarithmic space by adding a pseudocount and applying a log10 transform.
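As a rough illustration of the transform described above, the sketch below normalizes one cell's counts by its total, rescales them to a fixed total, and applies log10 after adding a pseudocount. The README does not state the scaling constant or the pseudocount, so `target_sum` and `pseudocount` are placeholder assumptions, not the values used to produce `adata.h5ad`.

```python
import math

def normalize_log10(counts, target_sum=1000.0, pseudocount=1.0):
    """Total-count normalize one cell, then apply log10(x + pseudocount).

    target_sum and pseudocount are illustrative assumptions; the README
    does not report the values used for this dataset.
    """
    total = sum(counts)
    if total == 0:
        # A cell with no counts maps to log10(pseudocount) for every gene.
        return [math.log10(pseudocount)] * len(counts)
    scale = target_sum / total
    return [math.log10(c * scale + pseudocount) for c in counts]

def invert_log10(values, pseudocount=1.0):
    """Undo the log transform to recover the scaled (not raw) counts."""
    return [10 ** v - pseudocount for v in values]

cell = [0, 3, 7]                 # made-up counts for three genes
logged = normalize_log10(cell)   # per-gene values on the log10 scale
scaled = invert_log10(logged)    # back to counts scaled to target_sum
```

Inverting the log only recovers the scaled values; the original per-cell totals are not recoverable from X alone.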
The adata object contains an obs field that holds a data frame describing the properties of each cell.
The fields of this data frame are defined as follows:
- x: This field describes the x position of the centroid of the cell in microns
- y: This field describes the y position of the centroid of the cell in microns
- Mouse_ID: This field contains a unique name for the mouse from which the cell was measured
- Technical_repeat_number: This field contains a unique name for each technical repeat (defined as a measurement of different slices but from the same mouse)
- Sample_type: This field describes the disease stage at which the mouse was harvested: Healthy (Day 0), DSS3 (Day 3), DSS9 (Day 9), or DSS21 (Day 21)
- Slice_ID: This field contains the unique name for the slice from which the cell was measured
- FOV: This field contains the ID associated with the original field-of-view in which the cell was imaged
- cell_IDs: This field contains the unique ID associated with each cell
- sample: This field contains a unique numeric ID for each sample
- N_genes: This field contains the number of genes for which there was at least one count
- Tier1: This field contains the name of the class to which the cell was assigned
- Tier2: This field contains the name of the tier 2 cluster to which the cell was assigned
- Tier3: This field contains the name of the final cluster to which the cell was assigned
- Leiden_neigh: This field contains the name of the neighborhood to which the cell was assigned. 'others' indicates the cell was assigned to a neighborhood not explored
- Neigh_umap_x(y): This field contains the x or y coordinate of the cell in the neighborhood UMAP
- Tier1_umap_x(y): This field contains the x or y coordinate of the cell in the UMAP constructed during Tier 1 clustering.
- Tier2_umap_x(y): This field contains the x or y coordinate of the cell in the UMAP constructed during Tier 2 clustering. There is a different UMAP for each cell class
- Tier3_umap_x(y): This field contains the x or y coordinate of the cell in the UMAP constructed during Tier 3 clustering. There is a different UMAP for each cell type for which Tier 3 clustering was applied
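A minimal sketch of how the file might be opened and the obs fields above inspected. Because `adata.h5ad` is 17.96 GB, the example uses the backed mode of `anndata.read_h5ad`, which keeps the X matrix on disk; the helper names `open_adata` and `cells_per_value` are hypothetical, and the file path is assumed to be the working directory.

```python
from collections import Counter

def open_adata(path="adata.h5ad"):
    """Open the AnnData file without reading the full X matrix into memory."""
    import anndata as ad  # third-party package, a dependency of scanpy
    return ad.read_h5ad(path, backed="r")

def cells_per_value(obs, column):
    """Count cells for each value of one obs column (e.g. Sample_type)."""
    return Counter(obs[column])

# Example usage, assuming adata.h5ad is in the working directory:
# adata = open_adata()
# print(adata.shape)                               # (cells, genes)
# print(cells_per_value(adata.obs, "Sample_type"))
# print(cells_per_value(adata.obs, "Tier1"))
```

The same helper works on any mapping from column names to sequences, so it can also be applied to a table read from `cell_properties.csv`.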
Anndata items for Day 0 - Day 21 MERFISH data
To facilitate access to the core elements of the anndata object without the need to load this format, we also provide the following items from the anndata object in CSV format.
- X.csv - This file contains a matrix of expression values, normalized and transformed with a pseudocount and a log10 transformation. The rows represent different cells and the columns different genes. These are the same numeric values found in the X matrix described above.
- X_raw.csv - This file contains a matrix of the counts for each gene in each cell. These data are normalized.
- gene_names.csv - This file contains the gene names associated with the columns in X in the same order as the columns in X
- cell_properties.csv - This file contains the same fields and information described in the pandas data frame above (the obs object). The rows of this file represent different cells and are in the same order as the rows in X.
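Since the rows of `X.csv` and `cell_properties.csv` are in the same order, and `X.csv` is 33.38 GB, one way to use them together is to stream both files in lockstep rather than load them whole. The sketch below averages one gene within each value of an obs field. The file layout is an assumption that should be checked against the actual files: it presumes `X.csv` has no header row or index column, `cell_properties.csv` has a header row, and `gene_names.csv` lists one gene per row. The function name is hypothetical.

```python
import csv
from collections import defaultdict

def mean_expression_by_group(x_rows, property_rows, gene_index, group_field="Tier1"):
    """Stream X and cell_properties row by row; average one gene per group.

    x_rows: iterable of rows from X.csv (one row per cell).
    property_rows: iterable of dicts from cell_properties.csv, same order.
    gene_index: column of the gene in X, looked up from gene_names.csv.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x_row, props in zip(x_rows, property_rows):
        group = props[group_field]
        sums[group] += float(x_row[gene_index])
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}

# Example usage under the layout assumptions stated above:
# with open("gene_names.csv", newline="") as f:
#     genes = [row[0] for row in csv.reader(f) if row]
# with open("X.csv", newline="") as fx, open("cell_properties.csv", newline="") as fp:
#     means = mean_expression_by_group(csv.reader(fx), csv.DictReader(fp), gene_index=0)
#     print(genes[0], means)
```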
Anndata structure for Day 35 MERFISH data
adata_day35.h5ad
This file was generated with the scanpy pipeline and can be loaded in Python with scanpy or the anndata package.
The adata object contains a measure of the normalized counts observed in each cell for each of the genes measured with MERFISH.
The values stored in the X matrix are normalized by the total counts per cell, scaled to a uniform value, and then converted to logarithmic space by adding a pseudocount and applying a log10 transform.
The adata object contains an obs field that holds a data frame describing the properties of each cell.
The fields of this data frame are defined as follows:
- x: This field describes the x position of the centroid of the cell in microns
- y: This field describes the y position of the centroid of the cell in microns
- Mouse_ID: This field contains a unique name for the mouse from which the cell was measured
- Technical_repeat_number: This field contains a unique name for each technical repeat (defined as a measurement of different slices but from the same mouse)
- Sample_type: This field describes the disease stage at which the mouse was harvested.
- Slice_ID: This field contains the unique name for the slice from which the cell was measured
- FOV: This field contains the ID associated with the original field-of-view in which the cell was imaged
- cell_IDs: This field contains the unique ID associated with each cell
- Tier1: This field contains the name of the class to which the cell was assigned
- Tier3: This field contains the name of the final cluster to which the cell was assigned
- Leiden_neigh: This field contains the name of the neighborhood to which the cell was assigned. 'others' indicates the cell was assigned to a neighborhood not explored
Anndata items for Day 35 MERFISH data
To facilitate access to the core elements of the anndata object without the need to load this format, we also provide the following items from the anndata object in CSV format.
- X_day35.csv - This file contains a matrix of expression values, normalized and transformed with a pseudocount and a log10 transformation. The rows represent different cells and the columns different genes. These are the same numeric values found in the X matrix described above.
- X_raw_day35.csv - This file contains a matrix of the counts for each gene in each cell. These data are normalized.
- gene_names_day35.csv - This file contains the gene names associated with the columns in X in the same order as the columns in X
- cell_properties_day35.csv - This file contains the same fields and information described in the pandas data frame above (the obs object). The rows of this file represent different cells and are in the same order as the rows in X.
Individual RNA metadata files for all MERFISH data
In addition, there is a CSV file associated with each slice measured with MERFISH that describes the properties associated with each RNA measured within that slice. Each CSV file is named by the specific name of the mouse and slice. An example of one of these files is 100221_D9_m5_1_slice_3.csv.
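The per-slice file names can be split into their parts with a regular expression, as sketched below. The README states only that each file is named for the mouse and slice; reading the leading six digits as a date, `D<n>` as the day of the time course, the number after the mouse name as a repeat index, and `Swiss` as a separate sample type is an assumption inferred from the file list. The function name is hypothetical.

```python
import re

# Pattern inferred from names such as 100221_D9_m5_1_slice_3.csv
# and 061923_D9_m2_Swiss.csv; the field meanings are assumptions.
SLICE_FILE = re.compile(
    r"^(?P<date>\d{6})_D(?P<day>\d+)_(?P<mouse>m[0-9a-z]+)"
    r"(?:_(?P<repeat>\d+))?_(?:slice_(?P<slice>\d+)|(?P<swiss>Swiss))\.csv$"
)

def parse_slice_filename(name):
    """Return the parts of a per-slice file name, or None if it does not match."""
    m = SLICE_FILE.match(name)
    if m is None:
        return None
    return {
        "date": m["date"],
        "day": int(m["day"]),
        "mouse": m["mouse"],
        "repeat": int(m["repeat"]) if m["repeat"] else None,
        "slice": int(m["slice"]) if m["slice"] else None,
        "swiss": m["swiss"] is not None,
    }

example = parse_slice_filename("100221_D9_m5_1_slice_3.csv")
```

Files that are not per-slice tables, such as `adata.h5ad` or `cell_properties.csv`, return `None`, which makes the function usable as a filter over a directory listing.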
These CSV files have the following fields for each RNA:
- gene: This field contains the common name of the gene associated with the identified RNA
- fov_id: This field contains the number of the specific field-of-view (fov) in the microscopy tile collected for that slice in which the RNA was found
- x: This field describes the x position of the centroid of the RNA in microns
- y: This field describes the y position of the centroid of the RNA in microns
- z: This field describes the z position (the depth within the tissue slice) of the centroid of the RNA in microns
- total_magnitude: This field describes the sum of the normalized brightness associated with each pixel assigned to the given RNA
- area: This field describes the number of pixels assigned to the given RNA
- error_bit: This field contains the number of the bit at which error correction was applied. 0 indicates that no error correction was applied.
- error_dir: This field describes the direction of the error (1 = 1->0 error; 0 = 0->1 error)
- cell (optional): The numeric ID of the cell to which the RNA was assigned, within the dataset from which the slice was taken. 0 indicates that the RNA was not assigned to a cell
- is_noise (optional): Whether Baysor determined that the RNA should (FALSE) or should not (TRUE) be assigned to a cell
- ncv_color (optional): The color value assigned to the RNA by Baysor based on the local composition of RNAs around it
- cell_name (optional): A unique identifier associated with each cell in the dataset. These names are found within the AnnData (.h5ad) files
- slice_id (optional): A unique name associated with the specific slice
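The per-slice tables are several hundred megabytes each, so a streaming pass can be a convenient first look. The sketch below tallies molecules per gene and a few quality-control counts using the fields listed above. The column names follow this README; the sample rows and gene names are made up for illustration, and the handling of a missing or blank `cell` value is an assumption.

```python
import csv
import io
from collections import Counter

def summarize_rna_table(lines):
    """Stream a per-RNA CSV; tally molecules per gene plus basic QC counts.

    lines: any iterable of text lines, e.g. an open file handle.
    """
    per_gene = Counter()
    stats = Counter()
    for row in csv.DictReader(lines):
        stats["total"] += 1
        per_gene[row["gene"]] += 1
        # error_bit is 0 when no error correction was applied.
        if int(float(row["error_bit"])) != 0:
            stats["error_corrected"] += 1
        # cell is optional; a missing or blank value is treated as 0 (unassigned).
        cell = row.get("cell") or "0"
        if int(float(cell)) != 0:
            stats["assigned"] += 1
    return per_gene, stats

# Made-up rows in the documented column layout (gene names are placeholders).
sample = io.StringIO(
    "gene,fov_id,x,y,z,total_magnitude,area,error_bit,error_dir,cell\n"
    "GeneA,1,0.5,1.0,0.0,10.0,4,0,0,12\n"
    "GeneB,1,2.0,3.0,1.5,8.0,3,5,1,0\n"
    "GeneA,2,4.0,1.0,0.0,9.0,5,0,0,7\n"
)
per_gene, stats = summarize_rna_table(sample)
```

For a real file, pass an open handle instead, for example `open("100221_D9_m5_1_slice_3.csv", newline="")`.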
Bulk RNAseq abundance
bulk_rnaseq_FPKM.csv
This CSV file contains the following fields:
- FPKM: abundance expressed in FPKM
- geneName: gene name
Ligand receptor pairing
ligand_receptor_pair_masterlist.csv
This CSV file contains information on ligand-receptor pairs as determined from KEGG. It contains the following fields:
- Ligand: The name of the ligand
- Receptor: The name of the receptor
- Coreceptor_1: The name of a possible co-receptor
- Coreceptor_2: The name of a second possible co-receptor
- Category: The category of ligand
- Sub-category: The sub-category of ligand
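A small sketch of how this table might be turned into a lookup from each ligand to its receptors and any co-receptors. The column names are those listed above; treating an empty cell as "no co-receptor" is an assumption about how the file encodes missing values, and the names in the sample rows are placeholders rather than real ligand-receptor pairs.

```python
import csv
from collections import defaultdict

def receptors_by_ligand(lines):
    """Map each ligand to a list of (receptor, co-receptors) tuples."""
    table = defaultdict(list)
    for row in csv.DictReader(lines):
        # Keep only the co-receptor columns that are present and non-blank.
        coreceptors = tuple(
            row[key].strip()
            for key in ("Coreceptor_1", "Coreceptor_2")
            if row.get(key) and row[key].strip()
        )
        table[row["Ligand"].strip()].append((row["Receptor"].strip(), coreceptors))
    return dict(table)

# Placeholder rows in the documented column layout.
sample_rows = [
    "Ligand,Receptor,Coreceptor_1,Coreceptor_2,Category,Sub-category",
    "LigA,RecA,CoA,,CatX,SubX",
    "LigA,RecB,,,CatX,SubY",
]
pairs = receptors_by_ligand(sample_rows)
```

With the real file, the call would be `receptors_by_ligand(open("ligand_receptor_pair_masterlist.csv", newline=""))`.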
Data associated with single-cell RNAseq datasets mined from previously published works:
The raw data for the following papers can be downloaded from the links provided with the associated papers. These public data were downloaded and reanalyzed to produce annotations used in the analysis of our paper. Here we provide CSV files that contain the name associated with each cell and the final cluster label from the analysis of these data.
These CSV files contain two fields.
- cell_name: The unique ID associated with each cell as provided in the original publication
- leiden_annotated: The label assigned to each cell
The raw data for the cells in Ho_annotations.csv was downloaded from PMID 33862275
The raw data for the cells in Brugger_annotations.csv was downloaded from PMID 33306673
The raw data for the cells in Kinchen_M_DSS_annotations.csv and Kinchen_M_healthy_annotations.csv was downloaded from PMID 30270042
The raw data for the cells in Jasso_annotations.csv was downloaded from PMID 35085231
The raw data for the cells in Xie_DSS_annotations.csv was downloaded from PMID 35569814
Methods
This dataset was collected using multiplexed error-robust fluorescence in situ hybridization (MERFISH) as described in the methods of the supporting paper. The methods section of this paper also describes the approaches used to analyze the data, which include the use of publicly available pipelines for the analysis of raw MERFISH images (github.com/ZhuangLab/MERFISH_analysis), RNA assignment to cells (github.com/kharchenkolab/Baysor), and single-cell analysis (github.com/scverse/scanpy). The organization of the provided files is described in the included README.md.
Usage notes
We have provided the output of our single-cell analysis in the AnnData format used by scanpy, stored as HDF5-based .h5ad files. Consult the scanpy documentation for instructions on loading these files. To facilitate additional access to these data, we have also provided the key elements included in the AnnData format in CSV format. All other information is provided in CSV format. We have also provided AnnData format for our reanalysis of published scRNA-seq data relevant to comparisons in our manuscript.