Data from: Protocol optimization improves the performance of multiplexed RNA imaging
Data files (version of Sep 26, 2025; 195.08 GB total):
- data.zip (9.41 GB)
- README.md (11.93 KB)
- readout_screening.zip (185.67 GB)
Abstract
Spatial transcriptomics has emerged as a powerful tool to define the cellular structure of diverse tissues. One such method is multiplexed error-robust fluorescence in situ hybridization (MERFISH). MERFISH identifies RNAs with error-tolerant optical barcodes generated through sequential rounds of single-molecule fluorescence in situ hybridization (smFISH). MERFISH performance depends on a variety of protocol choices, yet the effect of these choices on performance has not been systematically examined. Here we explore a variety of properties to identify optimal choices for probe design, hybridization, buffer storage, and buffer composition. In each case, we introduce protocol modifications that can improve performance, and we show that, collectively, these modified protocols can improve MERFISH quality in both cell culture and tissue samples. As RNA FISH-based methods are used in many different contexts, we anticipate that the optimization experiments we present here may provide empirical design guidance for a broad range of methods. The data provided in this repository are those used in the optimization and validation of these protocols.
File Organization
The repository contains two downloadable zip archives:
- readout_screening.zip contains the Bit_Screening folder.
- data.zip contains all other folders.
U2OS_MERFISH
This folder contains data associated with MERFISH measurements of 130 RNAs in U-2 OS cell culture with old and optimized protocols.
It contains the following files:
barcode_metadata_Rep5_Ctrl.csv -- The identified RNAs from one replicate measurement of U-2 OS with the old protocols
barcode_metadata_Rep5_Opt.csv -- The identified RNAs from one replicate measurement of U-2 OS with the optimized protocols
barcode_metadata_Rep6_Ctrl.csv -- The identified RNAs from one replicate measurement of U-2 OS with the old protocols
barcode_metadata_Rep6_Opt.csv -- The identified RNAs from one replicate measurement of U-2 OS with the optimized protocols
replicate_watershed_counts.csv -- The number of nuclei counted in each of the above experiments
FPKMData.matb (.csv) -- The abundance of each RNA as determined by bulk RNA sequencing, provided in both a closed format (matb) and an open format (csv)
barcode_metadata files have the following fields:
- barcode_id: index of the associated barcode
- gene_name: name of the identified gene
- slice_id: an index associated with different ROIs in the sample. Always 1 for these samples.
- fov_id: an index for the field of view in which the RNA was imaged
- total_magnitude: the sum of all normalized brightness values for pixels located within a region identified as a molecule
- brightness: the average brightness across all pixels assigned to an RNA
- area: the number of pixels associated with the molecule
- abs_position_1: the x position of the molecule centroid on the stage (µm)
- abs_position_2: the y position of the molecule centroid on the stage (µm)
- abs_position_3: the z position of the molecule centroid as determined by the nanopositioner elevation
- error_bit: the bit number if an error has been identified in the barcode. If no bit number is applicable, this value is 0
- error_dir: the direction of the corrected bit error: 0 for a 0->1 error, 1 for a 1->0 error
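As an illustration of how the fields above fit together, the following minimal sketch tallies decoded molecules per gene in one barcode_metadata file and reports the fraction of molecules whose barcode was error-corrected (error_bit not equal to 0). It uses only the Python standard library; `summarize_barcodes` is a hypothetical helper, not part of the provided Scripts, and it assumes each file has a header row with the column names listed above.

```python
import csv
from collections import Counter


def summarize_barcodes(csv_file):
    """Count decoded molecules per gene and the fraction that were error-corrected.

    csv_file: an open text file (or any iterable of CSV lines) with the
    barcode_metadata columns described above (hypothetical helper).
    """
    counts = Counter()
    corrected = 0
    total = 0
    for row in csv.DictReader(csv_file):
        counts[row["gene_name"]] += 1
        total += 1
        # error_bit is 0 when no bit was corrected; otherwise it names the
        # corrected bit (error_dir then records 0->1 or 1->0).
        if int(float(row["error_bit"])) != 0:
            corrected += 1
    fraction_corrected = corrected / total if total else 0.0
    return counts, fraction_corrected
```

For example, `summarize_barcodes(open("barcode_metadata_Rep5_Opt.csv", newline=""))` could be compared with the same call on `barcode_metadata_Rep5_Ctrl.csv`.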
replicate_watershed_counts has the following fields:
- sample: the name of the sample/replicate
- nuclei_counts: the number of nuclei identified in each sample
FPKMData has the following fields:
- geneName: the name of the gene
- id: the id of the RNA isoform targeted
- FPKM: the abundance determined by bulk RNA sequencing, in fragments per kilobase of transcript per million mapped reads
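One common check of MERFISH accuracy is to compare per-gene counts against bulk RNA-seq abundance. The sketch below, assuming FPKMData.csv has a header row with the geneName, id, and FPKM columns and one row per gene, computes the Pearson correlation between log10 MERFISH counts and log10 FPKM. The helper names are hypothetical, and this is not necessarily the exact comparison performed in the manuscript.

```python
import csv
import math


def load_fpkm(csv_file):
    """Map geneName -> FPKM from FPKMData.csv (if a gene repeats, the last row wins)."""
    return {row["geneName"]: float(row["FPKM"]) for row in csv.DictReader(csv_file)}


def log_pearson(counts, fpkm):
    """Pearson correlation between log10 per-gene counts and log10 FPKM.

    Only genes present in both inputs with positive values are used,
    because log10 is undefined at zero.
    """
    genes = [g for g in counts if g in fpkm and counts[g] > 0 and fpkm[g] > 0]
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    x = [math.log10(counts[g]) for g in genes]
    y = [math.log10(fpkm[g]) for g in genes]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Because dividing every gene's count by the same nuclei count only shifts all log10 values by a constant, normalizing one sample by its entry in replicate_watershed_counts does not change this correlation.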
Colon_MERFISH
This folder contains data associated with MERFISH measurements of ~1000 RNAs in Swiss rolls of the mouse colon with the old and optimized protocols.
It contains the following files:
barcode_metadata_old.csv -- The identified RNAs from a Swiss roll measurement with the old protocols
barcode_metadata_optimized.csv -- The identified RNAs from a Swiss roll measurement with the optimized protocols
barcode_metadata files have the same fields as described above
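For a quick gene-by-gene comparison of the two Swiss-roll measurements, the sketch below computes log2 fold changes of optimized over old counts. The pseudocount of 1 is an arbitrary choice that keeps genes detected with only one protocol finite, and raw counts are only directly comparable if the two sections cover similar amounts of tissue; neither assumption comes from the source, and the helper names are hypothetical.

```python
import csv
import math
from collections import Counter


def gene_counts(csv_file):
    """Count decoded molecules per gene in one barcode_metadata file."""
    return Counter(row["gene_name"] for row in csv.DictReader(csv_file))


def log2_fold_changes(old, optimized, pseudocount=1):
    """Per-gene log2((optimized + pseudocount) / (old + pseudocount)).

    The pseudocount is an arbitrary choice that keeps genes seen with
    only one protocol finite.
    """
    genes = set(old) | set(optimized)
    return {
        g: math.log2((optimized.get(g, 0) + pseudocount) / (old.get(g, 0) + pseudocount))
        for g in genes
    }
```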
U2OS_smFISH
This folder contains data for all of the smFISH molecules identified in the optimization experiments presented in this work.
Each of the following files contains the molecules identified in one set of experiments:
- ‘MoleculesID_SingleFrame_AdenineSpacers’ examines the impact of hybridizing probes with differing lengths of spacer nucleotides between the encoding and readout regions. The sample names are given by the length of adenine spacers placed between the encoding and readout 1 regions, and between the readout 1 and readout 2 regions.
- ‘MoleculesID_SingleFrame_Annealing_ComplexOpool’ examines the impact of annealing on hybridization using a complex library targeting 130 RNA species. The sample names give the temperature at which the sample was annealed, or No_Annealing if the sample was not annealed.
- ‘MoleculesID_SingleFrame_Annealing_GeneOpool’ examines the impact of annealing on hybridization using small oligopools that each target a single gene. The sample names give the temperature at which the sample was annealed, or No_Annealing if the sample was not annealed.
- ‘MoleculesID_SingleFrame_HybDur_ProbeConc’ examines how hybridization duration and probe concentration affect hybridization, using a complex library targeting 130 RNA species. The sample names give the dilution of the library from a 4 µM stock and the duration of hybridization. For instance, '1-100_1-Day' was diluted 1:100 (i.e., 40 nM) and hybridized for 1 day, while '1-1_7-Day' was hybridized at the stock concentration (4 µM) for 7 days.
- ‘MoleculesID_SingleFrame_ProbeLength_FormamideConc’ examines how the length of the probe encoding regions and the formamide concentration in the hybridization buffer affect hybridization. The sample names give the formamide concentration used in hybridization and the encoding-region length. For instance, '10%_20-nt' used 20-nt encoding regions hybridized at 10% formamide.
- ‘MoleculesID_SingleFrame_ReadoutHybBufAging’ examines the impact of using readout hybridization buffer, together with its readout probes, that had been aged for 1, 2, 4, or 7 days under various storage conditions. The sample names give the duration for which the readout hybridization buffer was aged and the conditions under which it was stored. For instance, '4-Day Mineral Oil' was stored for 4 days under a layer of mineral oil; '0-Day' was prepared fresh.
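The sample-name conventions above are regular enough to parse programmatically. The sketch below converts names into numeric conditions, assuming every name follows the same pattern as the examples given ('1-100_1-Day', '10%_20-nt', '4-Day Mineral Oil', '0-Day'); names that deviate raise a ValueError rather than being guessed. The concentration follows from the 4 µM stock, so a 1:100 dilution corresponds to 4000 nM / 100 = 40 nM. All function names are hypothetical.

```python
import re

STOCK_NM = 4000.0  # 4 µM library stock, as stated for the HybDur_ProbeConc samples


def parse_hybdur_probeconc(name):
    """Parse names like '1-100_1-Day' into (probe concentration in nM, days)."""
    m = re.fullmatch(r"1-(\d+)_(\d+)-Day", name)
    if m is None:
        raise ValueError(f"unrecognized sample name: {name!r}")
    dilution, days = int(m.group(1)), int(m.group(2))
    return STOCK_NM / dilution, days


def parse_probelength_formamide(name):
    """Parse names like '10%_20-nt' into (formamide percent, encoding length in nt)."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)%_(\d+)-nt", name)
    if m is None:
        raise ValueError(f"unrecognized sample name: {name!r}")
    return float(m.group(1)), int(m.group(2))


def parse_readout_aging(name):
    """Parse names like '4-Day Mineral Oil' or '0-Day' into (days, storage or None)."""
    m = re.fullmatch(r"(\d+)-Day(?: (.+))?", name)
    if m is None:
        raise ValueError(f"unrecognized sample name: {name!r}")
    return int(m.group(1)), m.group(2)
```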
These csv files contain the following columns:
- x: the x position within the frame, given in pixels
- y: the y position within the frame, given in pixels
- value: the fluorescence intensity of this molecule
- sample: the identity of the sample conditions
- frame_num: the number of the image frame in which this molecule was located
- dye: the readout fluorophore, which indicates the laser channel in which the molecule was imaged (AF488: 473 nm, ATTO565: 561 nm, Cy5: 635 nm, AF750: 750 nm)
- FOV_num: an index for the field of view in which this molecule was imaged
- rep_num: the biological replicate in which this molecule was identified
- scope (optional): the identity of the microscope used to image this sample
- gene (optional): if applicable, the name of the gene the oligopool was designed to hybridize to
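To summarize these files by condition, the sketch below groups molecules by sample and dye and reports the median of the value column along with the number of molecules. The median is a choice made here because it is less sensitive than the mean to a few very bright spots; it is not necessarily the statistic used in the manuscript, and `brightness_by_condition` is a hypothetical helper.

```python
import csv
from collections import defaultdict
from statistics import median


def brightness_by_condition(csv_file):
    """Return {(sample, dye): (median value, molecule count)} for one MoleculesID file."""
    values = defaultdict(list)
    for row in csv.DictReader(csv_file):
        values[(row["sample"], row["dye"])].append(float(row["value"]))
    return {key: (median(v), len(v)) for key, v in values.items()}
```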
Each of the following files contains the molecules identified in one set of photobleaching experiments:
- ‘MoleculesID__LongBleaching_AgedImgBuffer’ examines photobleaching in imaging buffer that had been aged for 0 or 7 days. The sample names give the duration for which the imaging buffer was aged.
- ‘MoleculesID__LongBleaching_ImgBufpHOpt’ examines photobleaching in fresh, fully deoxygenated imaging buffer of different compositions and pH, with each dye in each FOV imaged continuously for 250 seconds. This measures the steady-state bleaching loss. The sample names give the buffer composition and pH that were modified.
- ‘MoleculesID_ShortBleaching_ImgBufpHOpt’ examines photobleaching in fresh imaging buffer of different compositions and pH when the buffer is flowed onto the sample and then immediately imaged over 81 FOVs, with 2 frames taken per FOV at 2 fps. This measures the immediate loss of fluorescence intensity while the buffer is still scavenging oxygen. The sample names give the buffer composition and pH that were modified.
- 'MoleculesID_ShortBleaching_ATTO565_AF488' examines dynamic bleaching of the ATTO565 and AF488 fluorophores. The sample names give the imaging buffer used.
- 'MoleculesID_LongBleaching_ATTO565_AF488' examines static bleaching of the ATTO565 and AF488 fluorophores. The sample names give the imaging buffer used.
These csv files contain the following columns:
- x: the x position within the frame, given in pixels
- y: the y position within the frame, given in pixels
- sample: the identity of the sample conditions
- dye: the readout fluorophore, which indicates the laser channel in which the molecule was imaged (Cy5: 635 nm, AF750: 750 nm)
- FOV_num: the sequential index of the FOV in which this molecule was located
- rep_num: the biological replicate in which this molecule was identified
- time#val: the fluorescence intensity of this molecule, in the listed FOV_num, at time point #, where # is given in seconds
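Because the time points are stored as separate columns, a bleaching curve has to be assembled from the column names. The sketch below assumes the columns are named by substituting the time in seconds for # (integer or decimal), sorts them numerically rather than alphabetically so that a 10 s column follows a 2 s column, averages intensities across molecules for one sample and dye, and normalizes to the first time point. It also assumes every time column is filled for every molecule; `mean_bleaching_curve` is a hypothetical helper.

```python
import csv
import re

# Assumed literal form of the 'time#val' columns, e.g. time0val or time0.5val.
TIME_COL = re.compile(r"time(\d+(?:\.\d+)?)val")


def mean_bleaching_curve(csv_file, sample, dye):
    """Mean intensity vs. time for one sample and dye, normalized to the first time point.

    Returns a list of (time_s, fraction_of_initial) sorted by time.
    """
    reader = csv.DictReader(csv_file)
    time_cols = []
    for col in reader.fieldnames or []:
        m = TIME_COL.fullmatch(col)
        if m:
            time_cols.append((float(m.group(1)), col))
    time_cols.sort()  # numeric sort on the parsed time, not on the column string
    sums = [0.0] * len(time_cols)
    n = 0
    for row in reader:
        if row["sample"] != sample or row["dye"] != dye:
            continue
        n += 1
        for i, (_, col) in enumerate(time_cols):
            sums[i] += float(row[col])
    if n == 0 or not time_cols:
        raise ValueError("no matching molecules or time columns")
    means = [s / n for s in sums]
    return [(t, m / means[0]) for (t, _), m in zip(time_cols, means)]
```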
The following files contain nuclei counts (i.e., cell numbers) for all samples in the correspondingly named experiments listed above:
- 'WatershedCounts_Annealing_ComplexOpool'
- 'WatershedCounts_Annealing_GeneOpool'
- 'WatershedCounts_HybDur_ProbeConc'
- 'WatershedCounts_ProbeLength_FormamideConc'
- ‘WatershedCounts_ReadoutHybBufAging’
These csv files contain the following fields:
- rep_number: a unique number for each biological replicate of an experiment
- sample: a short descriptor of the experiment
- nuclei_counts: the number of nuclei identified in each sample
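Molecule counts are usually compared per cell. The sketch below joins a MoleculesID file with its WatershedCounts file on sample name and replicate number; note that the replicate column is called rep_num in the molecule files but rep_number in the nuclei-count files. It assumes the sample labels are spelled identically in both files, an assumption worth checking before use, and the function name is hypothetical.

```python
import csv
from collections import Counter


def _as_int(value):
    """Parse numbers written as '1' or '1.0' into ints."""
    return int(float(value))


def density_per_nucleus(molecule_csv, watershed_csv, dye=None):
    """Molecules per nucleus for each (sample, replicate) pair.

    molecule_csv: rows from a MoleculesID_* file (columns sample, rep_num, dye, ...).
    watershed_csv: rows from the matching WatershedCounts_* file
        (columns rep_number, sample, nuclei_counts).
    dye: if given, only molecules read out with this dye are counted.
    """
    molecules = Counter(
        (row["sample"], _as_int(row["rep_num"]))
        for row in csv.DictReader(molecule_csv)
        if dye is None or row["dye"] == dye
    )
    result = {}
    for row in csv.DictReader(watershed_csv):
        key = (row["sample"], _as_int(row["rep_number"]))
        nuclei = _as_int(row["nuclei_counts"])
        if nuclei > 0:  # skip replicates with no segmented nuclei
            result[key] = molecules.get(key, 0) / nuclei
    return result
```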
Scripts
This folder contains example code associated with the identification of molecules and the quantification of their brightness in molecular counting and brightness measurements.
It also contains code associated with the counting of nuclei from DAPI images.
Note that this code is not intended to be run as is: we do not provide example images, and the code assumes image naming conventions and a directory organization that we do not provide.
However, we provide this code so that the image processing steps used to identify molecules are clear.
These scripts are provided under the same CC0 license. They were used with Python 3.6 and the following package versions:
storm-analysis==2.1
tifffile=2019.7.26.2=py36_0
pillow=7.2.0=py36hcc1f983_0
pandas=1.0.5=py36h47e9c7a_0
numpy=1.17.0=py36h19fb1c0_0
matplotlib=3.2.2=0
Bit_screening
This folder contains raw imaging data for bit screening experiments. Raw data are stored as tif files in the format generated by the storm-control codebase (https://github.com/MoffittLab/storm-control), together with their associated metadata files.
This folder contains the following subfolders:
- HEK
- HeLa
- Human_TrigeminalGanglia
- Mouse_Colon
- Mouse_Ileum
- Mouse_Jejunum
- Mouse_TrigeminalGanglia
- U2-OS
Each subfolder contains one or more rawdata folders and one or more data_organization files.
data_organization files describe the format of the raw data files so that they are compatible with the Moffitt Lab storm-analysis codebase (https://github.com/MoffittLab/storm-analysis).
These files contain the following fields:
- bitName: readout probe name consistent with nomenclature of the corresponding manuscript
- imageType: file naming structure for file base
- imageRegExp: file naming structure combining file base, fov number, hybridization round, and camera id
- bitNumber: readout probe number
- imagingRound: hybridization round
- color: excitation laser used (nm)
- imagingCameraID: optional, if using a multi camera microscope, camera id
- frame: frame number within tif stack
- zPos: z position of the objective, as set by the nanopositioner
- fiducialImageType: file naming structure for the file base of the fiducial channel used to warp successive imaging rounds
- fiducialRegExp: file naming structure combining file base, fov number, hybridization round, and camera id for fiducial channel
- fiducialImagingRound: hybridization round of fiducial images
- fiducialFrame: frame number within tif stack for fiducial channel
- fiducialColor: excitation laser used (nm) for fiducial channel
- fiducialCameraID: optional, if using a multi camera microscope, camera id for fiducial channel
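As an example of reading these files, the sketch below groups readout bits by hybridization round, showing which bits were imaged together and in which color channel. It assumes the data_organization file is comma-separated with a header row using the field names above, and it keeps color and frame as strings because their exact formatting (for example, whether frame holds a single number or a list) is not specified here. `bits_by_round` is a hypothetical helper, not part of storm-analysis.

```python
import csv
from collections import defaultdict


def bits_by_round(csv_file):
    """Group readout bits by hybridization round from a data_organization file.

    Returns {imagingRound: [(bitNumber, bitName, color, frame), ...]}, with rounds
    in ascending order and each round's bits sorted by bitNumber.
    """
    rounds = defaultdict(list)
    for row in csv.DictReader(csv_file):
        rounds[int(float(row["imagingRound"]))].append(
            (int(float(row["bitNumber"])), row["bitName"], row["color"], row["frame"])
        )
    return {r: sorted(bits) for r, bits in sorted(rounds.items())}
```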
