Boosting multiplexing capabilities for error-robust spatial transcriptomic methods using a set exchange approach
Abstract
Over the last decades, image-based transcriptomic and proteomic experiments have moved from single-target probes to multiplexed experiments, allowing researchers to study hundreds or even thousands of mRNA and protein targets simultaneously. This large increase in scope necessitates methods with either increased specificity or error correction, such as the Hamming Codes used in the imaging-based spatial transcriptomic method MERFISH. For some experimental conditions, Hamming Codes efficiently encode the highest possible number of genes for spatial analysis. However, for most experimental parameters, the optimal generation of error-robust codebooks is an unsolved mathematical problem. Here we present a method to generate highly optimized Extended Hamming Codebooks compatible with established error-correctable methodologies such as MERFISH. Our method uses an iterative set-exchange approach and generally reaches over 90% of the theoretical maximum of gene set complexity. We also provide ready-to-use codebooks and discuss the advantages and disadvantages of changing probe density.
https://doi.org/10.5061/dryad.zkh1893m5
This dataset contains a collection of error-robust constant-weight binary codebooks for Hamming Weights (HW) 4, 5, and 6. The codebooks are ready to use with current MERFISH (Multiplexed Error-Robust Fluorescence In Situ Hybridization) methodologies. Each codebook is supplied in both a binary and a set-based version. Each one is accompanied by a metadata file and by a reordered binary version that maximizes the Hamming Distance among the earliest rows. See below for full details of the file organization.
Description of the data and file structure
The data accompanies a manuscript in Science Advances that presents a method for creating optimized error-robust constant-weight binary codebooks for given parameter combinations.
The manuscript is titled: Boosting Multiplexing Capabilities for Error-Robust Spatial Transcriptomic Methods using a Set Exchange Approach
It was published on the 2nd of May 2025 in Science Advances with the manuscript number adr4026.
An error-robust constant-weight binary codebook is a list of binary codes (for example, 0011101000) that all differ sufficiently from one another. The difference is measured by the Hamming Distance between each pair of codes, which is the number of positions at which the two codes differ. Such codebooks are colloquially called “Hamming codes”. They are used in data protection and in spatial transcriptomics, for example in the method MERFISH (Multiplexed Error-Robust Fluorescence In Situ Hybridization).
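As a worked example of this definition, the following Python sketch counts the positions at which two codes differ. Python is used here only for illustration; the supplied code is written in R. The second code, 0011010100, is a hypothetical HW4 code chosen for the example.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same Barcode Length")
    return sum(x != y for x, y in zip(a, b))


# Two 10-bit HW4 codes: they differ at positions 5, 6, 7 and 8
print(hamming_distance("0011101000", "0011010100"))  # 4
```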
In MERFISH, each studied gene transcript is assigned one code from the codebook. When the image-based in situ hybridization detection of a transcript introduces an error, a single error can be corrected thanks to the error-robust nature of the codebook. This increases the overall yield of recovered transcriptomic information.
All provided codebooks satisfy a minimum Hamming Distance of 4 (minHD4). In consumer electronics, such codes are called SECDED (Single-Error-Correcting, Double-Error-Detecting) codes. Every code differs from every other code in at least four positions. A code that acquires a single error is therefore one position away from its original code and at least three positions away from any other code, so it can be corrected back unambiguously. Two errors cannot be corrected, because the erroneous code may then be equally close to two codes, but they can still be detected.
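The correction and detection argument can be made concrete with a nearest-code decoder. This Python sketch is a generic illustration under stated assumptions, not the MERFISH analysis pipeline, which decodes from image intensities. The function name `decode` and the three-code codebook are hypothetical.

```python
from typing import List, Optional, Tuple


def hamming_distance(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def decode(read: str, codebook: List[str]) -> Tuple[Optional[str], str]:
    """Nearest-code decoding for a codebook with minimum Hamming Distance 4.

    Returns (code, status), where status is "exact", "corrected",
    "detected" (two errors: flagged but not correctable) or "rejected".
    """
    best = min(codebook, key=lambda code: hamming_distance(read, code))
    d = hamming_distance(read, best)
    if d == 0:
        return best, "exact"
    if d == 1:
        # One error: the read is >= 3 away from every other code, so `best` is unique
        return best, "corrected"
    if d == 2:
        # Two errors: the read may be equally close to two codes, so do not guess
        return None, "detected"
    return None, "rejected"


# Hypothetical 10-bit HW4 codebook whose pairwise distances are 4, 6 and 6
codebook = ["0011101000", "0011010100", "1100001100"]
print(decode("1011101000", codebook))  # ('0011101000', 'corrected')
print(decode("0011011000", codebook))  # (None, 'detected')
```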
All of these codebooks also have a constant number of 1-bits per code, called the Hamming Weight (HW). For example, the code 0011101000 has a Hamming Weight of 4, denoted HW4.
All codebooks also have a fixed Barcode Length (Bits), which is the number of bits in each code. For example, 0011101000 has a Barcode Length of 10 bits.
We provide minHD4 codebooks for HW4, HW5, and HW6 across a range of Barcode Lengths, with codebook sizes of up to tens of thousands of codes.
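One way to judge how close a codebook of a given size comes to the theoretical limit is to compare it against a standard upper bound for constant-weight codes, the Johnson bound. The Python sketch below computes that bound for minimum Hamming Distance 4. We cannot confirm that this is the exact limit used in the manuscript, and the function name `johnson_bound_hd4` is hypothetical.

```python
def johnson_bound_hd4(n: int, w: int) -> int:
    """Johnson upper bound on the number of codes with Barcode Length n,
    Hamming Weight w and minimum Hamming Distance 4.

    Uses A(n, 4, w) <= floor(n * A(n - 1, 4, w - 1) / w) recursively.
    """
    if not 0 <= w <= n:
        raise ValueError("need 0 <= w <= n")
    # A(n, 4, w) = A(n, 4, n - w): complementing every code preserves distances
    w = min(w, n - w)
    if w <= 1:
        # Two distinct codes of weight 0 or 1 are at most 2 apart, so only one fits
        return 1
    return (n * johnson_bound_hd4(n - 1, w - 1)) // w


print(johnson_bound_hd4(10, 4))  # 30
print(johnson_bound_hd4(16, 4))  # 140
```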
For computational convenience, every codebook is provided in two versions, one binary and one set-based. Both are .csv files with the same name prefix. The binary file lists each code as positional 1s and 0s, while the set-based file lists the positions of the 1-bits. For example, a code written as 0011101000 in the binary version of a BarcodeLength10HW4 codebook appears as 3,4,5,7 in the corresponding set-based codebook, denoting the positions of its four 1-bits.
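The two representations can be converted into each other as in the Python sketch below. The sketch assumes, as the 0011101000 → 3,4,5,7 example implies, that positions are 1-indexed. The function names are hypothetical. For codes of equal Hamming Weight w, the distance equals 2 × (w − number of shared positions). A minHD4 requirement therefore means that two codes may share at most w − 2 positions.

```python
from typing import Iterable, List


def binary_to_set(code: str) -> List[int]:
    """1-indexed positions of the 1-bits, as in the set-based files."""
    return [i for i, bit in enumerate(code, start=1) if bit == "1"]


def set_to_binary(positions: Iterable[int], length: int) -> str:
    """Inverse of binary_to_set for a given Barcode Length."""
    bits = ["0"] * length
    for p in positions:
        bits[p - 1] = "1"
    return "".join(bits)


def set_distance(a: Iterable[int], b: Iterable[int]) -> int:
    """Hamming Distance from set-based codes: positions in exactly one set.

    For equal weight w this equals 2 * (w - len(set(a) & set(b))).
    """
    return len(set(a) ^ set(b))


print(binary_to_set("0011101000"))               # [3, 4, 5, 7]
print(set_to_binary([3, 4, 5, 7], 10))           # 0011101000
print(set_distance([3, 4, 5, 7], [3, 4, 6, 8]))  # 4
```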
Each codebook filename encodes the Barcode Length, Hamming Weight, and final codebook size: [BarcodeLength]BitHW[HammingWeight]HD4_finalsize[LengthOfCodebook][Set/Binary].csv
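The parameters can be recovered programmatically from a filename with a Python sketch like the one below. It assumes that the pattern is used literally, with no extra separators and with "Set" or "Binary" as the suffix. Actual names in the archive, including those of the reordered files, may differ, in which case the regular expression needs adjusting. The example filename is hypothetical, built from the pattern rather than copied from the archive.

```python
import re
from typing import Dict, Union

# Assumption: names follow the documented pattern literally
CODEBOOK_NAME = re.compile(
    r"^(?P<bits>\d+)BitHW(?P<hw>\d+)HD4_finalsize(?P<size>\d+)(?P<kind>Set|Binary)\.csv$"
)


def parse_codebook_filename(name: str) -> Dict[str, Union[int, str]]:
    match = CODEBOOK_NAME.match(name)
    if match is None:
        raise ValueError(f"filename does not follow the documented pattern: {name}")
    return {
        "barcode_length": int(match["bits"]),
        "hamming_weight": int(match["hw"]),
        "codebook_size": int(match["size"]),
        "version": match["kind"],
    }


# Hypothetical filename constructed from the pattern
print(parse_codebook_filename("16BitHW4HD4_finalsize140Binary.csv"))
```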
Each binary codebook is also supplied in a reordered version, stored in a separate subfolder. The reordered codebook has exactly the same contents, but its rows are ordered to maximize the Hamming Distance among the earliest rows.
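This README does not describe how the reordering was performed. One simple heuristic that produces this kind of ordering is a greedy farthest-first traversal. Starting from the first row, it repeatedly appends the remaining row whose smallest distance to the rows already placed is largest. The Python sketch below shows that swapped-in technique. It is not necessarily the authors' procedure, and the function name is hypothetical. It needs on the order of N² distance evaluations for N codes.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def farthest_first_order(codebook: List[str]) -> List[str]:
    """Greedy reordering: each next row maximizes its minimum Hamming
    Distance to all rows already placed. Contents are unchanged."""
    if not codebook:
        return []
    remaining = list(codebook)
    ordered = [remaining.pop(0)]
    # min_dist[i] is the smallest distance from remaining[i] to any placed row
    min_dist = [hamming_distance(code, ordered[0]) for code in remaining]
    while remaining:
        i = max(range(len(remaining)), key=lambda k: min_dist[k])
        chosen = remaining.pop(i)
        min_dist.pop(i)
        ordered.append(chosen)
        min_dist = [min(d, hamming_distance(code, chosen))
                    for d, code in zip(min_dist, remaining)]
    return ordered


codebook = ["0011101000", "0011010100", "1100001100"]
print(farthest_first_order(codebook))
# ['0011101000', '1100001100', '0011010100']
```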
Each codebook is also accompanied by a metadata file (.txt). This file lists the parameters included in the filename together with intermediate output from the R code that generated the codebook, such as timing measurements for each step and the codebook size after each intermediate step. These outputs are fully annotated in the supplied R code.
R code to generate further codebooks is also provided, as well as R code to verify that a given codebook file satisfies its minimum Hamming Distance (minHD). MATLAB code for generating large extended Hamming codes, which was used in the publication for comparison, is also provided.
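For readers working outside R, a check along the same lines can be sketched in Python. This is not the supplied R verification script, and the function name is hypothetical. The sketch assumes that each row of a binary codebook has already been read into a string of 0/1 characters (for example, by joining its columns). It tests the Barcode Length, the constant Hamming Weight, and the minimum Hamming Distance of every pair of rows. Duplicate rows are caught as pairs at distance 0.

```python
from itertools import combinations
from typing import List


def verify_codebook(codes: List[str], min_hd: int = 4) -> List[str]:
    """Check Barcode Length, constant Hamming Weight and minimum Hamming
    Distance. Returns a list of problems; an empty list means it passes."""
    if not codes:
        return ["codebook is empty"]
    length, weight = len(codes[0]), codes[0].count("1")
    problems = []
    for i, code in enumerate(codes):
        if len(code) != length or set(code) - {"0", "1"}:
            problems.append(f"row {i}: not a {length}-bit binary code")
        elif code.count("1") != weight:
            problems.append(f"row {i}: Hamming Weight {code.count('1')}, expected {weight}")
    # Pairwise check: O(N^2) comparisons, slow for tens of thousands of codes
    for (i, a), (j, b) in combinations(enumerate(codes), 2):
        d = sum(x != y for x, y in zip(a, b))
        if d < min_hd:
            problems.append(f"rows {i} and {j}: Hamming Distance {d} < {min_hd}")
    return problems


print(verify_codebook(["0011101000", "0011010100", "1100001100"]))  # []
print(verify_codebook(["0011101000", "0011011000"]))
# ['rows 0 and 1: Hamming Distance 2 < 4']
```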
Files and variables
File: SupplData3_HW4_Codebooks.zip
Description: HW4 error-robust constant weight Codebooks ready-to-use for MERFISH
File: SupplData4_HW5_Codebooks.zip
Description: HW5 error-robust constant weight Codebooks ready-to-use for MERFISH
File: SupplData5_HW6_Codebooks.zip
Description: HW6 error-robust constant weight Codebooks ready-to-use for MERFISH
Code/software used
Running the code requires R with two additional libraries, dplyr and rvest. We used R version 4.3.2 to generate the supplied codebooks. The script is annotated with explanations of each step and describes the contents of the metadata files (the .txt files).
MATLAB is required to run the code that generates the filtered Extended Hamming Codes used in the published article for comparison with the generated codebooks.