Data from: Dual-mode microfluidic immunostaining device for diagnostic biomarkers detection and tumor microenvironment evaluation
Data files
Version of Jan 09, 2026; total size 83.59 MB
- DataAnalysis.zip (57.22 KB)
- ImageRegistration.zip (22.46 KB)
- PCNSL-fig6.xlsx (6.21 MB)
- PCNSL-fig7.zip (14.16 MB)
- README.md (7.33 KB)
- registration.zip (1.15 MB)
- tonsil-fig3.zip (19.26 KB)
- tonsil-fig4.xlsx (61.97 MB)
Abstract
The limited availability of tissue samples from rare tumors poses a major barrier to advances in precise diagnosis, prognostic evaluation, and therapeutic research, challenges exemplified by primary central nervous system diffuse large B‑cell lymphoma (PCNS‑DLBCL). Here, we developed a dual-mode microfluidic immunostaining (Dumi) device, which integrates diagnostic and research workflows into a single, automated platform, reducing tissue section consumption by over 90%. Using just 1–2 slides, it enables both regional detection of biomarkers for diagnostic subtyping (≤ 16 biomarkers) and construction of multiplex tumor microenvironment (TME) maps. Joint analysis of multi-region diagnostic biomarkers in the TME map indicates that tumor cell subpopulations defined by specific diagnostic biomarkers can actively shape their in situ microenvironmental niche. Dumi offers an efficient, cost-effective, and multifunctional immunostaining method, overcoming the limitations of scarce tissue resources and providing a clinically accessible solution to the diagnostic and therapeutic challenges of rare tumors.
Data Components
- tonsil-fig3.zip: Positive intensity values, mean background, and signal-to-background ratio (SBR), each stored as a 23 × 48 matrix, for the paired comparison of conventional (manual) IHC and whole-chamber (Dumi) IHC shown in Fig. 3 of the associated article.
- tonsil-fig4.xlsx: Single-cell data from multiplex immunofluorescence images derived from three tonsil sections. Each row represents a single segmented cell.
- PCNSL-fig6.xlsx: Single-cell data from multiplex immunofluorescence images derived from one PCNSL section. Each row represents a single segmented cell.
- PCNSL-fig7.zip: Single-cell data of multiplex immunofluorescence generated after a sequential multi-channel to whole-chamber IF staining strategy was performed on a single PCNSL slide. Each row represents a single segmented cell.
- registration.zip: Paired raw images after multi-channel and whole-chamber staining derived from three tonsil sections.
The single-cell data provided in tonsil-fig4, PCNSL-fig6, and PCNSL-fig7 contain the following variables:
- Sheets T1-T3: ROIs from each of the three tonsil sections (one sheet per section).
- Classifications: Contains the positive labels for different biomarkers.
- Centroid X px: X-coordinate in the ROI (unit: pixels).
- Centroid Y px: Y-coordinate in the ROI (unit: pixels).
- Cell/Nucleus/Cytoplasm CD20/Ki67/CD68/CD8a/CD4/CD34 mean: Mean fluorescence intensity of each biomarker in each cell, measured in the representative compartment (whole cell, nucleus, or cytoplasm) selected for that biomarker.
- multi-channel: Single-cell data measured within the microchannels for different diagnostic biomarkers in the PCNSL sample.
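As a minimal sketch of how the `Classifications` column might be turned into per-marker counts, the snippet below assumes that positive labels within one entry are joined by a colon (for example `CD20: Ki67`). The separator, the helper names, and the example rows are assumptions made for illustration and should be checked against the actual sheets.

```python
from collections import Counter


def positive_markers(classification, sep=":"):
    """Split one Classifications entry into a set of marker labels.

    Assumes labels are joined by `sep` (hypothetical; verify in the data).
    None, NaN, and empty strings yield an empty set.
    """
    # NaN is the only value that is not equal to itself.
    if classification is None or classification != classification:
        return set()
    parts = str(classification).split(sep)
    return {part.strip() for part in parts if part.strip()}


def count_positive_cells(classifications, sep=":"):
    """Count how many cells carry each positive label."""
    counts = Counter()
    for entry in classifications:
        counts.update(positive_markers(entry, sep))
    return counts


# Hypothetical rows standing in for the Classifications column.
example_rows = ["CD20: Ki67", "CD8a", "", "CD20", None]
counts = count_positive_cells(example_rows)
print(dict(counts))
```

With pandas installed, the same function could be applied to a column read from one of the spreadsheets; the standard-library version is used here so that the sketch runs without extra packages.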
Code Description
This repository contains all Python (Data Analysis) and MATLAB (Image Registration) code used in this study. Each code package (DataAnalysis.zip, ImageRegistration.zip) contains its own detailed README.md and requirements.txt file.
1. Data Analysis (Python)
This section includes Python scripts used to generate the analyses for Figures 4, 6, and 7.
1.1 Figure 4: Tonsil Reproducibility Analysis
This pipeline (scripts 1_ to 6_) assesses the reproducibility of the 6-plex mIF experiment across three serial tonsil sections.
- Workflow:
  - Preprocessing (1_Data_preprocessing_QC.py): Loads 3tonsils.xlsx, merges samples, and applies per-sample Z-score normalization.
  - Clustering (2_phenograph_3sections.py): Runs Phenograph clustering and t-SNE on the combined, normalized data.
  - Data Prep (4_spatial_analysis.py): Creates a master file merging coordinates, cluster IDs (from Step 2), and threshold-based 'Phenotype' labels.
  - Spatial Analysis (5_cluster_based.py, 6_phenotype_based.py): Performs a dual-approach spatial consistency analysis using both 'Cluster' and 'Phenotype' labels. It calculates neighborhood enrichment and quantifies similarity (Cosine, Jaccard) between samples.
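The per-sample Z-score normalization and the between-sample similarity measures named above can be illustrated with a short standard-library sketch. It is not the code in DataAnalysis.zip: the function names are hypothetical, the population standard deviation is an assumed choice, and the inputs are toy values. In this sketch the cosine similarity compares flattened enrichment matrices, and the Jaccard index compares sets of enriched label pairs, which is one plausible reading of the workflow.

```python
import math
from statistics import mean, pstdev


def zscore_per_sample(values, sample_ids):
    """Z-score `values` separately within each sample (e.g. T1, T2, T3)."""
    groups = {}
    for value, sample in zip(values, sample_ids):
        groups.setdefault(sample, []).append(value)
    # Population SD (ddof = 0) is an assumption of this sketch.
    stats = {s: (mean(g), pstdev(g)) for s, g in groups.items()}
    normalized = []
    for value, sample in zip(values, sample_ids):
        mu, sd = stats[sample]
        normalized.append((value - mu) / sd if sd > 0 else 0.0)
    return normalized


def cosine_similarity(a, b):
    """Cosine similarity between two flattened enrichment matrices."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(set_a, set_b):
    """Jaccard index between two sets of enriched label pairs."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Toy intensities from two sections with very different raw scales.
intensities = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
sections = ["T1", "T1", "T1", "T2", "T2", "T2"]
normalized = zscore_per_sample(intensities, sections)

pairs_t1 = {("CD20", "CD4"), ("CD8a", "CD68")}
pairs_t2 = {("CD20", "CD4")}
print(normalized)
print(cosine_similarity([1, 0, 2], [2, 0, 4]))
print(jaccard_similarity(pairs_t1, pairs_t2))
```

After normalization, the two toy sections have identical values despite a tenfold difference in raw intensity, which is the reason for normalizing per sample before pooling cells for clustering.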
1.2 Figure 6: PCNSL Clustering & Spatial Statistics
This pipeline analyzes clustering and spatial statistics for the PCNSL sample.
- Workflow:
  - Clustering (1_PhenographPCNSL.py): Loads and normalizes PCNSL data, then runs Phenograph clustering (k = 30) and Barnes-Hut t-SNE.
  - Visualization (2_cell_populations.py, 3_marker_intensity.py): Generates t-SNE plots colored by annotated cell type and by individual marker intensity.
  - Spatial Statistics (4_Ripley's_K...py, 5_Monte_Carlo.py): Calculates various spatial statistics, including Cross Ripley's K (with Monte Carlo simulation) and Moran's I, to analyze cell-cell spatial relationships (e.g., CD8+ vs. CD20+/CD34+).
  - Annotation (6_clustermap_annotation.py): Creates an annotated heatmap of mean marker expression per cluster, visualized alongside cell counts.
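To make the Cross Ripley's K step more concrete, the following is a minimal sketch under stated assumptions: a rectangular observation window, no edge correction, and a Monte Carlo null in which the second cell type is scattered uniformly over the window. The function names, the window size, and the toy coordinates are hypothetical, and the repository scripts may use a different estimator or null model.

```python
import math
import random


def cross_k(points_i, points_j, r, area):
    """Cross Ripley's K at radius r, without edge correction."""
    pairs = sum(
        1 for a in points_i for b in points_j if math.dist(a, b) <= r
    )
    return area * pairs / (len(points_i) * len(points_j))


def monte_carlo_cross_k(points_i, points_j, r, width, height,
                        n_sim=199, seed=0):
    """Compare the observed cross-K with simulations that place the
    type-j cells uniformly at random inside the window."""
    rng = random.Random(seed)
    area = width * height
    observed = cross_k(points_i, points_j, r, area)
    simulated = []
    for _ in range(n_sim):
        random_j = [
            (rng.uniform(0, width), rng.uniform(0, height))
            for _ in points_j
        ]
        simulated.append(cross_k(points_i, random_j, r, area))
    # One-sided Monte Carlo p-value for clustering of j around i.
    exceed = sum(s >= observed for s in simulated)
    p_value = (1 + exceed) / (n_sim + 1)
    return observed, simulated, p_value


# Toy example: each CD8+ cell has two CD20+ cells within 5 px.
cd8_cells = [(10, 10), (50, 50), (90, 90)]
cd20_cells = [(11, 10), (10, 12), (51, 50), (50, 52), (91, 90), (90, 92)]
observed, simulated, p_value = monte_carlo_cross_k(
    cd8_cells, cd20_cells, r=5, width=100, height=100
)
print(observed, p_value)
```

Under complete spatial randomness the expected value of K at radius r is approximately πr², so an observed value far above that, together with a small Monte Carlo p-value, points to co-localization of the two cell types at that scale.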
1.3 Figure 7: PCNSL 16-Plex Diagnostic Strip Analysis
This is a complex pipeline that aligns data from 16 diagnostic "strips" onto a single "whole image" coordinate scaffold for integrated spatial analysis.
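The idea of transferring strip annotations onto the whole-image scaffold can be sketched with a brute-force nearest-neighbor search. This is an illustration only: the function name, the `max_distance` cutoff, and the toy coordinates and labels are assumptions, and the direction of the lookup (scaffold cell to nearest strip cell) is one plausible choice.

```python
import math


def transfer_annotations(scaffold_xy, strip_xy, strip_labels,
                         max_distance=5.0):
    """For each scaffold cell, copy the label of the nearest strip cell
    if it lies within `max_distance` pixels; otherwise assign None."""
    if not strip_xy:
        return [None] * len(scaffold_xy)
    transferred = []
    for cell in scaffold_xy:
        # Index of the strip cell closest to this scaffold cell.
        nearest = min(
            range(len(strip_xy)),
            key=lambda k: math.dist(cell, strip_xy[k]),
        )
        if math.dist(cell, strip_xy[nearest]) <= max_distance:
            transferred.append(strip_labels[nearest])
        else:
            transferred.append(None)
    return transferred


# Toy centroids (pixels) and hypothetical labels for one strip.
scaffold = [(10.0, 10.0), (20.0, 20.0), (200.0, 200.0)]
strip = [(11.0, 10.0), (19.5, 20.5)]
labels = ["Bcl-6+", "Bcl-6-"]
matched = transfer_annotations(scaffold, strip, labels)
print(matched)
```

The brute-force search is quadratic in the number of cells. For full-size data, a k-d tree such as `scipy.spatial.cKDTree` (the structure named in the workflow below) performs the same lookup far faster, for example through its `query` method with a `distance_upper_bound`.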
- Workflow:
  - Preprocessing (1_data_preprocessing.py): The core script. It loads Whole image.xlsx as a coordinate scaffold, then loads all 16 strip{Marker}.xlsx files and uses a k-d tree (cKDTree) to spatially match cells and transfer annotations (cell classifications) to the master scaffold.
  - Global Analysis (2_global_analysis.py, 3_global_anlaysis_advanced.py): Performs global spatial analyses (KDE density maps, correlograms, NND, PCF) on the aligned master dataset.
  - Core Statistics (4_spatial_analysis_full.py): The main statistical script. It iterates through each of the 16 diagnostic markers, performing parallelized permutation tests to quantify spatial neighborhood enrichment or avoidance relative to key environmental markers (e.g., CD8a+, CD68+).
  - Visualization (5_visualization.py, 6_doulble_positive...py): Generates the final summary figures from the statistical output, including the primary bubble heatmap, volcano plots, and dumbbell plots comparing double-positive (e.g., CD20+/Bcl-6+) vs. single-positive cells.
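A permutation test for neighborhood enrichment, as mentioned in the Core Statistics step, might look like the serial sketch below. The statistic (mean number of target cells within a fixed radius of each focal cell), the null model (target labels reassigned at random among all cells), and every name and parameter are assumptions for illustration; the repository script is parallelized and may define enrichment differently.

```python
import math
import random
from statistics import mean, pstdev


def neighbor_count(coords, focal, target, radius):
    """Mean number of target cells within `radius` of each focal cell."""
    if not focal:
        return 0.0
    total = 0
    for i in focal:
        total += sum(
            1 for j in target
            if j != i and math.dist(coords[i], coords[j]) <= radius
        )
    return total / len(focal)


def enrichment_permutation_test(coords, focal, target, radius,
                                n_perm=200, seed=0):
    """Compare the observed neighbor count with a null distribution
    obtained by reassigning the target label to random cells."""
    rng = random.Random(seed)
    observed = neighbor_count(coords, focal, target, radius)
    all_cells = list(range(len(coords)))
    null = []
    for _ in range(n_perm):
        shuffled = rng.sample(all_cells, len(target))
        null.append(neighbor_count(coords, focal, shuffled, radius))
    sd = pstdev(null)
    z_score = (observed - mean(null)) / sd if sd > 0 else 0.0
    # One-sided p-value for enrichment (use <= for avoidance).
    p_enriched = (1 + sum(n >= observed for n in null)) / (n_perm + 1)
    return observed, z_score, p_enriched


# Toy layout: 3 focal cells, 3 nearby target cells, 60 distant cells.
coords = [(0, 0), (1, 0), (0, 1), (0.5, 0.5), (1, 1), (0.5, 0)]
coords += [(100 + 10 * k, 100) for k in range(60)]
focal_cells = [0, 1, 2]    # e.g. cells positive for a diagnostic marker
target_cells = [3, 4, 5]   # e.g. CD8a+ cells
observed, z_score, p_enriched = enrichment_permutation_test(
    coords, focal_cells, target_cells, radius=2.0
)
print(observed, z_score, p_enriched)
```

A positive z-score with a small one-sided p-value indicates enrichment of target cells around the focal population; counting permutations at or below the observed value instead gives the corresponding test for avoidance.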
1.4 Python Dependencies
All Python scripts require Python 3.x. The specific libraries for each analysis (e.g., pandas, numpy, seaborn, scikit-learn, phenograph, libpysal) are listed in the requirements.txt file within each respective folder.
2. Image Registration (MATLAB)
This package contains MATLAB scripts for feature-based geometric transformation and image registration.
- System Requirements:
  - MATLAB R2020b or newer (tested on R2020b).
  - Image Processing Toolbox.
- Workflow:
  - Entry Point (main.m): This is the main script to run the registration pipeline.
  - Parameter Setup (SetParameter.m): A dialog box prompts the user to define parameters, such as block size, strip numbers (StripNum), and Gaussian sigma.
  - Preprocessing (function/OnlySaveStripInfo.m, function/MakeStripAndOverallInSameSize.m): These functions isolate the relevant "strip" regions from the full image, handle rough alignment using phase correlation, and ensure the fixed and moving images are the same size.
  - Registration (function/MakeRegister.m, function/MakeRegister1.m): Implements the core block-based registration. It matches features (e.g., KAZE, SIFT, BRISK, corner) within local blocks. MakeRegister1.m includes an alternative path using Mutual Information (MI) to guide the matching strategy.
  - Finalization: The script maps all successful local feature matches to the global coordinate space and estimates a final geometric transformation (T_global), which is then applied to the moving image to create the registered TIFF file.
- Detailed instructions and descriptions of all helper functions are available in the README.md file within the RegistrationCode directory.
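The finalization step, estimating one global transformation from many local matches, can be illustrated with a small least-squares fit. The sketch is written in Python to stay consistent with the other examples here and is not a translation of the MATLAB code. It assumes a similarity transform (rotation, uniform scale, translation); the type of transformation behind T_global is not stated above and may differ. All names are hypothetical.

```python
import cmath
import math


def estimate_similarity_transform(moving_pts, fixed_pts):
    """Least-squares similarity transform with fixed ≈ a * moving + b,
    treating each 2-D point (x, y) as the complex number x + iy."""
    p = [complex(x, y) for x, y in moving_pts]
    q = [complex(x, y) for x, y in fixed_pts]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    denominator = sum(abs(pi - p_mean) ** 2 for pi in p)
    if denominator == 0:
        raise ValueError("need at least two distinct moving points")
    numerator = sum(
        (qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q)
    )
    a = numerator / denominator   # encodes rotation and scale
    b = q_mean - a * p_mean       # encodes translation
    return a, b


def apply_transform(a, b, points):
    """Map points through the estimated transform."""
    mapped = []
    for x, y in points:
        z = a * complex(x, y) + b
        mapped.append((z.real, z.imag))
    return mapped


# Toy matches generated by a 90-degree rotation, scale 2, shift (3, 4).
moving = [(0, 0), (1, 0), (0, 1), (1, 1)]
fixed = [(3, 4), (3, 6), (1, 4), (1, 6)]
a, b = estimate_similarity_transform(moving, fixed)
scale = abs(a)
angle_deg = math.degrees(cmath.phase(a))
print(scale, angle_deg, b)
```

Real feature matches contain outliers, so a robust estimator (for example, a RANSAC-style fit) is normally applied before or instead of a plain least-squares fit; this sketch omits that step.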
Authors
*Bing Shi
Hainan University
shibing@hainanu.edu
*Zhen-li Huang
Hainan University
huang2020@hainanu.edu
Human subjects data
The use of human samples in this study was approved by the Biomedical Ethics Committee of Haikou People's Hospital (Approval No. 2025-(Ethics)-073). The Biomedical Ethics Committee waived the requirement for informed consent due to the retrospective nature of the study. Data were de-identified.
