Data from: Generalizable physical descriptors of pool boiling heat transfer from unsupervised learning of images
Data files
Oct 30, 2025 version files (19.16 GB total):
- PoolBoilingDatasets.zip (19.16 GB)
- README.md (7.13 KB)
Abstract
Boiling processes are notoriously difficult to analyze visually due to the complex interactions between vapor bubbles and the surface. To aid in the quantitative analysis of these phenomena, this repository provides the high-speed videos, manually annotated pool boiling images, and MATLAB analysis toolkit associated with the study "Generalizable physical descriptors of pool boiling heat transfer from unsupervised learning of images" (International Journal of Heat and Mass Transfer, 255 (2026) 127894). The dataset comprises experiments conducted with different working fluids (water and HFE-7100) and heater surfaces (plain and microstructured copper and silicon) to investigate their effect on bubble morphology. Conventional physical descriptors, such as bubble size, bubble count, and vapor area fraction, as well as the descriptors derived from Principal Component Analysis (PCA), were extracted from this dataset. The results demonstrate strong positive correlations between the PCA-derived descriptors and the conventional parameters, confirming that dominant amplitude correlates with bubble size and vapor area fraction, while dominant frequency correlates with bubble count. The dataset and accompanying tools therefore provide a basis for applying and validating an unsupervised learning approach that can act as a robust surrogate for traditional, time-consuming manual labeling techniques.
Overview
This repository contains four high-speed video datasets of pool boiling phenomena and a suite of MATLAB scripts for quantitative image analysis. Detailed information about the experimental methods, datasets, and analysis methodologies can be found in our publication:
L. Zhang, et al., "Generalizable physical descriptors of pool boiling heat transfer from unsupervised learning of images," International Journal of Heat and Mass Transfer, 255 (2026) 127894.
The core of this work demonstrates that new descriptors extracted via Principal Component Analysis (PCA), namely the Dominant Amplitude ($D_a$) and Dominant Frequency ($D_f$), show strong positive correlations with conventional, manually measured parameters like bubble size and count. This unsupervised approach offers a robust and efficient alternative to traditional, labor-intensive image analysis.
Datasets Information
The experimental datasets feature two main boiling regimes: steady-state nucleate boiling at lower heat loads and the transient excursion to film boiling at the highest heat loads. High-speed videos were captured using Phantom high-speed cameras.
The four datasets were provided by the following research groups:
- Cu-H₂O Datasets (PCu-H2O, FCu-H2O): Deionized water boiling on plain copper and copper foam surfaces. Provided by the Nano Energy and Data-Driven Discovery Laboratory, University of Arkansas (Prof. Han Hu).
- Si-HFE Datasets (PSi-HFE, SSi-HFE): HFE-7100 boiling on plain silicon and micro-structured silicon surfaces. Provided by the Cooling Technologies Research Center, Purdue University (Prof. Justin Weibel).
Image Analysis Toolkit
The MATLAB-based toolkit is designed to process image frames from the experimental videos to quantify bubble dynamics. It is divided into two primary categories:
- Conventional Descriptors: Tools for manually annotating bubble contours to calculate traditional statistics such as bubble count ($N_b$), average bubble area ($\overline{A_b}$), average bubble radius ($\overline{R_b}$), and vapor area fraction ($\mathrm{VAF}$).
- Generalizable Descriptors: Tools for performing Principal Component Analysis (PCA) on image frames to extract dominant temporal modes, specifically the Dominant Frequency ($D_f$) and Dominant Amplitude ($D_a$) of the vapor mass.
Directory Structure
- ./PoolBoilingDatasets/: The root directory.
  - ./DataProcessTools/: Contains all MATLAB analysis scripts.
    - ./Conventional_descriptors/: Scripts for manual bubble annotation and conventional statistical analysis.
    - ./Generalizable_descriptors/: Scripts for Principal Component Analysis and dominant descriptors extraction.
  - /[Dataset Name]/ (e.g., ./PCu-H2O/): These folders contain the downsized experimental data as .mp4 videos, converted from the Phantom camera's native format using the Phantom Camera Control (PCC) software.
    - /[Dataset Name]/annotatedBubbles/: This folder contains datasets for calculating conventional physical descriptors, including megapixel-resolution images and a master JSON file that stores the bubble contour data.
      - Images: Named using the convention [Dataset Name]_heatLoad_imageSequenceNum.jpg.
      - JSON File: Contains an object for each image, which in turn holds a FileName and a Bubbles object with coordinates for each labeled bubble (b0001, b0002, etc.).
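The image naming convention above can be split back into its parts when grouping annotated frames by heat load. The toolkit itself is MATLAB; the following is only a Python sketch. The example file names are hypothetical, and the sketch assumes that the heat load and sequence number never contain an underscore.

```python
from pathlib import PurePosixPath


def parse_image_name(file_name):
    """Split '[Dataset Name]_heatLoad_imageSequenceNum.jpg' into its parts.

    Assumption: the last two underscore-separated fields are the heat load
    and the sequence number; everything before them is the dataset name.
    """
    stem = PurePosixPath(file_name).stem
    dataset, heat_load, seq = stem.rsplit("_", 2)
    return {"dataset": dataset, "heat_load": heat_load, "sequence": int(seq)}


def group_by_heat_load(file_names):
    """Map each heat-load label to the list of file names recorded at it."""
    groups = {}
    for name in file_names:
        info = parse_image_name(name)
        groups.setdefault(info["heat_load"], []).append(name)
    return groups


# Hypothetical file names that follow the documented convention.
names = [
    "PCu-H2O_10W_00012.jpg",
    "PCu-H2O_10W_00340.jpg",
    "PCu-H2O_40W_00007.jpg",
]
groups = group_by_heat_load(names)
print(sorted(groups))
```

Splitting from the right keeps hyphenated dataset names such as PCu-H2O intact.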
Tools and Scripts
Conventional Descriptors (/DataProcessTools/Conventional_descriptors/)
1. bubbles_annotator_v6.m
- Purpose: A MATLAB-based GUI for manually drawing and labeling bubble contours on image frames.
- How to Use:
- Run the script in MATLAB.
- You will be prompted to select a source folder containing the image frames to be annotated.
- A second prompt will ask you to select a destination folder for the output (an annotated_results subfolder will be created automatically).
- An interactive window will appear for each image with the following controls:
- Left-click: Place the vertices of a polygon.
- Backspace key: Undo the last point.
- Double-click: Finalize the current bubble's shape.
- A dialog box will then prompt for the next action (Define more, Redefine last bubble, etc.).
- Output: Annotated images and a single bubble_labeled_data.json file containing all contour data.
2. bubbleStat_calculator_v4.m
- Purpose: Processes the JSON file from the annotator to calculate bubble statistics.
- How to Use:
- Edit the script to define the pixel resolution (l_px) for the dataset.
- Run the script and select the .json file and the corresponding image folder.
- Output:
- A summary table is displayed in the MATLAB Command Window with statistics grouped by heat load.
- If save_contour_Mode is set to true, it saves annotated images to a new BubbleContours subfolder.
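To illustrate how contour coordinates can be turned into the conventional descriptors, here is a minimal Python sketch (not the MATLAB script itself). Only the FileName and Bubbles keys and the b0001-style labels come from this README. The top-level key, the "x" and "y" field names, the image size, and the value of l_px are hypothetical. The sketch further assumes that l_px is a physical length per pixel, that the radius is the equivalent-circle radius, and that VAF is the summed contour area divided by the image area with no correction for overlapping contours.

```python
import json
import math

# Hypothetical JSON content; key names other than FileName/Bubbles are assumed.
SAMPLE_JSON = """
{
  "image_0001": {
    "FileName": "PCu-H2O_10W_00012.jpg",
    "Bubbles": {
      "b0001": {"x": [0, 4, 4, 0], "y": [0, 0, 3, 3]},
      "b0002": {"x": [10, 12, 12, 10], "y": [10, 10, 12, 12]}
    }
  }
}
"""


def polygon_area(xs, ys):
    """Area of a closed polygon in square pixels (shoelace formula)."""
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


def frame_stats(entry, l_px, image_width_px, image_height_px):
    """Bubble count, mean area, mean equivalent radius, and VAF for one image."""
    areas_px = [polygon_area(b["x"], b["y"]) for b in entry["Bubbles"].values()]
    n_b = len(areas_px)
    areas = [a * l_px ** 2 for a in areas_px]  # convert px^2 to length^2
    mean_area = sum(areas) / n_b if n_b else 0.0
    mean_radius = sum(math.sqrt(a / math.pi) for a in areas) / n_b if n_b else 0.0
    vaf = sum(areas_px) / (image_width_px * image_height_px)
    return {"N_b": n_b, "A_b": mean_area, "R_b": mean_radius, "VAF": vaf}


data = json.loads(SAMPLE_JSON)
stats = frame_stats(data["image_0001"], l_px=0.5,
                    image_width_px=20, image_height_px=20)
print(stats)
```

Averaging these per-image values over all images that share a heat load would give a table grouped by heat load, similar in spirit to the summary the script prints.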
Generalizable Descriptors (/DataProcessTools/Generalizable_descriptors/)
1. PC_calculator_v4.m
- Purpose: Performs PCA on image frames by dividing them into manageable subsets.
- How to Use:
- Edit the script to set the MainData_DIR and the heatLoads array (e.g., "10W", "40W").
- Ensure image frames are named sequentially (e.g., 00000.jpg).
- Configure parameters like totalImages, numSubsets, and numPCs.
- Output: Creates a ./PC_Results directory with .csv files containing the PC scores for each image subset.
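As a rough illustration of what a PC score time series is, the Python sketch below computes the first principal component of a small stack of flattened frames by power iteration. It is not the MATLAB script: it handles a single subset, returns only PC1, and uses a tiny synthetic "video" whose pixel pattern and amplitudes are invented for the example.

```python
import math


def pc1_scores(frames, n_iter=100):
    """Return (scores, direction) of the first principal component.

    frames: list of equal-length lists, one flattened image per time step.
    Power iteration is applied to X^T X without forming it explicitly.
    The sign of a principal component is arbitrary.
    """
    n, p = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(p)]
    X = [[f[j] - mean[j] for j in range(p)] for f in frames]  # mean-centered
    norm0 = math.sqrt(sum((j + 1) ** 2 for j in range(p)))
    v = [(j + 1) / norm0 for j in range(p)]  # deterministic starting vector
    for _ in range(n_iter):
        s = [sum(row[j] * v[j] for j in range(p)) for row in X]         # X v
        w = [sum(X[i][j] * s[i] for i in range(n)) for j in range(p)]   # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # no variance along the current direction
            break
        v = [c / norm for c in w]
    scores = [sum(row[j] * v[j] for j in range(p)) for row in X]
    return scores, v


# Synthetic 4-pixel frames: a fixed background plus one oscillating pattern.
pattern = [1.0, 2.0, 0.0, 1.0]
amplitudes = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
frames = [[10.0 + a * p_j for p_j in pattern] for a in amplitudes]

scores, direction = pc1_scores(frames)
print([round(s, 3) for s in scores])
```

Here the PC1 scores follow the oscillation amplitude of the pattern frame by frame, which is the kind of time series the next script analyzes in the frequency domain.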
2. dominantDescriptor_calculator_v4.m
- Purpose: Analyzes the time series of a specific Principal Component (typically PC1) using an FFT to identify the Dominant Amplitude ($D_a$) and Dominant Frequency ($D_f$).
- How to Use:
- Important: You must run PC_calculator_v4.m first.
- Edit the script to set MainData_DIR, heatLoads, frameRate, and which PC to analyze (pcToAnalyze).
- Output: Creates a ./Dominant_Descriptors directory with a .csv file for each heat load, summarizing the $D_a$ and $D_f$ values.
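The idea of reading $D_f$ and $D_a$ off the spectrum of a PC score series can be sketched in Python as follows. This is an assumption-laden illustration rather than the script's actual procedure: it uses a direct discrete Fourier transform instead of an FFT, removes the mean, takes the single-sided amplitude $2|X_k|/N$, and picks the largest non-zero-frequency bin. The normalization used in the publication may differ, and the 1000 fps frame rate and the test signal are hypothetical.

```python
import cmath
import math


def dominant_descriptors(series, frame_rate):
    """Return (dominant frequency in Hz, dominant amplitude) of a time series.

    Assumptions: mean removed, single-sided amplitude 2|X_k|/N, zero-frequency
    and Nyquist bins skipped. A direct O(N^2) DFT is used for clarity.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [s - mean for s in series]
    best_k, best_amp = 0, 0.0
    for k in range(1, (n + 1) // 2):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        amp = 2.0 * abs(coeff) / n
        if amp > best_amp:
            best_k, best_amp = k, amp
    return best_k * frame_rate / n, best_amp


# Hypothetical PC1 series: offset 7, amplitude 3, 50 Hz, sampled at 1000 fps.
frame_rate = 1000.0
series = [
    7.0 + 3.0 * math.sin(2.0 * math.pi * 50.0 * t / frame_rate)
    for t in range(200)
]
d_f, d_a = dominant_descriptors(series, frame_rate)
print(d_f, round(d_a, 6))
```

With 200 samples at 1000 fps the frequency resolution is 5 Hz, so a 50 Hz component falls exactly on a bin. Real score series will generally show some spectral leakage.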
Recommended Workflow
- Data Preparation: Extract image frames (e.g., as .jpg) from the high-speed videos and organize them into subfolders based on heat load (e.g., ./FCu-H2O/15W/).
- Manual Annotation: Collect a representative subset of images for annotation. Run bubbles_annotator_v6.m to draw contours and generate the bubble_labeled_data.json file.
- Calculate Conventional Statistics: Run bubbleStat_calculator_v4.m on the generated JSON file to compute conventional bubble statistics.
- PCA for Generalizable Descriptors:
  - Part A: Configure and run PC_calculator_v4.m to process the image sequences and generate PC score files.
  - Part B: Configure and run dominantDescriptor_calculator_v4.m on the PC scores to calculate the $D_a$ and $D_f$ descriptors.
Script Dependencies
- MATLAB: R2021a or newer is recommended.
- Image Processing Toolbox
- Statistics and Machine Learning Toolbox
This repository provides a comprehensive dataset and a MATLAB-based analysis toolkit designed for quantitative analysis of pool boiling phenomena.
Dataset Information
The root directory contains separate folders for each experimental dataset (PCu-H2O, FCu-H2O, PSi-HFE, and SSi-HFE). Each of these folders contains the experimental data as downsized high-speed videos (.mp4) captured at various heat loads. For quantitative analysis, a manually annotated subset of randomly selected high-speed image frames (.jpg) is provided in an annotatedBubbles subfolder, along with a structured JSON (.json) file containing the corresponding bubble contour coordinates. This JSON data is organized as a structure where each image entry contains a FileName and a Bubbles object, which in turn holds the x and y coordinates for each individually labeled contour (e.g., b0001, b0002, etc.).
Image Analysis Toolkit
The DataProcessTools folder, also located under the root directory, contains the MATLAB analysis toolkit, which is divided into two main parts:
- Conventional Descriptors: Scripts are provided for manual annotation of bubble contours (bubbles_annotator_v6.m) and for the statistical calculation of traditional physical parameters (bubbleStat_calculator_v4.m). These parameters include bubble count ($N_b$), average bubble area ($\overline{A_b}$), average bubble radius ($\overline{R_b}$), and vapor area fraction ($\mathrm{VAF}$).
- Generalizable Descriptors: This is a two-part workflow for unsupervised feature extraction. The first script, PC_calculator_v4.m, processes image sequences to perform dimensionality reduction and calculate the time series of the principal components (PCs). The second script, dominantDescriptor_calculator_v4.m, then applies a Fast Fourier Transform (FFT) to the time series of the principal component to determine the Dominant Frequency ($D_f$) and Dominant Amplitude ($D_a$) of the boiling process.
