StomaQuant: Deep learning-based quantification for stomatal trait assessment
Data files
Jan 24, 2026 version files 1.21 GB
- arabidopsis.zip (12.58 MB)
- barley.zip (147.82 MB)
- README.md (8.08 KB)
- RF_DETR_augmented_images.zip (156.73 MB)
- RF_DETR_test_unseen_data.zip (134.82 MB)
- RF_DETR_trained_weights.pth (382.13 MB)
- sugarcane_analysis.zip (133.73 MB)
- test.zip (3.99 MB)
- train.zip (68.75 MB)
- valid.zip (12.97 MB)
- YOLOv12_results.zip (4.53 MB)
- YOLOv12_test_unseen_data.zip (133.31 MB)
- YOLOv12_trained_weights.pt (18.76 MB)
Abstract
Stomata are microscopic pores on leaf surfaces that play a vital role in transpiration and gaseous exchange. Stomatal density and size directly influence photosynthetic and hydraulic capacity. Conventional approaches for counting stomata and determining stomatal density are labour-intensive and lack scalability. Although several AI-based stomata detection tools have been published in the last decade, existing models were trained on model plants such as wheat, barley and Arabidopsis. Stomata in such model plants are generally elliptical in shape, but applying a universal model to all plant species would be inappropriate given their diverse morphological characteristics. Previous studies have suggested using the stomatal index to quantify the ratio between epidermal cells and total stomatal count. However, this approach can be difficult to apply consistently, as epidermal cell shape and size vary across plant species. Instead, we propose measuring stomatal density as the number of stomata per total imaged pixel area in the captured images. In this study, the YOLOv12 and RF-DETR models were compared for real-time stomata detection in normal images as well as difficult-to-image, out-of-focus and occluded images. The in-house training dataset consisted of images of 300 rice, 100 barley and 50 sugarcane leaves captured against a dark background. The models were trained with image augmentation for 300 epochs; YOLOv12 outperformed RF-DETR with a higher mAP50:95 score, achieving a peak mean average precision of 98.5%, and excelled at detecting stomata across both monocot and dicot plants. StomaQuant has been shown to be effective for both epidermal peel and ethanol-decolourized samples and can be used to estimate stomatal density and size.
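The proposed per-pixel-area density measure can be sketched in a few lines. This is only an illustration: the function name and the `um_per_pixel` calibration value are assumptions, not values from this study; the calibration depends on the objective and camera and must be measured for each imaging setup.

```python
def stomatal_density_per_mm2(n_stomata, img_w_px, img_h_px, um_per_pixel):
    """Stomatal density computed from the count per total imaged pixel
    area, converted to stomata per mm^2 via the microscope calibration
    (micrometres per pixel)."""
    area_mm2 = (img_w_px * um_per_pixel / 1000.0) * (img_h_px * um_per_pixel / 1000.0)
    return n_stomata / area_mm2

# e.g. 50 detected stomata in a 1392 x 1040 px image at an ASSUMED 1 um/px
density = stomatal_density_per_mm2(50, 1392, 1040, 1.0)
```

Reporting density per physical area (rather than per image) keeps counts comparable across magnifications and camera resolutions.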
Dataset DOI: 10.5061/dryad.3j9kd51zr
Description of the data and file structure
This document provides an overview of the data organization, file contents, and directory structure used for stomatal detection in decolourized leaf and epidermal peel images. The dataset focuses on the identification and analysis of stomatal structures across different plant species (barley, rice and sugarcane) using the YOLOv12 (You Only Look Once) and RF-DETR (Roboflow Detection Transformer) architectures.
The dataset is partitioned into three subsets to ensure robust model training and unbiased evaluation. The accompanying "_annotation.coco.json" file contains the coordinate information of the stomata labels in the images. The split ratio follows standard machine learning practice:
- Training Set (80%): train.zip – Used for model parameter optimization.
- Validation Set (15%): valid.zip – Used for hyperparameter tuning and preventing overfitting during the training phase.
- Testing Set (5%): test.zip – Reserved for final performance evaluation on known data distributions.
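The 80/15/5 partition above can be reproduced with a short deterministic shuffle-and-slice. This is a sketch to illustrate the ratios only; the actual split shipped in train.zip, valid.zip and test.zip is fixed, and the function name and seed are illustrative.

```python
import random

def split_80_15_5(filenames, seed=42):
    """Shuffle filenames deterministically and partition them into
    train/valid/test subsets with an 80/15/5 ratio."""
    files = sorted(filenames)            # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(files)
    n = len(files)
    n_train = int(n * 0.80)
    n_valid = int(n * 0.15)
    train = files[:n_train]
    valid = files[n_train:n_train + n_valid]
    test = files[n_train + n_valid:]     # remainder (~5%) held out for testing
    return train, valid, test
```

Fixing the seed makes the split reproducible across runs, which matters when the same images must feed two different training pipelines.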
Files and variables
Original microscope images used to train the computer vision models.
File: test.zip
Description: 5% of images used for testing
File: valid.zip
Description: 15% of images used for validation
File: train.zip
Description: 80% of images used for training
The finalized deep-learning model checkpoint files are found in YOLOv12_trained_weights.pt and RF_DETR_trained_weights.pth.
- Checkpoints:
- YOLOv12_trained_weights.pt: PyTorch weights for the YOLOv12 model after a 300-epoch training cycle.
- RF_DETR_trained_weights.pth: PyTorch weights for the RF-DETR model after a 300-epoch training cycle.
- PT and PTH checkpoint files can be opened for image inference using a jupyter notebook at https://github.com/kjxlau/StomaQuant/blob/main/Visualize_PTH_files.ipynb
File: YOLOv12_trained_weights.pt
Description: Checkpoint file of the YOLOv12 trained weights after 300 epochs used for model inference
Note on Weight File Structure:
The YOLOv12_trained_weights.pt file is a compressed PyTorch archive. While it may appear as a folder if opened with tools like 7-Zip (containing a data/ subfolder and data.pkl file), these are internal components of the model's architecture.
How to use: Do not extract or modify these internal files. To use the model, load the .pt file directly using the Ultralytics or PyTorch library.
File: RF_DETR_trained_weights.pth
Description: Checkpoint file of the RF-DETR trained weights after 300 epochs used for model inference.
Both the .pt (YOLO) and .pth (RF-DETR) files provided are PyTorch-serialized ZIP archives. They contain a data/ directory of raw tensors and a data.pkl map. These files are intended for use with the torch.load() or model-specific libraries and should not be manually unzipped or modified.
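Because the checkpoints are zip-serialized, a downloaded file can be sanity-checked with the standard library alone, without deserializing any model code. The helper below is a sketch (its name is illustrative, not part of the dataset's tooling); it only inspects the archive listing for the data.pkl entry described above.

```python
import zipfile

def looks_like_pytorch_checkpoint(path):
    """Return True if `path` is a zip archive containing a data.pkl
    entry, i.e. it matches the PyTorch zip-serialization layout.
    Nothing is deserialized; only the archive listing is read."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        return any(name.endswith("data.pkl") for name in zf.namelist())
```

For actual inference, load the files with `torch.load()` or a model-specific library as noted above, rather than unzipping them.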
- Image Augmentation
- RF_DETR_augmented_images.zip contains images augmented by horizontal and vertical flipping and rotation, used to make the model more robust.
File: RF_DETR_augmented_images.zip
Description: Folder contains augmented images derived from the original dataset. Augmentation was performed with the following parameters: Horizontal Flip (p=0.5), Vertical Flip (p=0.5), Rotate (limit=90, p=1.0)
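The parameter names above follow the Albumentations-style API. As a simplified, dependency-light illustration only, an equivalent flip step can be sketched with NumPy; note one deliberate simplification: the rotation here is by a random multiple of 90°, whereas Rotate(limit=90) samples an arbitrary angle in [-90°, 90°].

```python
import random
import numpy as np

def augment(image, rng=None):
    """Randomly flip an image array horizontally (p=0.5) and vertically
    (p=0.5), then rotate it by a random multiple of 90 degrees.
    A NumPy stand-in for the flip/rotate pipeline described above."""
    rng = rng or random.Random()
    if rng.random() < 0.5:
        image = np.fliplr(image)
    if rng.random() < 0.5:
        image = np.flipud(image)
    return np.rot90(image, k=rng.randrange(4))
```

Flips and right-angle rotations preserve every pixel, so annotation polygons can be transformed exactly alongside the image.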
File: YOLOv12_results.zip
Description: Folder includes training results and analysis plots such as confusion_matrix, F1_curve, P_curve, PR_curve and R_curve. Results are summarized in results.csv.
The models were also tested on unseen data: images not present in the initial train/valid/test split.
- args.yaml: A record of the hyperparameters used for the run (learning rate, image size, number of epochs, augmentation settings).
- results.csv: A spreadsheet containing the raw numbers for every epoch. It tracks training/validation losses and accuracy metrics (mAP).
- results.png: Line charts of the data from the CSV.
- confusion_matrix.png: A grid showing which classes the model identifies correctly and where the model misclassified.
- confusion_matrix_normalized.png: The same confusion matrix, but with values normalized to proportions (0.0 to 1.0).
- F1_curve.png: The F1-score across confidence thresholds.
- PR_curve.png: The Precision-Recall curve.
- P_curve.png (Precision): Shows how many of the model's predictions were correct at different confidence levels.
- R_curve.png (Recall): Shows how many of the total real objects the model found at different confidence levels.
- labels.jpg: A 4-panel chart showing the dataset distribution:
- the number of instances for each class,
- the typical shape of the bounding boxes,
- where the boxes are usually located in the images (centre vs. edges).
- labels_correlogram.jpg: A statistical plot showing the correlation between box coordinates and sizes.
- train_batch0.jpg, train_batch1.jpg: These show exactly what the model saw during training. Distorted, blurry or mosaic-style images indicate that data augmentation was applied.
- train_batch40310.jpg: A batch size of 4 was used, so the model saw only 4 images per iteration; this is the batch from iteration no. 40310.
- val_batch0_labels.jpg, val_batch1_labels.jpg, val_batch2_labels.jpg: These images show the ground truth, i.e. the manually drawn labels.
- val_batch0_pred.jpg, val_batch1_pred.jpg, val_batch2_pred.jpg: These show the model's predictions on the same set of manually labelled images. If the boxes in the pred images match those in the labels images, the model is working correctly.
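As an example of working with results.csv, the peak-mAP epoch can be extracted with the standard csv module. The column name below follows the usual Ultralytics header convention (`metrics/mAP50-95(B)`); check the actual file's header, which may also include padding spaces, hence the stripping step. The two-row sample is illustrative, not data from the actual run.

```python
import csv
import io

def best_epoch(csv_text, metric="metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest `metric`.
    Header names and values are stripped of whitespace before use."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [{k.strip(): v.strip() for k, v in row.items()} for row in reader]
    best = max(rows, key=lambda r: float(r[metric]))
    return int(best["epoch"]), float(best[metric])

# Illustrative sample only:
sample = "epoch,metrics/mAP50-95(B)\n1,0.51\n2,0.63\n"
```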
File: arabidopsis.zip
Description: Folder contains Arabidopsis epidermal peel images.
File: barley.zip
Description: Folder contains Barley epidermal peel images.
- Annotated Results: YOLOv12_test_unseen_data.zip and RF_DETR_test_unseen_data.zip provide visual evidence of model accuracy on both Arabidopsis and Barley samples.
File: YOLOv12_test_unseen_data.zip
Description: Folder contains YOLOv12-annotated images of both Arabidopsis and Barley samples that are not part of the training dataset.
File: RF_DETR_test_unseen_data.zip
Description: Folder contains RF-DETR-annotated images of both Arabidopsis and Barley samples that are not part of the training dataset.
- Sugarcane Analysis: sugarcane_analysis.zip contains summary statistics (Excel and CSV) comparing stomatal characteristics across different leaf surfaces and varieties.
File: sugarcane_analysis.zip
Description: Folder contains sugarcane top (adaxial) and bot (abaxial) leaf images from three varieties, namely Var1, Var2 and Var3. Summary analyses are provided as Excel and CSV files.
Code/software
Microsoft Excel can be used to open and view the .xlsx files; the .csv files can also be opened in any plain-text editor such as Notepad.
PT and PTH checkpoint files can be visualized at https://netron.app/
PT and PTH checkpoint files can be opened for image inference using a jupyter notebook at https://github.com/kjxlau/StomaQuant/blob/main/Visualize_PTH_files.ipynb
All python scripts used for computer vision model training are available at https://github.com/kjxlau/StomaQuant
Access information
Other publicly accessible locations of the data:
- StomaQuant is hosted at https://huggingface.co/spaces/kennylau91/stoma.
Data was derived from the following sources:
- Arabidopsis and Barley Epidermis Peel Images (Unseen Data) were originally obtained from https://github.com/XDynames/SAI-app
Plant Cultivation and Sample Preparation
Rice (Oryza sativa) was grown for 2 weeks after transplantation, to the five-leaf stage. Barley (Hordeum vulgare) seedlings were grown for a month. Sugarcane (Saccharum officinarum) saplings were grown for 3 months in a greenhouse facility at Lim Chu Kang, Singapore (103°70′49″ E, 1°42′73″ N). The leaves were excised with scissors and immersed in 70% ethanol. The ethanol-soaked leaves were then incubated in a 55 °C water bath overnight to decolourize them and remove the chlorophyll pigments. The decolourized leaves were cut into small squares and mounted on a glass slide under a cover slip for microscopic imaging. All rice, barley and sugarcane leaf explants were then imaged on an Olympus BX53 microscope (Evident Scientific, Japan) under 20× and 40× magnification at 1392 × 1040 pixel resolution. Data acquisition was performed at Temasek Life Sciences Laboratory, Singapore. A collection of 450 images of the abaxial and adaxial surfaces of barley, rice and sugarcane leaves was captured to train the YOLOv12 and RF-DETR models for stomatal detection.
YOLOv12 Model Training Parameters
The YOLOv12 model was trained using a two-step approach: binary mask segmentation followed by a convolutional neural network. Segmentation was done manually in LabelMe by labelling each object as a stoma, where white (RGB: 255, 255, 255) in the binary mask denotes stomata and black (RGB: 0, 0, 0) the background. A Python script was written to convert the binary masks into polygons. The polygon coordinates were transcribed into JSON and YOLO (You Only Look Once) txt format. Training was then performed on the pairs of original images and corresponding annotation files using the YOLOv12s model. Images were randomly split into 80% training and 20% testing. Image augmentation was performed by scaling, flipping vertically or horizontally and rotating (Fig S1). The training parameters were: AdamW optimizer, learning rate 0.002, momentum 0.9 and max stride 1216. Model training was performed on a high-performance computing cluster with an AMD EPYC 7543P 32-core CPU for 300 epochs with a batch size of 4. For downstream statistical analysis, unseen images that were not used for training were used for testing. Statistical analyses were performed in R using ggplot2 [21].
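The polygon-to-YOLO transcription step amounts to normalizing pixel coordinates by the image size. A minimal sketch follows; the function name and the single class id 0 are illustrative assumptions, and the full conversion script is in the GitHub repository.

```python
def polygon_to_yolo_line(polygon, img_w, img_h, class_id=0):
    """Format a polygon given as (x, y) pixel pairs as one line of a
    YOLO segmentation label file: 'class x1 y1 x2 y2 ...', with
    coordinates normalized to [0, 1] by the image width and height."""
    coords = []
    for x, y in polygon:
        coords.append(f"{x / img_w:.6f}")
        coords.append(f"{y / img_h:.6f}")
    return f"{class_id} " + " ".join(coords)
```

One such line is written per stoma, and the .txt file shares its basename with the corresponding image.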
RF-DETR Model Training Parameters
YOLO annotations were converted into COCO format. The RF-DETR (Roboflow Detection Transformer) model was trained using the same set of images as YOLO, with an 80%-15%-5% training/validation/testing split. Unlike YOLOv12, RF-DETR does not have a built-in augmentation pipeline, so augmentation was performed externally using a Python script available at https://github.com/kjxlau/StomaQuant. Augmentation parameters include scaling, flipping and rotation. Model training was performed on a high-performance computing cluster with an AMD EPYC 7543P 32-core CPU for 300 epochs with a batch size of 4.
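The core of the YOLO-to-COCO conversion is a change of box convention: YOLO stores normalized centre coordinates and sizes, while COCO stores top-left pixel coordinates. The helper below sketches that step under assumed names; the repository's conversion script additionally rebuilds the surrounding images/annotations JSON structure.

```python
def yolo_bbox_to_coco(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box (centre x, centre y, width, height,
    all in [0, 1]) into a COCO-style [x_min, y_min, width, height] box
    in pixel units."""
    box_w = w * img_w
    box_h = h * img_h
    x_min = cx * img_w - box_w / 2.0
    y_min = cy * img_h - box_h / 2.0
    return [x_min, y_min, box_w, box_h]
```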
