StomaQuant: Deep learning-based quantification for stomatal trait assessment
Data files
Jan 24, 2026 version files 1.21 GB
- arabidopsis.zip (12.58 MB)
- barley.zip (147.82 MB)
- README.md (8.08 KB)
- RF_DETR_augmented_images.zip (156.73 MB)
- RF_DETR_test_unseen_data.zip (134.82 MB)
- RF_DETR_trained_weights.pth (382.13 MB)
- sugarcane_analysis.zip (133.73 MB)
- test.zip (3.99 MB)
- train.zip (68.75 MB)
- valid.zip (12.97 MB)
- YOLOv12_results.zip (4.53 MB)
- YOLOv12_test_unseen_data.zip (133.31 MB)
- YOLOv12_trained_weights.pt (18.76 MB)
Abstract
Stomata are microscopic pores on leaf surfaces that play a vital role in transpiration and gaseous exchange. Stomatal density and size directly influence photosynthetic and hydraulic capacity. Conventional approaches for counting stomata and determining stomatal density are labour-intensive and lack scalability. Although several AI-based stomata detection tools have been published in the last decade, existing models were trained on model plants such as wheat, barley and Arabidopsis. Stomata in such model plants are generally elliptical in shape, but applying a universal model to all plant species would be inappropriate given their diverse morphological characteristics. Previous studies have suggested using the stomatal index to quantify the ratio between epidermal cells and total stomatal count. However, this approach can be difficult to apply consistently, as epidermal cell shape and size vary across plant species. Instead, we propose measuring stomatal density as the number of stomata per total imaged pixel area in the captured images. In this study, the YOLOv12 and RF-DETR models were compared for real-time stomata detection in normal images as well as difficult-to-image, out-of-focus and occluded images. The in-house training dataset consisted of images of 300 rice, 100 barley and 50 sugarcane leaves captured against a dark background. The models were trained with image augmentation for 300 epochs; YOLOv12 outperformed RF-DETR with a higher mAP50:95 score, achieving a peak mean average precision of 98.5%, and excelled at detecting stomata across both monocot and dicot plants. StomaQuant has been shown to be effective for both epidermal peel and ethanol-decolourized samples and can be used to estimate stomatal density and size.
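The proposed per-pixel-area density measure can be sketched in a few lines. This is only an illustration: the function name and the `um_per_pixel` calibration value are assumptions, not values from this study; the calibration depends on the objective and camera and must be measured for each imaging setup.

```python
def stomatal_density_per_mm2(n_stomata, img_w_px, img_h_px, um_per_pixel):
    """Stomatal density computed from the count per total imaged pixel
    area, converted to stomata per mm^2 via the microscope calibration
    (micrometres per pixel)."""
    area_mm2 = (img_w_px * um_per_pixel / 1000.0) * (img_h_px * um_per_pixel / 1000.0)
    return n_stomata / area_mm2

# e.g. 50 detected stomata in a 1392 x 1040 px image at an ASSUMED 1 um/px
density = stomatal_density_per_mm2(50, 1392, 1040, 1.0)
```

Reporting density per physical area (rather than per image) keeps counts comparable across magnifications and camera resolutions.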
Dataset DOI: 10.5061/dryad.3j9kd51zr
Description of the data and file structure
This document provides an overview of the data organization, file contents, and directory structure used for stomatal detection in decolourized leaf and epidermal peel images. The dataset focuses on the identification and analysis of stomatal structures across different plant species (barley, rice and sugarcane) using the YOLOv12 (You Only Look Once) and RF-DETR (Roboflow Detection Transformer) architectures.
The dataset is partitioned into three subsets to ensure robust model training and unbiased evaluation. The accompanying "_annotation.coco.json" file contains the coordinate information of the stomata labels in the images. The split ratio follows standard machine learning practice:
- Training Set (80%): train.zip – Used for model parameter optimization.
- Validation Set (15%): valid.zip – Used for hyperparameter tuning and preventing overfitting during the training phase.
- Testing Set (5%): test.zip – Reserved for final performance evaluation on known data distributions.
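The 80/15/5 partition above can be reproduced with a short deterministic shuffle-and-slice. This is a sketch to illustrate the ratios only; the actual split shipped in train.zip, valid.zip and test.zip is fixed, and the function name and seed are illustrative.

```python
import random

def split_80_15_5(filenames, seed=42):
    """Shuffle filenames deterministically and partition them into
    train/valid/test subsets with an 80/15/5 ratio."""
    files = sorted(filenames)            # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(files)
    n = len(files)
    n_train = int(n * 0.80)
    n_valid = int(n * 0.15)
    train = files[:n_train]
    valid = files[n_train:n_train + n_valid]
    test = files[n_train + n_valid:]     # remainder (~5%) held out for testing
    return train, valid, test
```

Fixing the seed makes the split reproducible across runs, which matters when the same images must feed two different training pipelines.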
Files and variables
Original microscope images used to train the computer vision models.
File: test.zip
Description: 5% of images used for testing
File: valid.zip
Description: 15% of images used for validation
File: train.zip
Description: 80% of images used for training
The finalized deep-learning model checkpoint files are found in YOLOv12_trained_weights.pt and RF_DETR_trained_weights.pth.
- Checkpoints:
- YOLOv12_trained_weights.pt: PyTorch weights for the YOLOv12 model after a 300-epoch training cycle.
- RF_DETR_trained_weights.pth: PyTorch weights for the RF-DETR model after a 300-epoch training cycle.
- PT and PTH checkpoint files can be opened for image inference using a jupyter notebook at https://github.com/kjxlau/StomaQuant/blob/main/Visualize_PTH_files.ipynb
File: YOLOv12_trained_weights.pt
Description: Checkpoint file of the YOLOv12 trained weights after 300 epochs used for model inference
Note on Weight File Structure:
The YOLOv12_trained_weights.pt file is a compressed PyTorch archive. While it may appear as a folder if opened with tools like 7-Zip (containing a data/ subfolder and data.pkl file), these are internal components of the model's architecture.
How to use: Do not extract or modify these internal files. To use the model, load the .pt file directly using the Ultralytics or PyTorch library.
File: RF_DETR_trained_weights.pth
Description: Checkpoint file of the RF-DETR trained weights after 300 epochs used for model inference.
Both the .pt (YOLO) and .pth (RF-DETR) files provided are PyTorch-serialized ZIP archives. They contain a data/ directory of raw tensors and a data.pkl map. These files are intended for use with the torch.load() or model-specific libraries and should not be manually unzipped or modified.
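Because the checkpoints are zip-serialized, a downloaded file can be sanity-checked with the standard library alone, without deserializing any model code. The helper below is a sketch (its name is illustrative, not part of the dataset's tooling); it only inspects the archive listing for the data.pkl entry described above.

```python
import zipfile

def looks_like_pytorch_checkpoint(path):
    """Return True if `path` is a zip archive containing a data.pkl
    entry, i.e. it matches the PyTorch zip-serialization layout.
    Nothing is deserialized; only the archive listing is read."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        return any(name.endswith("data.pkl") for name in zf.namelist())
```

For actual inference, load the files with `torch.load()` or a model-specific library as noted above, rather than unzipping them.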
- Image Augmentation
- RF_DETR_augmented_images.zip contains images augmented by horizontal and vertical flipping and rotation, used to make the model more robust.
File: RF_DETR_augmented_images.zip
Description: Folder contains augmented images derived from the original dataset. Augmentation was performed with the following parameters: Horizontal Flip (p=0.5), Vertical Flip (p=0.5), Rotate (limit=90, p=1.0)
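The parameter names above follow the Albumentations-style API. As a simplified, dependency-light illustration only, an equivalent flip step can be sketched with NumPy; note one deliberate simplification: the rotation here is by a random multiple of 90°, whereas Rotate(limit=90) samples an arbitrary angle in [-90°, 90°].

```python
import random
import numpy as np

def augment(image, rng=None):
    """Randomly flip an image array horizontally (p=0.5) and vertically
    (p=0.5), then rotate it by a random multiple of 90 degrees.
    A NumPy stand-in for the flip/rotate pipeline described above."""
    rng = rng or random.Random()
    if rng.random() < 0.5:
        image = np.fliplr(image)
    if rng.random() < 0.5:
        image = np.flipud(image)
    return np.rot90(image, k=rng.randrange(4))
```

Flips and right-angle rotations preserve every pixel, so annotation polygons can be transformed exactly alongside the image.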
File: YOLOv12_results.zip
Description: Folder includes training results and analysis plots such as confusion_matrix, F1_curve, P_curve, PR_curve and R_curve. Results are summarized in results.csv.
The models were also tested on unseen data: images not present in the initial train/valid/test split.
- args.yaml: A record of the hyperparameters used for the run (learning rate, image size, number of epochs, augmentation settings).
- results.csv: A spreadsheet containing the raw numbers for every epoch. It tracks training/validation losses and accuracy metrics (mAP).
- results.png: Line charts of the data from the CSV.
- confusion_matrix.png: A grid showing which classes the model identifies correctly and where the model misclassified.
- confusion_matrix_normalized.png: The same confusion matrix, but with values normalized to proportions (0.0 to 1.0).
- F1_curve.png: The F1-score across confidence thresholds.
- PR_curve.png: The Precision-Recall curve.
- P_curve.png (Precision): Shows how many of the model's predictions were correct at different confidence levels.
- R_curve.png (Recall): Shows how many of the total real objects the model found at different confidence levels.
- labels.jpg: A 4-panel chart showing the dataset distribution:
- the number of instances for each class,
- the typical shape of the bounding boxes,
- where the boxes are usually located in the images (centre vs. edges).
- labels_correlogram.jpg: A statistical plot showing the correlation between box coordinates and sizes.
- train_batch0.jpg, train_batch1.jpg: These show exactly what the model saw during training. Distorted, blurry or mosaic-style images indicate that data augmentation was applied.
- train_batch40310.jpg: A batch size of 4 was used, so the model saw only 4 images per iteration; this is the batch from iteration no. 40310.
- val_batch0_labels.jpg, val_batch1_labels.jpg, val_batch2_labels.jpg: These images show the ground truth, i.e. the manually drawn labels.
- val_batch0_pred.jpg, val_batch1_pred.jpg, val_batch2_pred.jpg: These show the model's predictions on the same set of manually labelled images. If the boxes in the pred images match those in the labels images, the model is working correctly.
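As an example of working with results.csv, the peak-mAP epoch can be extracted with the standard csv module. The column name below follows the usual Ultralytics header convention (`metrics/mAP50-95(B)`); check the actual file's header, which may also include padding spaces, hence the stripping step. The two-row sample is illustrative, not data from the actual run.

```python
import csv
import io

def best_epoch(csv_text, metric="metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest `metric`.
    Header names and values are stripped of whitespace before use."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [{k.strip(): v.strip() for k, v in row.items()} for row in reader]
    best = max(rows, key=lambda r: float(r[metric]))
    return int(best["epoch"]), float(best[metric])

# Illustrative sample only:
sample = "epoch,metrics/mAP50-95(B)\n1,0.51\n2,0.63\n"
```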
File: arabidopsis.zip
Description: Folder contains Arabidopsis epidermal peel images.
File: barley.zip
Description: Folder contains Barley epidermal peel images.
- Annotated Results: YOLOv12_test_unseen_data.zip and RF_DETR_test_unseen_data.zip provide visual evidence of model accuracy on both Arabidopsis and Barley samples.
File: YOLOv12_test_unseen_data.zip
Description: Folder contains YOLOv12-annotated images of both Arabidopsis and Barley samples that are not part of the training dataset.
File: RF_DETR_test_unseen_data.zip
Description: Folder contains RF-DETR-annotated images of both Arabidopsis and Barley samples that are not part of the training dataset.
- Sugarcane Analysis: sugarcane_analysis.zip contains summary statistics (Excel and CSV) comparing stomatal characteristics across different leaf surfaces and varieties.
File: sugarcane_analysis.zip
Description: Folder contains sugarcane top (adaxial) and bot (abaxial) leaf images from three varieties, namely Var1, Var2 and Var3. Summary analyses are provided as Excel and CSV files.
Code/software
Microsoft Excel can be used to open and view the .xlsx files; the .csv files can also be opened in any plain-text editor such as Notepad.
PT and PTH checkpoint files can be visualized at https://netron.app/
PT and PTH checkpoint files can be opened for image inference using a jupyter notebook at https://github.com/kjxlau/StomaQuant/blob/main/Visualize_PTH_files.ipynb
All python scripts used for computer vision model training are available at https://github.com/kjxlau/StomaQuant
Access information
Other publicly accessible locations of the data:
- StomaQuant is hosted at https://huggingface.co/spaces/kennylau91/stoma.
Data was derived from the following sources:
- Arabidopsis and Barley Epidermis Peel Images (Unseen Data) were originally obtained from https://github.com/XDynames/SAI-app
Plant Cultivation and Sample Preparation
Rice (Oryza sativa) was grown for 2 weeks after transplantation, to the five-leaf stage. Barley (Hordeum vulgare) seedlings were grown for a month. Sugarcane (Saccharum officinarum) saplings were grown for 3 months in a greenhouse facility at Lim Chu Kang, Singapore (103°70′49″ E, 1°42′73″ N). The leaves were excised with scissors and immersed in 70% ethanol. The ethanol-soaked leaves were then incubated in a 55 °C water bath overnight to decolourize them and remove the chlorophyll pigments. The decolourized leaves were cut into small squares and mounted on a glass slide under a cover slip for microscopic imaging. All rice, barley and sugarcane leaf explants were then imaged on an Olympus BX53 microscope (Evident Scientific, Japan) under 20× and 40× magnification at 1392 × 1040 pixel resolution. Data acquisition was performed at Temasek Life Sciences Laboratory, Singapore. A collection of 450 images of the abaxial and adaxial surfaces of barley, rice and sugarcane leaves was captured to train the YOLOv12 and RF-DETR models for stomatal detection.
YOLOv12 Model Training Parameters
The YOLOv12 model was trained using a two-step approach: binary mask segmentation followed by a convolutional neural network. Segmentation was done manually in LabelMe by labelling each object as a stoma, where white (RGB: 255, 255, 255) in the binary mask denotes stomata and black (RGB: 0, 0, 0) the background. A Python script was written to convert the binary masks into polygons. The polygon coordinates were transcribed into JSON and YOLO (You Only Look Once) txt format. Training was then performed on the pairs of original images and corresponding annotation files using the YOLOv12s model. Images were randomly split into 80% training and 20% testing. Image augmentation was performed by scaling, flipping vertically or horizontally and rotating (Fig S1). The training parameters were: AdamW optimizer, learning rate 0.002, momentum 0.9 and max stride 1216. Model training was performed on a high-performance computing cluster with an AMD EPYC 7543P 32-core CPU for 300 epochs with a batch size of 4. For downstream statistical analysis, unseen images that were not used for training were used for testing. Statistical analyses were performed in R using ggplot2 [21].
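The polygon-to-YOLO transcription step amounts to normalizing pixel coordinates by the image size. A minimal sketch follows; the function name and the single class id 0 are illustrative assumptions, and the full conversion script is in the GitHub repository.

```python
def polygon_to_yolo_line(polygon, img_w, img_h, class_id=0):
    """Format a polygon given as (x, y) pixel pairs as one line of a
    YOLO segmentation label file: 'class x1 y1 x2 y2 ...', with
    coordinates normalized to [0, 1] by the image width and height."""
    coords = []
    for x, y in polygon:
        coords.append(f"{x / img_w:.6f}")
        coords.append(f"{y / img_h:.6f}")
    return f"{class_id} " + " ".join(coords)
```

One such line is written per stoma, and the .txt file shares its basename with the corresponding image.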
RF-DETR Model Training Parameters
YOLO annotations were converted into COCO format. The RF-DETR (Roboflow Detection Transformer) model was trained using the same set of images as YOLO, with an 80%-15%-5% training/validation/testing split. Unlike YOLOv12, RF-DETR does not have a built-in augmentation pipeline, so augmentation was performed externally using a Python script available at https://github.com/kjxlau/StomaQuant. Augmentation parameters include scaling, flipping and rotation. Model training was performed on a high-performance computing cluster with an AMD EPYC 7543P 32-core CPU for 300 epochs with a batch size of 4.
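The core of the YOLO-to-COCO conversion is a change of box convention: YOLO stores normalized centre coordinates and sizes, while COCO stores top-left pixel coordinates. The helper below sketches that step under assumed names; the repository's conversion script additionally rebuilds the surrounding images/annotations JSON structure.

```python
def yolo_bbox_to_coco(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box (centre x, centre y, width, height,
    all in [0, 1]) into a COCO-style [x_min, y_min, width, height] box
    in pixel units."""
    box_w = w * img_w
    box_h = h * img_h
    x_min = cx * img_w - box_w / 2.0
    y_min = cy * img_h - box_h / 2.0
    return [x_min, y_min, box_w, box_h]
```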
