A comprehensive dataset for word-wheel water meter reading under challenging conditions

Zhao, Shunyi 1; Gao, Yibo 1; Liu, Fei 1; Zhang, Yuxuan 2; Li, Jonghui 3

Published Jan 30, 2026 on Dryad. https://doi.org/10.5061/dryad.7d7wm3860

Data files

Jan 30, 2026 version files 2.62 GB

datacontent.png (1.29 MB)
README.md (3.40 KB)
Word-Wheel_Water_Meter_Dataset.zip (2.61 GB)

Abstract

We present a comprehensive dataset designed for segmentation, recognition, and classification tasks related to word-wheel type water meter reading. This dataset encompasses a wide range of real-world scenarios, including clear, blurry, reflective, and obstructed images, captured under various environmental conditions. As a result, it provides a robust benchmark for model training and evaluation. It contains over 50,000 water meter images, annotated with segmentation masks, recognition labels, and multi-hot encoded classification labels. These annotations facilitate the training of models for segmentation, recognition, and multi-task classification, enabling them to address various challenges. Technical validation highlights the effectiveness and utility of the dataset in segmentation and recognition tasks across various challenge scenarios.


Description of the data and file structure

Files and variables

File: Word-Wheel_Water_Meter_Dataset.zip

Description: After downloading and unzipping the dataset, the root directory contains two primary subdirectories: one for detection and one for recognition. The detection directory is intended for training models to detect and segment the reading area in water meter images, and it contains a train set and a test set. The two sets share the same directory structure, and each includes the following three items:

det_img: This subdirectory contains the images to be segmented. The train set contains 32,714 images, named sequentially from train0.png to train32713.png, while the test set contains 13,398 images, named from test0.png to test13397.png. All images are in PNG format and have a resolution of 512×512 pixels.
seg_label: This subdirectory contains the segmentation labels. Each label file has the same name as its corresponding image in the det_img directory. The labels are also in PNG format and have the same size as the images.
class_label.csv: This CSV file contains the scenario classification labels for each image. The first column, which serves as the index, contains the file name of the corresponding image in the det_img directory. Columns two through seven contain multi-hot encoded binary strings representing the scenarios of the image.
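
The layout described above can be read with the Python standard library alone. The following is a minimal sketch, not part of the published code: it assumes that class_label.csv is comma-separated, that the first column holds the full file name including the .png extension, that each of columns two through seven holds a single 0 or 1, and that there is no header row unless has_header is set. The function names and the split directory path are hypothetical.

```python
import csv
import io
from pathlib import Path

NUM_SCENARIO_COLUMNS = 6  # columns two through seven of class_label.csv


def parse_class_labels(csv_text, has_header=False):
    """Map each image file name to a list of six 0/1 scenario flags.

    Assumption: one comma-separated row per image, file name first.
    """
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # skip a header row if the file has one
    labels = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name = row[0]
        flags = row[1:1 + NUM_SCENARIO_COLUMNS]
        if len(flags) != NUM_SCENARIO_COLUMNS:
            raise ValueError(f"expected 6 scenario columns for {name!r}")
        labels[name] = [int(flag) for flag in flags]
    return labels


def pair_paths(split_dir, name):
    """Return (image path, mask path); both share the same file name."""
    split_dir = Path(split_dir)
    return split_dir / "det_img" / name, split_dir / "seg_label" / name


# Two made-up rows standing in for the real file contents.
sample = "train0.png,1,0,0,0,0,0\ntrain1.png,0,1,1,0,0,0\n"
labels = parse_class_labels(sample)
image_path, mask_path = pair_paths("detection/train", "train1.png")
print(labels["train1.png"], image_path.as_posix(), mask_path.as_posix())
```

To read the real file, pass `Path(...).read_text()` for the actual class_label.csv location; if the first row turns out to be a header, set `has_header=True`.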

The recognition directory is used to train the digit recognition model. The images are divided into two subdirectories, one for 5-digit readings and one for 6-digit readings. Each of these subdirectories contains a training set and a test set, both of which include the following items:

rec_img: This subdirectory contains local images of the water meter reading area. The 5-digit dataset contains 11,867 training images (train0.jpg to train11866.jpg) and 2,000 test images (test0.jpg to test1999.jpg). The 6-digit dataset contains 2,555 training images (train0.jpg to train2554.jpg) and 400 test images (test0.jpg to test399.jpg). All images are in JPG format with a resolution of 60×200 pixels.
rec_label.csv: This CSV file contains the labels used during the recognition step. The first column contains the file name of the corresponding image in the rec_img directory. The second column contains the 5-digit or 6-digit recognition label of the image. Columns three through five contain the classification labels of the image.
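
One practical detail when reading rec_label.csv is that a reading such as 00123 loses its leading zeros if the second column is parsed as an integer. The sketch below keeps the reading as text and checks its length. It is an illustration under assumptions, not the authors' loader: it assumes a comma-separated file with no header row by default, and the function and key names are hypothetical. If the real file stores readings without leading zeros, the length check will raise and flag the mismatch.

```python
import csv
import io


def parse_rec_labels(csv_text, num_digits, has_header=False):
    """Parse rec_label.csv rows into dictionaries.

    Column 1: image file name; column 2: reading (kept as a string);
    columns 3-5: classification labels (kept as raw strings here).
    """
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # skip a header row if the file has one
    records = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name, reading = row[0], row[1].strip()
        if len(reading) != num_digits or not reading.isdigit():
            raise ValueError(f"unexpected reading {reading!r} for {name!r}")
        records.append({
            "name": name,
            "reading": reading,                  # e.g. "00123"
            "digits": [int(c) for c in reading],  # per-digit targets
            "class_labels": row[2:5],
        })
    return records


# A made-up row standing in for the 5-digit label file.
records = parse_rec_labels("test0.jpg,00123,0,1,0\n", num_digits=5)
print(records[0]["reading"], records[0]["digits"])
```

The same function can be applied to the 6-digit subdirectory by passing `num_digits=6`.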

File: datacontent.png

This image shows representative examples for every class in the dataset.

Code/software

The code used in this paper is available at https://github.com/100000000Bo/water_meter_dataset_technical_validation_python_project.git. We provide code for the detection and recognition tasks, including segmentation models (UNet, PSPNet, DeepLabV3+, and SegFormer) and recognition models (CNNs, VGG16, ResNet, and DenseNet), along with training and testing scripts. The required Python environment is specified in the requirements.txt file. A CSV label script, read_csv.py, is also provided in the classification project to select specific CSV labels from the dataset for model evaluation.
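
To illustrate what selecting images by CSV label for evaluation could look like, here is a small hypothetical stand-in. It is not the repository's read_csv.py, whose interface is not described here. It assumes the scenario labels have already been loaded into a dictionary mapping file names to lists of 0/1 flags, and it refers to scenarios by 0-based column position because the mapping from columns to scenario names is not given above.

```python
def select_by_scenario(labels, column, value=1):
    """Return sorted file names whose flag at `column` (0-based) equals `value`."""
    return sorted(
        name for name, flags in labels.items() if flags[column] == value
    )


# Made-up labels for three test images (six scenario flags each).
labels = {
    "test0.png": [1, 0, 0, 0, 0, 0],
    "test1.png": [0, 1, 1, 0, 0, 0],
    "test2.png": [0, 0, 1, 0, 0, 0],
}

# Evaluate only on images flagged in the third scenario column.
subset = select_by_scenario(labels, column=2)
print(subset)
```

Because the labels are multi-hot, one image can appear in the subsets of several scenarios, so per-scenario evaluation sets may overlap.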