Dataset for worker activity recognition and efficiency estimation during manual harvesting
Data files
Dec 17, 2025 version files (7.63 GB):
- README.md (5.74 KB)
- SantaMaria_2024.zip (6.20 GB)
- SantaMaria_CNNLSTM.zip (1.43 GB)
Abstract
This dataset contains harvest data collected during manual strawberry harvesting with instrumented picking carts in Santa Maria, CA, USA, in 2024. The data includes geo-tagged harvest mass, cart location, and motion recorded by a GPS receiver, an Inertial Measurement Unit (IMU), and load cells. Each data point is annotated as either "Pick" (indicating active picking) or "NoPick" (indicating no active picking). This dataset can be used to train, validate, and test AI algorithms to recognize worker activity during manual fruit harvesting and quantify worker efficiency. It is valuable for researchers and practitioners in precision agriculture and agricultural automation who are working on optimizing labor and field management, as well as developing strawberry harvesting machines or harvest assist systems.
Dataset DOI: 10.5061/dryad.cvdncjtfm
Description of the data and file structure
The main goal of the study was to develop a practical and low-cost yield monitoring system for manually harvested strawberries to encourage widespread adoption. A key decision in the system design was to utilize an affordable GPS that relies on the freely and widely available Satellite Based Augmentation System (SBAS) corrections. The pickers were not trained prior to using the instrumented picking carts (iCarritos) and were deployed in a commercial field with minimal disruption to established harvesting practices.
Data was collected as part of the USDA Agricultural Research Service (USDA-ARS) Areawide Pest Management Grant Program project "Site-Specific Soil Pest Management in Strawberry and Vegetable Cropping Systems Using Crop Rotation and a Needs-Based Variable Rate Fumigation Strategy" through Non-Assistance Cooperative Agreements.
The publicly released dataset consists of two main folders, SantaMaria_CNNLSTM and SantaMaria_2024, each with a corresponding Excel file that records harvested trays. The SantaMaria_CNNLSTM folder was used to train and evaluate the CNN–LSTM model, and to compute and evaluate picker efficiency and tray fill time. The SantaMaria_2024 folder was used to compute picker efficiency and tray fill time over the entire harvest season. The following section describes each component in detail.
SantaMaria_CNNLSTM.zip:
The cart_data subfolder within this directory contains the annotated harvest data used for CNN–LSTM model training and evaluation. It includes multiple CSV files, each corresponding to a specific harvest day, following the naming convention mm-dd-yy_train-ready_all_carts.csv (e.g., 4-10-24_train-ready_all_carts.csv). Each CSV file contains the following columns:
- date_cartID: Combined harvest date and iCarrito identifier.
- GPS_TOW: GPS Time of Week (milliseconds).
- easting, northing: UTM coordinates (metres).
- ax, ay, az: Acceleration along the x, y, and z axes (m s⁻²).
- raw_mass: Raw harvest mass (kg).
- activity: Annotated activity label, either "Pick" or "NoPick".
The break_log subfolder within this directory contains daily break information for each iCarrito. Files follow the naming format mm-dd-yy_break_log.csv and include:
- harvest_date: Harvest date.
- cart_id: iCarrito identifier.
- no_breaks: Number of breaks taken by the corresponding iCarrito.
The accompanying Excel file, harvested_trays.xlsx, records the number of trays harvested per iCarrito for each harvest day, with the following columns:
- cart_id: iCarrito identifier.
- no_trays: Number of trays harvested by that iCarrito.
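As a rough illustration of how the annotated cart_data files might be consumed, the sketch below reads rows in the documented column layout and computes, per date_cartID, the fraction of samples labelled "Pick". The sample rows, the `4-10-24_cart1` identifier format, and the function name `pick_fraction_by_cart` are made up for illustration, and treating the "Pick" fraction as a proxy for picker efficiency is an assumption; the efficiency definition used in the study may differ.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the documented column layout (values are illustrative only;
# the real date_cartID format may differ).
SAMPLE = """date_cartID,GPS_TOW,easting,northing,ax,ay,az,raw_mass,activity
4-10-24_cart1,324000000,730000.0,3870000.0,0.1,0.0,9.8,1.20,Pick
4-10-24_cart1,324000100,730000.1,3870000.0,0.1,0.0,9.8,1.21,Pick
4-10-24_cart1,324000200,730000.2,3870000.0,0.3,0.1,9.7,1.21,NoPick
4-10-24_cart2,324000000,730010.0,3870005.0,0.0,0.0,9.8,0.50,NoPick
"""


def pick_fraction_by_cart(csv_file):
    """Return {date_cartID: fraction of samples labelled "Pick"}."""
    counts = defaultdict(lambda: [0, 0])  # [pick samples, total samples]
    for row in csv.DictReader(csv_file):
        tally = counts[row["date_cartID"]]
        tally[0] += row["activity"] == "Pick"
        tally[1] += 1
    return {cart: pick / total for cart, (pick, total) in counts.items()}


fractions = pick_fraction_by_cart(io.StringIO(SAMPLE))
```

To run this on a real file, pass an open file handle (for example `open(path, newline="")`) instead of the in-memory sample.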
SantaMaria_2024.zip:
This folder contains the complete season-long harvest data from all iCarritos, including break logs and harvested tray records. The subfolders are organised by date (e.g., DD-MM-YY/) and contain raw harvest files (cart1.csv, cart2.csv, …, cartN.csv) corresponding to individual carts. Each CSV file contains ten columns:
- rpi_utc_time: Raspberry Pi UTC timestamp.
- gps_utc_time: GPS UTC timestamp.
- GPS_TOW: GPS Time of Week (milliseconds).
- LAT, LON, HEIGHT: Geographic coordinates (degrees and metres).
- ax, ay, az: Acceleration along the x, y, and z axes (m s⁻²).
- raw_mass: Raw harvest mass (kg).
The break_log subfolder and the Excel file harvested_trays_seasonlong.xlsx follow the same structure and naming conventions as described above, containing daily break records and total harvested trays for all carts across the season.
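The GPS_TOW column counts milliseconds since the start of the GPS week, which begins at midnight between Saturday and Sunday in GPS time. The sketch below converts a GPS_TOW value into a day of the week and a time of day. The function name `tow_ms_to_day_time` is hypothetical, and the result is in the GPS time scale, which differs from UTC by a whole number of leap seconds; the gps_utc_time column is the better source when UTC is needed.

```python
MS_PER_DAY = 86_400_000
GPS_DAYS = ("Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday")


def tow_ms_to_day_time(tow_ms):
    """Split a GPS Time of Week in milliseconds into (day, hours, minutes, seconds)."""
    if not 0 <= tow_ms < 7 * MS_PER_DAY:
        raise ValueError("GPS_TOW must lie within one week")
    day_index, ms_in_day = divmod(tow_ms, MS_PER_DAY)
    hours, rem = divmod(ms_in_day, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds = rem / 1000  # keep the sub-second part as a float
    return GPS_DAYS[day_index], hours, minutes, seconds


# Three full days plus 14 h 30 min 15.5 s into the GPS week.
example = tow_ms_to_day_time(311_415_500)
```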
The dataset provides comprehensive, annotated time-series data suitable for training and evaluating activity recognition models, as well as for further research in harvest process optimisation and labour efficiency analysis.
Dataset directory structure
datasets/
├── SantaMaria_CNNLSTM/
│ ├── cart_data/
│ │ ├── MM-DD-YY_train-ready_all_carts.csv
│ │ ├── ...
│ │ └── MM-DD-YY_train-ready_all_carts.csv
│ ├── break_log/
│ │ ├── MM-DD-YY_break_log.csv
│ │ ├── ...
│ │ └── MM-DD-YY_break_log.csv
│ └── harvested_trays.xlsx
│
├── SantaMaria_2024/
│ ├── DD-MM-YY/
│ │ ├── cart1.csv
│ │ ├── cart2.csv
│ │ ├── ...
│ │ └── cartN.csv
│ ├── break_log/
│ │ ├── MM-DD-YY_break_log.csv
│ │ ├── ...
│ │ └── MM-DD-YY_break_log.csv
│ └── harvested_trays_seasonlong.xlsx
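Given the layout above, one possible way to enumerate the season-long cart files is sketched below. The helper `list_cart_files` is hypothetical and assumes that every subfolder of SantaMaria_2024 other than break_log is a date folder holding files named cartN.csv; the demonstration builds a throwaway tree with made-up names rather than reading the real archive.

```python
import re
import tempfile
from pathlib import Path

CART_FILE = re.compile(r"cart(\d+)\.csv")


def list_cart_files(season_root):
    """Map each date folder name to its cart CSV paths, ordered by cart number."""
    files_by_day = {}
    for day_dir in sorted(Path(season_root).iterdir()):
        # break_log holds break records, not cart data, so it is skipped.
        if not day_dir.is_dir() or day_dir.name == "break_log":
            continue
        numbered = []
        for path in day_dir.iterdir():
            match = CART_FILE.fullmatch(path.name)
            if match:
                numbered.append((int(match.group(1)), path))
        # Sort numerically so cart10.csv follows cart2.csv, not cart1.csv.
        files_by_day[day_dir.name] = [path for _, path in sorted(numbered)]
    return files_by_day


# Demonstration on a temporary tree; the folder and file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "SantaMaria_2024"
    day = root / "10-04-24"
    day.mkdir(parents=True)
    for name in ("cart1.csv", "cart10.csv", "cart2.csv"):
        (day / name).touch()
    (root / "break_log").mkdir()
    found = {d: [p.name for p in paths]
             for d, paths in list_cart_files(root).items()}
```

Sorting by the numeric suffix rather than by file name keeps carts in their natural order when there are ten or more per day.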
Code/software
https://github.com/uddhavbhattarai/HumanActivityRecognitionEfficiencyEstimation
Human subjects data
While this dataset is derived from field operations involving human workers, it does not contain any personally identifiable information (PII). The dataset consists solely of sensor data collected from instrumented strawberry-picking carts (iCarritos), including GPS coordinates, timestamps, IMU measurements, and load-cell readings. No personal attributes, names, worker identifiers, or demographic information were collected. Carts were used interchangeably by multiple pickers during commercial harvesting operations, and no individual was permanently or uniquely associated with any specific cart. As a result, there is no linkage between the recorded data and any identifiable person.
Data was collected from instrumented picking carts, iCarritos, in a commercial strawberry field in Santa Maria, California, growing the Fronteras variety. Plants were cultivated in raised beds with a width of 110 cm, each containing four parallel strawberry rows.
iCarritos were developed by instrumenting wheelbarrow-style, wire-frame picking carts (known as carritos). Each cart was equipped with a SwiftNav Piksi GNSS unit with an integrated inertial measurement unit (IMU) to record the geospatial position of the harvest and the cart motion. The GNSS unit had a horizontal Circular Error Probable (CEP) accuracy of 0.75 m. Two load cells, installed at the front and rear of the cart, measured harvest mass; the recorded harvest mass was obtained by averaging the front and rear readings.
A Raspberry Pi Zero W microcomputer served as the central processing unit. An SD card held the carrito software and stored the data, which was logged at 10 Hz during harvest.
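Because data is logged at 10 Hz, consecutive GPS_TOW values in a cart file should be about 100 ms apart. The sketch below flags larger steps as possible dropped samples. The function name `find_gaps` and the 50 ms tolerance are assumptions chosen for illustration, and the check ignores the GPS week rollover, which would appear as a large negative step.

```python
NOMINAL_PERIOD_MS = 100  # 10 Hz logging rate stated for the iCarritos


def find_gaps(tow_ms_values, tolerance_ms=50):
    """Return (previous_tow, current_tow, estimated_missing_samples) for each gap."""
    gaps = []
    for prev, curr in zip(tow_ms_values, tow_ms_values[1:]):
        step = curr - prev
        if step > NOMINAL_PERIOD_MS + tolerance_ms:
            # A step of k nominal periods implies k - 1 missing samples.
            missing = round(step / NOMINAL_PERIOD_MS) - 1
            gaps.append((prev, curr, missing))
    return gaps


# Made-up timestamps: three samples are missing between 200 ms and 600 ms.
gaps = find_gaps([0, 100, 200, 600, 700])
```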
