Dataset for precision yield estimation and mapping in manual strawberry harvesting
Data files
Version of Feb 03, 2026 (4.84 GB in total):
- datasets.zip (4.84 GB)
- README.md (4.90 KB)
Abstract
This dataset contains harvest data collected during manual strawberry harvesting with instrumented picking carts (iCarritos) in two California fields over one harvest season. The data include geotagged harvest mass and cart motion information recorded by load cells, a GPS receiver, and inertial measurement units (IMUs). The dataset enables data-driven yield estimation and mapping for manually harvested strawberries. It could also be used to optimize robotic strawberry harvesters or harvest-assist systems, such as crop transport systems. The dataset is intended for researchers and practitioners in precision agriculture as a foundation for developing scalable automation technology for specialty crop harvesting.
Dataset DOI: 10.5061/dryad.v6wwpzh7h
Description of the data and file structure
Background
The main goal of the study was to develop a practical, low-cost yield monitoring system for manually harvested strawberries that would encourage widespread adoption. A key design decision was to use an affordable GPS receiver that relies on freely available Satellite-Based Augmentation System (SBAS) corrections. Pickers were not trained before using the iCarritos, and the system was deployed in commercial fields with minimal disruption to established harvesting practices. Data were collected as part of the USDA Agricultural Research Service (USDA-ARS) Areawide Pest Management Grant Program project, “Site-Specific Soil Pest Management in Strawberry and Vegetable Cropping Systems Using Crop Rotation and a Needs-Based Variable Rate Fumigation Strategy,” through Non-Assistance Cooperative Agreements.
Value of the Data
- This is a comprehensive dataset for developing and validating precision yield estimation and mapping techniques for manually harvested crops, especially strawberries.
- The dataset enables the data-driven analysis of spatial and temporal yield variability in strawberry fields, which can inform decision-making in field management, including nutrient application, irrigation, and pest control.
- Researchers in precision agriculture can reuse this data to develop new algorithms for worker activity recognition, efficiency estimation, yield forecasting, and crop management strategies.
- Agricultural engineers and data scientists can utilize this dataset to improve existing data processing pipelines or develop new ones for handling sensor data from manual harvesting operations.
- The data were gathered with scalable, low-cost picking carts, an approach that could offer a practical route to yield estimation and mapping for manually harvested crops, benefiting both researchers and industry practitioners.
Files and variables
File: datasets.zip
Description: The publicly released dataset consists of three main folders: BedCenters, SantaMaria, and Salinas. The BedCenters folder contains two files, Salinas2024.txt and SantaMaria2024.txt, each giving the coordinates of the endpoints of the raised-bed centerlines in the corresponding field. The SantaMaria and Salinas folders contain harvest data from all carts collected throughout the harvest season. Each component is described in detail below.
SantaMaria and Salinas folders:
These folders contain complete season-long harvest data from all iCarritos, along with records of harvested trays. The subfolders are organized by date (e.g., DD-MM-YY/) and contain raw harvest files (cart1.csv, cart2.csv, …, cartN.csv), one per cart. Each CSV file contains the following columns:
- GPS_TOW: GPS Time of Week (milliseconds).
- LAT, LON, HEIGHT: Geographic coordinates (degrees for LAT and LON; meters for HEIGHT).
- ax, ay, az: Acceleration along the x, y, and z axes (m s⁻²).
- raw_mass: Raw harvest mass (kg).
The accompanying Excel file, harvested_trays.xlsx, records the number of trays harvested by each iCarrito on each harvest day, with the following columns:
- cart_id: iCarrito identifier.
- no_trays: Number of trays harvested by that iCarrito.
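For readers loading the cart files, a minimal Python sketch of a reader for the CSV columns listed above follows. It uses only the standard library and assumes that each cartN.csv has a header row with exactly these column names and comma delimiters; `CartSample` and `read_cart_csv` are hypothetical helpers, not part of the dataset or the linked repository.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CartSample:
    """One row of a cartN.csv file (hypothetical container)."""
    gps_tow_ms: int  # GPS Time of Week, milliseconds
    lat: float       # degrees
    lon: float       # degrees
    height: float    # meters
    ax: float        # m/s^2
    ay: float        # m/s^2
    az: float        # m/s^2
    raw_mass: float  # kg


def read_cart_csv(lines: Iterable[str]) -> List[CartSample]:
    """Parse cart CSV text; assumes a header row with the documented column names."""
    samples = []
    for row in csv.DictReader(lines):
        samples.append(CartSample(
            gps_tow_ms=int(float(row["GPS_TOW"])),  # tolerate values written like "345600000.0"
            lat=float(row["LAT"]),
            lon=float(row["LON"]),
            height=float(row["HEIGHT"]),
            ax=float(row["ax"]),
            ay=float(row["ay"]),
            az=float(row["az"]),
            raw_mass=float(row["raw_mass"]),
        ))
    return samples
```

An open file can be passed directly, e.g. `with open(path, newline="") as f: samples = read_cart_csv(f)`. Empty fields raise `ValueError` from `float`, which may need handling depending on how missing GPS fixes are recorded in the files.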
Dataset directory structure
datasets/
├── BedCenters/
│   ├── Salinas2024.txt
│   └── SantaMaria2024.txt
├── SantaMaria/
│   ├── DD-MM-YY/
│   │   ├── cart1.csv
│   │   ├── cart2.csv
│   │   ├── ...
│   │   └── cartN.csv
│   └── harvested_trays.xlsx
└── Salinas/
    ├── DD-MM-YY/
    │   ├── cart1.csv
    │   ├── cart2.csv
    │   ├── ...
    │   └── cartN.csv
    └── harvested_trays.xlsx
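The directory layout above can be traversed programmatically. The sketch below assumes that the date folders are literally named in DD-MM-YY form (for example 15-05-24) and that cart files match cart*.csv; `iter_cart_files` is a hypothetical helper. It orders days chronologically rather than by folder name, and places cart2 before cart10.

```python
from datetime import date, datetime
from pathlib import Path
from typing import Iterator, List, Tuple

SITES = ("SantaMaria", "Salinas")


def iter_cart_files(root: Path) -> Iterator[Tuple[str, date, str, Path]]:
    """Yield (site, harvest_date, cart_name, csv_path) for each cart CSV under root.

    Assumes the layout <root>/<site>/<DD-MM-YY>/cartN.csv shown above.
    """
    for site in SITES:
        site_dir = root / site
        if not site_dir.is_dir():
            continue
        dated: List[Tuple[date, Path]] = []
        for day_dir in site_dir.iterdir():
            if not day_dir.is_dir():
                continue  # skips harvested_trays.xlsx and any other files
            try:
                dated.append((datetime.strptime(day_dir.name, "%d-%m-%y").date(), day_dir))
            except ValueError:
                continue  # not a DD-MM-YY folder
        for harvest_date, day_dir in sorted(dated):  # chronological, not lexical
            # Order cart2 before cart10 by sorting on (name length, name).
            carts = sorted(day_dir.glob("cart*.csv"), key=lambda p: (len(p.stem), p.stem))
            for csv_path in carts:
                yield site, harvest_date, csv_path.stem, csv_path
```

Typical use would be `for site, day, cart, path in iter_cart_files(Path("datasets")): ...`, which makes it straightforward to group files by site and harvest day.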
Code/software
https://github.com/uddhavbhattarai/iCarritoYieldEstimationandMapping.git
Human subjects data
While this dataset was derived from field operations involving human workers, it does not contain any personally identifiable information (PII). The dataset consists solely of sensor data collected from instrumented strawberry-picking carts (iCarritos), including GPS coordinates, timestamps, IMU measurements, and load-cell readings. No personal attributes, names, worker identifiers, or demographic information were collected. Carts were used interchangeably by multiple pickers during commercial harvesting operations, and no individual was permanently or uniquely associated with any specific cart. As a result, there is no linkage between the recorded data and any identifiable person.
Methods
Data were collected with iCarritos, which were built by modifying a conventional wireframe structure with a wheelbarrow system and integrating sensors, mounting systems, and control hardware. Two load cells were installed at the front and back of each iCarrito, and the harvest mass was obtained by averaging their readings. A key design goal was to keep the iCarritos low-cost and practical. To this end, each cart used an affordable Piksi Multi GNSS unit (Swift Navigation Inc., USA) with an integrated IMU. The GNSS unit used SBAS corrections for localization and has a rated horizontal Circular Error Probable (CEP) accuracy of 0.75 meters. This unit measured location (latitude and longitude) as well as acceleration along the x, y, and z axes (ax, ay, az).
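As one illustration of how the ax, ay, az columns might be used, for example in the worker activity recognition mentioned above, the sketch below flags samples where the cart is likely moving. It assumes that the accelerometer reports specific force including gravity, so a stationary cart reads roughly 9.81 m s⁻² in magnitude regardless of tilt. The window length and threshold are hypothetical tuning values, not values from the study.

```python
import math
from typing import List, Sequence

STANDARD_GRAVITY = 9.80665  # m/s^2


def motion_flags(ax: Sequence[float], ay: Sequence[float], az: Sequence[float],
                 window: int = 10, threshold: float = 0.3) -> List[bool]:
    """Return True for samples where the cart is likely moving.

    For each sample, abs(|a| - g) is averaged over a centered window of about
    `window` samples (about 1 s at 10 Hz); averages above `threshold` (m/s^2)
    count as motion. Both defaults are hypothetical, not values from the study.
    Assumes the accelerations include gravity (raw specific force).
    """
    deviation = [abs(math.sqrt(x * x + y * y + z * z) - STANDARD_GRAVITY)
                 for x, y, z in zip(ax, ay, az)]
    half = window // 2
    flags = []
    for i in range(len(deviation)):
        lo, hi = max(0, i - half), min(len(deviation), i + half + 1)
        flags.append(sum(deviation[lo:hi]) / (hi - lo) > threshold)
    return flags
```

Using the acceleration magnitude rather than a single axis makes the flag insensitive to how the unit is oriented on the cart; if the logged values turn out to be gravity-compensated, the comparison against `STANDARD_GRAVITY` would need to be replaced by a comparison against zero.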
A Raspberry Pi Zero W (Raspberry Pi Foundation, UK) microcomputer served as the central processing unit, with an SD card used to run the cart software and store data during harvest operations. Data were recorded at 10 Hz. The control system consisted of a main power toggle switch connected directly to the battery, a control switch for selecting operating modes, and two status LEDs for system feedback.
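Because GPS_TOW is in milliseconds and the nominal logging rate is 10 Hz, consecutive samples should be about 100 ms apart. The sketch below converts GPS_TOW to seconds elapsed since the first sample and lists indices where the step exceeds 1.5 times the nominal period. It also handles the reset of GPS time of week to zero at the start of each GPS week (604,800 s). The helper names and the 1.5 tolerance are assumptions chosen for illustration.

```python
from typing import List, Sequence

MS_PER_WEEK = 7 * 24 * 3600 * 1000  # GPS time of week restarts at 0 after this many ms


def elapsed_seconds(gps_tow_ms: Sequence[int]) -> List[float]:
    """Convert GPS_TOW (ms) to seconds since the first sample, unwrapping a week rollover."""
    out: List[float] = []
    offset, prev = 0, None
    for t in gps_tow_ms:
        # A large backward jump means TOW wrapped to the start of a new GPS week.
        if prev is not None and t < prev - MS_PER_WEEK // 2:
            offset += MS_PER_WEEK
        out.append((t + offset - gps_tow_ms[0]) / 1000.0)
        prev = t
    return out


def find_gaps(seconds: Sequence[float], nominal_hz: float = 10.0,
              tolerance: float = 1.5) -> List[int]:
    """Indices i where seconds[i] - seconds[i - 1] exceeds tolerance * nominal period."""
    limit = tolerance / nominal_hz
    return [i for i in range(1, len(seconds)) if seconds[i] - seconds[i - 1] > limit]
```

For a list of GPS_TOW values, `find_gaps(elapsed_seconds(tow_values))` returns the positions where logging paused or samples were dropped, which can be useful before integrating motion or mass over time.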
Before the iCarritos were taken into the field, each cart's load cells were calibrated. After each harvest, the collected data were reviewed and the load cells were recalibrated if necessary. Commercial strawberry pickers, experienced in manual harvesting but not specifically trained on the iCarritos, used these instrumented carts during normal harvesting operations. The pickers were compensated for their time using the iCarritos in the trial field and were instructed to use them as they would their regular carts, with minimal changes to their normal harvesting routine.
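The calibration procedure itself is not specified here. A common approach for load cells is a two-point linear calibration: record the reading with the empty cart and with a known reference mass, then scale linearly between them. The sketch below assumes that approach, and further assumes that each cell is calibrated so that it estimates the total load, which is what makes averaging the front and back cells meaningful. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadCellCalibration:
    """Hypothetical two-point (empty cart + known mass) linear calibration for one cell."""
    zero_reading: float  # raw reading with the empty cart
    kg_per_count: float  # scale factor, kg per raw count

    @classmethod
    def from_two_points(cls, zero_reading: float, span_reading: float,
                        known_mass_kg: float) -> "LoadCellCalibration":
        if span_reading == zero_reading:
            raise ValueError("span and zero readings must differ")
        return cls(zero_reading, known_mass_kg / (span_reading - zero_reading))

    def to_kg(self, reading: float) -> float:
        return (reading - self.zero_reading) * self.kg_per_count


def cart_mass_kg(front: LoadCellCalibration, back: LoadCellCalibration,
                 front_reading: float, back_reading: float) -> float:
    """Average the two calibrated cells, each assumed to estimate the total load."""
    return 0.5 * (front.to_kg(front_reading) + back.to_kg(back_reading))
```

Re-recording only the empty-cart reading after each harvest would update `zero_reading` without a reference mass; whether that suffices depends on how much the scale factor drifts, which this sketch does not model.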
Studies were conducted in two major strawberry-producing regions of California during the 2024 growing season. The selected fields were commercial strawberry fields in Santa Maria, CA, growing the Fronteras variety, and in Salinas, CA, growing the Cabrillo variety. The raised beds were 110 cm wide in Santa Maria and 75 cm wide in Salinas. The raised-bed centers were also pre-mapped using RTK-GNSS, which provides centimeter-level accuracy under field conditions.
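The RTK-mapped bed centers make it possible to associate each cart position with a bed. A minimal sketch follows. It assumes that each bed in BedCenters/*.txt can be parsed into a pair of (latitude, longitude) endpoints; the file format is not described here, so parsing is left out. Coordinates are projected to a local east/north frame with an equirectangular approximation, which is adequate over a single field, and the function returns the nearest centerline and the distance to it. Note that the rated 0.75 m CEP of the SBAS positions is comparable to the 75 cm bed width in Salinas, so single-fix assignments should be treated with caution; smoothing along the cart track may help.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, meters
Point = Tuple[float, float]


def to_local_xy(lat: float, lon: float, lat0: float, lon0: float) -> Point:
    """(east, north) offset in meters from (lat0, lon0), equirectangular approximation."""
    east = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    north = math.radians(lat - lat0) * EARTH_RADIUS_M
    return east, north


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to the segment a-b (planar coordinates, meters)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def nearest_bed(lat: float, lon: float, beds: List[Tuple[Point, Point]]) -> Tuple[int, float]:
    """Index of the nearest bed centerline and the distance to it in meters.

    `beds` holds ((lat1, lon1), (lat2, lon2)) endpoint pairs, assumed to be
    parsed from BedCenters/*.txt (the parsing step is not shown).
    """
    lat0, lon0 = beds[0][0]  # local origin at the first endpoint
    p = to_local_xy(lat, lon, lat0, lon0)
    best_i, best_d = -1, math.inf
    for i, (e1, e2) in enumerate(beds):
        d = point_segment_distance(p, to_local_xy(*e1, lat0, lon0),
                                   to_local_xy(*e2, lat0, lon0))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```

Clamping the projection to the segment means positions beyond the end of a bed (for example on a headland) are measured to the nearest endpoint rather than to an extended line, so large returned distances can be used to filter out off-bed fixes.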
