Data from: Investigating human repeatability of a computer vision based task to identify meristems on a potato plant (Solanum tuberosum)
Data files
Version of Jan 08, 2022; files total 574.18 KB
Abstract
Labelled training data in artificial intelligence (AI) is used to teach so-called 'supervised learning models'. However, such data may contain errors or bias, which can reduce model prediction accuracy. Obtaining accurate training data is therefore of high importance. In applications of AI such as classification and detection problems, raw training data is not always made available in published research. Likewise, the process of obtaining labelled data is not always documented well enough to enable reproducibility. This data set captures a repeatability exercise in AI training data collection for a task that is difficult for humans to perform: delineating a bounding box around a growing apical meristem in a two-dimensional image of a potato plant.
Methods
Labelled images for the repeatability exercise were acquired by multiple observers, each identifying bounding boxes of the apical meristems on potato plants from images. Repeatability of bounding box identification was assessed by two separate methods: 'live labelling' (an expert was present, indicating the centre of each meristem) and 'computer labelling' (the observer identified the bounding boxes without an expert supervising). Each of n = 10 unique images was labelled three times (thus yielding n = 30 bounding box sets per observer). In this experiment, ten observers completed the computer labelling task, and three observers also completed the live labelling task. Bounding box coordinates were captured via a graphical user interface program, adapted from the popular program Yolo_mark (https://github.com/weharris/yolo_mark_utility).
Usage notes
Contents
• “SURNAME-stem-repeatability.zip” - folder containing 30 images (10 unique) and the bounding box data capture program “desktop_image_labeller.py”
• “Tuber-stem-repeatability-instructions.docx” - instructions for observers taking part in the computer experiment
• “further-information-distance-between-centres.docx” - details on the distance-between-centres measurement
• “boxes.xlsx” - dataset with information on each individual bounding box
• “c_dist.xlsx” - dataset of the distances between the centres
• “stems.xlsx” - dataset providing information on the number of stems identified per image
• “repeatability-images-cheat-sheet.xlsx” - dataset providing a key to the unique images that have been replicated three times
• “DRYAD-README.docx” - usage notes for the repeatability data capture and associated documents
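
To illustrate the kind of quantity a distance between the centres refers to, the following is a minimal Python sketch, not the authors' procedure. It assumes that each bounding box is given as pixel corner coordinates (x_min, y_min, x_max, y_max) and that the distance is the Euclidean distance between box centres. Both are assumptions made for illustration; the definition actually used is described in “further-information-distance-between-centres.docx”. The function names and the example coordinates are hypothetical and are not taken from the dataset.

```python
import math
from itertools import combinations


def box_centre(box):
    """Return the (x, y) centre of a box given as (x_min, y_min, x_max, y_max).

    The corner-coordinate layout is an assumption for this sketch; the
    dataset's own column layout may differ.
    """
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def centre_distance(box_a, box_b):
    """Euclidean distance between the centres of two boxes (assumed metric)."""
    ax, ay = box_centre(box_a)
    bx, by = box_centre(box_b)
    return math.hypot(ax - bx, ay - by)


def mean_pairwise_centre_distance(boxes):
    """Mean centre distance over all pairs of replicate boxes."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        raise ValueError("at least two boxes are needed")
    return sum(centre_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical example: one meristem labelled three times by one observer.
# These pixel coordinates are made up and do not come from the dataset.
replicates = [
    (100, 100, 140, 140),  # centre (120, 120)
    (103, 104, 143, 144),  # centre (123, 124)
    (100, 108, 140, 148),  # centre (120, 128)
]

mean_distance = mean_pairwise_centre_distance(replicates)
print(f"Mean pairwise centre distance: {mean_distance:.1f} px")
```

A smaller mean pairwise distance across the three replicates of an image would indicate that the observer placed the box more consistently. Matching boxes between replicates when an image contains several meristems is not handled in this sketch.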