Dryad

Statistical filtering to aid in the classification of phytoplankton: The effects of image library size and phytoplankton shape

Data files

Mar 19, 2026 version files 24.25 KB


Abstract

The demand for image classification methods has increased due to technological advancements that enable more intensive phytoplankton monitoring. Whether the methods are based on statistical or machine learning algorithms, algal taxa may be misidentified in taxonomically diverse samples, in which phytoplankton morphology and image traits can be variable. We evaluated statistical filtering performance for two approaches to image library development, which we applied independently to seven commonly occurring algal shapes. To do so, we used the statistical filter in the image processing software of an imaging flow cytometer (FlowCAM) on previously classified samples. One statistical filtering approach (intrinsic) used a small selection of images (5-15 images of a target taxon) from the same sample being filtered, and the other (compiled) used a larger selection of images (30-80 images of a target taxon) compiled from different samples. Filter accuracy, precision, and recall varied with the type of image library, image library size, and target taxon. The largest image libraries offered high recall (>90%) but low accuracy and precision with both image library building approaches. For the largest image libraries, accuracy and precision were higher for the intrinsic method (>90% and 72-97%, respectively) than for the compiled method (>40% and 10-20%, respectively, for most taxa). Statistical filtering performance was higher for larger, solitary-celled taxa with relatively uniform features (e.g., Gyrosigma) than for small-celled colonial species with more complex or variable shapes (e.g., mucilaginous colonial cyanobacteria and Scenedesmus). Results indicate that statistical filtering can be used to augment manual sample classification.
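
The abstract reports filter accuracy, precision, and recall without defining them. The sketch below shows the standard confusion-matrix definitions as they would apply to filtering one target taxon, assuming each image in a previously classified sample is counted as a true positive, false positive, false negative, or true negative. The function name `filter_metrics` and the example counts are hypothetical illustrations; they are not taken from the dataset or from the FlowCAM software.

```python
import math


def filter_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) for one target taxon.

    tp: target images the filter selected
    fp: non-target images the filter selected
    fn: target images the filter missed
    tn: non-target images the filter correctly left out
    """
    total = tp + fp + fn + tn
    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else math.nan
    precision = tp / (tp + fp) if (tp + fp) else math.nan
    recall = tp / (tp + fn) if (tp + fn) else math.nan
    return accuracy, precision, recall


# Hypothetical counts (not from the study): the filter finds almost every
# target image but also selects many non-target images.
accuracy, precision, recall = filter_metrics(tp=48, fp=300, fn=2, tn=650)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

With counts like these, recall stays high because few target images are missed, while precision drops because most selected images are not the target taxon. This is one way a filter can combine high recall with low precision.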