Data from: Advancing mold identification in the routine laboratory: Performance of smartphone-based imaging and a newly developed Convolutional Neural Network

Weber, Lukas1; Brüningk, Sarah2; Schulthess, Bettina1; Stillhart, Gloria1; Bressan, Michelle1; Pulver-Vontobel, Nadja1; Schimetzki, Jasmin1; Puthan, Rosmi1; Egli-Berini, Andrea3; Nolte, Oliver1; Egli, Adrian1

Published Nov 28, 2025 on Dryad. https://doi.org/10.5061/dryad.cjsxksnj4

Data files

Nov 28, 2025 version files 7.07 GB

Abstract

Background: Mold identification in clinical diagnostics is traditionally labor-intensive and dependent on expert interpretation. MoldVision is a deep learning approach that uses smartphone images of mold cultures to automate identification.

Methods: We analyzed 161 clinical isolates across four common mold genera: Penicillium spp., Aspergillus spp. (with A. flavus and A. fumigatus), Fusarium spp., and Cladosporium spp. Daily images were captured from the top and bottom of culture plates over five days using a standardized smartphone setup, generating over 4,000 images. We trained three variants of the VGG16 convolutional neural network (CNN) and benchmarked the best-performing model (VGG16 with dual classification heads) against LightGBM models trained on pre-extracted features and against human expert assessments at various time points.

Results: The best-performing VGG16 model achieved a mean ± SD ROC-AUC of 92.7% ± 1.8% and a sensitivity of 68.7% ± 2.6% across all species. Performance was best for Cladosporium spp. (ROC-AUC 99.9% ± 0.1%; mean and SD over 5-fold cross-validation). Over time, early-stage classification (days 1-2) was challenging (F1-score 38.8% ± 3.5% across all species) but improved significantly on days 3-5 (F1-score 92.1% ± 2.8% across all species). Compared with experts, MoldVision consistently showed superior performance, particularly in mature cultures, detecting subtle morphological features earlier and more accurately.

Conclusions: Our results demonstrate that CNNs integrated with low-cost smartphone imaging can reliably classify mold species in routine diagnostics, outperforming human experts in many cases. This approach offers a practical and scalable solution for laboratories lacking specialized mycology expertise, especially in resource-limited settings.

Dataset DOI: 10.5061/dryad.cjsxksnj4

Description of the data and file structure

This dataset contains image files of Penicillium spp., Aspergillus spp. (with Aspergillus flavus and Aspergillus fumigatus), Fusarium spp., and Cladosporium spp. isolates, photographed from the top and bottom of a Sabouraud agar plate over a 5-day period. Each file is named according to a standardized convention to facilitate identification and reuse.

Files and variables

File: moldvision_database.csv

Description: A CSV file that indexes the image files and is provided for ease of use in machine learning applications.

Variables

filename: name of the image file
class: species of fungal mold
top/bottom: orientation of the agar plate in the picture ("top" or "bottom")
ID: three‐digit subject identifier (from 1 upwards, e.g., 024, 003)
day: day of growth after incubation
rotation_index: separator index to indicate image rotation (from 1 upwards)
time_stratification: string label used to stratify images into early (days 1-2) and late (days 3-5)
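
As a rough illustration of how the CSV might be used, the sketch below counts images per class and plate orientation using only the Python standard library. It assumes that the header row uses the variable names exactly as listed above (for example "class" and "top/bottom"); this has not been checked against the file itself. The sample rows and the function name count_images are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def count_images(csv_file):
    """Count images per (class, orientation) from an open CSV file object.

    Assumes the header row contains the columns "class" and "top/bottom",
    as listed in the variable description; adjust if the real header differs.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row["class"], row["top/bottom"])] += 1
    return counts


# Hypothetical rows for illustration only; values are NOT from the dataset.
SAMPLE = (
    "filename,class,top/bottom,ID,day,rotation_index,time_stratification\n"
    "example_a.jpg,Fusarium,top,024,1,1,placeholder\n"
    "example_b.jpg,Fusarium,top,024,1,2,placeholder\n"
    "example_c.jpg,Fusarium,bottom,024,1,1,placeholder\n"
    "example_d.jpg,Cladosporium,top,003,4,1,placeholder\n"
)

sample_counts = count_images(io.StringIO(SAMPLE))

# With the real file, something like the following should work:
# with open("moldvision_database.csv", newline="") as f:
#     counts = count_images(f)
```

Reading ID as a string, as csv.DictReader does, preserves the leading zeros of the three-digit identifier.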

File: moldvision_database.zip

Description: A ZIP archive containing all image files

Filenames follow the pattern:

class_top-bottom_ID_day<d>_rot<r>.jpg

Image format: All files are compressed JPEG (.jpg) images captured at 3000×3000 pixels, 24‐bit depth.

Each image depicts a single sample at a specific orientation, on a given day after incubation, with a random rotation angle (implied by rot index).
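
The naming convention can be decoded programmatically. The following is a minimal sketch, assuming that the orientation field is the literal word "top" or "bottom", that the ID is exactly three digits, and that day and rotation appear as "day" and "rot" followed by an integer. The function name parse_filename, the example filenames, and the "early"/"late" labels are hypothetical; the actual strings stored in time_stratification may differ.

```python
import re

# Assumed rendering of the pattern class_top-bottom_ID_day<d>_rot<r>.jpg.
# The class part is matched greedily so that class names containing
# underscores (if any exist) are still captured as a whole.
FILENAME_RE = re.compile(
    r"(?P<cls>.+)_(?P<side>top|bottom)_(?P<id>\d{3})"
    r"_day(?P<day>\d+)_rot(?P<rot>\d+)\.jpg"
)


def parse_filename(name):
    """Split an image filename into its documented components."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"filename does not follow the expected pattern: {name}")
    day = int(match["day"])
    return {
        "class": match["cls"],
        "orientation": match["side"],
        "ID": match["id"],  # kept as a string to preserve leading zeros
        "day": day,
        "rotation_index": int(match["rot"]),
        # Hypothetical labels mirroring the early (days 1-2) / late (days 3-5) split.
        "stage": "early" if day <= 2 else "late",
    }


# Hypothetical filenames for illustration only.
example_early = parse_filename("Fusarium_top_024_day1_rot1.jpg")
example_late = parse_filename("Aspergillus_flavus_bottom_003_day4_rot2.jpg")
```

Raising an error on non-matching names, rather than returning partial results, makes any deviation from the assumed pattern visible immediately.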

Metadata and Attributes

No separate metadata file is provided; all relevant identifiers are embedded in the filename.