Data from: Advancing mold identification in the routine laboratory: Performance of smartphone-based imaging and a newly developed Convolutional Neural Network
Data files
Nov 28, 2025 version (files: 7.07 GB)
- moldvision_database.csv (233.75 KB)
- moldvision_database.zip (7.07 GB)
- README.md (1.81 KB)
Abstract
Background: Mold identification in clinical diagnostics is traditionally labor-intensive and depends on expert interpretation. MoldVision is a deep learning approach that uses smartphone images of mold cultures to automate identification.
Methods: We analyzed 161 clinical isolates across four common mold genera: Penicillium spp., Aspergillus spp. (with A. flavus and A. fumigatus), Fusarium spp., and Cladosporium spp. Daily images were captured from the top and bottom of culture plates over five days using a standardized smartphone setup, generating over 4,000 images. We trained three variations of VGG16 convolutional neural networks (CNN) and benchmarked the best-performing model (VGG16 with dual classification heads) against LightGBM models trained on pre-extracted features and against human expert assessments at various time points.
Results: The best-performing VGG16 model achieved a mean (SD) ROC-AUC of 92.7% ± 1.8% and a sensitivity of 68.7% ± 2.6% across all species. Performance was best for Cladosporium spp. (ROC-AUC 99.9% ± 0.1%; mean and SD over 5-fold cross-validation). Over time, early-stage classification (days 1-2) was challenging (F1-score 38.8% ± 3.5% across all species) but improved significantly on days 3-5 (F1-score 92.1% ± 2.8% across all species). Compared to experts, MoldVision consistently showed superior performance, particularly in mature cultures, detecting subtle morphological features earlier and more accurately.
Conclusions: Our results demonstrate that CNNs integrated with low-cost smartphone imaging can reliably classify mold species in routine diagnostics, outperforming human experts in many cases. This approach offers a practical and scalable solution for laboratories lacking specialized mycology expertise, especially in resource-limited settings.
Dataset DOI: 10.5061/dryad.cjsxksnj4
Description of the data and file structure
This dataset contains image files of Penicillium spp., Aspergillus spp. (with Aspergillus flavus and Aspergillus fumigatus), Fusarium spp., and Cladosporium spp. isolates, photographed over a 5-day period from the top and bottom of a Sabouraud agar plate. Each file is named according to a standardized convention to facilitate identification and reuse.
Files and variables
File: moldvision_database.csv
Description: A CSV file that serves as a database of the image files and eases access for machine learning applications
Variables
- filename
- class: species of fungal mold
- top/bottom: orientation of the agar plate in the picture ("top" or "bottom")
- ID: three‐digit subject identifier (from 1 upwards, e.g., 024, 003)
- day: day of growth after incubation
- rotation_index: separator index to indicate image rotation (from 1 upwards)
- time_stratification: special string to stratify into early (Day 1-2) and late images (Day 3-5)
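As an illustration of how the variables above might be used, the following minimal sketch reads the CSV with Python's standard library and counts images per class and growth stage. It assumes that the `day` column holds plain integers and derives the early (days 1-2) and late (days 3-5) strata from it, because the exact strings stored in `time_stratification` are not documented here. The sample rows, the `time_stratification` column being left out of the sample, and the helper names `stage_from_day` and `count_by_class_and_stage` are hypothetical and only serve the example.

```python
import csv
import io
from collections import Counter


def stage_from_day(day: int) -> str:
    """Map a growth day to the early (days 1-2) or late (days 3-5) stratum."""
    if day in (1, 2):
        return "early"
    if day in (3, 4, 5):
        return "late"
    raise ValueError(f"day outside the 1-5 range: {day}")


def count_by_class_and_stage(csv_file) -> Counter:
    """Count rows per (class, stage) pair from an open CSV file object."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Assumption: the "day" column is a plain integer such as "3".
        stage = stage_from_day(int(row["day"]))
        counts[(row["class"], stage)] += 1
    return counts


# Made-up rows that only mimic the documented column names.
SAMPLE = """filename,class,top/bottom,ID,day,rotation_index
Fusarium_top_024_day1_rot1.jpg,Fusarium,top,024,1,1
Fusarium_bottom_024_day4_rot1.jpg,Fusarium,bottom,024,4,1
Cladosporium_top_003_day5_rot2.jpg,Cladosporium,top,003,5,2
"""

sample_counts = count_by_class_and_stage(io.StringIO(SAMPLE))
```

For the real file, the same function could be called on `open("moldvision_database.csv", newline="")`. If the `day` column turns out to use another format, the integer conversion would need to be adapted.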
File: moldvision_database.zip
Description: A ZIP archive containing all image files
Filenames follow the pattern:
class_top-bottom_ID_day<d>_rot<r>.jpg
Image format: All files are compressed JPEG (.jpg) images captured at 3000×3000 pixels, 24‐bit depth.
Each image depicts a single sample at a specific orientation, on a given day after incubation, with a random rotation angle (implied by rot index).
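The naming pattern above can be parsed with a regular expression. The sketch below is one possible approach, not part of the dataset itself: the example filenames are hypothetical, and the class part is matched permissively because it is not stated whether class names contain underscores. The names `FILENAME_RE`, `ImageInfo`, and `parse_filename` are introduced only for this example.

```python
import re
from typing import NamedTuple

# Pattern: class_top-bottom_ID_day<d>_rot<r>.jpg
# The class group is permissive (".+") and the orientation is anchored on
# "top" or "bottom", so a class name with underscores would still parse.
FILENAME_RE = re.compile(
    r"^(?P<cls>.+)_(?P<side>top|bottom)_(?P<id>\d+)"
    r"_day(?P<day>\d+)_rot(?P<rot>\d+)\.jpg$"
)


class ImageInfo(NamedTuple):
    cls: str             # species or genus label
    side: str            # "top" or "bottom"
    subject_id: str      # kept as a string to preserve leading zeros
    day: int             # day of growth after incubation
    rotation_index: int  # rotation index, counted from 1 upwards


def parse_filename(name: str) -> ImageInfo:
    """Split an image filename into its embedded identifiers."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"filename does not follow the pattern: {name}")
    return ImageInfo(
        match["cls"],
        match["side"],
        match["id"],
        int(match["day"]),
        int(match["rot"]),
    )


# Hypothetical filename, used only to show the call.
info = parse_filename("Fusarium_bottom_024_day3_rot1.jpg")
```

Keeping the subject ID as a string preserves the three-digit form (for example `024`), which makes it easier to match parsed filenames against the `ID` column of the CSV.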
Metadata and Attributes
No separate metadata file is provided; all relevant identifiers are embedded in the filename.
