
Image-based taxonomic classification of bulk biodiversity samples using deep learning and domain adaptation

Cite this dataset

Fujisawa, Tomochika et al. (2022). Image-based taxonomic classification of bulk biodiversity samples using deep learning and domain adaptation [Dataset]. Dryad. https://doi.org/10.5061/dryad.05qfttf4f

Abstract

Complex bulk samples of insects from biodiversity surveys present a challenge for taxonomic identification, which could be overcome by high-throughput imaging combined with machine learning for rapid classification of specimens. These procedures require that taxonomic labels from an existing source dataset be used to train a model, which then predicts the labels of an unknown target sample. However, such transfer learning may be problematic for the study of new samples not previously encountered in an image set, e.g. from unexplored ecosystems, and requires methods of domain adaptation that reduce the differences in feature distribution between the source and target domains (training and test sets). We assessed the efficiency of domain adaptation for family-level classification of bulk samples of Coleoptera, as a critical first step in the characterisation of biodiversity samples. Neural network models trained on images from a global database of Coleoptera were applied to a biodiversity sample from understudied forests in Cyprus as the target. Within-dataset classification accuracy reached 98% and depended on the number and quality of training images and on dataset complexity. The accuracy of between-dataset predictions (across disparate source-target pairs that do not share any species or genera) was at most 82% and depended greatly on the standardisation of the imaging procedure. Algorithms for domain adaptation significantly improved the prediction performance of models trained on non-standardised, low-quality images. Our findings demonstrate that existing databases can be used to train models and successfully classify images from unexplored biota, but the imaging conditions and classification algorithms need careful consideration.