Data from: Robust mosquito species identification from diverse body and wing images using deep learning
Data files
Sep 03, 2024 version, 4.85 GB in total:
- MosquitoCNNComparison.zip (4.85 GB)
- README.md (2.75 KB)
Abstract
Mosquito-borne diseases are a major global health threat. Traditional morphological or molecular methods for identifying mosquito species often require specialized expertise or expensive laboratory equipment. Identifying mosquito species from images with Convolutional Neural Networks (CNNs) is a promising alternative, but practical implementation remains limited. This study explores the applicability of CNNs to classifying mosquito species. It compares the efficacy of body and wing images across three image collection methods: a smartphone, a macro-lens attached to a smartphone, and a professional stereomicroscope. The study included 796 specimens of four morphologically similar Aedes species: Aedes aegypti, Ae. albopictus, Ae. koreicus, and Ae. japonicus japonicus. CNN models performed better in wing-based classification (87.6%, CI95%: 84.2 - 91.0) than in body-based classification (78.9%, CI95%: 77.7 - 80.0). Nevertheless, CNNs have notable limitations: they perform reliably across multiple devices only when trained specifically on those devices, with mean accuracy otherwise declining by 14% on average, even with extensive image augmentation. We also estimate the training data volume required for effective classification and find that wing-based classification requires less data than body-based classification. Our study underscores the viability of both body and wing classification methods for mosquito species identification while emphasizing the need to address practical constraints in developing accessible classification systems.
Methods
We collected images from 797 female mosquito specimens, with 198 - 200 specimens for each of four species: Aedes aegypti, Ae. albopictus, Ae. koreicus and Ae. japonicus japonicus (Ae. japonicus) (Table 1). All specimens were reared under standardized conditions in the arthropod rearing facility at the Bernhard Nocht Institute for Tropical Medicine, Hamburg. Each specimen was photographed using three different devices: a smartphone (iPhone SE 3rd Generation, Apple Inc., Cupertino, USA), a macro-lens (Apexel-25MXH, Apexel, Shenzhen, China) attached to the same smartphone, and a stereomicroscope (Olympus SZ61, Olympus, Tokyo, Japan) with an attached camera (Olympus DP23, Olympus, Tokyo, Japan). In the following text, we refer to the smartphone as “phone”, the smartphone with the macro-lens attachment as “macro-lens” or “macro”, and the stereomicroscope as “microscope” or “micro”.
For the “body” dataset, whole mosquitoes were photographed with all three devices in the same orientation so that identical features are visible in all pictures (example images can be found in the appendix: Image comparison). Subsequently, for the “wing” dataset, the left and right wings were mounted on a microscope slide using the embedding medium Euparal (Carl Roth, Karlsruhe, Germany) and photographed with the macro-lens and microscope. Because the wings are so small, capturing them with the phone alone was not feasible. The left wing of each specimen was used; if the left wing was damaged, the right wing was used instead.
Image capture for Ae. aegypti, Ae. albopictus and Ae. koreicus was done in batches of 50 to reduce biases arising during image capture, e.g. from light conditions in the room. Images of Ae. japonicus were collected after the initial data collection was completed, because we aimed to add another morphologically similar species to increase the robustness of the study. All images were manually cropped to remove as much background as possible and subsequently downscaled to 300x300 pixels. To obtain a 1:1 aspect ratio, images were cropped with padding. The complete image dataset was randomly partitioned into training (70%), validation (15%), and testing (15%) subsets (Table 1). The split was made per mosquito specimen rather than per image, so that all images of a given specimen fall into the same subset, ensuring a strict separation between the subsets.
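The specimen-level partitioning described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name `split_by_specimen`, the specimen IDs, the file names, and the fixed random seed are hypothetical and chosen only for illustration. The sketch also does not stratify by species, which the actual split may or may not have done.

```python
import random
from collections import defaultdict


def split_by_specimen(image_records, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign whole specimens, not single images, to train/val/test.

    image_records: iterable of (specimen_id, image_path) pairs.
    Returns a dict mapping subset name to a list of image paths.
    """
    # Group image paths by the specimen they belong to.
    by_specimen = defaultdict(list)
    for specimen_id, image_path in image_records:
        by_specimen[specimen_id].append(image_path)

    # Shuffle specimen IDs reproducibly (the seed is an assumption).
    specimen_ids = sorted(by_specimen)
    random.Random(seed).shuffle(specimen_ids)

    # Cut the shuffled ID list at the 70% and 85% marks.
    n = len(specimen_ids)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = {
        "train": specimen_ids[:n_train],
        "val": specimen_ids[n_train:n_train + n_val],
        "test": specimen_ids[n_train + n_val:],
    }

    # Every image follows its specimen into exactly one subset.
    return {
        name: [path for sid in ids for path in by_specimen[sid]]
        for name, ids in groups.items()
    }


# Hypothetical example: 20 specimens, each photographed with three devices.
records = [
    (f"spec{i:03d}", f"spec{i:03d}_{device}.jpg")
    for i in range(20)
    for device in ("phone", "macro", "micro")
]
splits = split_by_specimen(records)
```

Because the shuffle operates on specimen IDs, the phone, macro, and micro images of one mosquito can never end up in different subsets, which is the property the per-specimen split is meant to guarantee.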