BugNet: a rapid and scalable pipeline for automated insect monitoring using hierarchical data
Data files
Mar 11, 2026 version files 668.14 MB
- BugNet_-_Dryad.zip (668.13 MB)
- README.md (3.92 KB)
Abstract
Despite the importance of monitoring insect diversity for ecological and conservation questions, we lack technologies capable of monitoring insects at scale. While research into automated systems for monitoring biodiversity through camera traps has led to the development of a number of machine learning approaches for insect monitoring, these tools suffer from a lack of training data and face challenges in classifying insects in highly diverse systems where the majority of species are unknown to science. To address these challenges, we developed BugNet, an automated pipeline for aggregating insect image data from online databases and training classification models, and tested a large-scale insect detection model on GBIF and field images. We show that this system can be used to rapidly create and validate classification models with high accuracy on internet and field images. Furthermore, we show that incorporating hierarchical data into classification models improves their ability to handle unknown taxa. These systems are an important step towards a generalized and scalable insect detection platform. While not capable of monitoring every dimension of insect diversity, BugNet can be used to accurately classify insects from camera trap images, and can be scaled to meet the data needs of larger ecological and conservation questions.
Dataset DOI: 10.5061/dryad.g1jwstr5f
Description of the data and file structure
This repository contains data and analysis code related to the BugNet data pipeline and tests of BugNet classification models.
Files and variables
File: metadata.xlsx
Description: Descriptions of code and data files, including column descriptions when relevant
File: data/classificationdata.9.9.csv
Description: Formatted data for tests of classification models on raw and augmented validation images. Missing values are coded as NA.
| Column name | type | description |
| --- | --- | --- |
| image | character | full basename of processed image |
| name | character | name of taxonomic level, one of order, family, genus, species. |
| conf | numeric | model confidence in prediction |
| pred | numeric | numeric ID of predicted taxon |
| group | character | identifier for factors used in analysis |
| value | numeric | numeric ID of ground truth taxon |
| FN | bool | ID is false negative |
| FP | bool | ID is false positive |
| TN | bool | ID is true negative |
| TP | bool | ID is true positive |
| check | bool | ID is correct |
| upper | character | broad class for factors used in analysis |
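As an illustration of how the `TP`, `FP`, and `FN` flags might be summarized, the following minimal Python sketch computes precision and recall for each taxonomic level in the `name` column. It is not the authors' analysis (that lives in the R scripts described under Code/software). It assumes the file is comma-separated with a header row matching the column names above and that booleans are written as `TRUE`/`FALSE`, neither of which is stated here. The helper names `parse_bool` and `precision_recall_by_level` are hypothetical.

```python
import csv
from collections import defaultdict


def parse_bool(text):
    """Map a CSV cell to True/False; 'NA', blanks, and anything else become None.

    Assumption: booleans are written R-style as TRUE/FALSE.
    """
    value = text.strip().upper()
    if value in ("TRUE", "T", "1"):
        return True
    if value in ("FALSE", "F", "0"):
        return False
    return None


def precision_recall_by_level(handle):
    """Return {level: {"precision": ..., "recall": ...}} from an open CSV handle."""
    counts = defaultdict(lambda: {"TP": 0, "FP": 0, "FN": 0})
    for row in csv.DictReader(handle):
        level_counts = counts[row["name"]]  # register the level even if all flags are NA
        for key in ("TP", "FP", "FN"):
            if parse_bool(row[key]):
                level_counts[key] += 1

    results = {}
    for level, c in counts.items():
        predicted = c["TP"] + c["FP"]  # rows the model flagged as positive
        actual = c["TP"] + c["FN"]     # rows that are truly positive
        results[level] = {
            "precision": c["TP"] / predicted if predicted else None,
            "recall": c["TP"] / actual if actual else None,
        }
    return results
```

To try it on the real file, open `data/classificationdata.9.9.csv` with `open(path, newline="")` and pass the handle to `precision_recall_by_level`. Levels with no usable flags report `None` rather than a misleading zero.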
File: data/validated_insects.csv
Description: Ground truth labels for processed field images. Missing values are coded as NA.
| Column name | type | description |
| --- | --- | --- |
| valid_taxon | character | taxon name for validated image |
| global_id | numeric | numeric ID of individual insect |
File: data/pixels.txt
Description: Dimensions of cropped insects in field images.
| Column name | type | description |
| --- | --- | --- |
| path | character | local path to analyzed image |
| insect | numeric | numeric ID for individual insect |
| x | numeric | x dimension in pixels of insect |
| y | numeric | y dimension in pixels of insect |
File: data/data.txt
Description: Raw predicted insect confidences. The first column is an identifier for each image; each following column contains the model confidence for one of 1127 potential taxa.
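A row of confidences like this can be reduced to a single top prediction by taking the column with the highest value. The Python sketch below shows one way to do that. It assumes whitespace-delimited fields, no header row, and no spaces inside the identifier, none of which is specified here, so adjust `delimiter` and `has_header` to match the actual file. The function name `top_predictions` is hypothetical, and the returned index is a zero-based column position that may or may not coincide with the numeric taxon IDs in `data/model.yaml`.

```python
def top_predictions(handle, delimiter=None, has_header=False):
    """Yield (image_id, column_index, confidence) for the best-scoring taxon per row.

    delimiter=None splits on any run of whitespace (an assumption about the file).
    column_index is zero-based over the confidence columns only.
    """
    for line_number, line in enumerate(handle):
        if has_header and line_number == 0:
            continue  # skip a header row if the file has one
        fields = line.strip().split(delimiter)
        if len(fields) < 2:
            continue  # blank line or identifier with no confidences
        image_id = fields[0]
        confidences = [float(value) for value in fields[1:]]
        # Index of the largest confidence; ties resolve to the first column.
        best = max(range(len(confidences)), key=confidences.__getitem__)
        yield image_id, best, confidences[best]
```

For the real data, pass an open handle to `data/data.txt`. Each row should then carry 1127 confidence values after the identifier, which is worth checking before trusting the indices.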
File: data/model.yaml
Description: Model metadata, used to map the model's raw numeric IDs to taxonomic names for each taxon.
Code/software
Code can be run using R version 4.5.2.
File: code/classifier.figures.R
Description: Code for summary statistics and figures for tests of the insect classifier and image augmentations.
File: code/functions.R
Description: Helper functions used in other scripts
File: code/insect_formatting.R
Description: Code for formatting raw insect data, summary statistics, GLMs of model performance, and comparisons to ground truth data.
