Tephritid26: A standardized, multi-angle image dataset of quarantine-significant true fruit flies for deep learning-based identification
Data files
May 26, 2026 version files 119.44 GB
- 1001Bactrocera_correcta_Crop.zip (629.61 MB)
- 1001Bactrocera_correcta.zip (3.39 GB)
- 1002Bactrocera_zonata_Crop.zip (944.20 MB)
- 1002Bactrocera_zonata.zip (4.97 GB)
- 1003Bactrocera_dorsalis_Crop.zip (1.64 GB)
- 1003Bactrocera_dorsalis.zip (6.05 GB)
- 1005Bactrocera_latifrons_Crop.zip (10.21 MB)
- 1005Bactrocera_latifrons.zip (56.06 MB)
- 1006Bactrocera_tryoni_Crop.zip (498.25 MB)
- 1006Bactrocera_tryoni.zip (2.49 GB)
- 1007Bactrocera_tuberculata_Crop.zip (213.43 MB)
- 1007Bactrocera_tuberculata.zip (1.80 GB)
- 1009Zeugodacus_cilifer_Crop.zip (520.94 MB)
- 1009Zeugodacus_cilifer.zip (3.06 GB)
- 1010Zeugodacus_cucurbitae_Crop.zip (1.44 GB)
- 1010Zeugodacus_cucurbitae.zip (6.17 GB)
- 1011Zeugodacus_tau_Crop.zip (1.78 GB)
- 1011Zeugodacus_tau.zip (6.39 GB)
- 1012Zeugodacus_scutellatus_Crop.zip (1.48 GB)
- 1012Zeugodacus_scutellatus.zip (5.97 GB)
- 1013Bactrocera_minax_Crop.zip (2.02 GB)
- 1013Bactrocera_minax.zip (5.71 GB)
- 1018Ceratitis_capitata_Crop.zip (814.16 MB)
- 1018Ceratitis_capitata.zip (5.77 GB)
- 1019Anastrepha_fraterculus_Crop.zip (1.50 GB)
- 1019Anastrepha_fraterculus.zip (5.95 GB)
- 1020Anastrepha_ludens_Crop.zip (1.39 GB)
- 1020Anastrepha_ludens.zip (5.70 GB)
- 1021Anastrepha_obliqua_Crop.zip (1.44 GB)
- 1021Anastrepha_obliqua.zip (6.17 GB)
- 1022Bactrocera_oleae_Crop.zip (280.74 MB)
- 1022Bactrocera_oleae.zip (2.62 GB)
- 1024Carpomya_vesuviana_Crop.zip (350.52 MB)
- 1024Carpomya_vesuviana.zip (3.32 GB)
- 1025Dacus_trimacula_Crop.zip (619.71 MB)
- 1025Dacus_trimacula.zip (2.23 GB)
- 1026Rhagoletis_cerasi_Crop.zip (270.44 MB)
- 1026Rhagoletis_cerasi.zip (2 GB)
- 1027Bactrocera_umbrosa_Crop.zip (509.97 MB)
- 1027Bactrocera_umbrosa.zip (2.89 GB)
- 1028Zeugodacus_diaphorus_Crop.zip (769.77 MB)
- 1028Zeugodacus_diaphorus.zip (3.75 GB)
- 1029Bactrocera_thailandica_Crop.zip (265.57 MB)
- 1029Bactrocera_thailandica.zip (2.18 GB)
- 1030Ceratitis_cosyra_Crop.zip (274.34 MB)
- 1030Ceratitis_cosyra.zip (2.44 GB)
- 1031Anastrepha_suspensa_Crop.zip (142.76 MB)
- 1031Anastrepha_suspensa.zip (1.86 GB)
- 1032Dacus_bivittatus_Crop.zip (308.67 MB)
- 1032Dacus_bivittatus.zip (2.87 GB)
- 1034Dacus_punctatifrons_Crop.zip (264.09 MB)
- 1034Dacus_punctatifrons.zip (3.24 GB)
- README.md (1.81 KB)
- Tephritid26_OriginalData_JSON_Labels.zip (18.13 MB)
- Tephritid26_OriginalData_TXT_Labels.zip (8.84 MB)
- Tephritid26_SplitData.csv (3.50 MB)
Abstract
Accurate and rapid identification of quarantine-significant tephritids is critical to global agricultural biosecurity, but the application of deep learning is limited by the lack of large public image datasets. To address this gap, we present Tephritid26, a multi-angle image dataset of 26 tephritid species. The dataset includes 38,081 images from 1,473 specimens across seven genera and two subfamilies, assembled through a global collaborative effort to source these regulated species. Specimens were mounted using a novel protocol that combines varied thoracic attachment points and pin angles with a rotational imaging setup, and each specimen was then systematically photographed from multiple perspectives to mimic real inspection conditions. The dataset is formatted for machine learning workflows. To demonstrate its utility, we trained deep learning models for species identification. ResNet-50, ConvNeXt-B, ViT-Small, and Swin-Tiny all attained high species-level accuracy (macro-averaged F1-score > 96.75%). Gradient-weighted Class Activation Mapping confirmed that the models focused on taxonomically informative morphological regions. This dataset serves as a benchmark for developing automated identification tools in phytosanitary applications.
We provide the raw images of Tephritid26 as one archive per species (e.g. 1001Bactrocera_correcta.zip) and the cropped images as a matching archive with the _Crop suffix (e.g. 1001Bactrocera_correcta_Crop.zip). Object detection bounding boxes are provided in two formats, in Tephritid26_OriginalData_JSON_Labels.zip and Tephritid26_OriginalData_TXT_Labels.zip. The data partition used for model training is provided in Tephritid26_SplitData.csv. The relevant code is deposited at https://github.com/lizitao2005/TrueFruitFly_Classification.
Tephritid26_OriginalData_JSON_Labels (generated by X-AnyLabeling)
- version: version number of X-AnyLabeling
- flags: null
- shapes: boundary coordinates of the bounding box
- imagePath: name of the image that belongs to this JSON label
- imageData: null
- imageHeight: height of the image that belongs to this JSON label
- imageWidth: width of the image that belongs to this JSON label
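The fields above can be read with the standard json module. The following is a minimal sketch, not the authors' code: it assumes each entry in shapes stores its corner coordinates under a "points" key, as in the LabelMe-style layout that X-AnyLabeling writes, and the example label (version string, label name, coordinates, image size) is hypothetical.

```python
import json


def bbox_from_label(label: dict) -> tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) in pixels for the first shape.

    Assumption: each shape keeps its corners under a "points" key as
    [x, y] pairs. Taking min/max works whether the rectangle is stored
    as two opposite corners or as all four corners.
    """
    points = label["shapes"][0]["points"]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical label content, for illustration only.
example_text = """
{
  "version": "0.0.0",
  "flags": null,
  "shapes": [
    {
      "label": "fly",
      "shape_type": "rectangle",
      "points": [[100.0, 50.0], [300.0, 50.0], [300.0, 250.0], [100.0, 250.0]]
    }
  ],
  "imagePath": "021001001111101.jpg",
  "imageData": null,
  "imageHeight": 3000,
  "imageWidth": 4000
}
"""
example = json.loads(example_text)
box = bbox_from_label(example)
print(example["imagePath"], box)
```

For a real file, replace the inline string with json.load on an opened .json label from the archive.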
Tephritid26_OriginalData_TXT_Labels (YOLO type)
- class_id x_center y_center width height (class_id is always "0")
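A label line in this layout can be converted to a pixel-space box as sketched below. This assumes the usual YOLO convention, in which the four values are normalized to the range 0 to 1 by the image width and height; the function name and the example numbers are illustrative only.

```python
def yolo_to_pixels(line: str, image_width: int, image_height: int):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max).

    Assumption: x_center, y_center, width and height are fractions of the
    image size, as in the standard YOLO text format.
    """
    class_id, x_center, y_center, width, height = line.split()
    xc = float(x_center) * image_width
    yc = float(y_center) * image_height
    w = float(width) * image_width
    h = float(height) * image_height
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# Hypothetical label line for a 4000 x 3000 pixel image.
result = yolo_to_pixels("0 0.5 0.5 0.25 0.5", 4000, 3000)
print(result)
```

The image size is not stored in the TXT labels, so it has to come from the image itself or from the imageWidth and imageHeight fields of the matching JSON label.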
Tephritid26_SplitData.csv
- image_path: relative path of the image (one row per image, 38,081 rows in total)
- label: class of the image
- specimen: specimen from which the image was taken
- split: data partition (train/val/test) for training the model
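As a hedged illustration of how these columns might be used, the sketch below counts images per partition and lists any specimen whose images fall into more than one partition. Whether the published split is specimen-level is an assumption to verify, not something stated here, and the sample rows (paths, label spelling, specimen values) are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def split_counts(rows):
    """Count images in each partition (train/val/test)."""
    return Counter(row["split"] for row in rows)


def specimens_in_multiple_splits(rows):
    """Return (label, specimen) pairs whose images span several partitions.

    Specimens are keyed together with the label, on the assumption that
    specimen identifiers may repeat across species.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[(row["label"], row["specimen"])].add(row["split"])
    return sorted(key for key, splits in seen.items() if len(splits) > 1)


# Hypothetical rows standing in for Tephritid26_SplitData.csv.
sample = io.StringIO(
    "image_path,label,specimen,split\n"
    "x/021001001111101.jpg,Bactrocera_correcta,001,train\n"
    "x/021001001111102.jpg,Bactrocera_correcta,001,train\n"
    "x/021001002111101.jpg,Bactrocera_correcta,002,val\n"
    "x/021002001111101.jpg,Bactrocera_zonata,001,test\n"
)
rows = list(csv.DictReader(sample))
counts = split_counts(rows)
leaks = specimens_in_multiple_splits(rows)
print(dict(counts), leaks)
```

To run this on the real file, pass an opened Tephritid26_SplitData.csv to csv.DictReader instead of the in-memory sample.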
Image files were renamed according to the following rules:
Images of Tephritid26: e.g. 021001001111101.jpg ---> "02-": Tephritidae (same for all images), "-1001-": SpeciesID, "-001-": SpecimenID, "-1111-": Adult (same for all images), "-01": SequentialID
Tephritid26_OriginalData_TXT_Labels.zip: e.g. 021001001111101.txt ---> the file name matches the image name.
Tephritid26_OriginalData_JSON_Labels.zip: e.g. 021001001111101.json ---> the file name matches the image name.
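The naming rule can be decoded with fixed-width slicing, as in the sketch below. The field widths (2, 4, 3, 4 and 2 digits) are inferred from the single example name above and are an assumption; the ImageName type and parse_image_name function are hypothetical helpers, not part of the deposited code.

```python
from pathlib import Path
from typing import NamedTuple


class ImageName(NamedTuple):
    family: str       # "02" = Tephritidae
    species_id: str   # e.g. "1001"
    specimen_id: str  # e.g. "001"
    stage: str        # "1111" = adult
    sequence_id: str  # e.g. "01"


def parse_image_name(name: str) -> ImageName:
    """Split an image, TXT or JSON file name into its ID fields.

    Assumption: the stem is always 15 digits laid out as 2+4+3+4+2.
    """
    stem = Path(name).stem
    if len(stem) != 15 or not stem.isdigit():
        raise ValueError(f"unexpected file name: {name!r}")
    return ImageName(stem[0:2], stem[2:6], stem[6:9], stem[9:13], stem[13:15])


parsed = parse_image_name("021001001111101.jpg")
print(parsed)
```

Because label files share the image's name, the same function applies to the .txt and .json files, and the species_id field matches the four-digit prefix of the per-species archives.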
