Dryad

Multi‐barcoding‐based Gastropoda identification using hierarchical attention network with staged curriculum learning

Data files

Jan 28, 2026 version; files total 6.78 GB


Abstract

This dataset contains DNA barcoding sequences of six markers (COI, 16S, H3, 18S, ITS1, and ITS2) for Gastropoda species, retrieved from GenBank and BOLD, together with the corresponding RNA secondary structure embeddings for all six barcoding types and the DNA barcoding embeddings for the COI sequences. The RNA secondary structure embeddings were extracted with the RNA foundation model ERNIE-RNA (https://github.com/Bruce-ywj/ERNIE-RNA) (Yin et al. 2025, Nature Communications, 16: 10076). The DNA barcoding embeddings of the COI sequences were extracted with the DNA barcoding foundation model BarcodeMAE (https://github.com/bioscan-ml/BarcodeMAE) (Safari et al. 2025, arXiv, 2502.18405). The sequences and embeddings serve as the training, validation, test, and independent test data for SnailBaLLsp, a deep learning model developed for multi‐barcoding‐based Gastropoda identification using a hierarchical attention network with staged curriculum learning.