Bounding-box detection data for delphinid whistles
Data files
Aug 01, 2025 version files (8.25 GB total):
- Networks_and_Test_Results.zip (5.09 GB)
- README.md (7.53 KB)
- Testing_Audio_and_Annotations.zip (376.96 MB)
- Training_Audio_and_Anntoations.zip (2.79 GB)
Abstract
Deep learning methods offer automated solutions for detecting marine mammal calls, yet require time-intensive development for optimized neural network performance, including carefully curating data and creating a robust network architecture. Using data collected in two aquariums and two open ocean environments, we evaluated the performance of a series of pre-trained object detection networks, CSP-DarkNet-53, ResNet-50, and Tiny YOLO, in detecting highly variable bottlenose dolphin (Tursiops truncatus) whistles using DeepAcoustics, a user-friendly deep learning tool. We compared the F1-score, average precision (AP), and mean AP performance of all network architectures with combinations of training samples from each acoustic environment. CSP-DarkNet-53 consistently outperformed Tiny YOLO and ResNet-50 across various test datasets, demonstrating robustness, but underperformed in select scenarios. Performance remained higher for aquarium data compared to open ocean data based on AP and mean AP values, indicating a greater ability of the networks to accurately detect whistles in these environments. However, networks trained on open ocean datasets showed only slightly improved APs on open ocean data, highlighting the challenge of achieving generalizability across divergent acoustic environments. This effort highlights the importance of network architecture selection and the effects of different acoustic environments on deep learning methods for detecting complex underwater vocalizations.
Dataset DOI: https://doi.org/10.5061/dryad.z34tmpgq6
Description of the data and file structure
This dataset supports the study titled "Effects of network selection and acoustic environment on bounding-box object detection of delphinid whistles using a deep learning tool." It includes audio recordings, annotations, trained models, and evaluation metrics used to assess the performance of deep learning networks for detecting bottlenose dolphin (Tursiops truncatus) whistles.
The dataset is organized into three folders:
- Training_Audio_and_Anntoations.zip: Contains training audio files and corresponding annotation data (e.g., selection tables and detection labels) from four acoustic environments: two aquarium and two open ocean settings. Subfolders by dataset house the audio files, selection tables (.txt), and the DeepAcoustics imported .mat file. Each aquarium dataset contains two merged audio file sets with associated selection tables and imported .mat files. The open ocean datasets each contain a folder of audio files in their native duration (a recent modification to DeepAcoustics accommodates multiple audio files per single .txt/.mat file) and a single selection table/imported .mat file. Note that the Oceanogràfic data are labeled with "Valencia," an earlier descriptor of the data.
- Testing_Audio_and_Annotations.zip: Contains test audio and annotation files used to evaluate model performance across different environments. Each environment has a single audio test file with corresponding .txt and .mat files. Note that the Oceanogràfic test file is labeled with "Baseline," an earlier descriptor of the data.
- Networks_and_Test_Results.zip: Includes trained network models (CSP-DarkNet-53, ResNet-50, Tiny YOLO), performance metrics (e.g., F1-scores, average precision, mean AP), and detection output files. Within each network-labeled folder are the trained networks and two subfolders housing test-file detection outputs (labeled "Detections") and performance metrics (labeled "PR").
Acronyms:
- IMMS: Institute for Marine Mammal Studies, Gulfport, Mississippi.
- OF: Oceanogràfic Foundation, Valencia, Spain.
- DCLDE: Detection, Classification, Localisation, and Density Estimation workshop (2011)
- Towed Array: Towed array recordings from National Oceanic and Atmospheric Administration (NOAA), Southwest Fisheries Science Center (SWFSC) marine mammal surveys
This dataset was used to evaluate how network architecture and training environment affect the performance of a user-friendly deep learning tool, DeepAcoustics, in detecting variable dolphin whistles. Results highlight the influence of acoustic conditions on detection accuracy and the challenge of achieving generalizability across environments.
This data package is intended for researchers working in marine bioacoustics, deep learning model evaluation, and the development of automated species detection tools.
Files and variables
Data Repository Structure and Contents
Training_Audio_and_Anntoations/
- Contains training audio files and annotation .mat files for each recording site.
- Subfolders:
  - Aquarium_IMMS/ – Audio from the Institute for Marine Mammal Studies (IMMS), Gulfport, Mississippi.
  - Aquarium_Oceanografic/ – Audio from the Oceanogràfic Foundation (OF), Valencia, Spain.
  - OpenOcean_DCLDE2011/ – Audio from the DCLDE 2011 workshop dataset.
  - OpenOcean_SWFSC/ – Towed array recordings from NOAA SWFSC surveys.
- Annotation files:
  - IMMS_01_Training_Annotations.mat
  - IMMS_02_Training_Annotations.mat
  - Oceanografic_01_Training_Annotations.mat
  - Oceanografic_02_Training_Annotations.mat
  - OpenOcean_DCLDE2011_Training_Annotations.mat
  - OpenOcean_TowedArray_Training_Annotations.mat
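As a quick sanity check, the minimal MATLAB sketch below inspects one of these annotation files. The variable names stored inside the .mat files are set by DeepAcoustics and may differ between tool versions, so the sketch lists them rather than assuming any particular name.

```matlab
% Minimal sketch: inspect the variables stored in a training annotation file.
% Internal variable names are set by DeepAcoustics and may vary between
% versions, so list them rather than assume them.
S = load('IMMS_01_Training_Annotations.mat');
disp(fieldnames(S))   % names of the annotation variables saved in the file
```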
Testing_Audio_and_Annotations/
- Contains test audio files and annotation .mat files organized similarly to the training folder.
- (Structure mirrors Training_Audio_and_Anntoations/ with corresponding test datasets.)
Networks_and_Test_Results/
- Contains trained models, detection outputs, and performance results.
- Subfolders:
  - DarkNet/ – Outputs and model results using the CSP-DarkNet-53 architecture.
  - ResNet/ – Outputs and model results using the ResNet-50 architecture.
  - tinyYOLO/ – Outputs and model results using the Tiny YOLO architecture.
- Contents include:
  - .mat files for each dataset and architecture (e.g., AllAquarium_DarkNet_3sec_WB_512pix_19train_vX.mat)
  - .png plots showing performance metrics (e.g., PR curves)
- Additional subfolders:
  - Detections/ – Contains bounding-box prediction outputs.
  - PR/ – Contains precision-recall results in .mat files.
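For reference, a minimal MATLAB sketch for examining one of the PR results files is shown below. The file name and the variable names `precision` and `recall` are illustrative assumptions, not names guaranteed by DeepAcoustics; list the fields first to see what was actually saved.

```matlab
% Minimal sketch: inspect a PR results file and, if present, plot the curve.
% The file name is hypothetical, and the variable names "precision" and
% "recall" are assumptions for illustration; check fieldnames(S) for the
% names DeepAcoustics actually saved.
S = load('Example_PR_Results.mat');   % hypothetical file from a PR/ subfolder
disp(fieldnames(S))
if isfield(S, 'precision') && isfield(S, 'recall')
    plot(S.recall, S.precision, 'LineWidth', 1.5)
    xlabel('Recall'); ylabel('Precision')
    title('Precision-recall curve')
end
```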
Code/software
Software Requirements
This dataset was processed using DeepAcoustics, an open-source MATLAB application for detecting marine mammal vocalizations using deep learning-based object detection. DeepAcoustics is built around the YOLOv4 architecture, offering a balance of performance and accessibility for bioacoustic research.
To run DeepAcoustics, the following software and toolboxes are required:
- MATLAB R2023b or later
- Deep Learning Toolbox
- Computer Vision Toolbox
- Signal Processing Toolbox
- Parallel Computing Toolbox (required for GPU acceleration)
- Statistics and Machine Learning Toolbox
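A quick way to confirm these toolboxes are present is the minimal sketch below, which uses only standard MATLAB functions:

```matlab
% Minimal sketch: confirm the toolboxes DeepAcoustics requires are installed.
required = ["Deep Learning Toolbox", "Computer Vision Toolbox", ...
    "Signal Processing Toolbox", "Parallel Computing Toolbox", ...
    "Statistics and Machine Learning Toolbox"];
v = ver;                          % struct array of all installed products
installed = string({v.Name});
missing = setdiff(required, installed);
if isempty(missing)
    disp('All required toolboxes are installed.')
else
    fprintf('Missing toolbox: %s\n', missing)
end
```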
DeepAcoustics is available at:
https://github.com/Ocean-Science-Analytics/DeepAcoustics
Tutorial and setup guide:
https://github.com/Ocean-Science-Analytics/DeepAcoustics_Guide_and_Tutorial
GPU Requirements
A CUDA-enabled GPU is required for training deep learning models within DeepAcoustics to ensure efficient processing of large datasets and network computations. While training on a CPU is possible, it is significantly slower and not recommended for large-scale models like YOLO.
However, a GPU is not required to run detections using pre-trained models — users can perform inference and review results using only a standard CPU setup.
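To check GPU availability before starting a training run, a minimal sketch using the Parallel Computing Toolbox:

```matlab
% Minimal sketch: check whether a CUDA-enabled GPU is visible to MATLAB.
% Requires the Parallel Computing Toolbox.
if gpuDeviceCount > 0
    d = gpuDevice;                % select the default GPU
    fprintf('GPU available: %s (compute capability %s)\n', ...
        d.Name, d.ComputeCapability)
else
    disp('No CUDA-enabled GPU detected; inference still runs on CPU, but training will be slow.')
end
```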
Data Preparation and Structure Notes
- Aquarium data (from IMMS and Oceanogràfic) were originally recorded as numerous short files. Due to early import limitations in DeepAcoustics, these files were compiled into longer audio files before annotation and training.
- For open ocean datasets (DCLDE and SWFSC towed array), recent improvements to the tool allowed for individual short audio files to be imported and used directly, improving granularity and flexibility in annotation.
- Test files were selected to ensure temporal and contextual separation:
- Aquarium test sets were drawn from different days or times than the training data.
- Open ocean test sets were selected from distinct acoustic events (e.g., different encounters or deployments) to avoid overlap and evaluate generalization.
- The Training_Audio_and_Anntoations/ folder includes both training and validation data.
  - In earlier versions of DeepAcoustics, validation data were automatically subsampled from the training folder.
  - Recent updates allow users to define separate annotated validation datasets, improving control over model evaluation and reducing the risk of overlap.
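To visually review any of the audio described above before annotation or detection, the minimal sketch below renders a spectrogram; the file name is hypothetical, so substitute any .wav from the training or testing folders.

```matlab
% Minimal sketch: render a spectrogram of an audio file to review whistle
% contours. The file name is hypothetical; use any .wav from this dataset.
% Requires the Signal Processing Toolbox.
[x, fs] = audioread('example_whistle_recording.wav');
spectrogram(x(:,1), hamming(1024), 768, 1024, fs, 'yaxis')
title('Spectrogram for whistle review')
```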