Bounding-box detection data for delphinid whistles
Data files
Aug 01, 2025 version files (8.25 GB total):
- Networks_and_Test_Results.zip (5.09 GB)
- README.md (7.53 KB)
- Testing_Audio_and_Annotations.zip (376.96 MB)
- Training_Audio_and_Anntoations.zip (2.79 GB)
Abstract
Deep learning methods offer automated solutions for detecting marine mammal calls, yet require time-intensive development for optimized neural network performance, including carefully curating data and creating a robust network architecture. Using data collected in two aquariums and two open ocean environments, we evaluated the performance of a series of pre-trained object detection networks, CSP-DarkNet-53, ResNet-50, and Tiny YOLO, in detecting highly variable bottlenose dolphin (Tursiops truncatus) whistles using DeepAcoustics, a user-friendly deep learning tool. We compared the F1-score, average precision (AP), and mean AP performance of all network architectures with combinations of training samples from each acoustic environment. CSP-DarkNet-53 consistently outperformed Tiny YOLO and ResNet-50 across various test datasets, demonstrating robustness, but underperformed in select scenarios. Performance remained higher for aquarium data compared to open ocean data based on AP and mean AP values, indicating a greater ability of the networks to accurately detect whistles in these environments. However, networks trained on open ocean datasets showed only slightly improved APs on open ocean data, highlighting the challenge of achieving generalizability across divergent acoustic environments. This effort highlights the importance of network architecture selection and the effects of different acoustic environments on deep learning methods for detecting complex underwater vocalizations.
Dataset DOI: https://doi.org/10.5061/dryad.z34tmpgq6
Description of the data and file structure
This dataset supports the study titled "Effects of network selection and acoustic environment on bounding-box object detection of delphinid whistles using a deep learning tool." It includes audio recordings, annotations, trained models, and evaluation metrics used to assess the performance of deep learning networks for detecting bottlenose dolphin (Tursiops truncatus) whistles.
The dataset is organized into three folders:
- Training_Audio_and_Anntoations.zip: Contains training audio files and corresponding annotation data (e.g., selection tables and detection labels) from four acoustic environments: two aquarium and two open ocean settings. Subfolders by dataset house the audio files, selection tables (.txt), and the DeepAcoustics imported .mat file. Each aquarium dataset contains two merged audio file sets with associated selection tables and imported .mat files. The open ocean datasets each contain a folder of audio files in their native duration (a recent modification to DeepAcoustics accommodates multiple audio files per single .txt/.mat file) and a single selection table/imported .mat file. Note that the Oceanogràfic data are labeled with "Valencia," an earlier descriptor of the data.
- Testing_Audio_and_Annotations.zip: Contains test audio and annotation files used to evaluate model performance across different environments. Each environment has a single audio test file with corresponding .txt and .mat files. Note that the Oceanogràfic test file is labeled with "Baseline," an earlier descriptor of the data.
- Networks_and_Test_Results.zip: Includes trained network models (CSP-DarkNet-53, ResNet-50, Tiny YOLO), performance metrics (e.g., F1-scores, average precision, mean AP), and detection output files. Within each network-labeled folder are the trained networks and two subfolders housing test-file detection outputs (labeled "Detections") and performance metrics (labeled "PR").
Acronyms:
- IMMS: Institute for Marine Mammal Studies, Gulfport, Mississippi.
- OF: Oceanogràfic Foundation, Valencia, Spain.
- DCLDE: Detection, Classification, Localisation, and Density Estimation workshop (2011)
- Towed Array: Towed array recordings from National Oceanic and Atmospheric Administration (NOAA), Southwest Fisheries Science Center (SWFSC) marine mammal surveys
This dataset was used to evaluate how network architecture and training environment affect the performance of a user-friendly deep learning tool, DeepAcoustics, in detecting variable dolphin whistles. Results highlight the influence of acoustic conditions on detection accuracy and the challenge of achieving generalizability across environments.
This data package is intended for researchers working in marine bioacoustics, deep learning model evaluation, and the development of automated species detection tools.
Files and variables
Data Repository Structure and Contents
Training_Audio_and_Anntoations/
- Contains training audio files and annotation .mat files for each recording site.
- Subfolders:
  - Aquarium_IMMS/ – Audio from the Institute for Marine Mammal Studies (IMMS), Gulfport, Mississippi.
  - Aquarium_Oceanografic/ – Audio from the Oceanogràfic Foundation (OF), Valencia, Spain.
  - OpenOcean_DCLDE2011/ – Audio from the DCLDE 2011 workshop dataset.
  - OpenOcean_SWFSC/ – Towed array recordings from NOAA SWFSC surveys.
- Annotation files:
  - IMMS_01_Training_Annotations.mat
  - IMMS_02_Training_Annotations.mat
  - Oceanografic_01_Training_Annotations.mat
  - Oceanografic_02_Training_Annotations.mat
  - OpenOcean_DCLDE2011_Training_Annotations.mat
  - OpenOcean_TowedArray_Training_Annotations.mat
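As a quick sanity check, the minimal MATLAB sketch below inspects one of these annotation files. The variable names stored inside the .mat files are set by DeepAcoustics and may differ between tool versions, so the sketch lists them rather than assuming any particular name.

```matlab
% Minimal sketch: inspect the variables stored in a training annotation file.
% Internal variable names are set by DeepAcoustics and may vary between
% versions, so list them rather than assume them.
S = load('IMMS_01_Training_Annotations.mat');
disp(fieldnames(S))   % names of the annotation variables saved in the file
```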
Testing_Audio_and_Annotations/
- Contains test audio files and annotation .mat files organized similarly to the training folder.
- (Structure mirrors Training_Audio_and_Anntoations/ with corresponding test datasets.)
Networks_and_Test_Results/
- Contains trained models, detection outputs, and performance results.
- Subfolders:
  - DarkNet/ – Outputs and model results using the CSP-DarkNet-53 architecture.
  - ResNet/ – Outputs and model results using the ResNet-50 architecture.
  - tinyYOLO/ – Outputs and model results using the Tiny YOLO architecture.
- Contents include:
  - .mat files for each dataset and architecture (e.g., AllAquarium_DarkNet_3sec_WB_512pix_19train_vX.mat)
  - .png plots showing performance metrics (e.g., PR curves)
- Additional subfolders:
  - Detections/ – Contains bounding-box prediction outputs.
  - PR/ – Contains precision-recall results in .mat files.
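For reference, a minimal MATLAB sketch for examining one of the PR results files is shown below. The file name and the variable names `precision` and `recall` are illustrative assumptions, not names guaranteed by DeepAcoustics; list the fields first to see what was actually saved.

```matlab
% Minimal sketch: inspect a PR results file and, if present, plot the curve.
% The file name is hypothetical, and the variable names "precision" and
% "recall" are assumptions for illustration; check fieldnames(S) for the
% names DeepAcoustics actually saved.
S = load('Example_PR_Results.mat');   % hypothetical file from a PR/ subfolder
disp(fieldnames(S))
if isfield(S, 'precision') && isfield(S, 'recall')
    plot(S.recall, S.precision, 'LineWidth', 1.5)
    xlabel('Recall'); ylabel('Precision')
    title('Precision-recall curve')
end
```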
Code/software
Software Requirements
This dataset was processed using DeepAcoustics, an open-source MATLAB application for detecting marine mammal vocalizations using deep learning-based object detection. DeepAcoustics is built around the YOLOv4 architecture, offering a balance of performance and accessibility for bioacoustic research.
To run DeepAcoustics, the following software and toolboxes are required:
- MATLAB R2023b or later
- Deep Learning Toolbox
- Computer Vision Toolbox
- Signal Processing Toolbox
- Parallel Computing Toolbox (required for GPU acceleration)
- Statistics and Machine Learning Toolbox
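A quick way to confirm these toolboxes are present is the minimal sketch below, which uses only standard MATLAB functions:

```matlab
% Minimal sketch: confirm the toolboxes DeepAcoustics requires are installed.
required = ["Deep Learning Toolbox", "Computer Vision Toolbox", ...
    "Signal Processing Toolbox", "Parallel Computing Toolbox", ...
    "Statistics and Machine Learning Toolbox"];
v = ver;                          % struct array of all installed products
installed = string({v.Name});
missing = setdiff(required, installed);
if isempty(missing)
    disp('All required toolboxes are installed.')
else
    fprintf('Missing toolbox: %s\n', missing)
end
```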
DeepAcoustics is available at:
https://github.com/Ocean-Science-Analytics/DeepAcoustics
Tutorial and setup guide:
https://github.com/Ocean-Science-Analytics/DeepAcoustics_Guide_and_Tutorial
GPU Requirements
A CUDA-enabled GPU is required for training deep learning models within DeepAcoustics to ensure efficient processing of large datasets and network computations. While training on a CPU is possible, it is significantly slower and not recommended for large-scale models like YOLO.
However, a GPU is not required to run detections using pre-trained models — users can perform inference and review results using only a standard CPU setup.
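To check GPU availability before starting a training run, a minimal sketch using the Parallel Computing Toolbox:

```matlab
% Minimal sketch: check whether a CUDA-enabled GPU is visible to MATLAB.
% Requires the Parallel Computing Toolbox.
if gpuDeviceCount > 0
    d = gpuDevice;                % select the default GPU
    fprintf('GPU available: %s (compute capability %s)\n', ...
        d.Name, d.ComputeCapability)
else
    disp('No CUDA-enabled GPU detected; inference still runs on CPU, but training will be slow.')
end
```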
Data Preparation and Structure Notes
- Aquarium data (from IMMS and Oceanogràfic) were originally recorded as numerous short files. Due to early import limitations in DeepAcoustics, these files were compiled into longer audio files before annotation and training.
- For open ocean datasets (DCLDE and SWFSC towed array), recent improvements to the tool allowed for individual short audio files to be imported and used directly, improving granularity and flexibility in annotation.
- Test files were selected to ensure temporal and contextual separation:
- Aquarium test sets were drawn from different days or times than the training data.
- Open ocean test sets were selected from distinct acoustic events (e.g., different encounters or deployments) to avoid overlap and evaluate generalization.
- The Training_Audio_and_Anntoations/ folder includes both training and validation data.
  - In earlier versions of DeepAcoustics, validation data were automatically subsampled from the training folder.
  - Recent updates allow users to define separate annotated validation datasets, improving control over model evaluation and reducing the risk of overlap.
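To visually review any of the audio described above before annotation or detection, the minimal sketch below renders a spectrogram; the file name is hypothetical, so substitute any .wav from the training or testing folders.

```matlab
% Minimal sketch: render a spectrogram of an audio file to review whistle
% contours. The file name is hypothetical; use any .wav from this dataset.
% Requires the Signal Processing Toolbox.
[x, fs] = audioread('example_whistle_recording.wav');
spectrogram(x(:,1), hamming(1024), 768, 1024, fs, 'yaxis')
title('Spectrogram for whistle review')
```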