HARP North Atlantic beaked whales: Echolocation click collection for machine learning
Data files
Jun 12, 2024 version files 74.26 GB
Abstract
This dataset package is publicly available to advance the detection of beaked whale populations through passive acoustic monitoring. This extensive dataset from the western North Atlantic has been instrumental in developing a deep neural network (DNN) to improve the detection of ephemeral events.
The volume of data generated by passive acoustic monitoring can be overwhelming, complicating efforts to quantify species occurrence for effective conservation and management. Automating data processing with machine learning algorithms enables efficient identification of species from their sounds. Beaked whale acoustic events, often infrequent and ephemeral, can be missed when they co-occur with signals of more abundant and acoustically active species that dominate acoustic recordings. Large-scale classification efforts using DNNs, which included beaked whales as one of many classes along with other odontocete species and anthropogenic signals, often missed the ephemeral events in favor of more common and dominant classes. By training the DNN to focus on the taxonomic family of beaked whales, we demonstrate that ephemeral events can be correctly and efficiently identified to species, even with few echolocation clicks.
This classification method can support improved estimation of beaked whale occurrence in regions of high odontocete acoustic activity, and this dataset package can be used to develop classification methods to improve data availability for these rare species. We kindly ask that if this dataset is used, authors cite this repository and the accompanying article that provides detailed information on how the dataset has been collected and processed.
README: HARP North Atlantic Beaked Whales
https://doi.org/10.5061/dryad.gf1vhhmw0
Usage notes
The data package consists of the following 11 zip files, each named after the abbreviation of its signal class, with corresponding signal data:
- BWG.zip - Signal class: Unknown species, referred to as Beaked Whale Gulf
- Mb.zip - Signal class: Mesoplodon bidens
- Md.zip - Signal class: Mesoplodon densirostris
- Me.zip - Signal class: Mesoplodon europaeus
- Mm.zip - Signal class: Mesoplodon mirus
- Zc.zip - Signal class: Ziphius cavirostris
- Despp.zip - Signal class: Unspecified species from the family Delphinidae
- Gg.zip - Signal class: Grampus griseus
- Kospp.zip - Signal class: Unspecified species from the genus Kogia
- PmBoat.zip - Signal class: Impulsive signals with frequency below 20 kHz, likely corresponding to vessels and Physeter macrocephalus
- ESping.zip - Signal class: Miscellaneous echosounder pings
Each ZIP file contains multiple data files. Files are named according to the acoustic dataset from which they were derived, following the pattern:
Project_Site_DeploymentNumber_diskNumber_SignalClass_TPWS1.mat
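As an illustration of the naming pattern above, the following minimal Python sketch splits a file name into its fields. It assumes that no individual field contains an underscore, which is not stated in this README. The function name `parse_tpws_filename`, the `TPWSName` type, and the example file name are hypothetical and used only for illustration.

```python
import os
from typing import NamedTuple

SUFFIX = "_TPWS1.mat"


class TPWSName(NamedTuple):
    """Fields of a file name following the pattern described above."""
    project: str
    site: str
    deployment_number: str
    disk_number: str
    signal_class: str


def parse_tpws_filename(path: str) -> TPWSName:
    """Split a TPWS file name into its five underscore-separated fields.

    Assumption: none of the fields contains an underscore itself.
    """
    name = os.path.basename(path)
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a TPWS1 file name: {name!r}")
    fields = name[: -len(SUFFIX)].split("_")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {name!r}")
    return TPWSName(*fields)


# Hypothetical file name, not taken from the dataset.
example = parse_tpws_filename("PROJ_SITE_01_disk02_Zc_TPWS1.mat")
print(example.signal_class)  # Zc
```

File names that do not match the assumed five-field layout raise a `ValueError` rather than being silently misparsed.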
Data files contain event detections from deployments of High-frequency Acoustic Recording Packages (HARPs), restricted to detections with received levels above 118 dB re 1 µPa. The data file type "TPWS" stands for Time, Peak-to-peak amplitude, Waveform, and Spectrum, and these files include five variables:
- MTT: Time of event as MATLAB datenumber (days elapsed since January 0, 0000). Each row represents one detection.
- MPP: Peak-to-peak amplitude of event in dB re 1 µPa. Each row represents one detection.
- MSN: Event waveforms, sampled at 200 kHz, padded with zeros as needed to reach 200 kHz, with maximum energy located at the 100th sample. Each row represents one detection.
- MSP: Spectra of each event, in 0.5 kHz frequency bins. Each row represents one detection.
- f: Frequency vector in kHz associated with the spectra in MSP.
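Because MTT stores times as MATLAB datenumbers, users working outside MATLAB need to convert them. The Python sketch below shows one common conversion using only the standard library. The function name `matlab_datenum_to_datetime` is hypothetical, and the returned value is a naive `datetime`: the time zone of the recordings is not specified here, so none is attached.

```python
from datetime import datetime, timedelta

# MATLAB datenum 1 is January 1 of year 0000, while Python ordinal 1 is
# January 1 of year 0001, so the two day counts differ by 366 days.
MATLAB_TO_PYTHON_ORDINAL_OFFSET = 366


def matlab_datenum_to_datetime(datenum: float) -> datetime:
    """Convert a MATLAB datenumber (fractional days) to a naive datetime."""
    whole_days = int(datenum)
    day_fraction = datenum - whole_days
    date_part = datetime.fromordinal(whole_days - MATLAB_TO_PYTHON_ORDINAL_OFFSET)
    return date_part + timedelta(days=day_fraction)


# MATLAB datenum 730486 corresponds to 2000-01-01 00:00:00.
print(matlab_datenum_to_datetime(730486.5))  # 2000-01-01 12:00:00
```

Applying the function to each row of MTT gives one timestamp per detection. Sub-second precision is limited by the floating-point resolution of the datenumber.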