
Acoustic features as a tool to visualize and explore marine soundscapes: Applications illustrated using marine mammal Passive Acoustic Monitoring datasets

Cite this dataset

Cominelli, Simone; Bellin, Nicolo'; Brown, Carissa D.; Lawson, Jack (2024). Acoustic features as a tool to visualize and explore marine soundscapes: Applications illustrated using marine mammal Passive Acoustic Monitoring datasets [Dataset]. Dryad. https://doi.org/10.5061/dryad.3bk3j9kn8

Abstract

Passive Acoustic Monitoring (PAM) is emerging as a solution for monitoring species and environmental change over large spatial and temporal scales. However, drawing rigorous conclusions based on acoustic recordings is challenging, as there is no consensus on which approaches and indices are best suited for characterizing marine and terrestrial acoustic environments.

Here, we describe the application of multiple machine-learning techniques to the analysis of a large PAM dataset. We combine pre-trained acoustic classification models (VGGish, NOAA & Google Humpback Whale Detector), dimensionality reduction (UMAP), and balanced random forest algorithms to demonstrate how machine-learned acoustic features capture different aspects of the marine environment.

The UMAP dimensions derived from VGGish acoustic features exhibited good performance in separating marine mammal vocalizations according to species and locations. RF models trained on the acoustic features performed well for labelled sounds in the 8 kHz range; however, low- and high-frequency sounds could not be classified using this approach.

The workflow presented here shows how acoustic feature extraction, visualization, and analysis can link ecologically relevant information to PAM recordings at multiple scales.

The datasets and scripts provided in this repository allow the results presented in the publication to be replicated.

README: Data for: Acoustic features as a tool to visualize and explore marine soundscapes: applications illustrated using marine mammal Passive Acoustic Monitoring datasets.

https://doi.org/10.5061/dryad.3bk3j9kn8

The data and scripts provided here allow replication of the results presented in the publication: "Acoustic features as a tool to visualize and explore marine soundscapes: applications illustrated using marine mammal Passive Acoustic Monitoring datasets."

List of tables:

SM_1_WMD_Features_and_Labels.csv -> table containing VGGish features extracted from audio files downloaded from the Watkins Marine Mammal Sounds Database (https://whoicf2.whoi.edu/science/B/whalesounds/index.cfm).

Missing values in this dataset are marked as nan.

Field descriptions:

ID_row: progressive ID number for each row in the dataset

0 - 127: labels for the 128 VGGish features.

ID: reference to the Watkins Marine Mammal Sounds Database. Each ID corresponds to an audio file stored in the database.

SPECIES: Species associated with each recording from the Watkins Marine Mammal Sounds Database. The species identifiers are coded using four characters: the first two letters of the genus, followed by the first two letters of the species (e.g., Eubalaena glacialis -> Eugl).

HGROUP: marine mammal functional hearing group (HF: high-frequency species; LF: low-frequency species)

TAX: taxonomic group (Mys: Mysticete; Odo: Odontocete)

COUNTRY: labels for the country of origin of the recording, obtained from the Watkins Marine Mammal Sounds Database (Us: United States; Ca: Canada; Ns: Canada - Nova Scotia; Nr: Norway; Bm: Bahamas; Uk: British Virgin Islands; Pr: Puerto Rico; Au: Australia; Ar: Argentina; It: Italy; Sl: Santa Lucia; Svg: Saint Vincent's and the Grenadines; Ma: Madeira; Ml: Malta; Cr: Croatia). NOTE: This field contains empty cells. Records with an unknown or unspecified country of origin were left empty (no value assigned) to indicate a missing value.

sample_ID: progressive ID for each VGGish feature within an audio file.

prog_ID: field combining ID and sample_ID to uniquely identify each VGGish feature vector, corresponding to a 960 ms fragment of a recording from the Watkins Marine Mammal Sounds Database.
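As a quick orientation, the sketch below shows one way to read this table with pandas and separate the 128 feature columns from the label columns. The file path is a placeholder, and the assumption that the feature columns are named "0" through "127" should be checked against the downloaded file.

```python
import pandas as pd

# Load the WMD feature table (path is a placeholder; point it to your copy).
wmd = pd.read_csv("SM_1_WMD_Features_and_Labels.csv")

# The 128 VGGish features are assumed to sit in columns named "0"-"127".
feature_cols = [str(i) for i in range(128)]
X = wmd[feature_cols].to_numpy()

# Label columns described above.
labels = wmd[["SPECIES", "HGROUP", "TAX", "COUNTRY"]]

print(X.shape, labels["SPECIES"].nunique())
```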

SM_2_Annotations_Dataframe_Multilable.xlsx -> Excel table containing annotated audio files and corresponding VGGish features for a subset of the Placentia Bay PAM dataset. The annotations were prepared using Raven software (https://www.ravensoundsoftware.com/).

Field descriptions:

ID_row: progressive ID number for each row in the dataset
0 - 127: labels for the 128 VGGish features
HW detection & HW visual: humpback whale model detections and visual detections (0 = absence; 1 = presence)

SM_3_RI_features_database.csv -> VGGish feature dataset for the full Placentia Bay PAM dataset with time stamps.

Field descriptions:

File: Original file name of the audio file with start time embedded in the file name (e.g., AMAR667.20190722T054122Z).

Channel: selected channel (1 for all audio files).

Begin Time (s) & End Time (s): elapsed time from the start of the audio file to the beginning (Begin Time) and end (End Time) of the time window used to generate acoustic features. Begin and end times start at 0 and increase in increments of 4.8 s until the end of the audio file. The Begin Time is reset to 0 at the start of each subsequent audio file.

Low Freq (Hz) & High Freq (Hz): lower (Low Freq) and upper (High Freq) frequency limits of the audio samples in Hz.

Delta Time: Difference between End Time (s) & Begin Time (s)

Delta Freq (Hz): Difference between High Freq (Hz) and Low Freq (Hz)

Avg Power Density (dB FS/Hz): uncalibrated average power density for the audio sample

0 - 127: labels for the 128 VGGish features

HW_detection & HW visual: humpback whale detections from PacificSoundDetectHumpbackSong (https://docs.mbari.org/pacific-sound/notebooks/humpbackwhales/detect/PacificSoundDetectHumpbackSong/) and detections marked through visual inspection of the audio recordings (0 = absence; 1 = presence).

location: hydrophone deployment location
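Because Begin Time (s) restarts at 0 for every audio file, an absolute timestamp for each 4.8 s window has to be derived from the start time embedded in the File field. The snippet below is a minimal sketch of that step, assuming all file names follow the AMAR667.20190722T054122Z pattern shown above.

```python
import pandas as pd

# Read the full PBD feature table (path is a placeholder).
ri = pd.read_csv("SM_3_RI_features_database.csv")

# Parse the recording start time from the file name, e.g. "AMAR667.20190722T054122Z".
start = pd.to_datetime(
    ri["File"].str.split(".").str[1],
    format="%Y%m%dT%H%M%SZ",
    utc=True,
)

# Absolute timestamp of each feature window = file start time + Begin Time offset.
ri["timestamp"] = start + pd.to_timedelta(ri["Begin Time (s)"], unit="s")
```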

SM_4_PBD_Oceanographic_Data -> table containing environmental variables collected by the Smart Atlantic buoy located at Red Island (Placentia Bay) (https://www.smartatlantic.ca/), in proximity to the hydrophone deployment location of the Placentia Bay PAM dataset. NOTE: This table contains empty cells. Records with no measurements available in the original data were left empty (no value assigned) to indicate a missing value.

Field descriptions:

station_name: unique ID field for the station

time: time stamp for the oceanographic data, in the format yyyy-mm-dd HH:MM:SS.00

longitude & latitude (precise_longitude & precise_latitude): general location of the station (precise location of the instrument)

wind_spd_avg & wind_spd2_avg: average wind speed measured in m/s

wind_spd_max & wind_spd2_max: maximum wind gust speed, measured in m/s

wind_dir_avg & wind_dir2_avg: average wind direction in degrees

air_temp_avg: average air temperature in degrees Celsius.

air_pressure_av: average atmospheric pressure in millibar (mbar)

air_humidity_avg: average air humidity, unitless, ranging from 0 to 100.

air_dewpoint_avg: dewpoint temperature in degrees Celsius.

surface_temp_avg: average temperature at the ocean surface, in degrees Celsius

wave_ht_max: sea surface wave maximum height (m)

wave_ht_sig: sea surface wave significant height (m)

wave_period_max: sea surface wave maximum period (s)

wave_dir_avg: average sea surface wave direction in degrees

wave_spread_avg: sea surface wave directional spread in degrees

curr_dir_avg: direction toward which the sea water is flowing (sea water velocity to direction), in degrees

curr_spd_avg: sea water speed in mm/s

Metadata and variables descriptions can be found here: https://www.smartatlantic.ca/erddap/index.html
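In the manuscript, the PBD acoustic features are averaged over 30-min intervals to match the temporal resolution of these buoy records. The sketch below illustrates one way to do that alignment with pandas; the file extensions, the timestamp derivation (see the SM_3 snippet above), and the 15-min matching tolerance are assumptions, not the authors' exact code.

```python
import pandas as pd

# Acoustic features with absolute timestamps (derived as in the SM_3 sketch).
ri = pd.read_csv("SM_3_RI_features_database.csv")
ri["timestamp"] = (
    pd.to_datetime(ri["File"].str.split(".").str[1], format="%Y%m%dT%H%M%SZ", utc=True)
    + pd.to_timedelta(ri["Begin Time (s)"], unit="s")
)

# Buoy records; the time stamps are parsed as UTC (assumption).
ocean = pd.read_csv("SM_4_PBD_Oceanographic_Data.csv")
ocean["time"] = pd.to_datetime(ocean["time"], utc=True)

# Average the 128 VGGish features over 30-min bins.
feature_cols = [str(i) for i in range(128)]
feat_30min = (
    ri.set_index("timestamp")[feature_cols]
    .resample("30min")
    .mean()
    .dropna()
    .reset_index()
)

# Join each 30-min feature vector to the nearest buoy record.
merged = pd.merge_asof(
    feat_30min.sort_values("timestamp"),
    ocean.sort_values("time"),
    left_on="timestamp",
    right_on="time",
    direction="nearest",
    tolerance=pd.Timedelta("15min"),
)
```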

List of scripts

The scripts provided here read the data tables and reproduce the analyses and figures presented in the manuscript. The scripts were prepared in Google Colaboratory and written in Python. Running the scripts requires connecting the notebook to a Google Drive account where the data tables have been uploaded.
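A typical first cell in these notebooks mounts Google Drive so the tables can be read by the scripts; the folder name below is a placeholder for wherever the data tables were uploaded.

```python
# Run inside Google Colaboratory.
from google.colab import drive

drive.mount("/content/drive")

# Placeholder folder; adjust to the Drive location of the uploaded tables.
data_dir = "/content/drive/MyDrive/PAM_data"
```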

SM_5_WMD_species_and_locations.ipynb -> the script replicates the analysis performed on the recordings from the Watkins Marine Mammal Sound Database.

SM_6_PBD_Detections.ipynb; SM_7_PBD_Ocean_Variables.ipynb -> the scripts replicate the analysis performed on the Placentia Bay PAM dataset.

External Data Sources and Scripts:

Methods

Data acquisition and preparation

We collected all records available on the Watkins Marine Mammal Sounds Database website listed under the "all cuts" page. For each audio file in the WMD, the associated metadata included a label for the sound sources present in the recording (biological, anthropogenic, and environmental), as well as information related to the location and date of recording. To minimize the presence of unwanted sounds in the samples, we only retained audio files with a single source listed in the metadata. We then labelled the selected audio clips according to taxonomic group (odontocetes, mysticetes) and species.

We limited the analysis to 12 marine mammal species by discarding data when a species: had less than 60 s of audio available, had a vocal repertoire extending beyond the resolution of the acoustic classification model (VGGish), or was recorded in a single country. To determine whether a species was suited for analysis using VGGish, we inspected the Mel-spectrograms of 3-s audio samples and only retained species with vocalizations that could be captured in the Mel-spectrogram (Appendix S1). The vocalizations of species that produce very low-frequency or very high-frequency sounds were not captured by the Mel-spectrogram, so we removed these species from the analysis. To ensure that records included the vocalizations of multiple individuals for each species, we only considered species with records from two or more countries. Lastly, to avoid overrepresentation of sperm whale vocalizations, we excluded 30,000 sperm whale recordings collected in the Dominican Republic. The resulting dataset consisted of 19,682 audio clips, each with a duration of 960 milliseconds (0.96 s) (Table 1).
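Two of these inclusion criteria (at least 60 s of audio and recordings from two or more countries) can be checked directly against the WMD feature table, as in the illustrative sketch below; this is not the original selection script, and the spectrogram-based screening is not reproduced here.

```python
import pandas as pd

# Each row of the WMD table corresponds to one 0.96 s clip.
wmd = pd.read_csv("SM_1_WMD_Features_and_Labels.csv")

per_species = wmd.groupby("SPECIES").agg(
    n_clips=("ID_row", "size"),
    n_countries=("COUNTRY", "nunique"),
)
per_species["seconds"] = per_species["n_clips"] * 0.96

# Species meeting the duration and multi-country criteria.
kept = per_species[(per_species["seconds"] >= 60) & (per_species["n_countries"] >= 2)]
print(kept)
```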

The Placentia Bay Database (PBD) includes recordings collected by Fisheries and Oceans Canada in Placentia Bay (Newfoundland, Canada) in 2019. The dataset consisted of two months of continuous recordings (1230 hours), starting on July 1st, 2019, and ending on August 31st, 2019. The data were collected using an AMAR G4 hydrophone (sensitivity: -165.02 dB re 1V/µPa at 250 Hz) deployed at a depth of 64 m. The hydrophone was set to operate on 15-min cycles, with the first 60 s sampled at 512 kHz and the remaining 14 min sampled at 64 kHz. For the purpose of this study, we limited the analysis to the 64 kHz recordings.

Acoustic feature extraction

The audio files from the WMD and PBD databases were used as input for VGGish (Abu-El-Haija et al., 2016; Chung et al., 2018), a convolutional neural network (CNN) developed and trained to perform general acoustic classification. VGGish was trained on the YouTube-8M dataset, containing more than two million user-labelled audio-video files. Rather than focusing on the final output of the model (i.e., the assigned labels), here the model was used as a feature extractor (Sethi et al., 2020). VGGish converts audio input into a semantically meaningful vector consisting of 128 features. The model returns features at multiple resolutions: ~1 s (960 ms); ~5 s (4,800 ms); ~1 min (59,520 ms); ~5 min (299,520 ms). All of the visualizations and results pertaining to the WMD were prepared using the finest feature resolution of ~1 s. The visualizations and results pertaining to the PBD were prepared using the ~5 s features for the humpback whale detection example; these features were then averaged over 30-min intervals to match the temporal resolution of the environmental measures available for the area.
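As an illustration of this step, the sketch below extracts embeddings with the TensorFlow Hub release of VGGish and averages consecutive frames to obtain the coarser ~5 s resolution; the published scripts may implement this differently, and the input file name is a placeholder. VGGish expects 16 kHz mono audio scaled to [-1, 1] and returns one 128-dimensional vector per 0.96 s frame.

```python
import librosa
import tensorflow_hub as hub

# Load the TF Hub release of VGGish (assumed here; the paper's scripts may use
# the original research code instead).
vggish = hub.load("https://tfhub.dev/google/vggish/1")

# Placeholder file name; resample to 16 kHz mono for VGGish.
waveform, _ = librosa.load("example_recording.wav", sr=16000, mono=True)

# One 128-dimensional embedding per 0.96 s frame.
frames_096s = vggish(waveform).numpy()

# ~5 s features: average blocks of five consecutive 0.96 s frames (4.8 s).
n_blocks = frames_096s.shape[0] // 5
frames_48s = frames_096s[: n_blocks * 5].reshape(n_blocks, 5, 128).mean(axis=1)
```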

UMAP ordination and visualization

UMAP is a non-linear dimensionality reduction algorithm based on the concept of topological data analysis which, unlike other dimensionality reduction techniques (e.g., tSNE), preserves both the local and global structure of multivariate datasets (McInnes et al., 2018). To allow for data visualization and to reduce the 128 features to two dimensions for further analysis, we applied Uniform Manifold Approximation and Projection (UMAP) to both datasets and inspected the resulting plots.

The UMAP algorithm generates a low-dimensional representation of a multivariate dataset while maintaining the relationships between points in the global dataset structure (i.e., the 128 features extracted from VGGish). Each point in a UMAP plot in this paper represents an audio sample with a duration of ~1 second (WMD dataset), ~5 seconds (PBD dataset, humpback whale detections), or 30 minutes (PBD dataset, environmental variables). Each point in the two-dimensional UMAP space also represents a vector of 128 VGGish features. The nearer two points are in the plot space, the nearer they are in the 128-dimensional feature space, and thus the distance between two points in UMAP reflects the degree of similarity between two audio samples in our datasets. Areas with a high density of samples in UMAP space should, therefore, contain sounds with similar characteristics, and such similarity should decrease with increasing point distance. Previous studies illustrated how VGGish and UMAP can be applied to the analysis of terrestrial acoustic datasets (Heath et al., 2021; Sethi et al., 2020). The visualizations and classification trials presented here illustrate how the two techniques (VGGish and UMAP) can be used together for marine ecoacoustics analysis. UMAP visualizations were prepared using the umap-learn package for the Python programming language (version 3.10). All UMAP visualizations presented in this study were generated using the algorithm's default parameters.
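A minimal sketch of this ordination step with umap-learn is shown below, using the WMD feature table and the algorithm's default parameters; the file path and column names follow the tables described in the README.

```python
import pandas as pd
import umap

# 128 VGGish features from the WMD table (path is a placeholder).
wmd = pd.read_csv("SM_1_WMD_Features_and_Labels.csv")
X = wmd[[str(i) for i in range(128)]].to_numpy()

# Default UMAP parameters, as in the manuscript; output has two dimensions.
embedding = umap.UMAP().fit_transform(X)  # shape: (n_samples, 2)

wmd[["UMAP_1", "UMAP_2"]] = embedding
```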

Labelling sound sources 

The labels for the WMD records (i.e., taxonomic group, species, location) were obtained from the database metadata. 

For the PBD recordings, we obtained measures of wind speed, surface temperature, and current speed from an oceanographic buoy located in proximity to the recorder (Fig. 1). We chose these three variables for their different contributions to background noise in marine environments. Wind speed contributes to underwater background noise at multiple frequencies, ranging from 500 Hz to 20 kHz (Hildebrand et al., 2021). Sea surface temperature contributes to background noise at frequencies between 63 Hz and 125 Hz (Ainslie et al., 2021), while ocean currents contribute to ambient noise at frequencies below 50 Hz (Han et al., 2021). Prior to analysis, we categorized the environmental variables and assigned the categories as labels to the acoustic features (Table 2). Humpback whale vocalizations in the PBD recordings were processed using the humpback whale acoustic detector created by NOAA and Google (Allen et al., 2021), which provides a model score for every ~5 s sample. This model was trained on a large dataset (14 years and 13 locations) of humpback whale recordings annotated by experts (Allen et al., 2021). The model returns scores ranging from 0 to 1, indicating the confidence in the predicted humpback whale presence. We used the results of this detection model to label the PBD samples according to the presence of humpback whale vocalizations. To verify the model results, we inspected all audio files from the month of July that contained a 5 s sample with a model score higher than 0.9. If the presence of a humpback whale was confirmed, we labelled the segment as a model detection. We labelled any additional humpback whale vocalization present in the inspected audio files as a visual detection, while we labelled other sources and background noise samples as absences. In total, we labelled 4.6 hours of recordings. We reserved the recordings collected in August to test the precision of the final predictive model.
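The sketch below illustrates the two labelling steps in simplified form: flagging ~5 s samples whose detector score exceeds 0.9 for manual review, and binning a continuous buoy variable into class labels. The detector scores shown are hypothetical (they are not included in the tables above), and the wind-speed breaks are placeholders, since the actual categories are listed in Table 2 of the manuscript.

```python
import pandas as pd

# Hypothetical per-sample scores from the NOAA & Google humpback detector.
scores = pd.Series([0.12, 0.95, 0.40, 0.97])
needs_review = scores.index[scores > 0.9]  # samples to inspect manually

# Example of turning a continuous buoy variable into class labels.
ocean = pd.read_csv("SM_4_PBD_Oceanographic_Data.csv")
ocean["wind_class"] = pd.cut(
    ocean["wind_spd_avg"],
    bins=[0, 5, 10, float("inf")],      # placeholder category breaks
    labels=["low", "medium", "high"],
)
```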

Label prediction performance

We used the Balanced Random Forest (BRF) models provided in the imbalanced-learn Python package (Lemaître et al., 2017) to predict humpback whale presence and environmental conditions from the acoustic features generated by VGGish. We chose BRF because it is suited for datasets characterized by class imbalance: the algorithm undersamples the majority class prior to prediction, which helps overcome class imbalance (Lemaître et al., 2017). For each model run, the PBD dataset was split into training (80%) and testing (20%) sets.

The training datasets were used to fine-tune the models through a nested k-fold cross-validation approach with ten folds in the outer loop and five folds in the inner loop. We selected nested cross-validation as it allows optimizing model hyperparameters and performing model evaluation in a single step. We used the default parameters of the BRF algorithm, except for the ‘n_estimators’ hyperparameter, for which we tested five possible values: 25, 50, 100, 150, 200. We chose to optimize ‘n_estimators’ because this parameter determines the number of decision trees generated by the BRF model, and finding an optimal value reduces the chance of overfitting. Every iteration of the outer loop generates a new train-validation split of the training dataset, which is then used as input to a BRF.

The testing datasets were then used to evaluate model performance. We evaluated model performance using the balanced-accuracy score, computed as:

Balanced Accuracy (BA) = (Sensitivity + Specificity) / 2                                (eq. 1)

We chose the balanced-accuracy score as the evaluation metric for both datasets, as it is suited for measuring model performance when samples are highly imbalanced (Brodersen et al., 2010).
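Putting these pieces together, the sketch below reproduces the general scheme (80/20 split, nested cross-validation with 10 outer and 5 inner folds, tuning n_estimators, balanced-accuracy scoring) using the annotated PBD table and the humpback-presence labels; exact details may differ from the published scripts, and the file path is a placeholder.

```python
import pandas as pd
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Annotated PBD samples: 128 VGGish features plus humpback presence labels.
ann = pd.read_excel("SM_2_Annotations_Dataframe_Multilable.xlsx")
feature_cols = [c for c in ann.columns if str(c).isdigit()]
X, y = ann[feature_cols].to_numpy(), ann["HW detection"].to_numpy()

# 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Inner loop: 5-fold grid search over n_estimators; outer loop: 10-fold CV.
inner = GridSearchCV(
    BalancedRandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50, 100, 150, 200]},
    scoring="balanced_accuracy",
    cv=5,
)
outer_scores = cross_val_score(inner, X_train, y_train, cv=10, scoring="balanced_accuracy")

# Refit on the full training split and evaluate on the held-out 20%.
inner.fit(X_train, y_train)
test_ba = balanced_accuracy_score(y_test, inner.predict(X_test))
print(outer_scores.mean(), test_ba)
```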

In total, we conducted four trials on the PBD dataset. In the first three trials, we used the PBD dataset to test the ability of the VGGish features to predict one of the three environmental variables: wind speed, ocean surface temperature, and current speed. In the fourth trial, we tested the ability of the VGGish features to identify humpback whale vocalizations. Lastly, we tested the humpback whale model on the recordings from the month of August, which were not part of model training and evaluation. We inspected all detections in August and computed model precision as:

Precision = True Positives / (True Positives + False Positives)                                (eq. 2)

All predictive models for the PBD were trained and tested on the 128 acoustic features generated by VGGish. The UMAP plots were used to visually inspect the structure of the PBD and WMD feature datasets. For the WMD dataset, we used violin plots to explore the distribution of the two UMAP dimensions in relation to the clusters of data points labelled according to taxonomic group, species, and location of origin of the corresponding audio samples.

Funding

Memorial University of Newfoundland, Ph.D. Program Funding

Fisheries and Oceans Canada, Species at Risk, Oceans Protection Plan, and Marine Ecosystem Quality programmes of the Department of Fisheries and Oceans Canada, Newfoundland and Labrador Region

University of Parma, Ph.D. program in Evolutionary Biology and Ecology (University of Parma, agreement with University of Ferrara and University of Firenze)