Data from: Acoustic behavior of endangered Hawaiian false killer whales

Madrigal, Brijonnay1,2; Gough, William2; Currie, Jens2; Bejder, Lars2; Hollers, Augusta2; Baird, Robin3; Mooney, Aran4; Pacini, Aude2

Published Dec 17, 2025 on Dryad. https://doi.org/10.5061/dryad.s7h44j1n6

Data files

Dec 17, 2025 version files (87.78 MB):

README.md (14.12 KB)
RSOS-250918_dryadsubmission.zip (87.77 MB)

Abstract

This dataset accompanies the study "Acoustic behavior of endangered Main Hawaiian Islands insular false killer whales (*Pseudorca crassidens*)" and contains the data and analyses used to characterize the pulsed call repertoire, nonlinear phenomena, and behavioral context of calling. Biologging tags (CATS and DTAG) were deployed on four individuals from two social clusters around the Main Hawaiian Islands between 2011 and 2024. The CATS tags sampled audio at 96 kHz, and the DTAG at 240 kHz. Data were processed using Raven Pro and PAMGuard.

The repository includes the following:

Data inputs (kinematic/acoustic data tables)
Data used in the statistical analysis
Interobserver reliability test scores
Spreadsheets of call features from each trace extracted using the ROCCA module in PAMGuard for each call type
Examples of audio files used to generate spectrograms in the Supplementary material (S1)

A total of 5,940 high-quality possible focal pulsed calls were analyzed, from which 52 stereotyped call types were identified and categorized based on their fundamental frequency contours. Please cite this dataset and the associated Royal Society Open Science publication when using these data.

Dataset DOI: 10.5061/dryad.s7h44j1n6

Description of the data and file structure

Two suction cup tag types were used to collect acoustic and kinematic data: digital acoustic recording tags (DTAG version 3) and customized animal tracking solutions (CATS) tags. Both tag types were equipped with acoustic sensors to record sound production, a pressure sensor to record depth, and a suite of inertial sensors to measure animal orientation and allow fine-scale movement tracking. Tags were deployed by pole during multiple boat-based field efforts in 2011 (off Hawaiʻi Island) and from 2023 to 2024 (off the islands of Maui and Lānaʻi). HTI-96 mini hydrophones (sampling rate 96 kHz, 16-bit resolution, flat frequency response from 2 Hz to 30 kHz) were integrated in the CATS tags, with a hydrophone-specific sensitivity ranging from -169.4 to 170.2 re: 1V uPa-1. CATS tags were equipped with multiple inertial sensors, including tri-axial accelerometers (sampling rate 400 Hz), magnetometers (sampling rate 50 Hz), and gyroscopes (sampling rate 50 Hz), as well as a light sensor (LED headlight) and a high-resolution (2K) video camera. The DTAG sampled at 240 kHz with a nominal tag hydrophone sensitivity of –175 dB re 1 V/uPa.

The dataset in RSOS-250918_dryadsubmission.zip comprises five folders:

FOLDER 1: 1_tag_data_tables

.mat files with _acoustics

[Column 1] SNR_values_t - signal-to-noise ratio (SNR); the threshold used is indicated in the filename.

[Column 2] SNR_files_t - name of exported signal .wav file where selection in Raven Pro was taken from

[Column 3] C_filename - name of exported call .wav file where selection in Raven Pro was taken from

[Column 4] Selection - selection number from Raven Pro table

[Column 5] BeginFile - name of original tag .wav file where signal/call selections were exported from.

[Column 6] BeginDateTime - Date/time of start of call in YYYY/MM/DD hh:mm:ss.SSSS.

[Column 7] BeginTime_s - Time of start of call in seconds (Beginning of Raven bounding box)

[Column 8] EndClockTime - Time of end of call in hh:mm:ss.SSSS (End of Raven bounding box)

[Column 9] EndTime_s - Time of end of call in seconds (End of Raven bounding box)

[Column 10] SelectionType_S_N_C_ - S = signal only, C= signal with buffer in time before and after the call for tracing in ROCCA.

[Column 11] Call_ - Arbitrary call number since first call in Raven from .wav file.

[Column 12] Animal_F_NF_ - F = focal animal, NF = non-focal animal, U = unsure (to be designated based on SNR)

[Column 13] Clicks__R_0_ - Click presence (0; R = rasp; an empty cell indicates that clicks were present)

[Column 14] ClickAssociation - Association of clicks with the call (C = yes, with the call; U = unknown)

[Column 15] SecondarySidebands_P_A_ - Indicates if secondary sidebands are present (P) or absent (A) in the call.

[Column 16] Notes - Additional call notes.

[Column 17] Overlap - 1 = no overlap with focal contour but non-focal signal in box (K for keep is added at the end if the overlap in the box is not major and the call could still be included); 2 = overlap of focal contour with non-focal signal; 3 = no overlap with focal contour but potentially non-focal overlap in box; 4 = overlap of focal contour with a contour that is also potentially non-focal, so it is unclear which is the focal signal; B = buzz present; R = remove (hard to tell which clicks belong to which animal); C = background clicks

[Column 18] AudioCut_Out - Y: yes occurring

[Column 19] LoudBackground - Y: yes occurring

[Column 20] Chaos - P: present

[Column 21] Sequence - Indicates if call is part of a sequence (e.g. 1/3,2/3,3/3). Denominator of the fraction indicates the number of calls in the sequence.

[Column 22] CallType_Final - Earlier call classification (unpublished).

[Column 23] deployment_callnum - Consecutive call number from start of deployment.

[Column 24] Call_Type_Category - Earlier call classification (unpublished).

[Column 25] CallNumber - Earlier version call number (unpublished).

[Column 26] manuscript_classification - Final classification used in the manuscript based on published call catalog.

[Column 27] TagNum - 1= DTAG_102011_SNR22_v2_acoustics; 2 = CATSJ1022023_SNR25; 3 = CATSJ1102024_SNR29; 4 = CATSJ4_102024_SNR40
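The coded columns above lend themselves to small parsing helpers. The following is a minimal Python sketch, not the authors' code. It assumes each table row has already been loaded into a dictionary keyed by the column names listed above (for example, after exporting the .mat table to CSV); the helper names and the example rows are hypothetical.

```python
from typing import Optional, Tuple


def parse_sequence(value: str) -> Optional[Tuple[int, int]]:
    """Split a Sequence entry such as '2/3' into (position, sequence_length).

    Returns None for empty cells or entries that are not of the form 'n/m'.
    """
    parts = (value or "").strip().split("/")
    if len(parts) != 2:
        return None
    try:
        return int(parts[0]), int(parts[1])
    except ValueError:
        return None


def is_focal(row: dict) -> bool:
    """True when Animal_F_NF_ marks the call as focal (F)."""
    return str(row.get("Animal_F_NF_", "")).strip().upper() == "F"


# Hypothetical rows for illustration only; these are not values from the dataset.
rows = [
    {"Animal_F_NF_": "F", "Sequence": "1/3"},
    {"Animal_F_NF_": "NF", "Sequence": ""},
    {"Animal_F_NF_": "F", "Sequence": "3/3"},
]
focal_rows = [r for r in rows if is_focal(r)]
positions = [parse_sequence(r["Sequence"]) for r in focal_rows]
print(positions)
```

Returning None for empty or malformed cells keeps calls that are not part of a sequence distinguishable from calls that are.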

.mat files with _acoustics_kinematics

[Columns 1 - 4, 6-11] - (These columns were not used for this study.)

[Column 5] - DiveStates - Surface (0–2 m; Dive State 0), Descent (Dive State 1), Ascent (Dive State 2), Shallow bottom phase (less than 100 m; Dive State 3), Deep bottom phase (deeper dives > 100 m; Dive State 4).

[Column 12] - SwimSpeed - Swim Speed (m/s) calculated from CATS tag inertial sensor data.

[Column 13] - Depth - Animal depth (m) calculated from CATS tag inertial sensor data.

[Column 14-27] - (Columns transferred over from the '_acoustics' .mat files. Column information detailed above.)

[Column 28] - Jerk - calculated from CATS tag inertial sensor data.

[Column 29] - ID - CATS tag number (pc230215-J1, pc241020-J1, pc241020-J4)

[Column 30] - TagNum - 1= DTAG_102011_SNR22_v2_acoustics; 2 = CATSJ1022023_SNR25; 3 = CATSJ1102024_SNR29; 4 = CATSJ4_102024_SNR40

[Column 28] - Calling - 0 = no call associated with that CATS tag PRH time; 1 = call associated with that specific CATS tag PRH time

.mat files with _acoustics_kinematics_callsonly

Only rows from .mat files with _acoustics_kinematics that have calls (Calling column has a value of 1)
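The relationship between the full table and the calls-only table can be sketched as a simple filter. This is an illustrative Python sketch under stated assumptions: rows are represented as dictionaries keyed by the column names above, the label dictionary just restates the dive state codes from this README, and the example rows are hypothetical.

```python
# Dive state codes as described for the DiveStates column.
DIVE_STATE_LABELS = {
    0: "Surface",
    1: "Descent",
    2: "Ascent",
    3: "Shallow bottom phase",
    4: "Deep bottom phase",
}


def calls_only(rows):
    """Keep only rows whose Calling value is 1."""
    return [row for row in rows if int(row["Calling"]) == 1]


# Hypothetical rows for illustration only; these are not values from the dataset.
rows = [
    {"DiveStates": 0, "Calling": 0},
    {"DiveStates": 1, "Calling": 1},
    {"DiveStates": 4, "Calling": 1},
]
for row in calls_only(rows):
    print(DIVE_STATE_LABELS[int(row["DiveStates"])])
```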

.mat files with _acoustics_kinematics_plotting

Rows from the _acoustics_kinematics .mat files, restricted to the period when the tag was on the animal (the beginning and end of each table were clipped based on the tag-on and tag-off times; refer to Table 1 in the manuscript for those times).

FOLDER 2: 2_statistical_analysis

1. GLMM_data_60s.RData: R workspace that contains the data table used for the final generalized linear mixed model (GLMM)

[Column 1] DiveStates - Minute bin from time in deployment.

[Column 2] Calling - 0= no calling recorded in that bin, 1= calls recorded in that bin.

[Column 3] Speed_speed - Average speed (m/s) calculated across a 60 second time interval and derived from the speed column (Column 12 from '_acoustics_kinematics' .mat file)

[Column 4] Depth - Average depth (m) calculated across a 60 second time interval and derived from the depth column (Column 13 from '_acoustics_kinematics' .mat file)

[Column 5] Hour_mode - 0= start of the deployment and numbers continue in sequential order.

[Column 6] TagNum - 1= DTAG_102011_SNR22_v2_acoustics; 2 = CATSJ1022023_SNR25; 3 = CATSJ1102024_SNR29; 4 = CATSJ4_102024_SNR40

[Column 7] Time - Sequential numbering order per deployment based on 60 second bins (e.g. 1 is the first 60 second bin).

[Column 8] Time_of_Day - Hour of the day.
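The 60-second summaries described above can be illustrated with a short sketch. This is not the authors' processing code; it assumes each sample is a tuple of (seconds since deployment start, speed in m/s, depth in m, calling flag), and the function name and example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

BIN_SECONDS = 60


def bin_samples(samples):
    """Summarize (time_s, speed, depth, calling) samples into 60 s bins.

    Bin numbers start at 1, so bin 1 covers the first 60 seconds.
    """
    grouped = defaultdict(list)
    for time_s, speed, depth, calling in samples:
        grouped[int(time_s // BIN_SECONDS) + 1].append((speed, depth, calling))

    summary = []
    for bin_number in sorted(grouped):
        values = grouped[bin_number]
        summary.append({
            "Time": bin_number,
            "Speed": mean(v[0] for v in values),
            "Depth": mean(v[1] for v in values),
            # A bin counts as calling if any sample in it has a call.
            "Calling": int(any(v[2] == 1 for v in values)),
        })
    return summary


# Hypothetical samples for illustration only.
samples = [(0.0, 1.0, 2.0, 0), (30.0, 3.0, 4.0, 1), (61.0, 2.0, 10.0, 0)]
for row in bin_samples(samples):
    print(row)
```

Missing values are not handled here; a real pipeline would need to decide how to treat gaps in the sensor record before averaging.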

2. NBR_all_tag_callrates_divestates.csv: CSV file containing the tag call rates included in the negative binomial regression (NBR).

[Column 1] Minute - Minute bin from time in deployment.

[Column 2] Call_rate - Call rate calculated as number of calls recorded per minute.

[Column 3] Speed_average - Average speed (m/s) derived from the speed column (Column 12 from '_acoustics_kinematics' .mat file)

[Column 4] Depth_average - Average depth (m) derived from the depth column (Column 13 from '_acoustics_kinematics' .mat file)

[Column 5] Tag_num - 1= DTAG_102011_SNR22_v2_acoustics; 2 = CATSJ1022023_SNR25; 3 = CATSJ1102024_SNR29; 4 = CATSJ4_102024_SNR40

[Column 6] Dive_state_mode - Surface (0–2 m; Dive State 0), Descent (Dive State 1), Ascent (Dive State 2), and Shallow bottom phase (less than 100 m; Dive State 3). If two dive states occurred, each accounting for 40–60% of samples, the dive state was considered a transition dive state ('to' indicated in the spreadsheet).
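The transition rule for Dive_state_mode can be sketched as follows. This is one possible reading of the rule, not the authors' implementation. The label format (for example "1to3") and the choice to order the two states by first appearance within the bin are assumptions made for illustration; check the spreadsheet for the actual labels.

```python
from collections import Counter


def dive_state_mode(states):
    """Return the modal dive state of a bin, or a transition label.

    A transition is reported when the two most common states each make up
    40-60% of the samples. The 'XtoY' label format is an assumption.
    """
    if not states:
        raise ValueError("states must contain at least one sample")
    states = list(states)
    ranked = Counter(states).most_common(2)
    total = len(states)
    if len(ranked) == 2:
        (first, n_first), (second, n_second) = ranked
        if all(0.4 <= n / total <= 0.6 for n in (n_first, n_second)):
            # Order the pair by which state appears first in the bin (assumption).
            earlier, later = sorted((first, second), key=states.index)
            return f"{earlier}to{later}"
    return str(ranked[0][0])


print(dive_state_mode([0, 0, 0, 1]))
print(dive_state_mode([1, 1, 3, 3]))
```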

FOLDER 3: 3_interobserver_reliability_test

Results from the interobserver reliability test are in IORT_summaryscores.csv. Column descriptions are below:

[Column 1] Actual_class - Call type number from call catalog.

[Column 2] Observer_1 - Observer 1 classification results

[Column 3] Observer_2 - Observer 2 classification results

[Column 4] Observer_3 - Observer 3 classification results

[Column 5] TRUE - Number of observers whose classification aligned with the correct call classification.

[Column 6] FALSE - Number of observers whose classification did not align with the correct call classification.
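The TRUE and FALSE columns can be recomputed from the observer columns as a consistency check. The sketch below is illustrative only; the function name and the example classifications are hypothetical and are not values from IORT_summaryscores.csv.

```python
def score_row(actual_class, observer_classes):
    """Count observers who matched (TRUE) and did not match (FALSE) the actual class."""
    true_count = sum(1 for label in observer_classes if label == actual_class)
    return true_count, len(observer_classes) - true_count


# Hypothetical example: two of three observers agree with the catalog call type.
true_count, false_count = score_row(12, [12, 12, 7])
print(true_count, false_count)
```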

FOLDER 4: 4_audio_examples_S1

Exemplar audio files (.wav) for each call type for both CATS and DTAG data.

FOLDER 5: 5_ROCCA_tables

1. CATS and DTAG folders: Contain a RoccaContourStats .csv for each call type, with each row representing an individual trace extracted from ROCCA. Refer to Appendix A in the ROCCA (Real-time Odontocete Call Classification Algorithm) User’s Manual for a description of each column:

Begsweep- slope of the beginning sweep (1 = positive, -1 = negative, 0 = zero)

Begup- binary variable: 1=beginning slope is positive, 0=beginning slope is negative

Begdwn- binary variable: 1=beginning slope is negative, 0=beginning slope is positive

Endsweep- slope of the end sweep (1 = positive, -1 = negative, 0 = zero)

Endup- binary variable: 1=ending slope is positive, 0=ending slope is negative

Enddwn- binary variable: 1=ending slope is negative, 0=ending slope is positive

Beg- beginning frequency (Hz)

End- ending frequency (Hz)

Min- minimum frequency (Hz)

Dur- duration (sec)

Range- maximum frequency–minimum frequency (Hz)

Max- maximum frequency (Hz)

mean freq- mean frequency (Hz)

median freq- median frequency (Hz)

std freq- standard deviation of the frequency (Hz)

Spread- difference between the 75th and the 25th percentiles of the frequency

quart freq- frequency at one quarter of the duration (Hz)

half freq- frequency at one half of the duration (Hz)

Threequart- frequency at three quarters of the duration (Hz)

Centerfreq- minimum frequency + (maximum frequency - minimum frequency)/2

rel bw- relative bandwidth: (max freq - min freq)/center freq

Maxmin- max freq/min freq

Begend- beg freq/end freq

Cofm- coefficient of frequency modulation: take 20 frequency measurements equally spaced in time, then subtract each frequency value from the one before it. COFM is the sum of the absolute values of these differences, all divided by 10,000

tot step- number of steps (10 percent or greater increase or decrease in frequency over two contour points)

tot inflect- number of inflection points (changes from positive to negative or negative to positive slope)

max delta- maximum time between inflection points

min delta- minimum time between inflection points

maxmin delta- max delta/min delta

mean delta- mean time between inflection points

std delta- standard deviation of the time between inflection points

median delta- median of the time between inflection points

mean slope- overall mean slope

mean pos slope- mean positive slope

mean neg slope- mean negative slope

mean absslope- mean absolute value of the slope

Posneg- mean positive slope/mean negative slope

perc up- percent of the whistle that has a positive slope

perc dwn- percent of the whistle that has a negative slope

perc flt- percent of the whistle that has zero slope

up dwn- number of inflection points that go from positive slope to negative slope

dwn up- number of inflection points that go from negative slope to positive slope

up flt- number of times the slope changes from positive to zero

dwn flt- number of times the slope changes from negative to zero

flt dwn- number of times the slope changes from zero to negative

flt up- number of times the slope changes from zero to positive

step up- number of steps that have increasing frequency

step dwn- number of steps that have decreasing frequency

step.dur- number of steps/duration

inflect.dur- number of inflection points/duration

Citation:

Oswald, J. N., & Oswald, M. (2013). ROCCA (Real-time Odontocete Call Classification Algorithm) User’s Manual. Prepared for Naval Facilities Engineering Command Atlantic, Norfolk, Virginia, under HDR Environmental, Operations and Construction, Inc., Contract No. CON005-4394-009, Subproject 164744.
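A few of the derived contour variables listed above can be reproduced from a frequency contour as a sanity check. The sketch below follows the written definitions in this README and is not ROCCA's implementation. It assumes the center frequency is the midpoint between the minimum and maximum frequency, that the COFM input has already been resampled to 20 equally spaced points, and that flat segments are skipped when counting inflections; the function names are hypothetical.

```python
def center_frequency(fmin, fmax):
    """Midpoint between minimum and maximum frequency (Hz)."""
    return fmin + (fmax - fmin) / 2


def relative_bandwidth(fmin, fmax):
    """(max freq - min freq) / center freq."""
    return (fmax - fmin) / center_frequency(fmin, fmax)


def cofm(freqs_20):
    """Coefficient of frequency modulation from 20 equally spaced frequencies (Hz)."""
    if len(freqs_20) != 20:
        raise ValueError("expected exactly 20 frequency measurements")
    return sum(abs(b - a) for a, b in zip(freqs_20, freqs_20[1:])) / 10_000


def count_inflections(freqs):
    """Count slope sign changes (positive to negative or negative to positive)."""
    diffs = [b - a for a, b in zip(freqs, freqs[1:])]
    signs = [1 if d > 0 else -1 for d in diffs if d != 0]  # skip flat segments
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Hypothetical contour: a linear upsweep from 5000 Hz in 100 Hz steps.
upsweep = [5000 + 100 * i for i in range(20)]
print(center_frequency(4000, 8000))
print(cofm(upsweep))
print(count_inflections([1, 2, 3, 2, 1, 2]))
```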

2. All_Rocca_Summary_Stats_averages.csv: The mean and standard deviation of the following ROCCA variables:

FREQBEG (units = Hz), FREQBEG_KHZ (units = kHz),FREQEND (units = Hz), FREQEND_KHZ (units = kHz), FREQMIN (units = Hz), FREQMIN_KHZ (units = kHz), FREQMAX (units = Hz), FREQMAX_KHZ (units = kHz), FREQRANGE (units = Hz), FREQRANGE_KHZ (units = kHz), DURATION (units = s). These averages/SD were extracted from the .csv spreadsheets included in the folders detailed above.
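The summary statistics can be recomputed from the per-trace values as a check. The following sketch assumes the sample standard deviation was used, which this README does not state; the function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def summarize_hz(values_hz):
    """Mean and sample SD of a frequency variable, in Hz and kHz."""
    m = mean(values_hz)
    s = stdev(values_hz)  # sample SD (n - 1 denominator); an assumption
    return {"mean_hz": m, "sd_hz": s, "mean_khz": m / 1000, "sd_khz": s / 1000}


# Hypothetical FREQBEG values for one call type, not values from the dataset.
summary = summarize_hz([4000, 6000])
print(summary)
```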

3. call_parameters_comparison_withliterature.csv: This spreadsheet contains a comparative summary of call parameters reported in other studies.

[Column 1] - Source - Citation of the paper from which the data were summarized.

[Column 2] - Region - Population of false killer whales/ region where the data was collected.

The following acoustic parameters were also included in columns 3-14:

StartF(mean) - start frequency mean

StartF(SD) - start frequency standard deviation

EndF(mean) - end frequency mean

EndF(SD) - end frequency standard deviation

MinF(mean) - minimum frequency mean

MinF(SD) - minimum frequency standard deviation

MaxF(mean) - maximum frequency mean

MaxF(SD) - maximum frequency standard deviation

Bandwidth(mean) - bandwidth mean

Bandwidth(SD) - bandwidth standard deviation

Duration(mean) - duration mean

Duration(SD) - duration standard deviation

Code/software

MATLAB or R is required to view the data.

Access information

For access to the raw .wav files used in this study, please contact the corresponding author at brijonnay.madrigal@gmail.com
