A novel method for estimating avian roost sizes using passive acoustic recordings
Data files
Sep 25, 2024 version files (1.56 GB)

- 20200820.WAV (285.34 MB)
- 20200831.WAV (386.44 MB)
- 20200903.WAV (334.93 MB)
- 20210205.WAV (331.99 MB)
- 20210217.WAV (225.48 MB)
- README.md (1.18 KB)
Abstract
Communal bird roosts serve as information centers and a means of thermoregulation for many species. While some communally roosting species are major pests and cause disamenities, others are of conservation concern. Estimating the population of roosting birds can provide a useful proxy for population size, and possibly a more reliable estimate than other sampling techniques. However, estimating these populations is challenging, as some roosts are large and often occluded in foliage. Acoustic methods such as paired sampling, microphone arrays, and call-rate analysis have previously been used to estimate bird abundances; however, these are less suited to estimating large roost populations where hundreds of individuals are calling in unison. To address this challenge, we explored using machine learning techniques to estimate a roost population of the Javan Myna, Acridotheres javanicus, an invasive species in Singapore. While one might expect to use sound intensity to estimate roost sizes, intensity is affected by various factors such as the distance of the recorder, local propagation conditions (e.g., buildings and trees), weather conditions, and noise from other sources. Here, we used a deep neural network to extract higher-order statistics from the sound recordings and used those to help estimate roost sizes. Additionally, we validated our method using automated visual analysis with a dual-camera setup and manual bird counts. The bird counts over time estimated by our acoustic model matched the automated visual estimates and manual bird counts at a selected Javan Myna roost, thus validating our approach. Our acoustic model estimated close to 400 individual mynas roosting in a single tree. Analyses of additional recordings of Javan Myna roosts, conducted on two separate occasions and at a different roost location, showed that our roost estimates over time also matched our automated visual estimates well. Our novel approach to estimating communal roost sizes can be applied robustly using a simple portable acoustic recording system. Our method has multiple applications, such as testing the efficacy of avian roost population control measures (e.g., roost tree pruning) and monitoring the populations of threatened bird species that roost communally.
README: A novel method for estimating avian roost sizes using passive acoustic recordings
This dataset consists of sample acoustic recordings of roosting mynas.
Description of the data and file structure
This dataset consists of five WAV files named for their recording dates in YYYYMMDD format.
File list:
20200820.WAV
20200831.WAV
20200903.WAV
20210205.WAV
20210217.WAV
Code/Software
The code to estimate roost size based on recordings is available here.
Installation

- Download and install Julia 1.5 or greater (https://julialang.org/downloads/)
- Ensure `julia` is in your PATH
- Run `julia mynacounter.jl` or `./mynacounter.jl` to complete the installation
Usage

```
usage: mynacounter.jl [--csv] [-o OUTPUT] [-h] filename

positional arguments:
  filename             WAV filename

optional arguments:
  --csv                Save output in CSV file
  -o, --output OUTPUT  Output filename (default: "output.csv")
  -h, --help           show this help message and exit
```
Example

```
julia mynacounter.jl --csv 20200820.WAV
```

will process the file `20200820.WAV` to produce an estimate of the myna roost size, along with an `output.csv` file that can be opened with spreadsheet software. The CSV contains three columns (`lower`, `median`, `upper`) giving the lowest, best, and highest estimates of vocalizing mynas in the roost as a function of time (each row represents one minute of data).
Methods
1. Video analysis technique
To develop an acoustic technique for estimating myna roost sizes, we had to ground-truth the number of roosting mynas to calibrate our acoustic model. To achieve this, we developed an automated visual technique to count mynas flying in and out of a roost site in the evenings, starting before the mynas began arriving at the site. The difference between the cumulative number of mynas arriving at the site and the cumulative number leaving it is the number of mynas at the roost site.
While manually counting mynas coming in and out of a roost site is possible, it is error-prone and labour-intensive. We therefore focused on developing an automated visual technique based on the analysis of video recordings from two cameras pointed at the roost tree from different angles. Together, the two cameras provided a full view of the tree from all angles, so all arriving and departing mynas could be counted. Additionally, we validated the technique by manually counting the mynas in one set of video recordings and comparing the result against the automated visual analysis.
We chose a roost tree that was separated from nearby trees (i.e., non-joining canopies) and other nearby occlusions, such that we could see mynas coming in and out of the tree from all directions. Two cameras were deployed facing the tree from about 1–2 hours before sunset until well after sunset. This covered the period during which the mynas arrived at the tree and lasted until the end of the acoustic measurements against which we compared the cameras' counts. Camera 1 was usually set up to the south-east of the tree, and Camera 2 to the north-west. We collected multiple datasets from the same roost site on different days, at different times of the year (Table 1).
To automate our detection of birds flying in and out of the roost site, we drew boundaries around the tree and counted birds crossing the boundaries in either direction. We call these boundaries “virtual markers”. Whenever a bird crossed a marker, we estimated its direction of flight, determined if it was flying in or out, and updated the estimated bird count at the roost site.
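A minimal sketch of this bookkeeping is shown below, in Julia (the language of the accompanying `mynacounter.jl`). The type and function names are illustrative, not taken from the authors' code; the roost estimate is simply the running difference between arrivals and departures.

```julia
# Each detection is a signed crossing of a virtual marker.
struct Crossing
    t::Float64      # time of the crossing (s)
    inbound::Bool   # true if the bird crossed the marker towards the tree
end

# Cumulative arrivals minus cumulative departures, evaluated at each crossing.
function roost_count(crossings::Vector{Crossing})
    count = 0
    counts = Int[]
    for c in sort(crossings; by = c -> c.t)
        count += c.inbound ? 1 : -1
        push!(counts, count)
    end
    return counts
end
```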
Detecting dark birds against a light sky background (even in twilight hours with sufficient light) was reliably achieved with simple image processing techniques. We detected birds via rapid changes in the brightness of pixels on the marker. To avoid duplicate detections from the same bird flapping its wings, or moving in a way that causes the brightness to oscillate as it crosses the marker, we required a minimum time gap between detections at the same location in the image.
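The sketch below illustrates such a detector for a single marker, assuming grayscale video frames supplied as numeric matrices. The thresholds, frame rate, and the marker-wide (rather than per-location) refractory gap are simplifying assumptions.

```julia
# Fire a detection when the summed brightness along the marker changes rapidly
# between consecutive frames; a minimum time gap suppresses repeat triggers
# from the same bird. All thresholds are placeholders, not calibrated values.
function detect_crossings(frames::Vector{<:AbstractMatrix},
                          marker::Vector{Tuple{Int,Int}};
                          fps = 30.0, threshold = 20.0, min_gap = 0.5)
    times = Float64[]
    last_t = -Inf
    prev = sum(Float64(frames[1][r, c]) for (r, c) in marker)
    for i in 2:length(frames)
        cur = sum(Float64(frames[i][r, c]) for (r, c) in marker)
        t = (i - 1) / fps
        if abs(cur - prev) > threshold && t - last_t >= min_gap
            push!(times, t)   # record a crossing at this marker
            last_t = t
        end
        prev = cur
    end
    return times
end
```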
While two cameras ensured that we had a complete view of the roost tree and could see birds arriving from all directions, this also posed a challenge: a single bird might be seen on both cameras and could be double-counted. Birds crossing the marker from the south-west or north-east could potentially be detected on both cameras. To avoid double-counting, we had to associate detections from both cameras and count each associated pair only once. This was achieved with heuristics such as proximity in time, detections on opposite boundaries in the two camera views, and direction of flight. The dataset collected on 3 September 2020 was used as the primary dataset for validation of the visual analysis technique (Table 1). For this dataset, we performed manual counting of birds by carefully watching videos from both cameras and annotating the arrival and departure of each bird.
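A hedged sketch of such an association heuristic follows: a Camera 2 detection is dropped if Camera 1 logged a detection close in time, on the opposite boundary, with the same flight direction. Type and field names are illustrative, and the association window is a placeholder.

```julia
struct Detection
    t::Float64        # detection time (s)
    boundary::Symbol  # e.g. :southwest or :northeast
    inbound::Bool     # true if flying into the roost tree
end

opposite(b::Symbol) = b == :southwest ? :northeast : :southwest

function deduplicate(cam1::Vector{Detection}, cam2::Vector{Detection};
                     window = 1.0)
    kept = filter(cam2) do d2
        !any(d1 -> abs(d1.t - d2.t) < window &&
                   d1.boundary == opposite(d2.boundary) &&
                   d1.inbound == d2.inbound, cam1)
    end
    return vcat(cam1, kept)   # count each associated pair once, via Camera 1
end
```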
2. Acoustic recording analysis technique
The audio dataset collected on 3 September 2020 was used as training data in our acoustic analysis. An acoustic recorder was set up close to Camera 1 during data collection. The exact locations of the cameras and the acoustic recorder differed on different days, as the intent was to make the techniques robust against small differences in the recorder setup. Both cameras and the acoustic recorder were synchronised in time.
The audio data was collected using a Zoom H6 recorder and an Electro-Voice ND66 condenser cardioid instrument microphone. The directional microphone was mounted on a tripod and placed about 5–10 m from the roost tree of interest and pointed into the centre of the foliage of the tree. The acoustic technique developed is not sensitive to the exact distance, as long as the roost chorus is audible at the microphone and the roost does not span more than a 90° angle from the microphone. The microphone has a beamwidth of about 90°, which was sufficient to cover the roost site, but not so wide as to pick up significant noise from other nearby roost sites.
In the time series of the recorded data, the sound intensity increased as the roost chorus grew louder through the evening. A sudden drop in intensity at the 1 hour 24 minute mark occurred during a disturbance; the intensity then gradually increased as the birds returned to their roost. After sunset, the roost chorus gradually faded until the birds stopped vocalizing. Several loud events can also be observed throughout the recording, representing noises that are inevitable when recording in uncontrolled settings and public places.
While the data at first glance suggested that we could use the acoustic time series amplitude to estimate roost sizes, the amplitude can be confounded by multiple factors. These include the distance between the roost site and the recorder, the environmental acoustic propagation conditions, local noise sources, the gain settings on the recorder, and the pointing direction of the microphone. Because these variables were difficult to control operationally, the time series amplitude may not be a close proxy for roost size. As such, we considered other properties of the acoustic time series in our analysis.
3. Machine learning
A traditional approach to finding acoustic time series properties of interest would be to handcraft features based on temporal statistics of the time series data. Such features often include ratios of power spectral densities at various frequencies, and other higher-order temporal statistics. These handcrafted features can then be used for regression analysis to calibrate a model. Here, we applied a deep neural network (DNN) to learn the features from the time series data.
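For concreteness, the snippet below shows the kind of handcrafted feature the traditional approach would use: the ratio of power spectral density inside the chorus band (1–5 kHz, per the text) to that outside it, computed with DSP.jl. The specific feature is illustrative only, not one used in this study.

```julia
using DSP

# Ratio of in-band to out-of-band power spectral density for one audio block.
function chorus_band_ratio(x::AbstractVector, fs::Real)
    pg = periodogram(x; fs = fs)
    inband = 1000 .<= freq(pg) .<= 5000
    return sum(power(pg)[inband]) / sum(power(pg)[.!inband])
end
```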
Before feeding the time series data to a DNN, we bandpass filtered the data to remove frequencies that were dominated by traffic and other urban sounds and contained little roost chorus. Since the roost chorus was mostly in the 1–5 kHz band, we applied a digital finite impulse response (FIR) bandpass filter (with 128 taps) to remove other sounds. The recording was then down-sampled to 16384 Hz, still well above the Nyquist rate for the retained band, to reduce the number of time series samples. The recorded time series was then split into 4096-sample blocks (250 ms each) and used as input to the DNN.
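A sketch of this preprocessing using the WAV.jl and DSP.jl packages is shown below; the Hamming window used for the FIR design is an assumption, as the text specifies only the tap count and band edges.

```julia
using DSP, WAV

# Bandpass (1–5 kHz, 128-tap FIR), down-sample to 16384 Hz, and split into
# 4096-sample (250 ms) blocks, as described in the text.
function preprocess(filename::AbstractString)
    x, fs = wavread(filename)             # samples (N × channels), sample rate
    x = vec(x[:, 1])                      # keep the first channel
    bpf = digitalfilter(Bandpass(1000, 5000; fs = fs), FIRWindow(hamming(128)))
    y = filt(bpf, x)
    y = resample(y, 16384 // Int(fs))     # down-sample to 16384 Hz
    nblocks = length(y) ÷ 4096
    return [y[(i - 1) * 4096 + 1 : i * 4096] for i in 1:nblocks]
end
```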
We used a 1D convolutional DNN with three convolutional layers and one mean pooling layer, followed by three fully connected layers, working directly on the acoustic data at the input. This is quite different from common DNN approaches, in which the data is first converted to a 2D spectrogram image and fed to a 2D convolutional DNN designed to work with images. Here, the 2D spectrogram conversion was unnecessary and potentially detrimental to the retention of information in the acoustic recording, as spectrogram conversion loses the phase information in the original time series. We used a normalization layer at the input of the DNN to remove any cues on acoustic intensity, as we did not want the DNN to learn to rely on intensity for roost size estimation; while the relationship with intensity was strong, it was unreliable for the reasons discussed earlier. We used a scaling layer at the output of the DNN to convert a normalized output to a scale relevant to typical roost sizes. This scaling was determined from the roost size estimates obtained by video analysis of the same dataset. Appropriate scaling aids the training of the DNN and can be considered a hyperparameter of the model, but it need not be very accurate; we only need to know the rough order of magnitude of the roost size to choose the scale.
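A hedged Flux.jl sketch of this architecture follows. Only the layer types and counts come from the text; the kernel sizes, channel counts, layer widths, and the order-of-magnitude scale (about 400 birds, from the abstract) are placeholders, not the published hyper-parameters.

```julia
using Flux, Statistics

# Input normalization removes intensity cues; blocks are shaped (4096, 1, batch).
normalize_input(x) = (x .- mean(x; dims = 1)) ./ (std(x; dims = 1) .+ 1f-6)

roost_scale = 400f0   # rough order-of-magnitude roost size

model = Chain(
    normalize_input,
    Conv((9,), 1 => 8, relu),      # three 1D convolutional layers
    Conv((9,), 8 => 16, relu),
    Conv((9,), 16 => 32, relu),
    MeanPool((16,)),               # one mean pooling layer
    Flux.flatten,
    Dense(254 * 32, 64, relu),     # three fully connected layers
    Dense(64, 32, relu),
    Dense(32, 1),
    x -> roost_scale .* x,         # scale normalized output to bird counts
)
```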
To train the network, we used a backpropagation-based learning algorithm (ADAM) with a learning rate of 0.002, early stopping, and a mini-batch size of 128 samples. This learning rate, as well as the DNN architecture, was obtained through many hyper-parameter optimization runs. We used 85% of the available 250 ms blocks as training samples, and the remaining 15% as validation samples. Furthermore, we augmented the training data with four weakly filtered (2-tap finite impulse response filters with weights [1.0, ±1.0] and [1.0, ±0.5]) versions of the input data. The augmentation helped to improve the robustness of the DNN to changes in environmental acoustic propagation and noise conditions. The trained DNN was saved to disk for use in an acoustic roost size estimation model described next.
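The sketch below mirrors this training setup: ADAM at a learning rate of 0.002, mini-batches of 128, an 85/15 train/validation split, and augmentation with the four 2-tap FIR filters given above. The mean-squared-error loss and epoch count are assumptions, and early stopping on the validation loss is omitted for brevity.

```julia
using Flux, DSP

# Four weakly filtered copies of each block, with weights [1.0, ±1.0], [1.0, ±0.5].
augment(x::AbstractVector) =
    [filt(h, x) for h in ([1.0, 1.0], [1.0, -1.0], [1.0, 0.5], [1.0, -0.5])]

function train_model!(model, blocks::Vector{<:AbstractVector}, counts::Vector)
    # Expand the training set with the augmented copies of each block.
    xs, ys = Vector{Float32}[], Float32[]
    for (x, y) in zip(blocks, counts)
        for v in (x, augment(x)...)
            push!(xs, Float32.(v))
            push!(ys, Float32(y))
        end
    end
    X = reshape(reduce(hcat, xs), 4096, 1, :)   # (samples, channels, batch)
    Y = reshape(ys, 1, :)
    ntrain = round(Int, 0.85 * size(Y, 2))      # 85% training, 15% validation
    loader = Flux.DataLoader((X[:, :, 1:ntrain], Y[:, 1:ntrain]);
                             batchsize = 128, shuffle = true)
    opt = Flux.setup(Adam(0.002), model)
    for epoch in 1:50
        Flux.train!((m, x, y) -> Flux.mse(m(x), y), model, loader, opt)
    end
    return model
end
```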
4. Acoustic roost size estimation model
The trained DNN from the previous section formed the heart of the acoustic roost size estimation model. To estimate the roost size, we bandpass filtered the acoustic recording in the 1–5 kHz band and split it into one-minute blocks. We then further split each one-minute block into 250 ms chunks, as we did during training. These chunks were fed to the trained DNN, and the per-chunk bird count estimates within each one-minute block were statistically pooled to yield a median estimate and a 50% confidence interval. The process was repeated for each one-minute block to yield a time series of bird count estimates. The largest bird count during an evening of data collection then represented our roost size estimate.
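A sketch of the per-minute pooling step is shown below. At 16384 Hz, one minute yields 240 chunks of 250 ms; taking the 25th and 75th percentiles as the 50% interval is an assumption. The returned fields correspond to the `lower`, `median`, and `upper` columns in the CSV produced by `mynacounter.jl`.

```julia
using Statistics

# Pool the per-chunk DNN estimates from one one-minute block into a median
# and a 50% confidence interval (interquartile range).
function pool_estimates(chunk_estimates::AbstractVector{<:Real})
    return (lower  = quantile(chunk_estimates, 0.25),
            median = quantile(chunk_estimates, 0.5),
            upper  = quantile(chunk_estimates, 0.75))
end
```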
5. Analysis of different data sets for model validation
We first developed an acoustic roost size estimation model based on training data recorded on 3 September 2020 (Table 1). We then tested this model on datasets recorded at the same roost site on other days, months later, as well as on datasets recorded at other roost sites, to demonstrate the robustness of the trained model.
The primary dataset we used for validating our acoustic roost size estimation model was collected on 17 February 2021, about five months after the training dataset (Table 1). While we collected this dataset at the same location, a lot had changed between the two collections: the seasons had changed, the trees had been pruned, the acoustic recorder was set up at a different location, and a different directional microphone was used. So, for all practical purposes, this acoustic dataset was independent of the dataset collected on 3 September 2020.
To further validate the acoustic model, we presented data from yet another dataset collected at the same roost site on 20 August 2020, about two weeks before the training data was collected (Table 1). This dataset unfortunately had a problem with the microphone cable after about 18 minutes, so the comparison can only be made up to that point. Data from a different roost site (site 2) were collected on 31 August 2020 (Table 1). This site did not permit full visual analysis due to occlusions on one side of the roost tree; however, camera recordings were made to provide a rough visual estimate.
We also collected a dataset on 5 February 2021, but it was very windy that day. To reduce wind noise, we fitted a foam wind cover on the microphone during the data collection (Table 1). It was later clear that this recording sounded more muffled than usual, and a spectral analysis showed that high-frequency sounds were significantly attenuated. To test the robustness of the acoustic model, we processed this recording with the acoustic roost size estimation model and compared the result with the estimates from visual analysis of the camera recordings.
Table 1: Data collected for training and validating the acoustic model and validating the visual counts
| Date | Audio | Visual | Training (acoustic) | Validation (visual) | Validation (acoustic) | Remarks |
| --- | --- | --- | --- | --- | --- | --- |
| 3-Sep-20 | Yes | Yes | Yes | Yes | - | - |
| 17-Feb-21 | Yes | Yes | - | - | Yes | Primary data for validation |
| 20-Aug-20 | Yes | Yes | - | - | Yes | Only 18 minutes of audio recorded |
| 31-Aug-20 | Yes | Partial | - | - | - | Nearby vegetation occluded roost tree |
| 5-Feb-21 | Yes | Yes | - | - | Yes | Wind cover used |