Ndege Zetu: A dataset to compare bird species monitoring approaches in the Mt Kenya ecosystem
Data files
Feb 24, 2025 version files (1.21 GB)
- ndege-zetu.zip (1.21 GB)
- README.md (7.06 KB)
Abstract
Biodiversity loss is a pressing challenge, with ecosystems across the world under threat from factors such as human encroachment, overexploitation and climate change. It is important to increase ecosystem monitoring efforts to provide actionable insights for ecosystem managers and to allow effective use of conservation resources. This dataset is used to compare traditional bird surveys based on point counts with autonomous recording units and citizen science data at two sites within the Mt Kenya ecosystem. We also present a new dataset of over 20 hours of recordings obtained from the Mt Kenya ecosystem and annotated by expert ornithologists. These audio recordings are used to demonstrate the use of large deep learning models to recognise species in the Mt Kenya ecosystem.
README: Ndege Zetu: A dataset to compare bird species monitoring approaches in the Mt Kenya ecosystem
https://doi.org/10.5061/dryad.d51c5b0c7
Description of the data and file structure
We present a dataset used to compare traditional bird surveys based on point counts with autonomous recording units and citizen science data at two sites within the Mt Kenya ecosystem. The dataset contains over 20 hours of new recordings obtained from the Mt Kenya ecosystem and annotated by expert ornithologists. The two sites are the Dedan Kimathi University Wildlife Conservancy (DeKUWC) and the Mt Kenya National Park (MKNP).
Files and variables
File: ndege-zetu.zip
Description:
This file contains three folders namely:
- annotations
- audio
- embeddings
The files in these directories are shown below:
├── annotations
│ ├── dekuwc-aru-2016.csv
│ ├── dekuwc-aru-2017.csv
│ ├── dekuwc-kbm.csv
│ ├── dekuwc-pc-2017.csv
│ ├── forest-birds-KE-UG.csv
│ ├── Kenya-Species-List.csv
│ ├── mknp-aru-2017-2018.csv
│ ├── mknp-kbm.csv
│ ├── mknp-pc-2017-2018.csv
│ ├── MKNP-PC-Cues.csv
│ └── single_species_filenames.json
├── audio
│ ├── DeKUWC-10-2017-01-14-06-30-05.mp3
│ ├── DeKUWC-10-2017-01-14-07-00-05.mp3
│ ├── DeKUWC-10-2017-01-14-07-40-05.mp3
...
├── embeddings
│ ├── DeKUWC-1-2016-01-05-12-35-01.npz
│ ├── DeKUWC-1-2016-01-05-16-35-01.npz
│ ├── DeKUWC-1-2016-01-05-16-40-01.npz
...
Annotations
The files in this folder contain:
- Annotations of recordings in the audio directory: dekuwc-aru-2016.csv, dekuwc-aru-2017.csv and mknp-aru-2017-2018.csv
- Point count data from the Dedan Kimathi University Wildlife Conservancy (DeKUWC) and the Mt Kenya National Park (MKNP): dekuwc-pc-2017.csv, mknp-pc-2017-2018.csv, MKNP-PC-Cues.csv
- Data from the Kenya Bird Map (KBM): dekuwc-kbm.csv, mknp-kbm.csv
- Audio filenames used to train single species classifiers: single_species_filenames.json
- Species lists: Kenyan bird species (Kenya-Species-List.csv) and forest birds from Kenya and Uganda (forest-birds-KE-UG.csv)
Audio annotations
The audio annotations are weak annotations indicating the species observed in the recording. In addition, we indicate whether each species was judged to be in the foreground or background of the recording. Optional remarks are included. Each annotation file has four columns:
- Filename
- Foreground Species
- Background Species
- Remarks
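As a rough illustration of how these four columns might be read, the sketch below uses Python's standard csv module. The column names come from the list above. How multiple species are separated within one cell is not documented here, so the `sep` parameter (defaulting to `;`) is an assumption, and the sample rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io


def read_annotations(handle, sep=";"):
    """Map each filename to its foreground/background species and remarks.

    `sep` is an assumed separator for cells listing several species.
    """
    def split(cell):
        return [name.strip() for name in (cell or "").split(sep) if name.strip()]

    records = {}
    for row in csv.DictReader(handle):
        records[row["Filename"]] = {
            "foreground": split(row["Foreground Species"]),
            "background": split(row["Background Species"]),
            "remarks": (row["Remarks"] or "").strip(),
        }
    return records


# Made-up rows for illustration only; not taken from the dataset.
sample = io.StringIO(
    "Filename,Foreground Species,Background Species,Remarks\n"
    "rec-a.mp3,Species A;Species B,Species C,\n"
    "rec-b.mp3,,Species A,faint\n"
)
annotations = read_annotations(sample)
print(annotations["rec-a.mp3"]["foreground"])
```

To apply this to a real file, open it with `open("annotations/dekuwc-aru-2017.csv", newline="")` and pass the handle in place of `sample`, adjusting `sep` after inspecting the file.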
Point Count Data
These data are stored as presence-absence matrices, with rows representing the species observed and columns representing the point counts.
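The sketch below shows one way such a matrix could be summarised. It assumes a header row, species names in the first column and 0/1 values in the remaining cells; the exact CSV layout is not specified here, so check a file before relying on this. The sample matrix is made up for illustration.

```python
import csv
import io


def read_presence_absence(handle):
    """Return (column labels, {species: [0/1, ...]}) from a CSV matrix.

    Assumes a header row and species names in the first column.
    """
    reader = csv.reader(handle)
    header = next(reader)
    matrix = {}
    for row in reader:
        if row:
            matrix[row[0]] = [int(float(cell or 0)) for cell in row[1:]]
    return header[1:], matrix


def richness_per_count(counts, matrix):
    """Number of species recorded as present in each point count."""
    return {
        label: sum(1 for values in matrix.values() if values[i])
        for i, label in enumerate(counts)
    }


def detected_species(matrix):
    """Species present in at least one point count."""
    return sorted(name for name, values in matrix.items() if any(values))


# Made-up matrix for illustration only; not taken from the dataset.
sample = io.StringIO(
    "Species,PC1,PC2,PC3\n"
    "Species A,1,0,1\n"
    "Species B,0,0,1\n"
    "Species C,0,0,0\n"
)
counts, matrix = read_presence_absence(sample)
richness = richness_per_count(counts, matrix)
print(richness, detected_species(matrix))
```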
Kenya Bird Map Data
These data are derived from full-protocol cards and are stored in the same format as the point count data: presence-absence matrices with rows representing the species observed and columns representing the full-protocol card numbers.
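Because the point count and KBM data share one format, the species detected by each method can be compared directly. The sketch below is an illustration under assumptions, not the analysis in the accompanying notebooks: it assumes a header row with species names in the first column, and the function names and sample matrices are hypothetical.

```python
import csv
import io


def species_present(handle):
    """Set of species with at least one non-zero cell (header row assumed)."""
    reader = csv.reader(handle)
    next(reader)  # skip the assumed header row
    return {
        row[0]
        for row in reader
        if row and any(float(cell or 0) > 0 for cell in row[1:])
    }


def overlap(first, second):
    """Shared and method-specific species, plus the Jaccard index."""
    shared = first & second
    union = first | second
    return {
        "shared": sorted(shared),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Made-up matrices for illustration only; not taken from the dataset.
point_counts = io.StringIO("Species,PC1,PC2\nSpecies A,1,0\nSpecies B,0,1\nSpecies C,0,0\n")
kbm_cards = io.StringIO("Species,101,102\nSpecies B,1,0\nSpecies D,0,1\n")

result = overlap(species_present(point_counts), species_present(kbm_cards))
print(result)
```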
Audio
These are MP3 recordings obtained from DeKUWC and MKNP. The naming convention is {site}-{PC-Location}-YYYY-MM-DD-HH-MM-SS.mp3, where site is DeKUWC or MKNP and PC-Location is an integer from 1 to 10 identifying one of the 10 point count locations per site. The time at which the recording was captured is encoded in the filename as YYYY-MM-DD-HH-MM-SS.
There are 3893 one-minute recordings, of which 1192 (~20 hours) are new. The remaining 2701 are from an earlier dataset:
wa Maina, Ciira; Muchiri, David; Njoroge, Peter (2017). Data from: A bioacoustic record of a conservancy in the Mount Kenya ecosystem [Dataset]. Dryad. https://doi.org/10.5061/dryad.69g60
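The filename convention described above can be parsed into a site, a point count location and a timestamp. The following is a minimal sketch; the function name is hypothetical, and accepting the .npz extension alongside .mp3 is an assumption based on the embedding filenames shown in the directory listing.

```python
import re
from datetime import datetime

# {site}-{PC-Location}-YYYY-MM-DD-HH-MM-SS with an .mp3 (or .npz) extension
FILENAME_RE = re.compile(
    r"^(?P<site>DeKUWC|MKNP)-(?P<location>\d+)-"
    r"(?P<stamp>\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})\.(?:mp3|npz)$"
)


def parse_recording_name(filename):
    """Return (site, point count location, capture time) for one filename."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    location = int(match["location"])
    if not 1 <= location <= 10:
        raise ValueError(f"point count location out of range: {location}")
    captured = datetime.strptime(match["stamp"], "%Y-%m-%d-%H-%M-%S")
    return match["site"], location, captured


site, location, captured = parse_recording_name("DeKUWC-10-2017-01-14-06-30-05.mp3")
print(site, location, captured.isoformat())
```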
Embeddings
For each recording in the audio directory, we derive embeddings from the Google Bird Vocalization Classifier (aka Perch). Each one-minute recording is divided into 12 five-second segments, and a 1280-dimensional embedding is derived for each segment. The resulting 1280 by 12 array is stored in NumPy's .npz format.
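A minimal sketch of loading one embedding file with NumPy follows. The key under which the array is stored in the .npz archive is not documented here, so the code takes the first array it finds (an assumption) and checks the shape against the 12 segments and 1280 dimensions described above. Averaging over segments is shown only as one common way to get a single vector per recording, not necessarily what the accompanying notebooks do. The demonstration uses a synthetic in-memory archive rather than a real file.

```python
import io

import numpy as np

SEGMENT_SECONDS = 5
N_SEGMENTS = 12   # 60 s recording / 5 s segments
EMBED_DIM = 1280  # embedding size per segment


def load_segment_embeddings(file):
    """Load one .npz file and return an array of shape (12, 1280)."""
    with np.load(file) as archive:
        # Assumption: one array per archive; its key name is not documented.
        emb = archive[archive.files[0]]
    if emb.shape == (EMBED_DIM, N_SEGMENTS):
        emb = emb.T  # put segments on the first axis
    if emb.shape != (N_SEGMENTS, EMBED_DIM):
        raise ValueError(f"unexpected embedding shape: {emb.shape}")
    return emb


def segment_bounds(index):
    """Return (start, end) in seconds for a zero-based segment index."""
    if not 0 <= index < N_SEGMENTS:
        raise IndexError(index)
    return index * SEGMENT_SECONDS, (index + 1) * SEGMENT_SECONDS


# Synthetic stand-in for a real file in the embeddings directory.
buffer = io.BytesIO()
np.savez(buffer, embeddings=np.ones((EMBED_DIM, N_SEGMENTS), dtype=np.float32))
buffer.seek(0)

emb = load_segment_embeddings(buffer)
pooled = emb.mean(axis=0)  # one 1280-dimensional vector per recording
print(emb.shape, pooled.shape, segment_bounds(11))
```

For a real file, pass its path (for example `embeddings/DeKUWC-1-2016-01-05-12-35-01.npz`) instead of `buffer`.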
Code/software
This code is used to analyse the data in this repository, which compares traditional bird surveys based on point counts with autonomous recording units and citizen science data at two sites within the Mt Kenya ecosystem.
We demonstrate the use of embeddings obtained from Google's Bird Vocalization Classifier (aka Perch) to train classifiers for the species observed.
Code Directory
The code directory contains the following files.
- bvc_helper_funcs.py: Functions to compute embeddings using the Bird Vocalization Classifier
- birdnet_eval.py: Code to evaluate BirdNET on the annotated data
- bvc_eval.py: Code to evaluate Perch (Google's Bird Vocalization Classifier) on the annotated data
- birdnet-perch-comp.ipynb: Notebook to compare BirdNET and Perch on annotated data
- compute_file_embeddings.py: A file to compute embeddings from audio files
- mlp-multi-species.ipynb: A notebook to train the multi-label multilayer perceptron classifier for Mt Kenya data
- few-shot.ipynb: A notebook describing few-shot learning experiments
- forest-species.ipynb: An analysis of forest species observed via point counts, citizen science and autonomous recording units
- lr-mlp-single-species.ipynb: Code to train single species classification models using embeddings
- pc-audio-kbm-cmp.ipynb: Code for statistical comparison of point counts, citizen science and autonomous recording units
- single-species-files.ipynb: Code to determine audio files with single species
- species-accumulation.ipynb: Code to compute species accumulation curves for point counts
- venn-species-counts.ipynb: Code to produce Venn diagrams to analyse species overlap for point counts, citizen science and autonomous recording units. The code also analyses species more often heard than seen in MKNP by processing MKNP-PC-Cues.csv.
To run these files, install the requirements using the procedure described below. An accompanying GitHub repo is https://github.com/DeKUT-DSAIL/ndege-zetu.
Requirements
See the requirements.txt file.
Installation procedure
Create a virtual environment
python3.10 -m venv bird-env
Activate the environment (on Linux or macOS; on Windows use bird-env\Scripts\activate)
source bird-env/bin/activate
Then update pip
pip install --upgrade pip
Install the requirements
pip install -r requirements.txt
Access information
2701 of the audio files were derived from the following source:
- https://datadryad.org/stash/dataset/doi:10.5061/dryad.69g60 - licensed under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license.