Many analyses of birdsong require time-consuming manual annotation of the individual elements of song, known as syllables or notes. We developed the first automated algorithm for birdsong annotation, "TweetyNet", that is applicable to complex song such as canary song. TweetyNet is trained on a small amount of hand-labeled data using supervised learning methods. We evaluate the amount of data required to train TweetyNet models using vocalizations of two songbird species: Bengalese finches and canaries. This dataset contains the song audio files and accompanying annotation files for the three canaries used in this analysis.

This dataset was acquired between late April and early May 2018, a period during which canaries perform their mating season songs. Birds were individually housed in soundproof boxes and recorded for 7-10 days (Audio-Technica AT831B Lavalier Condenser Microphone, M-Audio M-track amplifiers, and VOS games' Boom Recorder software on a Mac Pro desktop computer). In-house software was used to detect and save only sound segments that contained vocalizations.

The vocalizations of 3 canaries are in 3 separate folders.

Two annotation files describe all the labeled vocalization segments of each animal in two formats:

1. A .csv file contains the annotations as a table with a row for each annotated canary syllable and columns:

label - Identity of syllable
onset_s - Time (sec from file onset) of the syllable onset
offset_s - Time (sec from file onset) of the syllable offset
onset_Hz
offset_Hz
audio_file - Path to audio file
annot_file - Path to annotation file
sequence - n.a.
annotation - number of audio files
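
As an illustration of how the .csv table might be used, the following minimal Python sketch groups annotated syllables by audio file and computes each syllable's duration. It relies only on the label, onset_s, offset_s and audio_file columns listed above. The sample rows, file names and the function name load_syllables are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows for illustration only: the labels, times and
# file names below are made up and do not come from the dataset.
SAMPLE_CSV = """label,onset_s,offset_s,audio_file
1,0.50,0.62,song_001.wav
1,0.70,0.81,song_001.wav
2,0.95,1.20,song_001.wav
3,0.10,0.35,song_002.wav
"""


def load_syllables(csv_file):
    """Group annotated syllables by audio file and add a duration to each."""
    by_file = defaultdict(list)
    for row in csv.DictReader(csv_file):
        onset = float(row["onset_s"])
        offset = float(row["offset_s"])
        by_file[row["audio_file"]].append(
            {
                "label": row["label"],
                "onset_s": onset,
                "offset_s": offset,
                "duration_s": offset - onset,
            }
        )
    return dict(by_file)


syllables = load_syllables(io.StringIO(SAMPLE_CSV))
for audio_file, rows in syllables.items():
    print(audio_file, [r["label"] for r in rows])
```

To read one of the real annotation files, the same function could be given an open file handle instead, for example `open(path, newline="")`, where path points to a bird's .csv file.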

2. A Matlab file containing 2 cell arrays:

keys - cell array of strings - audio file names.
elements - cell array of structs (matching the files in 'keys') with fields:
- filenum: file number
- segFileStartTimes (vector): Time (sec from file onset) of the syllable onsets
- segFileEndTimes (vector): Time (sec from file onset) of the syllable offsets
- segType (vector): syllable identities
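
The Matlab layout stores one struct per audio file, each holding parallel vectors, whereas the .csv table stores one row per syllable. The sketch below shows one way to flatten the first layout into the second, assuming 'keys' and 'elements' have already been read into Python lists and dicts. Reading the file itself is outside this sketch (scipy.io.loadmat is one option for Matlab files saved in a pre-v7.3 format; the format version of these files is not stated here). The function name flatten_annotations and the sample values are hypothetical.

```python
def flatten_annotations(keys, elements):
    """Turn per-file parallel vectors into one record per syllable."""
    rows = []
    for audio_file, element in zip(keys, elements):
        onsets = element["segFileStartTimes"]
        offsets = element["segFileEndTimes"]
        labels = element["segType"]
        # The three vectors describe the same syllables, so they must align.
        if not (len(onsets) == len(offsets) == len(labels)):
            raise ValueError(f"vector lengths differ for {audio_file}")
        for label, onset, offset in zip(labels, onsets, offsets):
            rows.append(
                {
                    "audio_file": audio_file,
                    "label": label,
                    "onset_s": onset,
                    "offset_s": offset,
                }
            )
    return rows


# Hypothetical values for illustration only; not taken from the dataset.
keys = ["song_001.wav", "song_002.wav"]
elements = [
    {
        "filenum": 1,
        "segFileStartTimes": [0.50, 0.70],
        "segFileEndTimes": [0.62, 0.81],
        "segType": [1, 1],
    },
    {
        "filenum": 2,
        "segFileStartTimes": [0.10],
        "segFileEndTimes": [0.35],
        "segType": [3],
    },
]

rows = flatten_annotations(keys, elements)
print(len(rows), "syllables in", len(keys), "files")
```

Checking the vector lengths before zipping avoids silently dropping syllables if the onset, offset and label vectors of a file ever disagree.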

Song recordings and annotation files of 3 canaries used to evaluate training of TweetyNet models for birdsong segmentation and annotation
