Time-lapse cameras facilitate remote and high-resolution monitoring of wild animal and plant communities, but the image data produced require further processing to be useful. Here we publish pipelines to process raw time-lapse imagery, resulting in count data (number of penguins per image) and ‘nearest neighbour distance’ measurements. The latter provide useful summaries of colony spatial structure (which can indicate phenological stage) and can be used to detect movement – metrics which could be valuable for a number of different monitoring scenarios, including image capture during aerial surveys. We present two alternative pathways for producing counts: 1) via the Zooniverse citizen science project Penguin Watch and 2) via a computer vision algorithm (Pengbot), and share a comparison of citizen science-, machine learning- and expert-derived counts. We provide example files for 14 Penguin Watch cameras, generated from 63,070 raw images annotated by 50,445 volunteers. We encourage the use of this large open-source dataset, and the associated processing methodologies, for both ecological studies and continued machine learning and computer vision development.
Penguin Watch Manifest
This file contains metadata for each of the 63,070 different images, captured by 14 Penguin Watch time-lapse cameras, included in the repository found at: DOI: https://doi.org/10.5061/dryad.vv36g. The following variables are provided: image name, date/time, Zooniverse ID, path, classification count, state, temperature in Fahrenheit, lunar phase, and a URL link to a thumbnail image. Please see the associated paper for an explanation of these variables.
PW_Manifest.csv
Kraken Files
This folder contains 14 'Kraken Files' - one for each of the time-lapse cameras described in Jones et al. (2018); DOI: https://doi.org/10.1038/sdata.2018.124. These 'Kraken Files' contain filtered Penguin Watch ‘consensus click’ data combined with metadata (date/time, temperature in Fahrenheit, lunar phase, and a URL link to a thumbnail image) for the 63,070 different time-lapse images stored at: DOI: https://doi.org/10.5061/dryad.vv36g. The ‘consensus click’ data used to create these files, along with a detailed explanation of them, can also be found at DOI: https://doi.org/10.5061/dryad.vv36g. The filtering thresholds employed here were 'num_markings > 3' for adults, and 'num_markings > 1' for chicks and eggs. This means that a ‘consensus click’ is only included if it was formed from four or more volunteer clicks (adults) or from two or more volunteer clicks (chicks and eggs). These values can be altered in the ‘Kraken Script’ (DOI: 10.5281/zenodo.3238554) used to generate these files. The metadata are extracted from the 'Penguin Watch Manifest' file also included in this repository. Please see the associated paper for more details and an explanation of terms.
Kraken_Files.zip
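The filtering rule described for the 'Kraken Files' can be illustrated with a minimal Python sketch. This is not the 'Kraken Script' itself: the record layout, the field names other than 'num_markings', and the function name are assumptions made for illustration only.

```python
# Illustrative sketch only; not the published 'Kraken Script'.
# Assumed record layout: the real 'consensus click' columns may differ.
# A click is kept when num_markings is strictly greater than the threshold.
THRESHOLDS = {"adult": 3, "chick": 1, "egg": 1}


def filter_consensus_clicks(clicks, thresholds=THRESHOLDS):
    """Keep consensus clicks backed by more volunteer clicks than the threshold."""
    kept = []
    for click in clicks:
        limit = thresholds.get(click["label"])
        if limit is not None and click["num_markings"] > limit:
            kept.append(click)
    return kept


# Made-up example data, not taken from the repository.
example_clicks = [
    {"label": "adult", "x": 120.0, "y": 340.0, "num_markings": 4},
    {"label": "adult", "x": 410.0, "y": 95.0, "num_markings": 3},
    {"label": "chick", "x": 130.0, "y": 352.0, "num_markings": 2},
    {"label": "egg", "x": 300.0, "y": 200.0, "num_markings": 1},
]
filtered = filter_consensus_clicks(example_clicks)
```

In this made-up example, the adult with four markings and the chick with two markings pass the filter; the adult with three markings and the egg with one marking do not. Changing the values in THRESHOLDS mirrors altering the thresholds in the script.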
Narwhal Files
This folder contains ‘Narwhal Files’ for each of the 14 different cameras discussed in Jones et al. (2018) - DOI: https://doi.org/10.1038/sdata.2018.124. These files comprise count data (for adults, chicks and eggs), in addition to ‘nearest neighbour distance’ measurements and metadata (date/time, temperature (Fahrenheit and Celsius), lunar phase and URL links to thumbnail images). See the associated paper for an explanation of each variable. ‘Narwhal Files’ are generated using the Narwhal Script (DOI: 10.5281/zenodo.3238573), which takes the ‘Kraken Files’ (also included in this repository) as its input.
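As a rough illustration of what a ‘nearest neighbour distance’ measurement involves, the sketch below computes, for each point in an image, the distance to its k-th closest neighbour. It assumes that individuals are represented by (x, y) pixel coordinates and that distances are Euclidean; the function name is hypothetical, and the Narwhal Script may differ in its details.

```python
import math


def nearest_neighbour_distances(points, k=1):
    """Return, for each (x, y) point, the distance to its k-th nearest neighbour.

    Illustrative sketch only; assumes Euclidean distance in pixel units.
    Returns None for a point with fewer than k neighbours.
    """
    distances = []
    for i, (xi, yi) in enumerate(points):
        # Distances from point i to every other point, smallest first.
        others = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(others[k - 1] if len(others) >= k else None)
    return distances


# Made-up coordinates, not taken from the repository.
adults = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
first = nearest_neighbour_distances(adults)        # nearest neighbour
second = nearest_neighbour_distances(adults, k=2)  # second nearest neighbour
mean_first = sum(first) / len(first)
```

Setting k=2 gives a ‘second nearest neighbour distance’, and averaging the per-individual values gives one summary figure per image.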
Narwhal Plots
'Narwhal Plots' are produced from 'Narwhal Files' (also included in this repository) using the 'Narwhal Plotting Script' (DOI: 10.5281/zenodo.3238583). Each plot displays a time-lapse image above three graphs, which show trends up to and including the date of that image: graph 1: number of adults and number of chicks (moving average, n=20); graph 2: chick 'second nearest neighbour distances' (moving average, n=2); and graph 3: mean adult ‘nearest neighbour distance’ between the ith and (i-1)th image (moving average, n=20). The number of images over which a moving average is taken can be altered in the script. Here, 22 folders of 'Narwhal Plots' are presented, corresponding to Table 4 of Jones et al. (2018) (DOI: https://doi.org/10.1038/sdata.2018.124). Note that PETEd2013 is not included, as this is a duplicate image set of PETEc2014. Overall, the plots relate to 14 different Penguin Watch time-lapse cameras.
Narwhal_Plots.zip
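The moving averages used in the 'Narwhal Plots' can be sketched as follows. This is a minimal illustration, not the 'Narwhal Plotting Script': it assumes a trailing window (the current image and the images before it), and it averages over however many images are available at the start of a series. How the script treats those first images is not stated here, so that behaviour is an assumption.

```python
def trailing_moving_average(values, window=20):
    """Mean of each value and up to (window - 1) values preceding it.

    Illustrative sketch only; early entries use a shorter window (an assumption).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that has just left the window.
            running_total -= values[i - window]
        averages.append(running_total / min(i + 1, window))
    return averages


# Made-up adult counts for four consecutive images.
adult_counts = [10, 12, 14, 16]
smoothed = trailing_moving_average(adult_counts, window=2)
```

The window argument plays the role of n in the graph descriptions, so window=20 would correspond to the setting quoted for graphs 1 and 3.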
Penguin Watch Narwhal Animations
This folder contains 'Penguin Watch Narwhal Animations' for each of the 14 different cameras described in Jones et al. (2018) (DOI: https://doi.org/10.1038/sdata.2018.124). Note that the cameras are sub-divided in accordance with Table 4 in Jones et al. (2018). The animations are generated from the 'Narwhal Plots' also included in this repository, using GIMP (v2.8) and VirtualDub (v1.10.4).
PW_Narwhal_Animations.zip
Pengbot Out Folders
This 7-zip file comprises 14 folders - one for each unique Penguin Watch camera described in Jones et al. (2018) (https://doi.org/10.1038/sdata.2018.124). These folders contain 'Pengbot Out Files' - MATLAB files generated using the Pengbot convolutional neural network (CNN). These MATLAB files are matrices of 'penguin densities', which are used to create 'Pengbot Density Maps' and 'Pengbot Count Files' (both also included in this repository). For more information about the Pengbot algorithm and 'Pengbot Out Files', please see the associated paper, https://www.robots.ox.ac.uk/~vgg/data/penguins/, and Arteta et al. (2016) (https://doi.org/10.1007/978-3-319-46478-7_30).
Pengbot_Out_Folders.7z
Pengbot Density Maps
'Pengbot Density Maps' are generated from 'Pengbot Out Files' (also in this repository) using the 'Pengbot Counting Script' (DOI: 10.5281/zenodo.3238590). They are a visual representation of pixel densities, where the densities comprising one individual penguin will sum to one. Generally, where a penguin is detected in the background of an image, pixel densities will be higher than in the foreground. This is because penguins in the distance appear smaller, so fewer pixels are required to form one individual. In these density maps, brighter yellow pixels represent higher densities, and redder pixels represent rocks/snow/ice etc. See the associated paper, https://www.robots.ox.ac.uk/~vgg/data/penguins/, and Arteta et al. (2016) (https://doi.org/10.1007/978-3-319-46478-7_30) for more information.
Pengbot_Density_Maps.zip
Pengbot Count Files
This folder contains 14 'Pengbot Count Files'. Each file gives a penguin count for every photograph within the image set (folders correspond to the cameras described in Jones et al. (2018) (https://doi.org/10.1038/sdata.2018.124)). These counts are derived from 'Pengbot Out Files' (also included in this repository), using the 'Pengbot Counting Script' (DOI: 10.5281/zenodo.3238590). Since raw counts are generated by summing pixel densities, they are generally not integers. Therefore, a rounded count is also provided. Please see the associated paper, https://www.robots.ox.ac.uk/~vgg/data/penguins/, and Arteta et al. (2016) (https://doi.org/10.1007/978-3-319-46478-7_30) for more information about the Pengbot convolutional neural network (CNN).
Pengbot_Count_Files.zip
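The relationship between a density matrix and a count can be sketched in a few lines of Python. This is an illustration rather than the 'Pengbot Counting Script': the matrix below is made up, the function name is hypothetical, and reading a real 'Pengbot Out File' (a MATLAB file) into such a matrix is left out.

```python
def count_from_density(density_map):
    """Sum per-pixel densities into a raw count and a rounded count.

    Illustrative sketch only. Python's round() resolves exact .5 ties to the
    nearest even integer; the rounding rule used by the published script is
    not stated here, so ties may be handled differently there.
    """
    raw_count = sum(sum(row) for row in density_map)
    return raw_count, round(raw_count)


# Made-up 4 x 3 density matrix; the densities of one penguin sum to one.
density = [
    [0.0, 0.25, 0.25],
    [0.0, 0.25, 0.25],
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.0],
]
raw, rounded = count_from_density(density)
```

Here the densities sum to a raw count of 2.75, which rounds to 3, showing why both a raw and a rounded value are useful.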
Method Comparison File
This Excel spreadsheet contains penguin counts for 1183 different time-lapse images, derived using three different methods: expert annotation, citizen science (i.e. the Penguin Watch project - www.penguinwatch.org), and computer vision (i.e. the Pengbot algorithm; https://www.robots.ox.ac.uk/~vgg/data/penguins/). Summary statistics showing the variation in counts between the methods are also included. The images are associated with four different cameras (DAMOa (n=300), HALFc (n=283), LOCKb (n=300) and PETEc (n=300)). Please see the associated paper for more information.
Method_Comparison_File.xlsx