Global analysis of emperor penguin populations
Data files
Feb 14, 2024 version files 128.82 KB
-
colony_attributes_1.csv
-
empe_aerial_2023-05-25_1.xlsx
-
empe_satellite_2023-05-25_1.xlsx
-
fast_ice_trends_2.csv
-
README.md
Abstract
Like many polar animals, emperor penguin populations are challenging to monitor because of the species’ life history and remoteness. Consequently, it has been difficult to establish its global status, a subject important to resolve as polar environments change. To advance our understanding of emperor penguins, we combined remote sensing, validation surveys, and using Bayesian modeling we estimated a comprehensive population trajectory over a recent 10-year period, encompassing the entirety of the species’ range. Reported as indices of abundance, our study indicates with 81% probability that the global population of adult emperor penguins declined between 2009 and 2018, with a posterior median decrease of 9.6% (95% credible interval (CI) -26.4% to +9.4%). The global population trend was -1.3% per year over this period (95% CI = -3.3% to +1.0%) and declines likely occurred in four of eight fast ice regions, irrespective of habitat conditions. Thus far, explanations have yet to be identified regarding trends, especially as we observed an apparent population up-tick toward the end of time series. Our work potentially establishes a framework for monitoring other Antarctic coastal species detectable by satellite, while promoting a need for research to better understand factors driving biotic changes in the Southern Ocean ecosystem.
README: LaRue et al. (2024): Advances in remote sensing of emperor penguins: first multi-year time series documenting global population change
Overview
This repository contains data, code, and model output associated with the global-scale analysis of Emperor penguin population dynamics described in LaRue et al. (2024), based on integrating raw data from aerial surveys with time series of circumpolar satellite surveys of known emperor penguin colonies.
The model is used to estimate an annual index of abundance at every known Emperor penguin colony in Antarctica (as of 2018), for every year between 2008 and 2018. Regional and global population indices are then calculated by summing colony-level estimates, according to regional colony membership.
Simulations are also performed to evaluate the ability of the model to accurately detect population trends, if they exist.
File structure and code description
- analysis/
- output/
- data_viz/ [visualizations of raw data]
- model_checks/ [goodness-of-fit assessments]
- model_results/
- Global_Level/ [global estimates of population abundance, change, and trend]
- Regional_Level/ [regional estimates of population abundance, change, and trend]
- Colony_Level/ [colony-level estimates of population abundance, change, and trend]
- parameter_estimates.csv [parameter estimates from fitted Bayesian model]
- simulation/ [raw results and figures related to simulations]
- EMPE_data_formatted.RData [formatted data for analysis, after cleaning. Is generated by
script1_PrepareData.R
] - fitted_model.RData [Bayesian output in JAGS. Is generated by
script2_FitModel.R
]
-
script1_PrepareData.R
[R script to visualize, clean, and package raw data for analysis with JAGS] -
script2_FitModel.R
[R script to fit Bayesian model, check convergence, evaluate goodness of fit, and generate results, figures, and summary statistics] -
script3_Simulation.R
[R script to conduct simulations; used for testing the Bayesian modeling approach]
- output/
data/
- colony_attributes.csv [locations, names, and abbreviations for each colony]
- site_id: four-letter unique identifier for emperor penguin colony location.
- site_name: name of the emperor penguin colony.
- lat: approximate latitude in decimal degrees of the emperor penguin colony.
- lon: approximate longitude of the emperor penguin colony.
- ice_reg: delineations of fast ice regions based on Figure 2 in Fraser et al. 2021.
- p_ice_reg: delineations of pack ice regions based on Figure 1 in Parkinson 2019.
empe_aerial_2023-05-25.csv [raw data associated with aerial counts of emperor penguin colonies, derived from peguinmap.com]
- site_name: name of the emperor penguin colony.
- site_id: four-letter unique identifier for emperor penguin colony location.
- ccamlr_region: the Commission on the Conservation of Antarctic Marine Living Resources (CCAMLR) region within which the emperor penguin colony falls.
- longitude_epsg_4326: approximate longitude in decimal degrees of the emperor penguin colony defined by European Petroleum Survey Group (EPSG) 4326.
- latitude_epsg_4326: approximate latitude in decimal degrees of the emperor penguin colony defined by European Petroleum Survey Group (EPSG) 4326.
- common_name: name of the species in colloquial English.
- day: day of month (numbers 1-31) an observational survey took place. Blanks indicate data were not provided.
- month: month of year (numbers 1-12) an observational survey took place. Blanks indicate data were not provided.
- year: year (YYYY format) an observational survey took place.
- season_starting: the year in which the first survey of a colony for a species took place, defined by the spring year, as Antarctic spring begins in September/October and continues into the following autumn (e.g., March). For example, the 2018 season would represent surveys that took place during September 2018 through March 2019.
- penguin_count: integer of birds counted during the survey.
- accuracy: integer (1-5) representing level of accuracy in the penguin count as defined in Croxall and Kirkwood (1979). 1 = pairs/nests individually counted and accuracy is better than (plus or minus) 5%; 2 = number of pairs in a known area is counted and then given the area, the total is calculated; 3 = accurate estimate to within 10-15%; 4 = rough estimate, accurate to ~25-50%; 5 = guesstimate, accurate to order of magnitude.
- count_type: age class surveyed, either chicks (born in the season in which the observation took place) or adults (any adult at the colony).
- vantage: survey type conducted as per the corresponding reference material. Ground = counts from the ground; aerial = counts from aircraft (either helicopter or fixed wing) either by photography or manual count; ground photo = counts conducted after taking photographs of the colony from the ground. Missing data means that the reference material did not specify how the counts were conducted.
reference: Source of the data, including peer-reviewed literature, reports, or dataset.
empe_satellite_2023-05-25.csv [raw data associated with satellite surveys of emperor penguin colonies]
site_id: four-letter unique identifier for emperor penguin colony location.
site_name: name of the emperor penguin colony.
source: first and last name of the remote sensing analyst for the image.
catalog_id: unique identifier of the image acquired by Maxar Technologies (formerly DigitalGlobe, Inc.). NA = imagery was not available for a given year at the colony location. Blank catalog IDs represent images used in Fretwell et al. (2012) and were not recalculated in this study.
img_long: approximate longitude centroid of the image.
img_lat: approximate latitude centroid of the image.
img_date: date of year the image was acquired, in day/month/year format. NA = imagery was not available for a given year at the colony location.
img_day: day of month (numbers 1-31) the satellite image was acquired by Maxar Technologies (formerly DigitalGlobe, Inc.).NA = imagery was not available for a given year at the colony location.
img_month: month of year (numbers 1-12) the satellite image was acquired by Maxar Technologies (formerly DigitalGlobe, Inc.). NA = imagery was not available for a given year at the colony location.
img_year: year (in YYYY format) the satellite image was acquired by Maxar Technologies (formerly DigitalGlobe, Inc.). NA = imagery was not available for a given year at the colony location.
satellite: platform on which the image was acquired. WV01 = Worldview-1; WV02 = WorldView 2; QB02 = Quickbird-2; WV03 = Worldview-3; GE01 = GeoEye-1. NA = imagery was not available for a given year at the colony location.
area_m2: area of penguin pixels calculated by a combination of supervised classification followed by maximum likelihood calculation on satellite imagery. NA = imagery was not available for a given year at the colony location.
img_qualit: quality of the imagery, ranging from 1-3 with an image quality of 1 being the worst and image quality 3 being the best.
bpresent: binary yes or no as to whether birds were present. Yes = birds were detected on the image; No = image was inspected and there were no birds on the image, or there was no fast ice; NA = image was unavailable, image was corrupted upon delivery or it was not possible to determine (due to image quality) whether birds were present.
comments: Notes about remote sensing analysis of the imagery.
fast_ice_trends_2.csv [data related to regional trends in Antarctic fast ice. Full dataset available in Table 1 in Fraser et al. 2021]
- Region: delineated region of fast ice, as shown in Figure 2 in Fraser et al. (2021).
- FastIceExtent_min: minimum extent of fast ice within each region, derived from Table 1 in Fraser et al. (2021).
- FastIceExtent_max: maximum extent of fast ice within each region, derived from Table 1 in Fraser et al. (2021).
- FastIceTrend: calculated trend for fast ice during 2000-2018, derived from Table 1 in Fraser et al. (2021).
FastIceTrend_CI: confidence interval for fast ice trend during 2000-2018, derived from Table 1 in Fraser et al. (2021).
Sharing/Access information
- colony_attributes.csv [locations, names, and abbreviations for each colony]
Data is openly accessible through Dryad (link).
Methods
During the 2018 Antarctic field season, under permit #2019-006 granted by the National Science Foundation, our US-based team conducted aerial photography at emperor penguin colonies in the Ross Sea to add to robust validation of imagery. Our efforts included one flight via fixed wing aircraft over colonies distant from McMurdo Station and five flights via helicopter to a single colony (Cape Crozier) near the station. The five flights to Cape Crozier, 24 October to 15 November, were used to better understand population fluctuation through a single season. Our fixed wing survey took place on 31 October 2018, flying in the vicinity of Beaufort Island (ASPA 105), Franklin Island, Cape Washington (ASPA 173), Coulman Island, and Cape Roget. At each location (both by fixed wing and helicopter), we circled the colony 1-4 times, maintaining a minimum of 500 m horizontal distance from the periphery of the colony and a minimum altitude of 500 m. No behavioral disturbance to birds (e.g., rapid movement or dispersion of adults or chicks) were noted during any flight. Oblique photographs (with a Canon EOS Mark 7D II with Tamron 400m zoom lens) were taken through the window of the Basler aircraft, and in the case of our AStar helicopter surveys with the window down, in continuous shooting mode to ensure effective ability to stitch photos together for counting.
We then downloaded and stitched the multiple photos per colony with Adobe photo-stitching software to create a single image for manual counting. We loaded images of colonies into the free software ImageJ, which allowed us to document and assess precision of counts among observers. Our field team (four people) counted the largest colony in the world, Coulman Island, to gain an understanding of among-counter precision. After determining a coefficient of variation of ~2.5% (small variation in counts among observers), we determined that each team member would be assigned to count one of the remaining colonies each, to speed the process and to arrive at a population count of adult emperor penguins at each of six Ross Sea colonies during spring 2018. These data were immediately entered into MAPPPD repository. We used these counts as validation for our observation model (see population modeling below).
Satellite imagery
To gather images for analysis, we first reviewed discover.digitalglobe.com (Maxar Technologies) to determine image availability per emperor penguin location per year, and to determine utility/quality of images for analysis. Images had been requested for acquisition via the National Geospatial Intelligence Agency (NGA), when possible, once per month during cloud-free days in austral spring. We avoided images with excessive clouds, and those that were too dark, too bright, or otherwise low-quality. We created a list comprising the unique identifier for each image (called the “catalog ID”) and then requested images be processed (via Polar Geospatial Center [PGC]), specifically pan-sharpened (i.e., increasing the spatial resolution of the multispectral image by merging it with its higher-resolution, panchromatic pair) and projected to Antarctic Polar Stereographic (ESPG code 3031). We then followed semi-automated methods already established: briefly, we loaded VHR imagery into ArcGIS 10.8 (ESRI), identified the location of emperor penguins on the image, and then clipped images to the extent of the colony. We conducted a supervised classification by manually training the program with shapefile points representing pixels of guano, snow, and penguin. We conducted a maximum likelihood classification based on these classes to arrive at a classified raster image identifying pixels that are likely penguin pixels. Our final step was to convert the raster to a polygon shapefile and to calculate the area (m2) of penguin pixels. The area of “penguin pixels” per colony per year served as the response variable and input to the population modeling (below). We conducted this process for all 50 colonies in all years for which imagery existed during 2009-2018.
Model overview
We developed a Bayesian state-space model to accommodate several key features of emperor penguin population dynamics and the data collection (i.e., observation) process. We gathered all available adult count data (obtained from remote cameras, ground, and aerial surveys) from MAPPPD for colonies that ranged in size and in proximity to research stations, and which were situated in different regions of the Antarctic.
The population processes we sought to model were:
1) Colony-level trends and annual fluctuations can differ, even among nearby colonies;
2) Individual colonies can temporarily “vanish” for a breeding season and re-appear in future years (somewhat depending on fast ice conditions);
3) Daily abundance of adult penguins in a colony can vary substantially throughout the spring (August – December) survey period, caused by breeding synchrony, temporal variation in adult foraging trips, the presence/absence of non-breeding adults, emigration from the colony, breeding failure, and changes in parenting behavior (crèching). As noted, unknown to us was the prevalence of these factors before images were acquired, thus subsequently affecting what we measured as “colony size”.
The data collection (i.e., observation) processes we sought to model were:
4) Counts of adults from aerial surveys are an imperfect observation of the seasonal population index (i.e., due to counting errors during surveys and intraseasonal variation in daily abundance of adults at colonies; point 3 above);
5) Satellite observations of the “area of ground occupied by penguins” are imprecise and potentially biased estimates of the true count during the survey, and by extension, of the seasonal expected count;
6) The expected number of birds counted at a colony (either through aerial surveys or satellite images) potentially changes over the survey period, owing to chick mortality and subsequent emigration by attendant adults.
Model fitting
All data were analyzed using R version 4.2 (R Core Team 2021), with posterior samples generated using Markov chain Monte Carlo methods implemented using JAGS version 4.3. After a burn-in period of 50,000 iterations, we stored every 50th iteration until we accumulated 10,000 posterior samples from each of three Markov chains. The model unambiguously converged; the Gelman–Rubin convergence statistic was less than 1.1 for all hyperparameters, colony- and year-level effects, regression coefficients, and latent states. We confirmed that the effective sample size for each parameter was greater than 2,000 and also confirmed the ability of the model to generate data that are consistent with the observed data, using posterior predictive checks. We confirmed the ability of our model to generate unbiased and identifiable trend estimates using simulations. We report the medians and 95% equal-tailed credible intervals of all modeled quantities unless otherwise noted. Code and data to replicate our analysis is available at https://github.com/davidiles/EMPE_Global.
Goodness-of-fit and model diagnostics
Posterior predictive checks confirmed that the fitted model could generate data with reasonable properties and no obvious systematic discrepancies with the observed data (i.e., data simulated from the fitted model “looks like” the observed data). Of our simulated datasets, 31% had aerial observations with lower RMSE than the observed data (i.e., Bayesian p-value = 0.31) and 29% of simulated datasets had satellite observations with lower RMSE than observed data (i.e., Bayesian p-value = 0.29; Figure S1). Bayesian p-values close to 0.5 indicate a reasonable fit and occur when the fitted statistical model is equivalent to the “true” model that generated the data. Visual inspection of observed versus fitted values (Figure S2) also indicated the model was a good fit to the data.
Simulations to confirm parameter identifiability
We conducted a series of simulations (n = 300) to confirm that the statistical model could generate identifiable and unbiased estimates of global population trend and change, given realistic data availability, observation error, and survey imbalance among colonies. In each simulation, we generated a time series of “true” abundance at each of the 50 colonies (also resulting in a simulated global trend), and then simulated aerial and satellite observations at each colony, including realistic data imbalance, as well as aerial and satellite observation error (based on values estimated from the analysis of empirical data; Table S3). We then used these simulated observations as “data,” re-fit our statistical model to those simulated data, and evaluated whether we could recover unbiased estimates of global trend with appropriate credible interval coverage (Figures 1 and 2). Simulations indicated that the model could reliably recover estimates of global trend (median bias = 0.3%; Figure 2) and change between 2009 and 2018 (median bias = 3.5%; Figure 2), with appropriate 95% credible interval coverage. These simulations suggest that the extreme data imbalance in our data, coupled with our choice of priors, does not induce severe bias into estimates of population change.
Sea ice correlations
To investigate a possible relationship between regional trends in fast ice or pack ice, we gathered published data on fast ice trends and pack ice trends within discrete regions of Antarctica that differ in their patterns of ice formation and sea ice co-variability. We then assigned each emperor penguin colony to these sea ice regions. Within each region, we used samples from the Bayesian joint posterior to sum colony indices and thereby calculate estimates of regional indices and population change. Finally, we estimated the Spearman rank correlation between regional population and regional trends for fast ice (n = 8), and pack ice (n = 5).