Evidence of emperor penguins' sensitivity to sea ice fluctuations
Data files
Jan 23, 2026 version files 650.57 KB
-
ALL_areas_SeaIceExtract_2025.zip
87.73 KB
-
code.zip
37.08 KB
-
data.zip
502.96 KB
-
README.md
22.80 KB
Abstract
Using satellite-derived abundance indices over 20 years, we investigated population change among the seven Ross Sea emperor penguin (Aptenodytes forsteri) colonies. We found a 90% probability that the Ross Sea metapopulation had lower springtime attendance between 2005 and 2024 (mean change = -23%; 95% CI: -46 to +20%). We identified two distinct phases of change: slightly increasing 2005 to 2019/2020, followed by a steep decline 2020-2024, resulting in a decrease of ~23,000 birds in five years, or approximately 32% of the regional population. Over the 20 years, the two southernmost colonies, Cape Crozier and Cape Colbeck, increased in size, yet Beaufort, Coulman, Roget, and Washington declined, resulting in a negative population trend, especially evident from 2021. Asynchronous population change suggests metapopulation dynamics, with immigration from western colonies possibly driving increases at Colbeck and Crozier. Winter and spring sea ice concentration anomalies (5-y lag) and El Niño phases (1-y lag) statistically explained the most variance in our time series. We suggest emperor penguins may be sensitive to sea ice fluctuations, offering insight into how this species may have adapted throughout their history, and emphasising the need for timely re-evaluation of targeted conservation strategies in an era of sea ice change.
Access this dataset on Dryad (DOI: https://doi.org/10.5061/dryad.cnp5hqcjv)
Overview
This repository outlines the code and process for analysing Ross Sea emperor penguin population dynamics from satellite imagery over 20-years+ from 2000-2024.
We modelled population change at the seven Ross Sea emperor penguin colonies between 2000 and 2024 and ran a series of Generalised Additive Models to identify potential drivers of population change.
The paper is published in Proceedings B (DOI: 10.1098/rspb.2016.0049) entitled, "Evidence of emperor penguin sensitivity to sea ice fluctuations"
Description of scripts (code.zip)
We estimated emperor penguin abundance indices for each of the seven colonies in the Ross Sea and the overall metapopulation, converting satellite areas and aerial counts to abundance indices, using the LaRue et al. (2024) Bayesian state-space model (found here: https://datadryad.org/dataset/doi:10.5061/dryad.m63xsj48v). The code used to convert satellite areas to abundance indices is:
- script1_PrepData_github_05-24.R
- script2_FitModel_github_05-24.R
- script3_Simulation_github.R
We ran a breakpoint detection analysis on the Bayesian posterior trajectories and that analysis can be found in
- script4_PosteriorAnalysis_05-24.R
We calculated population growth rates from estimated abundance indices and ran a series of analyses to determine population synchrony and investigated environmental drivers of change using GAMs. RScript to recreate this analysis is found in
- GAM_modelling.R
Code for recreating our population figures, change point detection, and running synchrony analyses:
- population_figs_final.R
Code for recreating our SICa extraction process:
-
SIC_extractAnomalise_2025.R
The code also includes how we extracted sea ice concentration anomalies (SICa) from large-scale SIC data (accessed through NSIDC: https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/south/monthly/geotiff/), including how we calculated the seasonal averages and the anomalies (how each month compares to the monthly average between 1995-2024).
-
This script uses polygon shapefile data found in data.zip (ALL_areas_SeaIceExtract_2025.zip)
Description of data files:
These data are in support of the manuscript published in ProcB, "Evidence of emperor penguins' sensitivity to sea ice fluctuations".
ALL_areas_SeaIceExtract_2025:
- The polygon shapefile used to extract sea ice concentration data from the NSIDC geotiffs using the code described in SIC_extractAnomalise_2025.R.
- To load this data into geospatial mapping programs such as ArcGIS Pro, Use 'extract all' for the zipped folder and then select the entire unzipped folder to be imported into the spatial data program. Each layer within the zipped folder contains important information for the visualisation of the polygons, and all are required to work with this data.
Data file that includes all population abundance index estimates:
Overall dataset combining abundance indices (results from BSS output) and environmental covariates used for our figures and GAMs.
- EP_data_FINAL_Nov2025.csv
Ross Sea metapopulation abundance, trends and change: Results from 'global' BSS model
- global_abundance_summary05_24.csv (metapopulation abundance indices incl. in EP_data_FINAL_Nov2025)
- global_change_summary05_24.csv (percent probability of decline and mean change)
- global_trend_summary05_24.csv (regional trend)
aerial_03-06-25_RossSea.csv:
Aerial survey data (adult) counts used in the Bayesian state-space model used to convert penguin areas (calculated from high-resolution satellite imagery) to estimated abundance indices (code available in LaRue et al. 2024, https://datadryad.org/dataset/doi:10.5061/dryad.m63xsj48v)
satellite_03-06-2025_RossSea.csv:
Satellite area estimates (penguin area m2) used in the BSS model to convert penguin areas (calculated from high-resolution satellite imagery) to estimated abundance indices (code available in LaRue et al. 2024, https://datadryad.org/dataset/doi:10.5061/dryad.m63xsj48v)
colony_attributes.csv:
colony attribute data used in the BSS model to convert penguin areas (calculated from high-resolution satellite imagery) to estimated abundance indices (code available in LaRue et al. 2024, https://datadryad.org/dataset/doi:10.5061/dryad.m63xsj48v)
sync_2025.csv:
Community matrix data used for the community synchrony analysis to determine if populations fluctuated synchronously across the region, including growth rates (GR) and estimated abundance indices (N_mean / NM) at each Ross Sea colony 2005-2024 (NAs for Colbeck and Coulmann growth rate (GR) in 2005 due to not having data before 2005 so not able to calculate difference between 2004 and 2005 to calculate growth rate)
- GR = growth rate (calculated from abundance indices)
- NM = N_mean (estimated abundance index)
SIC_anomaly_ALL.csv
This folder contains the sea ice concentration anomalies for every emperor penguin colony in Antarctica within a 200km buffer around each colony, as well as the entire Ross Sea, Weddell Sea, and Amundsen/Bellinghausen Sea sectors. We compared the monthly average at each location for the monthly average spanning 1995-2024 and calculated the anomalies (how much higher or lower than the average). We then calculated the seasonal average for the anomalies. Seasons are defined based on timing relevant to the emperor penguin life cycle (JFM = Jan Feb Mar, AMJ = Apr May Jun, JAS = Jul Aug Sep, OND = Oct Nov Dec). The code for creating this data is found in SIC_extractAnomalise_2025.R.
Variable within the SIC_anomaly_ALL.csv dataset:
- colony = colony name
- month = the month that the average SIC as calculated for
- mean_SIC = the total monthly average SIC between 1995 and 2024for the cells contained within the 200 km buffer around this colony (or region).
- site_id_x = four-letter code relating to colony name
- value = refers to the average SIC value for this month, used to compare against mean_SIC to produce the anomaly value
- date = the day, month, and year for the SIC average (all data are month averages; 01 for day has been added to all dates for formatting purposes)
- year = year of observation
- season = identified whether the month falls in JFM, AMJ, JAS, or OND (defined above)
- anomaly = is 'value' (the current month average SIC value) divided by the 'mean_SIC'(average SIC for that area 1995-2024) value to calculate how that month's SIC compares to the average spanning 1995-2024. I.e., how much higher or lower is it that month than on previous years.
Seasonal averages were calculated by taking the average anomaly values for each season, each year, and each colony.
Covariates within EP_data_FINAL_Nov2025.csv are:
(Details on NAs are provided in the following section)
Colony details
- site_name: Full name of colony
- colony: Shortened name of colony
- site_id: Four-letter code to identify colony
- year: Categorical year variable
- year_X: Continuous year variable
- site_number: Number identifying colony
- N_mean: Estimated abundance index, calculated using the BSS model linked above. This value was converted to a growth rate to be analysed in GAMs and GAMMs.
- N_se: Standard error for the estimated abundance index
- N_q025: 2.5% quartile around the abundance index
- N_q05: 5% quartile around the abundance index
- N_median: median the abundance index
- N_q95: 95% quartile around the abundance index
- N_q975: 97.5% quartile around the abundance index, creates 95% CI when combined with N_q025
- growth_rate: growth rate calculated from N_mean (abundance index)
Environmental covariates
Southern Annular Mode (SAM)
- SAM_mean: Annually averaged SAM values (accessed through: https://www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/aao/aao.shtml)
- SAM_sd: Standard error of the annually averaged SAM values
- SAM_lag1 SAM_lag2 SAM_lag3 SAM_lag4 SAM_lag5: 1-5 year lags for SAM_mean
El Niño Southern Oscillation Index (ENSO)
- ENSO_mean: Annually averaged ENSO values (accessed through: https://www.ncei.noaa.gov/access/monitoring/enso/soi)
- ENSO_sd: Standard error of the annually averaged ENSO values
- SOI_lag1SOI_lag2 SOI_lag3 SOI_lag4 SOI_lag5: 1-5 year lags for ENSO_mean
Fast ice persistence
- Breakout_date2: The date the fast ice broke out at the colony that year (calculated using the NASA Worldview tool, https://worldview.earthdata.nasa.gov/)
- Freeze_date2: The date the fast ice formed at the colony that year (calculated using the NASA Worldview tool)
- ice_days: The number of days between the formation and breakout dates described above to give a value of fast ice persistence
- ice_dayslag_1 ice_dayslag_2 ice_dayslag_3 ice_dayslag_4 ice_dayslag_5: 1-5 year lags for fast ice persistence
- Breakout_beforeDec15: binary covariate identifying whether fast ice broke out before Dec 15th that year (considered 'early' - unsure if chicks would be ready to be independent)
- breakout_lag_1 breakout_lag_2 breakout_lag_3 breakout_lag_4 breakout_lag_5: 1-5 year lagged values for Breakout_beforeDec15th variable
Sea ice covariates:
SIC data accessed through: https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/south/monthly/geotiff/
Sea ice concentration anomalies (SICa) calculated seasonally and compared between each month and the average for that month from 1995-2024. Colony-specific modelling looked at local-scale SICa (within 200 km, or foraging range) of emperor penguins during the breeding season. The metapopulation models looked at the regional-scale SICa (Ross Sea region) and the eastern Ross Sea moult sector (ERS) in summer (JFM)
- SICa_AMJ: Seasonally averaged SICa for autumn (Apr-May-Jun)
- SICa_AMJsd: Standard deviation of autumn SICa
- SICa_JAS: Seasonally averaged SICa for winter (Jul-Aug-Sep)
- SICa_JASsd: Standard deviation of winter SICa
- SICa_JFM: Seasonally averaged summer SICa (Jan-Feb-Mar)
- SICa_JFMsd: Standard deviation of summer SICa
- SICa_OND: Seasonally averaged spring SICa (Oct-Nov-Dec)
- SICa_ONDsd: Standard deviation of spring SICa
Lagged data (only analysed 1-y and 5-y lags due to hypotheses-driven approach and lack of a clear mechanism why other lags would have an influence on growth rates)
- SICa_AMJlag_1 SICa_AMJlag_2 SICa_AMJlag_3 SICa_AMJlag_4 SICa_AMJlag_5: 1-5 yr lags for Autumn SICa
- SICa_JASlag_1 SICa_JASlag_2 SICa_JASlag_3 SICa_JASlag_4 SICa_JASlag_5: 1-5 yr lags for Winter SICa
- SICa_JFMlag_1 SICa_JFMlag_2 SICa_JFMlag_3 SICa_JFMlag_4 SICa_JFMlag_5: 1-5 yr lags for Summer SICa
- SICa_ONDlag_1 SICa_ONDlag_2 SICa_ONDlag_3 SICa_ONDlag_4 SICa_ONDlag_5: 1-5 yr lags for Spring SICa
- ERS_JFM_SICa: Summer SICa in the eastern Ross Sea sector, where emperor penguins moult
- ERS_JFM_SICa_sd: Standard deviation of ERS SICa in summer
- ERS_SICa_JFMlag_1 ERS_SICa_JFMlag_2 ERS_SICa_JFMlag_3 ERS_SICa_JFMlag_4 ERS_SICa_JFMlag_5: 1-5 year lags for ERS SICa
Amundsen Sea Low (ASL)
ASL data accessed through: https://raw.githubusercontent.com/scotthosking/amundsen-sea-low-index/master/asli_era5_v3-latest.csv
- RCP = Relative Central Pressure
- lat = latitudinal position of the ASL
-lon = longitudinal position of the ASL - lon_AMJ: ASL longitude in autumn
- lat_AMJ: ASL latitude in autumn
- RCP_AMJ: ASL relative central pressure in autumn
- lon_JAS: ASL longitude in winter
- lat_JAS: ASL latitude in winter
- RCP_JAS: ASL relative central pressure in winter
- lon_JFM: ASL longitude in summer
- lat_JFM: ASL latitude in summer
- RCP_JFM: ASL relative central pressure in summer
- lon_OND: ASL longitude in spring
- lat_OND: ASL latitude in spring
- RCP_OND: ASL relative central pressure in spring
Distances between colonies
- dist_BEAU: distance from this colony to the Beaufort Island colony
- dist_COLB: distance from this colony to the Cape Colbeck island colony
- dist_COUL: distance from this colony to the Coulman island colony
- dist_CROZ: distance from this colony to the Cape Crozier colony
- dist_FRAN: distance from this colony to the Franklin island colony
- dist_ROGE: distance from this colony to the Cape Roget colony
- dist_WASH: distance from this colony to the Cape Washington colony
Rate of change
A covariate outlining the average rate of change at surrounding colonies within different radii. Was developed to be used in the GAMs as a possible driver of colony-specific variation, but was excluded prior to submission due to the conflating effect sizes of other variables.
- roc_100: rate of change at colonies within 100 km - closest colony pairs in Ross Sea
- roc_220: rate of change at colonies within 220 km - average distance between colonies around the continent
- roc_480: rate of change at colonies within 480 km - average distance between colonies in the Ross Sea
- roc_1100: rate of change at colonies within 100 km - maximum distance between colonies in the Ross Sea
NAs
Please note that NAs within this file are due to there being no data available for a variety of reasons.
- NA for Cape Crozier in 2001 was due to the data being excluded as the colony absent as the area where they breed was crushed by an iceberg that year
- NA within the "Breakout_date" or "Freeze_date" variable indicated that the fast ice did not break out at the colony that year, thus no date was recorded.
- NAs for the lagged variables are caused by the lag calculation and not having data from the years before the start of the analysis.
- NAs in the 'dist_xxxx' (for each colony) columns are due to the rows referencing the same colony in the variable, i.e., we did not record the distance from Beaufort to Beaufort, as they are the same location.
- NAs in the 'roc_xxx' variable indicated that there were no other colonies within that distance from the colony, so no average size of colonies within that distance could be reported.
-NAs in the ASL data (lon, lat, or RCP) indicated that there was no data available for those months
-The Ross Sea regional rows have NAs for the fast ice breakout dates and other site-specific covariates that were not regional scale, thus were not included in the regional-scale analysis
Description of all variables within each data file:
aerial_03-06-25_RossSea.csv
- site_name: name of colony
- site_id: 4 letter code for colony name
- cammlr_region: CCAMLR region where colony is found
- longitude_epsg_4326: longitude of colony location (rounded to 0.1 decimal places in accordance with GBIF recommendations for vulnerable species locations: Chapman AD (2020) Current Best Practices for Generalizing Sensitive Species Occurrence Data. Copenhagen: GBIF Secretariat. https://doi.org/10.15468/doc-5jp4-5g10.)
- latitude_epsg_4326: latitude of colony location (rounded to 0.1 decimal places in accordance with GBIF recommendations for vulnerable species locations: Chapman AD (2020) Current Best Practices for Generalizing Sensitive Species Occurrence Data. Copenhagen: GBIF Secretariat. https://doi.org/10.15468/doc-5jp4-5g10.)
- common_name: common name of target species
- day: day of count acquisition
- month: month of count acquisition\
- year: year of count acquisition
- season_starting: year that season started (relevant for species who breed over summer between 2-year dates)
- penguin_count: count of birds (used in BSS model)
- accuracy: how easy was it to count the birds
- count_type: whether chicks, nests, or adults counted
- vantage: how the count was conduction, aerial (flight), ground, or other
- reference: where the count was published and accessed through
satellite_03-06-2025_RossSea.csv
- site_name: name of colony
- site_id: 4 letter code for colony name
- Catalog_id: unique catalog ID for each satellite image
- img_lon: longitude of colony location (rounded to 0.1 decimal places in accordance with GBIF recommendations for vulnerable species locations: Chapman AD (2020) Current Best Practices for Generalizing Sensitive Species Occurrence Data. Copenhagen: GBIF Secretariat. https://doi.org/10.15468/doc-5jp4-5g10.)
- img_lat: latitude of colony location (rounded to 0.1 decimal places in accordance with GBIF recommendations for vulnerable species locations: Chapman AD (2020) Current Best Practices for Generalizing Sensitive Species Occurrence Data. Copenhagen: GBIF Secretariat. https://doi.org/10.15468/doc-5jp4-5g10.)
- img_date: date satellite image was captured
- img_day: day the satellite image was captured
- img_month: the month the satellite image was captured
- img_year: year satellite image was captured
- satellite: satellite platform
- area_m2: area covered by emperor penguins calculated within each image
- img_qualit: image quality covariate, how easy was it to quantify the bird area, 1 = very difficult, 2 = moderate, 3 = easy
- bpresent: birds present - were birds visible in the imagery
- polynya: What polynya is the colony associated with
- polynya_si: what size is the polynya that the colony is associated with (m2)
- ccamlr_reg: CCAMLR region that the colony is found in
- ice_reg: ice region that the colony is found in (see LaRue et al. 2024 "Advances in remote sensing..." for definitions)
- p_ice_reg: pack ice region the colony is found in (see LaRue et al. 2024 "Advances in remote sensing..." for definitions)
NAs within this dataset indicate either an image was not available for that location that year (if no image id is provided) or an image was available but the ice had broken out already and we were unable to detect any birds, or the image was too difficult to analyse due to cloud cover or another factor that prevented accurate area estimations from taking place.
colony_attributes.csv
- site_name: name of colony
- site_id: 4 letter code for colony name
- lat: latitude of colony location (rounded to 0.1 decimal places in accordance with GBIF recommendations for vulnerable species locations: Chapman AD (2020) Current Best Practices for Generalizing Sensitive Species Occurrence Data. Copenhagen: GBIF Secretariat. https://doi.org/10.15468/doc-5jp4-5g10.)
- lon: longitude of colony location (rounded to 0.1 decimal places in accordance with GBIF recommendations for vulnerable species locations: Chapman AD (2020) Current Best Practices for Generalizing Sensitive Species Occurrence Data. Copenhagen: GBIF Secretariat. https://doi.org/10.15468/doc-5jp4-5g10.)
- ccamlr_reg: CCAMLR region that the colony is found in
- ice_reg: ice region that the colony is found in (see LaRue et al. 2024 "Advances in remote sensing..." for definitions)
- p_ice_reg: pack ice region the colony is found in (see LaRue et al. 2024 "Advances in remote sensing..." for definitions)
MPA_current_name: The name of the MPA the colony is found in; if the colony is not within an MPA, it says NA - MPA_proposed_name: The name of the proposed MPA the colony is found in (these MPAs are currently under review at CCAMLR and are not currently providing protection to the colonies). If the colony is not found in an MPA, it says NA.
- in_MPA_current: whether the colony is within an MPA, yes or no
in_MPA_any: whether the colony is within a current or proposed MPA, yes or no. (no = not found in a current or proposed MPA)
global_abundance_summary05-24.csv
- Year: year of estimation
- N_mean: estimated abundance indices for the entire Ross Sea region (all seven colonies combined)
- N_se: standard error of abundance index
- N_q025: lower 2.5% quartile for abundance index estimates (used for 95% CI)
- N_q05: lower 5% quartile for abundance index estimates
- N_median: median abundance index value
- N_q95: upper 95% quartile for abundance index estimate
- N_q975: upper 97.5% quartile for abundance estimates (used for 95% CI)
global_change_summary05-24.csv
- Region: region of interest described in the data
- Prob_Decline: probability that the region declined across the study period 2005-2024
- Prob_30percent_Decline: probability that the region declined by 30% across the study period 2005-2024
- Prob_50percent_Decline: probability that the region declined by 50% across the study period 2005-2024
- 2.50%: lower 2.5% confidence quartile for estimated regional change (used for 95% CI)
- 5%: lower 5% confidence quartile for estimated regional change
- 50%: estimated median value for regional change
- 95%: upper 95% quartile for estimated regional change
- 97.50%: upper 97.5% quartile for estimated regional change (used for 95% CI)
- mean: the average estimated regional change in emperor penguin abundance indices between 2005 and 2024
- SE: standard error around the mean
global_trend_summary05-24.csv
- Region: region of interest
- 2.50%: lower 2.5% confidence quartile for estimated regional trend (used for 95% CI)
- 5%: lower 5% confidence quartile for estimated regional trend
- 50%: estimated median value for regional trend 2005-2024
- 95%: upper 95% quartile for estimated regional trend
- 97.50%: upper 97.5% quartile for estimated regional trend (used for 95% CI)
- mean: the average estimated regional trend for emperor penguin abundance indices in the Ross Sea between 2005 and 2024
- SE: standard error around the mean
- prob_negative: Probability that the Ross Sea region emperor penguins experienced a negative population trend between 2005 and 2024
Code/Software
Statistical analysis was conducted in R Statistical Software (v.4.3.1.). Packages required to conduct the analysis are outlined in each script.
