Where to live? Landfast sea ice shapes emperor penguin habitat around Antarctica
Data files
Mar 19, 2026 version; files total 1.22 GB
- ADPE_colonies_20200416.csv (10.55 KB)
- Antarctic_Cx_11_class.nc (21.76 MB)
- cells_to_select_FIE.RData (136.53 KB)
- circumfi_1km_lats.img (105.75 MB)
- circumfi_1km_lons.img (105.75 MB)
- colonies_with_buffer3k_withcells_0_025_nocoastmask.csv (100.52 KB)
- colonies.xlsx (11.96 KB)
- colony-time-short_2009_fretwell.txt (1.56 KB)
- data_Alex_20102018_v2.mat (175.40 MB)
- data_Alex_fasticeextent_20102018.mat (627.05 KB)
- data_proba_presence_analysis_june22.RData (5.92 MB)
- datatot_fred_10kmbufferonabsence.xlsx (6.57 MB)
- datatot_fred_20kmbufferonabsence.xlsx (5.30 MB)
- datatot_fred_coords.xlsx (5.83 MB)
- datatot_plotdensitydistribution.xlsx (7.82 MB)
- empegrid_june_2010_2018.RData (6.67 MB)
- ice_sheet_GEBCO.RData (248.77 KB)
- magannualcycle_p4_v2.2.img (105.75 MB)
- maxtiming_p4_v2.2.img (105.75 MB)
- mintiming_p4_v2.2.img (105.75 MB)
- persistence_p4_v2.2.img (105.75 MB)
- README.md (34.06 KB)
- sara_distance_data_66694_432_float.dat (115.25 MB)
- time_steps.xlsx (16.52 KB)
- tot_env_pers_000001_oct21_20102018b.tif (17.92 MB)
- trend_p4_v2.2.img (105.75 MB)
- volatility_1m_p4_v2.2_nobug.img (105.75 MB)
Abstract
Predicting species survival in the face of climate change requires understanding the drivers that influence their distribution. Emperor penguins (Aptenodytes forsteri) incubate and rear chicks on landfast sea ice, whose extent, dynamics, and quality are expected to vary significantly due to climate change. Until recently, continent-wide observations of this species were scarce, and knowledge of its distribution and habitat was limited. Advances in satellite imagery now allow these penguins to be observed and their habitats characterized across Antarctica at high resolution. Using circumpolar high-resolution satellite images, unique fast ice metrics, and geographic and biological factors, we identified diverse penguin habitats across the continent, with no significant difference between areas with and without penguins. There is a clear geographic partitioning of colonies with respect to their defining habitat characteristics, indicating possible behavioral plasticity among different metapopulations. This coincides with geographic structures found in previous genetic studies. Given projections of quasi-extinction for this species by 2100, this study provides essential information for conservation measures.
Authors: Sara Labrousse, David Nerini, Alexander D. Fraser, Leonardo Salas, Michael Sumner, Frederic Le Manach, Stéphanie Jenouvrier, David Iles, and Michelle LaRue
This dataset provides everything needed to prepare the environmental variables and then run the emperor penguin habitat modelling analysis.
Description of the data and file structure
Scripts are numbered in the order of the analysis, and each script reads one or more of the datasets described below (either provided or produced by an earlier script). The software needed to open each file is given in brackets:
- ADPE_colonies_20200416.csv: coordinates of Adélie penguin colonies
- (1) ADPEname: name of the Adélie penguin colonies
- (2) Latitude: latitude in degrees
- (3) Longitude: longitude in degrees
- (4) ADPEcount: number of breeding pairs
- Antarctic_Cx_11_class.nc [to open with RStudio]: file with geomorphology data (data not used in the analysis of this study)
- FEAT_32KM indicates whether the coastal morphology is a bay or a cape at a 32 km scale;
- FEAT_64KM indicates whether the coastal morphology is a bay or a cape at a 64 km scale;
- FEAT_128KM indicates whether the coastal morphology is a bay or a cape at a 128 km scale;
- MAG32 is the magnitude of coastal complexity (geomorphology) at a 32 km scale, on a dimensionless scale of 0–100;
- MAG64 is the magnitude of coastal complexity (geomorphology) at a 64 km scale, on a dimensionless scale of 0–100;
- MAG128 is the magnitude of coastal complexity (geomorphology) at a 128 km scale, on a dimensionless scale of 0–100;
- ANGR32 is the compass direction of ‘MAG’ relative to the coastline at a 32 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right;
- ANGR64 is the compass direction of ‘MAG’ relative to the coastline at a 64 km scale;
- ANGR128 is the compass direction of ‘MAG’ relative to the coastline at a 128 km scale.
- cells_to_select_FIE.RData [to open with RStudio] cell IDs to use to match EMPE grid data
- circumfi_1km_lats.img [to open with Matlab] Latitude for fast ice data files used in script 1
- circumfi_1km_lons.img [to open with Matlab] Longitude for fast ice data files used in script 1
- colonies_with_buffer3k_withcells_0_025_nocoastmask.csv [to open with RStudio] output file of script 2a, file with grid resolution of 0.025° (i.e., ∼1 km) containing emperor penguin presence cells (all the cells within 3 km of the 55 emperor penguin colony locations).
- (1) cell: number of cell
- (2) name: name of the emperor penguin colony
- (3) x: cell coordinate in x of the colony
- (4) y: cell coordinate in y of the colony
- (5) longitude: longitude in degrees
- (6) latitude: latitude in degrees
- colonies.xlsx [to open with RStudio] file with emperor penguin colony locations for simple mapping, from Fretwell et al. (2012), Table 1 (https://doi.org/10.1371/journal.pone.0033751). Geographic coordinates have been generalized to a precision of 0.1 decimal degrees to comply with Dryad rules for such species.
- (1) name: name of the emperor penguin colonies
- (2) latitude: latitude in degrees
- (3) longitude: longitude in degrees
- colony-time-short_2009_fretwell.txt [to open with RStudio] file with colony sizes from Fretwell et al. (2009), used for Figure S7. Geographic coordinates have been generalized to a precision of 0.1 decimal degrees to comply with Dryad rules for such species.
- (1) year: year of the colony size estimate
- (2) site_number: colony number
- (3) site_id: abbreviation for the colony name
- (4) latitude: latitude in degrees
- (5) longitude: longitude in degrees
- (6) fretwell: colony size in number of penguins estimated by Fretwell et al. (2009)
- data_Alex_20102018_v2.mat [to open with RStudio and Matlab] dataset with all fast ice variables, output of script 1
- (1) lat: latitude in degrees
- (2) lon: longitude in degrees
- (3) magcycle: the magnitude of the annual cycle (v) is a measure of the magnitude of the annual cycle of fast ice (values between 0 and 1). The magnitude is zero both in regions of 0% and 100% persistence.
- (4) latitude: latitude in degrees
- (5) maxtime: the timing of fast ice maximum (iii) represents the time of year when that pixel tends to reach maximum coverage.
- (6) mintime: the timing of fast ice minimum (iv) represents the time of year when that pixel tends to reach minimum coverage.
- (7) persitence: persistence (ii) is a simple measure of the mean fast ice residence across the 9-year time series.
- (8) trend: The fast ice trend (vi) is the trend calculated from March 2010 to 2018 per cell.
- (9) volati: Volatility (i) is a measure of the short time scale (1 month) variability of fast ice coverage.
- data_Alex_fasticeextent_20102018.mat [to open with RStudio and Matlab] dataset with fast ice extent in October from 2010 to 2018
- (1) FIE mean: average fast ice extent (in km) across years for October for each lat/lon cell
- (2) FIE median: median fast ice extent (in km) across years for October for each lat/lon cell
- data_proba_presence_analysis_june22.RData [to open with RStudio] output file of script 2d, final dataset for further analysis
- List of variables:
- meanslope: the slope grid was averaged across 10 × 10 500-m cells using the Aggregate tool of ArcGIS (in degrees);
- meanbathy: the bathymetric grid was averaged across 10 × 10 500-m cells using the Aggregate tool of ArcGIS (in m);
- cont300dist: distance of the cell to the 300-m isobath using the Near tool of ArcGIS (in km);
- cont800dist: distance of the cell to the 800-m isobath using the Near tool of ArcGIS (in km);
- ADPEdist: distance of the cell to the nearest Adélie penguin colony (in km).
- coords.x1: cell coordinate in longitude;
- coords.x2: cell coordinate in latitude;
- scaledmeanslope: scaled mean slope in a given cell;
- logmeanbathy: logarithm of the average bathymetry in a given cell;
- logcont300dist: logarithm of the distance to the 300-m isobath for a given cell;
- logcont800dist: logarithm of the distance to the 800-m isobath for a given cell;
- logADPEdist: logarithm of the distance to the nearest Adélie penguin colony for a given cell;
- Trend: The fast ice trend is the trend calculated from March 2010 to 2018 for a given cell;
- Persistence: Persistence is a simple measure of the mean fast ice residence across the 9-year time series for a given cell;
- Min_timing: The timing of fast ice minimum represents the time of year when that pixel tends to reach minimum coverage;
- Max_timing: The timing of fast ice maximum represents the time of year when that pixel tends to reach maximum coverage;
- Volatility: Volatility is a measure of the short time scale (1 month) variability of fast ice coverage for a given cell;
- Mag_annual_cycle: The magnitude of the annual cycle is a measure of the magnitude of the annual cycle of fast ice (values between 0 and 1) for a given cell. The magnitude is zero both in regions of 0% and 100% persistence;
- empepresent: presence of an emperor penguin colony in a given cell (present: 1; absent: 0);
- WESEdist: distance of the cell to the nearest Weddell seal (in km);
- logWESEdist: logarithm of the distance to the nearest Weddell seal for a given cell;
- FIEmean: average fast ice extent (in km) across years for October for a given cell;
- FIEmedian: median fast ice extent (in km) across years for October for a given cell;
- EPnames: names of emperor penguin colony if present in a given cell;
- Region: corresponding region of a given cell;
- nearempedist: distance to the nearest Emperor penguin colony for a given cell (in km);
- lognearempedist: logarithm of the distance to the nearest Emperor penguin colony for a given cell;
- logmeanFIE: logarithm of the average fast ice extent (in km) across years for October for a given cell;
- logmedianFIE: logarithm of the median fast ice extent (in km) across years for October for a given cell;
- datatot_fred_10kmbufferonabsence.xlsx [to open with RStudio] all environmental and biological data used in the analysis for presence (for which a buffer of 10km was included) and absence data.
- Here is the numbered list of variables:
- .id: cluster number from the analysis;
- Slope: the slope grid was averaged across 10 × 10 500-m cells using the Aggregate tool of ArcGIS (in degrees);
- Bathymetry: the bathymetric grid was averaged across 10 × 10 500-m cells using the Aggregate tool of ArcGIS (in m);
- Distance_to_isobath_800: distance of the cell to the 800-m isobath using the Near tool of ArcGIS (in km);
- Distance_to_ADPE: distance of the cell to the nearest Adélie penguin colony (in km);
- Fast_ice_trend: The fast ice trend is the trend calculated from March 2010 to 2018 for a given cell;
- Fast_ice_persistence: Persistence is a simple measure of the mean fast ice residence across the 9 year time series for a given cell;
- Timing_of_fast_ice_min: The timing of fast ice minimum represents the time of year when that pixel tends to reach minimum coverage;
- Timing_of_fast_ice_max: The timing of fast ice maximum represents the time of year when that pixel tends to reach maximum coverage;
- fast_ice_volatility: Volatility is a measure of the short time scale (1 month) variability of fast ice coverage for a given cell;
- Magnitude_of_fast_ice_annual_cycle: The magnitude of the annual cycle is a measure of the magnitude of the annual cycle of fast ice (values between 0 and 1) for a given cell. The magnitude is zero both in regions of 0% and 100% persistence;
- Distance_to_WESE: distance of the cell to the nearest Weddell seal (in km);
- Fast_ice_extent: median fast ice extent (in km) across years for October for a given cell;
- Distance_to_EMPE: distance to the nearest Emperor penguin colony for a given cell (in km);
- presence: presence of an emperor penguin colony in a given cell (present: 1; absent: 0);
- class: same as .id, cluster number from the analysis;
- coords.x1: cell coordinate in longitude (degrees E);
- coords.x2: cell coordinate in latitude (degrees N);
- rem: column indicating that no cell is to be removed (all values equal 0).
- datatot_fred_20kmbufferonabsence.xlsx [to open with RStudio] all environmental and biological data used in the analysis for presence (for which a buffer of 20km was included) and absence data
- The variables and their order are the same as in datatot_fred_10kmbufferonabsence.xlsx, except that the last three variables (coords.x1, coords.x2 and rem) are absent.
- datatot_fred_coords.xlsx [to open with RStudio] all environmental and biological data for presence and absence data to create buffers in script 6
- The variables and their order are the same as in datatot_fred_10kmbufferonabsence.xlsx, except that the last variable (rem) is absent.
- datatot_plotdensitydistribution.xlsx [to open with RStudio] dataset with all variables needed to plot density distributions for presence and absence data
- Here is the numbered list of variables:
- Slope: the slope grid was averaged across 10 × 10 500-m cells using the Aggregate tool of ArcGIS (in degrees).
- Bathymetry: the bathymetric grid was averaged across 10 × 10 500-m cells using the Aggregate tool of ArcGIS (in m).
- Distance_to_isobath_800: distance of the cell to the 800-m isobath using the Near tool of ArcGIS (in km).
- Distance_to_ADPE: distance of the cell to the nearest Adélie penguin colony (in km).
- Fast_ice_trend: the fast ice trend is the trend calculated from March 2010 to 2018 for a given cell.
- Fast_ice_persistence: persistence is a simple measure of the mean fast ice residence across the 9-year time series for a given cell.
- Timing_of_fast_ice_min: the timing of fast ice minimum represents the time of year when that pixel tends to reach minimum coverage.
- Timing_of_fast_ice_max: the timing of fast ice maximum represents the time of year when that pixel tends to reach maximum coverage.
- fast_ice_volatility: volatility is a measure of the short time scale (1 month) variability of fast ice coverage for a given cell.
- Magnitude_of_fast_ice_annual_cycle: the magnitude of the annual cycle is a measure of the magnitude of the annual cycle of fast ice (values between 0 and 1) for a given cell. The magnitude is zero both in regions of 0% and 100% persistence.
- Distance_to_WESE: distance of the cell to the nearest Weddell seal (in km).
- Fast_ice_extent: median fast ice extent (in km) across years for October for a given cell.
- Distance_to_EMPE: distance to the nearest Emperor penguin colony for a given cell (in km).
- presence: presence of an emperor penguin colony in a given cell (present: 1; absent: 0).
- empegrid_june_2010_2018.RData [to open with RStudio] output of script 2c, biological and environmental integrated dataset
List of variables:
- coastalCellId: id of the coastal cells. Variable not used in the study;
- gridCellId: id of the grid cell;
- meanslope: the slope grid was averaged across 10 × 10 500-m cells using the Aggregate tool of ArcGIS (in degrees);
- meanbathy: the bathymetric grid was averaged across 10 × 10 500-m cells using the Aggregate tool of ArcGIS (in m);
- glacierdist: distance to the nearest glacier/ice tongue (in m). Variable not used in the study;
- distToShore: nearest distance to the shore (in m). Variable not used in the study;
- cont300dist: distance of the cell to the 300-m isobath using the Near tool of ArcGIS (in m). Variable not used in the study;
- cont800dist: distance of the cell to the 800-m isobath using the Near tool of ArcGIS (in m);
- DecemberIcePresence: presence of fast ice in the given cell in December. Variable not used in the study;
- Persistence2Years: persistence of fast ice over the last 2 years in the given cell. Variable not used in the study;
- PredictabilityDec5Years: Predictability of fast ice in December over past 5 years (0–5). Variable not used in the study;
- ADPEname: Name of Adélie penguin colonies. Variable not used in the study;
- ADPEdist: Distance of the cell to the nearest Adélie penguin colony (in m);
- coords.x1: cell coordinate in longitude;
- coords.x2: cell coordinate in latitude;
- Persistence3Years: Persistence of fast ice over past 3 years (0-3). Variable not used in the study;
- PredictabilityOct5Years: Predictability of fast ice in October over past 5 years (0–5). Variable not used in the study;
- fastIceRatio: ratio of distance to edge/ice width. Variable not used in the study;
- ADPEabund: size of nearest ADPE colony (breeding pairs). Variable not used in the study;
- scaledmeanslope: scaled mean slope in a given cell;
- logmeanbathy: logarithm of the average bathymetry in a given cell (bathymetric grid averaged across 10 × 10 500-m cells using ArcGIS, in m);
- logglacierdist: logarithm of the distance to coastal glacier/ice tongue. Variable not used in the study;
- logdistToShore: logarithm of the distance to shoreline. Variable not used in the study;
- logcont300dist: logarithm of the distance to the 300-m isobath for a given cell. Variable not used in the study;
- logcont800dist: logarithm of the distance to the 800-m isobath for a given cell;
- logadpedist: logarithm of the distance to the nearest Adélie penguin colony (ADPEdist);
- logADPEabund: logarithm of the size of nearest ADPE colony (breeding pairs). Variable not used in the study;
- wesepresent: presence of Weddell seals in a given cell. Variable not used in the study;
- tep1: Trend: The fast ice trend is the trend calculated from March 2010 to 2018 for a given cell;
- tep2: Persistence: Persistence is a simple measure of the mean fast ice residence across the 9-year time series for a given cell;
- tep3: Min_timing: The timing of fast ice minimum represents the time of year when that pixel tends to reach minimum coverage;
- tep4: Max_timing: The timing of fast ice maximum represents the time of year when that pixel tends to reach maximum coverage;
- tep5: Volatility: Volatility is a measure of the short time scale (1 month) variability of fast ice coverage for a given cell;
- tep6: Mag_annual_cycle: The magnitude of the annual cycle is a measure of the magnitude of the annual cycle of fast ice (values between 0 and 1) for a given cell. The magnitude is zero both in regions of 0% and 100% persistence;
- tep7: FEAT_32KM indicates whether the coastal morphology is a bay or a cape at a 32 km scale (negative values for bays). This variable was not used in the study.
- tep8: Cosine of ANGR32 (the compass direction of ‘MAG’ relative to the coastline at a 32 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right). This variable was not used in the study.
- tep9: Sine of ANGR32 (the compass direction of ‘MAG’ relative to the coastline at a 32 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right). This variable was not used in the study.
- tep10: FEAT_64KM indicates whether the coastal morphology is a bay or a cape at a 64 km scale (negative values for bays). This variable was not used in the study.
- tep11: Cosine of ANGR64 (the compass direction of ‘MAG’ relative to the coastline at a 64 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right). This variable was not used in the study.
- tep12: Sine of ANGR64 (the compass direction of ‘MAG’ relative to the coastline at a 64 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right). This variable was not used in the study.
- tep13: MAG128 is the magnitude of coastal complexity at a 128 km scale regarding geomorphology, on a dimensionless scale 0–100 (negative values for bays). This variable was not used in the study.
- tep14: Cosine of ANGR128 (the compass direction of ‘MAG’ relative to the coastline at a 128 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right). This variable was not used in the study.
- tep15: Sine of ANGR128 (the compass direction of ‘MAG’ relative to the coastline at a 128 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right). This variable was not used in the study.
- empepresent: presence of an emperor penguin colony in a given cell (present: 1; absent: 0);
- ADPEcount: number of Adélie penguins in the nearest colony. Variable not used in the study;
- WESEdist: distance of the cell to the nearest Weddell seal (in km);
- logwesedist: logarithm of the distance to the nearest Weddell seal (WESEdist);
- WESEcell: ID of the cell with Weddell seals present. Variable not used in the study;
- ice_sheet_GEBCO.RData [to open with RStudio] Antarctic contours
- magannualcycle_p4_v2.2.img [to open with Matlab] Magnitude of the fast ice annual cycle; fast ice data files used in script 1
- maxtiming_p4_v2.2.img [to open with Matlab] The timing of fast ice maximum, the time of year when that pixel tends to reach max coverage; fast ice data files used in script 1
- mintiming_p4_v2.2.img [to open with Matlab] The timing of fast ice minimum, the time of year when that pixel tends to reach min coverage; fast ice data files used in script 1
- persistence_p4_v2.2.img [to open with Matlab] Persistence, simple measure of the mean fast ice residence across the 9-year time series; fast ice data files used in script 1
- sara_distance_data_66694_432_float.dat [to open with Matlab] fast ice extent (in km) per cell from 2000 to 2018, calculated from 15-day medians within each month.
- This file contains fast ice extent values (in km) for 66694 cells and 432 time steps.
- time_steps.xlsx [to open with RStudio] time steps for extracting fast ice extent
- (1) index: index of each time step;
- (2) year: year;
- (3) month: month name.
- tot_env_pers_000001_oct21_20102018b.tif [to open with RStudio] raster stack saved into a GeoTIFF file containing all environmental variables, output of script 2b
Layers of the raster stack:
- Raster stack 1: Trend – The fast ice trend is the trend calculated from March 2010 to 2018 for a given cell.
- Raster stack 2: Persistence – Persistence is a simple measure of the mean fast ice residence across the 9-year time series for a given cell.
- Raster stack 3: Min_timing – The timing of fast ice minimum represents the time of year when that pixel tends to reach minimum coverage.
- Raster stack 4: Max_timing – The timing of fast ice maximum represents the time of year when that pixel tends to reach maximum coverage.
- Raster stack 5: Volatility – Volatility is a measure of the short time scale (1 month) variability of fast ice coverage for a given cell.
- Raster stack 6: Mag_annual_cycle – The magnitude of the annual cycle is a measure of the magnitude of the annual cycle of fast ice (values between 0 and 1) for a given cell. The magnitude is zero both in regions of 0% and 100% persistence.
- Raster stack 7: FEAT_32KM indicates whether the coastal morphology is a bay or a cape at a 32 km scale (negative values for bays). Not used in the study.
- Raster stack 8: Cosine of ANGR32 (the compass direction of ‘MAG’ relative to the coastline at a 32 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right). Not used in the study.
- Raster stack 9: Sine of ANGR32 (the compass direction of ‘MAG’ relative to the coastline at a 32 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right). Not used in the study.
- Raster stack 10: FEAT_64KM indicates whether the coastal morphology is a bay or a cape at a 64 km scale (negative values for bays). Not used in the study.
- Raster stack 11: Cosine of ANGR64 (the compass direction of ‘MAG’ relative to the coastline at a 64 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right). Not used in the study.
- Raster stack 12: Sine of ANGR64 (the compass direction of ‘MAG’ relative to the coastline at a 64 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right). Not used in the study.
- Raster stack 13: MAG128 is the magnitude of coastal complexity at a 128 km scale regarding geomorphology, on a dimensionless scale 0–100 (negative values for bays). Not used in the study.
- Raster stack 14: Cosine of ANGR128 (the compass direction of ‘MAG’ relative to the coastline at a 128 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right). Not used in the study.
- Raster stack 15: Sine of ANGR128 (the compass direction of ‘MAG’ relative to the coastline at a 128 km scale, e.g. 0° directly offshore, 270° facing directly left, 90° facing right). Not used in the study.
- trend_p4_v2.2.img [to open with Matlab] fast ice trend, trend calculated from March 2010 to 2018 per cell; fast ice data files used in script 1
- volatility_1m_p4_v2.2_nobug.img [to open with Matlab] Volatility, a measure of the short time scale (1 month) variability of fast ice coverage; fast ice data files used in script 1
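The cosine and sine variables above (tep8/tep9, tep11/tep12 and tep14/tep15 in empegrid_june_2010_2018.RData, and raster stack layers 8–9, 11–12 and 14–15) represent each compass direction ANGR as a pair of values. A likely reason, though not stated in this README, is that an angle is circular: 359° and 1° are almost the same direction but lie far apart on a linear scale. A minimal Python sketch of the transformation, assuming angles are stored in degrees (the function name is hypothetical):

```python
import math


def encode_direction(angle_deg):
    """Return (cos, sin) of a compass angle given in degrees.

    This keeps 0 and 360 degrees identical and makes nearby directions
    (e.g. 359 and 1 degrees) numerically close to each other.
    """
    rad = math.radians(angle_deg)
    return math.cos(rad), math.sin(rad)
```

With this encoding, 0° (directly offshore) maps to (1, 0), 90° (facing right) to approximately (0, 1) and 270° (facing left) to approximately (0, −1).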
Usage Notes
- The csv, txt and xlsx files are spreadsheets that can be accessed with, for example, LibreOffice.
- The mat, RData, tif and nc files can be opened in R using the functions rmatio::read.mat (e.g. mat <- rmatio::read.mat("data_Alex_20102018_v2.mat")), base::load (e.g. load("empegrid_june_2010_2018.RData")), raster::stack (e.g. stk <- raster::stack("tot_env_pers_000001_oct21_20102018b.tif")) and ncdf4::nc_open (e.g. data <- ncdf4::nc_open("Antarctic_Cx_11_class.nc")).
- The img and dat files are headerless float32 binary files and can be opened in Matlab with fopen and fread by providing the matrix dimensions (5625 × 4700 for the img grids; 66694 cells × 432 time steps for the dat file), e.g. fid = fopen('circumfi_1km_lats.img'); lat = fread(fid, [5625 4700], 'float32'); fclose(fid);
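For readers working in Python rather than Matlab, the same binary files can be read with the standard library. This is a minimal sketch, assuming the files contain raw little-endian float32 values with no header (consistent with the file sizes: 5625 × 4700 × 4 bytes = 105.75 MB and 66694 × 432 × 4 bytes = 115.25 MB) and the column-major order that Matlab's fread(fid, [nrow ncol]) uses. The byte order and the helper names read_float32 and at are assumptions, not part of the dataset.

```python
import array
import sys


def read_float32(path, count, byteorder="little"):
    """Read `count` float32 values from a headerless binary file.

    Assumes the file stores raw IEEE 754 float32 values in `byteorder`
    ("little" or "big"); values are returned in file order.
    """
    values = array.array("f")  # 'f' is a 4-byte C float on common platforms
    with open(path, "rb") as fh:
        values.fromfile(fh, count)  # raises EOFError if the file is too short
    if sys.byteorder != byteorder:
        values.byteswap()  # convert from the file's byte order to native
    return values


def at(values, nrow, i, j):
    """Element (i, j), 0-based, of an nrow-by-ncol matrix stored column-major.

    Matlab's fread(fid, [nrow ncol], 'float32') fills the matrix column by
    column, so Matlab's lat(i+1, j+1) corresponds to at(values, nrow, i, j).
    """
    return values[j * nrow + i]
```

For example, lats = read_float32("circumfi_1km_lats.img", 5625 * 4700) loads the latitude grid, and at(lats, 5625, 0, 0) then corresponds to Matlab's lat(1,1). The flat array uses about 106 MB of memory; converting it to nested Python lists would need far more.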
Code/Software
Script 1 – Matlab software / open fast ice data from Alexander Fraser;
The script opens each fast ice variable and combines them into two datasets, “data_Alex_20102018_v2.mat” and “data_Alex_fasticeextent_20102018.mat”.
Details of each variable are provided below:
Fast ice variables:
Volatility (i) is a measure of the short time scale (1 month) variability of fast ice coverage.
Persistence (ii) is a simple measure of the mean fast ice residence across the 9-year time series.
The timing of fast ice maximum and minimum (iii and iv) represents the time of year when that pixel tends to reach maximum/minimum coverage. The date of fast ice maximum was then transformed into a categorical value as follows: early was −5 (July), median 0 (September), and late 5 (December). For fast ice minimum, the timing was coded as follows: early was −5 (December), median 0 (March), and late 5 (May). Minimum and maximum timing values are discarded (multiplied by 0) where the magnitude of the annual cycle is below 0.4, because such cells correspond to regions of extremely high or low fast ice persistence, where timing estimates are noisy or biased.
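The masking rule above can be sketched as follows. This is a minimal illustration on plain Python lists, assuming the timing and magnitude values are already aligned cell by cell; the function name is hypothetical and this is not the authors' Matlab code. Only the three anchor points of each coded scale are given in the text, so the sketch records them without interpolating the other months.

```python
def mask_timing(timing, magnitude, threshold=0.4):
    """Set timing to 0 where the annual-cycle magnitude is below `threshold`.

    timing and magnitude are equal-length sequences aligned cell by cell.
    Cells with a weak annual cycle (near 0% or 100% persistence) have
    unreliable timing estimates, so their values are multiplied by 0.
    """
    if len(timing) != len(magnitude):
        raise ValueError("timing and magnitude must have the same length")
    return [t * 0 if m < threshold else t for t, m in zip(timing, magnitude)]


# Anchor points of the coded timing scales described above
# (intermediate months are not specified in this README).
MAX_TIMING_CODES = {"July": -5, "September": 0, "December": 5}
MIN_TIMING_CODES = {"December": -5, "March": 0, "May": 5}
```

For example, mask_timing([3, -2, 5], [0.5, 0.1, 0.4]) keeps the first and last values and zeroes the second, whose magnitude (0.1) is below 0.4.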
The magnitude of the annual cycle (v) is a measure of the magnitude of the annual cycle of fast ice (values between 0 and 1). The magnitude is zero both in regions of 0% and 100% persistence.
The fast ice trend (vi) is the trend calculated from March 2010 to 2018 per cell.
The fast ice extent (vii) was calculated per cell (in km) as the median, across years, of the first 15 days of October.
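A sketch of this per-cell statistic, assuming the 66694 × 432 extent matrix from sara_distance_data_66694_432_float.dat has been read as a flat column-major array (cells vary fastest, as with the Matlab fread call in the Usage Notes), and that the 0-based indices of the first-half-of-October time steps for 2010–2018 have been looked up in time_steps.xlsx. The function name and the index layout are assumptions. The mean is returned as well, matching the FIE mean variable.

```python
import statistics


def october_extent(values, ncell, oct_steps, cell):
    """Mean and median fast ice extent (km) across years for one cell.

    values: flat sequence holding an ncell-by-ntime matrix in column-major
    order, so the value for (cell, step) is values[step * ncell + cell].
    oct_steps: 0-based indices of the time steps covering the first 15 days
    of October, one per year (to be looked up in time_steps.xlsx).
    """
    series = [values[step * ncell + cell] for step in oct_steps]
    return statistics.mean(series), statistics.median(series)
```

Cells without valid data (e.g. land or NaN fill values, if any) would need to be excluded before taking the median.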
Script 2a – R software / create a buffer of 3 km around colony locations;
The script creates a buffer of 3 km around emperor penguin colonies, resulting in a file with an initial grid resolution of 0.025° (i.e., ∼1 km). Emperor penguin presence cells are all the cells within 3 km of the 55 emperor penguin colony locations. The output file is “colonies_with_buffer3k_withcells_0_025_nocoastmask.csv”.
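The buffer selection can be sketched as a great-circle distance filter: a grid cell counts as a presence cell if its centre lies within 3 km of any colony. This Python illustration assumes a spherical Earth of radius 6371 km and cell-centre coordinates in degrees; the R script may instead build polygon buffers in a projected coordinate system, so results near the 3 km edge could differ slightly. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def presence_cells(cells, colonies, radius_km=3.0):
    """IDs of cells whose centre lies within radius_km of any colony.

    cells: iterable of (cell_id, lat, lon); colonies: iterable of (lat, lon).
    """
    colonies = list(colonies)
    return [
        cid
        for cid, lat, lon in cells
        if any(haversine_km(lat, lon, clat, clon) <= radius_km for clat, clon in colonies)
    ]
```

At 70° S, for instance, a cell 0.02° of latitude from a colony (about 2.2 km) is selected, while one 0.1° of longitude away (about 3.8 km) is not.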
Script 2b - R software / prepare fast ice data;
The script converts the fast ice variables produced by script 1 into a projected raster stack, adds the geomorphology layers (not used in this study), and checks the results with plots.
Steps:
- Open Fast Ice Data: Reads fast ice data from a MATLAB file.
- Redimension the Data: Reshapes the data into raster format and assigns values to different variables representing trends, persistence, minimum timing, maximum timing, volatility, and magnitude of the annual cycle.
- Plot to Verify: Generates plots to verify the loaded data.
- Correct Fast Ice Min and Max Timing: Adjusts extreme values for minimum and maximum timing based on the histogram and quantiles of the data.
- Open Coastline: Loads coastline data for visualization.
- Stack All Variables: Combines all variables into a raster stack.
- Transform and Project the Raster Stack: Projects the raster stack to fit the presence/absence data.
- Open File with Colony Location: Reads a file containing colony location data.
- Open Geomorphology Data: Reads geomorphology data from a NetCDF file and performs rasterization (not used in this study, therefore not detailed).
- Mask on the Fast Ice Persistence: Masks the raster data based on fast ice persistence.
- Stack All Variables Together: Combines all variables into a single raster stack.
- Stack All Without Trend but With All Geomorphology Data (layers 7 to 15, not used in this study): Combines all variables except trend with the geomorphology data into a single raster stack. Values and units are described above: "Trend" = fast ice trend, "Persistence" = fast ice persistence, "Min_timing" and "Max_timing" = time of year when that pixel tends to reach minimum/maximum coverage, "Volatility" = measure of the short time scale (1 month) variability of fast ice coverage, "Mag_annual_cycle" = measure of the magnitude of the annual cycle of fast ice.
- Write Output: Writes the final raster stack to a GeoTIFF file “tot_env_pers_000001_oct21_20102018b.tif”.
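The README does not specify how extreme minimum and maximum timing values are adjusted "based on the histogram and quantiles". One common approach consistent with that description is to cap values at a lower and an upper quantile (winsorizing). The sketch below assumes that approach, with hypothetical 1% and 99% cut-offs; it is not necessarily what the R script does.

```python
import statistics


def cap_to_quantiles(values, lower=0.01, upper=0.99):
    """Clamp values to their [lower, upper] empirical quantiles (winsorizing).

    Percentile cut points come from statistics.quantiles with the
    'inclusive' method; values outside them are replaced by the cut point.
    lower and upper must be whole percentages between 0.01 and 0.99.
    """
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo = cuts[round(lower * 100) - 1]  # cuts[k - 1] is the k-th percentile
    hi = cuts[round(upper * 100) - 1]
    return [min(max(v, lo), hi) for v in values]
```

Capping keeps every cell in the raster while limiting the influence of outliers; simply masking extreme values would instead remove those cells from the analysis.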
Script 2c - R software / combine all data (biological and fast ice variables + presence/absence) on a grid;
The script integrates the environmental variables and the species presence data on a common grid.
Steps:
- Take Initial Grid and Reproject to Polar Stereographic: Reprojects the initial grid to a Polar Stereographic projection.
- Convert to Brick: Converts the raster stack to a raster brick object.
- Aggregate to 5 km Grid: Aggregates the grid to a 5 km resolution.
- Add Covariates to the Grid: Adds environmental covariates to the grid.
- Add Emperor Penguin Presence Data: Reads emperor penguin presence data and integrates it with the grid.
- Add Adélie Penguin Data: Reads Adélie penguin data and integrates it with the grid.
- Find Distance to Nearest Weddell Seal: Calculates the distance to the nearest Weddell seal for each grid cell.
- Save Output: Saves the final integrated dataset “empegrid_june_2010_2018.RData”.
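Aggregating to a coarser grid amounts to averaging non-overlapping blocks of cells, similar to what raster::aggregate(x, fact = 5, fun = mean) does in R. The Python sketch below assumes an aggregation factor of 5 on a row-major list-of-lists grid and skips missing cells (None). The factor, the handling of missing values and the function name are assumptions, not taken from the script.

```python
def aggregate_mean(grid, fact=5):
    """Average non-overlapping fact x fact blocks of a 2-D grid.

    grid: list of rows (row-major); None marks missing cells and is skipped.
    Partial blocks at the right/bottom edges are averaged over the cells
    they contain. A block with no valid cells becomes None.
    """
    nrow, ncol = len(grid), len(grid[0])
    out = []
    for r0 in range(0, nrow, fact):
        row = []
        for c0 in range(0, ncol, fact):
            block = [
                grid[r][c]
                for r in range(r0, min(r0 + fact, nrow))
                for c in range(c0, min(c0 + fact, ncol))
                if grid[r][c] is not None
            ]
            row.append(sum(block) / len(block) if block else None)
        out.append(row)
    return out
```

For example, aggregate_mean([[1, 2, 3, None], [3, 4, 5, 7]], fact=2) returns [[2.5, 5.0]]: the second block averages only its three valid cells.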
Script 2d - R software / final preparation of the dataset;
The script performs the final processing steps that produce the dataset used in the analysis.
Steps:
- Load EMPE Grid Data: Loads the emperor penguin grid data.
- Add Adélie Penguin Presence: Determines the presence or absence of Adélie penguins in each region.
- Add Fast Ice Extent Data: Loads and adds fast ice extent data to the dataset.
- Add Emperor Penguin Colony Names: Adds emperor penguin colony names to the dataset.
- Change Units of Fast Ice Variables: Adjusts the units of certain fast ice variables.
- Remove Unused Variables: Removes variables that will not be used in the analysis.
- Save Final Dataset: Saves the final dataset for further analysis "data_proba_presence_analysis_june22.RData".
Script 3 - R software / plot map colonies
The script uses ggplot2 to plot the colony locations and adds labels to enhance readability.
Script 4 - R software / density plots with all variables – Figure S5
The script utilizes ggplot2 to create density plots for each variable, comparing the distribution between presence and absence of penguin colonies.
Script 5 - R software / analysis
The script performs the following analyses.
Data Preprocessing
The presence/absence data are processed to keep the relevant variables and drop unnecessary columns. Absence and presence data are then separated, and principal component analysis (PCA) is performed on the presence data.
Principal Component Analysis (PCA)
PCA is conducted on the presence data to identify the main contributing variables affecting the presence of emperor penguin colonies. The script calculates the contribution of each variable to the principal components and visualizes the results.
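A minimal sketch of how variable contributions to a principal component can be computed: standardize the variables, form their correlation matrix, take its leading eigenvector, and express each variable's contribution as its squared eigenvector coefficient in percent (a common convention for PCA variable contributions). This Python version finds only the first component, by power iteration, and is an illustration, not the script's implementation. The function names are hypothetical.

```python
import math
import statistics


def standardize(col):
    """Center a variable and scale it to unit (sample) standard deviation."""
    mu = statistics.fmean(col)
    sd = statistics.stdev(col)
    return [(x - mu) / sd for x in col]


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length variable columns."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def leading_eigenvector(matrix, iters=1000, tol=1e-12):
    """Unit eigenvector of the largest eigenvalue, by power iteration.

    Assumes a symmetric positive semi-definite matrix (e.g. a correlation
    matrix) whose largest eigenvalue is unique.
    """
    k = len(matrix)
    v = [float(i + 1) for i in range(k)]  # arbitrary start, assumed not orthogonal to the answer
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(k)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def pc1_contributions(columns):
    """Percentage contribution of each variable to the first component."""
    u = leading_eigenvector(correlation_matrix(columns))
    return [100 * x * x for x in u]
```

Because the eigenvector has unit length, the contributions always sum to 100%. Two perfectly correlated variables plus one uncorrelated variable, for example, split the first component 50/50 and leave the third at 0%.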
Visualization
The script generates several figures for visualization, including:
Figure 2a: Visualization of PCA variables on the first and second principal components.
Figure 2b: Variance contribution of PCA variables on the first two principal components.
Figure S2a, S2b, S2c: Geographic representation of emperor penguin colony presence based on the first, second, and third principal components, respectively.
Mixture modeling: The script performs mixture modeling (clustering) with the Mclust function of the mclust package on the principal component scores.
Visualization: It generates scatter plots (Figure_4.png) to visualize the classification results obtained from the mixture modeling.
Probability calculations: The script computes probabilities and uncertainties based on the clustering results.
Geospatial plotting: It creates spatial plots (Figure_3a.png, Figure_3b.png, Figure_6a.png, and Figure_6b.png) showing the spatial distribution of clusters and probabilities on a map.
Boxplots: The script generates boxplots (Figure_5.png) to compare the distribution of variables across different clusters and between presence and absence classes.
Statistical tests: It performs statistical tests (Wilcoxon tests) to compare variables between clusters and between presence and absence classes.
Visualization: It plots the relationship between 2009 colony size and clusters (Figure_S7.png).
Output: Finally, it saves the generated plots and possibly some data tables as images or Excel files.
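The Wilcoxon tests compare a variable between two groups, e.g. presence and absence cells or two clusters. The sketch below computes a two-sided Wilcoxon rank-sum (Mann–Whitney) test with the large-sample normal approximation, average ranks for ties, a tie-corrected variance and a 0.5 continuity correction. This should correspond to R's wilcox.test with exact = FALSE. For small samples without ties, R defaults to an exact p-value, so results may differ. The function names are hypothetical and this is not the script's code.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (W, p), where W is the rank sum of x minus nx(nx+1)/2.
    The variance includes a correction for ties and the z statistic a
    continuity correction of 0.5.
    """
    nx, ny = len(x), len(y)
    n = nx + ny
    values = list(x) + list(y)
    w = sum(average_ranks(values)[:nx]) - nx * (nx + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(values).values())
    sigma = math.sqrt(nx * ny / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        raise ValueError("all values are tied; the test is undefined")
    d = w - nx * ny / 2
    z = (d - math.copysign(0.5, d)) / sigma if d != 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, p
```

For example, rank_sum_test([1, 2, 3], [4, 5, 6]) gives W = 0 and p ≈ 0.081.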
Script 6 - R software / analysis
The script creates buffers of 10 km, 20 km, and 30 km around colony locations to study their relationship with sea ice. It generates boxplots of the environmental variables for cells with and without penguin colonies, including these buffers, and assesses the statistical significance of presence/absence differences in each variable with Wilcoxon tests.
Scripts are organized based on the sequence of data preparation and analysis, with datasets being invoked and stored within the workflow.
