A dynamic foraging habitat distribution estimate for green turtles in the Great Barrier Reef
Data files
Jan 06, 2026 version, 54.58 GB in total:
- 20240704ssm.Rds (76.90 MB)
- 20240806AppendedAll.csv (880.44 MB)
- bathy.tif (2.04 GB)
- bathy.tif.aux.xml (1.52 KB)
- COMMUNITY500.tif (81.77 MB)
- crwunmod_preds.asc (135.20 MB)
- crwunmod_preds2022.asc (135.18 MB)
- dist2coast.tif (2.04 GB)
- dist2coast.tif.aux.xml (418 B)
- dist2recboats.tif (2.04 GB)
- dist2recboats.tif.aux.xml (441 B)
- dist2reefs.tif (2.04 GB)
- dist2reefs.tif.aux.xml (422 B)
- dist2rivers.tif (2.04 GB)
- dist2rivers.tif.aux.xml (424 B)
- EFI2010.tif (2.04 GB)
- EFI2010.tif.aux.xml (447 B)
- EFI2022.tif (2.04 GB)
- EFI2022.tif.aux.xml (450 B)
- geohab500.tif (81.77 MB)
- geohab500.tif.aux.xml (424 B)
- geomorph500.tif (81.77 MB)
- mangroves2010.tif (2.04 GB)
- mangroves2010.tif.aux.xml (420 B)
- mangroves2022.tif (2.04 GB)
- mangroves2022.tif.aux.xml (417 B)
- mask.tif (2.04 GB)
- mask.tif.aux.xml (380 B)
- mean_cur2010.tif (2.04 GB)
- mean_cur2010.tif.aux.xml (440 B)
- mean_cur2022.tif (2.04 GB)
- mean_cur2022.tif.aux.xml (440 B)
- mean_wspeed2010.tif (2.04 GB)
- mean_wspeed2010.tif.aux.xml (437 B)
- mean_wspeed2022.tif (2.04 GB)
- mean_wspeed2022.tif.aux.xml (435 B)
- README.md (17.62 KB)
- ruggedness.tif (2.04 GB)
- ruggedness.tif.aux.xml (439 B)
- salt2010.tif (2.04 GB)
- salt2010.tif.aux.xml (436 B)
- salt2022.tif (2.04 GB)
- salt2022.tif.aux.xml (437 B)
- seagrassp.tif (2.04 GB)
- seagrassp.tif.aux.xml (425 B)
- Secchi2010.tif (2.04 GB)
- Secchi2010.tif.aux.xml (420 B)
- Secchi2022.tif (2.04 GB)
- Secchi2022.tif.aux.xml (435 B)
- slope.tif (2.04 GB)
- slope.tif.aux.xml (441 B)
- temp2010.tif (2.04 GB)
- temp2010.tif.aux.xml (434 B)
- temp2022.tif (2.04 GB)
- temp2022.tif.aux.xml (436 B)
- tidalexposure.tif (2.04 GB)
- tidalexposure.tif.aux.xml (411 B)
- ZooL_N2010.tif (2.04 GB)
- ZooL_N2010.tif.aux.xml (436 B)
- ZooL_N2022.tif (2.04 GB)
- ZooL_N2022.tif.aux.xml (437 B)
Abstract
A detailed understanding of how protected species use their habitats can guide management interventions in areas of high human use. For marine turtles, differences in food availability and physical habitat characteristics can underpin turtle presence at anthropogenically modified versus unmodified sites. We develop telemetry-based habitat models with boosted regression trees to identify the environmental characteristics underpinning foraging habitat suitability for green turtles in the Great Barrier Reef region. We fit models to green turtle Fastloc GPS tracks from both modified and unmodified inshore foraging sites, using pseudo-absences generated as simulated correlated random walks. We assess model performance by the ability to predict known foraging areas, the true skill statistic, explanatory power (percent deviance explained) and predictive skill (AUC). We then predict potentially suitable foraging areas for green turtles in the Great Barrier Reef region using the model for unmodified habitats. Between 2010 and 2022, the total area of suitable foraging habitat declined by 41.2%, and nearshore habitat suitability retracted. These areas are likely affected by floods, development and increased turbidity. In 2022, 50% of predicted suitable habitat fell within habitat protection zones, and 19.4% within Marine National Park Zones of the Great Barrier Reef Marine Park. A detailed foraging distribution of the species has not previously been compiled at this regional scale. Identifying the biophysical drivers of habitat suitability can help identify possible foraging habitat in less data-rich regions in Australia and overseas. Evaluating changes in habitat distribution over time provides insight into the degree to which broad-scale environmental change and anthropogenic activities influence the condition and function of habitats, even within protected area boundaries.
The dataset contains estimated presences and simulated pseudo-absences derived from Fastloc-GPS and ARGOS locations of green turtles (Chelonia mydas) tracked in inshore foraging habitats in eastern Queensland, Australia. It also contains the gridded environmental variables used to predict turtle distribution with telemetry-based habitat modelling. The environmental data files were derived from external, publicly available sources. The full analysis workflow can be run in R using the code available on GitHub at https://github.com/egwebster/SSM-SDM-public.
Data and file structures
Primary data
The turtle presences (estimated from tracking data) and simulated pseudo-absences used as input for the telemetry-based habitat models.
"20240704ssm.Rds" = R data file generated by prep&presences.Rmd, containing estimated turtle locations at 12-hour intervals (24 or 48 hours for tracks with sparser data) from movement persistence models fitted with aniMotum. Variables are:
- id: turtle individual identifier. Turtle primary flipper tag number from the Queensland Turtle Conservation Program database, device number deployed on the turtle (PTT), and an integer enumerating independent portions of the track (a new portion starts where there is a gap in the data of 72 hours or more), separated by "_".
- ssm: an aniMotum ssm fitted model object
- converged: True/False. Whether the ssm model converged
- pdHess: True/False. Whether the Hessian matrix was positive-definite and could be solved to obtain parameter standard errors
- pmodel: ssm model type. 'mp' stands for movement persistence
"20240806AppendedAll.csv" = CSV file containing presences and pseudo-absences with the value of each environmental predictor at the corresponding location and time. The datasets from which the environmental predictors were derived for telemetry-based habitat modelling are given in Table 1:
| Data source | Variables selected | Resolution | Temporal resolution | Temporal range | Scale | Append method or transformation applied |
|---|---|---|---|---|---|---|
| eReefs hydrodynamic model (Herzfeld et al. 2016; CSIRO 2023a) | Temperature, salinity, mean seawater velocity, mean wind speed | 4 km | Daily | September 2010 to present | Great Barrier Reef | Extracted values at corresponding pixels (in space and time) to presences/pseudo-absences |
| eReefs biogeochemical model (CSIRO 2023b; Baird et al. 2020) | Secchi depth, ecology fine inorganics (EFI, i.e., sum of fine sediment and mud concentrations, derived as total suspended solids/1000), large zooplankton nitrogen | 4 km | Daily | December 2010 to April 2019 | Great Barrier Reef | Extracted values at corresponding pixels (in space and time) |
| Digital Earth Australia (Lymburner et al. 2020) | Mangrove canopy cover (Landsat): ‘Distance to mangroves’ | 30 m | Annual | 1987-2022 | Australia | Distance to pixels of at least 20% canopy cover in the corresponding year |
| Geoscience Australia Intertidal Model Relative Extents (Geoscience Australia 2016) | Tidal exposure | 25 m | Static | 1987-2015 | National | Extracted values at corresponding pixels (in space only) |
| Carter (NESP TWQ Project 5.4, TropWATER, JCU) (Carter et al. 2021) | Seagrass probability | 30 m | Static | 1984-2023 | Coastal Great Barrier Reef | Extracted values at corresponding pixels (in space only) |
| Carter (NESP TWQ Project 5.4, TropWATER, JCU) (Carter et al. 2021) | Seagrass community type | Categorical | Static | 1984-2023 | Coastal Great Barrier Reef | Overlap with polygon feature |
| Heap and Harris (2008) | Seafloor geomorphological feature types | Categorical | Static | 2008 | Australia | Overlap with polygon feature |
| Queensland Department of Transport and Main Roads (Department of Transport and Main Roads 2022) | Recreational boating facilities: ‘Distance to boat ramps’ | Categorical | Static | 2021 | Queensland | Distance to pixels containing features |
| Geoscience Australia Intertidal Model Relative Extents (Geoscience Australia 2016) | Distance to coast | 25 m | Static | 1987-2015 | Queensland | Distance to nearest pixel exposed at the highest 80%–100% of the observed tidal range (land) |
| GBRMPA GBR features (Great Barrier Reef Marine Park Authority 2017) | Distance to reefs | Categorical | Static | 2003 | Queensland | Distance to nearest ‘reef’ pixel |
| Geoscience Australia Surface Hydrology Polygons (Crossman and Li 2015) | Distance to rivers | Categorical | Static | 2015 | Queensland | Distance to nearest ‘river’ pixel |
| Geoscience Australia (Dyall et al. 2005) | Geomorphic habitats of Australia: ‘Geohabitat’ | 30 m | Static | 2005 | Coastal Australia | Overlap with polygon feature |
| Beaman (2017) | Bathymetry, rugosity, slope | 100 m | Static | 2018 | Great Barrier Reef | Extracted values at corresponding pixels (in space only). Ruggedness was calculated with the QGIS ‘Terrain Ruggedness Index’ tool and slope with the QGIS Raster analysis ‘Slope’ tool. |
Generated from Pseudoabsences.R & AppendtoEnviroData.Rmd. Variables are:
- X: unique row identifier
- id: turtle individual identifier. Turtle primary flipper tag number from the Queensland Turtle Conservation Program database, device number deployed on the turtle (PTT), and an integer enumerating independent portions of the track (a new portion starts where there is a gap in the data of 72 hours or more), separated by "_".
- date: Calendar date GMT (YYYY-mm-dd HH:MM:SS)
- data_type: 'track' = presence; 'background' or 'crw' = pseudo-absences
- builtfeature: Binary. Whether the point lies within a spatial buffer of 0.0014 decimal degrees of a built feature.
- geohab: Categorical. Geomorphic habitats of Australia.
- geomorph: Categorical. Geomorphology.
- iteration: for pseudo-absences, simulation iteration number as an integer.
- bathy1: Bathymetry (m)
- Dist2Rivers1: Distance to rivers (km)
- dist2reefs1: Distance to reefs (km)
- dist2coast1_2: Distance to coast (km)
- dist2recboatfeat: Distance to boat ramps (km)
- seagrassP1: Seagrass probability (out of 1)
- tidalexposure1: Tidal exposure (percentile)
- mmpfrequency1: Long-term frequency of water types 1 & 2: "Acute flood frequency", calculation described in Gruber et al. 2024.
- mmpexposure1: Long-term exposure to above-guideline concentrations of land-sourced pollutants: "Chronic floodwater exposure", calculation described in Gruber et al. 2024.
- ruggedness1: Ruggedness (m)
- slope1: Slope (degrees)
- aus.date: Calendar date AEST (YYYY-mm-dd HH:MM:SS)
- year: calendar year GMT (YYYY)
- deamangrove: Distance to mangroves (m)
- lon: Longitude (decimal degrees)
- lat: Latitude (decimal degrees)
- salt: Salinity (PSU)
- temp: Temperature (degrees C)
- mean_cur: Mean seawater velocity (m/s)
- mean_wspeed: Mean wind speed (m/s)
- ZooL_N: Large zooplankton nitrogen (mg N m⁻³)
- EFI: Ecology fine inorganics (i.e., sum of fine sediment and mud concentrations, derived as total suspended solids/1000)
- Secchi: Secchi depth (m)
- COMMUNITY: Categorical. Seagrass community type as defined by Carter et al. 2021
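As an illustration of how the `id` and `data_type` columns above fit together, the following Python sketch (standard library only) splits `id` into its three underscore-separated parts and turns `data_type` into a 0/1 response. The rows, tag and PTT values are invented; splitting on the last two underscores assumes that the PTT and portion fields never contain "_" themselves.

```python
import csv
import io

# Invented rows mimicking a subset of the columns in 20240806AppendedAll.csv;
# tag, PTT and coordinate values are made up for illustration.
SAMPLE = """X,id,date,data_type,iteration,lon,lat
1,K12345_123456_1,2012-03-01 00:00:00,track,,151.25,-23.85
2,K12345_123456_1,2012-03-01 00:00:00,crw,1,151.27,-23.83
3,K12345_123456_2,2012-05-10 12:00:00,track,,151.24,-23.86
4,K12345_123456_2,2012-05-10 12:00:00,background,3,151.30,-23.90
"""


def parse_id(turtle_id):
    """Split an id into (flipper tag, PTT, independent track portion).

    Splits on the last two underscores, assuming the PTT and portion
    fields themselves never contain "_".
    """
    tag, ptt, portion = turtle_id.rsplit("_", 2)
    return tag, ptt, int(portion)


def response(data_type):
    """Return 1 for presences ('track') and 0 for pseudo-absences."""
    if data_type == "track":
        return 1
    if data_type in ("crw", "background"):
        return 0
    raise ValueError(f"unexpected data_type: {data_type!r}")


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
labels = [response(row["data_type"]) for row in rows]
portions = sorted({parse_id(row["id"])[2] for row in rows})
print(labels)    # [1, 0, 1, 0]
print(portions)  # [1, 2]
```

For the real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle; at about 880 MB, a streaming reader like `csv.DictReader` avoids loading everything at once.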
Secondary data
Input rasters for grid predictions, provided as .tif files. These exclude the acute flood frequency and chronic floodwater exposure variables derived from the TropWATER marine monitoring program datasets (Gruber et al. 2024), which are available on reasonable request. Units are as given above. Properties of the 100 m rasters:
- class : RasterLayer
- dimensions : 23531, 21698, 510575638 (nrow, ncol, ncell)
- resolution : 100, 100 (x, y)
- extent : 961853.5, 3131654, -3445441, -1092341 (xmin, xmax, ymin, ymax)
- crs : +proj=aea +lat_0=0 +lon_0=132 +lat_1=-18 +lat_2=-36 +x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs
- bathy.tif - Bathymetry
- COMMUNITY500.tif - Seagrass community type, resampled to 500m pixels. No auxiliary file required for this layer
- dist2coast.tif - Distance to coast
- dist2recboats.tif - Distance to boat ramps
- dist2reefs.tif - Distance to reefs
- dist2rivers.tif - Distance to rivers
- EFI2010.tif - Ecology fine inorganics December 2010
- EFI2022.tif - Ecology fine inorganics December 2022
- geohab500.tif - Geomorphic habitats resampled to 500m pixels
- geomorph500.tif - Geomorphology resampled to 500m pixels. No auxiliary file required for this layer
- mangroves2010.tif - Distance to mangroves 2010
- mangroves2022.tif - Distance to mangroves 2022
- mask.tif - Modelled extent
- mean_cur2010.tif - Mean surface velocity December 2010
- mean_cur2022.tif - Mean surface velocity December 2022
- mean_wspeed2010.tif - Mean wind speed December 2010
- mean_wspeed2022.tif - Mean wind speed December 2022
- ruggedness.tif - Ruggedness
- salt2010.tif - Salinity December 2010
- salt2022.tif - Salinity December 2022
- seagrassp.tif - Seagrass probability
- Secchi2010.tif - Secchi depth December 2010
- Secchi2022.tif - Secchi depth December 2022
- slope.tif - Slope
- temp2010.tif - Temperature December 2010
- temp2022.tif - Temperature December 2022
- tidalexposure.tif - Tidal exposure
- ZooL_N2010.tif - Large zooplankton nitrogen December 2010
- ZooL_N2022.tif - Large zooplankton nitrogen December 2022
- Each of the above (except for COMMUNITY500.tif and geomorph500.tif) was generated in QGIS and is accompanied by an auxiliary metadata file (XX.tif.aux.xml) containing raster statistics, coordinate systems and projection information.
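The raster properties listed above fully define the 100 m grid, so a point that is already projected into the listed Albers CRS can be mapped to a row/column index without a GIS library. The sketch below shows this and also checks that the listed dimensions multiply to the listed ncell. Projecting lon/lat into this CRS would need a projection library (for example pyproj with the proj string above), which is not shown. The file-size comparison assumes uncompressed 32-bit float storage; that is an inference, not something the dataset documents.

```python
# Grid definition of the 100 m rasters, copied from the raster listing above.
XMIN, XMAX = 961853.5, 3131654.0
YMIN, YMAX = -3445441.0, -1092341.0
RES = 100.0
NROW, NCOL = 23531, 21698


def cell_of(x, y):
    """Return the 0-based (row, col) of the cell containing Albers point (x, y).

    Rows are counted down from the top (YMAX) edge, the usual raster
    convention. Returns None for points outside the grid.
    """
    if not (XMIN <= x < XMAX and YMIN < y <= YMAX):
        return None
    row = int((YMAX - y) // RES)
    col = int((x - XMIN) // RES)
    if row >= NROW or col >= NCOL:
        return None
    return row, col


def cell_centre(row, col):
    """Albers (x, y) of the centre of cell (row, col)."""
    return XMIN + (col + 0.5) * RES, YMAX - (row + 0.5) * RES


ncell = NROW * NCOL
print(ncell)                            # 510575638, matching the listed ncell
print(ncell * 4 / 1e9)                  # about 2.04 (GB) if stored as uncompressed float32
print(cell_of(XMIN + 150, YMAX - 250))  # (2, 1)
```

Rows are indexed from the top edge because that is how GeoTIFF and the R raster package order cells, so row 0 is the northernmost row.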
Final model outputs are provided as ASCII grids:
- "crwunmod_preds.asc": predictor data are from December 2010
- "crwunmod_preds2022.asc": predictor data are from December 2022
Cell values represent the probability of turtle presence (0 to 1).
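A minimal sketch for reading the prediction grids outside GIS software follows. It assumes that the .asc files use the ESRI ASCII grid layout written by R's raster package (a short keyword header followed by rows of values) and that `cellsize` is in metres. The 0.5 cut-off used to call a cell "suitable" is hypothetical; the threshold behind the reported 41.2% decline is not stated in this README.

```python
import io

HEADER_KEYS = {"ncols", "nrows", "xllcorner", "yllcorner",
               "xllcenter", "yllcenter", "cellsize", "nodata_value"}


def read_ascii_grid(lines):
    """Parse an ESRI ASCII grid into (header, flat row-major list of values).

    Header keys are lower-cased; NODATA cells are returned as None.
    """
    header, values = {}, []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0].lower() in HEADER_KEYS:
            header[parts[0].lower()] = float(parts[1])
        else:
            values.extend(float(v) for v in parts)
    nodata = header.get("nodata_value")
    values = [None if v == nodata else v for v in values]
    expected = int(header["ncols"]) * int(header["nrows"])
    if len(values) != expected:
        raise ValueError(f"expected {expected} cells, found {len(values)}")
    return header, values


def suitable_area_km2(header, values, threshold):
    """Area (km^2) of cells whose predicted probability is >= threshold.

    Assumes cellsize is in metres (a projected CRS such as the Albers
    CRS listed for the .tif files).
    """
    n = sum(1 for v in values if v is not None and v >= threshold)
    return n * header["cellsize"] ** 2 / 1e6


def percent_change(old, new):
    """Percent change from old to new (negative means a decline)."""
    return (new - old) / old * 100.0


# Tiny made-up grid; for real outputs, iterate over open("crwunmod_preds.asc").
SAMPLE = """ncols 3
nrows 2
xllcorner 961853.5
yllcorner -3445441
cellsize 100
NODATA_value -9999
0.9 0.2 -9999
0.6 0.7 0.1
"""
header, values = read_ascii_grid(io.StringIO(SAMPLE))
area = suitable_area_km2(header, values, threshold=0.5)  # hypothetical cut-off
print(area)  # 0.03
```

Comparing `suitable_area_km2` for the 2010 and 2022 grids with `percent_change` gives a change in suitable area, but the result depends entirely on the chosen threshold.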
Usage Instructions
The analysis workflow is documented in the following scripts, which should be run in R (see packages and versions below) in this order:
- prep&presences.Rmd: data filtering and movement persistence modelling with the aniMotum R package (https://ianjonsen.github.io/aniMotum/) to derive presences from the raw tracking data. Calls LoadPackages_PA.R (adapted from Hazen et al. 2021) and the raw data (available upon reasonable request from the Queensland Department of Environment, Science and Innovation's Threatened Species Unit).
- Pseudoabsences.R: simulates pseudo-absences (background sampling and correlated random walks). Calls PseudoFunctions.R (adapted from Hazen et al. 2021).
- AppendtoEnviroData.Rmd: workflow for appending environmental covariates to presences and pseudo-absences (see Table 1 in the manuscript). Calls the following R scripts:
  - ExtractEReefs.R, ExtractEReefsCRW.R and ExtractEReefsbackground.R: append daily eReefs variables to presences/pseudo-absences. These were run in a containerised environment on the JCU HPC.
- habitatmodel.Rmd: creates the BRT models, quantifies the relative importance of environmental covariates, generates partial dependence plots and predicts to the GBR grid at two time points.
The .tif and .asc files can be viewed in GIS software. GIS analysis was conducted in QGIS Desktop 3.34.1.
R software information
> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8 LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Australia.utf8
time zone: Australia/Brisbane
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] gtable_0.3.4 dplyr_1.1.2 compiler_4.3.1 visdat_0.6.0 tidyselect_1.2.0 Rcpp_1.0.11 spatialEco_2.0-2 scales_1.2.1
[9] yaml_2.3.7 fastmap_1.1.1 lattice_0.21-8 ggplot2_3.4.3 R6_2.5.1 generics_0.1.3 pdp_0.8.1 knitr_1.43
[17] iterators_1.0.14 tibble_3.2.1 munsell_0.5.0 pillar_1.9.0 rlang_1.1.1 utf8_1.2.3 sp_2.0-0 terra_1.7-39
[25] xfun_0.40 cli_3.6.1 magrittr_2.0.3 digest_0.6.33 foreach_1.5.2 grid_4.3.1 rstudioapi_0.15.0 lifecycle_1.0.4
[33] vctrs_0.6.3 evaluate_0.23 glue_1.6.2 raster_3.6-23 codetools_0.2-19 fansi_1.0.4 colorspace_2.1-0 rmarkdown_2.24
[41] tools_4.3.1 pkgconfig_2.0.3 htmltools_0.5.6
JCU HPC information
R v4.1.2u2 software is available in a singularity container.
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.2
Code supporting this dataset is available at https://github.com/egwebster/SSM-SDM-public
Turtles were tracked at their foraging grounds in Port Curtis and Shoalwater Bay, and after nesting at Raine Island. Turtles were tracked opportunistically between 2010 and 2021 with Fastloc-GPS (mean error 40 m) and/or ARGOS tags (mean error ranging from <1 km to >10 km), resulting in a mixed sample of sex and maturity cohorts across the three study sites. The tracking dataset included 85 turtles at Port Curtis (mean±SD CCL = 84.78±21.66 cm), four at Shoalwater Bay (mean±SD CCL = 92.93±2.33 cm) and five that migrated to foraging grounds from Raine Island (mean±SD CCL = 102.89±2.72 cm, Appendix 2). The Port Curtis turtles comprised 35 females, 36 males and 14 of unidentified sex. Eighteen were juveniles, 24 were subadults and the remainder were mature (49% of turtles in modified habitats were immature). At Shoalwater Bay, one turtle was female and three were male, and only one of the four was a juvenile. The Raine Island turtles were all post-nesting adult females (11% of turtles in unmodified habitats were immature). Capture of turtles, transmitter attachment and data collection for this study were approved by either the JCU Animal Ethics Committee or the Department of Agriculture and Fisheries Ethics Committee and conducted within an approved Queensland Government project.
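The immaturity percentages quoted above follow directly from the reported counts. A quick check in Python:

```python
# Counts reported in the paragraph above.
pc_total = 85
pc_sex = {"female": 35, "male": 36, "unidentified": 14}
pc_immature = 18 + 24       # juveniles + subadults at Port Curtis (modified)
unmod_total = 4 + 5         # Shoalwater Bay + Raine Island (unmodified)
unmod_immature = 1 + 0      # one Shoalwater Bay juvenile; Raine Island turtles all adult

assert sum(pc_sex.values()) == pc_total
pc_pct = 100 * pc_immature / pc_total
unmod_pct = 100 * unmod_immature / unmod_total
print(f"{pc_pct:.1f}% immature in modified habitat")      # 49.4%
print(f"{unmod_pct:.1f}% immature in unmodified habitat")  # 11.1%
```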
We discarded the first 24 hours of the tracks from Port Curtis and Shoalwater Bay. We removed duplicates and empty rows, spurious locations identified from unrealistic travel speeds and turning angles (maximum speed 9.9 km/h; inner-angle threshold 90 degrees), and locations falling above the high tide line, using the SDLfilter R package (25). We split a track into independent portions wherever successive locations were more than 72 hours apart, and only retained independent portions with more than 20 locations. We used both Fastloc-GPS and ARGOS locations where available, but only retained ARGOS locations with location classes 1, 2 and 3. We removed the nesting and post-nesting tracks of Raine Island turtles by visually identifying the end of directed travel from Raine Island. We also removed possible breeding migrations of Port Curtis turtles, identified as long-distance (more than 150 km) departures from their release site towards known breeding sites for the species during summer months. The filtered tracking data contained 21,578 ARGOS and 56,409 Fastloc-GPS locations.
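The following simplified Python stand-in shows three of the filtering rules described above: the Argos location-class filter, a speed filter at 9.9 km/h, and splitting at gaps of more than 72 hours while keeping portions with more than 20 locations. It is not SDLfilter; that package's speed filter is more elaborate, and the turning-angle, high-tide-line and breeding-migration steps are omitted here. The field names (`t`, `lat`, `lon`, `sensor`, `lc`) are hypothetical. The strict ">" follows the Methods wording, although the data description says "72 hours or more".

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

GAP = timedelta(hours=72)        # split where successive fixes are > 72 h apart
MIN_LOCATIONS = 21               # keep portions with more than 20 locations
ARGOS_CLASSES = {"1", "2", "3"}  # Argos location classes retained
MAX_SPEED_KMH = 9.9


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlon = p2 - p1, radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def keep_fix(fix):
    """Keep all GPS fixes, but Argos fixes only in classes 1, 2 and 3."""
    return fix["sensor"] == "gps" or fix["lc"] in ARGOS_CLASSES


def drop_fast(fixes):
    """Greedy speed filter: drop a fix that implies > MAX_SPEED_KMH travel
    from the last kept fix (or shares its timestamp)."""
    kept = []
    for fix in fixes:
        if kept:
            prev = kept[-1]
            hours = (fix["t"] - prev["t"]).total_seconds() / 3600
            km = haversine_km(prev["lat"], prev["lon"], fix["lat"], fix["lon"])
            if hours <= 0 or km / hours > MAX_SPEED_KMH:
                continue
        kept.append(fix)
    return kept


def split_track(fixes):
    """Split time-sorted fixes at gaps > GAP; keep portions with enough fixes."""
    portions, current = [], []
    for fix in fixes:
        if current and fix["t"] - current[-1]["t"] > GAP:
            portions.append(current)
            current = []
        current.append(fix)
    if current:
        portions.append(current)
    return [p for p in portions if len(p) >= MIN_LOCATIONS]


# Synthetic track: 25 hourly GPS fixes, a 101 h gap, 10 more fixes and one
# Argos class "A" fix that should be discarded.
t0 = datetime(2012, 3, 1)
track = [{"t": t0 + timedelta(hours=h), "lat": -23.85 + 0.001 * h,
          "lon": 151.25, "sensor": "gps", "lc": None} for h in range(25)]
track += [{"t": t0 + timedelta(hours=125 + h), "lat": -23.80,
           "lon": 151.25, "sensor": "gps", "lc": None} for h in range(10)]
track.append({"t": t0 + timedelta(hours=200), "lat": -23.80,
              "lon": 151.25, "sensor": "argos", "lc": "A"})

portions = split_track(drop_fast([f for f in track if keep_fix(f)]))
print([len(p) for p in portions])  # [25]
```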
To identify the environmental characteristics that make foraging habitat suitable for green turtles we developed telemetry-based habitat models using the tracking data. These models combine species presences from telemetry data, and pseudo-absences generated by models detailed below, with environmental data to first identify environmental predictors underpinning species presence and then predict distributions in space and time. We adopted the approach of Hazen et al. 2021 (26) whereby telemetry-based habitat models are constructed using interpolated locations of tracked animals as presences, and pseudo-absences are generated using correlated random walks. Though similar to species distribution modelling, telemetry-based habitat modelling does not use abundance or true presence data to simulate a realized distribution of a species. Instead, presences derived from tracking are spatially and temporally autocorrelated and therefore spatially biased, and pseudo-absences are simulated to control for these biases. For example, pseudo-absences can be generated from simulations that mimic the movement process so that both presences and pseudo-absences have the same autocorrelation structure. The model output reflects the habitat conditions that best explain turtle presence.
a) Presences and pseudo-absences
We derived presences from the tracking data by interpolating locations at regular intervals, accounting for telemetry error, with time-varying movement persistence models in the aniMotum R package (27). We attempted to reduce autocorrelation and imperfect-detection biases with these state-space models (movement persistence models), which interpolate the irregularly sampled data to produce estimated locations at intervals of 12 hours or longer (27,28). We assumed the data had symmetric spatial error, using aniMotum's default error margins for ARGOS locations and error estimates from (21) for Fastloc-GPS. For each independent track, we examined the state-space model fits with diagnostic visualisations of (a) time series, (b) Q-Q plots and (c) autocorrelation functions. By visually examining candidate state-space model outputs, we determined the most appropriate time step: 12 hours where possible, or coarser time steps (24 or 48 hours) for tracks with sparser data. Models for which sparse data resulted in non-convergence or unrealistic patterns (straight lines or perfect loops) were discarded.
We generated pseudo-absences via correlated random walks (CRW) as per (26). A CRW represents where the animal could have gone, based on track simulations defined by the step lengths and turning angles of the interpolated tracking data. Each simulated point is overlaid on the ‘worldHires’ map from the mapdata R package (29); if the point falls on land it is not used and a new point is generated (Appendix 2). CRW models were used to infer drivers of turtle presence at local scales. Our approach assumes presences and pseudo-absences are independent, despite tracking data being inherently autocorrelated in space and time. Turtles tracked for longer are therefore more represented among the presences and pseudo-absences, as are turtles tracked in Port Curtis compared with other sites (Appendix 2).
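A minimal sketch of a correlated random walk with land rejection, as described above, follows. Each new heading is the previous heading plus a turning angle, and each step length is drawn from the tracking data. Several assumptions are involved: steps and turns are resampled from empirical values, whereas PseudoFunctions.R (adapted from Hazen et al. 2021) may parameterise them differently; coordinates are planar kilometres; and a toy `on_land` coastline stands in for the ‘worldHires’ map.

```python
import math
import random


def simulate_crw(start, steps, turns, n_steps, is_on_land, rng, max_tries=1000):
    """Simulate a correlated random walk in planar (projected, km) coordinates.

    Each step adds a turning angle (radians) resampled from `turns` to the
    previous heading and moves a distance resampled from `steps`. Candidate
    points on land are rejected and redrawn.
    """
    x, y = start
    heading = rng.uniform(-math.pi, math.pi)
    path = [(x, y)]
    for _ in range(n_steps):
        for _ in range(max_tries):
            new_heading = heading + rng.choice(turns)
            dist = rng.choice(steps)
            nx = x + dist * math.cos(new_heading)
            ny = y + dist * math.sin(new_heading)
            if not is_on_land(nx, ny):
                break
        else:
            raise RuntimeError("no water point found; check is_on_land")
        x, y, heading = nx, ny, new_heading
        path.append((x, y))
    return path


def on_land(x, y):
    """Toy coastline (hypothetical): everything east of x = 10 km is land."""
    return x > 10.0


# Hypothetical empirical step lengths (km) and turning angles (radians).
steps = [0.5, 1.0, 1.5]
turns = [-2.5, -1.0, -0.3, 0.0, 0.3, 1.0, 2.5]
rng = random.Random(42)
# Several simulations per track, analogous to the 'iteration' column in the CSV.
walks = [simulate_crw((0.0, 0.0), steps, turns, 50, on_land, rng) for _ in range(3)]
print(len(walks), len(walks[0]))  # 3 51
```

The walk is correlated because the heading carries over from step to step, so it inherits the track's tendency to keep direction rather than wandering at random.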
b) Environmental predictors
To investigate the relationship between the probability of turtle presence and environmental conditions, we developed a set of environmental predictor layers from which to ascertain the conditions at each tracked location in space and time. We catalogued freely available environmental datasets with spatial coverage of the coastal Great Barrier Reef region, and selected variables with known relevance to green turtle foraging ecology based on current literature, a spatial resolution of 4 km or finer, and appropriate temporal coverage of the tracking period (static datasets developed since 2010, or time-varying spatial data covering late 2010 to late 2019). Selected datasets are summarized in Table 1.
We sampled the value of each environmental predictor at each presence and pseudo-absence point, matching its timestamp and location. For static datasets (a single, non-time-varying layer; Table 1) we used terra::extract() for grid data and sp::over() for shapefile (categorical) data. We extracted values of eReefs variables via a custom R script that queries the AIMS THREDDS server. We downloaded annual mangrove maps (30) from the Digital Earth Australia data cube using Dask and used these to generate, in QGIS, annual maps of Euclidean distance to pixels of at least 20% canopy cover [i.e., the minimum class where mangroves were present in the mangrove dataset (30)]. Additionally, we generated 30 m grids (to match the high resolution of bathymetry data) of distance to recreational boating facilities, distance to coast, distance to reefs and distance to rivers in QGIS (details in Table 1). Prior to modelling, we evaluated the temporal coverage of the variables derived from the eReefs biogeochemical model: data hosted on the THREDDS server were only available from late 2010 until April 2019. We therefore removed presences and pseudo-absences with timestamps outside this period, which removed the entire tracks of 22 turtles that were tracked in early 2010 or after April 2019. The final collection of turtle tracks used for model development is summarized in Appendix 1.
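The following toy Python sketch illustrates two of the steps above: a Euclidean distance-to-feature grid, such as distance to pixels with at least 20% mangrove canopy cover, and the temporal filter tied to eReefs biogeochemical coverage. The brute-force distance loop only suits small grids. The source does not name the QGIS tool used; GDAL's proximity algorithm (exposed in QGIS as "Proximity (raster distance)") is one option for full-size rasters. The coverage bounds below use the month-level dates in Table 1, and the exact first and last dates on the server are assumptions.

```python
import math
from datetime import datetime


def distance_to_features(grid, is_feature, cell_size):
    """Euclidean distance from each cell centre to the nearest feature cell.

    Brute force (cells x features), so only suitable for small grids.
    """
    features = [(r, c) for r, row in enumerate(grid)
                for c, v in enumerate(row) if is_feature(v)]
    out = []
    for r, row in enumerate(grid):
        out.append([
            cell_size * min((math.hypot(r - fr, c - fc) for fr, fc in features),
                            default=math.inf)
            for c in range(len(row))
        ])
    return out


# eReefs biogeochemical coverage, from the month-level bounds in Table 1
# (exact first/last dates on the THREDDS server are assumed).
BGC_START = datetime(2010, 12, 1)
BGC_END = datetime(2019, 5, 1)   # exclusive: drops anything after April 2019


def within_bgc_coverage(ts):
    """True if a timestamp falls inside the assumed eReefs BGC coverage."""
    return BGC_START <= ts < BGC_END


# Toy 30 m grid of mangrove canopy cover (%); feature = at least 20% cover.
canopy = [[0, 0, 0],
          [0, 25, 0],
          [0, 0, 10]]
dist = distance_to_features(canopy, lambda v: v >= 20, cell_size=30.0)
print(round(dist[0][0], 2), dist[1][1], dist[0][1])  # 42.43 0.0 30.0
```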
c) Modelling approach
We used boosted regression trees to develop separate telemetry-based habitat models for 1) modified (Port Curtis) and 2) unmodified (Shoalwater Bay and the foraging sites of Raine Island turtles) sites, with the gbm.fixed() function of the dismo R package (67). These models use presences and pseudo-absences as the response and the environmental values as predictors. This approach assumes that presences and pseudo-absences are not autocorrelated (26,68). For each model formulation we used five-fold cross-validation, whereby the dataset was split into five parts, with each part acting as the test set for one model and the remaining parts as its training set. We used a tree complexity of five, a learning rate of 0.005, a maximum of 10,000 trees and a bag fraction of 0.75. We plotted the relationship of each predictor in the input data with the response to inform our specification of the var.monotone argument of gbm.fixed(). The initial folds were generated including all of the input predictors and a large (>10,000) number of trees. We then re-ran the cross-validation after removing relatively unimportant predictors (those in the lowest 10% in every fold) to simplify the model, and pruned with the out-of-bag method, adjusting the n.trees parameter to encompass the best number of iterations across all folds. Variable importance is assessed as the total contribution of each variable to the reduction in the loss function.
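The Python sketch below covers only the bookkeeping around the boosted regression trees: random assignment of rows to five folds, and the rule for dropping predictors that fall in the lowest 10% in every fold. The model fitting itself (dismo::gbm.fixed() in R) is not reproduced. Row-wise random folds are an assumption. Reading "lowest 10%" as the bottom 10% of each fold's importance ranking, with at least one predictor, is also an interpretation rather than a confirmed detail.

```python
import random
from collections import Counter


def assign_folds(n_rows, k=5, seed=0):
    """Randomly assign rows to k folds of (nearly) equal size.

    Fold f is the test set for model f; the other folds form its training set.
    """
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    folds = [0] * n_rows
    for position, row in enumerate(order):
        folds[row] = position % k
    return folds


def predictors_to_drop(importance_by_fold, frac=0.10):
    """Predictors ranked in the bottom `frac` by importance in every fold.

    "Lowest 10%" is read as the bottom 10% of the ranking (at least one
    predictor per fold); this interpretation is an assumption.
    """
    drop = None
    for importance in importance_by_fold:
        ranked = sorted(importance, key=importance.get)  # least important first
        n_low = max(1, round(frac * len(ranked)))
        lowest = set(ranked[:n_low])
        drop = lowest if drop is None else drop & lowest
    return drop or set()


folds = assign_folds(100)
print(sorted(Counter(folds).values()))  # [20, 20, 20, 20, 20]

# Made-up relative influence for ten predictors in five folds; "j" is always last.
names = "abcdefghij"
importance_by_fold = [{p: 10.0 - i + 0.1 * f for i, p in enumerate(names)}
                      for f in range(5)]
print(predictors_to_drop(importance_by_fold))  # {'j'}
```

Taking the intersection across folds means a predictor is dropped only when every fold agrees it is unimportant, which is the conservative reading of "in every fold".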
We evaluated model performance using the average of each performance metric across the five folds: predictive skill with the area under the receiver operating characteristic curve (AUC), the true skill statistic (TSS) and the true positive rate (TPR), and explanatory power with percent deviance explained. The final model was trained on 100% of the input data with the refined hyper-parameters obtained during cross-validation.
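For reference, the following Python sketch computes the four metrics named above for a single held-out fold. A fold-averaged value would be the mean of these across the five folds. TSS and TPR need a classification threshold; the source does not state one, so 0.5 is assumed here. AUC is computed as the probability that a random presence scores higher than a random pseudo-absence, and percent deviance explained uses the Bernoulli deviance relative to an intercept-only (null) model.

```python
import math


def confusion(y, p, threshold):
    """Counts (tp, fn, tn, fp) for 0/1 labels y and predicted probabilities p."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    return tp, fn, tn, fp


def tpr_and_tss(y, p, threshold=0.5):
    """True positive rate and true skill statistic at an (assumed) threshold."""
    tp, fn, tn, fp = confusion(y, p, threshold)
    tpr = tp / (tp + fn)           # sensitivity
    tnr = tn / (tn + fp)           # specificity
    return tpr, tpr + tnr - 1      # TSS = sensitivity + specificity - 1


def auc(y, p):
    """P(random presence scores higher than random pseudo-absence); ties = 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bernoulli_deviance(y, p, eps=1e-12):
    """-2 x log-likelihood of 0/1 labels under predicted probabilities."""
    return -2 * sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
                    for yi, pi in zip(y, p))


def percent_deviance_explained(y, p):
    """100 * (null deviance - model deviance) / null deviance."""
    prevalence = sum(y) / len(y)
    null = bernoulli_deviance(y, [prevalence] * len(y))
    return 100 * (null - bernoulli_deviance(y, p)) / null


# Made-up held-out fold: 1 = presence, 0 = pseudo-absence.
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
tpr, tss = tpr_and_tss(y, p)
print(round(tpr, 3), round(tss, 3), round(auc(y, p), 3))  # 0.667 0.333 0.889
print(round(percent_deviance_explained(y, p), 1))
```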
To investigate environmental predictors in relation to foraging habitat for green turtles, we ranked the environmental predictors by influence. We examined the shape of the relationship between the response and each predictor whose variable importance exceeded 100/(number of predictors in the model), using partial dependence plots. Influential two-way interactions were identified with gbm.interactions().
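The 100/(number of predictors) cut-off above is the share each predictor would receive if all contributed equally, assuming relative influence is expressed as percentages summing to 100 (as gbm reports it). A small Python illustration with made-up influence values for six of the dataset's predictors:

```python
def influential_predictors(importance):
    """Predictors whose relative influence (%) exceeds 100 / number of predictors,
    ordered from most to least influential."""
    cutoff = 100.0 / len(importance)
    chosen = [name for name, value in importance.items() if value > cutoff]
    return sorted(chosen, key=importance.get, reverse=True)


# Made-up relative influence values (summing to 100) for six predictors.
importance = {"bathy1": 30.0, "seagrassP1": 25.0, "temp": 15.0,
              "Secchi": 12.0, "slope1": 10.0, "salt": 8.0}
print(influential_predictors(importance))  # ['bathy1', 'seagrassP1']
```

Here the cut-off is 100/6 ≈ 16.7%, so `temp` (15%) falls just below it.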
