Hull fouling marine invasive species pose a very low, but plausible, risk of introduction to East Antarctica in climate change scenarios

Cite this dataset

Holland, Oakes; Shaw, Justine; Stark, Jonathan; Wilson, Kerrie (2021). Hull fouling marine invasive species pose a very low, but plausible, risk of introduction to East Antarctica in climate change scenarios [Dataset]. Dryad.


Aims: To identify potential hull fouling marine invasive species that could survive in East Antarctica presently and in the future.

Location: Australia's Antarctic continental stations: Davis, Mawson and Casey, East Antarctica; and subantarctic islands: Macquarie Island and Heard and McDonald Islands.

Methods: Our study uses a novel machine-learning algorithm to predict which currently known hull fouling marine invasive species (MIS) could survive in shallow benthic ecosystems adjacent to Australian Antarctic research stations and subantarctic islands, where ship traffic is present. We used gradient boosted machine learning (XGBoost) with four important environmental variables (sea surface temperature, salinity, nitrate and pH) to develop models of suitable environments for each potentially invasive species. We then used these models to determine whether any of Australia's three Antarctic research stations and two subantarctic islands could be environmentally suitable for MIS now and under two future climate scenarios.

Results: Most of the species were predicted to be unable to survive at any location between now and the end of the century; however, four species were identified as potential current threats, and five as threats under future climate change. Asterias amurensis was identified as a potential threat to all locations.

Main conclusions: This study suggests that the risk is very low, but plausible, that known hull fouling species could survive in the shallow benthic habitats near Australia's East Antarctic locations, and that a precautionary approach of surveillance and monitoring is needed in this region, particularly if propagule pressure increases. Whilst some species could survive as adults in the region, their ability to reach these locations and undergo successful reproduction is considered unlikely based on current knowledge.


A full description of the methods can be found in the manuscript. A short summary is provided here:

1. As we were interested in hull fouling species, the global port network was used to build the model of environmental suitability for each species. Ports in freshwater environments (for example, the North American Great Lakes) and ports with insufficient data were excluded from the dataset.

2. The Global Invasive Species Database (GISD) was used to identify marine invasive species with an association to hull fouling. If there was insufficient data in the GISD record, primary literature was used to ascertain any hull fouling association. Whilst many of the species identified have a stronger association with ballast water as a means of transport, all species identified in this study also have an association with hull fouling. This yielded a list of 160 species, though most belonged to the Didemnum spp. group and were not adequately resolved to species level. As most Didemnum species have no evidence of being invasive, only the known invasive member of this group, Didemnum vexillum, was included in this study.

3. The Ocean Biogeographic Information System mapper was used to find occurrence records for each of the species identified. These occurrences were overlaid with the port locations, and species were matched to ports when they were within 1.0 decimal degrees of each other. Data were condensed by port so that only one record of occurrence per species per port was taken. Where a species was matched to 10 or fewer ports, it was excluded because there were too few data to build a model.
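
The matching and filtering in step 3 can be sketched as follows. This is a minimal illustration in Python, not the authors' code (the study used R). It assumes that "within 1.0 decimal degrees" means both the latitude and the longitude difference are at most 1.0; the function name, argument layout and antimeridian handling are hypothetical.

```python
from collections import defaultdict


def match_species_to_ports(occurrences, ports, tolerance_deg=1.0, min_ports=11):
    """Return {species: set of port names} for species matched to enough ports.

    occurrences: iterable of (species, lat, lon) tuples
    ports:       iterable of (port_name, lat, lon) tuples
    min_ports:   11 mirrors "matched to 10 or fewer ports -> excluded"
    """
    ports = list(ports)
    matched = defaultdict(set)
    for species, lat, lon in occurrences:
        for port_name, port_lat, port_lon in ports:
            # Assumption: wrap the longitude difference across the antimeridian.
            dlon = abs(lon - port_lon) % 360.0
            dlon = min(dlon, 360.0 - dlon)
            if abs(lat - port_lat) <= tolerance_deg and dlon <= tolerance_deg:
                # A set keeps one record per species per port.
                matched[species].add(port_name)
    return {sp: p for sp, p in matched.items() if len(p) >= min_ports}
```

Storing port names in a set is what condenses repeated occurrences near the same port into a single record.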

4. Species were only considered for model building when:

  • For the subantarctic locations: the species had a recorded distribution where the minimum temperature was 11°C or less, to align with the maximum temperatures expected in the subantarctic under climate change.
  • For the Antarctic locations: the species had a recorded distribution that included sub-freezing temperatures.
  • This resulted in a total of 33 species considered for the subantarctic locations and a subset of 20 species considered for the Antarctic locations.

5. We used four environmental variables for our models: sea surface temperature (SST), sea surface salinity (SSS), nitrate (NO3), and pH.

  • To build the model: SST and SSS were from the World Ocean Atlas 2013 version 2 at 1° resolution, using objectively analysed means and monthly averages for 2005-2012, inclusive.
  • To build the model: NO3 and pH were from the CMIP5 CanESM2 RCP4.5 model as monthly averages for 2006-2012, inclusive.
  • Two models were built: one with annual aggregation of the environmental variables, and one with seasonal aggregation of the environmental variables.
  • Predictions were made for the current environment at each of the five Australian Antarctic locations.
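
The difference between the annual and seasonal models can be illustrated with a short Python sketch. It is not taken from the manuscript: the use of the mean as the aggregation statistic, the three-month season definitions and the feature names are all assumptions made for illustration.

```python
from statistics import mean

# Assumed three-month seasons; the summary above does not define them.
SEASONS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}


def annual_features(monthly):
    """monthly: {variable: {month (1-12): value}} -> one feature per variable."""
    return {var: mean(vals[m] for m in range(1, 13)) for var, vals in monthly.items()}


def seasonal_features(monthly):
    """Same input -> one feature per variable per season."""
    return {
        f"{var}_{season}": mean(vals[m] for m in months)
        for var, vals in monthly.items()
        for season, months in SEASONS.items()
    }
```

Under these assumptions, four variables give four predictors in the annual model and sixteen in the seasonal model.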

6. The same four environmental variables were used to make our predictions. Two future climate scenarios were considered: RCP 4.5 (curbing of emissions) and RCP 8.5 (business-as-usual), both from the CMIP5 CanESM2 family of models. Predictions were made for:

  • 2030 - using monthly averaged data for 2026-2030
  • 2050 - using monthly averaged data for 2046-2050
  • 2100 - using monthly averaged data for 2096-2100, except for November and December, which used monthly averages for 2096-2099 because the RCP 8.5 models did not cover these months in 2100.
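
One way to build such monthly averages, including the shorter window for November and December, is to average each calendar month over whichever years are available. The Python sketch below is illustrative only; the record layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_climatology(records, start_year, end_year):
    """Average each calendar month over the years present in the window.

    records: iterable of (year, month, value) tuples.
    Months missing in some years are averaged over the remaining years.
    """
    by_month = defaultdict(list)
    for year, month, value in records:
        if start_year <= year <= end_year:
            by_month[month].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}
```

With records for 2096-2100 that lack November and December 2100, those two months are averaged over four years and the other ten over five.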

7. Data for model building were divided into training (70%) and testing (30%), using 'createDataPartition' in the R package 'caret'.
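
For readers unfamiliar with caret, 'createDataPartition' samples within each outcome class so that the class balance is preserved in both parts. The Python sketch below illustrates that idea only; it is not the caret implementation, and the rounding rule and seed handling are assumptions.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), sampling within each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumption: round to the nearest whole record per class.
        n_train = round(train_fraction * len(indices))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)
```

Splitting within classes matters here because presences are rare: a plain random split could leave very few presence records in the test set.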

8. Because the data were imbalanced, each training dataset for each species was resampled using four methods. The resampling methods were:

  • Over-sampling of the minority class
  • Under-sampling of the majority class
  • Both over- and under-sampling
  • Synthetic minority over-sampling
  • Further details of each of these resampling methods are found in the Supplementary Material of the manuscript.
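
The four resampling ideas can be sketched generically as follows. These are textbook versions written in Python for illustration, not the procedures documented in the Supplementary Material; the target class sizes, the neighbour count k and all function names are assumptions.

```python
import random


def over_sample(minority, majority, rng):
    """Duplicate random minority rows until both classes are the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority


def under_sample(minority, majority, rng):
    """Keep a random subset of majority rows equal in size to the minority."""
    return minority, rng.sample(majority, len(minority))


def both_sample(minority, majority, rng):
    """Grow the minority and shrink the majority to the mean class size."""
    target = (len(minority) + len(majority)) // 2
    grown = minority + [rng.choice(minority) for _ in range(target - len(minority))]
    return grown, rng.sample(majority, target)


def smote(minority, n_new, rng, k=5):
    """Create synthetic minority rows by interpolating towards a near neighbour.

    Rows are tuples of numbers; at least two minority rows are required.
    """
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority rows by squared Euclidean distance.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic
```

Resampling is applied to the training data only, as in step 8, so that the test data keep the original class balance.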

9. Extreme gradient boosting (XGBoost) was used to build the models. Examples of the code used are found in the Supplementary Material of the manuscript.

10. The resampled model which most accurately predicted the presence of species was chosen, as long as the ability to predict absences was also high. That model was then used for predictions.
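
The selection rule in step 10 can be read as "maximise sensitivity (correctly predicted presences) subject to adequate specificity (correctly predicted absences)". The Python sketch below encodes that reading; the 0.8 specificity floor is a hypothetical value, since the summary does not state what counted as "high".

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for 0/1 labels, with 1 = presence."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def choose_model(predictions_by_method, y_true, min_specificity=0.8):
    """Pick the resampling method with the best sensitivity above the floor.

    predictions_by_method: {method name: list of 0/1 test-set predictions}
    Returns None if no method reaches the (assumed) specificity floor.
    """
    best_name, best_sensitivity = None, -1.0
    for name, y_pred in predictions_by_method.items():
        sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
        if specificity >= min_specificity and sensitivity > best_sensitivity:
            best_name, best_sensitivity = name, sensitivity
    return best_name
```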

11. Predictions were made for the five Australian Antarctic and subantarctic locations.

Usage notes

All data in this dataset are from free, publicly available datasets. Researchers looking to do similar work are advised to ensure the data they are using are from the latest version of these datasets. For example, World Ocean Atlas 2013 was used in this study; however, there is now a World Ocean Atlas 2018. Further, future predictions should make use of the CMIP6 datasets, which are being made available now.