Dataset of biofouling epibionts on macroalgae compiled from literature and environmental variables from open access databases
Data files
Mar 21, 2025 version files 7 MB
- DESCRIPTION.md (2.49 KB)
- env_bibli_5000.csv (3.67 MB)
- Main_script_Kelps_Epibionts_Glasgow.Rmd (123.97 KB)
- README.md (17.02 KB)
- Taxo_ok8.csv (3.19 MB)
Abstract
Macroalgae are increasingly studied for their critical contributions to coastal ecosystems, their potential to mitigate climate change, and their promise as a sustainable food source. While wild macroalgae host diverse epiphytic and invertebrate epibiont communities that enhance biodiversity and ecosystem functioning, biofouling epibionts on farmed macroalgae can negatively impact growth, physiology, and product quality. Although an increasing number of longitudinal studies are trying to establish the drivers of macroalgae biofouling, localized approaches are lacking sufficient contrasts in environmental conditions to reveal macroecological patterns in epibiont occurrence. To gain these contrasts, we analyze data on macroalgae and epibiont taxonomy, study location, and environmental conditions that we have compiled from a systematic literature review and from the Marine Copernicus and NASA OBPG databases of marine data. Our results show that 58.18% of macroalgae epibiont studies focus on the North-East Atlantic coast, which is particularly useful in understanding the potential for expansion of seaweed aquaculture in this region. Bryozoan fouling depends on sea surface temperature (SST), and an increased biofouling risk was predicted for latitudes greater than 58° in the NE Atlantic coast and around coastal areas in Scotland with cold freshwater inflows. Hydrozoans and gastropods showed a higher probability of occurring on farmed or planted as opposed to wild kelp, whereas gastropods tended to be absent at salinities lower than 30 psu. Our findings provide a first basis for understanding seaweed biofouling risks in the North-East Atlantic and can serve for spatial planning of the positioning of new seaweed farms.
Dataset DOI: 10.5061/dryad.sxksn03d8
Description of the data and file structure
Author: victoria.delannoy@protonmail.com (PhD student, Marbec, France)
Supervisor: Dr Sofie Spatharis, senior lecturer at Glasgow University
The first file provided (env_bibli_5000.csv) is the one used for model computation. It was built from a list of epibiont observations from the literature, textual information about the sampling sites, taxonomic enrichment using the WoRMS website, and environmental parameters obtained from the Copernicus Marine data center and the NASA OBPG website.
The second file (Taxo_ok8.csv) contains the same list of epibiont observations from the literature, textual information about the sampling sites, and taxonomic enrichment using the WoRMS website. It is called regularly throughout the script, for instance for dataset quality description or diversity analysis. Complementary information can be found in Delannoy et al. 2025, 10.1111/raq.70017.
Files and variables
File: Taxo_ok8.csv
Variables
-
Publication_N: ID of each paper, except for ID 55, which represents our own observations. For more information, please see 10.1111/raq.70017
-
Algae_sc: Algae species (or the smallest level of identification provided) on which the epibionts were observed.
-
Collection_season: Season in which samples were taken, accounting for the northern and southern hemispheres. Possible values are Spring, Summer, Autumn/Fall, Winter, or All. When authors pooled observations from different seasons into one list, Collection_season contains a list of several seasons such as “Winter, Spring”; the class is still ‘character’.
-
Year_sample: Year of sampling. When the exact year was not provided, it was estimated as the year of publication minus 3 years. When a single list of epibionts was given for several samplings across different years, we retained the median of those years, rounded down to the nearest whole number. These estimates have little influence, as the environmental values are averaged over the whole period of available environmental data.
-
Latitude: Latitude of the sampling point, in decimal degrees. Some points were estimated; please see column ‘Position’ and 10.1111/raq.70017.
-
Longitude: Longitude of the sampling point, in decimal degrees. Some points were estimated; please see column ‘Position’ and 10.1111/raq.70017.
-
Position: Indicates whether the sampling position was given in the original paper. There are five modalities: ‘Original’ = the GPS coordinates given in the paper were used. ‘Representative’ = rough estimation. ‘Estimated’ = estimated (+/- 3 km). ‘Barycenter’ = several samplings were done very close to one another (<3 km), and the barycenter of all the points sampled on the same day on the same algae was used as the GPS coordinates of the sampling event. ‘Unknow’ = the area given by the authors was too broad.
-
Seaweed_section: Part of the macroalga on which the epibionts were collected or observed. Values include: ‘Blade’ when epibionts were collected above the stipe; ‘All’ when the whole macroalga was removed before listing the epibionts; ‘Holdfast’ when the epibiont was observed on the holdfast; ‘Thallus’, which corresponds to blade + stipe.
-
Dominance: Estimates the relative abundance of an epibiont in the sampling. ‘3’ = dominant, ‘2’ = subdominant, ‘1’ = present. ‘1.1’ means the epibiont was present but the author gave only presence/absence information, so dominance could not be deduced. ‘0’ means the epibiont was mentioned by the author but not recognized in WoRMS.
-
Collection_month: Month in which the samples were collected. (Many authors pool the sampling lists from the same season.)
-
Farmed_Wild_Artificial: Context in which the algae grew. “Farmed” indicates that the algae supporting the epibionts were in a seaweed farm. “Wild” indicates that the sampling was done on wild algae. “Artificial” means that the authors moved the algae from their native position to another one, or that they used algae on ropes for their study. “NA” when insufficient information was provided by the authors.
-
Hydro: Indicates the level of exposure to waves. There are 3 levels: “Exposed” > “Moderate” > “Sheltered”. The level is based on the textual information provided by each paper. NA means no information was available. “All” means a single list of epibionts was given for several samplings that were close to one another but differed in wave exposure.
-
Depth_min: Minimum depth of epibiont collection, in m.
-
Depth_max: Maximum depth of epibiont collection, in m.
-
A_Phylum: Is the phylum of the algae on which epibionts were collected, according to WoRMS.
-
A_Class: Is the class of the algae on which epibionts were collected, according to WoRMS.
-
A_Order: Is the order of the algae on which epibionts were collected, according to WoRMS.
-
A_Family: Is the family of the algae on which epibionts were collected, according to WoRMS.
-
A_Genus: Is the genus of the algae on which epibionts were collected, according to WoRMS.
-
Algae_original: Is the original name of the algae, as provided in the paper.
-
Year_publication: Year in which the source paper was published.
-
Ocean: Ocean in which data were collected.
-
Sea: Sea in which data were collected.
-
AphiaID_Algae: AphiaID of the algae, assigned by WoRMS.
-
Location: Location of the study.
-
Title: Title of the source paper.
-
Epibionts_original: epibiont name as provided in the source paper.
-
Epibionts_sc: Epibiont species name as assigned by WoRMS.
-
AlphiaID_Epibionts: AphiaID of the epibionts, assigned by WoRMS.
-
E_Phylum: Is the phylum of the epibionts, according to WoRMS.
-
E_Class: Is the class of the epibionts, according to WoRMS.
-
E_Order: Is the order of the epibionts, according to WoRMS.
-
E_Family: Is the family of the epibionts, according to WoRMS.
-
E_Genus: Is the genus of the epibionts, according to WoRMS.
-
original_name: epibiont name as provided in the source paper.
-
Sampling: ID of the sampling event. Each ‘sampling event’ is a unique combination of 3 parameters: season of collection/observation of the epibionts + location + algae species on which the epibionts were collected/observed. The ID code has 2 parts: first the number identifying the publication the data come from, then an underscore, then a letter distinguishing the different samplings from the same paper.
-
Group: Algae group used for our study. The modalities are : Rhodophyta, Chlorophyta, Laminariales and the other Ochrophyta.
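The Year_sample rule and the Sampling ID layout described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the authors' script is in R). The function names `estimate_year_sample` and `parse_sampling_id` are hypothetical, and the sketch assumes that the part of the ID after the underscore contains only letters.

```python
import math
import re
from statistics import median


def estimate_year_sample(year_publication, sampling_years=None):
    """Return a Year_sample value following the documented rule (illustrative only)."""
    if sampling_years:
        # One epibiont list pooled over several sampling years:
        # keep the median, rounded down to the nearest whole number.
        return math.floor(median(sampling_years))
    # No sampling year reported: publication year minus 3 years.
    return year_publication - 3


def parse_sampling_id(sampling_id):
    """Split a Sampling ID such as '12_b' into (publication number, event letter)."""
    # Assumption: digits, one underscore, then letters only.
    match = re.fullmatch(r"(\d+)_([A-Za-z]+)", sampling_id.strip())
    if match is None:
        raise ValueError(f"Unexpected Sampling ID format: {sampling_id!r}")
    return int(match.group(1)), match.group(2)
```

For instance, `estimate_year_sample(2015)` applies the "publication minus 3 years" rule, while `estimate_year_sample(2015, [2010, 2011])` takes the median of the two years and rounds it down.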
File: env_bibli_5000.csv
Description: Each line is an observation of an epibiont species on a macroalga; the available information is given in the columns listed under Variables.
Variables
-
X: Epibionts observation number (from 1 to 5793)
-
Sampling: ID of the sampling event. Each ‘sampling event’ is a unique combination of 3 parameters: season of collection/observation of the epibionts + location + algae species on which the epibionts were collected/observed. The ID code has 2 parts: first the number identifying the publication the data come from, then an underscore, then a letter distinguishing the different samplings from the same paper.
-
buffer: Radius (in m) of the area over which the environmental values were averaged. The circle is centered on the GPS position given in columns x and y.
-
chloro_weighted: Mean chlorophyll-a value in mg m⁻³ for each sampling point, weighted by the area of each pixel included in the buffer and averaged over the period 01-2003 to 12-2019. Data were extracted from NASA OBPG: AQUA MODIS Level-3 Binned Chlorophyll, Version 2022 10.5067/AQUA/MODIS/L3M/CHL/2022
-
temp_weighted: Mean nighttime Sea Surface Temperature (SST) in °C for each sampling point, weighted by the area of each pixel included in the buffer and averaged over the period 01-2003 to 12-2019. Data were extracted from NASA OBPG: MODIS_AQUA_L3_SST_MID-IR_DAILY_4KM_NIGHTTIME_V2019.0 10.5067/MODAM-1D4N9
-
salinity_weighted: Mean salinity value in psu for each sampling point, weighted by the area of each pixel included in the buffer and averaged over the period 01-1993 to 12-2019. Data were extracted from the Copernicus Marine data center: GLOBAL_MULTIYEAR_PHY_001_030 10.48670/MOI-00021
-
wave_height25: Mean wave height value in m for each sampling point, weighted by the area of each pixel included in the buffer and averaged over the period 01-1993 to 12-2019. Data were extracted from the Copernicus Marine data center: GLOBAL_MULTIYEAR_WAV_001_032 10.48670/MOI-00022
-
Publication_N: ID of each paper, except for ID 55, which represents our own observations. For more information about how the data were collected, please see 10.1111/raq.70017
-
Algae_sc: Algae species (or the smallest level of identification provided) on which the epibionts were observed.
-
Collection_season: Season in which samples were taken, accounting for the northern and southern hemispheres. Possible values are Spring, Summer, Autumn/Fall, Winter, or All. When authors pooled observations from different seasons into one list, Collection_season contains a list of several seasons such as “Winter, Spring”; the class is still ‘character’.
-
Year_sample: Year of sampling. When the exact year was not provided, it was estimated as the year of publication minus 3 years. When a single list of epibionts was given for several samplings across different years, we retained the median of those years, rounded down to the nearest whole number. These estimates have little influence, as the environmental values are averaged over the whole period of available environmental data.
-
Latitude: Latitude of the sampling point, in decimal degrees. Some points were estimated; please see column ‘Position’ and 10.1111/raq.70017.
-
Longitude: Longitude of the sampling point, in decimal degrees. Some points were estimated; please see column ‘Position’ and 10.1111/raq.70017.
-
Position: Indicates whether the sampling position was given in the original paper. There are five modalities: ‘Original’ = the GPS coordinates given in the paper were used. ‘Representative’ = rough estimation. ‘Estimated’ = estimated (+/- 3 km). ‘Barycenter’ = several samplings were done very close to one another (<3 km), and the barycenter of all the points sampled on the same day on the same algae was used as the GPS coordinates of the sampling event. ‘Unknow’ = the area given by the authors was too broad.
-
Seaweed_section: Part of the macroalga on which the epibionts were collected or observed. Values include: ‘Blade’ when epibionts were collected above the stipe; ‘All’ when the whole macroalga was removed before listing the epibionts; ‘Holdfast’ when the epibiont was observed on the holdfast; ‘Thallus’, which corresponds to blade + stipe.
-
Dominance: Estimates the relative abundance of an epibiont in the sampling. ‘3’ = dominant, ‘2’ = subdominant, ‘1’ = present. ‘1.1’ means the epibiont was present but the author gave only presence/absence information, so dominance could not be deduced. ‘0’ means the epibiont was mentioned by the author but not recognized in WoRMS.
-
Collection_month: Month in which the samples were collected. (Many authors pool the sampling lists from the same season.)
-
Farmed_Wild_Artificial: Context in which the algae grew. “Farmed” indicates that the algae supporting the epibionts were in a seaweed farm. “Wild” indicates that the sampling was done on wild algae. “Artificial” means that the authors moved the algae from their native position to another one, or that they used algae on ropes for their study. “NA” when insufficient information was provided by the authors.
-
Hydro: Indicates the level of exposure to waves. There are 3 levels: “Exposed” > “Moderate” > “Sheltered”. The level is based on the textual information provided by each paper. NA means no information was available. “All” means a single list of epibionts was given for several samplings that were close to one another but differed in wave exposure.
-
Depth_min: Minimum depth of epibiont collection, in m.
-
Depth_max: Maximum depth of epibiont collection, in m.
-
A_Phylum: Is the phylum of the algae on which epibionts were collected, according to WoRMS.
-
A_Class: Is the class of the algae on which epibionts were collected, according to WoRMS.
-
A_Order: Is the order of the algae on which epibionts were collected, according to WoRMS.
-
A_Family: Is the family of the algae on which epibionts were collected, according to WoRMS.
-
A_Genus: Is the genus of the algae on which epibionts were collected, according to WoRMS.
-
Algae_original: Is the original name of the algae, as provided in the paper.
-
Year_publication: Year in which the source paper was published.
-
Ocean: Ocean in which data were collected.
-
Sea: Sea in which data were collected.
-
AphiaID_Algae: AphiaID of the algae, assigned by WoRMS.
-
Location: Location of the study.
-
Title: Title of the source paper.
-
Epibionts_original: epibiont name as provided in the source paper.
-
Epibionts_sc: Epibiont species name as assigned by WoRMS.
-
AphiaID_Epibionts: AphiaID of the epibiont, assigned by WoRMS.
-
E_Phylum: Is the phylum of the epibionts, according to WoRMS.
-
E_Class: Is the class of the epibionts, according to WoRMS.
-
E_Order: Is the order of the epibionts, according to WoRMS.
-
E_Family: Is the family of the epibionts, according to WoRMS.
-
E_Genus: Is the genus of the epibionts, according to WoRMS.
-
original_name: epibiont name as provided in the source paper.
-
Group: Algae group used for our study. The modalities are : Rhodophyta, Chlorophyta, Laminariales and the other Ochrophyta.
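As a rough illustration of the buffer weighting described for chloro_weighted, temp_weighted, salinity_weighted and wave_height25, the sketch below computes an area-weighted mean from pixels that intersect a buffer. It is a Python sketch under stated assumptions, not the authors' R implementation. The function name `area_weighted_mean` is hypothetical. The sketch also assumes that each pixel value has already been averaged over the time period and that the area of each pixel falling inside the buffer is known.

```python
import math


def area_weighted_mean(pixels):
    """Area-weighted mean of pixel values inside a buffer (illustrative sketch).

    pixels: iterable of (value, area_inside_buffer) pairs, where the value is
    assumed to be a time-averaged environmental value and the area is the part
    of the pixel that falls inside the circular buffer (any consistent unit).
    """
    weighted_sum = 0.0
    total_area = 0.0
    for value, area in pixels:
        # Skip pixels with no data (e.g. land or cloud gaps) or no overlap.
        if value is None or math.isnan(value) or area <= 0:
            continue
        weighted_sum += value * area
        total_area += area
    if total_area == 0:
        # No usable pixel inside the buffer.
        return math.nan
    return weighted_sum / total_area
```

With this weighting, a pixel that only partly overlaps the buffer contributes less than a pixel lying entirely inside it, and missing pixels do not bias the mean toward zero.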
Code/software
Delannoy_etal_Epibionts_review_2023_Glasgow_organized.Rproj
Script “Main_script_Kelps_Epibionts_Glasgow_organized”
Author : victoria.delannoy@protonmail.com (PhD student, Marbec, France)
Supervisor : Dr Sofie Spatharis, senior lecturer at Glasgow University
Script is split in 4 phases :
####### PHASE A. Cleaning step of the original Excel file in which all the publications were gathered.
This step is specific to our dataset and not directly relevant for the paper.
It contains :
- correction of typos : lines 18 to 194 and 574 to 800
- conversion of GPS format : lines 200 to 268
- semi-automated addition of taxonomic information from WoRMS (for algae and epibionts) : lines 269 to 572
- conversion from dominance of epibionts to presence/absence
These steps lead to the construction of the dataframe “Taxo.ok8”. If you want to rerun the analysis, we advise starting at line 803 (PHASE B) from this file (Taxo.ok8).
####### PHASE B. Describe our dataset.
5.1 provides numbers such as the number of papers, samplings, and sites, and the total richness observed at different taxonomic levels.
5.2 contains FIGURE 1A (l 1133, sampling locations colored per algae group).
6.1 contains FIGURE 1B (l 1221, number of observations for each epibiont taxon among algae groups, in absolute value and in percentage).
7 represents the total number of observations for each taxon.
###### PHASE C. Species distribution model
This phase starts with steps to build a dataframe that contains both taxonomic information and environmental information at each sampling location. Then, there is one paragraph per model mentioned in the publication.
- 8.1 to 8.5 extract the environmental information around the sampling locations. 8.6 merges taxonomic data and environmental data. This section uses datasets from Data/Environmental_data, and the output merging taxonomic and environmental data is in Data/env_bibli.
- steps 9 convert dominance to presence/absence (because around half the data only had presence-absence information)
- steps 10 run the GLMM at different taxonomic scales
- steps 11 plot the probability of distribution on a European map
- steps 12 represent the probability of distribution according to the nature of the support : wild kelp vs. not wild kelp
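The conversion from dominance to presence/absence in steps 9 can be pictured with the following Python sketch. It is only an illustration (the authors' script is in R), and the function name `to_presence_absence` is hypothetical. It also makes one assumption: Dominance codes 1, 1.1, 2 and 3 all count as presence, while code 0 (name not recognized in WoRMS) is not counted. A taxon is marked absent from a sampling event when no observation of it is listed for that event.

```python
# Assumption: these Dominance codes count as "present"; 0 is not counted.
PRESENT_CODES = {1.0, 1.1, 2.0, 3.0}


def to_presence_absence(observations):
    """Build a sampling-by-taxon presence/absence table (illustrative sketch).

    observations: iterable of (sampling_id, taxon, dominance) tuples, e.g. the
    Sampling, E_Phylum and Dominance columns of one row of the dataset.
    Returns {sampling_id: {taxon: 0 or 1}} over all taxa seen as present.
    """
    present = {}  # sampling ID -> set of taxa recorded as present there
    for sampling, taxon, dominance in observations:
        taxa_here = present.setdefault(sampling, set())
        if float(dominance) in PRESENT_CODES:
            taxa_here.add(taxon)
    # Every taxon present in at least one sampling event becomes a column.
    all_taxa = sorted(set().union(*present.values()))
    return {
        sampling: {taxon: int(taxon in taxa_here) for taxon in all_taxa}
        for sampling, taxa_here in present.items()
    }
```

A table of this shape (one row per sampling event, one 0/1 column per taxon) is the usual input for a binomial model of occurrence; the taxonomic level used for the columns is a free choice in this sketch.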
###### PHASE D. Short analysis of epibionts diversity per algae group