Code and data for "Effects of phytoplankton species distribution on particulate organic carbon dynamics along a coastal gradient"

Wiethase, Joris¹; Uth, Catharina¹; Stoffers, Tjardo¹; Asmala, Eero¹,²; Lewandowska, Aleksandra¹

Published Jul 07, 2025 on Dryad. https://doi.org/10.5061/dryad.kd51c5bjf

Data files

Version of Jul 07, 2025 (141.74 MB in total):

- code_data_submission.zip (141.73 MB)
- README.md (5.86 KB)

Abstract

Phytoplankton communities affect carbon dynamics worldwide, strongly influencing the quality and quantity of organic carbon in coastal ecosystems. Yet, we still know little about the impacts of changing phytoplankton community composition on the potential carbon pathways in estuaries and coasts. Here, we sampled 25 sites along a coastal salinity and nutrient gradient, collecting water samples for water chemistry and phytoplankton samples for community composition analyses. For each site, we determined phytoplankton taxonomic diversity and used Bayesian joint species distribution models considering species interactions, taxonomic relatedness, and traits to identify key environmental factors driving phytoplankton community composition. Subsequently, we used structural equation modelling to establish direct and indirect links between the identified key environmental factors, taxonomic diversity (richness and evenness), and particulate organic carbon (POC). We found that the phytoplankton distribution along the estuarine gradient was mainly driven by changes in salinity. Increasing salinity (ranging from 0.8 to 6.4 psu) benefited motile species and reduced phytoplankton richness, which in turn decreased POC concentration. This indirect effect of salinity on POC was stronger than its direct effect, highlighting the mediating role of phytoplankton richness. This emphasizes the importance of diversity in regulating coastal biogeochemical processes and suggests that future changes in salinity might shift coastal carbon dynamics through changes in phytoplankton community composition.
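The comparison of indirect and direct effects above follows standard path-analysis logic. In a linear path model, the indirect effect of salinity on POC through richness is the product of the salinity → richness and richness → POC coefficients. That product can then be compared with the direct salinity → POC path. The minimal sketch below illustrates this decomposition with invented standardized coefficients. It does not reproduce the study's estimates or the code in S7_fit_SEM.R.

```python
def indirect_effect(a: float, b: float) -> float:
    """Indirect effect of X on Z through mediator Y: path X->Y (a) times path Y->Z (b)."""
    return a * b


def decompose(a: float, b: float, c_direct: float) -> dict:
    """Split the total effect of X on Z into its direct and indirect parts
    (linear path model, standardized coefficients assumed)."""
    ind = indirect_effect(a, b)
    return {"direct": c_direct, "indirect": ind, "total": c_direct + ind}


# Hypothetical coefficients for illustration only (NOT the study's results):
# a = salinity -> richness, b = richness -> POC, c_direct = salinity -> POC.
effects = decompose(a=-0.6, b=0.5, c_direct=-0.1)
print(effects)
print("indirect stronger than direct:", abs(effects["indirect"]) > abs(effects["direct"]))
```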


Description of the data and file structure

We conducted a field sampling campaign comprising 25 sampling sites along a coastal salinity and nutrient gradient, collecting water for water chemistry and phytoplankton for community composition analyses. For each site, we determined taxonomic diversity (based on genera) and used joint species distribution models to identify key environmental factors driving phytoplankton community composition. Subsequently, we used structural equation modelling to establish direct and indirect links between the identified key environmental factors, taxonomic diversity, and particulate organic carbon (POC).
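Richness and evenness per site can be computed from that site's genus-level biomass values. The README does not name the evenness index. The minimal sketch below therefore assumes Pielou's evenness J = H'/ln(S), with the Shannon index H' computed from biomass proportions. The genus names and values are hypothetical.

```python
import math
from typing import Mapping


def richness(biomass: Mapping[str, float]) -> int:
    """Number of genera with non-zero biomass at a site."""
    return sum(1 for v in biomass.values() if v > 0)


def pielou_evenness(biomass: Mapping[str, float]) -> float:
    """Pielou's J = H' / ln(S) (assumed index), where H' is the Shannon index
    computed from biomass proportions of the S genera present."""
    present = [v for v in biomass.values() if v > 0]
    s = len(present)
    if s < 2:
        return float("nan")  # evenness is undefined with fewer than two genera
    total = sum(present)
    h = -sum((v / total) * math.log(v / total) for v in present)
    return h / math.log(s)


# Hypothetical site (pg/L); genus_c is absent and does not count towards richness.
site = {"genus_a": 120.0, "genus_b": 120.0, "genus_c": 0.0}
print(richness(site), pielou_evenness(site))
```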

Files and variables

File: code_data_submission.zip

Combined RStudio project repository containing raw community data (folder `data_community`), processed environmental and community data files (folder `data_processed`), outputs of the HMSC model evaluation step (folder `model_evals`), HMSC model files (folder `models`), the R scripts for the analysis (folder `R`), and a source file with custom R functions (folder `source`).

Code/software

RStudio (to open the project file) and R (available from CRAN).

Contents - scripts (folder `R`)

S1_define_joint_model.R
- Defines the HMSC models for the analysis. This involves importing the pre-processed files in the folders data_community and data_processed,
  doing quick model runs to check for initial errors, and exporting the unfitted model as well as further processed data files.
S2a_init_models_HPC.R
- Imports the unfitted HMSC model and creates JSON files for model runs with HMSC-HPC on a high-performance computing (HPC) system.
S2b_import_posteriors_HPC.R
- Imports and combines posterior estimates derived from the HPC steps.
S3_evaluate_MCMC_convergence.R
- Checks the MCMC convergence of the fitted HMSC models.
S4a_make_kfold_partitions_HPC.R
- Partitions the data into folds and creates JSON files for HMSC-HPC model runs on the partitioned data, for model evaluation.
S4b_process_kfold_posteriors_HPC.R
- Imports and combines posterior estimates derived from the HPC steps, using partitioned data, for model evaluation. Writes model evaluation files.
S5_evaluate_model_fit.R
- Imports and plots model evaluation results.
S6_explore_parameter_estimates.R
- Imports fitted HMSC models and creates a number of results figures based on these models.
S7_fit_SEM.R
- Fits piecewise structural equation models.
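S4a partitions the data into folds for cross-validation. The evaluation file names contain "25folds", and the study has 25 sites. If the sampling unit is the site, each fold therefore holds out a single site, which amounts to leave-one-out cross-validation. The minimal sketch below shows balanced random fold assignment. It is similar in spirit to `createPartition()` in the Hmsc R package, but the README does not state how S4a builds its partitions. The seed and function name are hypothetical.

```python
import random


def make_partition(n_units: int, n_folds: int, seed: int = 1) -> list[int]:
    """Assign each sampling unit a fold label 1..n_folds with balanced fold sizes."""
    if not 1 <= n_folds <= n_units:
        raise ValueError("n_folds must be between 1 and the number of units")
    # Cycle through the fold labels to cover all units, then shuffle the assignment.
    folds = [i % n_folds + 1 for i in range(n_units)]
    random.Random(seed).shuffle(folds)
    return folds


# 25 sites and 25 folds: every fold contains exactly one site (leave-one-out).
partition = make_partition(n_units=25, n_folds=25)
print(partition)
```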

Contents - scripts (folder `source`)

misc_functions.R
- Collection of custom R functions used in different stages of the analysis.

Contents - data (folder `data_community`)

Spatial_Phyto_Carbon_pg_L.csv
- Raw phytoplankton community data as carbon biomass in pg/L per site (columns Exposed_1 to Extra_4), together with taxonomy (columns Phylum to Species).
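The genus-level file in `data_processed` implies that the species-level rows here were combined per genus. The minimal sketch below sums species biomass within each genus for every site column. It assumes a column named `Genus` among the taxonomy columns and that empty cells mean zero. The excerpt uses only two of the site columns, and its rows and values are invented. The README does not document how the processing step actually handled this.

```python
import csv
import io
from collections import defaultdict


def aggregate_to_genus(csv_text: str, site_cols: list[str]) -> dict[str, dict[str, float]]:
    """Sum species-level biomass (pg/L) into genus totals for each site column.
    Assumes a 'Genus' column; empty cells are treated as zero (assumption)."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(io.StringIO(csv_text)):
        for site in site_cols:
            value = row[site].strip()
            totals[site][row["Genus"]] += float(value) if value else 0.0
    return {site: dict(genera) for site, genera in totals.items()}


# Hypothetical excerpt with a reduced set of columns and invented values.
raw = """Phylum,Genus,Species,Exposed_1,Extra_4
Bacillariophyta,Thalassiosira,Thalassiosira sp. 1,150.0,20.5
Bacillariophyta,Thalassiosira,Thalassiosira sp. 2,50.0,0
Bacillariophyta,Achnanthes,Achnanthes sp.,10.0,
"""
genus_biomass = aggregate_to_genus(raw, ["Exposed_1", "Extra_4"])
print(genus_biomass)
```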

Contents - data (folder `data_processed`)

carbonmass_genus_processed.csv
- Untransformed genus-level phytoplankton carbon biomass in pg/L per site (row names), for every genus (columns Achnanthes to Thalassiosira) in the study.
env_processed.csv
- Environmental context data for the phytoplankton samples, with columns:
  - site_ID: Name given to the site.
  - x: x-coordinate in meters (EPSG:3067).
  - y: y-coordinate in meters (EPSG:3067).
  - date: Sampling date in day/month/year format.
  - map_Id: Sequential numeric ID of each site.
  - Salinity: Measured salinity in psu.
  - DIN_mmol_L: Measured dissolved inorganic nitrogen in mmol/L.
  - DIP_mmol_L: Measured dissolved inorganic phosphorus in mmol/L.
  - DIN_DIP_molmol: Molar ratio of DIN_mmol_L to DIP_mmol_L (mol/mol).
  - POC_mmol_L: Measured particulate organic carbon in mmol/L.
  - windspeed_max: Derived maximum wind speed over the three days prior to sampling.
  - inflow: River inflow in cubic meters per second, measured at the mouth of the river Karjaanjoki.
taxonomy_labels.csv
- Taxonomic structure used to group and label genera in the results figures. Columns Phylum to Genus give the taxonomic levels; column manuscript gives the grouping variable used in the results plots.
traits_final.csv
- Trait data for each genus in the HMSC analysis, with columns:
  - Genus: The genus.
  - motility: 1 if motile genus, 0 if non-motile.
  - chain: 1 if chain-forming genus, 0 if non-chain-forming.
  - log_biovol_cell: Log-transformed cell biovolume.
YData.csv
- Same as carbonmass_genus_processed.csv, but with rare taxa removed.
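YData.csv is described as the genus table with rare taxa removed, but the README does not give the rarity criterion. The minimal sketch below assumes a prevalence filter: a genus is kept if it is present (biomass > 0) at a minimum number of sites. Both `min_sites` and the data are hypothetical, and the actual threshold is defined in the repository scripts.

```python
def drop_rare_taxa(y: dict[str, list[float]], min_sites: int) -> dict[str, list[float]]:
    """Keep genera present (biomass > 0) at at least `min_sites` sites.
    The prevalence criterion itself is an assumption for illustration."""
    return {genus: values for genus, values in y.items()
            if sum(v > 0 for v in values) >= min_sites}


# Hypothetical genus-by-site biomass (pg/L) for four sites.
y = {
    "genus_a": [5.0, 0.0, 3.2, 1.1],
    "genus_b": [0.0, 0.0, 0.4, 0.0],
    "genus_c": [2.0, 8.0, 0.0, 6.5],
}
y_common = drop_rare_taxa(y, min_sites=2)  # min_sites=2 is an assumed threshold
print(sorted(y_common))
```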

Contents - data (folder `model_evals`)

Model evaluation files created with script S4b_process_kfold_posteriors_HPC.R. Each response data type in the Hmsc models has its own model evaluation file:

- spatCampaign_2023_eval_files_25folds_genus_abundance_twoCovars_4chains_1000samples_800thin.RData: Evaluation file for the HMSC model using abundance (i.e. carbon biomass) data.
- spatCampaign_2023_eval_files_25folds_genus_COP_twoCovars_4chains_1000samples_800thin.RData: Evaluation file for the HMSC model using abundance (i.e. carbon biomass) data conditional on presence (COP).
- spatCampaign_2023_eval_files_25folds_genus_PA_twoCovars_4chains_1000samples_800thin.RData: Evaluation file for the HMSC model using presence/absence (PA) data.
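The three response types relate to one another as in a hurdle approach. Presence/absence (PA) records whether a genus occurs at a site. Abundance conditional on presence (COP) keeps the biomass only where the genus is present and treats absences as missing. The minimal sketch below derives PA and COP from one genus's biomass across sites. The optional log transform of COP is a common choice in Hmsc workflows but is an assumption here, not something the README confirms. The values are hypothetical.

```python
import math
from typing import Optional


def hurdle_split(y: list[float], log_cop: bool = True) -> tuple[list[int], list[Optional[float]]]:
    """Split biomass values into presence/absence (PA) and abundance
    conditional on presence (COP; None marks absences as missing)."""
    pa = [1 if v > 0 else 0 for v in y]
    cop: list[Optional[float]] = []
    for v in y:
        if v > 0:
            cop.append(math.log(v) if log_cop else v)  # log transform is an assumed option
        else:
            cop.append(None)  # absences are excluded from the COP part
    return pa, cop


# Hypothetical biomass (pg/L) of one genus at four sites.
pa, cop = hurdle_split([0.0, 12.5, 3.0, 0.0], log_cop=False)
print(pa, cop)
```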

Contents - data (folder `models`)

Hmsc model files:

- spatCampaign_2023_genus_PA_abundance_COP_twoCovars_unfitted.RData: Unfitted HMSC model object.
- spatCampaign_2023_genus_PA_abundance_COP_twoCovars_HPC_TFfitted_chains4_samples1000_thin800_800.RData: Fitted HMSC model object.
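The MCMC settings are encoded in the file names, in two orders: `chains4_samples1000_thin800` for the fitted model and `4chains_1000samples_800thin` for the evaluation files. The minimal sketch below extracts these settings and derives two quantities. The pooled number of posterior draws is chains × samples. Under the Hmsc convention, the sampling iterations per chain are samples × thin, excluding any transient (burn-in). The trailing `_800` in the fitted-model name is not documented here and is deliberately left uninterpreted.

```python
import re


def mcmc_settings(filename: str) -> dict[str, int]:
    """Extract chains, samples and thin from names such as
    'chains4_samples1000_thin800' or '4chains_1000samples_800thin'."""
    settings: dict[str, int] = {}
    for key in ("chains", "samples", "thin"):
        m = re.search(rf"(?:{key}(\d+)|(\d+){key})", filename)
        if m is None:
            raise ValueError(f"no '{key}' setting found in {filename!r}")
        settings[key] = int(m.group(1) or m.group(2))
    return settings


fitted = ("spatCampaign_2023_genus_PA_abundance_COP_twoCovars_HPC_TFfitted_"
          "chains4_samples1000_thin800_800.RData")
s = mcmc_settings(fitted)
posterior_draws = s["chains"] * s["samples"]  # retained draws pooled over chains
iters_per_chain = s["samples"] * s["thin"]    # sampling iterations per chain, transient excluded
print(s, posterior_draws, iters_per_chain)
```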

Instructions

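Execute the scripts in numerical order (S1 to S7) for a complete workflow. Scripts S2a and S4a create jobs for HMSC-HPC; run these on an HPC system before continuing with S2b and S4b, respectively.
Modify the paths and parameters as needed for your specific dataset and analysis.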
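The run order can be derived from the `S<number><letter>_` prefixes of the script names. The minimal sketch below sorts the scripts that way and inserts a manual HMSC-HPC pause after each `S…a_…_HPC.R` script. The `Rscript R/<script>` command form and a working directory at the project root are assumptions. The scripts may expect to be run interactively from the RStudio project instead.

```python
import re

SCRIPTS = [
    "S1_define_joint_model.R", "S2a_init_models_HPC.R", "S2b_import_posteriors_HPC.R",
    "S3_evaluate_MCMC_convergence.R", "S4a_make_kfold_partitions_HPC.R",
    "S4b_process_kfold_posteriors_HPC.R", "S5_evaluate_model_fit.R",
    "S6_explore_parameter_estimates.R", "S7_fit_SEM.R",
]


def step_key(name: str) -> tuple[int, str]:
    """Sort key from the 'S<number><letter>_' prefix, e.g. 'S2b_...' -> (2, 'b')."""
    m = re.match(r"S(\d+)([a-z]?)_", name)
    if m is None:
        raise ValueError(f"unexpected script name: {name}")
    return int(m.group(1)), m.group(2)


def run_plan(scripts: list[str]) -> list[str]:
    """Order the scripts numerically; after each 'S<n>a_..._HPC.R' script, add a
    reminder to run the exported HMSC-HPC jobs (assumed command form: Rscript)."""
    plan = []
    for name in sorted(scripts, key=step_key):
        plan.append(f"Rscript R/{name}")
        if re.match(r"S\d+a_.*_HPC\.R$", name):
            plan.append("# run the exported HMSC-HPC jobs on the HPC system, then continue")
    return plan


for line in run_plan(list(reversed(SCRIPTS))):
    print(line)
```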
