Code and data for "Effects of phytoplankton species distribution on particulate organic carbon dynamics along a coastal gradient"
Data files
Jul 07, 2025 version files (141.74 MB):
- code_data_submission.zip (141.73 MB)
- README.md (5.86 KB)
Abstract
Phytoplankton communities affect carbon dynamics worldwide, strongly influencing the quality and quantity of organic carbon in coastal ecosystems. Yet we still know little about how changes in phytoplankton community composition affect potential carbon pathways in estuaries and coasts. Here, we sampled 25 sites along a coastal salinity and nutrient gradient, collecting water samples for water chemistry analyses and phytoplankton samples for community composition analyses. For each site, we determined phytoplankton taxonomic diversity and used Bayesian joint species distribution models, accounting for species interactions, taxonomic relatedness, and traits, to identify the key environmental factors driving phytoplankton community composition. Subsequently, we used structural equation modelling to establish direct and indirect links between the identified key environmental factors, taxonomic diversity (richness and evenness), and particulate organic carbon (POC). We found that phytoplankton distribution along the estuarine gradient was driven mainly by changes in salinity. Increasing salinity (range 0.8 to 6.4) benefited motile species and reduced phytoplankton richness, which in turn decreased POC concentration. This indirect effect of salinity on POC was stronger than its direct effect, highlighting the mediating role of phytoplankton richness. These results emphasize the importance of diversity in regulating coastal biogeochemical processes and suggest that future changes in salinity might shift coastal carbon dynamics through changes in phytoplankton community composition.
https://doi.org/10.5061/dryad.kd51c5bjf
Description of the data and file structure
We conducted a field sampling campaign comprising 25 sampling sites along a coastal salinity and nutrient gradient, collecting water samples for water chemistry analyses and phytoplankton samples for community composition analyses. For each site, we determined taxonomic diversity (based on genera) and used joint species distribution models to identify the key environmental factors driving phytoplankton community composition. Subsequently, we used structural equation modelling to establish direct and indirect links between the identified key environmental factors, taxonomic diversity, and particulate organic carbon (POC).
Files and variables
File: code_data_submission.zip
Combined RStudio project repository, containing raw community data (folder data_community), processed environmental and community data files (folder data_processed), HMSC model evaluation outputs (folder model_evals), HMSC model files (folder models), the R scripts for the analysis (folder R) and a source file containing custom R functions (folder source).
Code/software
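RStudio to open the project file, and R (from CRAN) to run the scripts. The model-fitting steps are run with HMSC-HPC on a high-performance computing (HPC) system.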
Contents - scripts (folder R)
S1_define_joint_model.R
- Defines the HMSC models for the analysis. This involves importing the pre-processed files in folders data_community and data_processed, doing quick model runs to check for initial errors, and exporting the unfitted model as well as further processed data files.
S2a_init_models_HPC.R
- Imports the unfitted HMSC model and creates JSON files for the HMSC-HPC model runs on the HPC system.
S2b_import_posteriors_HPC.R
- Imports and combines posterior estimates derived from the HPC steps.
S3_evaluate_MCMC_convergence.R
- Checks the MCMC convergence of the fitted HMSC models.
S4a_make_kfold_partitions_HPC.R
- Partitions the data into folds and creates JSON files for HMSC-HPC model runs on the partitioned data on the HPC system, for model evaluation.
S4b_process_kfold_posteriors_HPC.R
- Imports and combines posterior estimates derived from the HPC steps, using partitioned data, for model evaluation. Writes model evaluation files.
S5_evaluate_model_fit.R
- Imports and plots model evaluation results.
S6_explore_parameter_estimates.R
- Imports fitted HMSC models and creates a number of results figures based on these models.
S7_fit_SEM.R
- Fits piecewise structural equation models.
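S4a_make_kfold_partitions_HPC.R splits the sampling units into folds for cross-validation. The evaluation filenames in folder model_evals mention 25 folds, which with 25 sites would leave one site per fold. The Python sketch below only illustrates balanced random fold assignment under that assumption; it is not the script's code, and make_partition is a hypothetical helper.

```python
import random


def make_partition(n_units, n_folds, seed=None):
    """Assign each sampling unit a fold label 1..n_folds (hypothetical helper).

    Fold sizes differ by at most one; labels are shuffled so that the
    assignment of units to folds is random.
    """
    if not 1 <= n_folds <= n_units:
        raise ValueError("n_folds must be between 1 and n_units")
    rng = random.Random(seed)
    # Cycle through fold labels until every unit has one, then shuffle.
    labels = [(i % n_folds) + 1 for i in range(n_units)]
    rng.shuffle(labels)
    return labels


# 25 sites and 25 folds: every fold holds exactly one site (leave-one-out).
partition = make_partition(25, 25, seed=1)
```

With as many folds as sites, each cross-validation run predicts one held-out site from the other 24.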
Contents - scripts (folder source)
misc_functions.R
- Collection of custom R functions used in different stages of the analysis.
Contents - data (folder data_community)
Spatial_Phyto_Carbon_pg_L.csv
- Raw phytoplankton community data (carbon biomass in pg/L) for each site (columns Exposed_1 to Extra_4), alongside taxonomy (columns Phylum to Species).
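Spatial_Phyto_Carbon_pg_L.csv is species-level with one column per site, whereas carbonmass_genus_processed.csv has one row per site and one column per genus. A minimal Python sketch of that reshaping, assuming that genus totals are sums of the species-level values and that the taxonomy columns include one named Genus (both assumptions; the actual processing is done in the R scripts). The toy rows are made up and only show the shape of the transformation.

```python
import csv
import io
from collections import defaultdict


def genus_matrix(csv_text, site_columns, genus_column="Genus"):
    """Sum species-level carbon biomass (pg/L) per genus and site.

    Returns {site: {genus: total}}, i.e. sites as rows and genera as columns.
    Empty cells are treated as zero (an assumption of this sketch).
    """
    totals = {site: defaultdict(float) for site in site_columns}
    for row in csv.DictReader(io.StringIO(csv_text)):
        genus = row[genus_column]
        for site in site_columns:
            value = row[site].strip()
            totals[site][genus] += float(value) if value else 0.0
    return {site: dict(genera) for site, genera in totals.items()}


# Made-up toy rows; real files have more taxonomy and site columns.
toy = """Phylum,Genus,Species,Exposed_1,Extra_4
Ochrophyta,Thalassiosira,Thalassiosira sp. 1,120.5,0
Ochrophyta,Thalassiosira,Thalassiosira sp. 2,30.0,12.0
Ochrophyta,Achnanthes,Achnanthes sp. 1,0,8.5
"""
matrix = genus_matrix(toy, ["Exposed_1", "Extra_4"])
```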
Contents - data (folder data_processed)
carbonmass_genus_processed.csv
- Raw phytoplankton community data in pg/L for every genus in the study (columns Achnanthes to Thalassiosira), with one row per site (row names).
env_processed.csv
- Contextual environmental data for the phytoplankton samples. Columns:
  - site_ID: Name given to the site.
  - x: x-coordinate in meters (EPSG:3067).
  - y: y-coordinate in meters (EPSG:3067).
  - date: Date of sampling in day/month/year format.
  - map_Id: Running numeric ID of each site.
  - Salinity: Measured salinity in psu.
  - DIN_mmol_L: Measured dissolved inorganic nitrogen in mmol/L.
  - DIP_mmol_L: Measured dissolved inorganic phosphorus in mmol/L.
  - DIN_DIP_molmol: Molar ratio of DIN_mmol_L to DIP_mmol_L (mol/mol).
  - POC_mmol_L: Measured particulate organic carbon in mmol/L.
  - windspeed_max: Derived maximum wind speed over the three days prior to sampling.
  - inflow: River inflow in cubic meters per second, measured at the mouth of the river Karjaanjoki.
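Because DIN_mmol_L and DIP_mmol_L share the same unit, their ratio is a molar (mol/mol) ratio. A short Python sketch that recomputes it from the CSV; the toy values are made up, and DIN_DIP_check is a hypothetical column name used only for comparison with DIN_DIP_molmol.

```python
import csv
import io
import math


def add_din_dip(rows):
    """Recompute the molar DIN:DIP ratio for each site row.

    Both inputs are in mmol/L, so the units cancel and the result is mol/mol.
    Rows with non-positive DIP get NaN instead of raising an error.
    """
    for row in rows:
        din = float(row["DIN_mmol_L"])
        dip = float(row["DIP_mmol_L"])
        row["DIN_DIP_check"] = din / dip if dip > 0 else math.nan
    return rows


# Made-up toy values, not measurements from this dataset.
toy = """site_ID,Salinity,DIN_mmol_L,DIP_mmol_L
A,0.8,0.030,0.002
B,6.4,0.004,0.001
"""
rows = add_din_dip(list(csv.DictReader(io.StringIO(toy))))
```

On the real file, DIN_DIP_check can be compared against DIN_DIP_molmol to confirm that the column was derived this way.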
taxonomy_labels.csv
- Taxonomic structure used to group and label genera in results figures. Columns Phylum to Genus give the taxonomic levels; column manuscript gives the grouping variable used in results plots.
traits_final.csv
- Trait data for each genus in the HMSC analysis. Columns:
  - Genus: The genus.
  - motility: 1 if the genus is motile, 0 if non-motile.
  - chain: 1 if the genus is chain-forming, 0 if non-chain-forming.
  - log_biovol_cell: Log-transformed cell biovolume.
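The binary traits in traits_final.csv can be summarised per site as a biomass-weighted mean, for example the share of carbon biomass held by motile genera. The Python sketch below is a descriptive illustration only and not part of the HMSC analysis (in HMSC, traits enter the model as predictors of how species respond to the covariates). The genus names and values are hypothetical.

```python
def community_weighted_mean(abundances, trait):
    """Biomass-weighted mean trait value for one site.

    abundances: {genus: carbon biomass}; trait: {genus: trait value}.
    For a 0/1 trait such as motility this is the biomass share of genera with value 1.
    """
    total = sum(abundances.values())
    if total == 0:
        raise ValueError("site has no biomass")
    return sum(b * trait[g] for g, b in abundances.items()) / total


# Hypothetical genera and made-up biomass values for a single site.
motility = {"Genus_A": 0, "Genus_B": 0, "Genus_C": 1}
site = {"Genus_A": 60.0, "Genus_B": 15.0, "Genus_C": 25.0}
cwm = community_weighted_mean(site, motility)
```

Here cwm is 0.25: a quarter of the site's carbon biomass belongs to the (hypothetical) motile genus.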
YData.csv
- Same as carbonmass_genus_processed.csv, but with rare taxa removed.
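The rule used to remove rare taxa for YData.csv is not stated in this README. As an illustration only, the Python sketch below drops genera with non-zero biomass at fewer than min_sites sites; both the occurrence rule and min_sites are hypothetical.

```python
def drop_rare_genera(matrix, min_sites):
    """Keep genera with non-zero biomass in at least `min_sites` sites.

    matrix: {site: {genus: biomass}}. The occurrence rule and min_sites are
    hypothetical; they are not the documented filter behind YData.csv.
    """
    genera = {g for row in matrix.values() for g in row}
    keep = {
        g
        for g in genera
        if sum(1 for row in matrix.values() if row.get(g, 0) > 0) >= min_sites
    }
    return {
        site: {g: v for g, v in row.items() if g in keep}
        for site, row in matrix.items()
    }


# Made-up toy matrix: Genus_B occurs at only one site.
toy = {
    "site_1": {"Genus_A": 10.0, "Genus_B": 0.0, "Genus_C": 2.0},
    "site_2": {"Genus_A": 5.0, "Genus_B": 0.0, "Genus_C": 0.0},
    "site_3": {"Genus_A": 1.0, "Genus_B": 3.0, "Genus_C": 4.0},
}
filtered = drop_rare_genera(toy, min_sites=2)
```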
Contents - data (folder model_evals)
- Model evaluation files created with script S4b_process_kfold_posteriors_HPC.R. Each response data type in the HMSC models has its own model evaluation file:
  - spatCampaign_2023_eval_files_25folds_genus_abundance_twoCovars_4chains_1000samples_800thin.RData: Evaluation file based on the HMSC model using abundance (i.e. carbon biomass) data.
  - spatCampaign_2023_eval_files_25folds_genus_COP_twoCovars_4chains_1000samples_800thin.RData: Evaluation file based on the HMSC model using abundance (i.e. carbon biomass) data conditional on presence.
  - spatCampaign_2023_eval_files_25folds_genus_PA_twoCovars_4chains_1000samples_800thin.RData: Evaluation file based on the HMSC model using presence/absence data.
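The presence/absence (PA) and conditional-on-presence (COP) response types correspond to a hurdle-style split of the biomass data. A minimal Python sketch of how such response vectors could be derived from one genus' biomass column: absences are treated as missing (None) in the COP response, following the usual hurdle approach, while the log transform of positive values is an assumption of this sketch.

```python
import math


def hurdle_split(biomass):
    """Split one genus' biomass values into PA and COP responses.

    PA: 1 where biomass > 0, else 0.
    COP: log biomass where present, None (missing) where absent, so absences
    only inform the PA part. The log transform is an assumption.
    """
    pa = [1 if y > 0 else 0 for y in biomass]
    cop = [math.log(y) if y > 0 else None for y in biomass]
    return pa, cop


pa, cop = hurdle_split([0.0, 1.0, math.e])
```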
Contents - data (folder models)
- HMSC model files:
  - spatCampaign_2023_genus_PA_abundance_COP_twoCovars_unfitted.RData: Unfitted HMSC model object.
  - spatCampaign_2023_genus_PA_abundance_COP_twoCovars_HPC_TFfitted_chains4_samples1000_thin800_800.RData: Fitted HMSC model object.
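The fitted-model and evaluation filenames encode the MCMC settings (4 chains, 1000 samples, thinning 800). Assuming the usual HMSC convention that each chain records `samples` draws with `thin` iterations between them (after any transient/burn-in phase), this gives 4 × 1000 = 4000 pooled posterior draws and 1000 × 800 = 800,000 post-transient iterations per chain. The Python sketch below parses these fields from either naming style; run_settings is a hypothetical helper, and the trailing _800 in the fitted filename is deliberately not interpreted.

```python
import re


def _field(name, filename):
    """Read an integer written as e.g. 'chains4' or '4chains'."""
    match = re.search(rf"(?:{name}(\d+)|(\d+){name})", filename)
    if match is None:
        raise ValueError(f"no '{name}' setting in {filename!r}")
    return int(match.group(1) or match.group(2))


def run_settings(filename):
    """MCMC settings encoded in a model or evaluation filename (hypothetical helper)."""
    chains = _field("chains", filename)
    samples = _field("samples", filename)
    thin = _field("thin", filename)
    return {
        "chains": chains,
        "samples": samples,
        "thin": thin,
        # Recorded draws pooled across chains.
        "posterior_draws": chains * samples,
        # Iterations per chain after the transient phase, under the assumed convention.
        "sampling_iterations_per_chain": samples * thin,
    }


fitted = run_settings(
    "spatCampaign_2023_genus_PA_abundance_COP_twoCovars_HPC_TFfitted_chains4_samples1000_thin800_800.RData"
)
```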
Instructions
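- Execute the scripts in numerical order for a complete workflow. Between S2a and S2b, and between S4a and S4b, run the HMSC-HPC models on the HPC system using the exported JSON files, so that the posteriors exist when the "b" scripts import them.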
- Modify the paths and parameters as needed for your specific dataset and analysis.