Data from: Abrupt alkalinization alters microbial diversity and promotes the proliferation of marine parasites in coastal microcosm experiments
Data files
May 14, 2026 version files (3.22 GB):
- oae_mesocosm_manuscript.zip (3.22 GB)
- README.md (21.32 KB)
Abstract
Mitigation of anthropogenic climate interference will likely require the removal of legacy atmospheric carbon dioxide (CO₂). Ocean alkalinity enhancement (OAE) is an abiotic marine carbon dioxide removal (mCDR) approach that accelerates the natural Earth process of rock weathering, but its effects on marine ecosystems remain uncertain. Here we used outdoor microcosm experiments to investigate the effects of abrupt limestone-inspired and NaOH alkalinity additions of ~750 µmol kg⁻¹, reflecting model-predicted OAE scenarios that produce severe localized impacts (e.g., large variations in pH and Ω). We assessed the response of seasonal marine microbial communities (phytoplankton, bacteria) and viruses from the Santa Barbara Channel, analyzed by high-throughput amplicon sequencing and flow cytometry. Alkalinization, particularly under low-nutrient conditions, altered microbial diversity and promoted the proliferation of parasites (Syndiniales), suggesting that abrupt alkalinization could alter marine ecosystem composition, and potentially its function, near coastal alkalinity deployment “hotspots”. We highlight the need for rigorous environmental risk assessments prior to implementation of OAE technologies.
This repository contains the data and code supporting the results presented in:
"Abrupt alkalinization alters microbial diversity and promotes the proliferation of marine parasites in coastal microcosm experiments."
Notes:
- The structure of this repository is designed to function as an R project. R markdown files and scripts are annotated internally.
- Figure images have been omitted from this repository as they appear in the associated manuscript, which is published under a CC BY license.
- "Mesocosm" was changed to "microcosm" for the published manuscript, but "mesocosm" is retained in all R markdown files and scripts to avoid code breaks.
Overview of Repository Contents (oae_mesocosm_manuscript.zip)
oae_mesocosm_manuscript/
├── data/
│ ├── all_18S_seq_data/ # 18S rDNA amplicon sequencing data and processed outputs
│ ├── all_media_prep/ # Preparation of alkalinity stock solutions and spiking of filtered seawater for both experiments
│ ├── apr23_mesocosm/ # Spring 2023 mesocosm experiment data
│ ├── metadata/ # Sample metadata for both experiments
│ ├── nov23_mesocosm/ # Fall 2023 mesocosm experiment data
│ ├── sst_map/ # Sea surface temperature (SST) and coastal map data
│ └── upwelling_indices/ # Biologically Effective Upwelling Transport Index (BEUTI) time series data files
├── src/
│ ├── dna_seq/ # R Markdown files and scripts for 18S amplicon analysis and outputs
│ ├── dsi_experiment/ # R Markdown files for post hoc dissolved silicic acid (DSi) experiment
│ ├── mesocosm_combined/ # R Markdown files for producing physicochemical and FCM outputs
│ └── R/ # R helper scripts sourced by R Markdown files
Data Files
data/all_18S_seq_data/
This folder contains raw FASTQ files and processed outputs from 18S rDNA amplicon sequencing.
FASTQ files (fastq_files/spring_m2/ and fastq_files/fall_m3/)
Raw paired-end Illumina sequencing reads. Files follow the naming convention:
{EXPERIMENT}_{TREATMENT}[_BLK]_{TIMEPOINT}_{REPLICATE}_{SAMPLE_NUMBER}_L001_R{READ}_001.fastq
Example: M2_NAOH_T3_B_S7_L001_R1_001.fastq
| Field | Description |
|---|---|
| EXPERIMENT | M2 (spring) or M3 (fall); also NEG_CONT or NEG_CONTROL for negative controls and AA1 for mock communities |
| TREATMENT | CTRL, NAOH, or CACO3 |
| _BLK | Present if the sample is from a filter-sterilized carboy (i.e., 0.22 µm filtered seawater) |
| TIMEPOINT | T0, T3, T5, or T9 (days since treatment) |
| REPLICATE | Triplicate carboy A, B, or C |
| SAMPLE_NUMBER | Illumina sample index (e.g., S7) |
| L001 | Lane number |
| R1/R2 | Forward or reverse read |
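As an illustration of the naming convention above, the following minimal sketch splits a FASTQ file name into its fields. It is written in Python for illustration only (the repository code itself is in R), and the pattern and function names are hypothetical. It covers only M2/M3 sample names as documented; negative controls (NEG_CONT, NEG_CONTROL) and mock communities (AA1) are deliberately not matched because their full naming pattern is not described here.

```python
import re

# Pattern for experimental samples only (M2/M3), following the documented
# convention. Negative controls and mock communities are not matched.
FASTQ_PATTERN = re.compile(
    r"^(?P<experiment>M[23])"
    r"_(?P<treatment>CTRL|NAOH|CACO3)"
    r"(?P<blank>_BLK)?"              # optional filter-sterilized carboy tag
    r"_(?P<timepoint>T\d+)"
    r"_(?P<replicate>[ABC])"
    r"_(?P<sample_number>S\d+)"
    r"_(?P<lane>L\d{3})"
    r"_R(?P<read>[12])"
    r"_001\.fastq$"
)


def parse_fastq_name(filename):
    """Return a dict of name fields, or None if the name does not match."""
    match = FASTQ_PATTERN.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    fields["blank"] = fields["blank"] is not None  # True if _BLK is present
    fields["read"] = int(fields["read"])           # 1 = forward, 2 = reverse
    return fields


example = parse_fastq_name("M2_NAOH_T3_B_S7_L001_R1_001.fastq")
print(example)
```

Names that do not follow the sample pattern return None, so they can be routed to separate handling rather than being silently misparsed.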
OAE_all18s_M2M3_Sep2024_sample_data.xlsx / .csv
Sample sheet linking FASTQ file names to metadata fields from oae_mesocosm_2.xlsx/csv and oae_mesocosm_3.xlsx/csv.
seqtab.rds
R data file (RDS format). Amplicon sequence variant (ASV) count table output from DADA2.
seqtab_taxfiltered_20241015.rds
R data file. ASV count table after removal of non-target taxa.
eTax_w_troph_20241015.rds / .csv
Taxonomy table with trophic mode assigned to each ASV.
no_troph_20241015.csv
Subset of ASVs that could not be assigned a trophic mode.
meso_combined.csv
Table combining sample data for the spring and fall mesocosm experiments (e.g., carbonate chemistry, nutrients) to be used with ASVs.
mock_community_info_wManualBlast_20230124.csv
Mock community ASVs and relative abundances.
data/all_media_prep/
oae_mesocosm_media.xlsx
Preparation of 1 M alkalinity stock solutions for both experiments. Seawater collection coordinates are also noted here. Metadata for this file are provided on the third Excel tab.
data/metadata/
oae_mesocosm_metadata.xlsx
Sample metadata table for both mesocosm experiment files (oae_mesocosm_2.csv/xlsx and oae_mesocosm_3.csv/xlsx). Columns include: Variable, Unit, and Description.
data/apr23_mesocosm/ — Spring 2023 Experiment (M2)
oae_mesocosm_2.xlsx / .csv
Main data table for the spring mesocosm experiment. Each row is one sample. See /data/metadata/oae_mesocosm_metadata.xlsx for variable information. Units are also included in column labels (e.g., din_umol_l = dissolved inorganic nitrogen [µmol/L]).
dsi_experiment/oae_dsi_raw.xlsx / .csv
Raw data from the post hoc DSi addition experiment conducted after the spring mesocosm experiment (only conducted in the spring). Variables include sampling_date (date of collection), date label (tube label), bottle_type (type of bottle that solution was stored in; i.e., borosilicate, PP, or PC), treatment (alkalinity treatment), solution_starting_mass_g (starting solution mass), solute_concentration_mol_l (molar concentration of solution), hcl_umol_l (molar concentration of acid solution used to neutralize alkalinity stock solution prior to analysis), dilution_factor (dilution factor resulting from HCl addition), dsi_umol_l (DSi concentration), dsi_umol_l_corr (DSi concentrations adjusted for instrument method detection limits), dsi_umol_l_final (final DSi accounting for dilution factor), sample_collector (collecting researcher).
fcm_bigelow/apr23_fc_phyto.xlsx / .csv
Flow cytometry counts for phytoplankton populations (spring experiment). Variables include sampling_date (sample collection date), FCM file (file generated by FCM), tube number (cryovial tube number), sample_label (sample label), type (treatment), Flow rate (µl/min), Acq time (acquisition time; min), dilution (dilution factor), and cell abundance columns for distinct populations (e.g., Crypto counts, Nanoeuk1 counts; units = cells mL⁻¹).
fcm_bigelow/apr23_fc_virbact.xlsx / .csv
Flow cytometry counts for virus and bacteria populations (spring experiment). Variables include sampling_date, FCM file, tube number, sample_label, type, Flow rate (µl/min), Acq time (min), dilution, cell abundance counts for bacteria (cells mL⁻¹), and virus-like particle counts (particles mL⁻¹).
fcm_bigelow/apr23_spring_bfc_sample_list_updated_FCM results_01072025.xlsx
Raw sample list and results file as received from Bigelow Laboratory flow cytometry facility.
icp_oes/oae_m2_icp_oes.xlsx / .csv
Concentrations of Na, Ca, Mg, Si, and P (all in ppm) as measured by ICP-OES. Metadata are included on the third Excel tab (this metadata also applies to icp_oes/oae_m3_icp_oes.xlsx).
irradiance/m2_irradiance.xlsx / .csv
Photosynthetically active radiation (PAR) measurements within incubation tank. Columns: Record, Date, Time_pt, INPUT1, INPUT2 (scalar LI-COR sensor 1 and 2; µmol photons m⁻² s⁻¹).
temp/oae_m2_temp.xlsx / .csv
Continuous temperature logger data for spring mesocosm incubation tank. Variables: Date Time, PT (date and time of measurement; Pacific Time), Temp, C (temperature; Celsius).
data/nov23_mesocosm/ — Fall 2023 Experiment (Mesocosm M3)
Structure and variable names mirror the spring experiment (see above). Files are:
- oae_mesocosm_3.xlsx / .csv: Main data table (same variables as oae_mesocosm_2).
- fcm_bigelow/nov23_fc_phyto.xlsx / .csv: Flow cytometry phytoplankton counts.
- fcm_bigelow/nov23_fc_virbact.xlsx / .csv: Flow cytometry bacteria/virus-like particle counts.
- fcm_bigelow/nov23_fall_Bigelow_FC_RNA_sample_list_FCM analyses_01072025.xlsx: Raw Bigelow facility file.
- icp_oes/oae_m3_icp_oes.xlsx / .csv: ICP-OES data.
- irradiance/m3_irradiance.xlsm / .csv: PAR data.
- temp/oae_m3_temp.xlsx / .csv: Continuous temperature data.
data/sst_map/
Contains SST data and a folder (gshhg_data) for the coastline file used to produce SST maps in oae_mesocosm_sbc_sst.Rmd. To reproduce the published maps, users will need to download the GSHHG coastlines, rivers, and borders in native binary format (the gshhs_f.b file) from https://www.soest.hawaii.edu/pwessel/gshhg/index.html (last accessed 2026-05-12).
data/upwelling_indices/
Folder for monthly BEUTI csv files called within oae_mesocosm_pt1.Rmd. "BEUTI provides estimates of vertical nitrate flux near the coast (i.e., the amount of nitrate upwelled/downwelled), which may be more relevant than upwelling strength when considering some biological responses" (see Jacox MG, CA Edwards, EL Hazen, SJ Bograd (2018) Coastal upwelling revisited: Ekman, Bakun, and improved upwelling indices for the U.S. west coast, J. Geophysical Research, 123(10), 7332-7350).
The monthly BEUTI can be downloaded at https://mjacox.com/upwelling-indices/ (last accessed 2026-05-12).
Two files are called within the R markdown file (not included in repository):
BEUTI_monthly.csv
Monthly BEUTI for the California Current System (2023).
BEUTI_monthly_1988-2022.csv
Extended historical BEUTI time series (1988–2022).
Note: Users will need to separate the downloaded monthly BEUTI data into individual files (i.e., 2023 and 1988-2022) or adjust the code to wrangle the data from a single file.
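The split described in the note above could be sketched as follows. This is an illustration in Python rather than the repository's R code. It assumes the downloaded file has a `year` column, which is an assumption about the download and not something stated in this README. The function name, the `34N` column, and the toy rows are hypothetical and are not real BEUTI values.

```python
import csv
import io


def split_beuti_rows(rows, year_field="year"):
    """Split rows into (1988-2022, 2023); rows from other years are dropped."""
    historical, current = [], []
    for row in rows:
        year = int(row[year_field])
        if 1988 <= year <= 2022:
            historical.append(row)
        elif year == 2023:
            current.append(row)
    return historical, current


# Toy input standing in for the downloaded monthly file (values are made up).
toy_csv = "year,month,34N\n2022,12,1.0\n2023,1,2.0\n2024,1,3.0\n"
rows = list(csv.DictReader(io.StringIO(toy_csv)))

historical, current = split_beuti_rows(rows)
print(len(historical), len(current))
```

In practice the rows would be read from the downloaded file with `csv.DictReader`, and the two lists written with `csv.DictWriter` to BEUTI_monthly_1988-2022.csv and BEUTI_monthly.csv, the two file names the R Markdown file expects.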
Code (R Markdown Files and Scripts)
R Markdown files must be run in order within each subfolder (pt1 → pt2 → pt3...) because intermediate outputs are saved and read by subsequent files. The exception is oae_mesocosm_sbc_sst.Rmd, which is standalone.
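The run order can be recovered from the `ptN` token in each file name. The sketch below, in Python for illustration only, sorts the R Markdown files of one subfolder by that number and leaves out files without a `pt` token, such as the standalone oae_mesocosm_sbc_sst.Rmd. The function name is hypothetical; the file names are those listed in this README.

```python
import re


def ordered_parts(filenames):
    """Return .Rmd files that carry a ptN token, sorted by N."""
    numbered = []
    for name in filenames:
        match = re.search(r"(?:^|_)pt(\d+)(?:_|\.)", name)
        if match and name.endswith(".Rmd"):
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


files = [
    "oae_mesocosm_pt3_nutrients.Rmd",
    "oae_mesocosm_sbc_sst.Rmd",
    "oae_mesocosm_pt1.Rmd",
    "oae_mesocosm_pt5_fcm_fall.Rmd",
    "oae_mesocosm_pt2_carb_chem.Rmd",
    "oae_mesocosm_pt4_fcm_spring.Rmd",
]
run_order = ordered_parts(files)
print(run_order)
```

Each file in the resulting list could then be knitted in turn, for example with `rmarkdown::render()` from R.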
src/dna_seq/ — 18S Amplicon Sequencing Analysis
Each R Markdown file is annotated internally. Run in this order:
| File | Description |
|---|---|
| pt1_pre_processing.Rmd | Create phyloseq object, taxonomic filtering for 18S community composition analysis |
| pt2_spring_mesocosm.Rmd | Community composition analysis for the spring (M2) experiment: relative abundance (RA) plots, CLR transformation, PERMANOVA, PERMDISP, and differential abundance testing. |
| pt3_fall_mesocosm.Rmd | Same analyses as pt2 applied to the fall (M3) experiment. |
| pt4_dna_combined.Rmd | Produces combined figures and tables reported in the main manuscript. |
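For readers unfamiliar with the centered log-ratio (CLR) transformation named in the pt2 and pt3 descriptions, here is a minimal generic sketch in Python. It is not the repository's implementation. How zero counts are handled in the R Markdown files is not described in this README, so the pseudocount used here is an assumption, and the counts are toy values.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log of each value minus the mean of the logs.

    The pseudocount (an assumption of this sketch) keeps zero counts finite.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


toy_counts = [120, 30, 0, 850]  # toy ASV counts for one sample
transformed = clr(toy_counts)
print(transformed)
```

By construction the transformed values of a sample sum to zero, which is a quick check that the transformation was applied per sample rather than per ASV.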
figures/ — Output folder for figures created in R Markdown files (figures not included; see manuscript).
tables/ — Output folder for tables created in R Markdown files (tables not included; see manuscript).
taxonomy/dada2/ — DADA2 pipeline scripts:
| File | Description |
|---|---|
| dada2_ML_infer_seq_variants_Nov2018_dc.R | ASV inference, dereplication, merging of paired reads for each library |
| dada2_ML_filtering_Nov2018_dc.R | Quality trimming, filtering raw MiSeq reads |
| dada2_ML_remove_chimera_Jan2019.R | Chimera removal |
| dada2_ML_inspect_Qscores_Nov2018.R | Quality score inspection |
taxonomy/eri_server_scripts/ — Scripts run on an external computing cluster for taxonomic assignment:
| File | Description |
|---|---|
| dada2_bayes_tax_silva.R | Bayesian classifier against SILVA database |
| dada2_bayes_tax_pr2.R | Bayesian classifier against PR2 database |
taxonomy/other_scripts/ — Scripts used for creating a taxonomic ensemble:
| File | Description |
|---|---|
| idtaxa_pr2.R | IDTAXA classification against PR2 and SILVA |
| ensemble_tax_filter_phytoid.R | Ensemble taxonomy assignment and filtering, phytoid/trophic group mapping |
| make_fasta_4_blast.R | Converts ASV table to FASTA format for BLAST |
| mock_chex.R | Mock community quality check |
src/mesocosm_combined/ — Physicochemical and Flow Cytometry Analysis
Each R Markdown file is annotated internally. Run in this order:
| File | Description |
|---|---|
| oae_mesocosm_pt1.Rmd | Data import, cleaning, and merging of carbonate chemistry and nutrient data for both experiments; irradiance, temperature, and BEUTI plots. |
| oae_mesocosm_pt2_carb_chem.Rmd | Carbonate chemistry visualization and analysis. |
| oae_mesocosm_pt3_nutrients.Rmd | Nutrient (e.g., DIN, DIP, DSi, POC, PON) visualization and statistical comparisons across treatments. |
| oae_mesocosm_pt4_fcm_spring.Rmd | Flow cytometry analysis for spring experiment (phytoplankton, bacteria, virus-like particles). |
| oae_mesocosm_pt5_fcm_fall.Rmd | Flow cytometry analysis for fall experiment. |
| oae_mesocosm_sbc_sst.Rmd | Standalone. Uses Santa Barbara Channel SST data and GSHHG coastline shapefiles. |
figures/ — Output folder for figures created in R Markdown files (figures not included; see manuscript).
tables/ — Output folder for tables created in R Markdown files (tables not included; see manuscript).
src/dsi_experiment/ — Post Hoc DSi Experiment
| File | Description |
|---|---|
| dsi_exp_test.Rmd | Analysis of the post hoc DSi leaching experiment. |
figures/ — Output folder for figures created in R Markdown file (figures not included; see manuscript).
src/R/ — Helper Scripts
These R scripts are sourced within the R Markdown files and do not need to be run independently (each is annotated internally):
| File | Description |
|---|---|
| utils.R | Shared utility functions (e.g., data formatting, custom ggplot themes, functions for descriptive statistics) |
| kw_dunn_test.R | Kruskal-Wallis and post hoc Dunn test wrapper function |
| carb_chem_plots.R | Carbonate chemistry time series plots |
| carb_chem_plots_ab.R | Carbonate chemistry time series plots for filter-sterilized carboys |
| carb_chem_differences.R | Computes treatment differences in carbonate chemistry |
| nutrient_plots.R | Nutrient time series plots |
| nutrient_differences.R | Computes treatment differences in nutrients |
| m2_fc_medmad_plots.R | Median ± MAD flow cytometry time series plots for spring experiment |
| m3_fc_medmad_plots.R | Median ± MAD flow cytometry time series plots for fall experiment |
| pom_plots_medmad.R | Median ± MAD time series plots for particulate organic matter (POC/PON), BSi, etc. |
| omega_ca_correction_factor.R | Correction factor calculation for calcite and aragonite saturation states |
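Several helper scripts above summarize replicates as median ± MAD. As a generic illustration (in Python, not the repository's R code), the sketch below computes both for a set of toy replicate values. Whether the R scripts apply a scaling constant is not stated in this README; R's `mad()` multiplies by 1.4826 by default, so the constant is exposed as a parameter here and set to 1.0 (unscaled) as an assumption.

```python
from statistics import median


def median_mad(values, constant=1.0):
    """Return (median, MAD), where MAD = constant * median(|x - median|)."""
    center = median(values)
    mad = constant * median(abs(v - center) for v in values)
    return center, mad


toy_replicates = [2, 4, 4, 5, 9]  # toy values, not data from the experiments
center, spread = median_mad(toy_replicates)
print(center, spread)  # 4 1.0
```

Passing `constant=1.4826` reproduces the scaling that R's `mad()` uses by default, which makes the MAD comparable to a standard deviation for normally distributed data.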
Acronyms
A reference list of acronyms used in this repository.
Approaches & Materials
| Acronym | Definition |
|---|---|
| LI | Limestone-Inspired |
| mCDR | Marine Carbon Dioxide Removal |
| PC | Polycarbonate |
| PP | Polypropylene |
| PS | Polystyrene |
| OAE | Ocean Alkalinity Enhancement |
Biological & Ecological
| Acronym | Definition |
|---|---|
| ASV | Amplicon Sequence Variant |
| FCM | Flow Cytometry (sometimes noted as FC) |
| VLP | Virus-Like Particle |
Chemical & Biogeochemical
| Acronym | Definition |
|---|---|
| BSi | Biogenic Silica |
| DIC | Dissolved Inorganic Carbon |
| DIN | Dissolved Inorganic Nitrogen |
| DIP | Dissolved Inorganic Phosphate |
| DSi | Dissolved Silicic Acid |
| pCO2 | Partial Pressure of CO2 |
| PIC | Particulate Inorganic Carbon |
| POC | Particulate Organic Carbon |
| PON | Particulate Organic Nitrogen |
| TA | Total Alkalinity |
Statistical
| Acronym | Definition |
|---|---|
| clr | Centered Log-Ratio |
| MAD | Median Absolute Deviation |
| MDL | Method Detection Limit |
| PERMANOVA | Permutational Multivariate Analysis of Variance |
| PERMDISP | Permutational Analysis of Multivariate Dispersion |
| RDA | Redundancy Analysis |
| SD | Standard Deviation |
Physical & Oceanographic
| Acronym | Definition |
|---|---|
| BEUTI | Biologically Effective Upwelling Transport Index |
| PAR | Photosynthetically Active Radiation |
| SBC | Santa Barbara Channel |
| SST | Sea Surface Temperature |
Databases & Software
| Acronym | Definition |
|---|---|
| DADA2 | Divisive Amplicon Denoising Algorithm 2 |
| PR2 | Protist Ribosomal Reference database |
| SILVA | Ribosomal RNA gene database (SILVA) |
Instrumentation & Organizations
| Acronym | Definition |
|---|---|
| GPS | Global Positioning System |
| GSHHG | Global Self-Consistent Hierarchical High-Resolution Geography |
| ICP-OES | Inductively Coupled Plasma Optical Emission Spectrometry |
| MODIS | Moderate Resolution Imaging Spectroradiometer |
| NASA | National Aeronautics and Space Administration |
| NOAA | National Oceanic and Atmospheric Administration |
