Turf algae redefine the chemical landscape of temperate reefs, limiting kelp forest recovery
Data files
Apr 02, 2025 version files (13.88 GB total):
- Algal_Biomass.csv (449 KB)
- Bioactive_Compounds_from_turf_extracts.csv (105.73 KB)
- Canopus_formula_summary.csv (19.33 MB)
- Canopus_Predictions.csv (20.25 MB)
- DOM_Annotations_318.csv (93 KB)
- DOM_Tissue_GNPS_quant.csv (74.61 MB)
- DOM_Tissue_meta.csv (79.32 KB)
- DOM_Tissue_MS_MS_Raw_Data.zip (13.56 GB)
- Formula_identifications.csv (4.26 MB)
- Kelp_Recruitment_Counts.csv (6.34 KB)
- Library_hits.csv (301.23 KB)
- README.md (35.29 KB)
- Settlement_Assay_All_Results.csv (12.77 KB)
- Turf_Cover.csv (3.71 KB)
- Turf_Extracts_MS_MS_Raw_Data.zip (202.40 MB)
- Unique_Library_hits.csv (487.53 KB)
Abstract
In temperate regions experiencing rapid ocean warming, kelp forests are being replaced by chemically rich turf algae. Yet, the extent to which these turf algae alter the surrounding chemical environment or impact the rebound potential of kelp forests via allelopathy remains unknown. Here, we used underwater visual surveys, comprehensive chemical profiling, and laboratory experiments to reveal that turf algae release bioactive compounds into the water that fundamentally alter the reef “chemical landscape” and directly suppress kelp recruitment. Our study, therefore, reveals that allelopathy is critical in shaping modern kelp forest ecosystems and their resilience. Further, it demonstrates that reversing climate-driven state shifts will require not only curbing global carbon emissions, but also targeted local interventions that break harmful ecological feedback loops and foster recovery.
Dataset DOI: https://doi.org/10.5061/dryad.kd51c5bhd
Description of the data and file structure
Algal_Biomass.csv
Description:
This dataset contains measurements of seaweed biomass (kelp, turf algae, and other seaweeds) collected from 1m² quadrats (0.25m² for seaweeds other than large browns; see survey_area_m2) across six study sites over three years. See “Quantifying seaweed community structure on the reef” in the methods section for detailed methods.
- Note: in this paper, “turf” communities (i.e., those composed of red, green, and/or brown filamentous and uniseriate algae, ~1 cm to ~15 cm in canopy height, that form upright filaments, mats, or tufts) can include any of the following taxa: Acrosiphonia_spp, Antithamnion_spp, Antithamnionella_spp, Audouinella_spp, Bonnemaisonia_hamifera, Carradoriella_elongata, Ceramium_spp, Cladophora_spp, Dasysiphonia_japonica, Ectocarpus_spp, Kaprunia_schneideri, Leptosiphonia_spp, Melanothamnus_harveyi, Other, Polysiphonia_stricta, Pterothamnion_plumula, Red_tufts, Rhodomela_spp, Scagelia_pylaisaei, Spermothamnion_repens, Sphacelaria_spp, Stylonema_alsidii, Vertebrata_fucoides, Vertebrata_nigra, Vertebrata_spp
- Kelps include Agarum_clathratum, Alaria_esculenta, Laminaria_digitata, Saccharina_latissima, and Laminarian_juvenile. All other taxa are either non-kelp canopy-forming species (Desmarestia_viridis and Desmarestia_aculeata) or understory species (all remaining taxa).
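The grouping rules above can be applied to the Name column with a small lookup table. This is a minimal sketch, not the authors' scripts: the function name `functional_group` and the group labels are hypothetical, and matching is case-insensitive because this README spells some names inconsistently (e.g., `Red_tufts` vs. `red_tufts`).

```python
# Taxon lists copied from the Algal_Biomass.csv description above.
TURF_TAXA = {
    "Acrosiphonia_spp", "Antithamnion_spp", "Antithamnionella_spp",
    "Audouinella_spp", "Bonnemaisonia_hamifera", "Carradoriella_elongata",
    "Ceramium_spp", "Cladophora_spp", "Dasysiphonia_japonica",
    "Ectocarpus_spp", "Kaprunia_schneideri", "Leptosiphonia_spp",
    "Melanothamnus_harveyi", "Other", "Polysiphonia_stricta",
    "Pterothamnion_plumula", "Red_tufts", "Rhodomela_spp",
    "Scagelia_pylaisaei", "Spermothamnion_repens", "Sphacelaria_spp",
    "Stylonema_alsidii", "Vertebrata_fucoides", "Vertebrata_nigra",
    "Vertebrata_spp",
}
KELP_TAXA = {
    "Agarum_clathratum", "Alaria_esculenta", "Laminaria_digitata",
    "Saccharina_latissima", "Laminarian_juvenile",
}
NON_KELP_CANOPY_TAXA = {"Desmarestia_viridis", "Desmarestia_aculeata"}

# Lower-cased copies so lookups ignore capitalization differences.
_KELP = {t.lower() for t in KELP_TAXA}
_TURF = {t.lower() for t in TURF_TAXA}
_CANOPY = {t.lower() for t in NON_KELP_CANOPY_TAXA}


def functional_group(name: str) -> str:
    """Return 'kelp', 'turf', 'non_kelp_canopy', or 'understory' for a Name value."""
    key = name.strip().lower()
    if key in _KELP:
        return "kelp"
    if key in _TURF:
        return "turf"
    if key in _CANOPY:
        return "non_kelp_canopy"
    # Per the description, every remaining taxon is an understory species.
    return "understory"
```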
Variables:
- Year – The year in which the survey took place (2021, 2022, or 2023).
- Site – The site at which the survey took place.
  - Turf-dominated sites: Halfway_Rock, Damariscove_Island, Allen_Island
  - Kelp-dominated sites: Little_Drisko_Island, Crumple_Island, Ram_Island
- Meter_Mark – The specific meter mark along the transect where the survey took place (5, 10, 15, 20, 25, 30, 35, and 40). Surveys were conducted at 4 to 6 meter marks per site.
- Name – The name of the seaweed species or complex identified (e.g., red_tufts, red_tubes). Other = indiscernible mixture of taxa that could not be teased apart.
- Wet_weight_g – The wet weight of the seaweed in grams, measured after spinning to remove excess water. 0 = true absence; 0.001 = present but below the scale's detection limit (0.01g).
- Phylum – The phylum the seaweed or complex belongs to. Other = indiscernible mixture of taxa that could not be teased apart.
- Class – The class the seaweed or complex belongs to. Other = indiscernible mixture of taxa that could not be teased apart.
- Order – The order the seaweed or complex belongs to. Other = indiscernible mixture of taxa that could not be teased apart.
- Genus – The genus the seaweed or complex belongs to if identifiable. Other = indiscernible mixture of taxa that could not be teased apart.
- Species – The species the seaweed or complex belongs to if identifiable.
- survey_area_m2 – The area (m²) in which the seaweed was surveyed (1m² for large brown seaweeds, 0.25m² for all other seaweeds).
- m2_weight_g – Standardized biomass per 1m². Since some seaweeds were surveyed in 0.25m², those values were multiplied by 4 to be comparable to 1m².
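The standardization described for m2_weight_g amounts to dividing wet weight by the surveyed area (equivalently, multiplying 0.25m² values by 4). The sketch below assumes the CSV columns are named exactly as listed above; `per_m2` and `quadrat_totals` are hypothetical helpers, not part of the authors' scripts. Note that the 0.001 g below-detection value is summed as-is, so it contributes 0.004 g/m² once scaled from a 0.25m² quadrat.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def per_m2(wet_weight_g: float, survey_area_m2: float) -> float:
    """Scale a quadrat's wet weight (g) to grams per 1 m2."""
    if survey_area_m2 <= 0:
        raise ValueError("survey area must be positive")
    return wet_weight_g / survey_area_m2  # a 0.25 m2 quadrat is multiplied by 4


def quadrat_totals(
    rows: Iterable[Mapping[str, str]], taxa: set
) -> dict:
    """Sum m2_weight_g over the given taxa for each (Year, Site, Meter_Mark) quadrat."""
    totals = defaultdict(float)
    for row in rows:
        if row["Name"] in taxa:
            key = (row["Year"], row["Site"], row["Meter_Mark"])
            totals[key] += float(row["m2_weight_g"])
    return dict(totals)
```

Passing `csv.DictReader` rows from Algal_Biomass.csv with a set of turf taxa would give turf biomass per quadrat; quadrats where none of the taxa were recorded are absent from the result rather than reported as 0.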
Kelp_Recruitment_Counts.csv
Description:
This dataset contains kelp recruitment counts from 1m² quadrats across six study sites over three years. See “Quantifying seaweed community structure on the reef” in the methods section for detailed methods.
- Kelps include Agarum_clathratum, Alaria_esculenta, Laminaria_digitata, Saccharina_latissima, and Laminarian_juvenile.
Variables:
- Year – The year in which the survey took place (2021, 2022, or 2023).
- Site – The site at which the survey took place.
  - Turf-dominated sites: Halfway_Rock, Damariscove_Island, Allen_Island
  - Kelp-dominated sites: Little_Drisko_Island, Crumple_Island, Ram_Island
- Meter_Mark – The specific meter mark along the transect where the survey took place (5, 10, 15, 20, 25, 30, 35, and 40). Surveys were conducted at all 8 meter marks at each site.
- Name – Recruits are labeled “laminaria_juvenile” because at early life stages different kelp species cannot be distinguished, so all recruits are lumped together.
- Count – The number of juvenile kelp individuals per 1m².
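A quick descriptive summary of these counts is the mean recruit density per reef state and year. This is a sketch assuming Site values are spelled exactly as listed above; `reef_state` and `mean_recruits_by_state` are hypothetical names, and pooling quadrats this way ignores site-level structure, so it is not the statistical model used in the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Mapping

TURF_SITES = {"Halfway_Rock", "Damariscove_Island", "Allen_Island"}
KELP_SITES = {"Little_Drisko_Island", "Crumple_Island", "Ram_Island"}


def reef_state(site: str) -> str:
    """Map a Site value to its reef state as listed in this README."""
    if site in TURF_SITES:
        return "turf"
    if site in KELP_SITES:
        return "kelp"
    raise ValueError(f"unrecognized site: {site!r}")


def mean_recruits_by_state(rows: Iterable[Mapping[str, str]]) -> dict:
    """Mean kelp recruits per m2 for each (Year, reef state), pooling quadrats."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Year"], reef_state(row["Site"]))].append(float(row["Count"]))
    return {key: mean(values) for key, values in groups.items()}
```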
Turf_Cover.csv
Description:
This dataset contains percent cover estimates of turf algae from 0.25m² quadrats across six study sites over three years. See “Quantifying seaweed community structure on the reef” in the methods section for detailed methods.
Variables:
- Year – The year in which the survey took place (2021, 2022, or 2023).
- Site – The site at which the survey took place.
  - Turf-dominated sites: Halfway_Rock, Damariscove_Island, Allen_Island
  - Kelp-dominated sites: Little_Drisko_Island, Crumple_Island, Ram_Island
- Meter_Mark – The specific meter mark along the transect where the survey took place (5, 10, 15, 20, 25, 30, 35, and 40). Surveys were conducted at all 8 meter marks at each site.
- Percent_cover – The percent cover of all filamentous and turf algal species within a 0.25m² area. Note percent cover can be over 100% due to the multilayered nature of the seaweed community.
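Because Turf_Cover.csv and Kelp_Recruitment_Counts.csv share Year, Site, and Meter_Mark, the two can be paired quadrat by quadrat. This sketch assumes (the README does not state it outright) that both surveys refer to the same quadrat at a given meter mark and that each file has one row per quadrat; `pair_cover_with_recruits` is a hypothetical helper. Percent cover is deliberately left uncapped, since values above 100% are valid here.

```python
from typing import Iterable, List, Mapping, Tuple


def _key(row: Mapping[str, str]) -> Tuple[str, str, str]:
    # Normalize Meter_Mark so "5" and "5.0" refer to the same quadrat.
    return (row["Year"].strip(), row["Site"].strip(), str(int(float(row["Meter_Mark"]))))


def pair_cover_with_recruits(
    cover_rows: Iterable[Mapping[str, str]],
    recruit_rows: Iterable[Mapping[str, str]],
) -> List[Tuple[float, float]]:
    """Return (Percent_cover, Count) pairs for quadrats present in both files."""
    cover = {_key(r): float(r["Percent_cover"]) for r in cover_rows}
    pairs = []
    for r in recruit_rows:
        key = _key(r)
        if key in cover:
            # Not capped at 100: the turf canopy is multilayered.
            pairs.append((cover[key], float(r["Count"])))
    return pairs
```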
DOM_Tissue_MS_MS_Raw_Data.zip
Description:
This dataset contains centroided mzML files generated from raw LC-MS/MS data. A total of 562 samples were included in this mass spectrometry run. All samples except instrument blanks and QC mix samples were included in the mzmine, GNPS, and Sirius jobs; however, our main analysis uses only a subset of samples of interest (see the metadata file). For methods on the collection of DOM and seaweed tissue samples, see “Dissolved organic matter and seaweed tissue collections”; for LC-MS/MS parameters, see “Generating MS/MS Data”. Please refer to the code (Scripts 1-6) to determine which samples were included in the analysis. To view the mzML data, we suggest mzmine or the GNPS Dashboard (https://dashboard.gnps2.org).
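Since the archive is 13.56 GB and only a subset of samples is used, it can help to extract just the files of interest. A minimal sketch using Python's `zipfile`, assuming the `filename` values in DOM_Tissue_meta.csv match the archive's mzML file names with or without the `.mzML` extension (the internal folder layout of the zip is not described here); `select_members` and `extract_subset` are hypothetical helpers.

```python
import zipfile
from pathlib import Path, PurePosixPath
from typing import Iterable, List, Union


def _base(name: str) -> str:
    """File name without directories or a trailing .mzML extension."""
    base = PurePosixPath(name).name
    return base[: -len(".mzml")] if base.lower().endswith(".mzml") else base


def select_members(member_names: Iterable[str], wanted: Iterable[str]) -> List[str]:
    """Archive members that are mzML files whose base name is in `wanted`."""
    targets = {_base(w) for w in wanted}
    return [m for m in member_names if m.lower().endswith(".mzml") and _base(m) in targets]


def extract_subset(
    zip_path: Union[str, Path], wanted: Iterable[str], out_dir: Union[str, Path]
) -> List[str]:
    """Extract only the selected mzML files instead of the whole archive."""
    with zipfile.ZipFile(zip_path) as zf:
        members = select_members(zf.namelist(), wanted)
        for member in members:
            zf.extract(member, out_dir)
    return members
```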
DOM_Tissue_meta.csv
Description:
This dataset contains metadata for the DOM and algal tissue samples used in the GNPS analysis (i.e., the compound annotation library), including information on sample collection, site details, and processing. To increase annotation rates within GNPS, the raw MS/MS dataset includes samples from multiple studies (i.e., multiple seasons and ecosystems). However, the core analyses in the paper focus on a subset of samples specific to our questions of interest. Specifically, we filtered the dataset to include only “Core Water” (Core_w) and “Core_tissue” (Core_t) samples from the oceanographic summer (Late_Summer) at our main study sites (Halfway_Rock, Damariscove_Island, Allen_Island, Little_Drisko, Crumple_Island, Ram_Island), and, for DOM, only samples collected from the benthos (UWWCD). Each site had a DOM extraction blank used for blank subtraction in downstream analyses (see code for details). N/A indicates that a metadata field is not applicable to the sample.
Variables:
- filename – A unique name used to label the sample and corresponding LC-MS/MS file.
- ATTRIBUTE_Tray – The tray or batch the sample was run on (only recorded for DOM water samples).
- ATTRIBUTE_Injection_Order – The order in which the samples were run (only recorded for DOM water samples).
- ATTRIBUTE_Date_Collected – The date the sample was collected, formatted as Month_Day_Year.
- ATTRIBUTE_Site – The site where the sample was collected.
  - Sites included in this study: Halfway_Rock, Damariscove_Island, Allen_Island, Little_Drisko, Crumple_Island, Ram_Island.
  - Sites with combined names: procedural blanks from SPE filtering for a separate longitudinal study (not applicable here).
  - Isle_au_Haut and Damariscove Lake were not included in this study.
- ATTRIBUTE_Bottle_Number – The physical collection bottle number (recorded for DOM samples).
- ATTRIBUTE_Sample_Device – The device used for DOM collection:
  - UWWCD – Benthic samples collected via the benthic organic matter samplers.
  - Bottle – Midwater samples.
- ATTRIBUTE_mL_filtered – The volume of water (mL) filtered through SPE for DOM extraction.
- ATTRIBUTE_HPLC_number – The HPLC vial number in which the DOM SPE was eluted.
- ATTRIBUTE_Meter – The meter mark on the transect where the water sample was collected, linking visual surveys to water collections.
- ATTRIBUTE_Quad_Rep – The quadrat replicate number at the site (1-6) (only recorded for DOM samples).
- ATTRIBUTE_Replicate – The replicate number for DOM collected for a timeseries (not used in this study).
- ATTRIBUTE_Sample_Type – The type of sample collected:
  - Water – DOM water samples.
  - Tissue – Algal tissue samples.
  - Blanks – Procedural blanks (1000 mL of LC-MS grade water filtered through SPE).
  - Pooled_QC – Quality control DOM pooled samples.
  - Tissue_Pooled_QC – Pooled quality control for tissue samples.
- ATTRIBUTE_Month – The month the sample was collected.
- ATTRIBUTE_Season – The season in which the sample was collected:
  - Early_Season – April/May.
  - Late_Season – August/September.
  - Blank, Pooled_QC, Tissue_Pooled_QC – Quality control samples.
  - Not included in this study: Timeseries_Water, Timeseries_Tissue, Lake samples.
- ATTRIBUTE_Phase_State – The ecosystem state in which the sample was collected:
  - Turf – Turf-dominated sites.
  - Kelp – Kelp-dominated sites.
  - Blank, Pooled_QC, Tissue_Pooled_QC – Quality control samples.
  - Not included in this study: Timeseries_Water, Timeseries_Tissue, Lake samples.
- ATTRIBUTE_Functional_Group – For tissue samples, the broad functional group of the algae.
- ATTRIBUTE_Color – For tissue samples, the major algal group (Brown, Red, or Green).
- ATTRIBUTE_Species – For tissue samples, the genus/species of the algae.
- ATTRIBUTE_Tissue_Study – Used to subset the data for this study:
  - Core_Water, Core_Tissue, Pooled_QC, Tissue_Pooled_QC – Included in this study.
  - Lake, Timeseries_Water, Timeseries_Tissue – Not included in this study.
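The sample selection described above can be expressed as a row filter on this metadata file. This is a sketch only: the category codes are taken from the Variables list (Core_Water, Core_Tissue, Late_Season), whereas the Description uses Core_w, Core_t, and Late_Summer, so check which spelling the CSV actually uses. `is_core_sample` is a hypothetical name; blanks and pooled QCs are not selected here and are handled in the authors' code.

```python
from typing import Mapping

# Codes as given in the Variables list; verify against the CSV before use.
CORE_STUDIES = {"Core_Water", "Core_Tissue"}
CORE_SEASON = "Late_Season"
CORE_SITES = {
    "Halfway_Rock", "Damariscove_Island", "Allen_Island",
    "Little_Drisko", "Crumple_Island", "Ram_Island",
}


def is_core_sample(row: Mapping[str, str]) -> bool:
    """True for late-season core water/tissue samples at the six study sites."""
    if row["ATTRIBUTE_Tissue_Study"] not in CORE_STUDIES:
        return False
    if row["ATTRIBUTE_Season"] != CORE_SEASON:
        return False
    if row["ATTRIBUTE_Site"] not in CORE_SITES:
        return False
    # DOM water samples must come from the benthic samplers, not midwater bottles.
    if row["ATTRIBUTE_Sample_Type"] == "Water":
        return row["ATTRIBUTE_Sample_Device"] == "UWWCD"
    return True
```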
DOM_Tissue_GNPS_quant.csv
Description:
This dataset is an output file from mzmine. It is structured as a Feature × Sample matrix, with features as rows and samples as columns. We used this table to compare the metabolomes of kelp-dominated and turf-dominated reefs. For detailed mzmine information, see “Generating the GNPS Quantification table using mzmine3”.
Variables:
- row ID – A unique identifier assigned to each feature (or row) detected in the mass spectrometry analysis. Represents a specific molecular feature, such as a compound or ion, detected at a particular retention time and m/z value.
- row m/z – The mass-to-charge ratio (m/z) of the detected ion or compound, used to identify and differentiate between compounds based on their mass and charge in the mass spectrometer.
- row retention time – The retention time (in minutes) at which the feature eluted during chromatographic separation.
- row ion mobility – An unused relic of MZmine for which no relevant information is available. All cells blank and not applicable.
- row ion mobility unit – An unused relic of MZmine for which no relevant information is available. All cells blank and not applicable.
- row CCS – An unused relic of MZmine for which no relevant information is available. All cells blank and not applicable.
- correlation group ID – An unused relic of MZmine for which no relevant information is available. All cells blank and not applicable.
- annotation network number – An unused relic of MZmine for which no relevant information is available. All cells blank and not applicable.
- best ion – An unused relic of MZmine for which no relevant information is available. All cells blank and not applicable.
- auto MS2 verify – An unused relic of MZmine for which no relevant information is available. All cells blank and not applicable.
- identified by n= – An unused relic of MZmine for which no relevant information is available. All cells blank and not applicable.
- partners – An unused relic of MZmine for which no relevant information is available. All cells blank and not applicable.
- neutral M mass – An unused relic of MZmine for which no relevant information is available. All cells blank and not applicable.
- Columns N-UY – Each column represents a unique sample, from either water (DOM) or algal tissue.
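To work with the quantification table, the feature-descriptor and unused columns can be separated from the per-sample intensity columns. A minimal sketch, assuming the header names match those listed above and that empty cells mean the feature was not detected (treated as 0); `sample_columns` and `feature_matrix` are hypothetical helpers, and the sample column names in the test are invented.

```python
from typing import Iterable, List, Mapping, Sequence

FEATURE_COLUMNS = ("row ID", "row m/z", "row retention time")
UNUSED_COLUMNS = {
    "row ion mobility", "row ion mobility unit", "row CCS",
    "correlation group ID", "annotation network number", "best ion",
    "auto MS2 verify", "identified by n=", "partners", "neutral M mass",
}


def sample_columns(header: Sequence[str]) -> List[str]:
    """Header names of the per-sample columns (columns N-UY in the file)."""
    skip = set(FEATURE_COLUMNS) | UNUSED_COLUMNS
    # Blank names are dropped in case the export ends with a trailing delimiter.
    return [c for c in header if c.strip() and c not in skip]


def feature_matrix(rows: Iterable[Mapping[str, str]], samples: Sequence[str]) -> dict:
    """Map each feature's row ID to its intensity in every requested sample."""
    matrix = {}
    for row in rows:
        # Empty cells are treated as 0 (assumed: feature not detected).
        matrix[int(row["row ID"])] = {s: float(row[s] or 0.0) for s in samples}
    return matrix
```

With `csv.DictReader`, the reader's `fieldnames` can be passed to `sample_columns`, and the result intersected with the sample filenames selected from DOM_Tissue_meta.csv.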
Canopus_Predictions.csv
Description: This dataset is an output file from the program Sirius, used for predicting molecular formulas, structures, and compound classifications based on mass spectrometry data. We used it to examine how the reef metabolome changed (at the chemical superclass level) on turf-dominated vs. kelp-dominated reefs. For detailed information on the Sirius parameters, see “Generating the formula identification and canopus predictions from SIRIUS.” N/A indicates unknown predictions.
Variables:
- id – The unique identifier for each mass spectrometry scan from Sirius. The last number in this string corresponds to a specific molecular feature detected during the mass spectrometry analysis (i.e., #Scan#).
- molecularFormula – The molecular formula representing the composition of atoms (e.g., C, H, O, N) in the detected compound, based on mass spectrometric data and computational prediction.
- adduct – The type of adduct ion (e.g., [M+H]+, [M-H]−) that was formed during mass spectrometry, indicating how the molecule was ionized.
- NPC#pathway – The predicted biosynthetic pathway of the compound, as classified by NPClassifier (NPC), a natural product classification tool.
- NPC#pathway Probability – The probability of that prediction.
- NPC#superclass – The NPClassifier superclass, an intermediate level between pathway and class that groups compounds into high-level categories based on chemical structure.
- NPC#superclass Probability – The probability of that prediction.
- NPC#class – The most specific NPClassifier level, grouping compounds into functional or structural classes.
- NPC#class Probability – The probability of that prediction.
- ClassyFire#most specific class – The most specific class assigned by ClassyFire, a hierarchical chemical classification system. This represents the most granular level of classification for the compound.
- ClassyFire#most specific class Probability – The probability of that prediction.
- ClassyFire#level 5 – A higher-level (broader) classification in the ClassyFire taxonomy, showing where the compound sits within the hierarchical classification (e.g., moving one level up from “most specific class”).
- ClassyFire#level 5 Probability – The probability of that prediction.
- ClassyFire#subclass – A subclass level of classification from the ClassyFire taxonomy. This groups related compounds within a subclass based on structural characteristics.
- ClassyFire#subclass Probability – The probability of that prediction.
- ClassyFire#class – The class level from the ClassyFire taxonomy, representing a broader group of compounds that share core structural characteristics.
- ClassyFire#class Probability – The probability of that prediction.
- ClassyFire#superclass – The second level of the ClassyFire taxonomy (directly below kingdom), grouping compounds into broad chemical families.
- ClassyFire#superclass probability – The probability of that prediction.
- ClassyFire#all classifications – A string containing all classification levels assigned to the compound.
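The id column encodes the feature number as its last component, so Sirius predictions can be linked back to features in the other tables, and superclass predictions can be tallied above a confidence cut-off. A sketch assuming the id components are underscore-separated (the README says only that the last number is the feature); the 0.7 probability threshold and the function names are hypothetical, not the values used in the paper.

```python
from collections import Counter
from typing import Iterable, Mapping

MISSING = {"", "N/A", "NA"}


def scan_from_id(sirius_id: str) -> int:
    """Feature number encoded as the last underscore-separated token of `id`."""
    return int(sirius_id.rsplit("_", 1)[-1])


def superclass_counts(rows: Iterable[Mapping[str, str]], min_prob: float = 0.7) -> Counter:
    """Count NPC#superclass labels whose prediction probability is >= min_prob."""
    counts = Counter()
    for row in rows:
        label = (row.get("NPC#superclass") or "").strip()
        prob = (row.get("NPC#superclass Probability") or "").strip()
        if label in MISSING or prob in MISSING:
            continue  # unknown prediction
        if float(prob) >= min_prob:
            counts[label] += 1
    return counts
```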
Formula_identifications.csv
Description:
This dataset is an output file from Sirius, used for predicting molecular formulas based on mass spectrometry data. For detailed information on the Sirius parameters see “Generating the formula identification and canopus predictions from SIRIUS.” N/A indicates unknown predictions.
Variables:
- rank – The ranking of the molecular formula based on the Sirius scoring system.
- molecularFormula – The molecular formula of the compound identified, including elemental composition (e.g., C, H, O, N).
- adduct – The adduct ion associated with the compound (e.g., [M+Na]+, [M+H]+).
- precursorFormula – The molecular formula of the precursor ion used in the analysis.
- SiriusScore – The overall score assigned by Sirius based on the confidence of the molecular formula identification.
- TreeScore – A sub-score reflecting the quality of the fragmentation tree for the molecular formula.
- IsotopeScore – A score reflecting the match between the observed and theoretical isotopic patterns of the compound.
- numExplainedPeaks – The number of peaks in the mass spectrum that are explained by the proposed molecular formula.
- explainedIntensity – The proportion of the spectral intensity explained by the proposed molecular formula.
- medianMass – The median m/z value of the peaks considered for formula identification.
- medianAbsoluteError – The median absolute error between measured and theoretical masses for identified peaks.
- massErrorPrecursor – The mass error (in ppm) of the precursor ion compared to the proposed molecular formula.
- lipidClass – The lipid class or subclass assigned to the identified molecular formula, if applicable.
- ionMass – The measured mass-to-charge ratio (m/z) of the ion.
- retentionTime – The retention time of the compound in the chromatographic separation.
- id – A unique identifier assigned to each Sirius formula identification result.
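massErrorPrecursor is expressed in ppm. One common definition of ppm mass error is (measured m/z − theoretical m/z) / theoretical m/z × 10⁶; the sign convention Sirius uses is not stated here, so comparing magnitudes is safest. The sketch also keeps the best candidate per id, assuming a lower rank means a better formula (rank 1 = top hit). `ppm_error` and `top_ranked` are hypothetical helpers.

```python
from typing import Iterable, Mapping


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million relative to the theoretical m/z."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def top_ranked(rows: Iterable[Mapping[str, str]]) -> dict:
    """Keep the lowest-rank (best) formula row for each id."""
    best = {}
    for row in rows:
        current = best.get(row["id"])
        if current is None or int(row["rank"]) < int(current["rank"]):
            best[row["id"]] = row
    return best
```

For example, an ion measured at m/z 200.001 against a theoretical m/z of 200.000 has an error of about 5 ppm.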
Canopus_formula_summary.csv
Description:
This dataset is an output file from the program Sirius using the CSI:FingerID module, used for predicting molecular formulas, structures, and compound classifications based on mass spectrometry data. We used it to examine how the reef metabolome changed (at the chemical superclass level) on turf-dominated vs. kelp-dominated reefs. For detailed information on the Sirius parameters, see “Generating the formula identification and canopus predictions from SIRIUS.” N/A indicates unknown predictions.
Variables:
- id – The unique identifier for each mass spectrometry scan from Sirius. The last number in this string corresponds to a specific molecular feature detected during the mass spectrometry analysis.
- molecularFormula – The molecular formula representing the composition of atoms (e.g., C, H, O, N) in the detected compound, based on mass spectrometric data and computational prediction.
- adduct – The type of adduct ion (e.g., [M+H]+, [M-H]−) that was formed during mass spectrometry, indicating how the molecule was ionized.
- NPC#pathway – The predicted biosynthetic pathway of the compound, as classified by NPClassifier (NPC), a natural product classification tool.
- NPC#pathway Probability – The probability of that prediction.
- NPC#superclass – The NPClassifier superclass, an intermediate level between pathway and class that groups compounds into high-level categories based on chemical structure.
- NPC#superclass Probability – The probability of that prediction.
- NPC#class – The most specific NPClassifier level, grouping compounds into functional or structural classes.
- NPC#class Probability – The probability of that prediction.
- ClassyFire#most specific class – The most specific class assigned by ClassyFire, a hierarchical chemical classification system. This represents the most granular level of classification for the compound.
- ClassyFire#most specific class Probability – The probability of that prediction.
- ClassyFire#level 5 – A higher-level (broader) classification in the ClassyFire taxonomy, showing where the compound sits within the hierarchical classification (e.g., moving one level up from “most specific class”).
- ClassyFire#level 5 Probability – The probability of that prediction.
- ClassyFire#subclass – A subclass level of classification from the ClassyFire taxonomy. This groups related compounds within a subclass based on structural characteristics.
- ClassyFire#subclass Probability – The probability of that prediction.
- ClassyFire#class – The class level from the ClassyFire taxonomy, representing a broader group of compounds that share core structural characteristics.
- ClassyFire#class Probability – The probability of that prediction.
- ClassyFire#superclass – The second level of the ClassyFire taxonomy (directly below kingdom), grouping compounds into broad chemical families.
- ClassyFire#superclass probability – The probability of that prediction.
- ClassyFire#all classifications – A string containing all classification levels assigned to the compound.
Library_hits.csv
Description: This dataset is the annotation table from the GNPS job for DOM and algal tissue samples. It contains information on identified compounds based on spectral matching against the GNPS library. This file is a condensed version (unnecessary columns removed) of the library-hits table that can be downloaded from GNPS. The job results can be accessed at: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=bee692af684e4097b83bde2f8e12ab10
Variables:
- Compound_Name – Name of the identified compound (Level 2 ID).
- Adduct – The ionized form of the molecule detected during mass spectrometry, indicating how the molecule was ionized (e.g., M+H indicates a proton was added).
- Precursor_MZ – The mass-to-charge ratio (m/z) of the precursor ion. This represents the molecular ion or the ionized version of the compound selected for fragmentation in the mass spectrometer.
- #Scan# – The scan number corresponding to the specific scan in which the precursor ion was detected by the mass spectrometer.
- MQScore – The Match Quality Score, which indicates the quality of the match between the query spectrum and the reference spectrum from the GNPS library. It ranges from 0 to 1, with higher scores indicating better matches.
- TIC_Query – The Total Ion Current (TIC) of the query spectrum, reflecting the total signal intensity of the ions in the mass spectrum.
- RT_Query – The retention time (in minutes) of the query ion during chromatography, representing when the compound was detected during the run.
- MZErrorPPM – The mass error in parts per million (PPM) between the observed m/z and the expected m/z for the precursor ion. Lower values indicate higher accuracy in the measurement.
- SharedPeaks – The number of peaks shared between the query spectrum and the library spectrum, indicating how many peaks match between the two spectra.
- MassDiff – The difference in mass between the precursor ion in the query spectrum and the library spectrum, typically expressed in Daltons (Da).
- LibMZ – The m/z value of the precursor ion in the GNPS library spectrum that matches the query spectrum.
- SpecMZ – The m/z value of the precursor ion in the query spectrum itself.
- MoleculeExplorerDatasets – Datasets associated with the Molecule Explorer tool.
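The quality columns above (MQScore, SharedPeaks, MZErrorPPM) can be combined into a simple confidence filter. The default thresholds below are illustrative only and are not necessarily those used in the GNPS job (check the job parameters at the link above); `confident_hits` is a hypothetical helper.

```python
from typing import Iterable, List, Mapping


def confident_hits(
    rows: Iterable[Mapping[str, str]],
    min_score: float = 0.7,
    min_shared_peaks: int = 6,
    max_abs_ppm: float = 10.0,
) -> List[Mapping[str, str]]:
    """Library hits passing illustrative MQScore, SharedPeaks, and ppm cut-offs."""
    kept = []
    for row in rows:
        if (
            float(row["MQScore"]) >= min_score
            and int(float(row["SharedPeaks"])) >= min_shared_peaks
            and abs(float(row["MZErrorPPM"])) <= max_abs_ppm
        ):
            kept.append(row)
    return kept
```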
Unique_Library_hits.csv
Description:
This dataset is the annotation table from the GNPS job for DOM and algal tissue samples. It contains only unique compound annotations (i.e., no duplicate compound names). The results of the job (same as above) can be accessed at: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=bee692af684e4097b83bde2f8e12ab10
N/A indicates that the information was not provided or is unknown for the GNPS job.
Variables:
- SpectrumID – Unique identifier for each spectrum in the GNPS database.
- Compound_Name – Name of the compound identified in the GNPS library.
- Ion_Source – Type of ionization source used in mass spectrometry (e.g., ESI, APCI).
- Instrument – Type of mass spectrometer used for data collection.
- Compound_Source – The origin of the compound (e.g., natural, synthetic, commercial).
- PI – Principal investigator responsible for the dataset.
- Data_Collector – The individual or lab that collected the spectral data.
- Adduct – Specifies the ion adduct (e.g., [M+H]+, [M-H]-) detected in the mass spectrometry data.
- Precursor_MZ – The mass-to-charge ratio (m/z) of the precursor ion selected for fragmentation.
- ExactMass – The exact monoisotopic mass of the identified compound.
- Charge – The charge state of the ion.
- CAS_Number – CAS registry number for the identified compound, if available.
- Pubmed_ID – PubMed reference associated with the compound, if available.
- Smiles – Simplified Molecular Input Line Entry System (SMILES) representation of the compound’s structure.
- INCHI – International Chemical Identifier (InChI) for the compound.
- INCHI_AUX – Auxiliary information for the InChI identifier.
- Library_Class – Classification of the reference library (e.g., GNPS Gold, in-house libraries).
- IonMode – Indicates whether the spectrum was acquired in positive or negative ion mode.
- UpdateWorkflowName – Name of the workflow used to process the dataset.
- LibraryQualityString – String indicating the quality of the spectral match.
- #Scan# – Unique scan number for the spectrum in the GNPS dataset.
- SpectrumFile – The file from which the spectrum was extracted.
- LibraryName – Name of the spectral library used for matching.
- MQScore – Matching quality score indicating how well the query spectrum matches the library.
- Organism – The biological source of the compound (if applicable).
- TIC_Query – Total ion current (TIC) of the queried spectrum.
- RT_Query – Retention time (RT in minutes) of the spectrum query during chromatography.
- MZErrorPPM – Mass measurement error reported in parts per million (ppm).
- SharedPeaks – Number of shared peaks between the query spectrum and the library spectrum.
- MassDiff – Mass difference between the library spectrum and the query spectrum.
- LibMZ – Mass-to-charge ratio of the matched library spectrum.
- SpecMZ – Mass-to-charge ratio of the query spectrum.
- SpecCharge – Charge state of the queried spectrum.
- FileScanUniqueID – Unique identifier for each spectrum in the file.
- NumberHits – Number of matching hits for the query spectrum.
- tags – Tags or annotations associated with the spectrum.
- MoleculeExplorerDatasets – Datasets associated with the Molecule Explorer tool.
- MoleculeExplorerFiles – Files associated with the Molecule Explorer tool.
- InChIKey – A hashed version of the InChI identifier for database searching.
- InChIKey-Planar – The planar form of the InChIKey used for structural comparison.
- superclass – Broad chemical classification of the compound (e.g., Alkaloids, Benzenoids).
- class – More specific chemical class of the compound (e.g., Fatty Acids, Phenylalanine).
- subclass – Subdivision within the chemical class (e.g., Flavonoids, Alkaloids).
- npclassifier_superclass – Superclass classification from the Natural Products Classifier.
- npclassifier_class – Class classification from the Natural Products Classifier.
- npclassifier_pathway – Predicted biosynthetic pathway from the Natural Products Classifier.
- InternalFilename – Internal filename used to store the processed spectral data.
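One way a table with no duplicate compound names could be derived from the full library hits is to keep, for each name, the hit with the highest MQScore. This is a sketch of that idea only; the authors' actual procedure may differ, and `unique_by_name` is a hypothetical helper (names differing only in surrounding whitespace are merged).

```python
from typing import Iterable, List, Mapping


def unique_by_name(rows: Iterable[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Keep the highest-MQScore hit for each distinct Compound_Name."""
    best = {}
    for row in rows:
        name = row["Compound_Name"].strip()
        if name not in best or float(row["MQScore"]) > float(best[name]["MQScore"]):
            best[name] = row
    return list(best.values())
```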
DOM_Annotations_318.csv
Description:
This dataset is the annotation table from the GNPS job for DOM and algal tissue samples. It contains only the unique compound annotations from the dissolved organic matter collected on kelp- and turf-dominated reefs; annotations from the algal tissues are not included. We explicitly reference these 318 annotations in the text. N/A indicates that the information was not provided or is unknown for the GNPS job.
Variables:
- #Scan# – Unique scan number for the spectrum in the GNPS dataset.
- mz – The mass-to-charge ratio (m/z).
- RT – The retention time (in minutes).
- Compound_Name – Name of the compound identified in the GNPS library.
- SpectrumID – Unique identifier for each spectrum in the GNPS database.
- Adduct – Specifies the ion adduct (e.g., [M+H]+, [M-H]-) detected in the mass spectrometry data.
- Charge – The charge state of the ion.
- Smiles – Simplified Molecular Input Line Entry System (SMILES) representation of the compound’s structure.
- MQScore – Matching quality score indicating how well the query spectrum matches the library.
- MZErrorPPM – Mass measurement error reported in parts per million (ppm).
- SharedPeaks – Number of shared peaks between the query spectrum and the library spectrum.
- MassDiff – Mass difference between the library spectrum and the query spectrum.
- LibMZ – Mass-to-charge ratio of the matched library spectrum.
- SpecMZ – Mass-to-charge ratio of the query spectrum.
- InChIKey – A hashed version of the InChI identifier for database searching.
- superclass – Broad chemical classification of the compound (e.g., Alkaloids, Benzenoids).
- class – More specific chemical class of the compound (e.g., Fatty Acids, Phenylalanine).
- subclass – Subdivision within the chemical class (e.g., Flavonoids, Alkaloids).
- npclassifier_superclass – Superclass classification from the Natural Products Classifier.
- npclassifier_class – Class classification from the Natural Products Classifier.
- npclassifier_pathway – Predicted biosynthetic pathway from the Natural Products Classifier.
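To see how each annotated compound varies across samples, the annotations can be linked to the intensities in DOM_Tissue_GNPS_quant.csv. This sketch assumes, as is typical for feature-based molecular networking, that #Scan# equals the mzmine feature row ID; `link_annotations` and the `matrix` layout ({row ID: {sample: intensity}}) are hypothetical.

```python
from typing import Iterable, Mapping


def link_annotations(
    annotations: Iterable[Mapping[str, str]],
    matrix: Mapping[int, Mapping[str, float]],
) -> dict:
    """Pair each annotated feature's Compound_Name with its per-sample intensities."""
    linked = {}
    for row in annotations:
        feature_id = int(row["#Scan#"])  # assumed to equal the quant-table row ID
        if feature_id in matrix:  # features missing from the table are skipped
            linked[feature_id] = (row["Compound_Name"], matrix[feature_id])
    return linked
```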
Settlement_Assay_All_Results.csv
Description:
This dataset contains results from kelp gametophyte settlement assays. See methods section “Collecting seaweed, generating extracts, and conducting turf vs. gametophyte allelopathy trials” for detailed information.
Variables:
- Well – Replicate within the settlement plate (1-10).
- Total – Total number of gametophytes in the well at the end of 96 h.
- Dead – Number of dead gametophytes at the end of 96 h, based on dead-cell fluorescence uptake.
- Alive – Number of live gametophytes (no dead-cell fluorescence uptake) at the end of 96 h.
- Dead_Proportion – Proportion of gametophytes dead (Dead/Total).
- Live_Proportion – Proportion of gametophytes alive (Alive/Total).
- Treatment – Name of the treatment or control in the assay.
  - For algae treatments: Ceramium, Vertebrata, DJ (D. japonica), M. harveyi, P. stricta, or control.
  - For the DOM assay (site-based):
    - Turf-dominated sites: Damariscove, Halfway Rock, Allen.
    - Kelp-dominated sites: Little Drisko, Crumple, Ram.
    - Controls for each site: labeled as site name_control.
- Experiment – Type of experiment conducted.
  - Turf_DOM – Dissolved organic matter from turf sites.
  - Kelp_DOM – Dissolved organic matter from kelp sites.
  - Surface extracts – Surface extracts from five dominant turf species.
  - Water_Borne – Water-soluble exudates released from the five most abundant turf algal species.
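Dead_Proportion is Dead / Total for each well, and treatments can be summarized by averaging those per-well proportions. This sketch weights each well equally (pooling instead would give ΣDead / ΣTotal); it is a descriptive summary, not the statistical model used in the paper, and `dead_proportion` and `mean_dead_by_treatment` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Mapping


def dead_proportion(dead: float, total: float) -> float:
    """Dead_Proportion = Dead / Total for a single well."""
    if total <= 0:
        raise ValueError("well has no gametophytes")
    return dead / total


def mean_dead_by_treatment(rows: Iterable[Mapping[str, str]]) -> dict:
    """Mean per-well dead proportion for each (Experiment, Treatment) pair."""
    groups = defaultdict(list)
    for row in rows:
        dead, total = float(row["Dead"]), float(row["Total"])
        groups[(row["Experiment"], row["Treatment"])].append(dead_proportion(dead, total))
    return {key: mean(values) for key, values in groups.items()}
```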
Bioactive_Compounds_from_turf_extracts.csv
Description:
This dataset is the annotation table from the GNPS job for the turf extracts (waterborne and surface-bound) that significantly decreased kelp survival. It contains only the compound and analog annotations from these extracts. We explicitly reference these annotations in the text. N/A indicates that the information was not provided or is unknown for the GNPS job.
Variables:
- #Scan# – Unique scan number for the spectrum in the GNPS dataset.
- MQScore – Matching quality score indicating how well the query spectrum matches the library.
- MZErrorPPM – Mass measurement error reported in parts per million (ppm).
- SharedPeaks – Number of shared peaks between the query spectrum and the library spectrum.
- MassDiff – Mass difference between the library spectrum and the query spectrum.
- SpecMZ – Mass-to-charge ratio of the query spectrum.
- SpecCharge – The charge state of the ion.
- Compound_Name – Name of the compound identified in the GNPS library.
- Adduct – Specifies the ion adduct (e.g., [M+H]+, [M-H]-) detected in the mass spectrometry data.
- CAS_Number – CAS registry number for the identified compound, if available.
- Pubmed_ID – PubMed reference associated with the compound, if available.
- Smiles – Simplified Molecular Input Line Entry System (SMILES) representation of the compound’s structure.
- INCHI – International Chemical Identifier (InChI) for the compound.
- INCHI_AUX – Auxiliary information for the InChI identifier.
- InChIKey – A hashed version of the InChI identifier for database searching.
- InChIKey-Planar – The planar form of the InChIKey used for structural comparison.
- superclass – Broad chemical classification of the compound (e.g., Alkaloids, Benzenoids).
- class – More specific chemical class of the compound (e.g., Fatty Acids, Phenylalanine).
- subclass – Subdivision within the chemical class (e.g., Flavonoids, Alkaloids).
- npclassifier_superclass – Superclass classification from the Natural Products Classifier.
- npclassifier_class – Class classification from the Natural Products Classifier.
- npclassifier_pathway – Predicted biosynthetic pathway from the Natural Products Classifier.
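MZErrorPPM and a simple quality screen on these columns can be reproduced roughly as follows. This is a sketch: the sign convention (observed minus reference) and the threshold values in `passes_filter` are assumptions chosen for illustration, not the criteria used in the GNPS job or the manuscript.

```python
from typing import Mapping


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Signed mass error in ppm; assumed convention is (observed - reference) / reference."""
    if reference_mz <= 0:
        raise ValueError("reference m/z must be positive")
    return (observed_mz - reference_mz) / reference_mz * 1e6


def passes_filter(
    row: Mapping[str, str],
    max_abs_ppm: float = 5.0,    # hypothetical threshold
    min_shared_peaks: int = 6,   # hypothetical threshold
    min_mqscore: float = 0.7,    # hypothetical threshold
) -> bool:
    """Screen one annotation row on MZErrorPPM, SharedPeaks and MQScore.

    Rows with missing or non-numeric values (e.g. "N/A") are rejected.
    """
    try:
        ppm = abs(float(row["MZErrorPPM"]))
        shared = int(float(row["SharedPeaks"]))
        score = float(row["MQScore"])
    except (KeyError, ValueError):
        return False
    return ppm <= max_abs_ppm and shared >= min_shared_peaks and score >= min_mqscore
```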
Turf_Extracts_MS_MS_Raw_Data.zip
Description:
This dataset contains centroided mzML files generated from raw LC-MS/MS data for the surface extract of M. harveyi (the only bioactive surface extract) and the extracts of waterborne exudates from M. harveyi, Ceramium spp., D. japonica, P. stricta, and V. fucoides. For LC-MS/MS parameters, see “Generating MS/MS Data” in the methods section. Suggested programs for viewing the mzML data include MZmine and the GNPS Dashboard (https://dashboard.gnps2.org).
Metadata for code (Zenodo: 10.5281/zenodo.13912353)
For code, please go to 10.5281/zenodo.13912353
The code is contained in 6 scripts. Scripts 1-4 use the MZmine quantification file (DOM_Tissue_GNPS_quant.csv), merged with Formula_identifications.csv, Canopus_Predictions.csv, and Library_hits.csv to link our features with annotations and predictions. Blank subtraction, imputation, and normalization are all performed in script 1. In scripts 2-4, we perform numerous univariate and multivariate statistical analyses to understand the differences between the metabolomes of kelp-dominated and turf-dominated reefs (these analyses also use Formula_identifications.csv and Canopus_formula_summary.csv). In script 5, we use Turf_Cover.csv, Algal_Biomass.csv, and Kelp_Recruitment_Counts.csv to produce summary statistics from our field surveys. Lastly, in script 6, we use Settlement_Assay_All_Results.csv to assess the survival of kelp gametophytes exposed to reef DOM and seaweed exudates, relative to controls.
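The specific blank subtraction, imputation, and normalization methods used in script 1 are not described here. The sketch below shows one common way to implement each step (a blank-ratio filter, half-minimum imputation of zeros, and total-intensity normalization); the ratio of 3, the half-minimum fill, and all function names are hypothetical and may differ from what script 1 actually does.

```python
from statistics import mean
from typing import Dict, List

Table = Dict[str, Dict[str, float]]  # feature ID -> {sample column -> intensity}


def blank_filter(table: Table, sample_cols: List[str], blank_cols: List[str],
                 ratio: float = 3.0) -> Table:
    """Keep features whose mean sample intensity exceeds `ratio` x mean blank
    intensity, and drop the blank columns. The ratio of 3 is a hypothetical choice."""
    kept: Table = {}
    for fid, row in table.items():
        sample_mean = mean(row[c] for c in sample_cols)
        blank_mean = mean(row[c] for c in blank_cols) if blank_cols else 0.0
        if sample_mean > ratio * blank_mean:
            kept[fid] = {c: row[c] for c in sample_cols}
    return kept


def impute_zeros(table: Table, fraction: float = 0.5) -> Table:
    """Replace zeros with `fraction` of the smallest non-zero intensity in the table."""
    nonzero = [v for row in table.values() for v in row.values() if v > 0]
    fill = fraction * min(nonzero) if nonzero else 0.0
    return {fid: {c: (v if v > 0 else fill) for c, v in row.items()}
            for fid, row in table.items()}


def total_normalize(table: Table) -> Table:
    """Divide each intensity by its sample's summed intensity across all features."""
    if not table:
        return {}
    cols = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in cols}
    return {fid: {c: (row[c] / totals[c] if totals[c] else 0.0) for c in cols}
            for fid, row in table.items()}
```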
Quantifying seaweed community structure on the reef
Files
Algal_Biomass.csv
Kelp_Recruitment_Counts.csv
Turf_Cover.csv
At each site, SCUBA divers deployed a 40-meter transect on the reef, set perpendicular to shore and contouring the 5-7 m depth isobath (mean lower low water). Within replicate 1 m² quadrats deployed at set intervals along the transect (n = 8 per site), we visually counted the number of juvenile kelp recruits. Then, in a portion (0.25 m² area) of each quadrat, we estimated the abundance of turf algae (percent cover). To identify the species composition of the turf algae community (which can be difficult to discern underwater), we next harvested all kelps (in the 1 m² area) and all low-lying seaweeds residing under the canopy, including bladed, foliose, and turf algae species (in the 0.25 m² area) from a subset of quadrats (n = 4-6 per site). Collected seaweeds were kept cool on the boat and brought back to the lab within 6 hours, where they were sorted to species, spun (20 revolutions in a salad spinner), and weighed to estimate their biomass.
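Because kelps were harvested from the full 1 m² quadrat but understory seaweeds only from a 0.25 m² portion, biomass values need area scaling before they can be compared. A minimal sketch, assuming biomass is expressed as mass per m² and summarized per site as a mean with standard error; the constants and function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple

KELP_AREA_M2 = 1.0         # kelps harvested from the whole 1 m² quadrat
UNDERSTORY_AREA_M2 = 0.25  # bladed, foliose and turf algae harvested from a 0.25 m² portion


def density_g_per_m2(mass_g: float, area_m2: float) -> float:
    """Convert the mass harvested from a sampled area to mass per square metre."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return mass_g / area_m2


def mean_se(values: Sequence[float]) -> Tuple[float, float]:
    """Site mean and standard error across replicate quadrats (SE is NaN for n < 2)."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values)) if len(values) > 1 else float("nan")
    return m, se
```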
Dissolved organic matter and seaweed tissue collections
Files
DOM_Tissue_meta.csv
Water collection for nontargeted metabolomics
To characterize the metabolome of each site, in 2022, we collected water samples containing dissolved organic matter (DOM) and subjected them to non-targeted metabolomics (see below). We sampled all six sites (see above) within a 4-week period to avoid conflating site with seasonal effects. Water samples (1 L, n = 6 per site) were collected within the first 6 survey quadrats (see above), making them spatially linked to the survey data. Water samples were taken from within the seaweed matrix (0.5-3 cm above the reef) using a custom-designed benthic organic matter sampler (Fig. S7). These watertight cylinders, made from chemically inert HDPE and Teflon, were filled with air at the surface, and the inlets on each end were closed. Underwater, once on the benthos, both inlets were opened; air escaping from the top created suction at the bottom, drawing in water to collect the DOM. Samples were brought to the surface, stored on ice, and transported to the lab to be immediately filtered and extracted. Additional samples were collected from the benthos in May 2022 (n = 36) and from mid-water stations (1-2 meters above the reef) in both May and August 2022 (n = 72); data from these samples were used to augment our chemical reference library (i.e., to increase feature annotations and maximize annotation propagation within feature-based molecular networks) but were not included in our study or core analyses because they were not linked in space or time with our questions of interest.
Seaweed tissue collections for nontargeted metabolomics
To characterize the internal chemistry of seaweed and whether this chemistry is also found in the waterborne reef metabolome (i.e., exuded into the surrounding seawater), divers haphazardly collected and individually bagged the dominant seaweed species at each site. Care was taken to collect only clean thalli with few to no epiphytes. Seaweeds were brought aboard the boat, immediately rinsed with raw seawater, cleaned of any epiphytic organisms, placed in a precleaned and muffled 20 mL scintillation vial, and placed on dry ice to stop metabolic activity.
Generating MS/MS Data
Files
DOM_Tissue_MS_MS_Raw_Data.zip
Turf_Extracts_MS_MS_Raw_Data.zip
Sample preparation for UHPLC-MS/MS
Reef metabolome (i.e., DOM) extracts were re-dissolved in 100 µL methanol (LC-MS grade) and 1% formic acid (LC-MS grade). Two standards were created to account for instrument drift and batch effects: an internal positive control containing six synthesized compounds, and a pooled standard made by combining 1 µL from each of 50 DOM samples. Seaweed tissue extracts were re-dissolved in methanol (LC-MS grade) and 1% formic acid (LC-MS grade) and diluted to 50 mg/mL. For seaweed chemical analysis, the same internal positive control was used, while a seaweed tissue pooled standard was created by combining 1 µL from each of 50 seaweed tissue extracts. Samples were randomized before UHPLC-MS/MS analysis.
UHPLC-MS/MS
We subjected our water and seaweed tissue samples to UHPLC-MS/MS using previously developed methods (64). For chromatographic separation, we used a C18 core-shell column (Kinetex, 150 × 2.1 mm, 1.8 µm particle size, 100 Å pore size, Phenomenex, Torrance, USA) with a flow rate of 0.5 mL/minute (Solvent A: H2O + 0.1% formic acid (FA); Solvent B: acetonitrile (ACN) + 0.1% FA). After injection, samples were eluted with the following linear gradient: 0-0.5 minutes, 5% B; 0.5-8 minutes, 5-50% B; 8-10 minutes, 50-99% B; followed by a 3-minute washout phase at 99% B and a 3-minute re-equilibration phase at 5% B.
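The gradient above can be written as a piecewise-linear function of time. This sketch assumes the re-equilibration follows the washout immediately, with an instantaneous switch from 99% to 5% B at 13 minutes, giving a 16-minute run; the actual transition time on the instrument is not stated.

```python
from typing import List, Tuple

# (time in minutes, %B) breakpoints. The instantaneous 99% -> 5% B step at 13 min
# and the 16 min total run time are assumptions inferred from the text.
GRADIENT: List[Tuple[float, float]] = [
    (0.0, 5.0),    # 0-0.5 min: hold at 5% B
    (0.5, 5.0),    # 0.5-8 min: ramp 5 -> 50% B
    (8.0, 50.0),   # 8-10 min: ramp 50 -> 99% B
    (10.0, 99.0),  # 10-13 min: washout at 99% B
    (13.0, 99.0),
    (13.0, 5.0),   # 13-16 min: re-equilibration at 5% B
    (16.0, 5.0),
]


def percent_b(t: float) -> float:
    """Percentage of solvent B at time t (minutes), by linear interpolation."""
    if not 0.0 <= t <= GRADIENT[-1][0]:
        raise ValueError("time outside the programmed run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t1 > t0 and t0 <= t < t1:  # skip zero-length (step) segments
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return GRADIENT[-1][1]  # t equals the final breakpoint
```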
Electrospray ionization (ESI) parameters were set as follows: gas flows were 50 L/minute for sheath, 12 L/minute for auxiliary, and 1 L/minute for sweep gas. The auxiliary gas temperature was 400°C. The spray voltage was set to 3.5 kV with the inlet capillary at 250°C. Additionally, a 50 V S-lens was applied. For the full scan (MS1) acquisition, the scan range was m/z 150-1,500 at a resolution of 120,000 at m/z 200 with one micro-scan, in positive polarity. Automatic gain control (AGC) was set to 1.0E6 with a maximum ion injection time of 100 milliseconds. MS/MS spectra were recorded in data-dependent acquisition (DDA) mode (65). In addition to the MS1 survey scan, up to 5 MS/MS scans of the most abundant ions were acquired per duty cycle at a resolution of 15,000 at m/z 200 with one micro-scan. The AGC target was set to 5.0E5 with a minimum C-trap filling of 10% for MS/MS. MS/MS precursor isolation windows were set to m/z 1. The normalized collision energy was stepped from 25 to 35 to 45%, with z = 1 as the default charge state. An apex trigger was applied to MS/MS experiments at 2-15 seconds from first occurrence. Dynamic exclusion was set to 5 seconds. Ions with unassigned charge states and isotope peaks were excluded from DDA.
UHPLC-MS/MS: extracts of turf algae exudates
To explore potential bioactive compounds in the extracts of waterborne exudates from turf algae (which caused significant kelp gametophyte mortality), we subjected each extract to UHPLC-MS/MS using the same methods described above. Post-processing followed the same procedures using MZmine3, SIRIUS, and GNPS (Data file S2).
Generating the GNPS Quantification table using mzmine3
Files
DOM_Tissue_GNPS_quant.csv
Post-processing of UHPLC-MS/MS data (MZmine3 and SIRIUS)
Before post-processing, the 6 compounds that made up the internal positive control were assessed for mass-to-charge (m/z) and retention time (RT) shifts (Figs. S8-S9). Thermo .raw files were converted to .mzXML in centroid mode using MSConvert (66). Centroided data were processed in batch mode with MZmine3 for feature extraction, characterization, and quantification (32). Noise levels for MS1 and MS2 mass detection were 2.0E5 and 1.0E3, respectively. Ion chromatograms were built with a minimum group size of 3, a group intensity threshold of 5.0E5, a minimum peak height of 1.0E6, and a relative mass tolerance of 3 ppm. Chromatographic deconvolution was performed using the local minimum resolver, with a chromatographic threshold of 80%, a minimum peak height of 1.5E6, a minimum peak top/edge ratio of 1.5, and a peak duration between 0.01 and 5 minutes. For isotope peak grouping, mass and retention time tolerances were set to 3 ppm and 0.1 minutes, respectively, with a maximum charge of 2. Extracted ion chromatograms were aligned using the join aligner with the same mass and retention time tolerances as above. Only features that contained at least 2 isotope peaks and occurred in at least 5 samples were retained. The feature list was further refined using a duplicate peak filter with mass and retention time tolerances of 3 ppm and 0.1 minutes, respectively. Finally, gap filling was performed with an intensity tolerance of 20%, a mass tolerance of 3 ppm, and a retention time tolerance of 0.1 minutes. The MZmine3 output quantification table and .mgf files were used to create a feature-based molecular network (FBMN) in GNPS (31).
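The paired 3 ppm / 0.1 minute tolerances used for isotope grouping, alignment, and duplicate filtering can be illustrated with a simplified duplicate filter that keeps the most intense feature in each cluster. This is a greedy sketch for illustration only, not MZmine3's implementation; the `Feature` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Feature:
    mz: float      # m/z
    rt: float      # retention time, minutes
    height: float  # peak height


def within_tolerance(a: Feature, b: Feature, ppm: float = 3.0, rt_tol: float = 0.1) -> bool:
    """True if two features agree within a relative m/z tolerance and an absolute RT tolerance."""
    mz_ok = abs(a.mz - b.mz) <= ppm * 1e-6 * max(a.mz, b.mz)
    return mz_ok and abs(a.rt - b.rt) <= rt_tol


def remove_duplicates(features: Iterable[Feature], ppm: float = 3.0,
                      rt_tol: float = 0.1) -> List[Feature]:
    """Greedy duplicate filter: visit features from most to least intense and drop
    any feature that falls within tolerance of one already kept."""
    kept: List[Feature] = []
    for f in sorted(features, key=lambda feat: feat.height, reverse=True):
        if not any(within_tolerance(f, k, ppm, rt_tol) for k in kept):
            kept.append(f)
    return kept
```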
Generating the formula identifications and CANOPUS compound class predictions from SIRIUS
Files
Formula_identifications.csv
Canopus_Predictions.csv
Canopus_formula_summary.csv
Post-processing using SIRIUS
SIRIUS (5.6.3) was used to annotate and predict molecular identities from the tandem mass spectrometry data (35). Using the SIRIUS module, molecular formulas were computed by comparing experimental and predicted isotope patterns and by fragmentation tree analysis of the MS2 spectra. Parameters for SIRIUS were as follows: Instrument: Orbitrap; MS/MS ppm: 5; Isotope scorer: ignore; Candidates stored: 10; Min candidates per Ion: 10; Databases used: no selections; Possible Ionizations: Pos; Tree timeout: 0; Compound timeout: 0; Use heuristics above m/z: 300; Use heuristics only above m/z: 650. In addition, in silico structure annotations were obtained with CSI:FingerID (33) using the Bio Database, while compound class annotations were obtained with CANOPUS using the NPClassifier ontology (36).
Generating molecular feature identification using GNPS
Files
Library_hits.csv
Unique_Library_hits.csv
DOM_Annotations_318.csv
Bioactive_Compounds_from_turf_extracts.csv
GNPS Jobs for both the DOM and tissue samples, along with the waterborne/surface extracts, can be found at:
DOM/Tissue
https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=bee692af684e4097b83bde2f8e12ab10
Turf Extracts
https://gnps2.org/status?task=5843f1ba09ce4157ab8f854d8e860fe5
Collecting seaweed, generating extracts, and conducting (turf seaweed vs gametophyte) allelopathy trials.
Allelopathy trials: effects of the metabolomes from kelp-dominated and turf-dominated reefs
To quantify the effects of the reef metabolome on kelp recruitment, we combined DOM collected from each study site with kelp gametophytes at natural concentrations (1.1-1.8 µg/mL) and assessed the effects of DOM on gametophyte survival, relative to controls. For each site, we combined 200 µL from each of the 6 DOM extracts (see “Water collection for nontargeted metabolomics”) into a pooled, site-level extract to account for within-site variation. Ecologically relevant concentrations (µg/mL) were calculated by dividing the dried extract mass by the total volume of water collected. DOM extracts (i.e., treatments) were resuspended in methanol, and a 0.25 µL aliquot was combined with ~20 gametophytes in a single well of a 96-well flat-bottom plate, along with 300 µL of pre-made sterile seawater (Instant Ocean, MilliQ water, and f/2 nutrients, autoclaved to prevent microbial contamination). We used sterile seawater to avoid confounding factors (such as bacterial activity, pathogens, or variation in nutrient concentrations in raw seawater) that could have obscured our ability to directly test and quantify the effects of allelopathy in this ecosystem. Controls (SPE extracts of 1 L of LC-MS grade water, filtered at the same time as the DOM samples) were resuspended in methanol, and a 0.25 µL aliquot was combined with ~20 gametophytes and 300 µL of pre-made sterile seawater (n = 10 replicate wells per treatment and control). Assays were kept in a 10°C incubator in the dark for 96 hours, after which they were scored (see below). Clonal male gametophytes were cultivated at the Aquaculture Research Institute (ARI) at the University of Maine’s Darling Marine Center.
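The concentration arithmetic can be sketched as follows: a natural concentration is the dried extract mass divided by the volume of water it was extracted from, and the stock concentration needed for a 0.25 µL aliquot to reproduce it in the well follows from a simple dilution. The sketch assumes the final well volume is 300 µL of seawater plus the aliquot; the function names are hypothetical.

```python
def natural_concentration_ug_per_ml(extract_mass_mg: float, water_volume_l: float) -> float:
    """Dried extract mass divided by the volume of water extracted (1 mg/L equals 1 µg/mL)."""
    if water_volume_l <= 0:
        raise ValueError("volume must be positive")
    return (extract_mass_mg * 1000.0) / (water_volume_l * 1000.0)  # mg -> µg, L -> mL


def stock_concentration_ug_per_ml(target_ug_per_ml: float, aliquot_ul: float = 0.25,
                                  seawater_ul: float = 300.0) -> float:
    """Stock concentration such that `aliquot_ul` added to `seawater_ul` gives the target.

    Assumes the final well volume is seawater plus aliquot.
    """
    if aliquot_ul <= 0:
        raise ValueError("aliquot volume must be positive")
    return target_ug_per_ml * (seawater_ul + aliquot_ul) / aliquot_ul
```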
Collecting turf algae for species-specific allelopathy trials
We collected the five most abundant species of turf algae (Polysiphonia stricta, Dasysiphonia japonica, Ceramium spp., Vertebrata fucoides, and Melanothamnus harveyi) for use in allelopathy trials (see below). These species comprised 83% of the total turf algae biomass we have observed across seasons and years in this ecosystem (2021-2023; unpublished data), indicating that they consistently underpin the turf algal community in this system. Collections occurred in August 2023 to coincide seasonally with DOM sampling, making our evaluation of the chemical effects of these turf algae on kelp gametophytes spatially and temporally linked to the metabolome trials (see above). For collections, seaweeds free of epiphytic organisms were haphazardly collected, kept in fresh seawater in the dark, and returned to the lab for extraction.
Generating waterborne metabolite extracts from turf algae, for allelopathy trials
To obtain waterborne turf algae exudates, 35.2 g of a given turf algae species was placed in a glass container with 1 L of artificial seawater (Instant Ocean and MilliQ water) and incubated at 12°C for 1 hour. To calculate this ecologically relevant concentration (grams of algae per volume of seawater), we first calculated the average total biomass of turf algae at our turf-dominated sites (2021-2023) and standardized this value to 100% cover. We then estimated the active space of waterborne chemistry on the reef, in terms of volume per m², to be 1 m × 1 m × 0.015 m (15 L), resulting in a concentration of 528 g algae/15 L (i.e., 35.2 g/L). The conditioned water was then filtered through a GF/C filter (1.2 µm, 120 mm diameter) and split into two 500 mL portions to avoid saturating the SPE cartridges. Each filtrate was then extracted on a 0.2 g bed mass PPL SPE cartridge (as described above). Each cartridge was eluted with 2 mL of methanol into a 20 mL scintillation vial, and the eluate was dried. The two extracts were resuspended in 2 mL of methanol, combined, transferred to a pre-weighed 2 mL HPLC vial, and dried under vacuum.
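The loading calculation above can be checked with a few lines of arithmetic. The linear scaling to 100% cover in `standardize_to_full_cover` is an assumption about how that standardization was done; the 0.015 m active-space depth and the 528 g/m² figure are taken from the text, and the function names are hypothetical.

```python
def standardize_to_full_cover(mean_biomass_g_per_m2: float, mean_percent_cover: float) -> float:
    """Scale mean biomass to 100% cover, assuming biomass scales linearly with cover."""
    if not 0 < mean_percent_cover <= 100:
        raise ValueError("percent cover must be in (0, 100]")
    return mean_biomass_g_per_m2 * 100.0 / mean_percent_cover


def active_space_l(depth_m: float = 0.015, area_m2: float = 1.0) -> float:
    """Volume of the near-bottom water layer over `area_m2`, in litres (1 m³ = 1000 L)."""
    return area_m2 * depth_m * 1000.0


def exudate_loading_g_per_l(biomass_g_per_m2: float, depth_m: float = 0.015) -> float:
    """Grams of algae per litre of seawater to use in the incubation."""
    return biomass_g_per_m2 / active_space_l(depth_m)
```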
Generating surface-bound metabolite extracts from turf algae, for allelopathy trials
To generate turf algae surface extracts, we first determined a ratio of hexane:dichloromethane (DCM; LC-MS grade) that would extract surface-bound molecules without lysing the cell walls of turf algae, by dipping each species in increasing concentrations of DCM (0%, 2%, 4%, 6%, or 8%) for 30 seconds. The treated turf algae were stained with Evans Blue, and cell lysis was quantified using a Leica DMi8 microscope with 640 nm excitation light and a 700 nm emission filter to visualize Evans Blue fluorescence, using a low-powered (5x) objective lens and a Leica DFC9000GT sCMOS camera. Images were captured using LAS X software. Cells that had lysed took up the cell-impermeant stain. Once appropriate ratios were determined, surface-bound molecules were extracted by dipping whole thalli (n = 5 for each species) in 60 mL of the hexane/DCM mixture (percentages based on the above tests) for 30 seconds with agitation. Each thallus was then dried to remove excess water, weighed, and spread flat for imaging to determine surface area. Extracts were dried via rotary evaporation. To remove residual salt carryover, we partitioned each extract between 5 mL ethyl acetate (LC-MS grade) and 5 mL water (LC-MS grade). After partitioning, the water fraction was discarded, and the non-polar ethyl acetate fraction was saved and dried into a pre-weighed 20 mL scintillation vial. To incorporate intraspecific variation between thalli, all 5 replicate extracts per species were pooled (by combining the 5 individual extracts and drying down the pooled extract), and pooled extracts were used in the assays (see below). Ecologically relevant concentrations (mg/cm²) were calculated by dividing the dried extract mass by the 2-D surface area of the associated alga (determined via ImageJ).
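A minimal sketch of the surface-area normalization, plus one possible reading of how the DCM ratio was chosen (the highest tested DCM percentage that caused no lysis). Both the pooling rule (total extract mass over total thallus area) and the selection rule are assumptions, since the text does not state them explicitly; the function names are hypothetical.

```python
from typing import Mapping, Sequence


def surface_concentration_mg_per_cm2(extract_mg: float, area_cm2: float) -> float:
    """Dried surface-extract mass divided by the 2-D thallus area."""
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    return extract_mg / area_cm2


def pooled_surface_concentration(extract_mg: Sequence[float], area_cm2: Sequence[float]) -> float:
    """Assumed pooling rule: total extract mass over total thallus area of the pooled replicates."""
    if len(extract_mg) != len(area_cm2) or not extract_mg:
        raise ValueError("need matching, non-empty sequences")
    return surface_concentration_mg_per_cm2(sum(extract_mg), sum(area_cm2))


def highest_non_lysing_dcm(lysis_by_pct: Mapping[float, bool]) -> float:
    """Assumed selection rule: the highest tested DCM percentage with no observed lysis."""
    safe = [pct for pct, lysed in lysis_by_pct.items() if not lysed]
    if not safe:
        raise ValueError("every tested ratio caused lysis")
    return max(safe)
```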
Allelopathy trials: effects of waterborne metabolites from turf algae
To test the lethality of waterborne exudates from turf algae on kelp recruitment, we exposed kelp gametophytes to the extracts of exudates from each of the 5 most abundant turf species at ecologically relevant concentrations (0.4-2.3 µg/mL). Extracts were resuspended in methanol; for each replicate, a 0.25 µL aliquot of extract was combined with ~20 gametophytes and 300 µL of sterile seawater (Instant Ocean, MilliQ water, and f/2 nutrients) in a single well of a 96-well flat-bottom plate. Control extracts (SPE cartridges through which 1000 mL of LC-MS grade water was passed, and which were then eluted and dried using the same methods as above) were resuspended in methanol. For each control replicate, a 0.25 µL aliquot of extract was combined with ~20 gametophytes and 300 µL of the same pre-made sterile seawater (n = 10 wells per treatment and control). Gametophytes were kept in a 10°C incubator in the dark for 96 hours, after which they were scored (see below).
Allelopathy trials: effects of surface-bound metabolites from turf algae
To test the effects of surface-bound molecules from turf algae on kelp gametophytes, we used hexane (LC-MS grade) to paint surface extracts from turf algae onto the bottoms of wells in 96-well flat-bottom plates (0.5-3.8 µg/mL). After the hexane had completely evaporated, ~20 gametophytes and 300 µL of pre-made sterile seawater (Instant Ocean, MilliQ water, and f/2 nutrients) were added to each well (n = 10 replicate wells per treatment). A hexane-only control was also included (n = 10 replicate wells). Gametophytes were kept in a 10°C incubator in the dark for 96 hours, after which they were scored (see below).
Scoring and statistical analysis of kelp gametophyte mortality
After 96 hours, each well was stained with CellTox Green, a cell-impermeant DNA stain, to assess gametophyte mortality under epifluorescence. Live kelp gametophytes exhibit no fluorescence with CellTox Green staining, whereas dead gametophytes become permeable to the dye, which binds DNA and fluoresces brightly. Following staining, the numbers of living and dead gametophytes in each well were immediately scored using a Leica DMi8 microscope with 470 nm excitation light and a 590 nm emission filter to visualize CellTox fluorescence in dead gametophytes, using a low-powered (20x) objective lens and a Leica DFC9000GT sCMOS camera. Given uncertainty regarding how fast allelochemicals degrade in nature, we limited our assay duration to 96 hours. The duration of allelochemical exposure was thus short relative to what gametophytes likely experience in nature, where nearby turf algae would exude waterborne metabolites frequently over weeks to months of interaction. Our results are, therefore, likely to be conservative.
To test for differences in kelp gametophyte mortality between a given treatment and its paired control, we employed a test for equality of proportions with continuity correction. We also employed a generalized linear mixed effects model using the glmmTMB package (66) within the program R to assess whether treatments were confounded by gametophyte starting concentration. Survival was unaffected by starting concentration; thus, the tests for equality of proportions with continuity correction are presented.
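For readers working outside R, the two-sample test for equality of proportions with continuity correction can be sketched in Python. This implementation is intended to mirror the two-group case of R's `prop.test` (a Pearson chi-square statistic with Yates' correction, capped when the observed difference in proportions is small); it is an illustration, not the authors' analysis code, and `prop_test_2` is a hypothetical name.

```python
from math import erfc, sqrt
from typing import Tuple


def prop_test_2(x1: int, n1: int, x2: int, n2: int, correct: bool = True) -> Tuple[float, float]:
    """Two-sample test for equality of proportions (chi-square, 1 df).

    x1/n1 and x2/n2 are, e.g., dead/total gametophytes in a treatment and its control.
    With correct=True, Yates' continuity correction is applied and capped at
    |p1 - p2| / (1/n1 + 1/n2). Returns (chi-square statistic, p-value).
    """
    if n1 <= 0 or n2 <= 0 or not (0 <= x1 <= n1 and 0 <= x2 <= n2):
        raise ValueError("need n > 0 and 0 <= x <= n")
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        raise ValueError("test undefined when all outcomes are identical")
    p1, p2 = x1 / n1, x2 / n2
    yates = min(0.5, abs(p1 - p2) / (1 / n1 + 1 / n2)) if correct else 0.0
    observed = (x1, n1 - x1, x2, n2 - x2)
    expected = (n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled))
    stat = sum((abs(o - e) - yates) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return stat, erfc(sqrt(stat / 2))
```

Assuming counts are pooled across the replicate wells, `prop_test_2(dead_treatment, total_treatment, dead_control, total_control)` would compare one treatment with its paired control.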