Data from: Intraspecific divergence within Microcystis aeruginosa mediates the dynamics of freshwater harmful algal blooms under climate warming scenarios

Published Dec 23, 2024 on Dryad. https://doi.org/10.5061/dryad.g79cnp5xt

Abstract

Intraspecific biodiversity can have ecosystem-level consequences and may affect the accuracy of ecological forecasting. For example, rare genetic variants may have traits that prove beneficial under future environmental conditions. The cyanobacterium responsible for most freshwater harmful algal blooms worldwide, Microcystis aeruginosa, occurs in at least three types. While the dominant type occurs in eutrophic environments and is adapted to thrive in nutrient rich conditions, two additional types have recently been discovered that inhabit oligotrophic and eutrophic environments and have genomic adaptations for survival under nutrient limitation. Here, we show that these oligotrophic types are widespread throughout the Eastern USA. By pairing an experimental warming study with gene expression analyses, we found that the eutrophic type was most susceptible to climate warming. In comparison, oligotrophic types maintained their growth better and persisted longer under warming. As a mechanistic explanation for these patterns, we found that oligotrophic types responded to warming by widespread elevated expression of heat shock protein genes. Reduction of nutrient loading has been a historically effective mitigation strategy for controlling harmful algal blooms. Our results suggest that climate warming may benefit oligotrophic types of M. aeruginosa, potentially reducing the effectiveness of such mitigation efforts. In depth study of intraspecific variation may therefore improve forecasting for understanding future whole ecosystem dynamics.

Description of the data and file structure

This repository is associated with Kuijpers, M. C. M.; Quigley, C. V.; Bray, N. C.; Ding, W.; White, J. D.; & Jackrel, S. L. (2025). Intraspecific divergence within Microcystis aeruginosa mediates the dynamics of freshwater harmful algal blooms under climate warming scenarios. PRSB. In Press.

In this study, we first show that previously discovered oligotrophic types of Microcystis aeruginosa are widespread throughout the Eastern USA. By pairing an experimental warming study with gene expression analyses, we then show that the common eutrophic type may be most susceptible to climate warming. In comparison, we find that oligotrophic types maintained their growth better and persisted longer under warming. As a potential mechanistic explanation for these patterns, we then show that oligotrophic types responded to warming by widespread elevated expression of heat shock protein genes. Reduction of nutrient loading has been a historically effective mitigation strategy for controlling harmful algal blooms. Our results suggest that climate warming may benefit oligotrophic types of M. aeruginosa, potentially reducing the effectiveness of such mitigation efforts.

This repository contains the scripts and raw data needed to perform all analyses in the associated paper and to produce all figures, with two exceptions: the base maps, which were produced in QGIS from open-source data, and the raw sequence data, which are available on NCBI under accession numbers PQ666794-PQ666862. The analysis script was written in R and run in RStudio. All packages used are open source and free to access.

Files are listed in this document in order of use. Any files in this repository not listed below are unused formats of the data. For example, “All data with comments.xlsx” holds all data on separate sheets for easier upload, but it is not used by the RScript or in the analysis.

Files and variables

File: Microcystis_Thermal_Experiment_Supplementary_AllAnalysisCode.R

Description:

This RScript contains all the code to analyse our experimental data and create the figures in the paper and supplement, with the exception of maps and phylogenies. Map figures were made with QGIS. Phylogenies were first built with RAxML (see supplemental information for Intraspecific divergence within Microcystis aeruginosa mediates the dynamics of freshwater harmful algal blooms under climate warming scenarios) and then imported into this RScript for plotting.

File: RAxML_bipartitions.tree.txt

Description:

This file was created with RAxML and contains the phylogeny of Microcystis strains built from ITSc sequences (available on NCBI under accession numbers PQ666794-PQ666862). We constructed phylogenies using RAxML version 8.2.12 with an outgroup of Synechococcus strain NC 006576 obtained from NCBI, and a GTRGAMMA evolutionary model with bootstrap analyses of 10,000 repetitions to search for the best-scoring maximum likelihood tree.

Variables

File: Microcystis_mapping.csv

Description:

This file contains the metadata for the phylogeny contained in RAxML_bipartitions.tree.txt. Together with RAxML_bipartitions.tree.txt it is used to make the phylogeny from Figure 1.

Variables

Column 1: Strain

The mapping variable that matches to the strains in the phylogeny.

Column 2: Tree_Groups

Strain type for annotating tree.

Column 3: dataset

Denotes whether strain is from a previous dataset or a strain from a new round of strain isolation for this paper.
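
The Strain column is what ties this metadata to the tips of the tree file. The authors do this matching in their R script; the Python sketch below only illustrates the idea of checking that every tip has a metadata row. It assumes a plain Newick string with unquoted tip labels, and the function names and the example tree are hypothetical.

```python
import re
from typing import Iterable, List, Set


def tip_labels(newick: str) -> List[str]:
    """Extract tip labels from a Newick string with unquoted labels.

    A tip label directly follows "(" or ","; internal node labels such as
    bootstrap support values follow ")" and are therefore skipped.
    """
    return [m.strip() for m in re.findall(r"[(,]\s*([^(),:;]+)", newick)]


def tips_missing_metadata(newick: str, strains: Iterable[str]) -> Set[str]:
    """Return tip labels that have no matching entry in the Strain column."""
    return set(tip_labels(newick)) - set(strains)


# Hypothetical three-tip tree with one support value (90) on an internal node.
example_tree = "((A:0.1,B:0.2)90:0.05,C:0.3);"
print(tip_labels(example_tree))
print(tips_missing_metadata(example_tree, ["A", "B"]))
```

An empty result from tips_missing_metadata means every tip can be annotated with Tree_Groups and dataset.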

File: Growth_data.csv

Description:

Metadata on collected strains and their growth rates across four weeks of the warming trials, in which replicates of each strain were grown at four temperatures (20, 24, 28 and 32°C). Used for all analyses of growth metrics.

Variables

Column 1: flask

Experimental flask ID.

Column 2: lake

The lake the cultured strain was isolated from. See Table S1 for details of lakes. Lake abbreviations as follows:

Lake	Abbreviation
Ashland Reservoir, Middlesex Co., MA, USA	AR19
Bruin Lake, Washtenaw Co., MI, USA	BU19
Crooked Lake, Washtenaw Co., MI, USA	CR19
Farm Pond, Middlesex Co.,	FA19
Ford Lake, Washtenaw Co., MI, USA	F19
Gull Lake, Barry / Kalamazoo Co., MI, USA	G19
Lake Champlain-	CH19
South Meadow Pond, Worcester Co.,	SM19
Whitmore Lake, Livingston County, MI, USA	WH19

Column 3: strain

Strain being cultured in the experimental flask.

Column 4: combo_temp

A combination of the temperature treatment (numeric, in degrees Celsius) and the strain type identifier. The strain type identifier is based on the strain type (3rd and 4th letter) and the trophic level of the lake it was isolated from (1st and 2nd letter).

Column 5: combo

Strain type identifier - based on the strain type (3rd and 4th letter) and the trophic level of the lake it was isolated from (1st and 2nd letter).

Column 6: genotype

Genotype of the strain being cultured in the flask. Where LG = oligotrophic genotype, and HG = eutrophic genotype.

Column 7: Trophic

Trophic level of the lake which the strain was isolated from. Where HN = eutrophic lake (see methods for categorization of lakes) and LN = oligotrophic lake.

Column 8: temp

Temperature treatment in the warming experiment the flask was exposed to. The unit for temperature is degrees Celsius.

Column 9: Crash_Numeric

Week in which a clear decline in cell density (a crash) occurred; set to 5 if no clear crash occurred. A crash is defined quantitatively as a negative growth rate in that week.

Column 10: Crash

Week in which a clear decline in cell density (a crash) occurred, if a crash occurred within the 4-week growth period. A crash is defined quantitatively as a negative growth rate in that week.

Column 11: Phase

The week of the experiment that growth is being recorded for in the Growth column. Where Exp = Exponential phase = week 1, wk2 = week 2, wk3 = week 3 and wk4 = week 4.

Column 12: Growth

The growth rate. The unit for growth is ‘per day’.

Column 13: NeverPositive

Indicates whether the culture never had positive growth (y = yes, n = no).

Column 14: Keep

y = yes, n = no. Used to filter out the CR19-01 strain that did not fall under one of the three types we used to classify strains.

Column 15: Growth_Z_by_timepoint

Z-score of growth.

Column 16: NegativePositiveGrowth

Whether the growth rate is negative or positive for this strain at this temperature in this week of the experiment.
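
The crash rule described for Crash_Numeric and the NeverPositive flag can both be derived from the four weekly growth rates. The Python sketch below is one reading of those definitions, not the authors' R code: it assumes the crash week is the first week with a negative growth rate, and the function names and example values are hypothetical.

```python
from typing import Sequence

NO_CRASH = 5  # value the README says is used when no clear crash occurred


def crash_week(weekly_growth: Sequence[float]) -> int:
    """Return the 1-based week of the first negative growth rate, else NO_CRASH.

    Assumption: "crash" means the first week in which growth is negative.
    """
    for week, rate in enumerate(weekly_growth, start=1):
        if rate < 0:
            return week
    return NO_CRASH


def never_positive(weekly_growth: Sequence[float]) -> str:
    """Return 'y' if no week had a positive growth rate, otherwise 'n'."""
    return "n" if any(rate > 0 for rate in weekly_growth) else "y"


# Hypothetical growth rates (per day) for weeks 1 to 4 of a single flask.
example = [0.31, 0.12, -0.05, -0.20]
print(crash_week(example), never_positive(example))
```

Comparing the output of such a function against the stored Crash_Numeric column is a quick way to confirm how the authors applied the rule to a given flask.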

File: cq.data.csv

Description:

This file contains all the qPCR data for the study and is necessary for the analysis of gene expression. The data are in .csv format, which can be read into R for all analyses. The raw data in columns 1 to 8 were obtained directly from the qPCR machine software. Columns 9 and 14 to 22 are metadata. Columns 10 to 13 contain the steps to calculate the relative gene expression metric 2^(-ΔΔCT).

Variables

Column 1: Date

The date the qPCR was run.

Column 2: Well

Well in the 96 well plate that the qPCR was run in.

Column 3: Fluor

Fluorophore used in qPCR. Where SYBR is SYBR Green. See methods for details.

Column 4: Target

Target gene for qPCR i.e. primers used.

Column 5: Sample

Numeric Sample ID.

Column 6: Cq

Cq exported from the qPCR machine system software. Where Cq is the cycle threshold number.

Column 7: Cq Mean

The Cq mean of the four technical replicate wells for that sample with that specific primer. Exported from the qPCR machine system software.

Column 8: Cq Std. Dev

The standard deviation in Cq among the four technical replicate wells for that sample with that specific primer – also exported from the qPCR machine system software.

Column 9: Replicate

To differentiate between two separate qPCR runs of the same sample if more than one qPCR run was done for a sample. qPCR runs were repeated if the first run showed anomalous results such as, for example, errors in the positive or negative controls.

Column 10: Processed Cq Mean (NAs set to 40)

Copied over from column 7 but with NAs set to 40, the greatest number of cycles possible with our protocol.

Column 11: Delta CT

Calculate the difference between the Cq for the reference gene (rpoA) and the gene of interest.

Column 12: Delta Delta CT - 20 as control

Take the difference between the Delta CT of the gene in this row and the Delta CT of the same gene at the reference condition (20°C).

Column 13: Final metric - 20 as control

Raise 2 to the power of -deltadeltaCT (column 12) to obtain the final metric of gene expression. A value of 1 means no change in expression relative to reference conditions, >1 represents an increase in gene expression relative to reference conditions, and <1 represents a decrease in gene expression relative to reference conditions.

Column 14: flask

Corresponds with numeric sample ID. Indicates the flask the culture was grown in that the biomass for RNA extraction was taken from.

Column 15: lake

The lake the cultured strain was isolated from. See Table S1 for details of lakes. Lake abbreviations as follows:

Lake	Abbreviation
Ashland Reservoir, Middlesex Co., MA, USA	AR19
Bruin Lake, Washtenaw Co., MI, USA	BU19
Crooked Lake, Washtenaw Co., MI, USA	CR19
Farm Pond, Middlesex Co.,	FA19
Ford Lake, Washtenaw Co., MI, USA	F19
Gull Lake, Barry / Kalamazoo Co., MI, USA	G19
Lake Champlain-	CH19
South Meadow Pond, Worcester Co.,	SM19
Whitmore Lake, Livingston County, MI, USA	WH19

Column 16: strain

The strain cultured and harvested for the biomass which was used for RNA extraction.

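Column 17: combo_temp

A combination of the temperature treatment (numeric, in degrees Celsius) and the strain type identifier (see combo).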

Column 18: combo

Strain type identifier - based on the strain type (3rd and 4th letter) and the trophic level of the lake it was isolated from (1st and 2nd letter).

Column 19: genotype

Genotype of the strain being cultured in the flask. Where LG = oligotrophic genotype, and HG = eutrophic genotype.

Column 20: Trophic

Trophic level of the lake the strain was isolated from. Where HN = eutrophic lake (see methods for categorization of lakes) and LN = oligotrophic lake.

Column 21: temp

The temperature treatment within the experiment for this particular flask/ID. The unit for temperature is degrees Celsius.

Column 22: Unique

Unique ID that places all technical replicates for the same sample+target+treatment group into unique groups – this can be used to get an average for each group of technical replicates so that each sample+target+treatment combination will have a single row once processed (processing occurs in the RScript).
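
Columns 10 to 13 walk through the 2^(-ΔΔCT) calculation. The Python sketch below works through those steps for a single gene; it is not the authors' code. The sign convention (target gene minus reference gene, then treatment minus the 20°C control) is the commonly used one and is an assumption here, since the column descriptions do not state the direction of the differences. All function names and Cq values are hypothetical.

```python
import math
from typing import Optional

MAX_CYCLES = 40.0  # README: missing Cq means are set to 40 cycles


def processed_cq(cq_mean: Optional[float]) -> float:
    """Column 10: replace a missing Cq mean with the maximum cycle number."""
    if cq_mean is None or math.isnan(cq_mean):
        return MAX_CYCLES
    return cq_mean


def delta_ct(cq_target: Optional[float], cq_reference: Optional[float]) -> float:
    """Column 11 (assumed direction): target gene Cq minus reference gene (rpoA) Cq."""
    return processed_cq(cq_target) - processed_cq(cq_reference)


def relative_expression(dct_treatment: float, dct_control: float) -> float:
    """Columns 12 and 13: 2 ** (-ΔΔCT), with the 20°C flask as the control."""
    delta_delta_ct = dct_treatment - dct_control  # assumed direction
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Cq means for one target gene and rpoA at 20°C and at 32°C.
dct_20 = delta_ct(cq_target=25.0, cq_reference=20.0)
dct_32 = delta_ct(cq_target=23.0, cq_reference=20.0)
print(relative_expression(dct_32, dct_20))
```

Under this convention a value above 1 means higher expression than at 20°C, which matches the interpretation given for column 13.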

File: Microcystis growth and metadata.csv

Description:

Alternative version of Growth_data.csv used to join growth information to the gene expression information.

Variables

Column 1: flask

Experimental flask ID.

Column 2: lake

The lake the cultured strain was isolated from. See Table S1 for details of lakes. Lake abbreviations as follows:

Lake	Abbreviation
Ashland Reservoir, Middlesex Co., MA, USA	AR19
Bruin Lake, Washtenaw Co., MI, USA	BU19
Crooked Lake, Washtenaw Co., MI, USA	CR19
Farm Pond, Middlesex Co.,	FA19
Ford Lake, Washtenaw Co., MI, USA	F19
Gull Lake, Barry / Kalamazoo Co., MI, USA	G19
Lake Champlain-	CH19
South Meadow Pond, Worcester Co.,	SM19
Whitmore Lake, Livingston County, MI, USA	WH19

Column 3: strain

Strain being cultured in the experimental flask.

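Column 4: combo_temp

A combination of the temperature treatment (numeric, in degrees Celsius) and the strain type identifier (see combo).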

Column 5: quality

NA – not used. Describes quality of ITSc sequence obtained from this strain.

Column 6: combo

Strain type identifier - based on the strain type (3rd and 4th letter) and the trophic level of the lake it was isolated from (1st and 2nd letter).

Column 7: genotype

Genotype of the strain being cultured in the flask. Where LG = oligotrophic genotype, and HG = eutrophic genotype.

Column 8: Trophic

Trophic level of the lake the strain was isolated from. Where HN = eutrophic lake (see methods for categorization of lakes) and LN = oligotrophic lake.

Column 9: temp

Temperature treatment for this flask in the warming experiment. The unit for temperature is degrees Celsius.

Column 10: Crash_Numeric

Week in which a clear decline in cell density (a crash) occurred; set to 5 if no clear crash occurred. A crash is defined quantitatively as a negative growth rate in that week.

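Column 11: Crash

Week in which a clear decline in cell density (a crash) occurred, if a crash occurred within the 4-week growth period. A crash is defined quantitatively as a negative growth rate in that week.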

Column 12: ExpPhase

Exponential phase i.e. week 1 growth rate. The unit for growth is ‘per day’.

Column 13: ExpPhase_Z

Z-scored exponential phase i.e. week 1 growth rate.

Column 14: wk2

Week 2 growth rate. The unit for growth is ‘per day’.

Column 15: wk2_Z

Z-scored week 2 growth rate.

Column 16: wk3

Week 3 growth rate. The unit for growth is ‘per day’.

Column 17: wk3_Z

Z-scored week 3 growth rate.

Column 18: wk4

Week 4 growth rate. The unit for growth is ‘per day’.

Column 19: wk4_Z

Z-scored week 4 growth rate.

File: tdat.csv

Description:

This file is a merged and processed version of cq.data.csv and Microcystis growth and metadata.csv. After these two files are merged in the RScript, the gene expression data are averaged and transformed using a Box-Cox transformation. Unnecessary columns are removed and a column for gene type is added. The resulting object is exported and then re-imported to create later figures, which avoids re-importing and re-processing the two original files each time.

Variables

Column 1: flask

Experimental flask ID.

Column 2: Target

Target gene for qPCR i.e. primers used.

Column 3: lake

The lake the cultured strain was isolated from. See Table S1 for details of lakes. Lake abbreviations as follows:

Lake	Abbreviation
Ashland Reservoir, Middlesex Co., MA, USA	AR19
Bruin Lake, Washtenaw Co., MI, USA	BU19
Crooked Lake, Washtenaw Co., MI, USA	CR19
Farm Pond, Middlesex Co.,	FA19
Ford Lake, Washtenaw Co., MI, USA	F19
Gull Lake, Barry / Kalamazoo Co., MI, USA	G19
Lake Champlain-	CH19
South Meadow Pond, Worcester Co.,	SM19
Whitmore Lake, Livingston County, MI, USA	WH19

Column 4: strain

The strain cultured and harvested for the biomass which was used for RNA extraction.

Column 5: temp

The temperature treatment within the experiment for this particular flask/ID. The unit for temperature is degrees Celsius.

Column 6: genotype

Genotype of the strain being cultured in the flask. Where LG = oligotrophic genotype, and HG = eutrophic genotype.

Column 7: Trophic

Trophic level of the lake the strain was isolated from. Where HN = eutrophic lake (see methods for categorization of lakes) and LN = oligotrophic lake.

Column 8: combo

The strain’s type identifier, based on its bacterial type (3rd and 4th letter) and the trophic level of the lake it was isolated from (1st and 2nd letter).

Column 10: mean.b20.CT.metric

The average (across replicates) relative gene expression for that strain at that temperature. Where relative gene expression is 2 to the power of -deltadeltaCT.

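Column 11: Crash

Week in which a clear decline in cell density (a crash) occurred, if a crash occurred within the 4-week growth period. A crash is defined quantitatively as a negative growth rate in that week.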

Column 12: Crash_Numeric

Week in which a clear decline in cell density (a crash) occurred; set to 5 if no clear crash occurred. A crash is defined quantitatively as a negative growth rate in that week.

Column 13: ExpPhase

Exponential phase i.e. week 1 growth rate. The unit for growth is ‘per day’.

Column 14: ExpPhase_Z

Z-score of exponential phase i.e. week 1 growth rate.

Column 15: wk2

Week 2 growth rate. The unit for growth is ‘per day’.

Column 16: wk2_Z

Z-score of week 2 growth rate.

Column 17: wk3

Week 3 growth rate. The unit for growth is ‘per day’.

Column 18: wk3_Z

Z-score of week 3 growth rate.

Column 19: wk4

Week 4 growth rate. The unit for growth is ‘per day’.

Column 20: wk4_Z

Z-score of week 4 growth rate.

Column 21: Type

Whether the gene is in the heat shock protein family or the toxin production pathway family.

Column 22: CT.metric

Box-Cox-transformed version of column 10, i.e. Box-Cox-transformed relative gene expression. See the RScript for details of the Box-Cox transformation.
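
The relationship between mean.b20.CT.metric and CT.metric can be illustrated with the standard one-parameter Box-Cox formula. The Python sketch below is not the authors' code: the lambda value they used is defined in the RScript and is not stated here, so the lambda, the function names and the replicate values are all hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def box_cox(x: float, lam: float) -> float:
    """Standard Box-Cox transform: log(x) if lam == 0, else (x**lam - 1) / lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def transformed_mean(replicates: Sequence[float], lam: float) -> float:
    """Average relative expression across replicates, then Box-Cox transform it."""
    return box_cox(mean(replicates), lam)


# Hypothetical relative expression values for one strain, gene and temperature.
replicate_values = [3.0, 5.0]
print(transformed_mean(replicate_values, lam=0.5))
```

A relative expression of 1 (no change from 20°C) maps to 0 for any lambda, so the sign of the transformed value still indicates the direction of change.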

File: RAxML_bipartitions.FigS1A.txt

Description:

This file was created with RAxML and contains the phylogeny of Microcystis strains built by RAxML. We constructed phylogenies using RAxML version 8.2.12 with an outgroup of Synechococcus strain NC 006576 obtained from NCBI, and a GTRGAMMA evolutionary model with bootstrap analyses of 10,000 repetitions to search for the best-scoring maximum likelihood tree.

Variables

File: Microcystis_mapping_full_tree.csv

Description:

This file contains the metadata for the phylogeny contained in RAxML_bipartitions.FigS1A.txt. Together with RAxML_bipartitions.FigS1A.txt it is used to make one of the two phylogenies from Figure S1.

Variables

Column 1: Strain

The mapping variable that matches to the strains in the phylogeny.

Column 2: Tree_Groups

Strain type for annotating tree.

File: RAxML_bipartitions.FigS1B.txt

Description:

This file was created with RAxML and contains the phylogeny of Microcystis strains using ITSc sequences built by RAxML. We constructed phylogenies using RAxML version 8.2.12 with an outgroup of Synechococcus strain NC 006576 obtained from NCBI, and a GTRGAMMA evolutionary model with bootstrap analyses of 10,000 repetitions to search for the best-scoring maximum likelihood tree.

Variables

File: Microcystis_mapping_for_ITSc.csv

Description:

This file contains the metadata for the phylogeny contained in RAxML_bipartitions.FigS1B.txt. Together with RAxML_bipartitions.FigS1B.txt it is used to make one of the two phylogenies from Figure S1.

Variables

Column 1: Strain

The mapping variable that matches to the strains in the phylogeny.

Column 2: Tree_Groups

Strain type for annotating tree.

File: Heat_Stress_pfam_list.csv

Description:

A file containing a list of heat stress protein family genes and their pfam IDs. Citation for heat stress protein family genes: https://onlinelibrary.wiley.com/doi/pdf/10.1002/mas.10031.

Variables

Column 1: Gene

Heat Shock Protein Gene or gene family

Column 2: PFAM_ID_corresponding_to_gene

PFAM ID for gene or gene family

File: R_file10r_PFAMs_by_strain

Description:

Data on the number of heat shock protein family genes in each strain of Microcystis previously collected and analysed in Jackrel et al 2019 (Genome evolution and host-microbiome shifts correspond with intraspecific niche divergence within harmful algal bloom-forming Microcystis aeruginosa). The data are in long format to allow easier manipulation in R, so strains appear multiple times to accommodate all PFAMs present per strain. Note that each PFAM is recorded for each strain a number of times equal to the number of genes of that family found in the strain, i.e. if there are two rows for PF000004.28 for strain BK11-02, two genes for this PFAM were found in the BK11-02 genome.

Variables

Column 1: Strain

The mapping variable that matches to the strains in the phylogeny in Fig S1.

Column 2: PFAMs_present

PFAMs present in strain

Column 3: Family_else_PFAM

The family name if available, otherwise the PFAM.

This is the same as the previous column except that where possible the PFAM code has been replaced by a gene name.

Column 4: Gene & Column 5: PFAM_ID_corresponding_to_gene

Mapping genes to PFAM IDs – this is a copy of the contents of Heat_Stress_pfam_list.csv. It does not correspond to the columns 1:3 in terms of rows, but rather acts as a key.
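
Because each row in columns 1 to 3 represents one gene, gene copy numbers per strain are recovered by counting rows. The Python sketch below illustrates this using the BK11-02 example from the description above; it is not the authors' R code, and the second strain and its PFAM label are hypothetical placeholders.

```python
from collections import Counter
from typing import Iterable, Tuple


def pfam_counts(rows: Iterable[Tuple[str, str]]) -> Counter:
    """Count rows per (Strain, PFAMs_present) pair, i.e. genes per family per strain."""
    return Counter(rows)


# Long-format rows as (Strain, PFAMs_present). The first two rows mirror the
# README example; the last row is a hypothetical placeholder.
rows = [
    ("BK11-02", "PF000004.28"),
    ("BK11-02", "PF000004.28"),
    ("STRAIN-X", "PFAM-Y"),
]

counts = pfam_counts(rows)
print(counts[("BK11-02", "PF000004.28")])
```

Pairs that never occur in the file return a count of 0 from a Counter, which corresponds to a family being absent from that genome.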
