Data from: Intraspecific divergence within Microcystis aeruginosa mediates the dynamics of freshwater harmful algal blooms under climate warming scenarios
Data files
Dec 23, 2024 version files 11.67 MB
- cq.data.csv (528.42 KB)
- Growth_data.csv (28.75 KB)
- Heat_Stress_pfam_list.csv (704 B)
- Microcystis_growth_and_metadata.csv (8.50 KB)
- Microcystis_mapping_for_ITSc.csv (1.46 KB)
- Microcystis_mapping_full_tree.csv (1.82 KB)
- Microcystis_mapping.csv (2.49 KB)
- Microcystis_Thermal_Experiment_Supplementary_AllAnalysisCode.R (190.10 KB)
- R_file10r_PFAMs_by_strain.csv (10.79 MB)
- RAxML_bipartitions.FigS1A.txt (2.66 KB)
- RAxML_bipartitions.FigS1B.txt (2.64 KB)
- RAxML_bipartitions.tree.txt (3.97 KB)
- README.md (22.21 KB)
- tdat.csv (90.07 KB)
Abstract
Intraspecific biodiversity can have ecosystem-level consequences and may affect the accuracy of ecological forecasting. For example, rare genetic variants may have traits that prove beneficial under future environmental conditions. The cyanobacterium responsible for most freshwater harmful algal blooms worldwide, Microcystis aeruginosa, occurs in at least three types. While the dominant type occurs in eutrophic environments and is adapted to thrive in nutrient rich conditions, two additional types have recently been discovered that inhabit oligotrophic and eutrophic environments and have genomic adaptations for survival under nutrient limitation. Here, we show that these oligotrophic types are widespread throughout the Eastern USA. By pairing an experimental warming study with gene expression analyses, we found that the eutrophic type was most susceptible to climate warming. In comparison, oligotrophic types maintained their growth better and persisted longer under warming. As a mechanistic explanation for these patterns, we found that oligotrophic types responded to warming by widespread elevated expression of heat shock protein genes. Reduction of nutrient loading has been a historically effective mitigation strategy for controlling harmful algal blooms. Our results suggest that climate warming may benefit oligotrophic types of M. aeruginosa, potentially reducing the effectiveness of such mitigation efforts. In depth study of intraspecific variation may therefore improve forecasting for understanding future whole ecosystem dynamics.
README: Intraspecific divergence within Microcystis aeruginosa mediates the dynamics of freshwater harmful algal blooms under climate warming scenarios.
Description of the data and file structure
This repository is associated with Kuijpers, M. C. M.; Quigley, C. V.; Bray, N. C.; Ding, W.; White, J. D.; & Jackrel, S. L. (2025). Intraspecific divergence within Microcystis aeruginosa mediates the dynamics of freshwater harmful algal blooms under climate warming scenarios. PRSB. In Press.
In this study, we first show that previously discovered oligotrophic types of Microcystis aeruginosa are widespread throughout the Eastern USA. By pairing an experimental warming study with gene expression analyses, we then show that the common eutrophic type may be most susceptible to climate warming. In comparison, we find that oligotrophic types maintained their growth better and persisted longer under warming. As a potential mechanistic explanation for these patterns, we then show that oligotrophic types responded to warming by widespread elevated expression of heat shock protein genes. Reduction of nutrient loading has been a historically effective mitigation strategy for controlling harmful algal blooms. Our results suggest that climate warming may benefit oligotrophic types of M. aeruginosa, potentially reducing the effectiveness of such mitigation efforts.
This repository contains the scripts and raw data necessary to perform all analyses in the associated paper and to produce all figures, with two exceptions: the base maps, which were produced in QGIS from open-source data, and the raw sequence data, which are available on NCBI under accession numbers PQ666794-PQ666862. The analysis script was written in R and run in RStudio. All packages used are open source and free to access.
Files are listed in this document in order of use. Any files in this repository not listed below are unused formats of the data. For example, “All data with comments.xlsx” is a file with all data on separate sheets for easier upload but is not actually used in the analysis and RScript.
Files and variables
File: Microcystis_Thermal_Experiment_Supplementary_AllAnalysisCode.R
Description:
This RScript contains all the code to analyse our experimental data and create the figures in the paper and supplement, with the exception of maps and phylogenies. Map figures were made with QGIS. Phylogenies were first built with RAxML (see the supplemental information for Intraspecific divergence within Microcystis aeruginosa mediates the dynamics of freshwater harmful algal blooms under climate warming scenarios) and then imported into this RScript for plotting.
File: RAxML_bipartitions.tree.txt
Description:
This file was created with RAxML and contains the phylogeny of Microcystis strains built from ITSc sequences (available on NCBI under accession numbers PQ666794-PQ666862). We constructed phylogenies using RAxML version 8.2.12 with an outgroup of Synechococcus strain NC 006576 obtained from NCBI, and a GTRGAMMA evolutionary model with bootstrap analyses of 10,000 repetitions to search for the best-scoring maximum likelihood tree.
Variables
NA
File: Microcystis_mapping.csv
Description:
This file contains the metadata for the phylogeny contained in RAxML_bipartitions.tree.txt. Together with RAxML_bipartitions.tree.txt it is used to make the phylogeny from Figure 1.
Variables
Column 1: Strain
The mapping variable that matches to the strains in the phylogeny.
Column 2: Tree_Groups
Strain type for annotating tree.
Column 3: dataset
Denotes whether strain is from a previous dataset or a strain from a new round of strain isolation for this paper.
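As a rough illustration of how the mapping file relates to the tree, the sketch below extracts tip labels from a Newick string and lists any tips that have no `Tree_Groups` entry. It is written in Python and is not the authors' R plotting code. The Newick text, strain names and group labels are invented for the example; only the column names `Strain`, `Tree_Groups` and `dataset` come from this README.

```python
import csv
import io
import re

# Hypothetical Newick text standing in for RAxML_bipartitions.tree.txt.
newick = "((strainA:0.01,strainB:0.02)95:0.03,(strainC:0.01,outgroup:0.40)88:0.02);"

# Hypothetical rows standing in for Microcystis_mapping.csv.
mapping_csv = io.StringIO(
    "Strain,Tree_Groups,dataset\n"
    "strainA,type1,new\n"
    "strainB,type1,previous\n"
    "strainC,type2,new\n"
)


def tip_labels(newick_text):
    """Return tip labels, i.e. names that directly follow '(' or ','."""
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick_text)


# Build a lookup from strain name to its annotation group.
groups = {row["Strain"]: row["Tree_Groups"] for row in csv.DictReader(mapping_csv)}

tips = tip_labels(newick)
unmapped = [tip for tip in tips if tip not in groups]
print("tips:", tips)
print("tips without a mapping row:", unmapped)
```

A check like this is useful before plotting, because a tip that is missing from the mapping file (here the invented outgroup) would otherwise be left without an annotation.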
File: Growth_data.csv
Description:
Metadata on collected strains and their growth rates across four weeks of the warming trials, in which replicates of each strain were grown at four temperatures (20, 24, 28 and 32°C). Used for all analyses of growth metrics.
Variables
Column 1: flask
Experimental flask ID.
Column 2: lake
The lake the cultured strain was isolated from. See Table S1. for details of lakes. Lake abbreviations as follows:
Lake | Abbreviation |
---|---|
Ashland Reservoir, Middlesex Co., MA, USA | AR19 |
Bruin Lake, Washtenaw Co., MI, USA | BU19 |
Crooked Lake, Washtenaw Co., MI, USA | CR19 |
Farm Pond, Middlesex Co., | FA19 |
Ford Lake, Washtenaw Co., MI, USA | F19 |
Gull Lake, Barry / Kalamazoo Co., MI, USA | G19 |
Lake Champlain- | CH19 |
South Meadow Pond, Worcester Co., | SM19 |
Whitmore Lake, Livingston County, MI, USA | WH19 |
Column 3: strain
Strain being cultured in the experimental flask.
Column 4: combo_temp
A combination of the temperature treatment (numeric) and the strain type identifier. Where the strain type is based on its type (3rd and 4th letter) and the trophic level of the lake it was isolated from (1st and 2nd letter). The unit for temperature is degrees Celsius.
Column 5: combo
Strain type identifier - based on the strain type (3rd and 4th letter) and the trophic level of the lake it was isolated from (1st and 2nd letter).
Column 6: genotype
Genotype of the strain being cultured in the flask. Where LG = oligotrophic genotype, and HG = eutrophic genotype.
Column 7: Trophic
Trophic level of the lake which the strain was isolated from. Where HN = eutrophic lake (see methods for categorization of lakes) and LN = oligotrophic lake.
Column 8: temp
Temperature treatment in the warming experiment the flask was exposed to. The unit for temperature is degrees Celsius.
Column 9: Crash_Numeric
Week in which a clear decline in cell density occurred; set to 5 if no clear crash occurred. A crash is defined quantitatively as a negative growth rate in that week.
Column 10: Crash
Week in which a clear decline in cell density occurred, if such a crash occurred within the 4-week growth period. A crash is defined quantitatively as a negative growth rate in that week.
Column 11: Phase
The week of the experiment that growth is being recorded for in the Growth column. Where Exp = Exponential phase = week 1, wk2 = week 2, wk3 = week 3 and wk4 = week 4.
Column 12: Growth
The growth rate. The unit for growth is ‘per day’.
Column 13: NeverPositive
Whether the culture never had positive growth (y = yes, n = no).
Column 14: Keep
y = yes, n = no. Used to filter out the CR19-01 strain that did not fall under one of the three types we used to classify strains.
Column 15: Growth_Z_by_timepoint
Z-score of growth.
Column 16: NegativePositiveGrowth
Whether the growth rate is negative or positive for this strain at this temperature in this week of the experiment.
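The `Crash_Numeric` rule described above can be sketched as follows. This is a Python illustration with made-up growth rates, not the authors' R code. It assumes one growth rate per week for a flask and takes the first week with a negative rate as the crash week; the function name is hypothetical.

```python
def crash_week(weekly_growth, no_crash_value=5):
    """Return the first week (1-based) with a negative growth rate.

    If growth never becomes negative within the recorded weeks, return
    no_crash_value (5 in this dataset, one past the four-week trial).
    """
    for week, rate in enumerate(weekly_growth, start=1):
        if rate < 0:
            return week
    return no_crash_value


# Made-up growth rates (per day) for weeks 1 to 4 of two flasks.
print(crash_week([0.31, 0.12, -0.05, -0.20]))  # crash in week 3
print(crash_week([0.28, 0.15, 0.06, 0.01]))    # no crash, coded as 5
```

Coding "no crash" as 5 keeps the column numeric, so flasks that survived the whole trial can still be ranked against flasks that crashed.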
File: cq.data.csv
Description:
This file contains all the qPCR data for the study. This file is necessary for the analysis of gene expression. The data is in a .csv file format which can be read into R for all analysis. The raw data in columns 1 to 8 was obtained directly from the qPCR machine software. Columns 9 and 14 to 22 are metadata. Columns 10 to 13 contain the steps to calculate the relative gene expression metric 2^(-ΔΔCT).
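The calculation held in columns 10 to 13 can be sketched in a few lines. This is a Python illustration with made-up Cq values, not the authors' code. The README does not state the direction of each subtraction, so the sketch assumes the common convention ΔCT = Cq(target) - Cq(reference gene) and ΔΔCT = ΔCT(treatment) - ΔCT(20°C control); the function names are hypothetical.

```python
MAX_CYCLES = 40  # missing Cq means (NA) are set to the protocol's cycle limit


def processed_cq(cq_mean):
    """Replace a missing Cq mean with the maximum cycle number."""
    return MAX_CYCLES if cq_mean is None else cq_mean


def relative_expression(cq_target, cq_ref, cq_target_ctrl, cq_ref_ctrl):
    """Return 2^(-ddCT) under the assumed sign convention."""
    delta_ct = processed_cq(cq_target) - processed_cq(cq_ref)
    delta_ct_ctrl = processed_cq(cq_target_ctrl) - processed_cq(cq_ref_ctrl)
    delta_delta_ct = delta_ct - delta_ct_ctrl
    return 2 ** (-delta_delta_ct)


# Made-up Cq means: a target gene and the reference gene (rpoA) at a warm
# treatment, followed by the same pair at the 20 degree control.
fold_change = relative_expression(24.0, 20.0, 26.0, 20.0)
print(fold_change)  # 4.0
```

Under this convention a value above 1 means the target gene reached threshold earlier relative to the reference gene than it did at 20°C, which matches the interpretation given for column 13.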
Variables
Column 1: Date
The date the qPCR was run.
Column 2: Well
Well in the 96 well plate that the qPCR was run in.
Column 3: Fluor
Fluorophore used in qPCR. Where SYBR is SYBR Green. See methods for details.
Column 4: Target
Target gene for qPCR i.e. primers used.
Column 5: Sample
Numeric Sample ID.
Column 6: Cq
Cq exported from the qPCR machine system software. Where Cq is the cycle threshold number.
Column 7: Cq Mean
The Cq mean of the four technical replicate wells for that sample with that specific primer. Exported from the qPCR machine system software.
Column 8: Cq Std. Dev
The standard deviation in Cq among the four technical replicate wells for that sample with that specific primer – also exported from the qPCR machine system software.
Column 9: Replicate
To differentiate between two separate qPCR runs of the same sample if more than one qPCR run was done for a sample. qPCR runs were repeated if the first run showed anomalous results such as, for example, errors in the positive or negative controls.
Column 10: Processed Cq Mean (NAs set to 40)
Copied over from column 7 but with NAs set to 40, the greatest number of cycles possible with our protocol.
Column 11: Delta CT
Calculate the difference between the Cq for the reference gene (rpoA) and the gene of interest.
Column 12: Delta Delta CT - 20 as control
The difference between the Delta CT of the same gene at the reference condition (20°C) and the Delta CT in this row.
Column 13: Final metric - 20 as control
Raise 2 to the power of -deltadeltaCT (column 12) to gain the final metric of gene expression. Where 1 = no change in expression relative to reference conditions, >1 represents an increase in gene expression relative to reference conditions and <1 represents a decrease in gene expression relative to reference conditions.
Column 14: flask
Corresponds with numeric sample ID. Indicates the flask the culture was grown in that the biomass for RNA extraction was taken from.
Column 15: lake
The lake the cultured strain was isolated from. See Table S1. for details of lakes. Lake abbreviations as follows:
Lake | Abbreviation |
---|---|
Ashland Reservoir, Middlesex Co., MA, USA | AR19 |
Bruin Lake, Washtenaw Co., MI, USA | BU19 |
Crooked Lake, Washtenaw Co., MI, USA | CR19 |
Farm Pond, Middlesex Co., | FA19 |
Ford Lake, Washtenaw Co., MI, USA | F19 |
Gull Lake, Barry / Kalamazoo Co., MI, USA | G19 |
Lake Champlain- | CH19 |
South Meadow Pond, Worcester Co., | SM19 |
Whitmore Lake, Livingston County, MI, USA | WH19 |
Column 16: strain
The strain cultured and harvested for the biomass which was used for RNA extraction.
Column 17: combo_temp
A combination of the temperature treatment (numeric) and the strain type identifier. Where the strain type is based on its type (3rd and 4th letter) and the trophic level of the lake it was isolated from (1st and 2nd letter). The unit for temperature is degrees Celsius.
Column 18: combo
Strain type identifier - based on the strain type (3rd and 4th letter) and the trophic level of the lake it was isolated from (1st and 2nd letter).
Column 19: genotype
Genotype of the strain being cultured in the flask. Where LG = oligotrophic genotype, and HG = eutrophic genotype.
Column 20: Trophic
Trophic level of the lake the strain was isolated from. Where HN = eutrophic lake (see methods for categorization of lakes) and LN = oligotrophic lake.
Column 21: temp
The temperature treatment within the experiment for this particular flask/ID. The unit for temperature is degrees Celsius.
Column 22: Unique
Unique ID that places all technical replicates for the same sample+target+treatment group into unique groups – this can be used to get an average for each group of technical replicates so that each sample+target+treatment combination will have a single row once processed (processing occurs in the RScript).
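The role of the `Unique` column can be illustrated with a small grouping step. The sketch below is Python with made-up IDs and values, not the processing in the RScript. It collapses the technical replicates that share a `Unique` ID into one averaged row.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (Unique ID, final expression metric) for technical replicates.
rows = [
    ("s1_geneA_28", 1.5),
    ("s1_geneA_28", 2.5),
    ("s1_geneA_20", 1.0),
    ("s1_geneA_20", 1.0),
]

# Collect the metric values that share a Unique ID.
by_group = defaultdict(list)
for unique_id, metric in rows:
    by_group[unique_id].append(metric)

# One averaged value per sample + target + treatment combination.
averaged = {unique_id: mean(values) for unique_id, values in by_group.items()}
print(averaged)
```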
File: Microcystis_growth_and_metadata.csv
Description:
Alternative version of Growth_data.csv used to join growth information to the gene expression information.
Variables
Column 1: flask
Experimental flask ID.
Column 2: lake
The lake the cultured strain was isolated from. See Table S1. for details of lakes. Lake abbreviations as follows:
Lake | Abbreviation |
---|---|
Ashland Reservoir, Middlesex Co., MA, USA | AR19 |
Bruin Lake, Washtenaw Co., MI, USA | BU19 |
Crooked Lake, Washtenaw Co., MI, USA | CR19 |
Farm Pond, Middlesex Co., | FA19 |
Ford Lake, Washtenaw Co., MI, USA | F19 |
Gull Lake, Barry / Kalamazoo Co., MI, USA | G19 |
Lake Champlain- | CH19 |
South Meadow Pond, Worcester Co., | SM19 |
Whitmore Lake, Livingston County, MI, USA | WH19 |
Column 3: strain
Strain being cultured in the experimental flask.
Column 4: combo_temp
A combination of the temperature treatment (numeric) and the strain type identifier. Where the strain type is based on its type (3rd and 4th letter) and the trophic level of the lake it was isolated from (1st and 2nd letter). The unit for temperature is degrees Celsius.
Column 5: quality
NA – not used. Describes quality of ITSc sequence obtained from this strain.
Column 6: combo
Strain type identifier - based on the strain type (3rd and 4th letter) and the trophic level of the lake it was isolated from (1st and 2nd letter).
Column 7: genotype
Genotype of the strain being cultured in the flask. Where LG = oligotrophic genotype, and HG = eutrophic genotype.
Column 8: Trophic
Trophic level of the lake the strain was isolated from. Where HN = eutrophic lake (see methods for categorization of lakes) and LN = oligotrophic lake.
Column 9: temp
Temperature treatment for this flask in the warming experiment. The unit for temperature is degrees Celsius.
Column 10: Crash_Numeric
Week in which a clear decline in cell density occurred; set to 5 if no clear crash occurred. A crash is defined quantitatively as a negative growth rate in that week.
Column 11: Crash
Week in which a clear decline in cell density occurred, if such a crash occurred within the 4-week growth period. A crash is defined quantitatively as a negative growth rate in that week.
Column 12: ExpPhase
Exponential phase i.e. week 1 growth rate. The unit for growth is ‘per day’.
Column 13: ExpPhase_Z
Z-scored exponential phase i.e. week 1 growth rate.
Column 14: wk2
Week 2 growth rate. The unit for growth is ‘per day’.
Column 15: wk2_Z
Z-scored week 2 growth rate.
Column 16: wk3
Week 3 growth rate. The unit for growth is ‘per day’.
Column 17: wk3_Z
Z-scored week 3 growth rate.
Column 18: wk4
Week 4 growth rate. The unit for growth is ‘per day’.
Column 19: wk4_Z
Z-scored week 4 growth rate.
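The Z-scored columns can be illustrated with a short sketch. The README does not say which flasks are pooled for each Z-score or which standard deviation is used, so this Python example assumes all flasks within a week are pooled and uses the sample standard deviation (the default behaviour of R's `scale()`). The growth rates are made up.

```python
from statistics import mean, stdev


def z_scores(values):
    """Centre on the mean and divide by the sample standard deviation."""
    centre = mean(values)
    spread = stdev(values)  # n - 1 denominator
    return [(value - centre) / spread for value in values]


# Made-up week-2 growth rates (per day) for five flasks.
wk2 = [0.10, 0.20, 0.30, 0.40, 0.50]
wk2_z = z_scores(wk2)
print([round(z, 3) for z in wk2_z])
```

Z-scoring within a week puts growth rates from different weeks on a common scale, so a value of 1 always means one standard deviation above that week's mean.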
File: tdat.csv
Description:
This file is a merged and processed version of cq.data.csv and Microcystis_growth_and_metadata.csv. After these two files are merged in the RScript, the gene expression data are averaged and transformed using a Box-Cox transformation. Unnecessary columns are removed and a column for gene type is added. This object is exported and then re-imported for creation of later figures rather than re-importing and re-processing the two original files each time.
Variables
Column 1: flask
Experimental flask ID.
Column 2: Target
Target gene for qPCR i.e. primers used.
Column 3: lake
The lake the cultured strain was isolated from. See Table S1. for details of lakes. Lake abbreviations as follows:
Lake | Abbreviation |
---|---|
Ashland Reservoir, Middlesex Co., MA, USA | AR19 |
Bruin Lake, Washtenaw Co., MI, USA | BU19 |
Crooked Lake, Washtenaw Co., MI, USA | CR19 |
Farm Pond, Middlesex Co., | FA19 |
Ford Lake, Washtenaw Co., MI, USA | F19 |
Gull Lake, Barry / Kalamazoo Co., MI, USA | G19 |
Lake Champlain- | CH19 |
South Meadow Pond, Worcester Co., | SM19 |
Whitmore Lake, Livingston County, MI, USA | WH19 |
Column 4: strain
The strain cultured and harvested for the biomass which was used for RNA extraction.
Column 5: temp
The temperature treatment within the experiment for this particular flask/ID. The unit for temperature is degrees Celsius.
Column 6: genotype
Genotype of the strain being cultured in the flask. Where LG = oligotrophic genotype, and HG = eutrophic genotype.
Column 7: Trophic
Trophic level of the lake the strain was isolated from. Where HN = eutrophic lake (see methods for categorization of lakes) and LN = oligotrophic lake.
Column 8: combo
The strain’s bacterial type. Where the strain type is based on its bacterial type (3rd and 4th letter) and the trophic level of the lake it was isolated from (1st and 2nd letter).
Column 10: mean.b20.CT.metric
The average (across replicates) relative gene expression for that strain at that temperature. Where relative gene expression is 2 to the power of -deltadeltaCT.
Column 11: Crash
Week in which a clear decline in cell density occurred, if such a crash occurred within the 4-week growth period. A crash is defined quantitatively as a negative growth rate in that week.
Column 12: Crash_Numeric
Week in which a clear decline in cell density occurred; set to 5 if no clear crash occurred. A crash is defined quantitatively as a negative growth rate in that week.
Column 13: ExpPhase
Exponential phase i.e. week 1 growth rate. The unit for growth is ‘per day’.
Column 14: ExpPhase_Z
Z-score of exponential phase i.e. week 1 growth rate.
Column 15: wk2
Week 2 growth rate. The unit for growth is ‘per day’.
Column 16: wk2_Z
Z-score of week 2 growth rate.
Column 17: wk3
Week 3 growth rate. The unit for growth is ‘per day’.
Column 18: wk3_Z
Z-score of week 3 growth rate.
Column 19: wk4
Week 4 growth rate. The unit for growth is ‘per day’.
Column 20: wk4_Z
Z-score of week 4 growth rate.
Column 21: Type
Whether the gene is in the heat shock protein family or the toxin production pathway family.
Column 22: CT.metric
Box-Cox transformed version of column 10, i.e. Box-Cox transformed relative gene expression. See the RScript for details of the Box-Cox transformation.
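For readers unfamiliar with the transformation, the sketch below shows the standard one-parameter Box-Cox formula in Python. It is not the authors' code. The lambda used for `CT.metric` is determined in the RScript and is not stated in this README, so the value here is a placeholder and the expression values are made up.

```python
import math


def box_cox(value, lam):
    """Box-Cox transform of a strictly positive value for a given lambda."""
    if value <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(value)
    return (value ** lam - 1) / lam


# Made-up relative expression values and a placeholder lambda.
expression = [0.5, 1.0, 4.0]
hypothetical_lambda = 0.5
transformed = [box_cox(x, hypothetical_lambda) for x in expression]
print([round(t, 3) for t in transformed])
```

Relative expression of 1 (no change from the 20°C control) maps to 0 for any lambda, so the sign of the transformed value still indicates the direction of the change.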
File: RAxML_bipartitions.FigS1A.txt
Description:
This file was created with RAxML and contains the phylogeny of Microcystis strains built by RAxML. We constructed phylogenies using RAxML version 8.2.12 with an outgroup of Synechococcus strain NC 006576 obtained from NCBI, and a GTRGAMMA evolutionary model with bootstrap analyses of 10,000 repetitions to search for the best-scoring maximum likelihood tree.
Variables
NA
File: Microcystis_mapping_full_tree.csv
Description:
This file contains the metadata for the phylogeny contained in RAxML_bipartitions.FigS1A.txt. Together with RAxML_bipartitions.FigS1A.txt it is used to make one of the two phylogenies from Figure S1.
Variables
Column 1: Strain
The mapping variable that matches to the strains in the phylogeny.
Column 2: Tree_Groups
Strain type for annotating tree.
File: RAxML_bipartitions.FigS1B.txt
Description:
This file was created with RAxML and contains the phylogeny of Microcystis strains using ITSc sequences built by RAxML. We constructed phylogenies using RAxML version 8.2.12 with an outgroup of Synechococcus strain NC 006576 obtained from NCBI, and a GTRGAMMA evolutionary model with bootstrap analyses of 10,000 repetitions to search for the best-scoring maximum likelihood tree.
Variables
NA
File: Microcystis_mapping_for_ITSc.csv
Description:
This file contains the metadata for the phylogeny contained in RAxML_bipartitions.FigS1B.txt. Together with RAxML_bipartitions.FigS1B.txt it is used to make one of the two phylogenies from Figure S1.
Variables
Column 1: Strain
The mapping variable that matches to the strains in the phylogeny.
Column 2: Tree_Groups
Strain type for annotating tree.
File: Heat_Stress_pfam_list.csv
Description:
A file containing a list of heat stress protein family genes and their pfam IDs. Citation for heat stress protein family genes: https://onlinelibrary.wiley.com/doi/pdf/10.1002/mas.10031.
Variables
Column 1: Gene
Heat Shock Protein Gene or gene family
Column 2: PFAM_ID_corresponding_to_gene
PFAM ID for gene or gene family
File: R_file10r_PFAMs_by_strain.csv
Description:
Data on the number of heat shock protein family genes in each strain of *Microcystis*, previously collected and analysed in Jackrel et al. 2019 (Genome evolution and host-microbiome shifts correspond with intraspecific niche divergence within harmful algal bloom-forming Microcystis aeruginosa). The data are in long format to allow easier manipulation in R, so strains appear multiple times to accommodate all PFAMs present per strain. Note that each PFAM is recorded for each strain a number of times equal to the number of genes of that family found in the strain, i.e. if there are two rows for PF000004.28 for strain BK11-02, two genes for this PFAM were found in the BK11-02 genome.
Variables
Column 1: Strain
The mapping variable that matches to the strains in the phylogeny in Fig S1.
Column 2: PFAMs_present
PFAMs present in strain
Column 3: Family_else_PFAM
The family name if available, otherwise the PFAM.
This is the same as the previous column except that where possible the PFAM code has been replaced by a gene name.
Column 4: Gene & Column 5: PFAM_ID_corresponding_to_gene
Mapping genes to PFAM IDs – this is a copy of the contents of Heat_Stress_pfam_list.csv. It does not correspond to the columns 1:3 in terms of rows, but rather acts as a key.
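Because each row of the long-format table is one gene copy, copy numbers can be recovered by counting rows. The sketch below is a Python illustration with invented strain names and PFAM codes, not the authors' R code. It treats the key in columns 4 and 5 as a separate lookup set, as described above, and assumes the PFAM codes in the key match those in `PFAMs_present` exactly.

```python
from collections import Counter

# Made-up long-format rows: one (Strain, PFAMs_present) pair per gene copy.
long_rows = [
    ("strainA", "PF_A"),
    ("strainA", "PF_A"),
    ("strainA", "PF_B"),
    ("strainB", "PF_A"),
    ("strainB", "PF_C"),
]

# Made-up key standing in for the heat stress PFAM IDs (columns 4 and 5).
heat_stress_pfams = {"PF_A", "PF_B"}

# Counting identical rows gives the number of copies per strain and PFAM.
copies = Counter(long_rows)

# Sum the copies of heat stress PFAMs within each strain.
heat_stress_per_strain = Counter()
for (strain, pfam), n in copies.items():
    if pfam in heat_stress_pfams:
        heat_stress_per_strain[strain] += n

print(dict(heat_stress_per_strain))
```

Keeping the key as a set rather than joining it row by row avoids the misalignment between columns 1 to 3 and columns 4 to 5.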