Data from: Cyanobacterial blooms in subtropical riverine and estuarine ecosystems of South America
Data files
Jul 21, 2024 version files 7.85 MB
- CyaRiskData.csv: 7.78 MB
- HistoricalCyaData.csv: 19.11 KB
- Metadata.xls: 12.80 KB
- README.md: 14.48 KB
- Variables.xls: 26.62 KB
Abstract
Water quality impairment caused by toxic cyanobacterial blooms is a growing global concern that adversely affects the biodiversity and functioning of aquatic ecosystems and can disrupt recreation and harm human health. Recent studies indicate that factors such as eutrophication, dam construction, and climate change are likely to increase the frequency and intensity of these blooms in aquatic ecosystems worldwide. This trend raises concerns in subtropical South America (SA), where the pampas ecosystem has registered a sustained increase in the area used by agroindustrial activities, which leads to eutrophication of the Uruguay River (UR) and the Río de la Plata estuary (RdlP) ecosystems. The UR-RdlP system is crucial for recreational activities and serves as an essential water source. Historical monitoring data indicate that toxic blooms are now often documented in the UR and transported downstream to the RdlP (Kruk et al., 2017; Martínez de la Escalera et al., 2017).
In this context, it is imperative to develop comprehensive and coherent reviewed datasets to analyze the spatio-temporal dynamics of toxic cyanobacterial blooms effectively. Despite the availability of public information, its accessibility and suitability for analysis are not always guaranteed. Therefore, establishing and maintaining comprehensive long-term databases in ecosystems frequented for recreational purposes is crucial for studying the mechanisms associated with bloom formation and predicting human health risks. Here, we provide historical records (1963-2022) and indices of toxic cyanobacterial blooms at ca. 80 sites in the subtropical region along the Uruguay River (UR) and Río de la Plata (RdlP). The data compilation process involved gathering dispersed information from open sources, research projects, reports from multiple water quality monitoring programs, and collaborative efforts with research institutions in the country and the region. Data was checked for consistency and included geospatial data on cyanobacterial cell abundance, microcystin concentration, chlorophyll-a concentration, and risk levels from field samples combined with relevant environmental, land use, and climatic variables. This included in-situ measured environmental variables (e.g., water temperature, salinity, turbidity, conductivity) and regional climate and hydrology information (e.g., precipitation and flow rates), as well as land use patterns in the UR basin (e.g., crops, forestation, grasslands).
A fundamental contribution of this dataset lies in the consolidation and integration of variables reviewed from different sources. This facilitates its use to evaluate the frequency and intensity of cyanobacterial blooms in a framework of productive intensification and climate change; to analyze the causes and effects of cyanobacterial blooms at riverine and estuarine recreational beaches and their relation to human health risks; to understand the historical dynamics of water quality experienced by users of these aquatic ecosystems; and to model and improve early warning and national monitoring systems, helping to mitigate potential public health risks. In short, studies using the provided dataset reveal the following trends. Cyanobacterial abundance has increased steadily from 1960 to the present, with exponential growth around the year 2000 (Kruk et al., 2023). This shift is associated with changes in land use, notably the transition to industrial crops (Kruk et al., 2023). Cyanobacteria and the frequency of their blooms have also increased in estuarine waters (Martínez de la Escalera et al., 2017; Kruk et al., 2017). Elevated salinity selects for larger cyanobacterial organisms with high toxicity (Kruk et al., 2019). Cyanotoxin levels in the UR and RdlP are significantly high, posing substantial public health risks, especially to vulnerable populations (Kruk et al., 2019).
We highlight the potential of this dataset to explore the interplay between environmental factors, anthropogenic changes, and cyanobacterial dynamics at recreational beaches over an extended historical period in which many relevant transitions were recorded that promoted the rise and intensification of harmful algal blooms. Its significance extends to aiding researchers and healthcare professionals in establishing specific conditions for beach water quality management.
Corresponding author: Carla Kruk (ckruk@yahoo.com)
We present a comprehensive historical dataset spanning ca. 60 years of toxic cyanobacterial bloom abundance and risk indices, which were linked to in-situ water characteristics and merged with regional environmental variables. This dataset encompasses a wide geographic range of beaches across key South American water ecosystems.
The dataset is organized and stored in four files, two .xls files and two .csv files:
File 1: Metadata.xls
File 2: Variables.xls
File 3: CyaRiskData.csv
File 4: HistoricalCyaData.csv
The first two files provide comprehensive information regarding the sources referenced for generating this dataset, including the compiled variables. The last two files contain the values obtained for each variable, prepared for analysis. One dataset (File 3) focuses on indices associated with exposure to toxic cyanobacterial blooms in recreational waters, while the other (File 4) examines temporal trends in cyanobacterial abundance in relation to environmental changes and land-use patterns.
Description of each file
File 1: Metadata.xls: provides a concise summary of information compiled during the data collection process. It includes consulted sources, characteristics of acquired variables, temporal span and frequency, and the respective sites (i.e., beaches) or ecosystems (i.e., UR or RdlP) from which the data originated. Web links are provided for primary sources with public accessibility.
File 2: Variables.xls: offers a comprehensive summary detailing compiled variables, featuring concise descriptions, variable types, column headers as variable names, associated units of measurement, analysis methodology, applied compilation filters, and the corresponding file location for each variable.
File 3: CyaRiskData.csv: provides a compilation of water quality monitoring data from public institutions and research projects, focusing on recreational beaches within subtropical South American ecosystems. This dataset encompasses variables related to cyanobacterial bloom occurrence derived from field samples, such as cyanobacterial cell numbers, cyanotoxins (i.e., microcystin-LR) and chlorophyll-a concentration, and derived risk indices. The dataset also includes meteorological, hydrological, and climatological data. It also encompasses spatial attributes such as geographical coordinates, system, zone, and region, along with temporal information like month, year, and season.
File 4: HistoricalCyaData.csv: contains historical annual records of cyanobacterial abundance dynamics from 1963 to 2019 in the UR. This dataset is accompanied by in-situ measurements of nutrient concentrations, temperature, precipitation, and river flow within the UR watershed, covering the period from 1901 to 2021. Additionally, it includes land use/cover data for the entire UR basin, as well as its Brazilian section, generated through the Mapbiomas initiatives.
Description of features/column header for File 1
dataSource
: The origin or organization providing the revised information.
codeSource
: A unique identifier assigned to each data source for reference.
URL
: A hyperlink directing to the source for further consultation.
accessSource
: Indicates the type of access to the data, whether it’s readily available or requires a request.
variable
: Names of the compiled variables within the dataset.
period
: The date range covered by the compiled data.
frequency
: The regularity or interval of data monitoring or compilation.
site
: The number of monitored sites, beaches, or geographic points from which the compiled information is sourced.
system
: Refers to the study area, specifically the subtropical South American ecosystems of the Uruguay River (UR) and the Rio de la Plata estuary (RdlP).
Description of features/column header for File 2
variableType
: Categorizes variables based on their relevance to geographical or temporal aspects, as well as whether they pertain to in-situ measurements or belong to hydrological, meteorological, climatic, or land use change domains.
variable
: Names assigned to the compiled variables within the dataset.
variableColumnHeader
: The variable name as it appears on the column header.
dataTypeUnit
: Specifies whether the variable is categorical or continuous, along with its unit of measurement.
description
: Provides a brief explanation of each variable.
codeSource
: The source code or unique identifier denoting the origin of the information for each variable.
filter
: Describes the filter applied to each information source to gather the relevant data.
method
: Details the methodology employed to estimate the value of each variable.
file
: Indicates the file number where each described variable is located within the dataset.
Description of features/column header for File 3
date
: The date of the data.
year
: The year of the data, ranging from 2008 to 2022.
month
: The month of the data, ranging from January to December.
season
: The season of the year categorized as summer (December to February), autumn (March to May), winter (June to August), and spring (September to November).
latitude
: The geographic latitude coordinates of the sampling site.
longitude
: The geographic longitude coordinates of the sampling site.
source
: The institution or project providing each data subset.
system
: Refers to the study area, either UR or RdlP.
region
: One of the nine regions into which the UR and RdlP ecosystems are divided.
zone
: One of the seven zones into which the UR and RdlP ecosystems are divided.
site
: The beach or sampling point.
siteCode
: The identification code of the beach or sampling point.
cyaTot
: The total abundance of cyanobacteria.
cyaMax
: The maximum abundance of cyanobacteria.
cyaMin
: The minimum abundance of cyanobacteria.
mcy
: Total microcystin-LR concentration.
chla
: Chlorophyll-a concentration.
riskRaw
: The original risk level as reported by the beach water quality monitoring program or research project.
riskCya
: The identified risk level associated with cyanobacterial cell concentration in recreational waters.
riskMcy
: The identified risk level associated with microcystin-LR concentration in recreational waters.
riskChla
: The identified risk level associated with chlorophyll-a concentration in recreational waters.
riskMax
: The maximum level of health risk detected due to recreational exposure to cyanobacteria.
riskMin
: The minimum level of health risk detected due to recreational exposure to cyanobacteria.
tempWater
: Surface water temperature.
salinity
: Surface water salinity.
conductivity
: Surface water conductivity.
turbidity
: Surface water turbidity.
SS105
: Suspended solids dried at 105°C and weighed.
SS550
: Suspended solids burnt at 550°C and weighed.
pH
: Surface water measure of acidity or alkalinity.
O2dis
: Surface water dissolved oxygen.
pheo
: Pheophytin concentration.
coliFec
: Fecal thermotolerant coliforms concentration.
sd
: Secchi disk depth.
PT
: Total phosphorus concentration.
NT
: Total nitrogen concentration.
TZ1 to TZ5
: Water temperature of submerged buoys along different zones of the UR: 1) includes thermometer Bella Union, 2) includes thermometer Federacion and Salto Grande del Uruguay, 3) includes thermometer Concordia and Puerto Yerua, 4) includes thermometer Paysandu, Concepcion, and Fray Bentos, and 5) includes thermometer La Concordia and Nueva Palmira.
tempMax_cru, tempAve_cru, tempMin_cru, preciAve_cru
: Observed average annual maximum temperature, mean temperature, minimum temperature, and precipitation, respectively, from the Climate Change Knowledge Portal.
QAveDay
: Annual average of the mean daily flow, from the Argentinian National Water Information System.
tempMax_mz, tempAve_mz, tempMin_mz, preciAcu1d_mz, windDir_mz, windVel_mz
: Maximum temperature, mean temperature, minimum temperature, accumulated precipitation - 1 day, wind direction, and wind velocity, respectively, from Meteomanz.
tempMax_inia, tempAve_inia, tempMin_inia, preciAcu_inia, preciAcu1d_inia, preciAcu2d_inia, preciAcu3d_inia
: Maximum temperature, mean temperature, minimum temperature, accumulated precipitation, accumulated precipitation - 1 day, accumulated precipitation - 2 days, and accumulated precipitation - 3 days, respectively, from the Uruguayan National Agricultural Research Institute.
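The season column follows the Southern Hemisphere definition given above (summer from December to February, and so on). As a minimal sketch of that mapping in base R, the helper below derives a season label from a date. The function name month_to_season, the ISO date format, and the toy dates are assumptions made for illustration; they are not taken from the dataset or its code.

```r
# Hypothetical helper (not part of the dataset's code): map a month number
# (1-12) to the season labels described for the `season` column.
month_to_season <- function(month) {
  seasons <- c("summer", "summer",            # January, February
               "autumn", "autumn", "autumn",  # March to May
               "winter", "winter", "winter",  # June to August
               "spring", "spring", "spring",  # September to November
               "summer")                      # December
  seasons[month]
}

# Toy dates for illustration only; these are not records from CyaRiskData.csv.
dates <- as.Date(c("2015-01-15", "2015-04-02", "2015-07-20",
                   "2015-10-05", "2015-12-31"))
months <- as.integer(format(dates, "%m"))

print(data.frame(date = dates, month = months,
                 season = month_to_season(months)))
```

If the month column in the file stores month names rather than numbers, the lookup would need to be keyed by name instead.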
Description of features/column header for File 4
year
: The year of the data, ranging from 1901 to 2021.
cyaMax
: Annual average of the maximum cyanobacterial abundance.
cyaAve
: Annual average of the mean cyanobacterial abundance.
NTMax
: Annual average of the maximum total nitrogen concentration.
NTAve
: Annual average of the mean total nitrogen concentration.
NTMin
: Annual average of the minimum total nitrogen concentration.
PTMax
: Annual average of the maximum total phosphorus concentration.
PTAve
: Annual average of the mean total phosphorus concentration.
PTMin
: Annual average of the minimum total phosphorus concentration.
tempMax
: Annual average of the maximum temperature.
tempAve
: Annual average of the mean temperature.
tempMin
: Annual average of the minimum temperature.
preciAve
: Annual average of the mean precipitation.
QAveMonthlyMax
: Annual average of the maximum mean monthly flow.
QAveMonthlyAcu
: Annual average of the accumulated mean monthly flow.
QAveMonthly
: Annual average of the mean monthly flow.
QMaxDay
: Annual average of the maximum mean daily flow.
QAveDay
: Annual average of the mean daily flow.
QMinDay
: Annual average of the minimum mean daily flow.
naturalForest
: Land use/cover category for the Uruguay river basin, natural forest.
forestation
: Land use/cover category for the Uruguay river basin, forest plantation.
grasslands
: Land use/cover category for the Uruguay river basin, grasslands and wetlands.
noVeg
: Land use/cover category for the Uruguay river basin, non-vegetated areas including bare soil and urban infrastructure.
annualCropsPasture
: Land use/cover category for the Uruguay river basin, farming including annual crops (e.g., maize, soybean, wheat) and sown pastures.
water
: Land use/cover category for the Uruguay river basin, water bodies and rivers.
BRnaturalForest
: Land use/cover category for the Brazilian portion of the Uruguay river basin, natural forest.
BRforestation
: Land use/cover category for the Brazilian portion of the Uruguay river basin, forest plantation.
BRgrasslands
: Land use/cover category for the Brazilian portion of the Uruguay river basin, grasslands and wetlands.
BRannualCropsPasture
: Land use/cover category for the Brazilian portion of the Uruguay river basin, farming including annual crops (e.g., maize, soybean, wheat) and sown pastures.
BRnoVeg
: Land use/cover category for the Brazilian portion of the Uruguay river basin, non-vegetated areas including bare soil and urban infrastructure.
BRwater
: Land use/cover category for the Brazilian portion of the Uruguay river basin, water bodies and rivers.
BRpasture
: Categorization of the ‘Brazilian annual crops and pasture’ class into various land use covers for the Brazilian portion of the Uruguay river basin, pastures.
BRsoybean
: Categorization of the ‘Brazilian annual crops and pasture’ class into various land use covers for the Brazilian portion of the Uruguay river basin, soybeans.
BRotherCrops
: Categorization of the ‘Brazilian annual crops and pasture’ class into various land use covers for the Brazilian portion of the Uruguay river basin, other summer crops and pastures.
BRCropsPasture
: Categorization of the ‘Brazilian annual crops and pasture’ class into various land use covers for the Brazilian portion of the Uruguay river basin, mosaic of pastures and annual crops.
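Because File 4 spans 1901 to 2021 while the cyanobacterial abundance records cover 1963 to 2019, many rows are expected to have no abundance value. The base R sketch below shows one way to restrict the table to years with an abundance record before comparing it with climate or land-use columns. The data frame hist_toy and its values are invented for illustration and only borrow the column names year, cyaAve, and tempAve.

```r
# Toy stand-in for HistoricalCyaData.csv; all values are invented.
hist_toy <- data.frame(
  year    = c(1901, 1962, 1963, 2000, 2019, 2021),
  cyaAve  = c(NA, NA, 10, 200, 5000, NA),
  tempAve = c(17.1, 17.4, 17.3, 17.9, 18.2, 18.4)
)

# Keep only the years that have a cyanobacterial abundance record.
hist_cya <- hist_toy[!is.na(hist_toy$cyaAve), ]

# First and last year with abundance data in the toy table.
print(range(hist_cya$year))
```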
Any Empty Value in the Data
Any cells containing ‘NA’ or left empty indicate missing values or signify that the column is not applicable to the respective site and date.
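A minimal sketch of how both conventions for missing values could be handled when reading the files in R: the na.strings argument of read.csv lists the strings to treat as missing. The inline CSV text below is a toy stand-in with invented values, used here so the example runs without the actual files.

```r
# Toy CSV text standing in for a few rows of CyaRiskData.csv (invented values).
csv_text <- "siteCode,cyaTot,mcy,chla
A1,1200,NA,4.5
A2,,0.8,
A3,75000,12.1,60"

# Treat both "NA" and empty cells as missing values.
toy <- read.csv(text = csv_text, na.strings = c("NA", ""),
                stringsAsFactors = FALSE)

# Count missing values per column.
print(colSums(is.na(toy)))
```

With the real file, the text argument would be replaced by the file path, for example "CyaRiskData.csv".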
Sharing/Access information
This is the only publicly accessible location of the dataset.
For the sources from which the data were compiled, please refer to File 1 (Metadata.xls).
Code/Software
Risk level estimation for cyanobacterial blooms at recreational beaches
Risk levels are color-coded: GREEN = LOW RISK, YELLOW = MODERATE RISK, RED = HIGH RISK. Low risk means cyanobacterial abundance below 5000 cells mL-1, microcystin concentration below 2 µg L-1, or chlorophyll-a concentration below 10 µg L-1; high risk means cyanobacterial abundance above 50,000 cells mL-1, microcystin above 10 µg L-1, or chlorophyll-a above 50 µg L-1; moderate risk falls between these thresholds. Each variable is classified separately.
library(dplyr)
# Read the risk dataset (File 3)
df <- read.csv("CyaRiskData.csv", header = TRUE, sep = ",", dec = ".")
# Define a function to assign risk levels
assign_risk <- function(value, thresholds) {
  factor(case_when(
    value < thresholds[1] ~ "green",
    value >= thresholds[1] & value <= thresholds[2] ~ "yellow",
    value > thresholds[2] ~ "red"))
}
# Apply the function to each variable and create new columns
# cyaTot: total cyanobacterial abundance (cell mL-1)
# chla: chlorophyll-a concentration (µg L-1)
# mcy: total microcystin concentration (µg L-1)
df <- df %>%
  mutate(riskCya = assign_risk(cyaTot, c(5000, 50000)),
         riskMcy = assign_risk(mcy, c(2, 10)),
         riskChla = assign_risk(chla, c(10, 50)))
# Arrange the original risk variable of each subset and map it to the corresponding color code
df$riskRaw <- ifelse(df$riskRaw == "foam", "red",
              ifelse(df$riskRaw == "colony", "yellow",
              ifelse(df$riskRaw == "no", "green", df$riskRaw)))
df$riskRaw <- as.factor(df$riskRaw)
Subsequently, the maximum alert level will be determined by the highest level detected in each observation of the risk variables (i.e., riskCya, riskMcy, riskChla, riskRaw), whereas the minimum alert level will correspond to the lowest level detected.
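dfrisk <- select(df, c("riskRaw", "riskCya", "riskChla", "riskMcy"))
# Convert the color codes to numbers: green = 1, yellow = 2, any other non-missing value = 3
dfrisk <- apply(dfrisk, 2, function(col) { ifelse(col == "green", 1, ifelse(col == "yellow", 2, 3)) })
dfrisk <- as.data.frame(dfrisk)
# Row-wise maximum; NA when all risk variables are missing
dfrisk <- cbind(dfrisk, riskMaxNum = apply(dfrisk, 1, function(x) ifelse(all(is.na(x)), NA, max(x, na.rm = TRUE))))
# Map the numeric maximum back to color codes; fixing the levels keeps the labels correct even if a level is absent
df$riskMax <- factor(dfrisk$riskMaxNum, levels = 1:3, labels = c("green", "yellow", "red"))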
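The code above computes only the maximum alert level. The minimum alert level (riskMin) can be derived in the same way by taking the row-wise minimum. The following base R sketch illustrates this on a toy data frame with invented values; it is an assumption about how riskMin could be obtained, not the authors' original code. Unlike the code above, it uses match(), so any value other than green, yellow, or red is treated as missing rather than as red.

```r
# Toy per-observation risk columns (invented values, not from the dataset).
risk_toy <- data.frame(
  riskRaw  = c("green", "yellow", NA, NA),
  riskCya  = c("yellow", "red", "green", NA),
  riskMcy  = c("green", NA, "green", NA),
  riskChla = c("red", "yellow", NA, NA),
  stringsAsFactors = FALSE
)

risk_levels <- c("green", "yellow", "red")

# Convert colors to ranks 1-3; values outside the three colors become NA.
risk_num <- sapply(risk_toy, function(col) match(col, risk_levels))

# Lowest rank per row, or NA when every risk variable is missing.
risk_min_num <- apply(risk_num, 1, function(x) {
  if (all(is.na(x))) NA_integer_ else min(x, na.rm = TRUE)
})

# Map the ranks back to color codes with a fixed set of levels.
risk_toy$riskMin <- factor(risk_min_num, levels = 1:3, labels = risk_levels)
print(risk_toy)
```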