Data from: Cyanobacterial blooms in subtropical riverine and estuarine ecosystems of South America

Sampognaro, Lia¹; Segura, Angel M.¹; Piccini, Claudia²; Kruk, Carla¹

Published Jul 21, 2024 on Dryad. https://doi.org/10.5061/dryad.9w0vt4bpz

Data files

Jul 21, 2024 version files (7.85 MB total)

CyaRiskData.csv (7.78 MB)
HistoricalCyaData.csv (19.11 KB)
Metadata.xls (12.80 KB)
README.md (14.48 KB)
Variables.xls (26.62 KB)

Abstract

Water quality impairment caused by toxic cyanobacterial blooms is a growing global concern. It adversely affects the biodiversity and functioning of aquatic ecosystems and can disrupt recreation and harm human health. Recent studies indicate that factors such as eutrophication, dam construction, and climate change are likely to increase the frequency and intensity of these blooms in aquatic ecosystems worldwide. This trend raises concerns in subtropical South America (SA), where the pampas ecosystem has registered a sustained increase in the area used by agroindustrial activities, which leads to eutrophication of the Uruguay River (UR) and the Río de la Plata estuary (RdlP) ecosystems. The UR-RdlP system is crucial for recreational activities and serves as an essential water source. Historical monitoring data indicate that toxic blooms are now often documented in the UR and transported downstream to the RdlP (Kruk et al., 2017; Martínez de la Escalera et al., 2017).

In this context, it is imperative to develop comprehensive and coherent reviewed datasets to analyze the spatio-temporal dynamics of toxic cyanobacterial blooms effectively. Despite the availability of public information, its accessibility and suitability for analysis are not always guaranteed. Therefore, establishing and maintaining comprehensive long-term databases in ecosystems frequented for recreational purposes is crucial for studying the mechanisms associated with bloom formation and predicting human health risks. Here, we provide historical records (1963-2022) and indices of toxic cyanobacterial blooms at ca. 80 sites in the subtropical region along the Uruguay River (UR) and Río de la Plata (RdlP). The data compilation process involved gathering dispersed information from open sources, research projects, reports from multiple water quality monitoring programs, and collaborative efforts with research institutions in the country and the region. Data was checked for consistency and included geospatial data on cyanobacterial cell abundance, microcystin concentration, chlorophyll-a concentration, and risk levels from field samples combined with relevant environmental, land use, and climatic variables. This included in-situ measured environmental variables (e.g., water temperature, salinity, turbidity, conductivity) and regional climate and hydrology information (e.g., precipitation and flow rates), as well as land use patterns in the UR basin (e.g., crops, forestation, grasslands).

A fundamental contribution of this dataset lies in the consolidation and integration of variables reviewed from different sources. This facilitates its use to evaluate the frequency and intensity of cyanobacterial blooms in a framework of productive intensification and climate change; to analyze the causes and effects of cyanobacterial blooms at riverine and estuarine recreational beaches and their relation to human health risks; to understand the historical dynamics of water quality experienced by users of these aquatic ecosystems; and to model and improve early warning and national monitoring systems, helping to mitigate potential public health risks. Studies using the dataset reveal the following trends. Cyanobacterial abundance has increased steadily from 1960 to the present, with exponential growth around the year 2000 (Kruk et al., 2023). This shift is associated with changes in land use, notably the transition to industrial crops (Kruk et al., 2023). Cyanobacteria and the frequency of their blooms have also increased in estuarine waters (Martínez de la Escalera et al., 2017; Kruk et al., 2017). Elevated salinity selects for larger cyanobacterial organisms with high toxicity (Kruk et al., 2019). Cyanotoxin levels in the UR and RdlP are high, posing substantial public health risks, especially to vulnerable populations (Kruk et al., 2019).

We highlight the potential of this dataset to explore the interplay between environmental factors, anthropogenic changes, and cyanobacterial dynamics at recreational beaches over an extended historical period in which many relevant transitions were recorded that promoted the rise and intensification of harmful algal blooms. Its significance extends to aiding researchers and healthcare professionals in establishing specific conditions for beach water quality management.

Corresponding author: Carla Kruk (ckruk@yahoo.com)

We present a comprehensive historical dataset spanning ca. 60 years of toxic cyanobacterial bloom abundance and risk indices, linked to in-situ water characteristics and merged with regional environmental variables. This dataset encompasses a wide geographic range of beaches across key South American water ecosystems.

The dataset is organized and stored in four files, two .xls files and two .csv files:
File 1: Metadata.xls
File 2: Variables.xls
File 3: CyaRiskData.csv
File 4: HistoricalCyaData.csv

The first two files document the sources consulted to generate this dataset and the variables compiled from them. The last two files contain the values obtained for each variable, prepared for analysis. One dataset focuses on indices associated with exposure to toxic cyanobacterial blooms in recreational waters, while the other examines temporal trends in cyanobacterial abundance in relation to environmental changes and land-use patterns.

Description of each file

File 1: Metadata.xls: provides a concise summary of information compiled during the data collection process. It includes consulted sources, characteristics of acquired variables, temporal span and frequency, and the respective sites (i.e., beaches) or ecosystems (i.e., UR or RdlP) from which the data originated. Web links are provided for primary sources with public accessibility.

File 2: Variables.xls: offers a comprehensive summary detailing compiled variables, featuring concise descriptions, variable types, column headers as variable names, associated units of measurement, analysis methodology, applied compilation filters, and the corresponding file location for each variable.

File 3: CyaRiskData.csv: provides a compilation of water quality monitoring data from public institutions and research projects, focusing on recreational beaches within subtropical South American ecosystems. This dataset encompasses variables related to cyanobacterial bloom occurrence derived from field samples, such as cyanobacterial cell numbers, cyanotoxins (i.e., microcystin-LR) and chlorophyll-a concentration, and derived risk indices. The dataset also includes meteorological, hydrological, and climatological data. It also encompasses spatial attributes such as geographical coordinates, system, zone, and region, along with temporal information like month, year, and season.

File 4: HistoricalCyaData.csv: contains historical annual records of cyanobacterial abundance dynamics from 1963 to 2019 in the UR. These records are accompanied by in-situ measurements of nutrient concentrations, together with temperature, precipitation, and river flow within the UR watershed, covering the period from 1901 to 2021. Additionally, the file includes land use/cover data for the entire UR basin, as well as its Brazilian section, generated through the MapBiomas initiative.

Description of features/column header for File 1
dataSource: The origin or organization providing the revised information.
codeSource: A unique identifier assigned to each data source for reference.
URL: A hyperlink directing to the source for further consultation.
accessSource: Indicates the type of access to the data, whether it's readily available or requires a request.
variable: Names of the compiled variables within the dataset.
period: The date range covered by the compiled data.
frequency: The regularity or interval of data monitoring or compilation.
site: The number of monitored sites, beaches, or geographic points from which the compiled information is sourced.
system: Refers to the study area, specifically the subtropical South American ecosystems of the Uruguay River (UR) and the Rio de la Plata estuary (RdlP).

Description of features/column header for File 2
variableType: Categorizes variables based on their relevance to geographical or temporal aspects, as well as whether they pertain to in-situ measurements or belong to hydrological, meteorological, climatic, or land use change domains.
variable: Names assigned to the compiled variables within the dataset.
variableColumnHeader: The variable name as it appears on the column header.
dataTypeUnit: Specifies whether the variable is categorical or continuous, along with its unit of measurement.
description: Provides a brief explanation of each variable.
codeSource: The source code or unique identifier denoting the origin of the information for each variable.
filter: Describes the filter applied to each information source to gather the relevant data.
method: Details the methodology employed to estimate the value of each variable.
file: Indicates the file number where each described variable is located within the dataset.

Description of features/column header for File 3
date: The date of the data.
year: The year of the data, ranging from 2008 to 2022.
month: The month of the data, ranging from January to December.
season: The season of the year categorized as summer (December to February), autumn (March to May), winter (June to August), and spring (September to November).
latitude: The geographic latitude coordinates of the sampling site.
longitude: The geographic longitude coordinates of the sampling site.
source: The institution or project providing each data subset.
system: Refers to the study area, either UR or RdlP.
region: One of the nine regions into which the UR and RdlP ecosystems are divided.
zone: One of the seven zones into which the UR and RdlP ecosystems are divided.
site: The beach or sampling point.
siteCode: The identification code of the beach or sampling point.
cyaTot: The total abundance of cyanobacteria.
cyaMax: The maximum abundance of cyanobacteria.
cyaMin: The minimum abundance of cyanobacteria.
mcy: Total microcystin-LR concentration.
chla: Chlorophyll-a concentration.
riskRaw: The original risk level as reported by the beach water quality monitoring program or research project.
riskCya: The identified risk level associated with cyanobacterial cell concentration in recreational waters.
riskMcy: The identified risk level associated with microcystin-LR concentration in recreational waters.
riskChla: The identified risk level associated with chlorophyll-a concentration in recreational waters.
riskMax: The maximum level of health risk detected due to recreational exposure to cyanobacteria.
riskMin: The minimum level of health risk detected due to recreational exposure to cyanobacteria.
tempWater: Surface water temperature.
salinity: Surface water salinity.
conductivity: Surface water conductivity.
turbidity: Surface water turbidity.
SS105: Suspended solids dried at 105°C and weighed.
SS550: Suspended solids burnt at 550°C and weighed.
pH: Surface water measure of acidity or alkalinity.
O2dis: Surface water dissolved oxygen.
pheo: Pheophytin concentration.
coliFec: Fecal thermotolerant coliforms concentration.
sd: Secchi disk depth.
PT: Total phosphorus concentration.
NT: Total nitrogen concentration.
TZ1 to TZ5: Water temperature from submerged buoys in five zones of the UR: zone 1 includes the Bella Union thermometer; zone 2, Federacion and Salto Grande del Uruguay; zone 3, Concordia and Puerto Yerua; zone 4, Paysandu, Concepcion, and Fray Bentos; and zone 5, La Concordia and Nueva Palmira.
tempMax_cru, tempAve_cru, tempMin_cru, preciAve_cru: Observed average annual maximum temperature, mean temperature, minimum temperature, and precipitation, respectively, from the Climate Change Knowledge Portal.
QAveDay: Annual average of the mean daily flow, from the Argentinian National Water Information System.
tempMax_mz, tempAve_mz, tempMin_mz, preciAcu1d_mz, windDir_mz, windVel_mz: Maximum temperature, mean temperature, minimum temperature, accumulated precipitation - 1 day, wind direction, and wind velocity, respectively, from Meteomanz.
tempMax_inia, tempAve_inia, tempMin_inia, preciAcu_inia, preciAcu1d_inia, preciAcu2d_inia, preciAcu3d_inia: Maximum temperature, mean temperature, minimum temperature, accumulated precipitation, accumulated precipitation - 1 day, accumulated precipitation - 2 days, and accumulated precipitation - 3 days, respectively, from the Uruguayan National Agricultural Research Institute.
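
The `season` column follows southern-hemisphere seasons as defined above. A minimal sketch of how that mapping could be reproduced from the month, assuming months are coded as integers 1 to 12 (the README does not state the coding, and the helper name `month_to_season` is hypothetical, not part of the dataset's code):

```r
# Hypothetical helper: map a numeric month (1-12) to the season labels
# described for the `season` column (southern hemisphere).
month_to_season <- function(month) {
  # Lookup vector indexed by month number: position 1 = January, 12 = December
  lookup <- c("summer", "summer",
              "autumn", "autumn", "autumn",
              "winter", "winter", "winter",
              "spring", "spring", "spring",
              "summer")
  lookup[month]
}

month_to_season(c(1, 4, 7, 10, 12))
# "summer" "autumn" "winter" "spring" "summer"
```

If the `month` column stores month names instead, it would need to be converted to numbers first.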

Description of features/column header for File 4
year: The year of the data, ranging from 1901 to 2021.
cyaMax: Annual average of the maximum cyanobacterial abundance.
cyaAve: Annual average of the mean cyanobacterial abundance.
NTMax: Annual average of the maximum total nitrogen concentration.
NTAve: Annual average of the mean total nitrogen concentration.
NTMin: Annual average of the minimum total nitrogen concentration.
PTMax: Annual average of the maximum total phosphorus concentration.
PTAve: Annual average of the mean total phosphorus concentration.
PTMin: Annual average of the minimum total phosphorus concentration.
tempMax: Annual average of the maximum temperature.
tempAve: Annual average of the mean temperature.
tempMin: Annual average of the minimum temperature.
preciAve: Annual average of the mean precipitation.
QAveMonthlyMax: Annual average of the maximum mean monthly flow.
QAveMonthlyAcu: Annual average of the accumulated mean monthly flow.
QAveMonthly: Annual average of the mean monthly flow.
QMaxDay: Annual average of the maximum mean daily flow.
QAveDay: Annual average of the mean daily flow.
QMinDay: Annual average of the minimum mean daily flow.
naturalForest: Land use/cover category for the Uruguay river basin, natural forest.
forestation: Land use/cover category for the Uruguay river basin, forest plantation.
grasslands: Land use/cover category for the Uruguay river basin, grasslands and wetlands.
noVeg: Land use/cover category for the Uruguay river basin, non-vegetated areas including bare soil and urban infrastructure.
annualCropsPasture: Land use/cover category for the Uruguay river basin, farming including annual crops (e.g., maize, soybean, wheat) and sown pastures.
water: Land use/cover category for the Uruguay river basin, water bodies and rivers.
BRnaturalForest: Land use/cover category for the Brazilian portion of the Uruguay river basin, natural forest.
BRforestation: Land use/cover category for the Brazilian portion of the Uruguay river basin, forest plantation.
BRgrasslands: Land use/cover category for the Brazilian portion of the Uruguay river basin, grasslands and wetlands.
BRannualCropsPasture: Land use/cover category for the Brazilian portion of the Uruguay river basin, farming including annual crops (e.g., maize, soybean, wheat) and sown pastures.
BRnoVeg: Land use/cover category for the Brazilian portion of the Uruguay river basin, non-vegetated areas including bare soil and urban infrastructure.
BRwater: Land use/cover category for the Brazilian portion of the Uruguay river basin, water bodies and rivers.
BRpasture: Categorization of the 'Brazilian annual crops and pasture' class into various land use covers for the Brazilian portion of the Uruguay river basin, pastures.
BRsoybean: Categorization of the 'Brazilian annual crops and pasture' class into various land use covers for the Brazilian portion of the Uruguay river basin, soybeans.
BRotherCrops: Categorization of the 'Brazilian annual crops and pasture' class into various land use covers for the Brazilian portion of the Uruguay river basin, other summer crops and pastures.
BRCropsPasture: Categorization of the 'Brazilian annual crops and pasture' class into various land use covers for the Brazilian portion of the Uruguay river basin, mosaic of pastures and annual crops.

Any Empty Value in the Data
Any cells containing 'NA' or left empty indicate missing values or signify that the column is not applicable to the respective site and date.
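
Because missing values appear both as 'NA' and as empty cells, it may help to declare both when reading the .csv files. The sketch below uses a small invented table in place of the real file; the column names follow File 3, but the values are made up for illustration:

```r
# Toy CSV mimicking the dataset convention: missing values are 'NA' or empty.
# Column names follow File 3; all values are invented.
csv_text <- "site,cyaTot,mcy,riskRaw
A,1200,NA,green
B,,0.5,
C,80000,12,red"

df <- read.csv(text = csv_text,
               na.strings = c("NA", ""),   # treat both markers as missing
               stringsAsFactors = FALSE)

# Count missing values per column
colSums(is.na(df))
```

For the real data, the same `na.strings` argument would be passed to `read.csv("CyaRiskData.csv", ...)`.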

Sharing/Access information
This is the only publicly accessible location of the dataset.
For the data sources, please refer to File 1 (Metadata.xls).

Code/Software

Risk level estimation for cyanobacterial blooms at recreational beaches

GREEN = LOW RISK, YELLOW = MODERATE RISK, RED = HIGH RISK. Low risk means cyanobacteria levels below 5,000 cells mL-1, or microcystin or chlorophyll-a concentrations below 2 and 10 µg L-1, respectively; high risk indicates cyanobacteria levels above 50,000 cells mL-1, microcystin exceeding 10 µg L-1, or chlorophyll-a above 50 µg L-1; moderate risk falls between these thresholds (bounds included).

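library(dplyr)

# Read the risk dataset (File 3)
df <- read.csv("CyaRiskData.csv", header = TRUE, sep = ",", dec = ".")

# Define a function to assign risk levels
# thresholds = c(lower, upper): below lower is green, above upper is red,
# and values from lower to upper (inclusive) are yellow

assign_risk <- function(value, thresholds) {
  factor(case_when(
    value < thresholds[1] ~ "green",
    value >= thresholds[1] & value <= thresholds[2] ~ "yellow",
    value > thresholds[2] ~ "red"))
}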

# Apply the function to each variable and create new columns
# cyaTot: total cyanobacterial abundance (cell mL-1)
# chla: chlorophyll-a concentration (µg L-1)
# mcy: total microcystin concentration (µg L-1)

df <- df %>%
  mutate(riskCya = assign_risk(cyaTot, c(5000, 50000)),
         riskMcy = assign_risk(mcy, c(2, 10)),
         riskChla = assign_risk(chla, c(10, 50)))

# Arrange the original risk variable of each subset and map it to the corresponding color code

df$riskRaw <- ifelse(df$riskRaw == "foam", "red",
              ifelse(df$riskRaw == "colony", "yellow",
              ifelse(df$riskRaw == "no", "green", df$riskRaw)))
df$riskRaw <- as.factor(df$riskRaw)

Subsequently, the maximum alert level will be determined by the highest level detected in each observation of the risk variables (i.e., riskCya, riskMcy, riskChla, riskRaw), whereas the minimum alert level will correspond to the lowest level detected.

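# Keep only the four risk columns
dfrisk <- select(df, c("riskRaw", "riskCya", "riskChla", "riskMcy"))

# Recode colors as numbers (green = 1, yellow = 2, red = 3); NA stays NA
dfrisk <- apply(dfrisk, 2, function(col) {
  ifelse(col == "green", 1, ifelse(col == "yellow", 2, 3))
})
dfrisk <- as.data.frame(dfrisk)

# Row-wise maximum and minimum; NA when all four indicators are missing
riskMaxNum <- apply(dfrisk, 1, function(x) ifelse(all(is.na(x)), NA, max(x, na.rm = TRUE)))
riskMinNum <- apply(dfrisk, 1, function(x) ifelse(all(is.na(x)), NA, min(x, na.rm = TRUE)))

# Map back to colors with explicit levels, so the labels stay correct
# even if one of the three levels never occurs in the data
df$riskMax <- factor(riskMaxNum, levels = 1:3, labels = c("green", "yellow", "red"))
df$riskMin <- factor(riskMinNum, levels = 1:3, labels = c("green", "yellow", "red"))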
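
To check the threshold rule and the maximum/minimum alert logic without loading the full file, the steps can be run on a few invented rows. This is a base-R sketch under stated assumptions: the values in `toy` are made up, `assign_risk` is restated here without dplyr, and the helper `row_stat` is hypothetical.

```r
# Base-R restatement of the threshold rule, for a quick check on invented values
assign_risk <- function(value, thresholds) {
  ifelse(value < thresholds[1], "green",
         ifelse(value <= thresholds[2], "yellow", "red"))
}

toy <- data.frame(
  cyaTot = c(1000, 5000, 60000, NA),   # cells mL-1 (invented)
  mcy    = c(0.5, NA, 1, NA),          # ug L-1 (invented)
  chla   = c(4, 60, 20, NA)            # ug L-1 (invented)
)

levels_chr <- c("green", "yellow", "red")
risk <- data.frame(
  riskCya  = assign_risk(toy$cyaTot, c(5000, 50000)),
  riskMcy  = assign_risk(toy$mcy, c(2, 10)),
  riskChla = assign_risk(toy$chla, c(10, 50))
)

# Convert colors to ranks 1-3, then take the row-wise maximum and minimum
rank_mat <- sapply(risk, function(col) match(col, levels_chr))
row_stat <- function(x, f) if (all(is.na(x))) NA else f(x, na.rm = TRUE)
toy$riskMax <- levels_chr[apply(rank_mat, 1, function(x) row_stat(x, max))]
toy$riskMin <- levels_chr[apply(rank_mat, 1, function(x) row_stat(x, min))]
toy
```

In this toy table the second row has 5,000 cells mL-1, which sits exactly on the lower bound and is therefore classified as yellow; the last row has no indicators at all, so both alert levels stay missing.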

Study area

The Uruguay River (UR) is one of the largest rivers in South America (SA). Its basin spans from 28°S to 37°S, covering ca. 365,000 km2, and the river has a linear extension of 1,838 km, of which ~540 km form the border between Argentina and Uruguay. At 31°S, the Salto Grande (SG) dam was built in 1974 to produce electricity; its reservoir (length ~100 km, average depth 6.4 m) frequently presents toxic cyanobacterial blooms. At 35°S, on the Atlantic coast of SA, lies the Río de la Plata estuary (RdlP), with an extension of 325 km and a mean depth of 10 m, draining the second largest basin of SA (3,170,000 km2), formed by the Paraná River and the UR. Within this basin, agricultural and industrial activities thrive, with approximately 15 million people living along its margins.

Sampling and data collection

Water quality, cyanobacterial blooms, and risk indices

Cyanobacteria information was compiled from publicly accessible reports and through formal requests to the Uruguayan Government Drinking Water Institution (Obras Sanitarias del Estado, OSE), the binational commission of the Uruguay River (Comisión Administradora del Río Uruguay, CARU), the binational Technical Commission of the Salto Grande Dam (CTM), the Municipality of Montevideo (IM), and different joint research projects from the University of the Republic in Uruguay (Centro Universitario Regional Este, Facultad de Ciencias, Instituto de Investigaciones Clemente Estable, and Laboratorio Tecnológico del Uruguay).

Two datasets are presented. One corresponds to historical data on cyanobacterial abundance (cells mL⁻¹) from 1963 to 2019, primarily originated by OSE during water quality control campaigns at five sites adjacent to water treatment plants along the Uruguay River. The other dataset corresponds to the presence of toxic cyanobacterial blooms from 2008 to 2022, obtained through requests to environmental agencies responsible for public health and compiled from reports on the status of recreational water available on public servers. This dataset mainly consists of risk exposure indices. Both datasets were complemented with information retrieved from research projects.

During monitoring campaigns, samples were collected from subsurface water at 54 sites in the UR and 25 sites in the RdlP to analyze the presence of cyanobacterial blooms based on cell count, microcystin concentration, and chlorophyll-a concentration. Cyanobacterial abundance was counted using inverted optical microscopy, cyanotoxin (i.e., microcystin-LR) concentrations were estimated with the Microcystins-ADDA ELISA method, and chlorophyll-a was measured spectrophotometrically. Risk levels were assessed using the threshold values that environmental agencies apply to variables indicating the presence of cyanobacteria (including cell counts, microcystin-LR concentrations, and visual inspections).

Information was retrieved from the main beach water quality monitoring programs in Uruguay. In the UR, CTM monitors 16 sites along the 100 km coastline of the UR in the upper stream of the Salto Grande reservoir, while CARU monitors more than 40 sites along the entire river, extending approximately 300 km downstream from the dam, and employs a comprehensive beach condition classification system. This classification is based on multiple indicators, including the presence of foam, chlorophyll-a levels, cyanobacterial cell counts, and microcystins. Exposure risk is then categorized into three levels according to threshold values for each variable (provided in the README): “green” indicating low risk, “yellow” indicating moderate risk, and “red” indicating high risk. In the RdlP, the IM employs a visual surveillance system for cyanobacteria at 21 recreational beaches in Montevideo, which sorts cyanobacterial presence into three categories: “no” detectable colonies, low concentration with scattered “colonies”, and very high concentration with visible “foam”.

To merge data retrieved from both monitoring systems (i.e., IM and CARU) and from research projects, we derived a unified three-level risk alert variable using in-situ observations from each dataset. This variable considered multiple indicators of cyanobacteria presence (i.e., cell count, chlorophyll-a and microcystin concentrations, and the absence or presence of colonies and foam). The maximum alert level was determined by identifying the highest risk level among these variables, representing the most severe scenario for public health.

Cyanobacteria data were further linked to relevant environmental variables, such as in-situ water characteristics (e.g., temperature, salinity, turbidity, nutrient concentration) and time series of regional meteorological, climatological, and hydrological variables.

Environmental variables

Water temperature, conductivity, dissolved oxygen, turbidity, and salinity were recorded at the surface using a multiprobe. Water temperature was also measured by thermometer-equipped buoys across the UR between 2008 and 2021. The measurement of major inorganic nutrients, including nitrogen (in the forms of nitrite, nitrate, and ammonium) and phosphorus (as orthophosphate), followed the procedures outlined by the American Public Health Association. When there was no in-situ record of nutrient concentration, this information was supplemented with data from reports generated by the PROCON project (Water Quality and Pollution Control Program) in the UR.

For the RdlP, relevant local meteorological data were retrieved from public servers, the Uruguayan Meteorological Institute (INUMET, station Paysandú), the National Agricultural Research Institute of Uruguay (INIA, station Las Brujas), and Meteomanz.com. The recorded variables included precipitation, air temperature, and wind speed and direction; variables derived from the incorporated ones were also calculated (e.g., lagged variables and accumulated precipitation).

Hydrology, climate, and land use

Historical UR flow data (monthly and daily; average, minimum, and maximum) from 1908 to 2022 were acquired from the Argentine National Water Information System (SNIH) for station #3802, Paso de los Libres, upstream of the Salto Grande Dam. Annual average values of maximum, minimum, and average daily temperatures and the average annual precipitation in the UR watershed from 1901 to 2022 were obtained from the historical climate database produced by the Climatic Research Unit (CRU) of the University of East Anglia and provided by the World Bank Climate Change Knowledge Portal for watershed #252. Land use/cover maps from 2000 to 2019 for the entire UR basin were obtained from the MapBiomas Pampa project, which is based on satellite images (i.e., Landsat), and were grouped into six classes: natural forest, forest plantation, grasslands and wetlands, crops and pasture, non-vegetated areas, and water bodies. Data from 1985 to 2019 for the Brazilian segment of the UR basin were used to obtain additional land use categories, including pastures, soybeans, other summer crops, and a mosaic of pastures and annual crops.

In-situ and regional data were integrated into a single matrix based on geographic proximity and date: each in-situ observation was associated with the meteorological and hydrological values recorded on the previous day. For annual data, each annual meteorological or land use value was repeated across all observations of the corresponding year. In this way, biological data (i.e., cyanobacterial data) derived from water quality monitoring programs and research projects, together with the spatial positions of the sampling sites, were compiled and integrated with the environmental matrices (i.e., water characteristics and hydrological, climatological, and land use variables).
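
The previous-day association can be sketched as a keyed join. This is an illustration under stated assumptions, not the authors' code: both tables are invented, the station is assumed to be already matched to the sites by proximity, and `datePrev` is a hypothetical helper column.

```r
# Invented in-situ observations (one row per sampling date and site)
insitu <- data.frame(
  date   = as.Date(c("2020-01-15", "2020-01-20")),
  site   = c("A", "B"),
  cyaTot = c(1200, 80000)
)

# Invented daily meteorological series from a nearby station
meteo <- data.frame(
  date       = as.Date(c("2020-01-14", "2020-01-15", "2020-01-19")),
  preciAcu1d = c(0, 12.5, 3.2)
)

# Key each in-situ row by the previous day, then left-join the station values
insitu$datePrev <- insitu$date - 1
merged <- merge(insitu, meteo,
                by.x = "datePrev", by.y = "date",
                all.x = TRUE)   # keep in-situ rows even with no station record
merged[, c("date", "site", "cyaTot", "preciAcu1d")]
```

Here the sample taken on 2020-01-15 receives the station value from 2020-01-14, and the sample from 2020-01-20 receives the value from 2020-01-19. Annual variables could be attached the same way by joining on `year` instead of a date.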
