Tropical South America Diatom Database (TSADB)

Cite this dataset

Benito, Xavier (2021). Tropical South America Diatom Database (TSADB) [Dataset]. Dryad.


Determining the mechanisms of community assembly forms the foundation of biogeography and community ecology. Studies of the biodiversity and distribution of Neotropical macro-organisms have revealed the roles of environmental, spatial, and historical factors in structuring communities at different spatial and temporal scales, but the roles these factors play for species and communities of microorganisms are still poorly known. Diatoms are a very species-rich group of algae that disperse widely and are sensitive to environmental variation due to their position at the base of aquatic food webs. Here, we present the Tropical South America Diatom Database (TSADB), which contains geographical and ecological information on species across lentic and lotic environments, including predictors that describe local (limnological) and regional (geo-climatic) factors. The database can be used to tackle fundamental questions in macroecology, including metacommunity ecology and biogeography theories that form the foundation for better understanding the rapid environmental change that tropical regions are experiencing. The TSADB includes diatom taxa from 437 samples in 326 sites distributed across 26 regions (0 to 5,070 m a.s.l., and between 8°N–35°S; 58–90°W). In addition, long-term, diatom-based paleolimnological records are presented as a complementary tool for identifying regions with available modern and paleo-datasets for a long-term limnological change observatory in the tropical Americas. We describe the TSADB structure and functionality, and R code for data manipulation and visualization. Each of the 26 study regions is represented by 3 data matrices: sampling site information, environmental variables (limnology, climate, and landscape), and diatom community data (relative abundance or presence/absence).
Access to data and future additions is through publicly available repositories and a guide to contributors, respectively, providing opportunities for complementing existing databases on diatoms and allowing optimal usage of the TSADB by scientists, including diatomists, limnologists, aquatic ecologists, and natural resource managers.


The Tropical South America Diatom Database (TSADB) includes 437 samples from 326 sites distributed across 26 regions (0 to 5,070 m a.s.l., and between 8°N–35°S; 58–90°W). The database comprises published and unpublished studies from lentic and lotic environments sampled by different authors for different purposes (e.g. paleoclimatic reconstructions, taxonomy, biodiversity). Diatom samples come from multiple habitats (e.g. sediment surface, periphyton, and plankton) and cover the period 1978–2017. The database also includes predictors that describe local (limnological) and regional (geo-climatic) factors.

Usage notes

The TSADB is hosted in two data repositories (Dryad for Excel spreadsheets, and Zenodo for R files) and in two formats, giving users alternative ways to access and use the data.

  • Excel spreadsheets: 26 Excel files (one for each study region). Depending on the original information, each file has three sheets containing the three data matrices (site information, environmental variables, and diatom community data). A metadata sheet describing sources and descriptions of environmental variables is common to all Excel files. A guide on how to share datasets for the TSADB is also included.
  • R folder: files consisting of the TSADB’s region datasets as an R list object (“TSADB.RData”), three R scripts for data processing, plotting (Figs. 1-4) and updating taxon nomenclature, and the R shiny apps. The “TSADB.Rdata” file is a list of objects, in which each element refers to a data matrix ($diatoms, $sites, $environment) that itself contains a list of study regions. The R script “1-cleaning and formatting.R” handles data preparation, checks for the correct format (i.e., numeric, text, ordered factors, region name spelling), and merges the original datasets by unique IDs into regions. Database manipulation, plotting, and metadata generation are available via the R script “2-visualization.R”. Finally, the R script “3-taxonomic-harmonisation.R” runs steps for nomenclature synchronization and authority harmonization (see below). A guide on how to share datasets for the TSADB is also included. The R code routines generate output files and plots that can be turned on/off by commenting/uncommenting lines. All steps were conducted using several R packages, including tidyverse (Wickham et al. 2019), ggplot2 (Wilke et al. 2019), cowplot (Wilke et al. 2019), and shiny (Winston et al. 2021).
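
The nested layout of the R list object described above (data matrix first, then study region) can be illustrated with a small stand-in built from scratch. This is a minimal sketch in base R, not the database's own code: the object name `tsadb`, the region names, the `site_id` column, the other column names, and all values are hypothetical, and the helper `region_table()` is introduced here only for illustration. With the real data, the list would instead come from loading the .RData file with `load()`, and the actual object and column names should be checked after loading.

```r
# Toy stand-in for the TSADB list object. Region names, column names and values
# are invented for illustration and are NOT taken from the database.
tsadb <- list(
  sites = list(
    RegionA = data.frame(site_id = c("A1", "A2"), elevation_m = c(3200, 4100)),
    RegionB = data.frame(site_id = "B1", elevation_m = 150)
  ),
  environment = list(
    RegionA = data.frame(site_id = c("A1", "A2"), pH = c(7.8, 8.4)),
    RegionB = data.frame(site_id = "B1", pH = 6.9)
  ),
  diatoms = list(
    RegionA = data.frame(site_id = c("A1", "A2"),
                         taxon_1 = c(60, 25), taxon_2 = c(40, 75)),
    RegionB = data.frame(site_id = "B1", taxon_1 = 100, taxon_2 = 0)
  )
)

# Each top-level element is one data matrix type holding one table per region.
names(tsadb)        # data matrix types
names(tsadb$sites)  # study regions

# Merge the three tables of one region by the shared (hypothetical) ID column.
region_table <- function(db, region, id = "site_id") {
  Reduce(
    function(x, y) merge(x, y, by = id),
    list(db$sites[[region]], db$environment[[region]], db$diatoms[[region]])
  )
}

region_a <- region_table(tsadb, "RegionA")

# Stack all regions into one table, tagging each row with its region name.
all_regions <- do.call(rbind, lapply(names(tsadb$sites), function(r) {
  data.frame(region = r, region_table(tsadb, r))
}))
```

Keeping the regions as separate list elements means a single region can be pulled out with `[[`, while `lapply()` over the region names gives a combined table when a cross-region analysis is needed. Stacking with `rbind()` only works here because the toy regions share identical columns; real regions with different taxa or variables would need their columns aligned first.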


the Secretary of Universities and Research (Government of Catalonia) and the Horizon 2020 programme, Award: 801370