Global patterns and determinants of multiple facets of freshwater fishes beta diversity
Data files
Dec 30, 2025 version files (631.10 MB)
- CodaAndData.zip (631.09 MB)
- README.md (9.30 KB)
Abstract
Global patterns of freshwater fish species diversity and their natural and anthropogenic determinants are relatively well documented. Yet the determinants of fish dissimilarity (beta diversity) across river basins remain poorly understood. Here, we quantify taxonomic, functional, and phylogenetic beta diversity of freshwater fish across global river basins and identify the key environmental and historical drivers shaping these patterns. We used a global database of freshwater fishes, a trait database based on morphological descriptions, and phylogenetic information derived from phylogenetic distances. We assessed multiple facets of fish beta diversity and partitioned each into turnover and nestedness components to assess their contributions. Spearman correlation analysis was used to assess the relationships among the facets and their components. We then applied boosted regression tree (BRT) models to assess their relationships with key environmental, spatial, and historical variables. We found strong links among the three facets of beta diversity; species turnover contributed more strongly than nestedness to overall beta diversity, whereas functional and phylogenetic beta diversity exhibited contrasting patterns. We further found that geographic, climatic, and historical factors all played significant roles in shaping beta diversity, with river basin area emerging as the most influential predictor. Although the global patterns of the three facets of beta diversity are consistent, the contributions of turnover and nestedness differ among them. Our results suggest that distinguishing between the turnover and nestedness components of the taxonomic, functional, and phylogenetic facets of biodiversity provides a more comprehensive understanding of the mechanisms underlying global freshwater fish beta diversity.
Dataset DOI: 10.5061/dryad.dncjsxmbh
Overview
This dataset accompanies the study “Global patterns and determinants of multiple facets of freshwater fishes beta diversity”. It contains basin-level species occurrence data, morphological trait data, phylogenetic information, environmental variables, and derived beta diversity metrics for global freshwater fish assemblages. All data processing, analyses, and figure generation were conducted in R.
File Structure
The archive CodaAndData.zip contains two main directories:
- data/ – Data files used and generated in this study
- R code/ – R scripts used for data processing, analysis, and figure production
Data Files (data/)
All data files are stored in .RData format and can be loaded in R using the load() function.
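Because load() restores objects under their saved names, it can silently overwrite objects already in the workspace. The sketch below shows one cautious way to inspect a file first by loading it into a separate environment. The file and the object coord_demo are toy stand-ins created for illustration, not files from the archive.

```r
# Toy stand-in for one of the archive's .RData files (hypothetical content).
toy_path <- tempfile(fileext = ".RData")
coord_demo <- data.frame(basin = c("A", "B"),
                         lon = c(10.5, -60.2),
                         lat = c(45.1, -3.4))
save(coord_demo, file = toy_path)

# Load into a dedicated environment so nothing in the workspace is overwritten.
loaded <- new.env()
object_names <- load(toy_path, envir = loaded)

# load() returns the names of the restored objects.
print(object_names)
str(loaded[[object_names[1]]])
```

To apply this to the archive, replace toy_path with the path to a file under data/.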
Description of data files
tree_10705
Phylogenetic tree of freshwater fish species in phylo format, which was obtained from https://doi.org/10.6084/m9.figshare.13383170.
trait_10705 and PCA_10705_RF
Morphological trait data of freshwater fish species. Columns 1–6 contain taxonomic information (species classification), and the remaining columns contain species-level morphological trait measurements.
Occ_bef_2456_10682
Presence–absence (0–1) distribution data of freshwater fishes. Rows represent river basins, and columns represent species.
abun_six_realm_bef
A list object generated by splitting Occ_bef_2456_10682 into six biogeographic realms. Each element of the list corresponds to one biogeographic realm, with rows representing basins and columns representing species.
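As an illustration of the list structure described above, a basin-by-species matrix can be split by realm in base R roughly as follows. The matrix, the realm labels, and the object names are hypothetical, and whether the authors dropped species absent from a realm is not stated, so this sketch keeps all columns.

```r
# Toy presence-absence matrix: 4 basins (rows) x 3 species (columns).
occ <- matrix(c(1, 0, 1,
                1, 1, 0,
                0, 1, 1,
                0, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("b1", "b2", "b3", "b4"),
                              c("sp1", "sp2", "sp3")))

# Hypothetical realm label for each basin, in row order.
realm <- c("Nearctic", "Nearctic", "Oriental", "Oriental")

# One sub-matrix per realm; drop = FALSE keeps single-basin realms as matrices.
occ_by_realm <- lapply(split(seq_len(nrow(occ)), realm),
                       function(rows) occ[rows, , drop = FALSE])

names(occ_by_realm)          # one list element per realm
sapply(occ_by_realm, nrow)   # number of basins in each realm
```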
coord_basin_2456
Geographic coordinates of river basins. Columns 1–3 correspond to basin name, longitude, and latitude, respectively.
Basin042017_3119.shp
Shapefile containing the spatial boundaries of global river basins.
basin_2456
Basin-level information, including basin name, country, geographic coordinates, basin area, associated biogeographic realm, and river length.
dissimilarity_indices
List object containing beta diversity indices for three facets (taxonomic, functional, and phylogenetic) for two different time periods. These data were generated for comparative purposes and were not used in the analyses presented in this study.
0 Palearctic_five_match_768, 0 Oriental_five_match_336, 0 Neotropical_five_match_375, 0 Nearctic_five_match_241, 0 Afrotropical_five_match_202, 0 Australian_five_match_534
List objects representing each basin and its five geographically adjacent basins within each of the six biogeographic realms.
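The README does not say how adjacency was determined. Purely as an illustration of the resulting structure, the sketch below matches each basin to its five nearest neighbours by great-circle distance between coordinates. This nearest-centroid rule, the helper functions haversine_km and nearest_k, and the toy coordinates are all assumptions, not the authors' procedure.

```r
# Hypothetical centroids for 7 basins (name, longitude, latitude).
coords <- data.frame(
  basin = paste0("b", 1:7),
  lon   = c(0, 1, 2, 3, 4, 5, 50),
  lat   = c(0, 0, 0, 0, 0, 0, 50),
  stringsAsFactors = FALSE
)

# Great-circle (haversine) distance in km from one point to many points.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# For each basin, the names of its k nearest other basins.
nearest_k <- function(coords, k = 5) {
  out <- lapply(seq_len(nrow(coords)), function(i) {
    d <- haversine_km(coords$lon[i], coords$lat[i], coords$lon, coords$lat)
    d[i] <- Inf                      # exclude the focal basin itself
    coords$basin[order(d)[seq_len(k)]]
  })
  names(out) <- coords$basin
  out
}

five_match <- nearest_k(coords, k = 5)
five_match[["b1"]]
```

If adjacency in the dataset means shared basin boundaries instead, the shapefile would be needed rather than point coordinates.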
0 environment data
Basin-level environmental data. Columns 1–3 contain basin identifiers and geographic coordinates, and the remaining columns contain raw environmental variables. Detailed descriptions of these environmental variables are provided in the Supporting Information of the associated article.
Variable Descriptions:
- T – Mean annual temperature (°C)
- TSeasonality – Temperature seasonality (standard deviation ×100)
- MaxTWarmestMonth – Maximum temperature of the warmest month (°C)
- MinTColdestMonth – Minimum temperature of the coldest month (°C)
- TAnnualRange – Annual temperature range (°C)
- TWettest – Mean temperature of the wettest quarter (°C)
- TDriest – Mean temperature of the driest quarter (°C)
- TWarmest – Mean temperature of the warmest quarter (°C)
- TColdest – Mean temperature of the coldest quarter (°C)
- P – Mean annual precipitation (mm)
- PSeasonality – Precipitation seasonality (mm)
- PWettest – Precipitation of the wettest quarter (mm)
- PDriest – Precipitation of the driest quarter (mm)
- PWarmest – Precipitation of the warmest quarter (mm)
- PColdest – Precipitation of the coldest quarter (mm)
- Runoff – River runoff volume (L/s/km²)
- NPP – Net primary productivity (g C m⁻² yr⁻¹)
- Elevation – Elevational range within the basin (m)
- Slope – Mean slope of the basin (degrees)
- Velocity – Late Quaternary glacial–interglacial climate‐change velocity (km/kyr)
- Area – River basin area (km²)
- LGMT – Change in mean annual temperature between present and Last Glacial Maximum (°C)
- LGMP – Change in mean annual precipitation between present and Last Glacial Maximum (mm/year)
2 environment data
Dataset containing 12 environmental variables used in subsequent analyses. Detailed definitions and units are provided in the Supporting Information of the article.
Variable Descriptions:
- Temp_PCA1 – The first principal component axis of temperature variables
- Temp_PCA2 – The second principal component axis of temperature variables
- Prec_PCA1 – The first principal component axis of precipitation variables
- Prec_PCA2 – The second principal component axis of precipitation variables
- Runoff – River runoff volume
- NPP – Net primary productivity
- Elevation – Elevational range within the basin
- Slope – Mean slope of the basin
- Velocity – Late Quaternary glacial–interglacial climate‐change velocity
- Area – River basin area
- LGMT – Change in mean annual temperature between the present and the Last Glacial Maximum
- LGMP – Change in mean annual precipitation between the present and the Last Glacial Maximum
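To show how axes such as Temp_PCA1 and Temp_PCA2 can be obtained from raw temperature variables, here is a minimal sketch using prcomp() from base R. The simulated values, the choice of four input variables, and the decision to scale them are assumptions; the settings actually used are documented in the scripts and the Supporting Information.

```r
set.seed(1)

# Hypothetical raw temperature variables for 50 basins (names follow the README).
n <- 50
T_mean <- rnorm(n, mean = 15, sd = 8)
temp_vars <- data.frame(
  T            = T_mean,
  TWarmest     = T_mean + rnorm(n, mean = 8, sd = 1),
  TColdest     = T_mean - rnorm(n, mean = 8, sd = 2),
  TSeasonality = rnorm(n, mean = 500, sd = 150)
)

# PCA on centred, unit-variance variables so that units do not dominate.
pca <- prcomp(temp_vars, center = TRUE, scale. = TRUE)

# Basin scores on the first two axes, analogous to Temp_PCA1 and Temp_PCA2.
temp_axes <- data.frame(Temp_PCA1 = pca$x[, 1], Temp_PCA2 = pca$x[, 2])

# Proportion of variance captured by the first two axes.
summary(pca)$importance["Proportion of Variance", 1:2]
head(temp_axes)
```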
1 ratio of all_basin_dissimilarity
Columns 1–6 represent the relative contributions of turnover and nestedness components to total taxonomic, functional, and phylogenetic beta diversity.
1 all_basin_dissimilarity
Columns 1–2 contain basin identifiers. Columns 3–11 contain total beta diversity and its turnover and nestedness components for taxonomic, functional, and phylogenetic diversity.
3 NULL_ALL_new
Results of 999 null model simulations for functional and phylogenetic beta diversity. Rows represent basins, and each column corresponds to one null model iteration.
5 var for BRT NULL SES NEW
Input variables for boosted regression tree (BRT) models. Columns 1–2 contain basin identifiers, columns 3–8 contain standardized effect sizes (SES) of diversity indices, and the remaining columns contain environmental variables used in the models (see Supporting Information).
5 var for BRT adjacent ALL
Input variables for BRT models based on adjacent-basin analyses. Columns 1–2 contain basin identifiers, columns 3–11 contain beta diversity indices, and the remaining columns contain environmental variables used in the models.
5 TBD_opest_adjacent, 5 FBD_opest_adjacent, 5 PBD_opest_adjacent, 5 FBD_opest_deviation_new, 5 PBD_opest_deviation_new
Results of cross-validation analyses for taxonomic, functional, and phylogenetic beta diversity indices, as well as SES values for functional and phylogenetic diversity. Columns 1–5 correspond to bf, tc, lr, CV-D2, and nt.
5 TBD_invaInf, 5 FBD_invaInf, 5 PBD_invaInf, 5 FBD_invaInf_null, 5 PBD_invaInf_null
Results of 100 repeated BRT model runs for taxonomic, functional, and phylogenetic beta diversity indices, as well as SES values for functional and phylogenetic diversity. The first column contains variable names, and the remaining columns represent the relative importance of each variable in each model run.
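A table laid out like the repeated-run importance files (variable names in the first column, one column per run) can be summarised across runs as sketched below. The values are simulated, the four predictor names are picked from the variable list for illustration, and the assumption that importances sum to 100 within a run may not hold for the actual files.

```r
set.seed(42)

# Hypothetical stand-in: 4 predictors x 100 runs of relative importance (%).
vars <- c("Area", "Temp_PCA1", "Prec_PCA1", "LGMT")
runs <- replicate(100, {
  w <- runif(length(vars))
  100 * w / sum(w)   # relative importance sums to 100 within a run
})
importance <- data.frame(variable = vars, runs, stringsAsFactors = FALSE)

# Mean and standard deviation across runs (all columns except the first).
run_cols <- importance[, -1]
summary_tab <- data.frame(
  variable        = importance$variable,
  mean_importance = rowMeans(run_cols),
  sd_importance   = apply(run_cols, 1, sd),
  stringsAsFactors = FALSE
)

# Rank predictors from most to least influential.
summary_tab <- summary_tab[order(-summary_tab$mean_importance), ]
print(summary_tab)
```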
Data generated in this study
The remaining .RData files were generated by the R scripts provided in this archive and include:
- Basin-level taxonomic, functional, and phylogenetic beta diversity indices
- Environmental variables associated with each basin
- Outputs from null model analyses
- Model results from boosted regression tree (BRT) analyses
These derived datasets are used directly in the statistical analyses and figures presented in the manuscript.
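For readers unfamiliar with the turnover and nestedness components stored in these files, the following worked example partitions taxonomic dissimilarity for a single pair of assemblages. It assumes the Sørensen-based partition (total = Sørensen, turnover = Simpson, nestedness = their difference); the README does not state which index family the authors used, and the function partition_pair and the toy data are hypothetical.

```r
# Two toy assemblages as presence-absence vectors over the same 8 species.
site1 <- c(1, 1, 1, 1, 1, 1, 0, 0)
site2 <- c(1, 1, 1, 0, 0, 0, 1, 0)

# Sorensen-based partition for one pair (assumed index family).
partition_pair <- function(x, y) {
  shared  <- sum(x == 1 & y == 1)   # species present in both
  only_x  <- sum(x == 1 & y == 0)   # species unique to the first assemblage
  only_y  <- sum(x == 0 & y == 1)   # species unique to the second assemblage
  smaller <- min(only_x, only_y)

  total    <- (only_x + only_y) / (2 * shared + only_x + only_y)
  turnover <- smaller / (shared + smaller)
  c(total = total, turnover = turnover, nestedness = total - turnover)
}

part <- partition_pair(site1, site2)
part    # total = 0.4, turnover = 0.25, nestedness = 0.15

# Relative contribution of each component to total dissimilarity.
ratio <- part[c("turnover", "nestedness")] / part[["total"]]
ratio   # turnover = 0.625, nestedness = 0.375
```

Here the pair shares 3 species, with 3 and 1 unique species respectively, so most of the dissimilarity (62.5%) comes from species replacement rather than richness difference.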
R Code (R code/)
The following R scripts are provided to ensure full reproducibility of the analyses:
- MapForDissimilarity.R – Computes beta diversity dissimilarity indices and environmental variables, and produces Fig. 1, Fig. 2, Fig. S1, Fig. S2, and Fig. S3.
- Null_model.R – Performs null model analyses and produces Fig. S6.
- BRT.R – Conducts boosted regression tree (BRT) analyses and produces Fig. 3, Fig. 4, Fig. 5, Fig. S7, and Fig. S8.
- Fill the trait missing values. R – Fills missing values in the morphological trait database.
- functional.betapart.core.pair.R – Custom function used to calculate functional beta diversity.
- ggPD_su (re-rank).R – Custom plotting function used in BRT result visualization.
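The standardized effect sizes (SES) used as BRT inputs relate observed values to a null distribution. As a minimal sketch of the standard formula, assuming a matrix laid out like the null model results described above (rows are basins, columns are null iterations), the calculation looks like this. The observed values and null draws are simulated, and Null_model.R remains the reference for the actual procedure.

```r
set.seed(7)

# Hypothetical inputs: observed values for 3 basins and 999 null values each.
observed <- c(b1 = 0.60, b2 = 0.30, b3 = 0.45)
null_mat <- matrix(rnorm(3 * 999, mean = 0.45, sd = 0.05),
                   nrow = 3,
                   dimnames = list(names(observed), NULL))

# SES = (observed - mean of null values) / standard deviation of null values,
# computed row by row, i.e. separately for each basin.
ses <- (observed - rowMeans(null_mat)) / apply(null_mat, 1, sd)
round(ses, 2)
```

Positive SES values indicate dissimilarity higher than expected under the null model, and negative values indicate lower than expected.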
File Formats
- Data files: .RData (including data files stored without an extension)
- Spatial data: .shp
- Code files: .R
Software Requirements
- R (version ≥ 4.0 recommended)
- Required R packages are listed within each script
Contact
For questions regarding this dataset, please contact Ziqi Chen (email: chenzq1212@gmail.com).
