
Fish assemblages on two continents

Cite this dataset

Pyron, Mark (2022). Fish assemblages on two continents [Dataset]. Dryad.


Aim: Fish assemblages, whether defined by taxonomy or by functional traits, respond to regional and local habitat variation. We sampled rivers of Mongolia and the western United States (US) to determine the scale at which habitat could predict fish assemblage variation, classified by taxonomy or functional traits. Our hypothesis was that fish assemblages could be predicted using valley-scale hydrogeomorphology and reach-scale hydrology. We further predicted that if valley-scale variables explained high variation in fish assemblages, then reach-scale variables would explain additional dimensions.

Location: Mongolia, United States

Methods: We evaluated reach- and valley-scale hydrogeomorphology of rivers in the US and Mongolia in each of three ecoregions: grassland, forest, and endorheic. Fishes were collected with a backpack electrofisher following standard protocols.

Results: Ordinations resulted in distinct assemblage patterns that corresponded with habitat variables at both valley and reach scales. Hydrogeomorphology differed between Mongolian and US rivers and likely contributed to the different patterns explaining fish assemblage variation classified by taxonomy versus traits. Ecoregions differed in the factors contributing to fish assemblage patterns, likely as a result of differences in hydrogeomorphology and historical influences, as well as effects of introduced species in the US.

Main Conclusions: We found that fish assemblages were structured by hydrogeomorphic processes occurring at valley and reach scales, and that the variables predicting fish assemblages vary with scale, ecoregion, and continent. We found a common pattern: when valley-scale variables explained a high proportion of fish assemblage variation, reach-scale variables frequently explained more ordination dimensions than valley-scale variables. This implies that reach-scale hydrology variables are always strong predictors of fish assemblage variation, and valley-scale geomorphology variables are sometimes strong predictors. We found evidence that introduced species or anthropogenic impacts influenced our analyses predicting fish assemblage variation in Mongolian and US mountain steppe rivers. Although anthropogenic impacts were substantially higher for western US rivers than for Mongolian rivers, we were unable to detect strong differences in our ability to predict fish assemblage variation from reach- and valley-scale habitat variables.


2.1 Study area and valley-scale habitat assessment

We identified rivers in the US and Mongolia in three ecoregions: grassland (G), forest-steppe (F), and endorheic (E) (Figures 1, 2; Olson et al., 2001). Unique hydrogeomorphic patches were delineated into Functional Process Zones (FPZs; Thorp et al., 2006, 2008) using the GIS-based program RESonate (Williams et al., 2013), which extracts valley-scale hydrogeomorphic and environmental variables from existing geospatial data. Maasri et al. (2019, 2021a) describe the data extraction with RESonate in detail. We used the ten most influential valley-scale hydrogeomorphology variables to delineate FPZs. These variables, extracted at 10 km stream intervals because of the size of these rivers, were elevation, mean annual precipitation, geology, valley width, valley floor (floodplain) width, valley width-to-valley floor width ratio, river channel sinuosity, right valley slope, left valley slope, and down-valley slope. Data were normalized to a 0 to 1 scale for each river network, and a dissimilarity matrix was generated using a Gower dissimilarity transformation (Gower, 1971), which is recommended for range-standardized non-biological data (Thoms & Parsons, 2003). The dissimilarity matrix was then clustered hierarchically with Ward's linkage method, which gave the best partitioning of clusters (Murtagh & Legendre, 2014). We then used Principal Components Analysis (PCA) to identify the variables contributing most to group partitioning and to describe the cluster groups in terms of the ten variables above. Cluster groups were then mapped to allow identification of sampling sites. We performed the FPZ clustering with the cluster package (version 2.1.0; Maechler et al., 2018) and the PCA with the FactoMineR package (version 1.42; Lê et al., 2008) in R version 3.6.3 (R Core Team, 2020), and mapped the resulting groups in ArcGIS (version 10.5).
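The normalization, Gower dissimilarity, and Ward clustering steps can be sketched in Python as below. This is not the authors' R workflow (they used the cluster package); it is a minimal illustration assuming numpy and scipy, with hypothetical function names. Treating geology as a categorical column (0 if two segments match, 1 if not) is an assumption about how that variable was coded. scipy's `method="ward"` applies the Ward update to the supplied Gower distances; whether that matches the exact Ward variant used in the original analysis is not stated.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def range_normalize(x):
    """Min-max rescale each column to 0-1 (applied per river network in the text)."""
    x = np.asarray(x, dtype=float)
    mins = x.min(axis=0)
    spans = x.max(axis=0) - mins
    spans[spans == 0] = 1.0  # constant columns would otherwise divide by zero
    return (x - mins) / spans


def gower_dissimilarity(num, cat=None):
    """Gower dissimilarity: the average of per-variable dissimilarities.

    num: range-normalized numeric columns (absolute difference per variable).
    cat: optional 2-D array of categorical columns (0 if equal, 1 if different),
         e.g. geology as a single column of shape (n_segments, 1).
    """
    num = np.asarray(num, dtype=float)
    parts = [np.abs(num[:, None, :] - num[None, :, :])]
    if cat is not None:
        cat = np.asarray(cat)
        parts.append((cat[:, None, :] != cat[None, :, :]).astype(float))
    return np.concatenate(parts, axis=2).mean(axis=2)


def cluster_segments(num, n_groups, cat=None):
    """Ward hierarchical clustering of valley segments on Gower dissimilarity."""
    d = gower_dissimilarity(range_normalize(num), cat)
    tree = linkage(squareform(d, checks=False), method="ward")
    return fcluster(tree, t=n_groups, criterion="maxclust")
```

Each row is one 10 km valley segment; the returned labels play the role of candidate FPZ groups, which the text then describes with PCA and maps.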
We examined gradients in hydrogeomorphology with PCA of the ten influential valley-scale variables described above in Minitab 18.1, for all sites combined and separately by continent and ecoregion.
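Because the ten valley-scale variables are in different units, a PCA of this kind is usually run on standardized variables (a correlation-matrix PCA). The sketch below assumes that choice; it uses numpy's SVD rather than Minitab, and the function name is hypothetical. The signs of PCA axes are arbitrary, so scores may be mirrored relative to other software.

```python
import numpy as np


def pca_correlation(x, n_axes):
    """PCA on standardized variables (correlation-matrix PCA) via SVD.

    Returns site scores, variable loadings (eigenvectors), and the
    proportion of total variance explained by each retained axis.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # assumes no constant columns
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    eigenvalues = s**2 / (z.shape[0] - 1)
    explained = eigenvalues / eigenvalues.sum()
    loadings = vt[:n_axes].T
    scores = z @ loadings  # site coordinates on the retained axes
    return scores, loadings, explained[:n_axes]
```

Running this separately on all sites, on each continent, and on each ecoregion mirrors the three groupings described in the text.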

2.2 Reach-scale habitat assessment

Each selected site was sampled following the Physical Habitat protocols of the Environmental Monitoring and Assessment Program (EMAP), Section 7 (Lazorchak et al., 1998), to characterize hydrogeomorphology at the reach scale. Field measurements were summarized into seven groups of metrics (channel geometry, bank geometry, substrate, fish cover, human influence, riparian cover, and flow) representing the habitat and dominant processes in the reach (Kaufmann et al., 1999). The total sampled reach length was 40 times the average wetted width, except where that length would have exceeded 5 km, in which case it was halved. Transects were spaced at intervals of 0.1 of the total reach length, and half transects were placed midway between them (offset by 0.05 of the total reach length). Visual estimates of riparian cover recorded the amount and type of cover in a 10 m by 10 m area on the left and right banks centered on each transect. Visual estimates of the amount and type of fish cover were recorded for an area extending 5 m upstream and 5 m downstream of each transect, in and over the water. Human influence data were recorded with a presence metric that also indicated proximity to the river at each transect (P, present more than 10 m away; C, present within 10 m; B, present on the bank; 0, not present). Channel geometry data included five depth measurements across each transect and the wetted width at each transect and half transect. Bank geometry data were collected on both banks at each transect and included top-of-bank elevations and distances, bankfull elevations and distances, and bank angles. Substrate was recorded at the same points as depth at each transect, as well as at half transects.
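The reach layout rules above reduce to simple arithmetic. The sketch below encodes them with hypothetical function names. It assumes main transects at both reach ends (0 and 1.0 of the length, giving 11) and half transects midway between them (giving 10). It applies the 5 km rule exactly as stated, even though for very wide rivers the halved length can still exceed 5 km. The presence tally is an illustrative summary of the P/C/B/0 codes, not the EMAP proximity-weighted index.

```python
def habitat_reach_length_m(mean_wetted_width_m, multiplier=40.0, cap_m=5000.0):
    """Reach length for habitat sampling: 40x mean wetted width, halved if > 5 km."""
    length = multiplier * mean_wetted_width_m
    if length > cap_m:
        length /= 2.0  # halved, as stated in the protocol description
    return length


def transect_positions_m(reach_length_m):
    """Distances along the reach of main transects (every 0.1 of the length)
    and half transects (midway between consecutive main transects)."""
    main = [reach_length_m * i / 10 for i in range(11)]
    half = [reach_length_m * (2 * i + 1) / 20 for i in range(10)]
    return main, half


def fraction_present(codes, near_only=False):
    """Share of transects with human influence recorded (hypothetical summary).

    codes: one of "P" (>10 m away), "C" (within 10 m), "B" (on bank), "0" (absent).
    near_only: count only influences within 10 m or on the bank.
    """
    hits = {"C", "B"} if near_only else {"P", "C", "B"}
    return sum(code in hits for code in codes) / len(codes)
```

For example, a river with a 10 m mean wetted width gets a 400 m reach with main transects every 40 m and half transects at 20 m, 60 m, and so on.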

Additional reach-scale data, namely slope and sinuosity, were extracted remotely in ArcGIS from digital elevation models and aerial photography. In total, 120 characteristics and metrics were collected for each sampled site (Appendix 1). These variables were reduced by retaining only characteristics that aggregate several similar characteristics (e.g., PCT_FAST sums the percentages of falls, cascades, rapids, and riffles). FPZ segment data were linked to sampled reaches with a spatial join in GIS. The join used the most downstream GPS point of each sampled site and matched it one-to-one with the closest FPZ segment, using a search radius so that each sampled site was paired with a single FPZ line. Each join was checked manually to confirm that every sampled site had an associated FPZ segment; where a reach did not pair with an FPZ segment, it was connected manually to the closest FPZ segment downstream. The FPZ dataset, originally representing over 4300 valley segments across FPZs (Costello et al., in review), was reduced to the 95 segments representing the FPZs that were sampled.
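The nearest-segment join can be illustrated outside ArcGIS with plain geometry. In this sketch, which is not the authors' GIS procedure, each FPZ line is a polyline of projected (x, y) coordinates in metres, an assumption; the site is the reach's most downstream point; and the join returns the closest line within the search radius. A `None` result corresponds to the cases the text resolves by manual assignment to the closest downstream segment. All names are hypothetical.

```python
import math
from typing import Optional

Point = tuple[float, float]


def point_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def point_to_polyline(p: Point, line: list[Point]) -> float:
    """Distance from p to the nearest piece of a polyline."""
    return min(point_to_segment(p, a, b) for a, b in zip(line, line[1:]))


def nearest_fpz(site: Point, fpz_lines: dict[str, list[Point]],
                radius: float) -> Optional[str]:
    """ID of the closest FPZ line within radius, or None if nothing is in range."""
    best_id, best_d = None, math.inf
    for fpz_id, line in fpz_lines.items():
        d = point_to_polyline(site, line)
        if d <= radius and d < best_d:
            best_id, best_d = fpz_id, d
    return best_id
```

A brute-force loop is enough for about a hundred sites; a spatial index would only matter for much larger joins.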

The most downstream point of each sampled reach was used to delineate the boundary of its contributing watershed. Within each watershed boundary, data were extracted from DEMs (SRTM 30 m for Mongolia; 10 m for the US), land cover datasets (IGBP Land Cover Classification for Mongolia; National Land Cover Dataset for the US), and climate data (WorldClim, 30 arc-second). Land cover classes were combined to provide consistent classifications between the United States and Mongolia, and land cover areas were divided by the contributing watershed area to allow relative comparison across watersheds of different sizes. A total of 29 characteristics were collected for each of the 96 contributing watersheds (Appendix 1).
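Harmonizing the two land cover schemes and expressing each class as a share of watershed area can be sketched as follows. The crosswalk here is purely hypothetical because the text does not list how classes were combined. Unmapped classes fall into an "other" bucket so that the fractions still sum to 1 when the class areas cover the whole watershed.

```python
from collections import defaultdict

# Hypothetical crosswalk: the actual class combinations used are not given in the text.
CROSSWALK = {
    # NLCD (US)
    "Deciduous Forest": "forest",
    "Evergreen Forest": "forest",
    "Grassland/Herbaceous": "grassland",
    "Open Water": "water",
    # IGBP (Mongolia)
    "Evergreen Needleleaf Forests": "forest",
    "Grasslands": "grassland",
    "Water Bodies": "water",
}


def land_cover_fractions(class_areas_km2, watershed_area_km2, crosswalk=CROSSWALK):
    """Combine source classes into shared classes and divide by watershed area."""
    combined = defaultdict(float)
    for source_class, area in class_areas_km2.items():
        combined[crosswalk.get(source_class, "other")] += area
    return {cls: area / watershed_area_km2 for cls, area in combined.items()}
```

Dividing by watershed area turns absolute areas into proportions, which is what makes a small headwater catchment comparable to a large basin.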

2.3 Fish collections and traits

We collected fishes from 94 sites identified as described above. The length of each fish collection site was 20 times the mean wetted width. We collected fishes by single-pass backpack electrofishing supplemented with angling (Ball State University IACUC #126193), following the American Fisheries Society standard collection protocols (Bonar et al., 2009). Fish abundances were standardized across sampled areas as catch per unit effort (CPUE), expressed as fish per m. Fishes were collected during one-month expeditions in each river network during summer or fall from 2017 to 2019 (Maasri et al., 2021b). Species identifications and ecological and biological traits followed Mendsaikhan et al. (2017) for Mongolia and state fish guides for the US. Reproductive traits were reduced to four categories based on Balon (1975): nonguarder open substratum, nonguarder brood hiders, guarders, and viviparous.
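The CPUE standardization and the step from species to trait abundances can be sketched as below. The sketch assumes that trait abundance is the sum of species CPUE within each reproductive category, which the text does not state explicitly. Species names and guild assignments are placeholders, not data from this study.

```python
from collections import defaultdict


def fish_reach_length_m(mean_wetted_width_m, multiplier=20.0):
    """Fish collection length: 20 times the mean wetted width."""
    return multiplier * mean_wetted_width_m


def cpue_per_m(counts, sampled_length_m):
    """Catch per unit effort as fish per metre of reach sampled."""
    return {species: n / sampled_length_m for species, n in counts.items()}


def trait_abundance(cpue, guild_of):
    """Sum species CPUE within each reproductive guild (assumed aggregation)."""
    totals = defaultdict(float)
    for species, value in cpue.items():
        totals[guild_of[species]] += value
    return dict(totals)
```

For example, 50 fish of one species caught over a 100 m site (mean wetted width 5 m) gives a CPUE of 0.5 fish per m.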

2.4 Analyses

Valley-scale geomorphology data and reach-scale hydrology data were reduced separately into fewer variables with minimal collinearity using Principal Components Analysis (PCA) in Minitab version 18. We used three PCA axes for the valley-scale data and five PCA axes for the reach-scale data. We then evaluated fish assemblage responses to valley-scale geomorphology and reach-scale hydrology variables using constrained ordinations with forward selection of environmental variables in CANOCO 5 software. CANOCO evaluates the length of the first ordination axis and recommends either a linear method (Redundancy Analysis, RDA) or a unimodal method (Canonical Correspondence Analysis, CCA). RDA is a direct gradient technique for multifactorial analysis-of-variance models using ecologically relevant distance measures and significance testing of individual variables (Legendre & Anderson, 1999). CCA is a direct gradient technique, based on weighted averaging regression, that uses environmental predictors (Palmer, 1993). PCA axes summarizing habitat variation at the reach and valley scales were used as environmental predictors of fish assemblage variation in the RDA or CCA. Both direct gradient analyses used 999 Monte Carlo permutations to test the significance of environmental variables as predictors of fish assemblage structure. If RDA or CCA could not reach a solution for a constrained ordination (usually because the number of sites in the ordination was too small relative to the number of environmental predictors; Kent, 2006), we used unconstrained ordinations with environmental variables projected onto them. For an unconstrained ordination, CANOCO uses gradient length to suggest either a linear (PCA) or a unimodal (Correspondence Analysis) ordination. All multivariate analyses were performed for taxonomic abundances and trait abundances, with environmental predictors, at three scales: both continents combined, individual continents, and individual ecoregions.
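To make the linear branch concrete, here is a minimal RDA with a Monte Carlo permutation test written in numpy. It is not CANOCO's implementation: it omits forward selection, scaling options, and CCA. It assumes the predictors (here, habitat PCA axes) are full rank. RDA is computed as a PCA of the fitted values from a multivariate regression of the assemblage matrix on the predictors. Significance is assessed by permuting site rows and recomputing a pseudo-F statistic. Function names are hypothetical.

```python
import numpy as np


def _center(a):
    a = np.asarray(a, dtype=float)
    return a - a.mean(axis=0)


def rda(y, x):
    """Redundancy analysis: PCA of the fitted values from a multivariate linear
    regression of y (sites x taxa or traits) on x (sites x habitat PCA axes)."""
    yc, xc = _center(y), _center(x)
    coef, *_ = np.linalg.lstsq(xc, yc, rcond=None)
    fitted = xc @ coef
    u, s, _ = np.linalg.svd(fitted, full_matrices=False)
    n_axes = min(xc.shape[1], yc.shape[1])
    explained = (fitted**2).sum() / (yc**2).sum()  # constrained / total variance
    site_scores = u[:, :n_axes] * s[:n_axes]
    return explained, site_scores


def pseudo_f(yc, xc):
    """Pseudo-F: explained variance per predictor over residual variance per df."""
    n, q = xc.shape  # assumes x is full rank
    coef, *_ = np.linalg.lstsq(xc, yc, rcond=None)
    ss_fit = ((xc @ coef) ** 2).sum()
    ss_res = (yc**2).sum() - ss_fit
    return (ss_fit / q) / (ss_res / (n - q - 1))


def permutation_test(y, x, n_perm=999, seed=0):
    """Permute site rows of y; p = (exceedances + 1) / (n_perm + 1)."""
    rng = np.random.default_rng(seed)
    yc, xc = _center(y), _center(x)
    f_obs = pseudo_f(yc, xc)
    exceed = sum(
        pseudo_f(yc[rng.permutation(yc.shape[0])], xc) >= f_obs
        for _ in range(n_perm)
    )
    return f_obs, (exceed + 1) / (n_perm + 1)
```

With 999 permutations the smallest attainable p-value is 0.001. The residual degrees of freedom `n - q - 1` also show why constrained ordinations fail when the number of sites is small relative to the number of predictors.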
Fish and trait abundances were log-transformed as log(X + 1) before analysis because abundances spanned three orders of magnitude. We summed the variation explained in ordinations by continent and ecoregion and compared the number of PCA axis predictors selected at the reach scale and valley scale to test whether more predictors were required at one scale when higher variation was explained at the other scale.
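The log(X + 1) transform keeps zero abundances at zero while compressing large values. The text does not give the logarithm base, so base 10 below is an assumption chosen only because it makes the compression easy to read. The function name is hypothetical.

```python
import numpy as np


def log_x_plus_1(abundance, base=10.0):
    """log(X + 1) transform; base 10 is an assumption (the text does not say)."""
    a = np.asarray(abundance, dtype=float)
    return np.log1p(a) / np.log(base)  # log1p is accurate for small X
```

Applied to abundances of 0, 9, 99, and 999, this gives values close to 0, 1, 2, and 3, so three orders of magnitude collapse onto a range of about 3 units.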

Usage notes

Please see accompanying manuscript for details.


NSF Macrosystem Biology, Award: 1442595