Decade-scale stream morphology and stream fish community structure in headwater streams draining Mississippi's National Forests
Data files (version of Sep 17, 2025; 22.30 MB total)
- drsu_ca.rds (6.43 MB)
- drsu_cca_aov_results.rds (93.25 KB)
- drsu_cca.rds (6.54 MB)
- drsu_nmds_stepdown.rds (1.16 KB)
- drsu_nmds.rds (4.08 MB)
- fishes.csv (2.04 MB)
- geo_pca.rds (96.06 KB)
- habitat_points.csv (1.95 MB)
- habitat_transects.csv (670.06 KB)
- network_topology.csv (20.21 KB)
- README.md (39.87 KB)
- sample_lulc.csv (212.51 KB)
- samples.csv (30.18 KB)
- sites.csv (79.94 KB)
- trait_sources.csv (1.37 KB)
- traits.csv (13.48 KB)
Abstract
Decade-scale ecological datasets provide critical insights into long-term ecosystem properties and into long-term ecosystem responses to human-driven landscape alteration. In the state of Mississippi, USA, a history of intensive deforestation between 1830 and 1920 was followed by intensive sediment mitigation measures, including intentional channel straightening and dredging as well as widespread reforestation. These corrective actions led to widespread and sometimes catastrophic channel incision and to the potential decoupling of floodplain and channel aquatic ecosystems. Here we provide a quantitative, multi-decadal fish community and fish habitat dataset comprising 762 samples from streams draining National Forests in Mississippi. These data are used in a companion manuscript to test the hypothesis that increased channel incision relates to decreased prevalence of species whose ecologies indicate floodplain, backwater, or off-channel habitat use. We provide data for associated landscape-level covariates (land use, stream network topology) derived from remote sensing data sources. We further provide a literature-based database of resource use for each species encountered in the survey. Scripts documenting the analyses of channel-floodplain ecological decoupling in the associated manuscript (Stearman et al. 2025), and the code required to run them, are also provided.
Description
A dataset containing multi-decadal fish community and fish habitat measurements on 369 streams draining national forests in Mississippi, USA, with associated resource use data for each fish species, land use/land cover metrics and watershed topology metrics for each collection, and intermediate analysis products and analytical scripts.
Principal Investigator Contact Information
Name: Loren Stearman
Institution: University of Southern Mississippi
Email: Loren.Stearman@usm.edu
Alternate Contact Information
Name: Jake Schaefer
Institution: University of Southern Mississippi
Email: Jake.Schaefer@usm.edu
Name: Scott Clark
Institution: U.S. Fish and Wildlife Service, Baton Rouge Fish and Wildlife Conservation Office
Email: scottrclark2@gmail.com
Dates of Data Collection
- Fish community and fish habitat data collection: 1999 - 2003, 2008, 2009, 2015-2022
- Fish resource use data assembly: 2020
- Land use/land cover data collection (NLCD): 2001, 2004, 2005, 2006, 2008, 2011, 2013, 2016, 2019, 2021
- Watershed network topology: 2019-2020, data derived from NHD+ V2 datasets
Data Spatial Scope
Fish community and habitat data were collected from headwater streams (1st to 5th order, Strahler Stream Order) draining five national forests (Bienville, DeSoto, Holly Springs, Homochitto, and Tombigbee National Forests) in the state of Mississippi, United States. Associated land use/land cover and watershed topology metrics in this dataset share the same spatial scope. Resource use datasets cover the range of the species collected in this survey, typically the southeastern United States.
Funding
Funding for this project was provided by grants from the U.S. Forest Service (grant 22-PA-11080700-157) for data collection, as well as the U.S. Army Corps of Engineers (contract W912HZ21C0064) during the analysis and writing phase.
Sharing/Access
This work is licensed under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license.
Files accessory_functions.R and geomorph_niche_analyses.R are licensed under the GNU GPLv3 license.
Associated publications
Stearman, L. W., J. F. Schaefer, and S. Clark. 2025. Early channel evolution relates to fish community resource use in the Gulf Coastal Plains of North America. Ecological Applications, in press.
Related Data Sources and References
- National Land Cover Database, Multi-Resolution Land Characteristics Consortium. https://www.mrlc.gov
- National Hydrography Dataset Plus, Version 2.0. Horizon Systems, Inc. https://www.nhdplus.com/NHDPlus/NHDPlusV2_home.php
- Boschung, H. T., and R. L. Mayden. 2004. Fishes of Alabama. First edition. Smithsonian Books, Washington, DC.
- Douglas, N. H. 1974. Freshwater Fishes of Louisiana. Claitor’s Publishing Division, Baton Rouge, LA.
- Kuehne, R. A., and R. W. Barbour. 1983. The American Darters. The University Press of Kentucky, Lexington, KY.
- Miller, R. J., and H. W. Robison. 2004. Fishes of Oklahoma. University of Oklahoma Press, Norman, OK.
- Oksanen, J., F. G. Blanchet, M. Friendly, R. Kindt, P. Legendre, D. McGlinn, P. R. Minchin, R. B. O’Hara, G. L. Simpson, P. Solymos, M. H. H. Stevens, E. Szoecs, and H. Wagner. 2020. vegan: Community Ecology Package. R package version 2.5-7.
- Page, L. M. 1983. Handbook of Darters. TFH Publications, Neptune, NJ.
- Pflieger, W. L. 1997. The Fishes of Missouri. Revised Edition. Missouri Department of Conservation, Jefferson City, MO.
- R Core Team. 2023. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
- Robison, H. W., and T. M. Buchanan. 2020. Fishes of Arkansas. 2nd edition. University of Arkansas Press, Fayetteville, AR.
- Ross, S. T., W. M. Brenneman, W. T. Slack, M. T. O’Connell, and T. L. Peterson. 2001. Inland Fishes of Mississippi. University Press of Mississippi, Jackson, MS.
- Warren, M. L., Jr., and B. M. Burr, editors. 2014. Freshwater Fishes of North America, Volume 1: Petromyzontidae to Catostomidae. Johns Hopkins University Press, Baltimore, MD.
Data Sources
Fish community data and fish habitat data were collected in field surveys between 1999 and 2022 by the authors, Mel Warren, and associated graduate students and technicians. Fish resource use data were extracted from regional literature reviews of fish biology and ecology. Land use data were derived from the National Land Cover Database (Multi-Resolution Land Characteristics Consortium), 2001-2021. Watershed topological metrics were derived from analysis of the NHD+ V2.0 dataset.
Description of the data and file structure
File Descriptions
fishes.csv
This file contains collection records for fishes at the 369 sites in the survey.
geo_pca.rds
This file contains an R list object specifying the results of a principal components analysis on habitat data indicative of fluvial geomorphic processes and characteristics at the sample-site scale. This file contains an intermediate analysis product to provide replicable results for end users. Users may also replicate the analyses from the scripts provided; however, some analyses that rely on resampling or random starts will produce subtly different results (typically rotated about the ordination axes).
habitat_points.csv
This file contains habitat measurements at 1-m intervals, taken on habitat transects in the field.
habitat_transects.csv
This file contains habitat measurements taken once per habitat transect in the field.
network_topology.csv
This file contains multiple measurements of watershed network topology (i.e., stream size, location within the watershed) for each of the 369 sample sites in the survey.
drsu_ca.rds
This file contains an R list object specifying the results of a correspondence analysis (weighted averaging) ordination of community-level relative niche utilization, where niche space at each site is defined from the resource use data and the fish community data. This file contains an intermediate analysis product to provide replicable results for end users. Users may also replicate the analyses from the scripts provided; however, some analyses that rely on resampling or random starts will produce subtly different results (typically rotated about the ordination axes).
drsu_cca.rds
This file contains an R list object specifying the results of a canonical correspondence analysis ordination of community-level relative niche utilization, constrained by both local-scale metrics derived from the geomorphic principal components analysis and landscape-level land use and network topology metrics. This file contains an intermediate analysis product to provide replicable results for end users. Users may also replicate the analyses from the scripts provided; however, some analyses that rely on resampling or random starts will produce subtly different results (typically rotated about the ordination axes).
drsu_cca_aov_results.rds
This file contains an R list object specifying the results of permutation-based analysis of variance on the canonical correspondence analysis relating niche utilization to explanatory factors. This file contains an intermediate analysis product to provide replicable results for end users. Users may also replicate the analyses from the scripts provided; however, analyses that rely on permutation will produce subtly different results (e.g., slightly different p-values).
drsu_nmds.rds
This file contains an R list object specifying the results of a nonmetric multidimensional scaling (NMDS) ordination of community-level relative niche utilization, restricted to the turnover component of a dissimilarity decomposition. This file contains an intermediate analysis product to provide replicable results for end users. Users may also replicate the analyses from the scripts provided; however, some analyses that rely on resampling or random starts will produce subtly different results (typically rotated about the ordination axes).
drsu_nmds_stepdown.rds
This file contains an R list object specifying the results of a step-down analysis used to determine the optimal dimensionality of the final nonmetric multidimensional scaling solution for community-level relative niche utilization. This file contains an intermediate analysis product to provide replicable results for end users. Users may also replicate the analyses from the scripts provided; however, some analyses that rely on resampling or random starts will produce subtly different results (typically rotated about the ordination axes).
sample_lulc.csv
This file contains land use/land cover data associated with each fish and habitat sampling event.
samples.csv
This file contains sample metadata for each fish and habitat sampling event.
sites.csv
This file contains metadata for each of the 369 sites in the survey.
traits.csv
This file contains resource use values on five major gradients (habitat fluviatility, foraging height, cover type, foraging light intensity, and trophic level) for each of the 117 species in the survey. The file also contains footnotes for sources of trait data for each species.
trait_sources.csv
This file contains references for the trait data in the file "traits.csv".
Methodology
*Detailed methods and line-by-line steps for all associated .rds analytical products are documented in the associated analytical code, "analysis_script.R".
fishes.csv
Fishes were collected from sample sites using a combination of backpack electrofishing and seining. Sites were defined as a stream reach 30 times the mean wetted stream width, bounded between 120 and 240 m, to encompass multiple microhabitat turnovers. Within each site, surveyors split the reach into four equal-sized subreaches. Fishes were sampled with a backpack electrofisher and dip nets with 3 mm mesh, working in an upstream direction at a mean rate of 0.1 m/s. Fishes were also sampled with 3 mm mesh single-lead seines (typically 2-4 m x 1.5-2 m), using downstream hauls in pools and runs and kick sets in riffles. Each technique was applied in a single pass only. Following collection, fishes were euthanized per IACUC protocol (21021101.R1), preserved in 10% formalin separately by gear and by subreach, and transported to the laboratory at the University of Southern Mississippi. Fishes that did not fit into a 1 L Nalgene jar in the field were photographed, recorded to species, and released at the subreach of capture. Block nets were not used in sampling. Fish community samples were pooled by subreach within each sampling event for analysis.
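The reach-length rule above can be expressed as a small helper. This is a minimal sketch, not the authors' field code: the function names are hypothetical, and the electrofishing-time estimate simply divides reach length by the stated mean upstream rate of 0.1 m/s.

```python
def site_length_m(mean_wetted_width_m: float) -> float:
    """Reach length: 30 times the mean wetted width, bounded to 120-240 m."""
    return min(max(30.0 * mean_wetted_width_m, 120.0), 240.0)


def subreach_bounds(reach_length_m: float, n_subreaches: int = 4):
    """Split the reach into equal-length subreaches, as (start, end) in meters."""
    step = reach_length_m / n_subreaches
    return [(i * step, (i + 1) * step) for i in range(n_subreaches)]


def electrofishing_minutes(reach_length_m: float, rate_m_per_s: float = 0.1) -> float:
    """Approximate time to cover the reach at a constant upstream rate."""
    return reach_length_m / rate_m_per_s / 60.0


# Example: a stream with a 5 m mean wetted width gives a 150 m reach
length = site_length_m(5.0)
print(length, subreach_bounds(length), electrofishing_minutes(length))  # about 25 min
```

Narrow streams hit the 120 m floor (e.g., a 2 m wide stream) and wide streams hit the 240 m ceiling (e.g., a 10 m wide stream).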
geo_pca.rds
Original field measurements of habitat were analyzed to produce a suite of variables indicating fluvial geomorphic characteristics and processes. These variables were assessed for normality prior to analysis and, where needed to improve their distributions, were log10-transformed (continuous values) or logit-transformed (percent or proportion values). All variables were then z-score transformed to mean = 0 and sd = 1, and the transformed dataset was analyzed using R function prcomp.
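The pre-PCA transformations described above can be sketched with the Python standard library. This is an illustration under stated assumptions. The clamping constant `eps`, which keeps logits finite at 0 or 1, is hypothetical, because the README does not say how boundary proportions were handled. The PCA itself would still be run in R, where prcomp() can also center and scale through its `center` and `scale.` arguments.

```python
import math
import statistics


def log10_transform(values):
    """log10 for strictly positive continuous variables."""
    return [math.log10(v) for v in values]


def logit_transform(proportions, eps=0.01):
    """logit(p) = ln(p / (1 - p)); eps (hypothetical) clamps p away from 0 and 1."""
    out = []
    for p in proportions:
        p = min(max(p, eps), 1.0 - eps)
        out.append(math.log(p / (1.0 - p)))
    return out


def z_score(values):
    """Center to mean 0 and scale to sample sd 1 (n - 1 denominator, like R's sd())."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


# Made-up proportions (e.g., fraction of fine substrate), transformed then standardized
print(z_score(logit_transform([0.10, 0.25, 0.50, 0.0])))
```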
habitat_points.csv + habitat_transects.csv
Habitat data were taken using a point-transect method (both for habitat_points.csv and habitat_transects.csv). Sites were defined as a stream reach 30 times the mean wetted stream width, bounded between 120 and 240 m, to encompass multiple microhabitat turnovers. Within each site, surveyors split the reach into four equal-sized subreaches. Within each subreach, surveyors demarcated 3 equally spaced transects perpendicular to the stream (n = 12 per sample). On each transect, surveyors recorded the channel width and wetted width (m) and, for each bank separately, the bank height (m), bank angle (<45°, >45°, or 90°), classes of riparian vegetation (herbaceous, shrub, sapling, tree), and presence of visible bank erosion or undercut banks. At 1-m intervals on each transect, surveyors recorded water depth (cm), velocity (m/s), substrate size on a modified Wentworth scale (1 = clay/silt, 2 = sand, 3 = gravel, 4 = cobble, 5 = boulder, and 6 = bedrock), and presence of detritus, small woody debris, large woody debris, or aquatic vegetation.
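One possible layout of the 12 transects and the 1-m point grid described above is sketched below. The exact placement rules are assumptions for illustration only. Transects are placed at the centers of equal thirds of each subreach, and points start at one wetted edge. The field protocol may have differed.

```python
def transect_positions(reach_length_m, n_subreaches=4, per_subreach=3):
    """Return (subreach, transect, distance upstream in m) for each transect.

    Hypothetical placement: centers of equal-length segments within each subreach.
    """
    sub_len = reach_length_m / n_subreaches
    layout = []
    for s in range(n_subreaches):
        for t in range(per_subreach):
            pos = s * sub_len + (t + 0.5) * sub_len / per_subreach
            layout.append((s + 1, t + 1, pos))
    return layout


def point_offsets(wetted_width_m, interval_m=1.0):
    """Hypothetical: point offsets (m) from one wetted edge at fixed intervals."""
    n_steps = int(wetted_width_m // interval_m)
    return [i * interval_m for i in range(n_steps + 1)]


layout = transect_positions(120.0)
print(len(layout), layout[0], layout[-1])  # 12 transects in a 120 m reach
print(point_offsets(4.5))
```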
network_topology.csv
Watershed network topology variables were derived from the NHD+ V2.0 dataset. COMID values (unique segment identifiers) were extracted from the stream reach containing the sample locality. Base COMID values were extracted from the stream reach at the downstream end of the watershed. DCOMID values were extracted from the stream reach immediately downstream of the reach containing the sample locality. Total drainage area (TDA) and Strahler Stream Order (SSO) were extracted from the NHD+ V2.0 value added attributes (VAA) table. Link magnitude was calculated as the number of first-order tributaries upstream of the sample locality. C-Link magnitude was calculated as the number of stream segments between the sample locality and the downstream-most stream segment in the watershed (Base COMID). C-Link6 was calculated in the same way, except that the nearest downstream 6th-order stream was substituted for the downstream-most segment. D-Link was calculated as the number of first-order tributaries upstream of the stream segment immediately downstream of the segment containing the sample locality.
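The link-based metrics above can be illustrated on a toy network in which each segment stores the segment immediately downstream of it (None at the watershed outlet). This sketch assumes every segment is a link between confluences, which simplifies the NHD+ V2.0 flowline network. The segment IDs are hypothetical, not real COMIDs, and counting inclusively at the stopping segment for a C-Link6-style metric is also an assumption.

```python
from collections import defaultdict


def upstream_index(downstream):
    """Invert a {segment: downstream segment} map into {segment: [upstream segments]}."""
    up = defaultdict(list)
    for seg, ds in downstream.items():
        if ds is not None:
            up[ds].append(seg)
    return up


def link_magnitude(seg, upstream):
    """Number of first-order (source) segments upstream, inclusive of seg."""
    parents = upstream.get(seg, [])
    if not parents:
        return 1
    return sum(link_magnitude(p, upstream) for p in parents)


def c_link(seg, downstream, stop=None):
    """Segments from seg to the outlet (or to the first segment in `stop`), inclusive."""
    count = 1
    while downstream[seg] is not None and (stop is None or seg not in stop):
        seg = downstream[seg]
        count += 1
    return count


def d_link(seg, downstream, upstream):
    """Link magnitude of the segment immediately downstream of seg."""
    ds = downstream[seg]
    return None if ds is None else link_magnitude(ds, upstream)


# Toy watershed: A and B join to form C; C and D join to form E (the outlet).
downstream = {"A": "C", "B": "C", "C": "E", "D": "E", "E": None}
upstream = upstream_index(downstream)
print(link_magnitude("E", upstream), c_link("A", downstream), d_link("A", downstream, upstream))
```

Passing a set of 6th-order segment IDs as `stop` gives a C-Link6-style count.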
drsu_ca.rds
Species-level niche proxies were calculated by assessing the unique combinations of resource axis values occupied by each species. Community-level niche proxies were derived by assigning to each sample all unique resource axis combinations of every species present in that sample. Each individual combination (e.g., riffle + benthic + diurnal + no cover + benthic invertebrate, pool + neustonic + diurnal + woody debris + terrestrial invertebrate) was treated as the equivalent of a "species" in the analysis, and the tally of each combination represented in a sample was used to calculate relative abundance in each sample. This relative abundance matrix was analyzed using R function cca() from R package "vegan", with no constraint matrix specified, to produce a correspondence analysis / weighted averaging solution.
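The construction of community-level niche proxies can be sketched as follows. This is a hedged reading of the text, not the authors' script. Species trait scores are represented as hypothetical dictionaries that map each ordinal category to 1 (used) or 0 (not used), and the axis names and category codes are placeholders. Each combination is tallied once per species present, unweighted by fish counts. That is one plausible interpretation of "the tally of each combination represented in a sample".

```python
from collections import Counter
from itertools import product

AXES = ("fluviatility", "foraging_height", "cover", "light", "trophic")


def niche_combinations(scores):
    """All unique combinations of used categories, one category per axis."""
    used = [sorted(cat for cat, flag in scores[axis].items() if flag == 1) for axis in AXES]
    return set(product(*used))


def relative_niche_abundance(species_in_sample, traits):
    """Tally combinations over the species present, then convert to proportions."""
    tally = Counter()
    for sp in set(species_in_sample):
        tally.update(niche_combinations(traits[sp]))
    total = sum(tally.values())
    return {combo: n / total for combo, n in tally.items()}


# Hypothetical trait scores for two placeholder species
traits = {
    "species_a": {"fluviatility": {1: 1, 2: 1, 3: 0}, "foraging_height": {1: 1},
                  "cover": {0: 1}, "light": {2: 1}, "trophic": {2: 1}},
    "species_b": {"fluviatility": {2: 1}, "foraging_height": {1: 1},
                  "cover": {0: 1}, "light": {2: 1}, "trophic": {2: 1}},
}
print(relative_niche_abundance(["species_a", "species_b"], traits))
```

The example has two species. `species_a` contributes two combinations and `species_b` contributes one, which overlaps with one of `species_a`'s. The shared combination therefore receives two-thirds of the sample's relative abundance.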
drsu_cca.rds
Species-level niche proxies were calculated by assessing the unique combinations of resource axis values occupied by each species. Community-level niche proxies were derived by assigning to each sample all unique resource axis combinations of every species present in that sample. Each individual combination (e.g., riffle + benthic + diurnal + no cover + benthic invertebrate, pool + neustonic + diurnal + woody debris + terrestrial invertebrate) was treated as the equivalent of a "species" in the analysis, and the tally of each combination represented in a sample was used to calculate relative abundance in each sample. This relative abundance matrix was analyzed using R function cca() from R package "vegan", with a constraint matrix constructed from land use/land cover values, watershed network topology values, and axes from the habitat/geomorphology principal components analysis, to produce a canonical correspondence analysis solution.
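Assembling the CCA constraint matrix amounts to joining sample-level land cover, site-level topology, and sample-level PCA axis scores. The sketch below assumes plain dictionaries keyed by the COL_ID and SITE_ID fields documented in this README. The README does not state which specific columns entered the published model, so the column selection, the PCA score names, and the example values are all hypothetical.

```python
TOPOLOGY_FIELDS = ("TDA", "SSO", "LINK", "CLINK", "CLINK6", "DLINK")  # hypothetical selection


def build_constraints(lulc_rows, topology_rows, pca_scores):
    """Join per-sample land cover, per-site topology, and per-sample PCA scores.

    Samples missing any component are dropped (a simplifying assumption).
    """
    topo_by_site = {row["SITE_ID"]: row for row in topology_rows}
    merged = []
    for row in lulc_rows:
        col_id, site_id = row["COL_ID"], row["SITE_ID"]
        if site_id not in topo_by_site or col_id not in pca_scores:
            continue
        record = {"COL_ID": col_id}
        record.update({k: v for k, v in row.items() if k.startswith("NLCD_")})
        record.update({k: topo_by_site[site_id][k] for k in TOPOLOGY_FIELDS})
        record.update(pca_scores[col_id])
        merged.append(record)
    return merged


# Placeholder inputs; the second sample lacks topology and PCA scores and is dropped
lulc = [{"COL_ID": "c1", "SITE_ID": 1, "LULC_YEAR": 2001, "NLCD_41": 0.6, "NLCD_42": 0.4},
        {"COL_ID": "c2", "SITE_ID": 2, "LULC_YEAR": 2019, "NLCD_41": 0.5, "NLCD_42": 0.5}]
topo = [{"SITE_ID": 1, "TDA": 3.2, "SSO": 2, "LINK": 4, "CLINK": 9, "CLINK6": 5, "DLINK": 7}]
pca = {"c1": {"PC1": -0.8, "PC2": 0.3}}
rows = build_constraints(lulc, topo, pca)
print(rows)
```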
drsu_cca_aov_results.rds
The final CCA solution (file drsu_cca.rds) was analyzed using R function anova.cca to test, by permutation, the significance of the overall model, of each explanatory variable, and of each canonical axis.
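For readers unfamiliar with permutation tests such as those reported by anova.cca, the p-value is conventionally computed by counting the permuted test statistics that are at least as large as the observed one. The observed statistic is counted among the permutations, as in vegan's permutation tests. The sketch below shows only that final step, and the pseudo-F values are made up. Generating the permuted statistics (permuting rows and refitting the CCA) is left to vegan.

```python
def permutation_p_value(observed, permuted):
    """(number of permuted stats >= observed, plus 1) / (number of permutations + 1)."""
    exceed = sum(1 for f in permuted if f >= observed)
    return (exceed + 1) / (len(permuted) + 1)


print(permutation_p_value(5.0, [1.2, 0.8, 6.1, 2.4]))  # 0.4
```

Counting the observed statistic means the smallest attainable p-value with 999 permutations is 0.001, never zero.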
drsu_nmds.rds
Species-level niche proxies were calculated by assessing the unique combinations of resource axis values occupied by each species. Community-level niche proxies were derived by assigning to each sample all unique resource axis combinations of every species present in that sample. Each individual combination (e.g., riffle + benthic + diurnal + no cover + benthic invertebrate, pool + neustonic + diurnal + woody debris + terrestrial invertebrate) was treated as the equivalent of a "species" in the analysis, and the tally of each combination represented in a sample was used to calculate relative abundance in each sample. This matrix was transformed to a binary (presence/absence) matrix and decomposed into turnover and nestedness components using R function beta.pair (R package "betapart"). We retained the turnover component and analyzed this dissimilarity matrix using a custom function (documented in associated file "accessory_functions.R") to produce a nonmetric multidimensional scaling solution.
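The turnover/nestedness decomposition used here follows the partition implemented in betapart's beta.pair. The sketch below assumes the Sørensen family, which is the beta.pair default; the README does not state which family was used. Each sample is represented as a set of the niche combinations present, and the sample contents in the example are placeholders.

```python
def sorensen_partition(x, y):
    """Return (turnover, nestedness, total) Sørensen-family dissimilarity for two sets.

    turnover   = min(b, c) / (a + min(b, c))   (Simpson dissimilarity)
    total      = (b + c) / (2a + b + c)        (Sørensen dissimilarity)
    nestedness = total - turnover
    Assumes both sets are non-empty.
    """
    a = len(x & y)
    b = len(x - y)
    c = len(y - x)
    turnover = min(b, c) / (a + min(b, c))
    total = (b + c) / (2 * a + b + c)
    return turnover, total - turnover, total


def turnover_matrix(samples):
    """Pairwise turnover (Simpson) dissimilarities for a {sample_id: set} map."""
    ids = sorted(samples)
    return {(i, j): sorensen_partition(samples[i], samples[j])[0]
            for i in ids for j in ids if i < j}


# A strictly nested pair has zero turnover; all of its dissimilarity is nestedness
print(sorensen_partition({1, 2, 3}, {1, 2}))
print(turnover_matrix({"s1": {1, 2}, "s2": {2, 3}, "s3": {1, 2, 3}}))
```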
drsu_nmds_stepdown.rds
Species-level niche proxies were calculated by assessing the unique combinations of resource axis values occupied by each species. Community-level niche proxies were derived by assigning to each sample all unique resource axis combinations of every species present in that sample. Each individual combination (e.g., riffle + benthic + diurnal + no cover + benthic invertebrate, pool + neustonic + diurnal + woody debris + terrestrial invertebrate) was treated as the equivalent of a "species" in the analysis, and the tally of each combination represented in a sample was used to calculate relative abundance in each sample. This matrix was transformed to a binary (presence/absence) matrix and decomposed into turnover and nestedness components using R function beta.pair (R package "betapart"). We retained the turnover component and analyzed this dissimilarity matrix using a custom function (documented in associated file "accessory_functions.R") and a step-down protocol to determine the optimal dimensionality (K) for the final NMDS solution. We assessed K values from 1 to 6.
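A step-down choice of dimensionality can be sketched from the stress vectors stored in drsu_nmds_stepdown.rds, which holds one vector per K. The selection rule below is an assumption for illustration: keep adding dimensions until the best stress improves by less than 0.05 on the 0 to 1 scale. The README does not state the authors' exact criterion, and the stress values in the example are made up.

```python
def choose_k(stress_by_k, min_gain=0.05):
    """Smallest K beyond which one more dimension lowers best stress by < min_gain."""
    best = {k: min(values) for k, values in stress_by_k.items()}
    ks = sorted(best)
    for k, k_next in zip(ks, ks[1:]):
        if best[k] - best[k_next] < min_gain:
            return k
    return ks[-1]


example = {1: [0.41, 0.40], 2: [0.25, 0.22], 3: [0.16, 0.15],
           4: [0.13], 5: [0.12], 6: [0.115]}
print(choose_k(example))  # 3
```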
sample_lulc.csv
Land use/land cover data were extracted for each of the 369 sample sites, for each of the 10 periods of available land use/land cover data in the National Land Cover Database (MRLC). Values were obtained by clipping original rasters to the upstream catchment of each site (derived from the NHD+ V2.0 catchment shapefiles) and extracting individual pixel values. Because sites were sampled over a multi-decadal period, and many sites were sampled more than once, each sample event was assigned values from the temporally closest NLCD dataset.
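The steps above can be sketched as two helpers. The first matches each sample to the temporally closest of the 10 NLCD releases, and the second converts clipped pixel values to class proportions. Two assumptions are labeled in the code. Ties between equally distant releases go to the earlier year. Pixel values are assumed to have already been clipped to the site's upstream catchment, with no-data pixels removed.

```python
from collections import Counter

NLCD_YEARS = (2001, 2004, 2005, 2006, 2008, 2011, 2013, 2016, 2019, 2021)


def closest_nlcd_year(sample_year, years=NLCD_YEARS):
    """Nearest NLCD release; ties resolved toward the earlier year (assumption)."""
    return min(years, key=lambda y: (abs(y - sample_year), y))


def class_proportions(pixel_classes):
    """Proportion (0-1) of each NLCD class among catchment pixels (no-data removed)."""
    counts = Counter(pixel_classes)
    total = sum(counts.values())
    return {f"NLCD_{code}": n / total for code, n in sorted(counts.items())}


print(closest_nlcd_year(1999), closest_nlcd_year(2003), closest_nlcd_year(2012))
print(class_proportions([41, 41, 42, 81]))
```

Under this rule, samples from 1999-2002 map to the 2001 release, a 2003 sample maps to 2004, and a 2012 sample is a tie between 2011 and 2013.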
samples.csv
Sample metadata were recorded by surveyors in the field prior to sampling events. For sample events preceding 2015, and for some events following 2015, start and end times and collectors were not recorded.
sites.csv
Site metadata were obtained during site selection. Surveyors recorded the site number, latitude and longitude (decimal degrees, NAD83), the National Forest, the major hydrologic basin, and the stream system. The 10-digit Hydrologic Unit Code (HUC10) was extracted from the NHD+ V2.0 dataset. A HUC10 proxy was assigned to sites whose HUC10 had few samples but was adjacent to, and plausibly ecologically interacting with, a HUC10 with multiple samples. Group values (corresponding to HUC10 proxies and used in meta-analyses and aggregate analyses) were assigned manually.
traits.csv
Resource use values (or "traits") were extracted from a review of regional literature. Values were primarily extracted from Douglas (1974), Page (1983), Robison and Buchanan (1988, 2020), Pflieger (1997), Ross et al. (2001), Boschung and Mayden (2004), Miller and Robison (2004), and Warren and Burr (2014). Species were scored with ordinal values (see data definitions below) on five major resource axes: the use of progressively swifter and shallower habitats (habitat fluviatility), the use of deeper to shallower foraging stations in the water column (foraging height), the use of progressively larger types of cover (cover utilization), the use of progressively more intense light periods for foraging (foraging light intensity), and the use of progressively higher trophic level, larger, and more difficult to process prey items (trophic level). For each species, each ordinal category on each resource gradient was scored either 1 (documented to use) or 0 (not documented to use). The authors supplemented literature values with field observations where pertinent; for species with missing data, data from a closely related and ecologically similar surrogate species were substituted. For foraging light intensity, family-level averages among documented species were substituted for species with no diel activity data.
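The family-level substitution for foraging light intensity can be sketched as follows. The data layout, the light-intensity category labels, and the species and family names are hypothetical placeholders. Species lacking diel data are marked None, and each receives the mean 0/1 score of the documented species in its family. A species whose family has no documented members stays missing.

```python
from statistics import mean

LIGHT_CATEGORIES = ("nocturnal", "crepuscular", "diurnal")  # hypothetical labels


def fill_light_by_family(light_scores, family_of):
    """Replace missing (None) light-intensity scores with family-level means."""
    documented = {}
    for sp, scores in light_scores.items():
        if scores is not None:
            documented.setdefault(family_of[sp], []).append(scores)
    filled = {}
    for sp, scores in light_scores.items():
        if scores is None:
            fam_scores = documented.get(family_of[sp])
            if fam_scores:
                scores = {cat: mean(s[cat] for s in fam_scores) for cat in LIGHT_CATEGORIES}
        filled[sp] = scores
    return filled


light = {
    "sp1": {"nocturnal": 0, "crepuscular": 1, "diurnal": 1},
    "sp2": {"nocturnal": 1, "crepuscular": 1, "diurnal": 1},
    "sp3": None,
}
families = {"sp1": "FamilyA", "sp2": "FamilyA", "sp3": "FamilyA"}
print(fill_light_by_family(light, families)["sp3"])
```

Because the substituted values are averages of 0/1 scores, they can be fractional (0.5 for the nocturnal category in this example).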
trait_sources.csv
This file was compiled while creating the file "traits.csv". References were recorded and enumerated based on their order of occurrence.
Data-Specific Information
*All missing data values are specified as NA to facilitate analysis in R. R data serialization (RDS) objects are provided to facilitate user replication of analyses in associated manuscripts where analyses involve randomization procedures. The structure of RDS objects holding the results of analyses from published R packages is not reproduced here from the packages' original documentation; however, custom data structures from associated R scripts are detailed below.
fishes.csv
- Number of variables: 11
- Number of rows: 33502
- Variable list:
- COL_ID: (alphanumeric) A unique collections identifier for each collection event, incorporating the year, site ID, and visit number for that year.
- FISHID: (numeric) A unique assigned number for each fish collection, separately by species, sample, and subplot.
- SUBREACH: (numeric) The subplot sampled at the sample site.
- SAMPLINGMETHOD: (character) The sample method, either S (seining) or E (backpack electrofishing).
- CLASS: (character) Age class, either Adult (A), Juvenile (J), or Young-of-Year (Y). Deprecated after 2003 in favor of recording range of lengths.
- COUNT: (numeric) The number of individuals in the collection.
- SPECIES: (character) The scientific name of the species collected.
- MIN_SL: (numeric) The minimum standard length (mm) of individuals in the collection. Not collected before 2003.
- MAX_SL: (numeric) The maximum standard length (mm) of individuals in the collection. Not collected before 2003.
- RELEASED: (character) A value indicating if the species was released in the field instead of preserved. Only recorded after 2003.
- COMMENTS: (character) Any pertinent comments regarding the collection.
- Data Types: character, numeric, alphanumeric
- Missing Data Value: NA
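The methods describe pooling fishes.csv records by subreach within each sampling event, and that phrase can be read more than one way. The sketch below therefore sums COUNT either by collection event and species, pooling across subreaches and sampling methods, or by event, subreach, and species. It uses Python's csv module. The inline rows, IDs, and species names are placeholders rather than real records, and skipping records whose COUNT is NA is an assumption.

```python
import csv
import io
from collections import defaultdict

FISH_CSV = """COL_ID,FISHID,SUBREACH,SAMPLINGMETHOD,CLASS,COUNT,SPECIES,MIN_SL,MAX_SL,RELEASED,COMMENTS
COL_A,1,1,E,NA,4,Species one,22,41,NA,NA
COL_A,2,2,S,NA,3,Species one,25,38,NA,NA
COL_A,3,2,E,NA,1,Species two,60,60,NA,NA
COL_B,4,1,E,NA,NA,Species one,NA,NA,NA,NA
"""


def pooled_counts(handle, by_subreach=False):
    """Sum COUNT over sampling methods, grouped by event (and optionally subreach) and species."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle):
        if row["COUNT"] == "NA":
            continue  # assumption: skip records with unknown counts
        if by_subreach:
            key = (row["COL_ID"], row["SUBREACH"], row["SPECIES"])
        else:
            key = (row["COL_ID"], row["SPECIES"])
        totals[key] += int(row["COUNT"])
    return dict(totals)


counts = pooled_counts(io.StringIO(FISH_CSV))
print(counts)
```

With the real file, the handle would come from `open("fishes.csv", newline="")`.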
geo_pca.rds
A list object of class "prcomp" containing the results of a principal components analysis using R function prcomp(). Documentation of the structure of object class "prcomp" is available in the R help page for prcomp (?prcomp, package "stats").
habitat_points.csv
- Number of variables: 13
- Number of rows: 43861
- Variable list:
- COL_ID: (alphanumeric) A unique collections identifier for each collection event, incorporating the year, site ID, and visit number for that year.
- SUBREACH: (numeric) The subplot sampled at the sample site.
- TRANSECT: (numeric) The transect number in the sample from which habitat point values were taken.
- POINT: (numeric) The point number on the transect from which habitat point values were taken.
- DEPTH: (numeric, cm) The depth in centimeters.
- VELOCITY: (numeric, m/s) The water velocity in meters per second.
- SUBSTRATE: (numeric) The substrate size on a modified Wentworth scale (1 = silt/clay, 2 = sand, 3 = gravel, 4 = cobble, 5 = boulder, 6 = bedrock).
- DET: (binary, 0 = no, 1 = yes) Logical, was detritus present? Defined as leaf litter or decaying organic matter.
- SWD: (binary, 0 = no, 1 = yes) Logical, was small woody debris present? Defined as woody debris with a maximum diameter <10cm.
- LWD: (binary, 0 = no, 1 = yes) Logical, was large woody debris present? Defined as woody debris with a maximum diameter >10cm.
- VEG: (binary, 0 = no, 1 = yes) Logical, was aquatic vegetation present? Inclusive of submerged and emergent aquatic vegetation.
- CANOPY: (numeric, percent) Percent of canopy cover visible directly above the point.
- COMMENTS: (character) Any comments pertinent to the measurements.
- Data Types: character, numeric, alphanumeric, binary
- Missing Data Value: NA
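A per-sample summary of habitat_points.csv can be sketched as below. It computes mean depth, mean velocity, and the proportion of points with large woody debris. These derived metrics are hypothetical examples, not necessarily the variables entered into the geomorphic PCA. NA values are excluded from each mean, and the inline rows are placeholders.

```python
import csv
import io
from statistics import mean

POINTS_CSV = """COL_ID,SUBREACH,TRANSECT,POINT,DEPTH,VELOCITY,SUBSTRATE,DET,SWD,LWD,VEG,CANOPY,COMMENTS
COL_A,1,1,1,10,0.10,2,1,0,0,0,80,NA
COL_A,1,1,2,30,0.30,2,0,1,1,0,75,NA
COL_A,1,1,3,20,NA,3,0,0,0,0,70,NA
"""


def _numeric(rows, field):
    """Float values of one field, skipping NA."""
    return [float(r[field]) for r in rows if r[field] != "NA"]


def summarize_points(handle):
    """Per-COL_ID means of depth and velocity, and proportion of points with LWD."""
    by_sample = {}
    for row in csv.DictReader(handle):
        by_sample.setdefault(row["COL_ID"], []).append(row)
    return {
        col_id: {
            "mean_depth_cm": mean(_numeric(rows, "DEPTH")),
            "mean_velocity_m_s": mean(_numeric(rows, "VELOCITY")),
            "prop_LWD": mean(_numeric(rows, "LWD")),
        }
        for col_id, rows in by_sample.items()
    }


summary = summarize_points(io.StringIO(POINTS_CSV))
print(summary)
```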
habitat_transects.csv
- Number of variables: 23
- Number of rows: 8921
- Variable list:
- COL_ID: (alphanumeric) A unique collections identifier for each collection event, incorporating the year, site ID, and visit number for that year.
- SUBREACH: (numeric) The subplot sampled at the sample site.
- TRANSECT: (numeric) The transect number in the sample from which transect-level values were taken.
- WWIDTH: (numeric, m) The wetted width in meters.
- CWIDTH: (numeric, m) The channel width (total) in meters.
- RBH: (numeric, m) The right bank height in meters.
- RBSTAB: (character, Stable or Eroding) The right bank stability.
- RBANG: (character, <45, >45, 90) The right bank angle in degrees.
- RHERB: (binary, 0 = no, 1 = yes) Presence of herbaceous plants on the right bank.
- RSHRUB: (binary, 0 = no, 1 = yes) Presence of shrubbery on the right bank.
- RSAPL: (binary, 0 = no, 1 = yes) Presence of saplings on the right bank.
- RTREE: (binary, 0 = no, 1 = yes) Presence of trees on the right bank.
- RUNDCT: (character, N = no, Y = yes) Presence of undercut banks on the right bank.
- LBH: (numeric, m) The left bank height in meters.
- LBSTAB: (character, Stable or Eroding) The left bank stability.
- LBANG: (character, <45, >45, 90) The left bank angle in degrees.
- LHERB: (binary, 0 = no, 1 = yes) Presence of herbaceous plants on the left bank.
- LSHRUB: (binary, 0 = no, 1 = yes) Presence of shrubbery on the left bank.
- LSAPL: (binary, 0 = no, 1 = yes) Presence of saplings on the left bank.
- LTREE: (binary, 0 = no, 1 = yes) Presence of trees on the left bank.
- LUNDCT: (character, N = no, Y = yes) Presence of undercut banks on the left bank.
- COMMENTS: (character) Any comments pertinent to the measurements.
- Data Types: character, numeric, alphanumeric, binary
- Missing Data Value: NA
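Transect-level records can be summarized per sample in a similar way. The sketch below computes mean bank height over both banks and the proportion of bank observations recorded as "Eroding". These are hypothetical example metrics of the kind that could index incision or bank instability, not a description of the published variables. The inline rows use a placeholder subset of the columns.

```python
import csv
import io
from statistics import mean

TRANSECTS_CSV = """COL_ID,SUBREACH,TRANSECT,WWIDTH,CWIDTH,RBH,RBSTAB,LBH,LBSTAB
COL_A,1,1,3.0,6.0,1.2,Eroding,0.8,Stable
COL_A,1,2,2.5,5.5,1.6,Eroding,1.0,Eroding
"""


def summarize_transects(handle):
    """Per-COL_ID mean bank height (both banks) and proportion of eroding banks."""
    by_sample = {}
    for row in csv.DictReader(handle):
        by_sample.setdefault(row["COL_ID"], []).append(row)
    out = {}
    for col_id, rows in by_sample.items():
        heights = [float(r[f]) for r in rows for f in ("RBH", "LBH") if r[f] != "NA"]
        stability = [r[f] for r in rows for f in ("RBSTAB", "LBSTAB") if r[f] != "NA"]
        out[col_id] = {
            "mean_bank_height_m": mean(heights),
            "prop_eroding": sum(s == "Eroding" for s in stability) / len(stability),
        }
    return out


banks = summarize_transects(io.StringIO(TRANSECTS_CSV))
print(banks)
```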
network_topology.csv
- Number of variables: 10
- Number of rows: 382
- Variable list:
- SITE_ID: (numeric) A unique site identifier for each sample locality.
- COMID: (numeric) The COMID identifier from the NHD+ V2.0 dataset for the stream segment occupied by the sample locality.
- BASE.COMID: (numeric) The COMID identifier from the NHD+ V2.0 dataset for the stream segment at the downstream end of the watershed occupied by the sample locality.
- DCOMID: (numeric) The COMID identifier from the NHD+ V2.0 dataset for the stream segment downstream of the stream segment occupied by the sample locality.
- TDA: (numeric, km^2) The total upstream drainage area, in square kilometers, of the stream segment occupied by the sample site, extracted from the NHD+ V2.0 dataset.
- SSO: (numeric) The Strahler Stream Order of the stream segment occupied by the sample site, extracted from the NHD+ V2.0 dataset. Smaller numbers indicate smaller streams. A first-order stream has no tributaries. Order increases by one only where two streams of equal order join; where streams of unequal order join, the downstream segment takes the higher of the two orders (1 + 1 = 2, 2 + 1 = 2, 2 + 2 = 3...).
- LINK: (numeric) The link magnitude of the stream segment occupied by the sample site. This is the number of first order tributaries upstream of the sample site, inclusive of the reach in which the sample site occurred.
- CLINK: (numeric) The c-link magnitude of the stream segment occupied by the sample site. This is the number of downstream segments between the stream segment occupied by the sample site (COMID) and the downstream-most segment in the watershed (BASE.COMID), inclusive of both.
- CLINK6: (numeric) A modified version of the c-link magnitude of the stream segment occupied by the sample site. This is the number of downstream segments between the stream segment occupied by the sample site (COMID) and the nearest downstream 6th order stream.
- DLINK: (numeric) The link magnitude of the stream segment immediately downstream of the stream segment occupied by the sample site. This is the number of first order tributaries upstream of the stream segment immediately downstream of the segment occupied by the sample site.
- Data Types: numeric
- Missing Data Value: NA
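The Strahler ordering rule given for SSO can be checked on a small toy network, with hypothetical segment IDs each mapped to its downstream segment. SSO values in this dataset were taken directly from the NHD+ V2.0 VAA table, so this sketch only illustrates the rule.

```python
from collections import defaultdict


def strahler_orders(downstream):
    """Strahler order for every segment in a {segment: downstream segment} map."""
    upstream = defaultdict(list)
    for seg, ds in downstream.items():
        if ds is not None:
            upstream[ds].append(seg)

    cache = {}

    def order(seg):
        if seg not in cache:
            parents = sorted((order(p) for p in upstream.get(seg, [])), reverse=True)
            if not parents:
                cache[seg] = 1  # source segment
            elif len(parents) > 1 and parents[0] == parents[1]:
                cache[seg] = parents[0] + 1  # two tributaries share the highest order
            else:
                cache[seg] = parents[0]  # a lower-order tributary does not raise the order
        return cache[seg]

    return {seg: order(seg) for seg in downstream}


# A + B -> C (order 2); C + D -> E stays 2; F + G -> H (2); E + H -> I (3)
network = {"A": "C", "B": "C", "C": "E", "D": "E", "E": "I",
           "F": "H", "G": "H", "H": "I", "I": None}
print(strahler_orders(network))
```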
drsu_ca.rds
A list object of class "cca" containing the results of a correspondence analysis of fish niche structure data using R function cca() from R package "vegan" (Oksanen et al. 2020). Documentation of the structure of object class "cca" is available in the vegan help page for cca.object (?cca.object in R).
drsu_cca.rds
A list object of class "cca" containing the results of a canonical correspondence analysis of fish niche structure data constrained by explanatory variables using R function cca() from R package "vegan" (Oksanen et al. 2020). Documentation of the structure of object class "cca" is available in the vegan help page for cca.object (?cca.object in R).
drsu_cca_aov_results.rds
A serialized list object containing the following (list object notation is in [[n]] format):
- [[1]] The results of a permutation test for the overall CCA model.
- [[2]] The results of a permutation test for each explanatory variable in the CCA model.
- [[3]] The results of a permutation test for each CCA axis in the CCA model.
drsu_nmds.rds
A serialized list object of class "nmds.core" containing the following:
- $final: The optimal NMDS solution selected during analysis. This includes all results from R function monoMDS() from R package "vegan" (Oksanen et al. 2020). Documentation of the structure of object class "monoMDS" is available in the vegan help page for monoMDS (?monoMDS in R).
- $dissimilarities: The original input dissimilarity object.
- $start.coords: The random starting coordinates used for the optimal solution.
- $sample.scores: The solution sample scores.
- $stress: The solution stress (scaled 0 to 1, type 1).
- $seq.stress: A sequence of stress values during each iteration. These are post-hoc simulated and allow visualization of the optimization process.
- $instability: The final instability estimate (variance of stress over time, calculated from $seq.stress).
- $iterations: The maximum number of iterations allowed.
- $rand.starts: The number of random starts used in selecting an optimal solution.
- $proc.prop: The proportion of Procrustes-rotated solutions in the overall solution set that are within the user-specified residual dissimilarity of the optimal solution (an indicator of how complex the solution landscape is).
- $rmse: The solution root mean square error.
- $diss: The user-specified dissimilarity type; defaults to "Custom User Specified".
- $attr: The class attributes for the object.
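For reference, type 1 (Kruskal) stress compares ordination distances d with the monotone-regression fitted values (disparities) d-hat. The sketch below computes that formula. It also computes an instability value as the variance of the final stretch of a stress sequence. The window length used for instability is a hypothetical choice, since the README does not give the custom function's exact rule.

```python
import math
from statistics import variance


def kruskal_stress1(distances, disparities):
    """sqrt(sum((d - dhat)^2) / sum(d^2)) over all sample pairs."""
    num = sum((d - dh) ** 2 for d, dh in zip(distances, disparities))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)


def instability(seq_stress, window=10):
    """Sample variance of the last `window` stress values (window is hypothetical)."""
    tail = list(seq_stress)[-window:]
    return variance(tail) if len(tail) > 1 else 0.0


print(kruskal_stress1([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(kruskal_stress1([3.0, 4.0], [0.0, 0.0]))            # 1.0
```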
drsu_nmds_stepdown.rds
A serialized list object containing vectors of stress values obtained in NMDS stepdown analyses (list object notation is in [[n]] format):
- [[1]] The stress values for K = 1.
- [[2]] The stress values for K = 2.
- [[3]] The stress values for K = 3.
- [[4]] The stress values for K = 4.
- [[5]] The stress values for K = 5.
- [[6]] The stress values for K = 6.
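A stepdown analysis is typically read by comparing stress across numbers of dimensions (K). The sketch below assumes the six stress vectors have already been extracted into a plain list in the same [[1]] to [[6]] order; the made-up values and the 0.20 threshold are hypothetical and are not the rule used in the companion manuscript.

```python
from statistics import mean


def summarize_stepdown(stress_by_k):
    """Summarize stepdown stress values.

    stress_by_k: a list whose i-th element holds the stress values for K = i + 1,
    mirroring the [[1]]...[[6]] layout of drsu_nmds_stepdown.rds.
    """
    return [
        {"K": k, "min": min(values), "mean": mean(values)}
        for k, values in enumerate(stress_by_k, start=1)
    ]


def smallest_k_below(summary, threshold):
    """Return the smallest K whose minimum stress is below `threshold`, or None."""
    for row in summary:
        if row["min"] < threshold:
            return row["K"]
    return None


# Made-up stress values for K = 1..3 (the real file holds K = 1..6).
example = [[0.41, 0.43, 0.40], [0.24, 0.22, 0.25], [0.15, 0.16, 0.14]]
summary = summarize_stepdown(example)
print(smallest_k_below(summary, 0.20))  # 3
```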
sample_lulc.csv
- Number of variables: 23
- Number of rows: 757
- Variable list:
- COL_ID: (alphanumeric) A unique collections identifier for each collection event, incorporating the year, site ID, and visit number for that year.
- SITE_ID: (numeric) A unique site identifier for each sample locality.
- LULC_YEAR: (numeric) The National Land Cover Database (NLCD) data year from which land cover observations were drawn for the sample.
- NLCD_11: (numeric) The proportion (0-1) of NLCD land cover class 11 (open water)
- NLCD_12: (numeric) The proportion (0-1) of NLCD land cover class 12 (perennial ice/snow)
- NLCD_21: (numeric) The proportion (0-1) of NLCD land cover class 21 (developed open space)
- NLCD_22: (numeric) The proportion (0-1) of NLCD land cover class 22 (developed low intensity)
- NLCD_23: (numeric) The proportion (0-1) of NLCD land cover class 23 (developed medium intensity)
- NLCD_24: (numeric) The proportion (0-1) of NLCD land cover class 24 (developed high intensity)
- NLCD_31: (numeric) The proportion (0-1) of NLCD land cover class 31 (barren)
- NLCD_41: (numeric) The proportion (0-1) of NLCD land cover class 41 (deciduous forest)
- NLCD_42: (numeric) The proportion (0-1) of NLCD land cover class 42 (evergreen forest)
- NLCD_43: (numeric) The proportion (0-1) of NLCD land cover class 43 (mixed forest)
- NLCD_51: (numeric) The proportion (0-1) of NLCD land cover class 51 (dwarf scrub)
- NLCD_52: (numeric) The proportion (0-1) of NLCD land cover class 52 (shrub/scrub)
- NLCD_71: (numeric) The proportion (0-1) of NLCD land cover class 71 (grasslands/herbaceous)
- NLCD_72: (numeric) The proportion (0-1) of NLCD land cover class 72 (sedge/herbaceous)
- NLCD_73: (numeric) The proportion (0-1) of NLCD land cover class 73 (lichens)
- NLCD_74: (numeric) The proportion (0-1) of NLCD land cover class 74 (moss)
- NLCD_81: (numeric) The proportion (0-1) of NLCD land cover class 81 (pasture/hay)
- NLCD_82: (numeric) The proportion (0-1) of NLCD land cover class 82 (cultivated crops)
- NLCD_90: (numeric) The proportion (0-1) of NLCD land cover class 90 (woody wetland)
- NLCD_95: (numeric) The proportion (0-1) of NLCD land cover class 95 (herbaceous wetland)
- Data Types: alphanumeric, numeric
- Missing Data Value: NA
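Because each NLCD_xx column holds a proportion of one land cover class, broader categories can be obtained by summing columns. The first digit of an NLCD class code identifies its broad (Level I) category, so the sketch below groups on that digit. It assumes the file is read with Python's csv module (values arrive as strings, missing values as "NA"); the demo row is hypothetical and contains only a few of the NLCD columns.

```python
import csv
import io
from collections import defaultdict

# The first digit of an NLCD class code identifies its broad (Level I) category.
NLCD_LEVEL_I = {
    "1": "water", "2": "developed", "3": "barren", "4": "forest",
    "5": "shrubland", "7": "herbaceous", "8": "planted/cultivated", "9": "wetlands",
}


def level_one_proportions(row):
    """Sum the NLCD_xx proportions in one sample_lulc.csv row by Level I category."""
    totals = defaultdict(float)
    for column, value in row.items():
        if not column.startswith("NLCD_") or value in ("", "NA"):
            continue  # skip identifier columns and missing values
        code = column[len("NLCD_"):]
        totals[NLCD_LEVEL_I[code[0]]] += float(value)
    return dict(totals)


# A hypothetical row with only a few NLCD columns, for illustration.
demo_csv = (
    "COL_ID,SITE_ID,LULC_YEAR,NLCD_41,NLCD_42,NLCD_43,NLCD_81,NLCD_90\n"
    "demo,1,2001,0.2,0.3,0.1,0.25,0.15\n"
)
for record in csv.DictReader(io.StringIO(demo_csv)):
    print(level_one_proportions(record))
```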
samples.csv
- Number of variables: 6
- Number of rows: 763
- Variable list:
- COL_ID: (alphanumeric) A unique collections identifier for each collection event, incorporating the year, site ID, and visit number for that year.
- SITE_ID: (numeric) A unique site identifier for each sample locality.
- DATE: (alphanumeric) The date, in MM/DD/YYYY format.
- START: (numeric) The start time in 24-hour format (HHMM).
- END: (numeric) The end time in 24-hour format (HHMM).
- COMMENTS: (character) Any pertinent comments for the sample event.
- Data Types: character, numeric, alphanumeric
- Missing Data Value: NA
- Comments: Record keeping for collection events varied by field team, and many fields in this file contain missing values.
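START and END are stored as numbers in HHMM form, so a morning time such as 09:30 may appear as 930 once the leading zero is lost. A minimal sketch for converting these fields to a sampling duration and parsing DATE is shown below. It assumes that no collection event spans midnight and that missing values appear as "NA" or empty strings; the example values are hypothetical.

```python
from datetime import datetime

MISSING = (None, "", "NA")


def hhmm_to_minutes(value):
    """Convert a numeric HHMM time (e.g. 930 for 09:30) to minutes after midnight."""
    if value in MISSING:
        return None
    hours, minutes = divmod(int(float(value)), 100)
    if hours > 23 or minutes > 59:
        raise ValueError(f"not a valid HHMM time: {value!r}")
    return hours * 60 + minutes


def sample_duration_minutes(start, end):
    """Minutes between START and END; None if either is missing or END precedes START."""
    start_min, end_min = hhmm_to_minutes(start), hhmm_to_minutes(end)
    if start_min is None or end_min is None or end_min < start_min:
        return None  # assumes no collection event spans midnight
    return end_min - start_min


def parse_sample_date(text):
    """Parse the MM/DD/YYYY DATE field; None if missing."""
    return None if text in MISSING else datetime.strptime(text, "%m/%d/%Y").date()


print(sample_duration_minutes("930", "1115"))  # 105
print(parse_sample_date("06/14/2016"))  # 2016-06-14
```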
sites.csv
- Number of variables: 12
- Number of rows: 369
- Variable list:
- SITE_ID: (numeric) A unique site identifier for each sample locality.
- LAT: (numeric) The latitude of the sample locality (NAD83, decimal degrees).
- LON: (numeric) The longitude of the sample locality (NAD83, decimal degrees).
- STREAM: (character) The name of the stream sampled. Unnamed streams are denoted as tributaries to named streams.
- LOCATION: (character) A text description of the sample locality including nearby landmarks.
- FOREST: (character) The U.S. Forest Service national forest name with which the sample locality is associated (some sample localities occurred in USFS management boundaries, but not within the national forests themselves).
- BASIN: (character) The hydrologic basin the sample occurred in, as defined by Ross et al. (2001).
- SYSTEM: (character) The stream system the sample occurred in, as defined by Ross et al. (2001).
- HUC10: (numeric) The 10-digit Hydrologic Unit Code (HUC10) of the watershed in which the sample occurred.
- HUC10_NAME: (character) The name of the HUC10 watershed in which the sample occurred.
- HUC10_PROXY: (character) The HUC10 watershed to which a sample was assigned for analysis purposes. Samples occurring in HUC10s with few other samples were pooled with adjacent, hydrologically connected HUC10s where possible.
- GROUP: (alphanumeric) The group name for the pooled HUC10s for analysis. A group name of G-NA indicates that the samples were not included in meta-analyses or aggregate analyses, although they were included in global analyses.
- Data Types: character, numeric, alphanumeric
- Missing Data Value: NA
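Samples link to site attributes through SITE_ID, and the GROUP value "G-NA" marks samples excluded from meta-analyses and aggregate analyses. The sketch below joins hypothetical sample and site rows and drops G-NA rows for grouped analyses. It also zero-pads HUC10: hydrologic unit codes in Mississippi begin with 0, so if the column is read as a number the leading zero is lost. All IDs, group names, and HUC values in the example are placeholders.

```python
def pad_huc10(value):
    """Restore a 10-digit HUC code whose leading zero was lost when stored as a number."""
    if value in (None, "", "NA"):
        return None
    return str(int(float(value))).zfill(10)


def attach_sites(samples, sites):
    """Add site attributes to each sample row, matching on SITE_ID."""
    sites_by_id = {site["SITE_ID"]: site for site in sites}
    joined = []
    for sample in samples:
        site = sites_by_id.get(sample["SITE_ID"])
        if site is not None:  # samples without a matching site are dropped
            joined.append({**sample, **site})
    return joined


def grouped_rows(joined):
    """Keep rows eligible for grouped (meta/aggregate) analyses, i.e. GROUP != "G-NA"."""
    return [row for row in joined if row.get("GROUP") != "G-NA"]


# Hypothetical rows; IDs, groups, and HUC values are placeholders.
sites = [
    {"SITE_ID": "1", "GROUP": "G-A", "HUC10": "318000201"},
    {"SITE_ID": "2", "GROUP": "G-NA", "HUC10": "803020101"},
]
samples = [{"COL_ID": "c1", "SITE_ID": "1"}, {"COL_ID": "c2", "SITE_ID": "2"}]
rows = grouped_rows(attach_sites(samples, sites))
print([(r["COL_ID"], pad_huc10(r["HUC10"])) for r in rows])  # [('c1', '0318000201')]
```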
traits.csv
- Number of variables: 31
- Number of rows: 118
- Variable list:
- Species: (character) The scientific name of the species whose traits are specified.
- H:RF: (binary) Utilization of riffle habitat.
- H:RN: (binary) Utilization of run habitat.
- H:PL: (binary) Utilization of pool habitat.
- H:BW: (binary) Utilization of backwater habitat.
- H:OC: (binary) Utilization of off-channel and floodplain habitats.
- F:BD: (binary) Use of a foraging strategy involving disturbing substrates (benthic disturbance, sensu Matthews and Marsh-Matthews, 2017).
- F:BC: (binary) Use of crevices in the benthic surface (stream bed) for foraging.
- F:BS: (binary) Use of the benthic surface (stream bed) for foraging.
- F:PV: (binary) Use of the pelagic region (above the stream bed, below the surface) associated with vegetation or other structural elements for foraging.
- F:PO: (binary) Use of the open pelagic region (above the stream bed, below the surface, no associated structural elements) for foraging.
- F:NU: (binary) Use of the neustonic zone (water surface) for foraging.
- C:NO: (binary) No use of cover or structural elements in habitat selection.
- C:AV: (binary) Use of aquatic vegetation in habitat selection.
- C:WD: (binary) Use of woody elements in habitat selection.
- P:NC: (binary) Foraging during nocturnal periods (low to no light).
- P:EC: (binary) Foraging during evening crepuscular periods (decreasing light).
- P:MC: (binary) Foraging during morning crepuscular periods (increasing light).
- P:DI: (binary) Foraging during diurnal periods (high light).
- T:DET: (binary) Foraging on detritus/plant matter.
- T:ALG: (binary) Foraging on algae or nonvascular plants.
- T:VPL: (binary) Foraging on vascular plants.
- T:MEI: (binary) Foraging on meiofauna.
- T:AMI: (binary) Foraging on aquatic macroinvertebrates (excluding freshwater decapods and mollusks).
- T:TRI: (binary) Foraging on terrestrial invertebrates.
- T:CRU: (binary) Foraging on freshwater decapods.
- T:MOL: (binary) Foraging on freshwater mollusks.
- T:FSH: (binary) Foraging on other fishes.
- T:CAR: (binary) Foraging on carrion/animal remains.
- Surrogate for periodicity?: (character) Whether or not surrogate species were used to determine periodicity.
- Sources: (alphanumeric) Source footnotes for the data; source numbers correspond to the Number field in the file "trait_sources.csv".
- Comments: (character) Any pertinent comments regarding the data.
- Data Types: character, alphanumeric, binary (0 = no, 1 = yes)
- Missing Data Value: NA
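The column prefixes encode the five resource gradients (H: microhabitat, F: foraging position, C: cover, P: diel periodicity, T: trophic). The sketch below groups one species row by prefix and counts the categories scored 1 in each gradient. This simple count is an illustration only and is not necessarily the breadth measure used in the companion manuscript; the species row is hypothetical and uses only a subset of the columns.

```python
GRADIENTS = {
    "H": "microhabitat", "F": "foraging position", "C": "cover",
    "P": "diel periodicity", "T": "trophic",
}


def gradient_breadth(trait_row):
    """Count the categories scored 1 within each gradient for one traits.csv row."""
    breadth = {name: 0 for name in GRADIENTS.values()}
    for column, value in trait_row.items():
        prefix, sep, _ = column.partition(":")
        if not sep or prefix not in GRADIENTS or value in (None, "", "NA"):
            continue  # skip Species, Sources, Comments, and missing scores
        breadth[GRADIENTS[prefix]] += int(float(value))
    return breadth


# Hypothetical species row using a subset of the trait columns.
row = {"Species": "Species A", "H:RF": "1", "H:PL": "1", "H:OC": "0",
       "T:AMI": "1", "T:FSH": "0", "Comments": ""}
print(gradient_breadth(row))
# {'microhabitat': 2, 'foraging position': 0, 'cover': 0, 'diel periodicity': 0, 'trophic': 1}
```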
trait_sources.csv
- Number of variables: 2
- Number of rows: 12
- Variable list:
- Number: (numeric) The reference ID number, cited in the Sources field of the file "traits.csv".
- Source: (character) The specific reference from which data were drawn
- Data Types: character, numeric
- Missing Data Value: NA
Associated Code and Software
accessory_functions.R
This script contains functions and programs necessary for analyses in the associated script "analysis_script.R".
R Environment for Statistical Computing
R 4.1.0
- plotrix
- vegan
analysis_script.R
This script contains the code necessary to replicate analyses in manuscripts associated with this dataset, as well as the code necessary to replicate the rds files in this dataset.
R Environment for Statistical Computing
R 4.1.0
- betapart
- plotrix
- ppcor
- vegan
- nlme
- lme4
output_script.R
This script contains code that generates text, tabular, and graphical outputs from the results of "analysis_script.R".
R Environment for Statistical Computing
R 4.1.0
- plotrix
Quantitative fish and habitat data were collected at 369 sites in headwater streams draining five national forests in Mississippi (Bienville, DeSoto, Holly Springs, Homochitto, and Tombigbee National Forests). Collections occurred in 1999-2003, 2008-2009, and 2015-2022, with 40-50 sites sampled each year. At each site, the sampled reach was 30 times the mean wetted width, constrained to between 120 and 240 m. Fishes were collected with a combination of single-pass backpack electrofishing and single-pass seining. Habitat data were collected using a standardized point-transect method, with points taken at 1 m intervals along 12 transects running across the channel, perpendicular to flow. Habitat quantification included metrics of channel morphology and meter-scale habitat characteristics.
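The reach-length rule (30 times the mean wetted width, between 120 and 240 m) works out as follows: streams narrower than 4 m (120 / 30) get the 120 m minimum, and streams wider than 8 m (240 / 30) get the 240 m maximum. A minimal sketch of the rule, with the example widths chosen for illustration:

```python
def reach_length_m(mean_wetted_width_m, multiplier=30.0, minimum_m=120.0, maximum_m=240.0):
    """Reach length: 30 x mean wetted width, constrained to the 120-240 m range."""
    if mean_wetted_width_m <= 0:
        raise ValueError("mean wetted width must be positive")
    return min(max(multiplier * mean_wetted_width_m, minimum_m), maximum_m)


for width in (2.5, 6.0, 10.0):
    print(width, reach_length_m(width))
# 2.5 120.0
# 6.0 180.0
# 10.0 240.0
```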
Watershed-scale topology metrics and land use/land cover metrics were calculated for each site and each sampling event. Topology metrics (measures of stream size and of the sample site's position in the watershed stream network) were derived from the NHDPlus Version 2 (NHDPlusV2) dataset. Land use data for the watershed upstream of each site were drawn from the National Land Cover Database (NLCD), distributed by the Multi-Resolution Land Characteristics (MRLC) Consortium, using the NLCD year closest in time to each sample event.
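Matching each sample event to the land-cover year closest in time can be sketched as a nearest-value lookup. The list of available years below is hypothetical (the years actually used are recorded in LULC_YEAR in sample_lulc.csv), and breaking ties toward the earlier year is an assumption, not a documented rule.

```python
def closest_lulc_year(sample_year, available_years):
    """Return the land-cover year nearest the sample year (ties go to the earlier year)."""
    if not available_years:
        raise ValueError("no land-cover years supplied")
    return min(available_years, key=lambda year: (abs(year - sample_year), year))


# Hypothetical list of available land-cover years; not necessarily those used here.
years = [2001, 2006, 2011, 2016, 2019]
print([closest_lulc_year(y, years) for y in (1999, 2008, 2009, 2022)])
# [2001, 2006, 2011, 2019]
```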
A resource-use database was constructed for each of the 117 species in the dataset. It records known resource use along five gradients documented to be important for resource partitioning by stream fishes: gross microhabitat type (on a gradient from fluvial in-channel to lacustrine off-channel habitats), foraging depth, diel foraging periodicity, cover use, and gross trophic category. Within each gradient, species were scored 0 (no use) or 1 (use) for each ranked ordinal category, based on regional literature reviews and on observations made during field sampling events.
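One way to turn the 0/1 scores into a sample-level quantity related to the companion manuscript's hypothesis (prevalence of species with backwater or off-channel habitat use) is sketched below. It is an illustrative assumption, not the authors' metric: treating H:BW or H:OC = 1 as the indicator and weighting by abundance are both hypothetical choices. The layout of fishes.csv is not assumed; the species names, scores, and counts are placeholders.

```python
OFF_CHANNEL_COLUMNS = ("H:BW", "H:OC")  # backwater and off-channel/floodplain use


def scored_use(value):
    """True when a 0/1 trait score equals 1; missing or non-numeric values count as False."""
    try:
        return float(value) == 1.0
    except (TypeError, ValueError):
        return False


def floodplain_user(trait_row):
    """True if the species is scored as using backwater or off-channel habitat."""
    return any(scored_use(trait_row.get(column)) for column in OFF_CHANNEL_COLUMNS)


def proportion_floodplain_users(abundances, traits_by_species):
    """Share of individuals in one sample belonging to backwater/off-channel users."""
    total = sum(abundances.values())
    if total == 0:
        return None
    users = sum(count for species, count in abundances.items()
                if floodplain_user(traits_by_species.get(species, {})))
    return users / total


# Hypothetical species, scores, and counts.
traits_by_species = {
    "Species A": {"H:BW": "1", "H:OC": "0"},
    "Species B": {"H:BW": "0", "H:OC": "0"},
}
print(proportion_floodplain_users({"Species A": 5, "Species B": 15}, traits_by_species))  # 0.25
```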
