Data from: Environmental and spatial effects on co-occurrence network size and taxonomic similarity in stream diatoms, insects, and fish
Data files
Nov 17, 2024 version files (11.79 MB total)
- Diatoms_genera_counts_and_environmental_data.csv (927.76 KB)
- Diatoms_genera_taxonomy_dataset.csv (17.61 KB)
- Diatoms_species_counts_and_environmental_data.csv (4.66 MB)
- Diatoms_species_taxonomy_dataset.csv (224.05 KB)
- Fish_genera_counts_and_environmental_data.csv (852.90 KB)
- Fish_genera_taxonomy_dataset.csv (12.12 KB)
- Fish_species_counts_and_environmental_data.csv (2.36 MB)
- Fish_species_taxonomy_dataset.csv (75.13 KB)
- Insects_genera_counts_and_environmental_data.csv (2.60 MB)
- Insects_genera_taxonomy_data.csv (55.09 KB)
- README.md (8.92 KB)
Abstract
Aim: The influences of environmental and spatial processes on species composition have been at the center of metacommunity ecology. Conversely, the relative importance of these processes for species co-occurrences and taxonomic similarity has remained poorly understood. We hypothesized that at a subcontinental scale, shared environmental preference would be the major driver of co-occurrences across species groups. In contrast, co-occurrences due to shared dispersal history were more likely in dispersal-limited taxa. Finally, we tested whether taxa co-occurring due to similar responses to environmental and spatial processes were more taxonomically similar than expected by chance.
Location: The conterminous United States
Time Period: 1993-2019
Major taxa studied: Stream diatoms, insects, and fish
Methods: We generated co-occurrence networks and developed methodology to determine the proportions of nodes and edges explained by pure environment alone (after accounting for space), pure space alone (after accounting for the environment), pure environment and pure space together, and spatially structured environment. Taxonomic similarity of taxa co-occurring because of environmental and/or spatial controls or because of unmeasured processes was compared to that of a null model.
Results: Pure environment alone, spatially structured environment, and pure environment and pure space together explained the greatest proportion of nodes and edges in the co-occurrence networks of diatom species and genera, and insect genera. Conversely, pure environment and pure space together best explained the nodes and edges in the co-occurrence network of fish species and genera. Co-occurring taxa were more closely related than the random expectation in all 30 cases.
Main Conclusions: The environment controlled co-occurrences of all groups, while the influence of space was the strongest in fish, the most dispersal-limited group in our study. All co-occurring taxa were more taxonomically related than expected by chance due to environmental or spatial overlap or unaccounted factors.
README: Data from: Environmental and spatial effects on co-occurrence network size and taxonomic similarity in stream diatoms, insects, and fish
https://doi.org/10.5061/dryad.j3tx95xq6
Description of the data and file structure
We provide species by site matrices for 5 metacommunities of diatoms (species and genera), insects (genera), and fish (species and genera), in the files named “‘Taxon’_‘rank’_counts_and_environmental_data.csv” (for example, Fish_species_counts_and_environmental_data.csv). We obtained these data from Passy et al. (2023), including diatoms, insects, and fish collected from 1698, 1700, and 1700 stream sites, respectively, across the conterminous United States (Fig. S1). Sites were compiled from both the National Water-Quality Assessment (NAWQA) program of the US Geological Survey and the National Rivers and Streams Assessment (NRSA) of the US Environmental Protection Agency, which used similar collection methods (Moulton II et al., 2002; US Environmental Protection Agency, 2013). Streams were sampled between 1993 and 2019, with the majority of samples collected between 2007 and 2010. Diatoms and insects were collected from a predetermined area of substrate from May to September. Fish were sampled throughout the year using backpack electrofishing and seining. Community data consisted of counts of species or a lower taxonomic category in diatoms and fish, but of genera in insects. To standardize the sampling effort, diatom counts were sampled down to 400 cells, and insect and fish counts to 100 individuals. We accounted for differences in taxonomic resolution by examining all datasets at the genus level and, for diatoms and fish, also at the species level.
We selected our 1698-1700 sites from larger datasets of 2278 diatom, 2270 insect, and 2296 fish samples, ensuring similar environmental conditions, a minimum pairwise distance between any two sites of 1 km, and comparable average pairwise distances among datasets (1522, 1497, and 1491 km, respectively). Pairwise distances between all sites were calculated in km using the ‘distm’ function in the R package ‘geosphere’ (Hijmans et al., 2019). Each site had available physicochemical data on pH, specific conductance (µS/cm), water temperature (°C), nitrate (NO3, µg/L), and total phosphorus (TP, µg/L) from NAWQA or NRSA. Elevation (m) was obtained from the WorldClim database (Fick & Hijmans, 2017) and used to calculate slope (% grade), averaged across a 5 km buffer around each site with the package ‘raster’ (Hijmans et al., 2020). Nineteen climatic variables described temperature and precipitation minima, maxima, averages, ranges, and seasonality across a 5 km buffer around each site (WorldClim V1.4) (Hijmans et al., 2005).
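The pairwise distance step can be illustrated with a short Python sketch. This is not the authors' code (they used ‘distm’ from the R package ‘geosphere’, and the distance function they passed to it is not stated here). The sketch assumes the haversine great-circle formula and a mean Earth radius of 6371 km; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius used by this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pairwise_distance_matrix(sites):
    """sites: list of (latitude, longitude) tuples. Returns a symmetric matrix."""
    n = len(sites)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(*sites[i], *sites[j])
            dist[i][j] = dist[j][i] = d
    return dist


def min_pairwise_km(dist):
    """Smallest distance between two different sites (needs at least 2 sites)."""
    n = len(dist)
    return min(dist[i][j] for i in range(n) for j in range(i + 1, n))
```

A check such as `min_pairwise_km(dist) >= 1.0` would then correspond to the 1 km minimum spacing described above.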
We additionally assembled higher-order taxa for each metacommunity in the taxonomy files (for example, Fish_species_taxonomy_dataset.csv). These consist of phylum, class, order, family, genus, species, subspecies, variety, and form for diatoms; phylum, class, order, family, and genus for insects; and phylum, class, order, family, genus, species, and subspecies for fish.
Files and variables
Files: Species by site datasets
Dataset names:
- Diatoms_species_counts_and_environmental_data.csv
- Diatoms_genera_counts_and_environmental_data.csv
- Insects_genera_counts_and_environmental_data.csv
- Fish_species_counts_and_environmental_data.csv
- Fish_genera_counts_and_environmental_data.csv
Description: All species by site datasets include site-specific location, physicochemical, and climatic data. They use the following format:
Variables
- SampleID: Unique Sample identifier
- SiteID: Unique site identifier
- Latitude: decimal degree
- Longitude: decimal degree
- Date: Date of sample collection
- Slope_prcnt: Stream gradient in percent grade
- pH: standard units
- WaterTemperature_C: Water temperature in degrees Celsius
- Conductivity_uS_cm: Specific conductivity in µS/cm
- Nitrate_ug_L: nitrate (NO3, µg/L)
- TotalPhosphorus_ug_L: total phosphorus (TP, µg/L)
- Elevation_m: elevation (m)
- Mean_Temp: Mean annual air temperature (°C)
- Mean_Diurnal_Range: Mean diurnal annual range in air temperature (°C)
- Isothermality: mean diurnal range divided by temperature annual range, ×100 (unitless, per the WorldClim definition)
- Temp_Seas: standard deviation of air temperature (°C)
- Max_Temp_Mo: maximum air temperature of the warmest month (°C)
- Min_Temp_Mo: minimum air temperature of the coldest month (°C)
- Temp_Range: Temperature Annual Range (°C)
- Mean_Temp_Wet: Mean temperature of the wettest month (°C)
- Mean_Temp_Dry: Mean temperature of the driest month (°C)
- Mean.Temp_Warmest_Q: Mean temperature of the warmest quarter (°C)
- Mean_Temp_Coldest_Q: Mean temperature of the coldest quarter (°C)
- Annual_Precip: annual precipitation (mm)
- Precip_Wettest_Mo: precipitation of the wettest month (mm)
- Precip_Driest_Mo: precipitation of the driest month (mm)
- Precip_Seas: coefficient of variation in precipitation
- Precip_Wettest_Q: precipitation of the wettest quarter (mm)
- Precip_Driest_Q: Precipitation of the driest quarter (mm)
- Precip_Warmest_Q: precipitation of the warmest quarter (mm)
- Precip_Coldest_Q: precipitation of the coldest quarter (mm)
- HUC3: 3-digit hydrological unit code in which each site occurs
- HUC2: 2-digit hydrological unit code in which each site occurs
- All remaining columns are site-specific abundances of individual taxa.
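The column layout above can be used to separate site metadata from taxon counts. The following Python sketch is illustrative only (the authors worked in R). It assumes that the columns appear in the order listed and that HUC2 is the last non-taxon column; it also treats empty cells as zero counts. Both assumptions should be checked against the actual files. The function names and the toy table are hypothetical.

```python
import csv
import io


def split_counts_and_environment(handle, last_metadata_column="HUC2"):
    """Split a species by site table into metadata rows and a count matrix.

    Assumption: every column after `last_metadata_column` is a taxon.
    """
    reader = csv.reader(handle)
    header = next(reader)
    cut = header.index(last_metadata_column) + 1
    metadata_cols, taxon_cols = header[:cut], header[cut:]
    metadata, counts = [], []
    for row in reader:
        metadata.append(dict(zip(metadata_cols, row[:cut])))
        # Assumption: an empty cell means the taxon was not recorded.
        counts.append([float(v) if v else 0.0 for v in row[cut:]])
    return metadata, taxon_cols, counts


def to_presence_absence(counts):
    """Convert a count matrix to 0/1 presence-absence."""
    return [[1 if v > 0 else 0 for v in row] for row in counts]


# Toy table with made-up values, only to show the expected shape.
toy = io.StringIO(
    "SampleID,SiteID,HUC2,Taxon.a,Taxon.b\n"
    "S1,A,02,12,0\n"
    "S2,B,05,0,3\n"
)
metadata, taxa, counts = split_counts_and_environment(toy)
presence = to_presence_absence(counts)
```

For a real file, the same function can be called on `open(path, newline="")` in place of the in-memory toy table.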
Files: Taxonomy datasets
Dataset names:
- Diatoms_species_taxonomy_dataset.csv
- Diatoms_genera_taxonomy_dataset.csv
- Insects_genera_taxonomy_data.csv
- Fish_species_taxonomy_dataset.csv
- Fish_genera_taxonomy_dataset.csv
Description: These datasets contain the higher-level taxonomy of all species or genera in the species by site matrices and are used to calculate taxonomic similarity. Note that every cell needs a unique value in order to build a taxonomic tree. Where a taxon lacks a classification at the lowest ranks (e.g., a diatom species without a form), a unique placeholder value was entered instead. The datasets use the following general format:
Variables
- SpeciesDots: Species names with spaces in the species name replaced with periods (‘.’), to account for how R imports csv column headers. For example “Lepomis macrochirus” becomes “Lepomis.macrochirus”.
- Species1: Species names with spaces
- Kingdom:
- Phylum:
- Class:
- Order:
- Family:
- Genus:
- Species: Only for fish and diatom species
- Subspecies: Only for fish and diatom species
- Variety: Only for diatom species
- Form: Only for diatom species
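Two conventions described above, the SpeciesDots naming and the unique placeholder for missing ranks, can be sketched in Python as follows. This is an illustration, not the authors' procedure: the placeholder format is hypothetical, and R's own header conversion handles more characters than spaces.

```python
def to_species_dots(name):
    """Replace spaces with periods, as in the SpeciesDots column."""
    return name.strip().replace(" ", ".")


def fill_missing_ranks(record, ranks):
    """Return a copy of `record` with empty ranks given a unique placeholder.

    The placeholder format (<SpeciesDots>_<rank>_unassigned) is an assumption
    of this sketch; the README only says that a unique value was used.
    """
    filled = dict(record)
    base = to_species_dots(record["Species1"])
    for rank in ranks:
        if not filled.get(rank):
            filled[rank] = f"{base}_{rank}_unassigned"
    return filled


example = fill_missing_ranks(
    {"Species1": "Lepomis macrochirus", "Genus": "Lepomis", "Subspecies": ""},
    ranks=["Genus", "Subspecies"],
)
```

Because the placeholder embeds the taxon name, it stays unique across rows as long as the names in Species1 are unique.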
Code/software
R Code: We used R version 4.2.2 for all analyses.
File 1 - Environment Variable selection and Residual Calculation: Takes in the environmental data and uses redundancy analysis (RDA) with forward selection on presence-absence data to select the 5 best environmental variables for each dataset. The code also calculates pairwise distances between all sites and a principal component analysis (PCA) of these distances. It then uses linear regression to obtain the residuals of either the environmental or the spatial variables. Both the residuals (for pure space and pure environment) and the untransformed variables (for environment and space) are then used in boosted regression trees (BRTs).
Packages:
- vegan 2.6-4
- geosphere 1.5-18
- adespatial 0.3-23
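The residual step in File 1 can be illustrated with a minimal Python sketch. It is a simplification and not the authors' R code: it regresses one environmental variable on a single spatial predictor with ordinary least squares, whereas the analysis described above uses several selected variables and PCA axes. The function name and toy values are hypothetical.

```python
def ols_residuals(y, x):
    """Residuals of y after a simple linear regression on x.

    With y an environmental variable and x a spatial axis, the residuals
    are the part of the environment not linearly explained by space.
    Requires that x is not constant.
    """
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Toy values, made up for illustration.
space_axis = [1.0, 2.0, 3.0, 4.0]
environment = [2.0, 1.0, 4.0, 3.0]
pure_environment = ols_residuals(environment, space_axis)
```

Swapping the roles of the two arguments gives the corresponding residuals of space after accounting for the environment.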
File 2: 2 - Boosted Regression Trees: Takes the species by site data, together with the selected and appropriately transformed environmental and spatial variables from code 1, and fits boosted regression trees (BRTs). These BRTs are a form of species distribution model. Their outputs, the area under the curve (AUC) and the probability of occurrence matrices, are used in step 3.
Packages:
- vegan 2.6-4
- dismo 1.3-14
- gbm 2.1.9
- PresenceAbsence 1.1.11
File 3: 3 - Convert BRT results to probability of detection matrix: Selects the taxa that were adequately fit by the BRTs (those with AUC > 0.7) and multiplies their observed abundance by the probability of occurrence. This creates an abundance matrix based on the site suitability predicted by the BRTs, which is used in step 4. No packages are required in step 3.
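The filtering and weighting in step 3 amount to a few lines of arithmetic. The Python sketch below illustrates the idea under stated assumptions; it is not the authors' R code. The data layout (dictionaries keyed by taxon, one value per site) and all names and numbers are hypothetical.

```python
def suitability_weighted_abundance(counts, probabilities, auc, threshold=0.7):
    """Keep taxa with AUC above `threshold` and weight their counts.

    counts, probabilities: dict mapping taxon -> list with one value per site.
    auc: dict mapping taxon -> AUC of that taxon's model.
    Returns taxon -> observed count multiplied by probability of occurrence.
    """
    kept = [taxon for taxon in counts if auc[taxon] > threshold]
    return {
        taxon: [c * p for c, p in zip(counts[taxon], probabilities[taxon])]
        for taxon in kept
    }


# Toy inputs, made up for illustration.
counts = {"Taxon.a": [4, 0, 8], "Taxon.b": [2, 2, 2]}
probabilities = {"Taxon.a": [0.25, 0.5, 0.5], "Taxon.b": [0.5, 0.5, 0.5]}
auc = {"Taxon.a": 0.9, "Taxon.b": 0.6}
weighted = suitability_weighted_abundance(counts, probabilities, auc)
```

In this toy case only Taxon.a passes the AUC filter, so Taxon.b is dropped from the weighted matrix.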
File 4: Netassoc: Takes the observed abundance data, probability of detection matrices, AUC matrices, and taxonomic tree data to calculate the co-occurrence networks, the controlled co-occurrence networks, and taxonomic similarity. Netassoc provides the null model: a pure spatial, pure environmental, or environment-plus-space control is imposed on co-occurrences, and the co-occurrences that these controls can explain are retained. The output networks and taxonomic similarities are the final products reported in the results of the associated manuscript.
Packages:
- vegan 2.6-4
- netassoc 0.7.0
- corpcor 1.6.10
- Matrix 1.6-5
- RMThreshold 1.1
- rpart 4.1.19
- igraph 2.0.1.1
Access information
Data was derived from the following sources:
- Passy et al. (2023), available at 10.5061/dryad.4tmpg4fdq.
Methods
We obtained data from Passy et al. (2023), including diatoms, insects, and fish collected from 1698, 1700, and 1700 stream sites, respectively, across the conterminous United States (Fig. S1). Sites were compiled from both the National Water-Quality Assessment (NAWQA) program of the US Geological Survey and the National Rivers and Streams Assessment (NRSA) of the US Environmental Protection Agency, which used similar collection methods (Moulton II et al., 2002; US Environmental Protection Agency, 2013). Streams were sampled between 1993 and 2019, with the majority of samples collected between 2007 and 2010. Diatoms and insects were collected from a predetermined area of substrate from May to September. Fish were sampled throughout the year using backpack electrofishing and seining. Community data consisted of counts of species or a lower taxonomic category in diatoms and fish, but of genera in insects. To standardize the sampling effort, diatom counts were sampled down to 400 cells, and insect and fish counts to 100 individuals. We accounted for differences in taxonomic resolution by examining all datasets at the genus level and, for diatoms and fish, also at the species level.
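Sampling counts down to a fixed total can be sketched as random subsampling of individuals. The Python example below is an assumption-laden illustration, not the authors' procedure: it draws without replacement, which the text does not specify, and it raises an error for samples smaller than the target, which is a choice made for this sketch only. The function name and toy counts are hypothetical.

```python
import random


def rarefy(counts, depth, rng=None):
    """Randomly subsample `depth` individuals without replacement.

    counts: dict mapping taxon -> integer count for one sample.
    rng: optional random.Random instance, for reproducible draws.
    """
    rng = rng or random.Random()
    total = sum(counts.values())
    if total < depth:
        # Choice of this sketch: refuse rather than guess how to handle it.
        raise ValueError(f"sample has {total} individuals, fewer than {depth}")
    # One list entry per individual, labelled by its taxon.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = rng.sample(pool, depth)
    rarefied = {taxon: 0 for taxon in counts}
    for taxon in drawn:
        rarefied[taxon] += 1
    return rarefied


# Toy diatom sample, made up for illustration, sampled down to 400 cells.
sample = {"Taxon.a": 300, "Taxon.b": 250, "Taxon.c": 50}
rarefied = rarefy(sample, 400, rng=random.Random(1))
```

Passing a seeded `random.Random` makes the draw repeatable, which helps when comparing runs.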
We selected the 1698-1700 sites provided here from larger datasets of 2278 diatom, 2270 insect, and 2296 fish samples, ensuring similar environmental conditions, a minimum pairwise distance between any two sites of 1 km, and comparable average pairwise distances among datasets (1522, 1497, and 1491 km, respectively). Pairwise distances between all sites were calculated in km using the ‘distm’ function in the R package ‘geosphere’ (Hijmans et al., 2019). Each site had available physicochemical data on pH, specific conductance (µS/cm), water temperature (°C), nitrate (NO3, µg/L), and total phosphorus (TP, µg/L) from NAWQA or NRSA. Elevation (m) was obtained from the WorldClim database (Fick & Hijmans, 2017) and used to calculate slope (% grade), averaged across a 5 km buffer around each site with the package ‘raster’ (Hijmans et al., 2020). Nineteen climatic variables described temperature and precipitation minima, maxima, averages, ranges, and seasonality across a 5 km buffer around each site (WorldClim V1.4) (Hijmans et al., 2005). Environmental variables were ln-transformed if this improved normality. Slope was arcsine square root-transformed. All variables were standardized (mean = 0, standard deviation = 1).
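The three transformations named above can be written compactly. The Python sketch below is illustrative and not the authors' R code. It assumes that slope in percent grade is divided by 100 before the arcsine square root, and that standardization uses the sample standard deviation (n - 1); neither detail is stated in the text. The function names are hypothetical.

```python
import math


def ln_transform(values):
    """Natural-log transform; every value must be positive."""
    return [math.log(v) for v in values]


def arcsine_sqrt(percent_values):
    """Arcsine square-root transform of percentages.

    Assumption: percentages are converted to proportions first, so inputs
    must lie between 0 and 100.
    """
    return [math.asin(math.sqrt(p / 100.0)) for p in percent_values]


def standardize(values):
    """Rescale to mean 0 and standard deviation 1.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]
```

For example, `standardize(arcsine_sqrt(slopes))` would chain the slope transformation with standardization for a list of slope values in percent.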
We also assembled higher-order taxa for each metacommunity, consisting of phylum, class, order, family, genus, species, subspecies, variety, and form for diatoms; phylum, class, order, family, and genus for insects, and finally, phylum, class, order, family, genus, and species for fish.