Fine-grain predictions are key to accurately represent continental-scale biodiversity patterns
Data files
Nov 21, 2024 version files (65.27 GB total)
- ebird_breeding_CEA.csv (9.70 GB)
- ebird_nonbreeding_CEA.csv (14.08 GB)
- ebird_postbreeding_migration_CEA.csv (10.14 GB)
- ebird_prebreeding_migration_CEA.csv (18.69 GB)
- habitat_prediction-surface_1.csv (11.42 GB)
- habitat_prediction-surface_10.csv (90.91 MB)
- habitat_prediction-surface_3.csv (807.51 MB)
- habitat_prediction-surface_5.csv (320.04 MB)
- habitat_prediction-surface_50.csv (5.18 MB)
- README.md (18.60 KB)
- sdm_bird_outputs_VS3.1_complete.csv (9.30 MB)
- species_list.csv (128.96 KB)
Abstract
Aim
As global change accelerates, accurate predictions of species distributions and biodiversity patterns are critical to limit biodiversity loss. Numerous studies have found that coarse-grain species distribution models (SDMs) perform poorly relative to fine-grain models because they mismatch environmental information with observations. However, it remains unclear how grain-size biases vary in intensity across space and time, possibly generating inaccurate predictions for specific regions, seasons or species. For example, coarse-grain biases may intensify in patchy, discontinuous landscapes. Such biases may accumulate to produce highly misleading estimates of continental and seasonal biodiversity patterns.
Location: United States and Canada
Time Period: 2004-2021
Major taxa studied: Birds (Aves)
Methods
We fit presence-absence SDMs characterizing the summer and winter distributions of 572 bird species native to the US and Canada across five spatial grains from 1 to 50 km, using observations from the eBird citizen science initiative. We combined these predictions to generate seasonal biodiversity estimates across the US and Canada, which we validated using observations from 322 independent sites.
Results
We found that, in both seasons, 1 km models more accurately predicted species presence, absence, and richness at local sites. Coarse-grain models (even at 3 km) consistently under-predicted range area, potentially missing important habitat. This bias intensified during summer (83-86% of species), when many birds have smaller 'operational scales' because they occupy localized home ranges while breeding. Biases were greatest in desert regions with patchier habitat and for range-restricted and habitat-specialist species. Predictions based on coarse-grain models overpredicted avian diversity in the West and underpredicted it in the Great Plains, the Prairie Pothole Region, and boreal zones.
Main conclusions
We demonstrate that coarse-grain models can bias seasonal and continental estimates of biodiversity patterns across space and time and that grain-related biases intensify during summer and in patchier landscapes, especially for range-restricted and habitat specialist species at risk of population declines.
README: Fine-grain predictions are key to accurately represent continental-scale biodiversity patterns
https://doi.org/10.5061/dryad.mw6m9065c
Description of the data and file structure
## DATA FILES
ebird_prebreeding_migration_CEA.csv
ebird_nonbreeding_CEA.csv
ebird_postbreeding_migration_CEA.csv
ebird_breeding_CEA.csv
Processed, filtered eBird data for March-May, December-February, September-November, and June-August (respectively, in the order the files are listed above), with spatially linked environmental covariates (landcover, climate, and topographic variables) at all grains (1, 3, 5, 10, and 50 km, given as the suffix of each variable name), as well as checklist-level temporal and effort covariates. eBird data are structured with one row per checklist (sampling event) and one column per species recording presence (1) or absence (0). Cells containing NA have no associated environmental information available at that location.
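Because these files are 9-19 GB each, loading them whole into memory may not be practical. A minimal Python sketch (standard library only) that streams one file row by row and counts detections for a single species column, skipping NA cells. The function name `detection_rate` is hypothetical, and it assumes species columns are named by scientific name exactly as they appear in the CSV header (for example `Passerina versicolor`); the demo uses toy values, not data from this dataset.

```python
import csv
import io
from typing import TextIO, Tuple


def detection_rate(handle: TextIO, species: str) -> Tuple[int, int]:
    """Count checklists reporting `species` as present, streaming row by row.

    Returns (detections, checklists), where checklists counts only rows whose
    value for `species` is "0" or "1"; NA or empty cells are skipped.
    Reading one row at a time keeps memory use flat on multi-GB files.
    """
    reader = csv.DictReader(handle)
    # Assumption: the species column name matches the CSV header exactly.
    if reader.fieldnames is None or species not in reader.fieldnames:
        raise KeyError(f"species column not found: {species!r}")
    detections = 0
    checklists = 0
    for row in reader:
        value = row[species]
        if value not in ("0", "1"):
            continue  # NA or missing value
        checklists += 1
        if value == "1":
            detections += 1
    return detections, checklists


if __name__ == "__main__":
    # Toy data with a hypothetical species column, not values from the dataset.
    toy = io.StringIO(
        "checklist_id,year,Genus alpha\n"
        "S1,2010,1\n"
        "S2,2011,0\n"
        "S3,2012,NA\n"
    )
    print(detection_rate(toy, "Genus alpha"))
```

On a real file, open it with `open("ebird_breeding_CEA.csv", newline="")` and pass the handle in place of the toy buffer.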
Column IDs
Checklist_id: ID of observation
Locality_id: Location ID
y: vertical axis spatial coordinate
x: horizontal axis spatial coordinate
year: year of observation
day_of_year: day of year (Julian day) of observation
time_observations_started: time of day at which the observation started
duration_minutes: duration of observation in minutes
effort_distance_km: distance traveled during observation, in km
number_observers: number of observers contributing to the checklist
`Dendrocygna autumnalis` to `Passerina versicolor`: presence (1) or absence (0) of each species, one column per species
bio1_1: annual mean temperature at the observation location, measured at 1km
bio12_1: total annual precipitation (mm) at the observation location, measured at 1km
bio15_1: precipitation seasonality at the observation location, measured at 1km
cloudsd_1: cloud cover variability at the observation location, measured at 1km
evisum_1: enhanced vegetation index at the observation location, measured at 1km
twi_1: terrain wetness index at the observation location, measured at 1km
tri_1: topographic roughness index at the observation location, measured at 1km
elev_1: elevation at the observation location, measured at 1km
pland_10_cropland_rainfed_1: percentage of 1km cell where observation occurred with rainfed cropland landcover type (type 1)
pland_100_mosaic_tree_shrub_1: percentage of 1km cell where observation occurred with mosaic tree/shrub landcover type
pland_11_cropland_rainfed_1: percentage of 1km cell where observation occurred with rainfed cropland landcover type (type 2)
pland_110_mosaic_herbacious_1: percentage of 1km cell where observation occurred with mosaic herbaceous landcover type
pland_12_cropland_rainfed_1: percentage of 1km cell where observation occurred with rainfed cropland landcover type (type 3)
pland_120_shrubland_1: percentage of 1km cell where observation occurred with shrubland landcover type (type 1)
pland_121_shrubland_1: percentage of 1km cell where observation occurred with shrubland landcover type (type 2)
pland_122_shrubland_1: percentage of 1km cell where observation occurred with shrubland landcover type (type 3)
pland_130_grassland_1: percentage of 1km cell where observation occurred with grassland landcover type
pland_140_lichens_mosses_1: percentage of 1km cell where observation occurred with lichens/mosses landcover type
pland_150_sparse_1: percentage of 1km cell where observation occurred with sparse landcover type (type 1)
pland_152_sparse_1: percentage of 1km cell where observation occurred with sparse landcover type (type 2)
pland_153_sparse_1: percentage of 1km cell where observation occurred with sparse landcover type (type 3)
pland_160_flooded_freshwater_1: percentage of 1km cell where observation occurred with flooded freshwater landcover type
pland_170_flooded_saltwater_1: percentage of 1km cell where observation occurred with flooded saltwater landcover type
pland_180_flooded_shrub_1: percentage of 1km cell where observation occurred with flooded shrub landcover type
pland_190_urban_1: percentage of 1km cell where observation occurred with urban landcover type
pland_20_cropland_irrigated_1: percentage of 1km cell where observation occurred with irrigated cropland landcover type
pland_200_barren_1: percentage of 1km cell where observation occurred with barren landcover type (type 1)
pland_201_barren_1: percentage of 1km cell where observation occurred with barren landcover type (type 2)
pland_202_barren_1: percentage of 1km cell where observation occurred with barren landcover type (type 3)
pland_210_water_1: percentage of 1km cell where observation occurred with water landcover type
pland_220_ice_1: percentage of 1km cell where observation occurred with ice landcover type
pland_30_mosaic_cropland_1: percentage of 1km cell where observation occurred with mosaic cropland landcover type
pland_40_mosaic_natural_veg_1: percentage of 1km cell where observation occurred with mosaic natural vegetation landcover type
pland_50_evergreen_broadleaf_1: percentage of 1km cell where observation occurred with evergreen broadleaf landcover type
pland_60_deciduous_broadleaf_1: percentage of 1km cell where observation occurred with deciduous broadleaf landcover type (type 1)
pland_61_deciduous_broadleaf_1: percentage of 1km cell where observation occurred with deciduous broadleaf landcover type (type 2)
pland_62_deciduous_broadleaf_1: percentage of 1km cell where observation occurred with deciduous broadleaf landcover type (type 3)
pland_70_evergreen_needleleaf_1: percentage of 1km cell where observation occurred with evergreen needleleaf landcover type (type 1)
pland_71_evergreen_needleleaf_1: percentage of 1km cell where observation occurred with evergreen needleleaf landcover type (type 2)
pland_72_evergreen_needleleaf_1: percentage of 1km cell where observation occurred with evergreen needleleaf landcover type (type 3)
pland_80_deciduous_needleleaf_1: percentage of 1km cell where observation occurred with deciduous needleleaf landcover type (type 1)
pland_81_deciduous_needleleaf_1: percentage of 1km cell where observation occurred with deciduous needleleaf landcover type (type 2)
pland_82_deciduous_needleleaf_1: percentage of 1km cell where observation occurred with deciduous needleleaf landcover type (type 3)
pland_90_mixed_forest_1: percentage of 1km cell where observation occurred with mixed forest landcover type
bio1_5 to pland_90_mixed_forest_3: the same variables as above, summarized at the grain (in km) given by the suffix of the column name (3, 5, 10, or 50)
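Because every covariate appears once per grain, a model fit at a single grain uses only the columns carrying that grain's suffix. A minimal Python sketch of grouping the header by grain, assuming the grain is always the integer after the final underscore (for example `bio12_50`); the name `columns_by_grain` is hypothetical and not part of the authors' R code. The returned base names (such as `bio1`) match the unsuffixed column names used in the habitat prediction surface files.

```python
import re
from typing import Dict, Iterable

# Grains (km) documented in this README.
GRAINS = {1, 3, 5, 10, 50}
# Assumption: the grain is the integer after the final underscore, e.g. "bio12_50".
_GRAIN_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<grain>\d+)$")


def columns_by_grain(fieldnames: Iterable[str]) -> Dict[int, Dict[str, str]]:
    """Group covariate columns by grain.

    Returns {grain_km: {suffixed_column: base_name}}. Columns without a
    recognised grain suffix (IDs, effort covariates, species) are ignored.
    """
    groups: Dict[int, Dict[str, str]] = {}
    for name in fieldnames:
        match = _GRAIN_SUFFIX.match(name)
        if match is None:
            continue
        grain = int(match.group("grain"))
        if grain in GRAINS:
            groups.setdefault(grain, {})[name] = match.group("base")
    return groups


if __name__ == "__main__":
    header = ["checklist_id", "year", "bio1_1", "bio1_50",
              "pland_10_cropland_rainfed_1", "pland_10_cropland_rainfed_3"]
    print(columns_by_grain(header))
```

Renaming the 1 km columns with the returned mapping (for example `bio1_1` to `bio1`) would align them with the unsuffixed names in `habitat_prediction-surface_1.csv`.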
habitat_prediction-surface_1.csv
habitat_prediction-surface_3.csv
habitat_prediction-surface_5.csv
habitat_prediction-surface_10.csv
habitat_prediction-surface_50.csv
Prediction surface for North America at the grid cell level (rows) with associated environmental features as columns (see list in methods) at 1, 3, 5, 10, or 50 km grain (given in file suffix). Cells containing NA have no associated environmental information available at a given location.
Column IDs
x: horizontal axis spatial coordinate
y: vertical axis spatial coordinate
elev: elevation of cell
bio1: annual mean temperature of cell, measured at 1km
bio12: total annual precipitation (mm) of cell, measured at 1km
bio15: precipitation seasonality of cell, measured at 1km
cloudsd: cloud cover variability of cell, measured at 1km
evisum: enhanced vegetation index of cell, measured at 1km
twi: terrain wetness index of cell, measured at 1km
tri: topographic roughness index of cell, measured at 1km
elev: elevation of cell, measured at 1km
pland_10_cropland_rainfed: percentage of 1km cell with rainfed cropland landcover type (type 1)
pland_100_mosaic_tree_shrub: percentage of 1km cell with mosaic tree/shrub landcover type
pland_11_cropland_rainfed: percentage of 1km cell with rainfed cropland landcover type (type 2)
pland_110_mosaic_herbacious: percentage of 1km cell with mosaic herbaceous landcover type
pland_12_cropland_rainfed: percentage of 1km cell with rainfed cropland landcover type (type 3)
pland_120_shrubland: percentage of 1km cell with shrubland landcover type (type 1)
pland_121_shrubland: percentage of 1km cell with shrubland landcover type (type 2)
pland_122_shrubland: percentage of 1km cell with shrubland landcover type (type 3)
pland_130_grassland: percentage of 1km cell with grassland landcover type
pland_140_lichens_mosses: percentage of 1km cell with lichens/mosses landcover type
pland_150_sparse: percentage of 1km cell with sparse landcover type (type 1)
pland_152_sparse: percentage of 1km cell with sparse landcover type (type 2)
pland_153_sparse: percentage of 1km cell with sparse landcover type (type 3)
pland_160_flooded_freshwater: percentage of 1km cell with flooded freshwater landcover type
pland_170_flooded_saltwater: percentage of 1km cell with flooded saltwater landcover type
pland_180_flooded_shrub: percentage of 1km cell with flooded shrub landcover type
pland_190_urban: percentage of 1km cell with urban landcover type
pland_20_cropland_irrigated: percentage of 1km cell with irrigated cropland landcover type
pland_200_barren: percentage of 1km cell with barren landcover type (type 1)
pland_201_barren: percentage of 1km cell with barren landcover type (type 2)
pland_202_barren: percentage of 1km cell with barren landcover type (type 3)
pland_210_water: percentage of 1km cell with water landcover type
pland_220_ice: percentage of 1km cell with ice landcover type
pland_30_mosaic_cropland: percentage of 1km cell with mosaic cropland landcover type
pland_40_mosaic_natural_veg: percentage of 1km cell with mosaic natural vegetation landcover type
pland_50_evergreen_broadleaf: percentage of 1km cell with evergreen broadleaf landcover type
pland_60_deciduous_broadleaf: percentage of 1km cell with deciduous broadleaf landcover type (type 1)
pland_61_deciduous_broadleaf: percentage of 1km cell with deciduous broadleaf landcover type (type 2)
pland_62_deciduous_broadleaf: percentage of 1km cell with deciduous broadleaf landcover type (type 3)
pland_70_evergreen_needleleaf: percentage of 1km cell with evergreen needleleaf landcover type (type 1)
pland_71_evergreen_needleleaf: percentage of 1km cell with evergreen needleleaf landcover type (type 2)
pland_72_evergreen_needleleaf: percentage of 1km cell with evergreen needleleaf landcover type (type 3)
pland_80_deciduous_needleleaf: percentage of 1km cell with deciduous needleleaf landcover type (type 1)
pland_81_deciduous_needleleaf: percentage of 1km cell with deciduous needleleaf landcover type (type 2)
pland_82_deciduous_needleleaf: percentage of 1km cell with deciduous needleleaf landcover type (type 3)
pland_90_mixed_forest: percentage of 1km cell with mixed forest landcover type
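Cells with NA cannot be passed to a fitted model, so they are typically dropped before prediction. A minimal Python sketch that streams a prediction surface and yields only complete cells as numeric dictionaries. It assumes every column is numeric and that missing values are written as `NA` (R's `write.csv` default) or left empty; the name `complete_cells` is hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

# Strings treated as missing; "NA" is how R's write.csv writes missing values.
MISSING = {"", "NA", "NaN"}


def complete_cells(handle: TextIO) -> Iterator[Dict[str, float]]:
    """Yield prediction-surface rows that have no missing covariates.

    Assumes every column (x, y, and all covariates) is numeric.
    """
    for row in csv.DictReader(handle):
        if any(value is None or value in MISSING for value in row.values()):
            continue  # cell lacks environmental information; cannot be predicted
        yield {column: float(value) for column, value in row.items()}


if __name__ == "__main__":
    # Toy values, not taken from the dataset.
    toy = io.StringIO(
        "x,y,bio1,pland_210_water\n"
        "0,0,5.2,10\n"
        "1000,0,NA,0\n"
    )
    for cell in complete_cells(toy):
        print(cell)
```

Using a generator keeps memory use flat, which matters for the 11.42 GB 1 km surface.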
sdm_bird_outputs_VS3.1_complete.csv
Species-level distribution model outputs. Each row is one model (a species fit in a given season at a given grain), and columns contain model information, including species modeled, modeling season, model grain (scale), model version, number of observations, estimated presence cells, model metrics (AUC, SPS threshold, TSS, sensitivity, specificity), associated file paths (may be NA if not relevant), and random forest importance scores for all model variables (columns with the _imp suffix). Cells containing NA have no associated model, path, or information for a given species.
Column IDs
common_name: common name of species
sciname: scientific name of species
season: season at which model was fit
scale: grain size at which model was fit
version: model version
modName: model name
modURL to elevOffset columns: not meaningful (all NAs)
noPts: number of sampling points in model
range_area: estimated range size from model prediction
deltaAUC: not meaningful (all NAs)
AUC: model area under the receiver operating characteristic curve
SPSThresh: model threshold value
TSS: model true skill statistic
Sensitivity: model sensitivity
Specificity: model specificity
modPathROR: file path for relative occurrence rate output
modPathPA: file path for presence/absence output
rangePath: file path for model range output
ptsPath: file path for model points output
ptsBgPath: file path for background points output
confMatPath: file path for confusion matrix
POPredsPath: file path for presence/absence predictions
threshPredsPath: file path for thresholded predictions
POPredsRasPath: file path for presence/absence raster
threshPredsRasPath: file path for thresholded raster
envVars: not meaningful (all NAs)
year_imp: importance score for year
day_of_year_imp: importance score for day of year (Julian day)
time_observations_started_imp: importance score for time observations started
duration_minutes_imp: importance score for duration of survey
effort_distance_km_imp: importance score for distance traveled
number_observers_imp: importance score for number of observers
bio1_imp: importance score for annual mean temperature
bio12_imp: importance score for total annual precipitation
bio15_imp: importance score for precipitation seasonality
cloudsd_imp: importance score for cloud cover variability
evisum_imp: importance score for enhanced vegetation index
twi_imp: importance score for terrain wetness index
tri_imp: importance score for topographic roughness index
elev_imp: importance score for elevation
pland_10_cropland_rainfed_imp: importance score for cropland (rainfed) landcover (type 1)
pland_100_mosaic_tree_shrub_imp: importance score for mosaic tree/shrub landcover
pland_11_cropland_rainfed_imp: importance score for cropland (rainfed) landcover (type 2)
pland_110_mosaic_herbacious_imp: importance score for mosaic-herbaceous landcover
pland_12_cropland_rainfed_imp: importance score for cropland (rainfed) landcover (type 3)
pland_120_shrubland_imp: importance score for shrubland landcover (type 1)
pland_121_shrubland_imp: importance score for shrubland landcover (type 2)
pland_122_shrubland_imp: importance score for shrubland landcover (type 3)
pland_130_grassland_imp: importance score for grassland landcover
pland_140_lichens_mosses_imp: importance score for lichens/mosses landcover
pland_150_sparse_imp: importance score for sparse landcover (1)
pland_152_sparse_imp: importance score for sparse landcover (2)
pland_153_sparse_imp: importance score for sparse landcover (3)
pland_160_flooded_freshwater_imp: importance score for flooded freshwater landcover
pland_170_flooded_saltwater_imp: importance score for flooded saltwater landcover
pland_180_flooded_shrub_imp: importance score for flooded shrub landcover
pland_190_urban_imp: importance score for urban landcover
pland_20_cropland_irrigated_imp: importance score for irrigated cropland landcover
pland_200_barren_imp: importance score for barren landcover (1)
pland_201_barren_imp: importance score for barren landcover (2)
pland_202_barren_imp: importance score for barren landcover (3)
pland_210_water_imp: importance score for water landcover
pland_220_ice_imp: importance score for ice landcover
pland_30_mosaic_cropland_imp: importance score for mosaic cropland landcover
pland_40_mosaic_natural_veg_imp: importance score for mosaic natural vegetation landcover
pland_50_evergreen_broadleaf_imp: importance score for evergreen broadleaf landcover (1)
pland_60_deciduous_broadleaf_imp: importance score for deciduous broadleaf landcover (1)
pland_61_deciduous_broadleaf_imp: importance score for deciduous broadleaf landcover (2)
pland_62_deciduous_broadleaf_imp: importance score for deciduous broadleaf landcover (3)
pland_70_evergreen_needleleaf_imp: importance score for evergreen needleleaf landcover (1)
pland_71_evergreen_needleleaf_imp: importance score for evergreen needleleaf landcover (2)
pland_72_evergreen_needleleaf_imp: importance score for evergreen needleleaf landcover (3)
pland_80_deciduous_needleleaf_imp: importance score for deciduous needleleaf landcover (1)
pland_81_deciduous_needleleaf_imp: importance score for deciduous needleleaf landcover (2)
pland_82_deciduous_needleleaf_imp: importance score for deciduous needleleaf landcover (3)
pland_90_mixed_forest_imp: importance score for mixed forest landcover
RDataPath: not meaningful (all NAs)
rorOrigPath: not meaningful (all NAs)
paOrigPath: not meaningful (all NAs)
rangeOrigPath: not meaningful (all NAs)
ptsBgOrigPath: not meaningful (all NAs)
statsOrigPath: not meaningful (all NAs)
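Two quantities can be derived directly from these columns. The true skill statistic is TSS = sensitivity + specificity - 1, so the TSS column can be checked against Sensitivity and Specificity. The abstract's comparison of range area across grains can be approximated by dividing each model's range_area by the same species and season's 1 km range_area (a ratio below 1 means the coarser model predicts a smaller range). A minimal Python sketch, assuming one row per species, season, and scale (if several versions exist, the last row read wins) and that `scale` is stored as a bare number such as `1` or `10`; both assumptions, and the function names, are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

MISSING = {"", "NA"}


def true_skill_statistic(sensitivity: float, specificity: float) -> float:
    """TSS = sensitivity + specificity - 1, ranging from -1 to 1."""
    return sensitivity + specificity - 1.0


def range_ratios(handle: TextIO, reference: str = "1") -> Dict[Tuple[str, str, str], float]:
    """Ratio of range_area at each grain to range_area at the reference grain.

    Keys are (sciname, season, scale). Assumes `scale` is stored as the grain
    in km without units (e.g. "1", "10"); adjust `reference` if not.
    """
    areas: Dict[Tuple[str, str], Dict[str, float]] = {}
    for row in csv.DictReader(handle):
        if row["range_area"] in MISSING:
            continue
        key = (row["sciname"], row["season"])
        areas.setdefault(key, {})[row["scale"]] = float(row["range_area"])
    ratios: Dict[Tuple[str, str, str], float] = {}
    for (sciname, season), by_scale in areas.items():
        base = by_scale.get(reference)
        if not base:
            continue  # no usable reference-grain model (missing or zero area)
        for scale, area in by_scale.items():
            if scale != reference:
                ratios[(sciname, season, scale)] = area / base
    return ratios


if __name__ == "__main__":
    # Toy values for a hypothetical species, not taken from the dataset.
    toy = io.StringIO(
        "sciname,season,scale,range_area\n"
        "Genus alpha,breeding,1,200\n"
        "Genus alpha,breeding,10,150\n"
    )
    print(range_ratios(toy))
    print(true_skill_statistic(0.8, 0.9))
```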
species_list.csv
Index of all candidate species. Cells containing NA have no information about extinction for a given species.
Column IDs
species name: species name
common name: common name
Abbr: four-letter species abbreviation
Code: species rarity code: 1 (common) or 2 (uncommon)
ebird_code: six-letter species code
TAXON_ORDER_CODE: order code
category: taxonomic category
range: description of range
order: taxonomic order
family: taxonomic family
eBird species group: species group description
waterbird: (not used in manuscript)
extinction: (not used in manuscript)
extinct year: (not used in manuscript)
IW: (not used in manuscript)
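The species list can be joined to the model outputs to subset models by rarity code. A minimal Python sketch, assuming that values in the `species name` column of species_list.csv match the `sciname` column of sdm_bird_outputs_VS3.1_complete.csv exactly; that match, and the function names, are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def rarity_codes(species_handle: TextIO) -> Dict[str, str]:
    """Map scientific name ("species name") to rarity code ("Code")."""
    return {row["species name"]: row["Code"] for row in csv.DictReader(species_handle)}


def models_with_code(model_handle: TextIO, codes: Dict[str, str], code: str) -> List[Dict[str, str]]:
    """Keep model-output rows whose species has the requested rarity code."""
    return [row for row in csv.DictReader(model_handle)
            if codes.get(row["sciname"]) == code]


if __name__ == "__main__":
    # Toy rows for hypothetical species, not taken from the dataset.
    species = io.StringIO(
        "species name,common name,Code\n"
        "Genus alpha,Alpha bird,1\n"
        "Genus beta,Beta bird,2\n"
    )
    models = io.StringIO(
        "common_name,sciname,season,scale\n"
        "Alpha bird,Genus alpha,breeding,1\n"
        "Beta bird,Genus beta,breeding,1\n"
    )
    codes = rarity_codes(species)
    print([row["sciname"] for row in models_with_code(models, codes, "2")])
```

Codes are compared as strings because `csv` does not convert types; species absent from the list are silently excluded.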
Sharing/access info
eBird data can be downloaded via the Cornell Lab of Ornithology website at www.ebird.org
Environmental data can be downloaded from CHELSA (https://chelsa-climate.org/), MODIS (https://lpdaac.usgs.gov/products/mod11a1v006/), ESA CCI (https://climate.esa.int/en/), and EarthEnv (https://www.earthenv.org/)
AVONET trait data is available at https://onlinelibrary.wiley.com/doi/full/10.1111/ele.13898
CODE FILES
1. modeling_ranger_scale.R - species distribution modeling workflow
2. aggregate_scale.R - aggregating species-level predictions to biodiversity estimations
3. patterns.R - trends across species, regions, and seasons, and generating all cross-species figures
4. validations_sp_accuracy.R - code to run site-level validations
5. simulation.R - code to run the simulation in fig. 1 (no data needed)
6. ebird_exploration.R - code to explore how filtering/thinning alters the eBird dataset for fig. S1 and table S1
Code/software
R 3.4.0 and Microsoft Excel