Data from: Scale dependence of bird diversity in London
Abstract
Understanding drivers of biodiversity in cities can be mutually beneficial for ecosystems and people. Crowd-sourced bird observations provide an opportunity to assess how patterns of bird diversity change across observation scales and suggest driving processes. We assessed the scale dependence of bird diversity within a 128 × 128 km extent over London’s urban–rural gradient to suggest scales at which key drivers may be operating. We quantified scale variance of bird diversity across scales from 500 m to 64,000 m for three groups of species (All, Passeriformes, and Anseriformes and Charadriiformes combined). We estimated diversity by aggregating observations into a series of grids and computed comparable diversity estimates within each cell using interpolation and rarefaction. We calculated the variance explained by each scale for common diversity metrics. The results show that bird diversity patterns around London vary by scale, and that the location of high variance shifts across the study area depending on both scale and species group. The variance of Passeriformes diversity gradually shifted from the urban core to the periphery, while the variance of Anseriformes and Charadriiformes diversity occurred near water features. The results suggest that the urban–rural gradient and location of water are two properties of the study extent around London influencing the scale dependence of bird diversity that could be used to ground scale considerations of further modeling efforts.
https://doi.org/10.5061/dryad.wm37pvmwn
This dataset contains analysis scripts and processed results for studying the scale dependence of bird diversity in London and the surrounding region using scale variance analysis, as described in the related article.
Repository Structure
Data.zip
├── code/ Analysis scripts (R and Python) for reproducing results
└── results/ Processed analysis results (scale variance, diversity estimates)
Data Provenance
This analysis uses data from external sources that cannot be included due to licensing restrictions. To reproduce the analysis, you will need to obtain these datasets separately.
eBird Data
Source: eBird Basic Dataset (EBD)
Download: https://science.ebird.org/en/use-ebird-data/download-ebird-data-products
Filtering parameters used:
- Region: Great Britain (GB)
- Bounding box: 1.4177°W to 1.2414°E, 50.7000°N to 52.2661°N (London and surrounding region)
- Date range: 2014-01-01 to 2024-01-01
- Protocols: Stationary and Traveling counts only
- Complete checklists: Only checklists reporting all species observed
- Quality filters (following eBird best practices):
  - Duration ≤ 360 minutes
  - Distance ≤ 2 km (for traveling counts)
  - Number of observers ≤ 10
Processing script: code/utilities/filter_ebd.R
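The filtering parameters above can be read as a single predicate applied to each checklist. The following is a minimal Python sketch of that logic, not a port of code/utilities/filter_ebd.R. The `Checklist` record and its field names are hypothetical and do not correspond to EBD column names, and treating both ends of the date range as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Checklist:
    """Hypothetical checklist record; field names are illustrative only."""
    longitude: float          # decimal degrees, negative values are west
    latitude: float           # decimal degrees north
    observation_date: date
    protocol: str             # "Stationary" or "Traveling"
    complete: bool            # True if all observed species were reported
    duration_minutes: float
    distance_km: float        # 0 for stationary counts
    n_observers: int


def passes_filters(c: Checklist) -> bool:
    """Apply the bounding box, date, protocol, and quality filters listed above."""
    in_bbox = (-1.4177 <= c.longitude <= 1.2414
               and 50.7000 <= c.latitude <= 52.2661)
    # Assumption: both ends of the date range are inclusive.
    in_dates = date(2014, 1, 1) <= c.observation_date <= date(2024, 1, 1)
    ok_protocol = c.protocol in {"Stationary", "Traveling"}
    # The distance limit only applies to traveling counts.
    ok_distance = c.protocol != "Traveling" or c.distance_km <= 2
    return (in_bbox and in_dates and ok_protocol and c.complete
            and c.duration_minutes <= 360
            and ok_distance
            and c.n_observers <= 10)
```

For example, a complete 60-minute traveling checklist covering 1.5 km in central London passes, while the same checklist covering 3 km does not.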
Citation: eBird Basic Dataset. Version: EBD_relDec-2023. Cornell Lab of Ornithology, Ithaca, New York. December 2023.
AVONET Morphological Trait Database
Source: AVONET: A global database of morphological and ecological traits of all birds
Download: Available as supplementary data from Tobias et al. (2022) Ecology Letters
File used: AVONET Supplementary dataset 1.xlsx, sheet 'AVONET2_eBird'
Variables extracted: Order, Family, Habitat, Habitat.Density (habitat density), Migration, Trophic.Level (trophic level), Trophic.Niche (trophic niche), Primary.Lifestyle (primary lifestyle)
Citation: Tobias, J.A., Sheard, C., Pigot, A.L., et al. (2022). AVONET: morphological, ecological and geographical data for all birds. Ecology Letters, 25(3), 581-597. https://doi.org/10.1111/ele.13898
Greater London Boundary (Spatial Data)
Source: London Datastore, Greater London Authority (GLA)
Dataset: Statistical GIS Boundary Files for London
Download: https://data.london.gov.uk/dataset/statistical-gis-boundary-files-for-london-20od9/
Processing: The original shapefile was converted to GeoJSON format (GreaterLondon.geojson) for use with the sf package in R.
Analysis Workflow
The analysis follows a five-step pipeline:
- Hierarchical Binning (code/functions/1 - create hbins.R): Creates nested hexagonal grids at multiple spatial scales using the sf package
- Data Integration (code/functions/2 - join hbin IDs.R): Assigns eBird observations to hierarchical grid cells (hbins)
- Diversity Estimation (code/functions/3 - estimate diversity.R): Computes Hill numbers (species diversity metrics) using the iNEXT package for diversity orders q = 0, 1, 2
- Spatial Interpolation (code/functions/4a - ordinary kriging.R, code/functions/4b - empirical Bayesian kriging.py): Interpolates diversity estimates across the study area using geostatistical methods
- Scale Variance Computation (code/functions/5 - compute scale variance.R): Calculates variance in diversity across hierarchical spatial scales
Main entry point: code/0 - Batch Run Analysis.R configures and executes analyses with different spatial configurations
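To illustrate what assigning an observation to hierarchical bins involves, the sketch below computes a cell index for one projected point at each level of a nested grid whose grain doubles per level. It is a simplification under stated assumptions: it uses square cells and a hypothetical grid origin, whereas the repository scripts build their grids with the sf package, and the function name is illustrative.

```python
import math


def nested_cell_ids(easting, northing, start_scale, n_levels, origin=(0.0, 0.0)):
    """Return [(scale_m, col, row), ...] from the finest to the coarsest level.

    Assumes square cells aligned to `origin` (a hypothetical grid origin) and
    a grain size that doubles at every level.
    """
    ids = []
    for level in range(n_levels):
        scale = start_scale * 2 ** level
        col = math.floor((easting - origin[0]) / scale)
        row = math.floor((northing - origin[1]) / scale)
        ids.append((scale, col, row))
    return ids
```

Because each grain is exactly twice the previous one, the parent cell of `(col, row)` is `(col // 2, row // 2)`, which is what makes the grid hierarchy nested.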
Code Organization
The code/ directory contains all scripts for data processing, analysis, and visualization:
Main Analysis Scripts
- 0 - Batch Run Analysis.R: Main file to run one or multiple analyses. Configure parameters here and execute.
- 1 - Run Analysis Function.R: Main analysis function organizing the five-step workflow. Set arcgis = TRUE/FALSE here.
Analysis Functions (code/functions/)
- 1 - create hbins.R: Creates hierarchical spatial bins (nested hexagonal grids)
- 2 - join hbin IDs.R: Joins eBird observations to hierarchical bin identifiers
- 3 - estimate diversity.R: Estimates species diversity using iNEXT
- 4a - ordinary kriging.R: Performs ordinary kriging interpolation
- 4b - empirical Bayesian kriging.py: Performs empirical Bayesian kriging (requires ArcGIS Pro)
- 5 - compute scale variance.R: Computes scale variance across hierarchical levels
Visualization Scripts (Files 2-9)
- 2 - Get_observation_stats.R: Generates eBird data summaries
- 3 - Plot_checklists_by_scale.R: Plots checklist density across spatial scales
- 4 - Plot_checklist_stats.R: Plots checklist effort and completeness statistics
- 5 - Plot_hbins_estD.R: Creates spatial maps of diversity estimates
- 6 - Plot_sve.R: Plots scale variance elements (SVE) across hierarchical levels
- 7 - Plot_svc.R: Plots scale variance components (SVC) and cumulative variance
- 8 - Summarize_Taxonomy.R: Generates taxonomic composition summaries
- 9 - Plot_iNEXT_example.R: Demonstrates iNEXT diversity estimation methodology
Utility Scripts (code/utilities/)
- filter_ebd.R: Processes and filters raw eBird data
- arcgis.R: Helper functions for ArcGIS integration (optional)
Package Versions
package-versions.csv: Lists exact versions of all R packages used in the analysis
Results Structure
The results/ directory contains outputs from multiple analysis runs organized by taxonomic group:
Directory Organization
results/
├── results_All/ # All bird species
│ ├── R1 - London 500m 7L/ # Run 1: 500m start scale, 7 levels
│ ├── R2 - London 750m 6L/ # Run 2: 750m start scale, 6 levels
│ ├── R3 - London 1000m 6L/ # Run 3: 1000m start scale, 6 levels
│ ├── R4 - London 1500m 5L/ # Run 4: 1500m start scale, 5 levels
│ ├── R5 - London 2000m 5L/ # Run 5: 2000m start scale, 5 levels
│ └── R6 - London 3000m 4L/ # Run 6: 3000m start scale, 4 levels
├── results_Anseriformes_Charadriiformes/ # Waterfowl and shorebirds only
│ └── [Same run structure as above]
├── results_Passeriformes/ # Songbirds only
│ └── [Same run structure as above]
└── results_meta/ # Cross-run summary statistics
File Naming Convention
Run IDs: R1-R6 represent different spatial configurations testing sensitivity to starting scale and number of levels
Scale parameters: Starting grain size in meters (500m, 750m, 1000m, 1500m, 2000m, 3000m)
Levels: Number of hierarchical scale levels (4L-7L), with each level doubling the grain size
Example: R5 - London 2000m 5L means:
- Run 5
- Starting scale: 2000m grain
- 5 hierarchical levels: 2000m, 4000m, 8000m, 16000m, 32000m
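The naming convention can be decoded mechanically. The sketch below parses a run directory name and derives the grain size at each level, assuming the name always follows the `R{n} - {region} {scale}m {levels}L` pattern shown above. The function name and the returned keys are illustrative and not part of the repository code.

```python
import re


def parse_run_name(name):
    """Split a run directory name into its parts and list the grain sizes."""
    match = re.fullmatch(r"(R\d+) - (\w+) (\d+)m (\d+)L", name)
    if match is None:
        raise ValueError(f"unexpected run directory name: {name!r}")
    start = int(match.group(3))
    levels = int(match.group(4))
    return {
        "run_id": match.group(1),
        "region": match.group(2),
        "start_scale_m": start,
        "n_levels": levels,
        # Each level doubles the grain size of the previous one.
        "scales_m": [start * 2 ** i for i in range(levels)],
    }


run = parse_run_name("R5 - London 2000m 5L")
print(run["scales_m"])  # [2000, 4000, 8000, 16000, 32000]
```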
Files Within Each Run Directory
Each run directory (e.g., results_All/R1 - London 500m 7L/) contains:
RDS Files (R data objects, in main directory)
- hbins.rds: Hierarchical spatial bins (hexagonal grid cells) without diversity estimates
- hbins_estD.rds: Hierarchical bins with observed diversity estimates (Hill numbers) for each cell
- hbins_{scale}m_{levels}L.geojson: Hierarchical grid structure for mapping (GeoJSON format)
- ebd_hbinIDs.rds: eBird observations with assigned hierarchical bin identifiers
- sv_OBS_q{0,1,2}.rds: Scale variance results from observed (OBS) diversity without interpolation
- sv_OK_q{0,1,2}.rds: Scale variance results from ordinary kriging (OK) interpolation
- sv_EBK_q{0,1,2}.rds: Scale variance results from empirical Bayesian kriging (EBK) interpolation
CSV Files (summary results, in main directory)
- svc_OBS_q{0,1,2}.csv: Scale variance components from observed diversity
- svc_OK_q{0,1,2}.csv: Scale variance components from ordinary kriging
- svc_EBK_q{0,1,2}.csv: Scale variance components from empirical Bayesian kriging
GeoJSON Subdirectory (geojson/)
The geojson/ subdirectory contains spatial data files for visualization:
- OBS_estD_values_q{0,1,2}.geojson: Observed diversity estimates (spatial points)
- OK_estD_values_q{0,1,2}.geojson: Ordinary kriging diversity predictions (raster points)
- EBK_estD_values_q{0,1,2}.geojson: Empirical Bayesian kriging predictions (raster points)
- OK_result_q{0,1,2}.geojson: Ordinary kriging interpolation surface
- EBK_result_q{0,1,2}.geojson: Empirical Bayesian kriging interpolation surface
- estD_pts_q{0,1,2}.geojson: Diversity estimate points
- sve{level}_q{0,1,2}.geojson: Scale variance elements (SVE) at each hierarchical level (level = 1 to number of levels)
ArcGIS Subdirectory (arcgis/, optional)
arcgis/: Contains geodatabase features (only present if analysis was run with ArcGIS integration enabled)
Meta-Results Directory
results/results_meta/ contains cross-run summary statistics:
- Checklists_by_scale.csv: Proportion of study area covered by checklists at each grain size
- observation_stats.csv: Summary statistics of eBird observations used in analysis
- estD_stats_all.csv: Diversity estimate statistics across all spatial configurations
- taxonomic_summary_All.csv: Taxonomic composition summary (order and family level)
- taxonomic_summary_All_species.csv: Species-level taxonomic summary
Variable Definitions
Scale Variance Component Files (svc_*.csv)
These files contain the decomposition of total diversity variance across hierarchical spatial scales.
Columns:
- scale_variance: Proportion of total variance explained at this hierarchical level (unitless, 0-1)
- level: Hierarchical level number (1 = finest grain, increases with coarser grains)
- scale: Grain size in meters for this level
- sv_cumulative: Cumulative proportion of variance explained up to and including this level (unitless, 0-1)
- degf: Degrees of freedom, equal to the number of spatial units at this level
- sum_squares: Sum of squared deviations at this level
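As a quick consistency check on these columns, sv_cumulative should equal the running sum of scale_variance when rows are ordered by level. The Python sketch below reads CSV text with the standard library and performs that check. The helper names are illustrative, and the inline values are toy numbers for demonstration, not values from the dataset.

```python
import csv
import io
from itertools import accumulate


def read_svc(text):
    """Parse svc_*.csv content and return rows sorted from finest to coarsest level."""
    rows = list(csv.DictReader(io.StringIO(text)))
    rows.sort(key=lambda r: int(r["level"]))
    return rows


def cumulative_matches(rows, tol=1e-6):
    """Check that sv_cumulative is the running sum of scale_variance."""
    sv = [float(r["scale_variance"]) for r in rows]
    cum = [float(r["sv_cumulative"]) for r in rows]
    return all(abs(a - b) <= tol for a, b in zip(accumulate(sv), cum))


# Toy values for illustration only; not taken from the dataset.
example = """scale_variance,level,scale,sv_cumulative
0.5,1,500,0.5
0.3,2,1000,0.8
0.2,3,2000,1.0
"""

rows = read_svc(example)
print(cumulative_matches(rows))
```

To run the check on a real file, pass the file contents, for example `read_svc(open(path, encoding="utf-8").read())`.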
Checklists by Scale File (Checklists_by_scale.csv)
This file quantifies spatial coverage of eBird sampling effort.
Columns:
- grain: Grain size in meters
- n1: Proportion of grid cells containing at least 1 checklist (unitless, 0-1)
- n5: Proportion of grid cells containing at least 5 checklists (unitless, 0-1)
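The n1 and n5 columns can be understood as simple threshold proportions over per-cell checklist counts. The sketch below shows that calculation in Python for one grain size. It assumes the denominator is every grid cell, including cells with no checklists; the function name is illustrative and is not taken from the repository scripts.

```python
def coverage(checklists_per_cell, thresholds=(1, 5)):
    """Proportion of cells holding at least t checklists, for each threshold t."""
    n_cells = len(checklists_per_cell)
    if n_cells == 0:
        raise ValueError("at least one grid cell is required")
    return {
        f"n{t}": sum(count >= t for count in checklists_per_cell) / n_cells
        for t in thresholds
    }


# Four cells holding 0, 1, 5, and 7 checklists.
print(coverage([0, 1, 5, 7]))  # {'n1': 0.75, 'n5': 0.5}
```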
Observation Statistics File (observation_stats.csv)
This file provides summary statistics for the eBird data used in each analysis.
Columns:
- dataset: Taxonomic group identifier (All, Anseriformes_Charadriiformes, Passeriformes)
- n_checklists: Total number of eBird checklists (count)
- n_species: Total number of species observed (count)
- n_observations: Total number of individual bird observations (count)
Diversity Metrics
This analysis uses Hill numbers (effective number of species), a family of diversity indices parameterized by order q:
- q = 0 (Species Richness): Count of species present, giving equal weight to all species regardless of abundance
- q = 1 (Shannon Diversity): Exponential of Shannon entropy, weighting species by their frequency. Emphasizes common species.
- q = 2 (Simpson Diversity): Inverse Simpson index, emphasizing the most dominant species
Hill numbers are estimated using interpolation and extrapolation of species accumulation curves (iNEXT package) to account for incomplete sampling. Standardizing the estimates in this way keeps them comparable across cells whose sampling effort differs.
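For readers unfamiliar with Hill numbers, the Python sketch below computes the plain empirical value for a given order q directly from species abundances. It illustrates the definitions of the three orders only. It does not reproduce the rarefaction and extrapolation estimators that iNEXT applies, and the function name is illustrative.

```python
import math


def hill_number(abundances, q):
    """Empirical Hill number of order q (effective number of species)."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 1:
        # Limit as q approaches 1: exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    # General case: (sum of p_i^q) raised to 1 / (1 - q).
    return sum(pi ** q for pi in p) ** (1 / (1 - q))


# An uneven community: one dominant species and two rare ones.
counts = [90, 5, 5]
for q in (0, 1, 2):
    print(q, hill_number(counts, q))
```

For a perfectly even community all three orders equal the species count. As evenness drops, the q = 1 and q = 2 values fall below richness because rare species contribute less.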
Abbreviations and Technical Terms
- hbin: Hierarchical bin, a hexagonal grid cell at a specific spatial scale
- estD: Estimated diversity (Hill number)
- OBS: Observed diversity from raw data without interpolation
- OK: Ordinary kriging, a geostatistical interpolation method
- EBK: Empirical Bayesian kriging, an advanced geostatistical interpolation method
- SVC: Scale variance component, the proportion of variance at each scale
- SVE: Scale variance elements, variance contributed from each cell at each scale (sum SVE at each scale and normalize by total to get SVC)
- RDS: R Data Serialization format, a compressed binary file format for R objects
- GeoJSON: Geographic JavaScript Object Notation, a format for encoding spatial data
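The relationship between SVE and SVC stated above (sum the elements at each scale, then normalize by the total) can be sketched in a few lines of Python. The input layout, a mapping from level to a list of per-cell SVE values, is an assumption made for illustration and is not the structure of the RDS or GeoJSON files.

```python
def svc_from_sve(sve_by_level):
    """Sum SVE values per level and normalize by the grand total to get SVC."""
    level_sums = {level: sum(values) for level, values in sve_by_level.items()}
    total = sum(level_sums.values())
    if total == 0:
        raise ValueError("total scale variance is zero")
    return {level: s / total for level, s in level_sums.items()}


# Toy values for illustration only: two cells at level 1, one cell at level 2.
print(svc_from_sve({1: [1.0, 1.0], 2: [2.0]}))  # {1: 0.5, 2: 0.5}
```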
Opening Data Files
Most files in this dataset use text-based formats (CSV, GeoJSON, R scripts) that can be opened with any text editor or standard data analysis software.
RDS files contain R data objects and require R to open:
# In R, load an RDS file:
data <- readRDS("path/to/file.rds")
For example, to load hierarchical bins with diversity estimates:
hbins_estD <- readRDS("Data/results/results_All/R1 - London 500m 7L/hbins_estD.rds")
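For readers working in Python rather than R, the GeoJSON files can be inspected with the standard library alone. The sketch below assumes each file is a GeoJSON FeatureCollection. The helper names are illustrative, and because the property names inside the files are not documented here, the code simply lists whatever is present.

```python
import json


def load_features(path):
    """Load a GeoJSON file and return its list of features."""
    with open(path, encoding="utf-8") as f:
        collection = json.load(f)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return collection["features"]


def property_names(features):
    """Collect the attribute names found across all features."""
    names = set()
    for feature in features:
        names.update((feature.get("properties") or {}).keys())
    return sorted(names)
```

A typical call would be `property_names(load_features(path))`, with `path` pointing at one of the files in a run's geojson/ subdirectory.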
Spatial Reference System
All spatial data use the British National Grid coordinate reference system:
- EPSG Code: 27700
- Projection: Transverse Mercator
- Datum: OSGB36 (Ordnance Survey Great Britain 1936)
- Units: Meters
Note: The eBird data are distributed in WGS84 geographic coordinates (EPSG:4326) and were transformed to British National Grid (EPSG:27700) for analysis to enable accurate distance calculations and area measurements.
Important Note: Post-Publication Spatial Reference Correction
After publication of the related article, we discovered that an earlier version of the analysis incorrectly used the Web Mercator projection (EPSG:3857) for hierarchical grid cells. Web Mercator is designed for web mapping and introduces distance distortion that increases with latitude. At London's latitude (~51.5°N), the distortion factor is approximately 0.623, meaning that spatial scales reported in the published article should be multiplied by 0.623 to reflect true ground distances.
For example:
- Reported 1,000m scale = actual 623m
- Reported 2,000m scale = actual 1,246m
- Reported 8,000m scale = actual 4,984m
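The 0.623 factor follows from the geometry of spherical Web Mercator, where a projected distance overstates the ground distance by 1/cos(latitude). The Python sketch below reproduces the factor and the conversions listed above. The function names are illustrative, and the rounded factor 0.623 is used as the default so the output matches the figures in this note.

```python
import math


def web_mercator_correction(latitude_deg):
    """Ratio of ground distance to Web Mercator distance at a given latitude."""
    return math.cos(math.radians(latitude_deg))


def corrected_scale(reported_m, factor=0.623):
    """Convert a scale reported in Web Mercator meters to ground meters."""
    return reported_m * factor


print(round(web_mercator_correction(51.5), 3))  # 0.623
for reported in (1000, 2000, 8000):
    print(reported, round(corrected_scale(reported)))
```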
The scripts in this repository have been corrected to use the British National Grid (EPSG:27700) throughout, which keeps distance distortion negligible across the study area. All results in this data repository use the corrected coordinate system and report accurate spatial scales in meters.
Impact on published findings: While the absolute scale values in the published article are incorrect, all relative relationships, patterns, variance distributions, and scientific conclusions remain valid. The scale dependence patterns, variance components, and spatial distributions are accurate relative to one another. The conclusions about urban-rural gradients and environmental influences on bird diversity at different scales are unchanged.
A correction note has been published with the article to alert readers to the corrected spatial scales.
Software Requirements
The analysis was implemented in:
- R version: 4.3.3 (Angel Food Cake)
- Python version: 3.9.18
- ArcGIS Pro: 3.1 (optional, required only for empirical Bayesian kriging)
Key R packages (see code/package-versions.csv for exact versions):
- sf (1.0-16): Spatial data handling and geometric operations
- tidyverse (2.0.0): Data manipulation and visualization
- iNEXT (3.0.1): Interpolation and extrapolation of species diversity
- gstat (2.1-1): Geostatistical analysis and kriging
- auk (0.7.0): eBird data processing utilities
- arcgisbinding (1.0.1.306): Optional ArcGIS integration (R-ArcGIS bridge)
Key Python packages:
arcpy: ArcGIS Python API (included with ArcGIS Pro)
Running the Analysis
To reproduce this analysis:
- Obtain source data following instructions in the "Data Provenance" section above
- Download eBird data for the specified region and date range
- Download AVONET trait database
- Download Greater London boundary shapefile from London Datastore
- Convert shapefile to GeoJSON format (GreaterLondon.geojson) if needed, or use the shapefile directly with the sf package
- Process eBird data using code/utilities/filter_ebd.R
- Configure analysis parameters in code/0 - Batch Run Analysis.R:
  - run_ids: Vector of unique identifiers for each analysis run (e.g., "R1", "R2")
  - start_scales: Vector of starting grain sizes in meters (e.g., 500, 750, 1000)
  - nlevels: Vector of numbers of hierarchical levels (e.g., 7, 6, 6)
- Set ArcGIS option in code/1 - Run Analysis Function.R:
  - arcgis = TRUE to enable empirical Bayesian kriging (requires ArcGIS Pro 3.1+)
  - arcgis = FALSE to run analysis without ArcGIS (ordinary kriging only)
- Execute analysis: Run code/0 - Batch Run Analysis.R in R
Results will be written to a results/ directory structured as described above.
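The three configuration vectors appear to be matched element by element, so that each run ID gets one starting scale and one level count. The Python sketch below illustrates that pairing and the directory names it would yield. This is an inference from the directory names in the results structure rather than a description of how the R script builds its paths, and the function name and `region` parameter are hypothetical.

```python
def build_run_names(run_ids, start_scales, nlevels, region="London"):
    """Pair the configuration vectors and format one directory name per run."""
    if not (len(run_ids) == len(start_scales) == len(nlevels)):
        raise ValueError("run_ids, start_scales, and nlevels must have equal length")
    return [
        f"{run_id} - {region} {scale}m {levels}L"
        for run_id, scale, levels in zip(run_ids, start_scales, nlevels)
    ]


print(build_run_names(["R1", "R2", "R3"], [500, 750, 1000], [7, 6, 6]))
```

Checking the vector lengths up front catches a misconfigured batch before any long-running step starts.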
Additional Resources
GitHub repository: The code from this analysis is also maintained at https://github.com/jacobdein/bird-diversity-scale-dependence
Related publication:
Dein J, and Tran L. 2024. Scale dependence of bird diversity in London. Landscape Ecology 2024 40:1 40.
https://doi.org/10.1007/s10980-024-02018-4
Contact
For questions about this dataset or analysis, contact:
Jacob Dein
jake@jacobdein.com
Or file an issue on the GitHub repository: https://github.com/jacobdein/bird-diversity-scale-dependence/issues
- Dein, Jacob; Tran, Liem (2024). Scale dependence of bird diversity in London. Landscape Ecology. https://doi.org/10.1007/s10980-024-02018-4
