Urban landscapes with more natural greenspace support higher pollinator diversity
Data files
Dec 28, 2024 version files (151.52 KB total):
- README.md (9.32 KB)
- site_data.xlsx (141.94 KB)
- syrphidae_nativity.csv (247 B)
Abstract
As cities around the world expand, we urgently need to better understand the drivers of urban biodiversity, especially for functionally important groups such as insect pollinators. In this study, we gathered hoverfly and bumble bee pollinator observations from natural history collections and community science initiatives from 462 urban landscapes across 85 U.S. metropolitan areas. We tested whether urban greenspace functions as pollinator habitat by examining whether the total area of greenspace in an urban landscape predicted pollinator occurrence, i.e., the presence or absence of species in a landscape. Our study was designed to determine whether natural greenspace area (i.e., urban greenbelts, nature reserves and forest/grassland fragments) and developed greenspace area (i.e., managed parks, cemeteries and golf courses) differ in their ability to support a diversity of pollinator species. After accounting for sampling biases using an integrated occupancy modeling approach, we found a positive association between native hoverfly occurrence and natural greenspace area. This implies that urban landscapes with more natural greenspace support higher native hoverfly diversity. On average, bumble bee occurrence was not associated with natural greenspace area; however, the response varied among species, with several at-risk bumble bees showing a positive association. In contrast to natural greenspace area, we found no association between pollinator occurrence and the area of developed greenspace. In addition, we found that the proportion of racial minority households in an urban landscape was negatively associated with pollinator occurrence. This is consistent with the hypothesis that a history of systematic, unjust policies in neighborhoods with more racial minority households has lasting negative impacts on urban biodiversity.
In conclusion, our results support the hypothesis that natural greenspace functions as vital habitat for urban pollinators. We recommend that cities prioritize the preservation of remnant natural greenspace and improve developed greenspaces in order to promote urban pollinator conservation. These efforts should be prioritized in urban landscapes with a higher proportion of racial minority households to improve equal access to nature and pollinator ecosystem services.
README: Urban landscapes with more natural greenspace support higher pollinator diversity
Updated December 11, 2024, by Jens Ulrich.
Data
The following data files are provided in the Dryad repository:
- syrphidae_nativity.csv - information about hoverfly nativity. The "species" column contains character data on species names. The "nativity" column indicates whether species are non-native (nativity == 0). All hoverfly species not included in this list were considered to be native.
- site_data.xlsx - site-specific predictors derived from publicly available land use and socioeconomic data (see citations below). These data were generated under the landscape definitions used for the main analysis presented in the manuscript: a landscape size of 10 km by 10 km, excluding landscapes with fewer than 1,200 people / km^2.
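The rule for syrphidae_nativity.csv (listed species with nativity == 0 are non-native, and every species not listed is native) can be sketched in base R as follows. The species names and the detections table are placeholders rather than rows from the actual files, and coding native species as 1 is an assumption made for illustration.

```r
# Illustrative stand-in for syrphidae_nativity.csv (placeholder names, not real rows).
nativity <- data.frame(
  species  = c("Genus_a sp1", "Genus_b sp2"),
  nativity = c(0, 0),
  stringsAsFactors = FALSE
)

# Illustrative stand-in for a table of hoverfly detections.
detections <- data.frame(
  species = c("Genus_a sp1", "Genus_c sp3", "Genus_b sp2", "Genus_d sp4"),
  stringsAsFactors = FALSE
)

# Left join keeps every detection; species absent from the nativity list get NA.
flagged <- merge(detections, nativity, by = "species", all.x = TRUE)

# Assumption: species not in the list are coded as native with the value 1.
flagged$nativity[is.na(flagged$nativity)] <- 1

# Keep only detections of native species.
native_only <- flagged[flagged$nativity == 1, , drop = FALSE]
print(native_only)
```

With the real file, the placeholder `nativity` data frame would be replaced by the result of reading syrphidae_nativity.csv.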
Site-level land cover and socioeconomic data (site_data.xlsx) were derived from the following public datasets:
- U.S. National Land Cover Database (land cover): https://www.mrlc.gov/data/nlcd-2016-land-cover-conus
- IPUMS National Historical GIS (household income): https://sedac.ciesin.columbia.edu/data/set/gpw-v4-population-density-rev11/data-download#
- U.S. Census Bureau (neighborhood racial composition): https://data.census.gov/table/DECENNIALDP2020.DP1
- Center for International Earth Science Information Network (population density): https://sedac.ciesin.columbia.edu/data/set/gpw-v4-population-density-rev11/data-download#
- U.S. Environmental Protection Agency (ecoregion borders): https://www.epa.gov/eco-research/ecoregions
- U.S. Census Bureau (metropolitan area definitions): https://catalog.data.gov/dataset/tiger-line-shapefile-2019-nation-u-s-current-metropolitan-statistical-area-micropolitan-statist
The query details for obtaining each of these datasets are as follows:
Land cover
2016 NLCD land cover data at 30 m resolution.
Household income
2020 income data, B19013 == Median Household Income in the Past 12 Months (in 2020 Inflation-Adjusted Dollars).
Racial composition
2020 race and other factors, DP1_0078P == Percentage of people identifying as "white, no other race".
Population Density
2015 population density raster at 1 km resolution.
Ecoregions
Shapefiles for ecoregion level 1 in North America (the broadest spatial clustering unit for analysis) and ecoregion level 3 in North America (a finer spatial clustering unit for analysis).
Metropolitan areas
Definitions from 2018 using 2010 census data. Used to tag each site with the name of its associated metropolitan area (for ease of interpretation).
Hoverfly and Bumble Bee Detection Data
Hoverfly detection data
The hoverfly data were obtained from GBIF.org (GBIF.org 2023a and GBIF.org 2023b). We downloaded the data in two batches. The two citation links provide access to these data. Metadata are provided by the GBIF API. Combined, these data consist of 145,572 specimen records from the years 2000-2022 within the geographical extent of the United States. We amended the data by appending a column, "basisOfRecord", in which we identified detections as originating from "community science" or "research collections" observation processes. See the methods in the manuscript for how we determined basisOfRecord. We also amended the data by applying the following taxonomic groupings for hoverflies:
# `df` is a placeholder name for the combined GBIF hoverfly data frame
library(dplyr)
df <- df %>%
  # replace all Eumerus with Eumerus sp.
  mutate(species = ifelse(genus == "Eumerus", "Eumerus sp.", species)) %>%
  # replace all Chrysogaster with Chrysogaster sp.
  mutate(species = ifelse(genus == "Chrysogaster", "Chrysogaster sp.", species)) %>%
  # replace Eoseristalis (genus name) with Eristalis (genus name)
  mutate(species = gsub("Eoseristalis", "Eristalis", species))
- GBIF.org. 2023a. “GBIF Occurrence Download.” Accessed Mar-01-2023. https://doi.org/10.15468/dl.nga26z.
- GBIF.org. 2023b. “GBIF Occurrence Download.” Accessed Mar-01-2023. https://doi.org/10.15468/dl.n5cmwv.
Bumble bee detection data
The bumble bee detection data used in the study were accessed from: Richardson LL. 2022. Bumble Bees of North America occurrence records database (https://www.leifrichardson.org/bbna.html; accessed Feb-16-2023). After trimming to the spatial extent of the continental U.S. and to the widest temporal extent of the study (2000-2022), the data consist of 468,949 specimen records. We amended the provided data by grouping records for "Bombus vancouverensis" and "Bombus bifarius" as a single species. We also appended a column, "basisOfRecord", in which we identified detections as originating from "community science" or "research collections" observation processes. See the methods in the manuscript for how we determined basisOfRecord.
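Grouping "Bombus vancouverensis" and "Bombus bifarius" into a single species amounts to a simple recode. The sketch below is a minimal base R illustration; the column name `species`, the example rows, and the label chosen for the merged taxon are all assumptions, since the README does not state which name was retained.

```r
# Illustrative records; the column name `species` is an assumption about the data layout.
bombus <- data.frame(
  species = c("Bombus bifarius", "Bombus vancouverensis", "Bombus impatiens"),
  stringsAsFactors = FALSE
)

# Assumed label for the merged taxon (not specified in the README).
merged_name <- "Bombus vancouverensis"
to_merge    <- c("Bombus vancouverensis", "Bombus bifarius")

# Records of either species receive the same label; all other records are unchanged.
bombus$species <- ifelse(bombus$species %in% to_merge, merged_name, bombus$species)

print(table(bombus$species))
```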
Analyses and code
To reproduce the analyses, the data and analysis files must be arranged in an R project directory. To view or clone the directory structure, see the GitHub repository that was originally used to run the analyses, or arrange the files following the structure specified below.
To run an analysis, place run_model.R in a directory "./occupancy/analysis/". After opening the file, you will be prompted to enter details that will specify the model run, including: taxonomic group, spatial grain and temporal divisions of occupancy intervals.
We estimated the associations between urban pollinator occurrence and urban land cover / urban socioeconomics using integrated community occupancy models with Bayesian inference. The models are written in Stan. Place the Stan models in the directory "./occupancy/models/". "model_syrphidae.stan" is for hoverflies and "model_bombus.stan" is for bumble bees.
Before running the analysis, place the following provided files in a directory "./occupancy/data_prep/": prep_data.R, get_species_ranges.R, and get_spatial_data.R. These files are required to format the data for the integrated community occupancy model.
How to run a model:
Run a model (./analysis/run_model.R) by specifying the data level constraints (taxonomic group, spatial grain and temporal divisions of occupancy intervals) and by tweaking the HMC settings if desired.
After specifying the data-level constraints at the top of the file, you are given the option to prepare data for an analysis (prep_data() function). This preparation takes a while (~10 minutes) to run, so if it has already been done and the prepared data have been saved, you can skip ahead, load the previously saved data, and then enter the HMC settings before running the model. Prepared data .rds files are held in the ./analysis/prepped_data/ folder.
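The prepare-once, reload-later pattern described above can be sketched with saveRDS() and readRDS(). This is only an illustration: `slow_prep()` is a hypothetical stand-in for prep_data() (whose arguments are not documented here), and the cache directory and file name are assumptions placed under a temporary directory.

```r
# Hypothetical stand-in for the slow prep_data() step.
slow_prep <- function() {
  list(n_sites = 3, note = "prepared data placeholder")
}

# Assumed cache location and file name, placed in a temporary directory for this sketch.
cache_dir  <- file.path(tempdir(), "prepped_data")
cache_file <- file.path(cache_dir, "example_prepped.rds")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

load_or_prepare <- function(path, prepare) {
  if (file.exists(path)) {
    readRDS(path)            # reuse previously prepared data
  } else {
    prepped <- prepare()     # run the slow preparation once
    saveRDS(prepped, path)   # save the result for later runs
    prepped
  }
}

first  <- load_or_prepare(cache_file, slow_prep)  # prepares and saves if no cache exists
second <- load_or_prepare(cache_file, slow_prep)  # loads the saved object from disk
```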
The function prep_data() (called by run_model.R) is held in prep_data.R. When called, this function communicates with get_spatial_data.R to define sites and gather covariate data before attributing detections to sites and inferring a sampling process. If the default study parameters are used as inputs (10 km by 10 km grid cells with a minimum population density of 1,200 people/km^2), then the function will gather the predictor data included in the data file stored in this Dryad repository: "site_data.xlsx". The function also communicates with get_species_ranges.R to determine which sites are in range and which are out of range (and should be treated as NAs) based on each species' distribution.
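Treating out-of-range sites as NA rather than as non-detections can be illustrated with a small masking step. The matrices below are made-up examples, and the species-by-site layout is an assumption; the actual data structures built by prep_data() may differ.

```r
# Illustrative detection matrix: rows are species, columns are sites (1 = detected, 0 = not detected).
detections <- matrix(
  c(1, 0, 0,
    0, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("species_1", "species_2"), c("site_a", "site_b", "site_c"))
)

# Illustrative range indicator of the same shape: TRUE where the site lies inside the species' range.
in_range <- matrix(
  c(TRUE,  TRUE, FALSE,
    FALSE, TRUE, TRUE),
  nrow = 2, byrow = TRUE,
  dimnames = dimnames(detections)
)

# Out-of-range combinations become NA so they are not read as true absences.
masked <- detections
masked[!in_range] <- NA
print(masked)
```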
A few model diagnostic functions are listed after the model run.
Where to place the data to run the models:
Make a directory "./data/". Place the pollinator occurrence data in "./data/occurrence_data/". We named the bumble bee data: "bbna_trimmed.csv" (we trimmed the dataset to the temporal and spatial extent of our study to speed up the .csv load times); we combined both GBIF downloads for hoverfly data into a single .csv file named: "syrphidae_data_all.csv". Use these names for the data prep scripts to run smoothly.
Place the land cover and socioeconomic data in the following subdirectories:
- "./data/spatial_data/land_cover/" for the land cover raster.
- "./data/spatial_data/socioeconomic_data/" for the household income shapefile.
- "./data/spatial_data/racial_composition/" for the neighborhood racial composition shapefile.
- "./data/spatial_data/population_density/" for the population density raster.
- "./data/spatial_data/ecoregion_level_one/" for the ecoregion level 1 shapefile.
- "./data/spatial_data/ecoregion_level_three/" for the ecoregion level 3 shapefile.
- "./data/spatial_data/cbsa_metro_areas/" for the metropolitan area shapefile.
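The directory layout described in this README can be created in one step. The sketch below builds it under a temporary directory so that it has no side effects; `project_root` is an assumed name and would normally point to the R project directory.

```r
# Project root; a temporary directory is used here so the sketch has no side effects.
project_root <- file.path(tempdir(), "pollinator_project")

# Directories named in this README, relative to the project root.
dirs <- c(
  "occupancy/analysis",
  "occupancy/models",
  "occupancy/data_prep",
  "data/occurrence_data",
  "data/spatial_data/land_cover",
  "data/spatial_data/socioeconomic_data",
  "data/spatial_data/racial_composition",
  "data/spatial_data/population_density",
  "data/spatial_data/ecoregion_level_one",
  "data/spatial_data/ecoregion_level_three",
  "data/spatial_data/cbsa_metro_areas"
)

# Create each directory, including any missing parent directories.
for (d in dirs) {
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# Confirm that every directory now exists.
print(all(dir.exists(file.path(project_root, dirs))))
```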
Methods
In this study, we used publicly available data on urban land cover and urban socioeconomics and estimated their associations with urban pollinator occurrence rate. The publicly available urban land cover and urban socioeconomic data are large (continental scale) spatial files. The original sources for the spatial land cover and socioeconomic data are cited in the manuscript. We provide the site-specific land cover and socioeconomic predictor values that were derived from the publicly accessible data in a table, "site_data.xlsx". We focused on two groups of pollinator species: hoverflies (family Syrphidae) and bumble bees (family Apidae, genus Bombus). Pollinator detections were obtained from GBIF for hoverflies, and from the Bumble Bees of North America Database for bumble bees. The GBIF data request is cited in the README and in the manuscript text. The full Bumble Bees of North America Database data set is available upon request from the original provider, also cited in the README and in the manuscript text.
We used integrated community occupancy models to quantify the associations between urban landscape predictors and urban pollinator occurrence. This model form estimates and accounts for detection biases in the data while simultaneously estimating associations with whether or not species occur. We examined whether four focal predictors were associated with occurrence: natural greenspace, developed greenspace, household income and neighborhood racial composition. Natural greenspace and developed greenspace were synthetic categories that we constructed based on the categorical land cover raster values provided by the original source. The integrated community occupancy models estimate community-average associations with each of the four focal predictors. The effect of natural greenspace was our primary interest, so we also estimated species-specific effects of natural greenspace area on occurrence. We did not estimate species-specific effects for all four predictors because those more complex models failed to converge given the amount of data in our study.
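To illustrate why occupancy models separate occurrence from detection, the following base R simulation generates presence/absence for a single hypothetical species and then observes it with imperfect detection. It is a generic sketch, not the Stan models used in this study; the sample sizes, intercept, slope, and detection probability are all made-up values.

```r
set.seed(1)

n_sites  <- 200   # number of landscapes (illustrative)
n_visits <- 3     # repeat sampling occasions per landscape (illustrative)

# Illustrative standardized predictor, e.g. natural greenspace area.
greenspace <- rnorm(n_sites)

# Occupancy probability via a logit link; intercept and slope are made-up values.
psi <- plogis(-0.5 + 1.0 * greenspace)
z   <- rbinom(n_sites, size = 1, prob = psi)   # true presence (1) or absence (0)

# Detection is imperfect and can only happen where the species is present.
p_detect <- 0.3
y <- matrix(
  rbinom(n_sites * n_visits, size = 1, prob = p_detect * rep(z, n_visits)),
  nrow = n_sites, ncol = n_visits
)

# A site counts as "observed occupied" if the species was detected on any visit.
detected_at_least_once <- as.integer(rowSums(y) > 0)

true_occupancy  <- mean(z)
naive_occupancy <- mean(detected_at_least_once)
cat("true:", true_occupancy, " naive:", naive_occupancy, "\n")
```

Because a species can be missed where it is present, the naive proportion of sites with detections can never exceed the true proportion occupied. An occupancy model addresses this by estimating the detection probability alongside the occurrence associations.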