Data from: Habitat restorations in an urban landscape rapidly assemble diverse pollinator communities that persist
Data files
Oct 22, 2024 version files (6.34 MB)
- appendix_s2.csv (27.44 KB)
- clarkia_pollination_data_2022.csv (55.78 KB)
- flower_resources_herb_quadrats.csv (79.37 KB)
- flower_resources_woody.csv (32.09 KB)
- land_cover_by_site_reduced.csv (2.62 KB)
- metadata.docx (23.78 KB)
- pollinator_data.csv (789.68 KB)
- README.md (18.45 KB)
- reduced_pollinator_plant.csv (5.31 MB)
Oct 31, 2024 version files (6.34 MB)
- appendix_s2.csv (27.44 KB)
- clarkia_pollination_data_2022.csv (55.78 KB)
- flower_resources_herb_quadrats.csv (79.37 KB)
- flower_resources_woody.csv (32.09 KB)
- land_cover_by_site_reduced.csv (2.62 KB)
- metadata.docx (23.78 KB)
- pollinator_data.csv (789.68 KB)
- README.md (19.33 KB)
- reduced_pollinator_plant.csv (5.31 MB)
Abstract
Ecological restoration is a leading approach to mitigating biodiversity decline. While restoration often leads to an immediate increase in abundance or diversity, it is rarely clear whether it supports longer-term biodiversity gains at the landscape scale. To examine the impacts of urban restoration on pollinator biodiversity, we conducted a three-year natural experiment in 18 parks across a large metropolitan area. We applied an occupancy model to our survey data to determine how restoration, woody plant density, and pollinator specialization impacted interannual pollinator metacommunity dynamics. Restoration drove a rapid increase in pollinator species occurrence that was maintained through a positive balance between colonization and persistence, resulting in pollinator species richness gains that are retained. We conclude that urban restoration can effectively conserve pollinator biodiversity by influencing the processes that underlie long-term population stability. Our results highlight the need to study the long-term effects of restoration in different landscape contexts.
Data from: Habitat restoration promotes long-term diversity in an urban pollinator metacommunity
prepared by Jens Ulrich, October 2024
In this README, the information needed to set up the analyses is listed first, with the metadata (also contained in metadata.docx) pasted below. Information on the Software (R and Stan code) and the creation of the main figures is provided after the metadata.
change log
Version 31 Oct 2024: minor edits to README and Zenodo files. Data on Dryad did not change.
The paper contains two key sets of analyses:
- a multi-species dynamic occupancy model for estimating changes in pollinator occurrence through time.
- a logistic regression model for estimating the effects of habitat enhancements on pollen limitation.
To reproduce the analyses, place the data in project directories organized as described below. Several of the R files call functions held within other R files or call Stan models to be compiled and run. These functions and models will only run if the dependent files are organized accordingly.
The data and code are also stored, already organized in the appropriate structure for running the models, in a GitHub repository: https://github.com/jensculrich/urban_pollinator_occupancy_model/tree/main
data are stored in ./data/:
- flower_resources_herb_quadrat.csv (the data that was collected on flower counts in the herb restoration or analogous lawn space)
- flower_resources_woody.csv (the data that was collected on flower counts from woody plants)
- land_cover_by_site_reduced.csv (lat/long and landscape buffer impervious surface and tree canopy cover summaries for each of the 18 park sites)
- pollinator_data.csv (pollinator detections and the plants that they were recorded interacting with)
- metadata.docx (contains detailed descriptions of column names from each data file)
- clarkia_pollination_data_2022.csv (pollen limitation data)
- appendix_s2.csv (information about the species included in our study: number of species detections, specialization, and phenology metrics)
METADATA
Metadata for data from: “Habitat restorations in an urban landscape rapidly assemble diverse pollinator communities that persist”
Prepared by Jens Ulrich, May-October 2024
This file contains metadata on the key pieces of data accompanying the manuscript:
- flower_resources_herb_quadrat.csv (the data that was collected on flower counts in the herb enhancements or analogous lawn space in the urban parks)
- flower_resources_woody.csv (the data that was collected on flower counts from existing woody plants in the urban parks)
- land_cover_by_site_reduced.csv (lat/long and landscape buffer impervious surface and tree canopy cover summaries for each of the eighteen urban park sites)
- pollinator_data.csv (pollinator detections from the urban parks, and the plants that they were recorded interacting with)
- clarkia_pollination_data_2022.csv (seed set for control and supplemented flowers from the pollen limitation experiment)
- reduced_pollinator_plant.csv. These are the publicly available interaction data, from the source cited below, that were used to supplement the interaction data gathered within our study system. We pared down the original data set provided by Guzman et al., removing the “Shrub-Steppe” interactions, which are from a different biogeographic region. In the R code we also remove interactions south of 48 degrees latitude. This further constrained the interaction detections to a shared biogeographical context. See the following DOI for the original authors' data column descriptions. Guzman, L. M., T. Kelly, and E. Elle. 2023. “A Data Set for Pollinator Diversity and Their Interactions with Plants in the Pacific Northwest.” Ecology 104 (2): e3927. https://doi.org/10.1002/ECY.3927.
- appendix_s2.csv - pollinator species information. These data summarize the interaction specialization metrics estimated from our interaction network as well as species-specific phenology as estimated by our occupancy model.
flower_resources_herb_quadrat.csv
YEAR: year in which data was collected. 1 == 2021, 2 == 2022, 3 == 2023.
SAMPLING_ROUND: visit within year in which data was collected. Ranges from 1 – 7. Round 1 in year 1 was a practice round that we did not include (we also didn’t have permission to lethally sample pollinators at the beginning of this round). In year 2 we visited some sites a 7th time but did not include these data in our analysis, because Stan cannot handle NA values and it was difficult to program around the unequal number of visits.
DATE: date on which survey was conducted.
SITE: park site name.
SPECIES: plant species recorded.
NUM_FLORAL_UNITS: number of open floral units recorded for species across all 20 quadrats for the visit.
flower_resources_woody.csv
YEAR: year in which data was collected. 1 == 2021, 2 == 2022, 3 == 2023.
SAMPLING_ROUND: visit within year in which data was collected.
DATE: date on which survey was conducted.
SITE: park site name.
SPECIES: plant species recorded.
NUM_FLORAL_UNITS: number of open floral units recorded (counted or estimated) for the species on woody plants across the park transects for the visit.
SHRUB_OR_TREE: in the field we also counted plants when we were unsure whether they should be classed as a woody tree/shrub. After the surveys were done, the main author (JU) added “y” if the species should indeed be considered a tree/shrub.
land_cover_by_site_reduced.csv
SITE: park site name.
Latitude: latitude of the site centroid.
Longitude: longitude of the site centroid.
Category: herbaceous enhancement or site used as a control.
mean_herb_scaled: mean herbaceous enhancement flower abundance across all surveys*years. Z-score scaled.
mean_woody_scaled: mean woody plant flower abundance across all surveys*years. Z-score scaled.
mean_herb: mean herbaceous enhancement flower abundance across all surveys*years. Raw (unscaled) value.
mean_woody: mean woody plant flower abundance across all surveys*years. Raw (unscaled) value.
imp_standardized: area of impervious surface in the buffer relative to the area of the buffer (which is not always the full 500 m buffer, because some sites have water bodies in the area and we didn’t want to penalize for this).
canopy_standardized: area of tree canopy cover in the buffer relative to the area of the buffer (which is not always the full 500 m buffer, because some sites have water bodies in the area and we didn’t want to penalize for this).
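The relationship between the raw and z-score scaled columns above (e.g., mean_herb and mean_herb_scaled) can be illustrated with a minimal sketch. The values below are hypothetical, and dividing by the sample standard deviation (n - 1, as R's scale() does by default) is an assumption about how the scaling was done rather than a description of the authors' code.

```python
from statistics import mean, stdev


def z_score(values):
    """Centre values on their mean and divide by the sample standard deviation (n - 1)."""
    centre = mean(values)
    spread = stdev(values)  # assumption: sample SD, matching R's scale() default
    return [(v - centre) / spread for v in values]


# Hypothetical mean flower abundances for four sites (not real data).
mean_herb = [2.0, 4.0, 6.0, 8.0]
mean_herb_scaled = z_score(mean_herb)
print(mean_herb_scaled)
```

After scaling, the column has mean 0 and standard deviation 1, so model coefficients are expressed per standard deviation of flower abundance.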
pollinator_data.csv
YEAR: year in which data was collected. 1 == 2021, 2 == 2022, 3 == 2023.
SAMPLING_ROUND: visit within year in which data was collected.
DATE: date on which survey was conducted.
SITE: park site name.
CLADE: Syrphidae (hover flies; included in analyses), Anthophila (wild bees; included in analyses), or other. We sometimes captured other flies or wasps when we were unsure of their identity in the field; we confirmed IDs in the lab and filtered these records out in R if they belonged to neither of the two focal groups.
SPECIES: pollinator species identification. If undetermined, then CLADE and SEX are listed as NA.
SEX: m == male, f == female, NA == undetermined or no specimen.
COLLECTED_SPECIMEN: y == we brought back to the lab to ID, physical specimen is associated; n == identified in the field, no physical specimen is associated.
UNIQUE_SPECIMEN_ID: unique year/# label given to the pollinator. This is printed on the labels for all physical specimens. Note that in 2021 we only assigned unique specimen ID to the specimens we collected and brought back to the lab for identification. In 2021 the unique specimen ID is NA if we didn’t actually collect a specimen. In 2022 and 2023 we assigned a number for every single specimen including the uncollected specimens, which helped us keep a sense of the total number of specimens collected during the course of the field collection season.
PLANT_NETTED_FROM_FAMILY: Family of the plant that the pollinator was interacting with.
PLANT_NETTED_FROM_GENUS: Genus of the plant that the pollinator was interacting with.
PLANT_NETTED_FROM_SCI_NAME: Genus species name of the plant that the pollinator was interacting with.
NOTES: notes about the row, if any. For a small number of site visits we did not detect any pollinators. We kept a row in the data sheet with the site name and visit date but with pollinator species ID/specimen ID as NA; the notes indicate that we did visit the site but that no pollinators were detected. Additionally, some rows are labelled “extra label made”. After completing sampling, one field researcher counted the number of pollinator specimens collected off of a certain flower; the other researcher recorded this number and made a label for each potential pollinator from this group. If the first field researcher counted more pollinators than we actually collected, then an extra label was mistakenly made (with no associated pollinator). In the data sheet, the pollinator and plant information in the corresponding row was backfilled with NAs. In other cases, the field researcher noted fewer specimens than we actually collected from a plant. In this case an extra row was added to the data sheet, with a unique specimen ID containing a decimal point, e.g., if two pollinators were collected from the same flower in 2022 but the researcher only noted 1 in the field, we expanded a row in the data sheet into specimens jcu2022_1 and jcu2022_1.1.
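As a rough illustration of how the CLADE, SPECIES, and NOTES conventions above interact, the sketch below keeps only identified bee and hover fly records. The rows are hypothetical (site and species names are placeholders), only a subset of the columns is shown, and treating missing values as the literal string "NA" is an assumption; the authors' actual filtering is done in their R code.

```python
import csv
import io

FOCAL_CLADES = {"Syrphidae", "Anthophila"}

# Hypothetical rows using a subset of the pollinator_data.csv columns (not real data).
SAMPLE = """YEAR,SAMPLING_ROUND,SITE,CLADE,SPECIES,NOTES
1,2,site_a,Anthophila,species_a,NA
1,2,site_a,other,species_x,NA
1,3,site_b,NA,NA,no pollinators detected
2,1,site_b,Syrphidae,species_b,NA
"""


def focal_detections(rows):
    """Keep rows that record an identified bee or hover fly."""
    return [
        row for row in rows
        if row["CLADE"] in FOCAL_CLADES and row["SPECIES"] != "NA"  # assumption: NA stored as text
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = focal_detections(rows)
print([(row["SITE"], row["SPECIES"]) for row in kept])
```

Rows for visits with no detections are dropped here as detections, but they still document that the site was surveyed, which matters when building the detection/non-detection array.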
clarkia_pollination_data_2022.csv
FLOWER_UNIQUE_ID: each flower in the experiment was given a unique ID number.
SITE: the park where the flower was placed.
SITE_TREATMENT: control == no park restoration; treatment == park was restored.
POT_NUMBER: plants were grown seven per pot. We tracked plant groupings by pot.
PLANT_NUMBER: Within each pot we used 4 plants for the experiment. Labeling as 1:4 allowed us to group pairs of flowers (control versus supplement) from the same plant.
FLOWER_TREATMENT: control == open pollination; treatment == hand pollinated.
SEEDS_PRODUCED: Number of seeds produced by the fruiting capsule of the flower. NA indicates that the flower did not produce a viable seed count: either the seed pod was damaged or not recovered, or, in one instance, the labelled packet contained two different seed pods.
BOTH_FLOWERS_OPEN_AND_POLLEN_APPLIED: Did both flowers open up during the experimental exposure? Did we catch the stigma while it was receptive for the pollen limitation treatment? If yes, good; if no, we couldn’t use the pair for the analysis – see R code (prep_data_pollen_limitation.R).
BOTH_CAPSULES_RECOVERED: Did both capsules make it back to the common garden and through development without any physical damage? If yes, good; if no, we couldn’t use the pair for the analysis – see R code (prep_data_pollen_limitation.R).
COUNT_NOTES: Any notes about the seed counts – if the pod was damaged, any confusion about the identity of the seed pod or no packet with seed pod in the lab. If the seed pod was missing at the common garden (possible herbivory) or if we knew a priori that the flower pair was not open/receptive with the treatment flower successfully pollinated we did not spend time collecting the seed pod for count so the seed packet is missing. NA indicates no notes about the seed counts.
PL: 1 – (seeds from the open pollinated (control) flower / seeds from the hand pollinated (treatment) flower). Defined by Larson and Barrett (2000); it describes the proportional shortfall in seed production relative to what the flower could have produced if pollination was optimal.
PL_INDEX: the complement of the above (1 – PL), i.e., open pollinated/hand pollinated. This describes the proportion of seeds produced relative to the hand pollinated flower. Note that PL and PL_INDEX have identical values for both the control and treatment flower on an individual plant in the same pot at the same site, because the value is derived from the seeds produced by the two flowers. For the analysis we simply filter out the treatment flowers so we have one row of PL_INDEX (the response) per pair of flowers. Before the analysis we filtered out flower pairs with PL_INDEX > 2 as potential experimental errors, and flower pairs where one or more flowers did not produce a viable seed count (SEEDS_PRODUCED == NA) – see R code (prep_data_pollen_limitation.R).
DATE_PLACED_AT_SITE: Date that the pot was dropped off at a field site.
DATE_RETURNED: Date that the pot was returned from a field site.
DATE_FLOWER_OPEN_CONFIRMED: we confirmed that a flower was open and receptive at least at some point during the window between the placed and returned dates. NA indicates that the flower receptive period was never confirmed during the exposure. If either flower from a pair was NA, we did not include the pair in the analysis.
DATE_POLLEN_APPLIED: Date at which the first dose of pollen was applied. We applied additional doses if the stigma still appeared receptive on additional site visits. We visited sites 4 total times over the exposure period to apply pollen supplements. NA indicates that the flower was never receptive during visits and hence no pollen was applied OR NA for all control plants because we never applied pollen to control plants.
TREATMENT_NOTES: Any notes on the pollination treatment. NA indicates no notes.
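The PL and PL_INDEX definitions above, together with the two pre-analysis filters, can be sketched as follows. The seed counts are hypothetical, and skipping pairs whose hand pollinated flower produced zero seeds (where the ratio is undefined) is an assumption of this sketch, not something stated in the metadata; see prep_data_pollen_limitation.R for the authors' actual steps.

```python
def pl_index(open_seeds, hand_seeds):
    """Seeds from the open pollinated flower relative to its hand pollinated pair."""
    return open_seeds / hand_seeds


def usable_pairs(pairs, max_index=2):
    """Return (PL_INDEX, PL) for each pair that passes the filters described above."""
    kept = []
    for open_seeds, hand_seeds in pairs:
        if open_seeds is None or hand_seeds is None:
            continue  # one flower has no viable seed count (NA)
        if hand_seeds == 0:
            continue  # assumption: ratio undefined, so the pair is skipped
        index = pl_index(open_seeds, hand_seeds)
        if index > max_index:
            continue  # treated as a potential experimental error
        kept.append((index, 1 - index))
    return kept


# Hypothetical (open pollinated, hand pollinated) seed counts; None stands for NA.
pairs = [(30, 40), (None, 35), (90, 30), (40, 40)]
print(usable_pairs(pairs))
```

A PL of 0 (PL_INDEX of 1) means the open pollinated flower set as many seeds as its hand pollinated pair, i.e., no evidence of pollen limitation for that plant.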
appendix_s2.csv - pollinator species information
species – the scientific name of the pollinator species
clade – Anthophila indicates bees, Syrphidae indicates hoverflies
total captures – the total number of times a species was collected during the study.
phenology peak – the species specific phenology peak estimated by our model. A value of zero indicates that species detectability (and presumably flight window) peaks at the mean survey date across our visits within years (approximately mid June). A negative value indicates that a species was estimated to reach a peak detection earlier in the season versus a positive value indicating that a species was estimated to reach a peak detection later in the season.
phenology decay – the species specific phenology decay estimated by our model. A value of zero would indicate no change in detectability throughout the season. A negative value indicates that the species becomes less detectable before or after the peak detection date. A more negative value indicates that the detection rate declines more rapidly around the peak detection date (i.e., that the species has a short flight season).
pollen_specialization – information about oligolecty classification wherever available, taken from https://jarrodfowler.com/pollen_specialist.html. We didn’t use this in our models.
The following 6 metrics are calculated using only our internal interaction data from Vancouver city parks.
degree – the number of plant species that a pollinator species was recorded interacting with.
normalised_degree – the number of plant species that a pollinator species was recorded interacting with divided by the total number of plant genera interacted with by any pollinator species in our dataset.
d – the species specific specialization metric Bluthgen’s d (d’) as estimated using the bipartite package in R. Calculated after grouping plants by genus.
degree_scaled – z-score scaled degree.
normalised_degree_scaled – z-score scaled normalised_degree
d_scaled – z-score scaled d
The above 6 metrics were recalculated with a “supplemented” tag at the end of the column name (e.g., “degree_supplemented”) using our internal interaction data from Vancouver city parks combined with the external data from the publicly available dataset on pollinator interactions from our region (see metadata list of files for that dataset).
The above 6 metrics were then recalculated with a “supplemented_genus” tag at the end of the column name (e.g., “degree_supplemented_genus”) using our internal interaction data from Vancouver city parks combined with the external data from the publicly available dataset on pollinator interactions from our region (see metadata list of files for that dataset) BUT with plants grouped by genus before the metrics were calculated. These are the metrics we used in our final analyses presented in the manuscript. We grouped by genus to resolve some disagreement or uncertainty in species level identifications across the datasets and also to avoid overinflating generalization based on interactions with many closely related species.
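To make the degree-based metrics and the genus grouping concrete, here is a minimal sketch. The interaction records and names are hypothetical placeholders, and the sketch covers only degree and normalised degree after grouping plants by genus; Bluthgen's d' is not computed here (the authors estimated it with the bipartite package in R).

```python
from collections import defaultdict

# Hypothetical (pollinator species, plant scientific name) records (not real data).
interactions = [
    ("pollinator_a", "Genus1 alpha"),
    ("pollinator_a", "Genus1 beta"),
    ("pollinator_a", "Genus2 gamma"),
    ("pollinator_b", "Genus3 delta"),
]


def plant_genus(scientific_name):
    """Take the genus as the first word of a 'Genus species' name."""
    return scientific_name.split()[0]


def degree_by_genus(records):
    """Number of distinct plant genera each pollinator was recorded interacting with."""
    partners = defaultdict(set)
    for pollinator, plant in records:
        partners[pollinator].add(plant_genus(plant))
    return {pollinator: len(genera) for pollinator, genera in partners.items()}


def normalised_degree(records):
    """Degree divided by the number of plant genera visited by any pollinator."""
    all_genera = {plant_genus(plant) for _, plant in records}
    return {p: d / len(all_genera) for p, d in degree_by_genus(records).items()}


print(degree_by_genus(interactions))
print(normalised_degree(interactions))
```

Grouping by genus means that pollinator_a's visits to two congeneric plants count once, which is the behaviour the genus-level metrics are meant to capture.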
Software
To conduct the occupancy model analysis:
- navigate to ./dynamic_occupancy_model/run_model/run_model.R
- the run_model.R file will access a prep_data function from prep_data.R to organize the data into the appropriate types, including an array of detection/non-detection for [species,site,year,survey] and the covariates (including herbaceous restoration, woody plants, specialization).
- specify the appropriate Stan model to estimate posterior distributions for unknown parameters and then call Stan using rstan to run the model. The model used for the submitted version of the manuscript is labelled: ./dynamic_occupancy_model/models/final_model.stan
- model diagnostics and posterior distribution summaries can be pulled out at the end of the run_model.R file.
- navigate to ./dynamic_occupancy_model/PPCs/PPCs.R to run posterior predictive checks.
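The detection/non-detection array mentioned in the steps above can be pictured with a small sketch. This is not the authors' prep_data function (which is written in R); the records and names are hypothetical, and the sketch assumes the year and visit indices have already been recoded to 1..3 and 1..6 after dropping the excluded rounds.

```python
def detection_array(records, species, sites, n_years=3, n_visits=6):
    """Build a nested list indexed [species][site][year][visit] of 0/1 detections."""
    species_index = {name: i for i, name in enumerate(species)}
    site_index = {name: i for i, name in enumerate(sites)}
    y = [
        [[[0] * n_visits for _ in range(n_years)] for _ in sites]
        for _ in species
    ]
    for sp, site, year, visit in records:
        # Several specimens of a species on one visit collapse to a single detection.
        y[species_index[sp]][site_index[site]][year - 1][visit - 1] = 1
    return y


# Hypothetical (species, site, year, visit) detection records (not real data).
records = [
    ("species_a", "site_a", 1, 1),
    ("species_a", "site_a", 1, 1),
    ("species_b", "site_b", 3, 6),
]
y = detection_array(records, ["species_a", "species_b"], ["site_a", "site_b"])
print(y[0][0][0])  # visits in year 1 at site_a for species_a
```

Every cell without a record stays 0, which is why rows documenting visits with no pollinators detected are still needed to know that the site was surveyed.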
To conduct the pollen limitation analysis:
- navigate to ./pollen_limitation_experiment/analysis/run_model_pollen_limitation.R
- the run_model_pollen_limitation.R file will access a prep_data function from prep_data_pollen_limitation.R to organize the data into the appropriate types including array of pollen limitation outcomes and the covariates (including herb restoration).
- the prep_data function will access the pollen limitation data which is stored at ./pollen_limitation_experiment/data/clarkia_pollination_data_2022.csv.
- specify the appropriate Stan model to estimate posterior distributions for unknown parameters and then call Stan using rstan to run the model. The model used for the submitted version of the manuscript is labelled: ./pollen_limitation_experiment/models/logistic_model_pollen_limitation.stan
- model diagnostics and posterior distribution summaries can be pulled out at the end of the run_model_pollen_limitation.R file.
- posterior predictive checks can be pulled out at the end of the run_model.R file.
- navigate to ./pollen_limitation_experiment/figures/make_figures.R to recreate the pollen limitation figure from the paper (Figure S23).
To recreate the main figures in the manuscript:
- Figure 1 (site map with site predictors) was manually created in QGIS using the “land_cover_by_site_reduced.csv” data file.
- Figure 2 (histogram of species specialization d’) is created in the plot_colonization_persistence_initocc.R file.
- Figure 3 (effects of restoration on pollinator dynamics) is created in the plot_colonization_persistence_initocc.R file.
- Figure 4 (effects of existing woody plants on pollinator dynamics) is created in the plot_colonization_persistence_initocc.R file.
- Figure 5 (observed and estimated species richness) is created in the plot_species_richness.R file.
The dataset includes:
(1) Pollinator detection data. Pollinator detection data was collected by visiting urban park sites and sweep netting for insects for 20 minutes on 6 visit occasions per year for three years (2021, 2022, and 2023).
(2) Flower resource data from herbaceous restoration areas or, in control sites, managed herbaceous turfgrass. These data were collected by placing a transect of twenty 1m squared quadrats through the restored or turfgrass area and counting the number of flowers of each plant species within the quadrats. These data were collected on each of the 6 visits per year for three years (2021, 2022, and 2023).
(3) Flower resource data from existing woody plants within each park. These data were collected by walking complete transects through the park space and counting (or estimating) every flower on all woody trees and shrubs. These data were collected on each of the 6 visits per year for three years (2021, 2022, and 2023).
(4) Land cover data. These data summarize the landscape cover surrounding each site. The original landcover data were sourced from publicly available Metro Vancouver GIS resources, cited in the text and code. In addition, these data include the latitude and longitude of the urban park sites as well as the categorical classification of whether the sites were restored or not.
(5) A subset of pollinator interaction data from a publicly available published dataset. We subsetted the original data to the interactions occurring within our local area. The full, publicly available published dataset is cited in the text and code.
(6) Appendix S2 - pollinator species information. These data summarize the interaction specialization estimated by our interaction network as well as species-specific phenology as estimated by our model.
(7) Pollination data. These data describe the seed set of paired pollen-supplemented and naturally-pollinated Clarkia amoena plants. These data were obtained by placing plants at 11 of the 18 field sites during flowering. The ripened fruiting capsules were then harvested and dissected, and seeds were counted.
The dataset here also includes novel code written to analyze these data. In summary, we transformed the pollinator detection data into an array of binary species/site/year/visit detections (0 = not detected, 1 = detected). We wrote a custom dynamic occupancy model in the programming language Stan, which estimated the interannual transitions in species occurrences while accounting for imperfect detection (false negatives) of species occurrence. Separately, we wrote a custom logistic regression model, also in Stan, to estimate the effect of park site restoration on the probability that a flower (of the annual plant species Clarkia amoena) is pollen limited. The code can also be accessed and run from a GitHub repository: https://github.com/jensculrich/urban_pollinator_occupancy_model.
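For readers unfamiliar with dynamic occupancy models, the sketch below shows the generic textbook recursion that links occurrence across years through colonization and persistence, and how repeat visits reduce false negatives. It is not the authors' Stan model, which additionally includes covariates and species-level structure; the function names and parameter values are hypothetical.

```python
def occupancy_trajectory(psi_first_year, colonization, persistence, n_years):
    """Expected occupancy probability in each year for one species at one site."""
    psi = [psi_first_year]
    for _ in range(n_years - 1):
        previous = psi[-1]
        # Occupied sites persist; unoccupied sites may be colonized.
        psi.append(previous * persistence + (1 - previous) * colonization)
    return psi


def prob_detected_at_least_once(detection_prob, n_visits):
    """Chance of at least one detection over repeat visits to an occupied site."""
    return 1 - (1 - detection_prob) ** n_visits


# Hypothetical parameter values chosen only for illustration.
print(occupancy_trajectory(0.2, colonization=0.2, persistence=0.8, n_years=3))
print(prob_detected_at_least_once(0.3, n_visits=6))
```

When persistence plus colonization keeps pulling occupancy upward, as in this hypothetical example, occurrence rises across years toward the equilibrium colonization / (colonization + 1 - persistence).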