Processed data for the analysis of human mobility changes from COVID-19 lockdown on bird occupancy in North Carolina, USA
Data files
Mar 28, 2024 version files 84.29 MB
Abstract
The worldwide COVID-19 pandemic lockdown provided a unique research opportunity for ecologists to investigate the human-wildlife relationship under abrupt changes in human mobility, a period also known as the Anthropause. Here we chose 15 common non-migratory bird species with different levels of synanthropy, and we aimed to compare how human mobility changes could influence the occupancy of fully synanthropic species such as House Sparrow (Passer domesticus) versus casual to tangential synanthropic species such as White-breasted Nuthatch (Sitta carolinensis). We extracted data from the eBird citizen science project during three study periods in the spring and summer of 2020, when human mobility changed unevenly across counties in North Carolina. We used the COVID-19 Community Mobility Reports from Google to examine how changes in community mobility towards workplaces, an indicator of overall human movement at the county level, could influence bird occupancy.
README: Processed data for the analysis of human mobility changes on bird occupancy in NC
https://doi.org/10.5061/dryad.gb5mkkwxr
There are 3 types of data here: Google Community Mobility data, processed data for Occupancy Modeling (data after extracting spatial covariates and merging with all other covariates), and predicted occupancy data that we extracted to create figures.
Description of the data and file structure
Google Community Mobility data: This is the dataset downloaded from https://www.google.com/covid19/mobility/ that measures the mobility changes throughout the world during the COVID-19 lockdown. Please visit the above website for more information about the data. Please see the "Anthropause_AMCR_02112024" R file (uploaded to Zenodo) for details on how we processed the raw data.
Dataset name | Dataset description |
---|---|
2020_US_Region_Mobility_Report.csv | This dataset contains the percent change in mobility at the county level in the United States. |
Variable Name | Variable description |
---|---|
country_region_code | The country or region code. It is only "US" in this dataset because all entries are from the United States. |
country_region | The country or region. Only "United States" in this dataset. |
sub_region_1 | The state name such as "North Carolina". |
sub_region_2 | The county name such as "Orange County". |
metro_area | Google did not define this field, and the column contains no data. |
iso_3166_2_code | Google did not define this field, and the column contains no data. |
census_fips_code | The Federal Information Processing Series (FIPS) code, representing the county subdivisions and places. This data is not used for the manuscript. |
place_id | Google did not define this field, and this data is not used for the manuscript. |
date | The date for the community mobility percentage change. |
retail_and_recreation_percent_change_from_baseline | The percentage change of mobility towards retail and recreation places compared to the baseline period (Jan 3 to Feb 6, 2020). |
grocery_and_pharmacy_percent_change_from_baseline | The percentage change of mobility towards grocery stores and pharmacies compared to the baseline period (Jan 3 to Feb 6, 2020). |
parks_percent_change_from_baseline | The percentage change of mobility towards parks compared to the baseline period (Jan 3 to Feb 6, 2020). |
transit_stations_percent_change_from_baseline | The percentage change of mobility towards transit stations compared to the baseline period (Jan 3 to Feb 6, 2020). |
workplaces_percent_change_from_baseline | The percentage change of mobility towards workplaces compared to the baseline period (Jan 3 to Feb 6, 2020). This is the data used in the manuscript. |
residential_percent_change_from_baseline | The percentage change of mobility towards residential places compared to the baseline period (Jan 3 to Feb 6, 2020). |
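As a rough illustration of how the columns above fit together, the following Python sketch keeps only the county-level North Carolina rows and reads the workplaces column. It is not the authors' R workflow (see the Zenodo R file for that). The sample rows are made up, the date format is an assumption, and treating blank cells as missing values is also an assumption.

```python
import csv
import io

def nc_workplace_rows(csv_file, state="North Carolina"):
    """Return (county, date, workplaces % change) for county-level rows of one state."""
    rows = []
    for row in csv.DictReader(csv_file):
        if row["sub_region_1"] != state or not row["sub_region_2"]:
            continue  # skip other states and rows without a county name
        value = row["workplaces_percent_change_from_baseline"]
        if value == "":
            continue  # assumption: a blank cell means no value was reported
        rows.append((row["sub_region_2"], row["date"], float(value)))
    return rows

# Made-up sample that reuses the column names from the table above.
sample = io.StringIO(
    "sub_region_1,sub_region_2,date,workplaces_percent_change_from_baseline\n"
    "North Carolina,,2020-04-15,-35\n"
    "North Carolina,Wake County,2020-04-15,-30\n"
    "North Carolina,Orange County,2020-04-15,\n"
    "Virginia,Fairfax County,2020-04-15,-40\n"
)
nc_rows = nc_workplace_rows(sample)
print(nc_rows)
```

With the real file, the same function could be called on `open("2020_US_Region_Mobility_Report.csv", newline="")`.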
Processed data for Occupancy Modeling: These are the datasets after processing in R and ArcGIS; they were used directly for Occupancy Modeling. There are 45 datasets in total for the 15 study species, and each species has 3 datasets representing the before, during, and after lockdown periods. The naming convention of the 45 datasets is as follows: [4-letter species code] + [lockdown period] + [postGIS]. The lockdown periods are before, during, and after the lockdown. The word "postGIS" is in the name of all 45 datasets, indicating that they were processed in ArcGIS. The 4-letter species codes are below.
Red-bellied Woodpecker (Melanerpes carolinus) - RBWO,
Downy Woodpecker (Dryobates pubescens) - DOWO,
Blue Jay (Cyanocitta cristata) - BLJA,
Northern Mockingbird (Mimus polyglottos) - NOMO,
Tufted Titmouse (Baeolophus bicolor) - TUTI,
Carolina Chickadee (Poecile carolinensis) - CACH,
White-breasted Nuthatch (Sitta carolinensis) - WBNU,
Carolina Wren (Thryothorus ludovicianus) - CARW,
Northern Cardinal (Cardinalis cardinalis) - NOCA,
Mourning Dove (Zenaida macroura) - MODO,
Red-shouldered Hawk (Buteo lineatus) - RSHA,
American Crow (Corvus brachyrhynchos) - AMCR,
House Sparrow (Passer domesticus) - HOSP,
European Starling (Sturnus vulgaris) - EUST,
House Finch (Haemorhous mexicanus) - HOFI.
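The naming convention can be enumerated with a short Python sketch. The species codes and period names come from this README, but the underscore separator, the capitalization of the period names, and the ".csv" extension are assumptions, so please check the generated names against the actual files before relying on them.

```python
# Species codes listed in this README.
SPECIES_CODES = [
    "RBWO", "DOWO", "BLJA", "NOMO", "TUTI", "CACH", "WBNU", "CARW",
    "NOCA", "MODO", "RSHA", "AMCR", "HOSP", "EUST", "HOFI",
]
PERIODS = ["before", "during", "after"]

def dataset_names(sep="_", ext=".csv"):
    """Build [species code] + [lockdown period] + [postGIS] names.

    `sep` and `ext` are hypothetical; the README does not state them.
    """
    return [
        f"{code}{sep}{period}{sep}postGIS{ext}"
        for code in SPECIES_CODES
        for period in PERIODS
    ]

names = dataset_names()
print(len(names), names[:3])
```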
Variable name | Variable descriptions |
---|---|
OBJECTID | the unique location ID for each eBird checklist location. |
y.1 to y.10 | the presence or absence of observing the given species. These ten measures are the "repeat visits" for each sampling location. "TRUE" represents presence and "FALSE" represents absence. NA represents no data available (the eBird observer didn't make another checklist). |
Retail | community mobility change (%) for travel to retail places. |
Workplaces | community mobility change (%) for travel to workplaces. |
Residential | community mobility change (%) for travel to residential places. |
time_observations_started.1 to time_observations_started.10 | the time when the eBird observer started the checklist for each "repeat visit" at a sampling location. |
duration_minutes.1 to duration_minutes.10 | the duration of the checklist in minutes for each "repeat visit" at a sampling location. |
number_observers.1 to number_observers.10 | the number of observers for a checklist for each "repeat visit" at a sampling location. |
Percent_forest_cover | the percentage of forest cover for each sampling location. |
Percent_development_cover | the percentage of development cover for each sampling location. |
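To show how the y.1 to y.10 columns can be read, here is a small Python sketch that turns one row into a detection history (1 = detected, 0 = not detected, None = no visit) and computes the naive share of sites with at least one detection. The rows are made up, and the naive share is only a descriptive check: it ignores imperfect detection and is not the occupancy estimate produced by the models.

```python
def parse_detection(value):
    """Map the README's coding to 1 (TRUE), 0 (FALSE), or None (NA or missing)."""
    if value == "TRUE":
        return 1
    if value == "FALSE":
        return 0
    return None

def detection_history(row, max_visits=10):
    """Collect y.1 ... y.10 for one sampling location."""
    return [parse_detection(row.get(f"y.{i}", "NA")) for i in range(1, max_visits + 1)]

def naive_occupancy(rows):
    """Share of surveyed sites with at least one detection."""
    histories = [detection_history(r) for r in rows]
    surveyed = [h for h in histories if any(v is not None for v in h)]
    if not surveyed:
        return None
    detected = sum(1 for h in surveyed if 1 in h)
    return detected / len(surveyed)

# Made-up rows using the column names described above.
rows = [
    {"OBJECTID": "1", "y.1": "TRUE", "y.2": "FALSE", "y.3": "NA"},
    {"OBJECTID": "2", "y.1": "FALSE", "y.2": "FALSE"},
    {"OBJECTID": "3", "y.1": "FALSE", "y.2": "TRUE", "y.3": "TRUE"},
]
first_history = detection_history(rows[0])
naive = naive_occupancy(rows)
print(first_history, naive)
```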
Predicted occupancy data: These four datasets contain the predicted occupancy of AMCR, RSHA, HOSP, and WBNU. They were used to create figures.
Dataset name | Dataset description |
---|---|
Predicted_AMCR_all.csv | The predicted occupancy of American Crow. |
Predicted_HOSP_all.csv | The predicted occupancy of House Sparrow. |
Predicted_RSHA_all.csv | The predicted occupancy of Red-shouldered Hawk. |
Predicted_WBNU_all.csv | The predicted occupancy of White-breasted Nuthatch. |
Variable name | Variable descriptions |
---|---|
Predicted | The predicted occupancy estimates of a given species. |
Workplaces | The percent mobility change in a county in NC compared to a base level pre-pandemic (higher values indicate lower changes and relatively higher mobility). |
Percent_forest_cover | The percentage of forest cover at each sampling location. |
Percent_develop_cover | The percentage of development cover at each sampling location. |
Lockdown | The category of lockdown period (1_Before lockdown; 2_During lockdown; 3_After lockdown). |
Sharing/Access information
Data was derived from the following sources:
- https://www.ebird.org
- https://www.google.com/covid19/mobility/
- https://www.nconemap.gov/datasets/ncagr::forest-land-cover-2016/about
Code/Software
R files available via our linked Zenodo submission.
There is one R file for the data processing and data analysis using American Crow as an example ("Anthropause_AMCR_02112024.R"). The other 14 species went through the same procedure, which can be repeated by replacing the AMCR data with another species' data and replacing "AMCR" with that species' 4-letter code. The other R file ("Anthropause_figure_02112024.R") is for making figures of the predicted occupancy of AMCR, RSHA, HOSP, and WBNU.
Methods
Our source of bird data was eBird, a global citizen science project run by the Cornell Lab of Ornithology. We used the COVID-19 Community Mobility Reports by Google to represent the pause of human activities at the county level in North Carolina. These data are publicly available and were last updated on 10/15/2022. We used forest land cover data from NC One Map, a high-resolution (1-meter pixel) raster derived from 2016 imagery, to represent canopy cover at each eBird checklist location. We also used the raster data of the 2019 National Land Cover Database to represent the degree of development/impervious surface at each eBird checklist location. For all three measurements, we used the highest resolution that was available.
We downloaded the eBird Basic Dataset (EBD) that contains the 15 study species from February to June 2020. We also downloaded the sampling event data, which contains the checklist effort information. First, we used the R package auk (version 0.6.0) in R (version 4.2.1) to filter the data under the following conditions: (1) Date: 02/19/2020 - 03/29/2020; (2) Checklist type: stationary; (3) Complete checklist; (4) Time: 07:00 am - 06:00 pm; (5) Checklist duration: 5-20 mins; (6) Location: North Carolina. After filtering the data, we used the zero-fill function from auk to create detection/non-detection data for each study species in NC. Then we used the repeat visits filter from auk to keep eBird checklist locations where at least 2 checklists (max 10 checklists) had been submitted at the same location by the same observer, allowing us to create a hierarchical data frame in which both the detection and state processes can be analyzed using Occupancy Modeling. This data frame was in a matrix format in which each row represents a sampling location and the columns represent the detection and non-detection of the 2-10 repeat sampling events.
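The filtering and repeat-visit steps above can be sketched in Python as follows. This is not the auk implementation, only an illustration of the logic. The dictionary keys (`locality_id`, `observer_id`, `protocol`, `complete`, `start`, `detected`, and so on) and the helper `make_checklist` are hypothetical names chosen for this example, and the checklists are made up.

```python
from collections import defaultdict
from datetime import date, time

def passes_filters(c):
    """Apply the six conditions listed in the paragraph above."""
    return (
        date(2020, 2, 19) <= c["date"] <= date(2020, 3, 29)
        and c["protocol"] == "Stationary"
        and c["complete"]
        and time(7, 0) <= c["start"] <= time(18, 0)
        and 5 <= c["duration_minutes"] <= 20
        and c["state"] == "North Carolina"
    )

def repeat_visit_matrix(checklists, min_visits=2, max_visits=10):
    """One row per (location, observer) with 2-10 visits, padded with None."""
    groups = defaultdict(list)
    for c in checklists:
        if passes_filters(c):
            groups[(c["locality_id"], c["observer_id"])].append(c)
    matrix = {}
    for key, visits in groups.items():
        if len(visits) < min_visits:
            continue  # a single visit cannot separate detection from occupancy
        visits.sort(key=lambda c: (c["date"], c["start"]))
        detections = [c["detected"] for c in visits[:max_visits]]
        matrix[key] = detections + [None] * (max_visits - len(detections))
    return matrix

def make_checklist(loc, obs, day, detected, **overrides):
    """Hypothetical helper that builds one made-up checklist record."""
    c = {
        "locality_id": loc, "observer_id": obs, "date": day,
        "start": time(8, 30), "duration_minutes": 10,
        "protocol": "Stationary", "complete": True,
        "state": "North Carolina", "detected": detected,
    }
    c.update(overrides)
    return c

checklists = [
    make_checklist("L1", "obsA", date(2020, 3, 1), False),
    make_checklist("L1", "obsA", date(2020, 2, 20), True),
    make_checklist("L2", "obsB", date(2020, 3, 5), True),  # only one visit
    make_checklist("L3", "obsC", date(2020, 3, 10), True, duration_minutes=45),
    make_checklist("L3", "obsC", date(2020, 3, 11), False),
]
matrix = repeat_visit_matrix(checklists)
print(matrix)
```

In this made-up example only location L1 remains: L2 has a single visit, and one of the two L3 checklists fails the duration filter, which leaves L3 with one eligible visit.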
For the Google Community Mobility data, we chose the “Workplaces” category of mobility data to analyze the Anthropause effect because it was highly relevant to the pause of human activities in urban areas. The mobility data from Google is a percentage change compared to a baseline for each day. A baseline day represents a normal value for that day of the week during the 5-week period (01/03/2020-02/06/2020). For example, a mobility value of -30.0 for Wake County on Apr 15, 2020, means that overall mobility in Wake County on that day was 30% lower than on the corresponding baseline day earlier that year. Because the eBird data we used cover a range of dates rather than single days, we took the average value of mobility before lockdown, during lockdown, and after lockdown in each county in NC.
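The averaging step can be illustrated with a minimal Python sketch. The daily values and the period boundaries below are placeholders for illustration only, not the study's actual values or period definitions; the function and argument names are also hypothetical.

```python
from collections import defaultdict
from datetime import date

def mean_mobility_by_period(daily, periods):
    """Average daily % change per (county, period).

    daily:   iterable of (county, date, percent_change)
    periods: dict of period name -> (start_date, end_date), both inclusive
    """
    sums = defaultdict(lambda: [0.0, 0])
    for county, day, value in daily:
        for name, (start, end) in periods.items():
            if start <= day <= end:
                acc = sums[(county, name)]
                acc[0] += value
                acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

# Placeholder period boundaries (assumptions, not the study's definitions).
periods = {
    "before": (date(2020, 2, 19), date(2020, 3, 29)),
    "during": (date(2020, 3, 30), date(2020, 5, 8)),
}
# Made-up daily workplace mobility values for one county.
daily = [
    ("Wake County", date(2020, 3, 2), -2.0),
    ("Wake County", date(2020, 3, 3), -4.0),
    ("Wake County", date(2020, 4, 15), -30.0),
    ("Wake County", date(2020, 4, 16), -34.0),
]
means = mean_mobility_by_period(daily, periods)
print(means)
```

Here the two "before" days average to -3.0 and the two "during" days average to -32.0, so each county contributes one mobility value per period.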
For the environmental variables, we calculated the values in ArcGIS Pro (version 3.1.0). We created a 200 m buffer at each eligible eBird checklist location. For the forest cover data, we used “Zonal Statistics as Table” to extract the percentage of forest cover within each checklist location’s 200-meter circular buffer. For the National Land Cover Database (NLCD) data, we combined low-intensity, medium-intensity, and high-intensity development as development cover and used “Summarize Within” to extract the percentage of development cover using the polygon version of NLCD. We used a correlation matrix of the three predictors (workplace mobility, percent forest cover, and percent development cover) and found no collinearity. Thus, these three predictors plus the interaction between workplace mobility and percent development cover were the site covariates of the Occupancy Models. For the detection covariates, four predictors were considered: time of observation, checklist duration, number of observers, and workplace mobility. These detection covariates were also not highly correlated. We then merged all data into an unmarked data frame using the “unmarked” R package (version 1.2.5). The unmarked data frame has eBird sampling locations as sites (rows in the data frame) and repeat checklists at the same sampling locations as repeat visits (columns in the data frame).
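A correlation check of the kind described above can be sketched in Python with a hand-written Pearson coefficient. The authors ran their analysis in R, so this is only an illustration; the covariate values are made up and chosen so that the results are easy to verify by hand, and they do not reflect the real data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Pairwise correlations for a dict of column name -> list of values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Made-up site covariates (four sites), using column names from this README.
covariates = {
    "Workplaces": [-30.0, -10.0, -10.0, -30.0],
    "Percent_forest_cover": [10.0, 20.0, 30.0, 40.0],
    "Percent_development_cover": [90.0, 80.0, 70.0, 60.0],
}
corr = correlation_matrix(covariates)
for (a, b), r in corr.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: r = {r:.2f}")
```

In this toy example forest and development cover are perfectly negatively correlated, which is the kind of pair a collinearity check is meant to flag, while workplace mobility is uncorrelated with forest cover.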