Adapting to urban heights: Flexible nest-site selection strategies of a human commensal bird along climate and anthropogenic gradients
Data files
May 15, 2026 version files (133.46 KB)
- Appendix_Core_R_Code_.txt (3.22 KB)
- NestHeight_SiteMean_raw-1.csv (123.88 KB)
- README.md (6.36 KB)
Abstract
Urbanization is reshaping ecosystems worldwide, driving wildlife to navigate and adapt to novel and highly dynamic environments. The Eurasian tree sparrow (Passer montanus, ETS) serves as an exemplary human commensal, thriving in cities through exceptional behavioral and ecological flexibility. Here, we systematically investigated the nest-site selection strategies of ETSs at 645 buildings across 22 cities in northern China, integrating climatic, geographic, biotic, and anthropogenic variables at a macroecological scale. We found that both the availability and use of nest sites increased with building height, underscoring ETSs’ capacity to exploit vertical resources in dense urban landscapes. Notably, the preference for lower nest heights when nest sites were abundant suggests a strategy to reduce intraspecific competition and energy expenditure. Negative associations between nest-site use or preference and the normalized difference vegetation index indicate that ETSs favor anthropogenic over vegetated resources, likely to circumvent interspecific competition in urban green spaces. Additionally, altitudinal gradients modulated ETSs’ nesting responses: at lower elevations, higher building heights promoted nesting, whereas increased economic development (gross domestic product per cell) and noise suppressed it, signaling an avoidance of intense anthropogenic disturbance. Conversely, ETSs showed reduced competition at higher altitudes and increasingly relied on resources linked to urban prosperity. These findings reveal a nuanced behavioral adaptability in ETSs, allowing them to navigate trade-offs between anthropogenic benefits and environmental stressors across spatial gradients. Our study offers critical insights into the eco-behavioral mechanisms underlying urban adaptation and the evolution of commensalism, with important implications for biodiversity management and sustainable urban design.
Dataset DOI: 10.5061/dryad.rn8pk0ppz
Description of the data and file structure
The data collection involved systematic field surveys of Eurasian Tree Sparrow nests (availability and occupancy) on buildings across 22 northern Chinese cities over two breeding seasons, coupled with extensive collection of corresponding environmental data (climate, vegetation, human population, economic activity, light, noise, and building characteristics) from both field measurements and global/spatial databases, all georeferenced to the nest locations.
Files and variables
File: Appendix_Core_R_Code_.txt
Description: Plain-text file containing the core analysis code (SEM, mixed-effects modeling, visualization).
Can be opened with any text editor. See the “Code/software” section for required packages and workflow.
File: NestHeight_SiteMean_raw-1.csv
Description:
Variable Name (in raw data) | Description | Unit | Source
- Building_ID | Unique identifier for the building | categorical | Field collection
- City_ID | Identifier for the city (1-22) | categorical | Field collection
- Building_width | Width of the building (used for density calculation) | meters (m) | Field collection
- GPS_latitude | Latitude of the building | decimal degrees | Field collection
- GPS_longitude | Longitude of the building | decimal degrees | Field collection
- Building_height | Height of the building | meters (m) | Field collection
- Number_potential_nests | Count of potential nest sites (air conditioner inlets, etc.) on the building | count | Field collection
- Number_occupied_nests | Count of nests occupied by ETS on the building | count | Field collection
- NDa | Available nest density: Number_potential_nests / Building_width | nests per meter (nests/m) | Calculated
- NDo | Occupied nest density: Number_occupied_nests / Building_width | nests per meter (nests/m) | Calculated
- NHP | Nest height preference: Average height of occupied nests (weighted by density per building) | meters (m) | Calculated (from nest height measurements)
- Altitude | Elevation above sea level at the building location | meters (m) | WorldClim (or derived from a DEM)
- Bio10 | Mean air temperature during the warmest season (summer) | °C | WorldClim
- Bio18 | Precipitation during the warmest season (summer) | mm | WorldClim
- Wind_speed | Wind speed during the warmest season (summer) | m/s | WorldClim (or other source)
- NDVI | Normalized Difference Vegetation Index (average during warmest season) | dimensionless (range -1 to 1) | NASA VIP dataset
- NPP | Net Primary Productivity (average during warmest season) | kg/m2/s (the paper lists "kg/m2/s1", which may be a typo; NPP is more typically reported in kg/m2/year; the unit given here follows the source dataset) | CMIP6 data archive
- Nest_direction | Direction of the nest (recorded per nest in the field; how values were summarized per building is not specified) | categorical (e.g., N, NE) or degrees | Field collection
- Population_density | Human population density | persons per km2 | GPWv4
- GDP_per_cell | Gross Domestic Product per grid cell | USD | Kummu et al., 2018
- ALAN | Artificial Light at Night (digital number) | DN (0-63) | Harmonized DMSP/VIIRS
- Urban_land_extent | Fraction of urban land in the grid cell | fraction (0-1) | Gao and Pesaresi, 2021
- Noise_level | Measured noise level (at ground level below nests) | decibels (dB) | Field measurement (Sound Meter app)
*Note: The R code uses "nd" for NDVI, "gdpp" for GDP_per_cell, and "alt_group_quantile" for a grouped version of altitude.
Missing Values: Coded as NA in all datasets
Coordinate System: WGS84 (EPSG:4326) for spatial data
Temporal Coverage: Field surveys: April-July 2021-2022
Environmental variables: 2000-2020 (see sources for specifics)
Statistical Transformations:
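- Nest density variables weighted in the models as required for the SEM: NDo weighted by 1/NDa, and NHP weighted by NDo (see Materials and Methods, section 2.4)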
- Variables scaled (0-1) in R analysis
- Altitude stratified into 5 quantile groups for variance modeling
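The two transformations listed above can be sketched outside R as follows. This is a minimal illustration, not the code in Appendix_Core_R_Code_.txt: it assumes that "scaled (0-1)" means min-max rescaling and that the altitude groups are rank-based strata of near-equal size. The function names and the altitude values (apart from the reported range endpoints, -12 and 2328 m) are made up for the example.

```python
from typing import List, Optional, Sequence


def min_max_scale(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Rescale values to [0, 1]; missing entries (None) stay None.

    Assumption: the README's "scaled (0-1)" is min-max rescaling.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    if span == 0:
        # A constant column carries no gradient; map it to 0.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - lo) / span for v in values]


def quantile_groups(values: Sequence[float], n_groups: int = 5) -> List[int]:
    """Assign each value to one of n_groups rank-based strata (1..n_groups).

    Strata have near-equal sample size; tied values may be split
    across adjacent groups because assignment is by rank.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(values) + 1
    return groups


# Illustrative altitudes in meters (hypothetical except the two endpoints).
altitude = [-12.0, 35.0, 120.0, 410.0, 560.0, 890.0, 1100.0, 1500.0, 1900.0, 2328.0]
altitude_scaled = min_max_scale(altitude)
alt_group_quantile = quantile_groups(altitude, n_groups=5)
```

With ten observations and five groups, each stratum receives two buildings, which mirrors the "equal sample size" stratification described in the methods.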
Code/software
I. Free/Open-Source Software for Data Viewing
Tabular Data (CSV):
- LibreOffice Calc (v7.0+), Gnumeric, or any text editor
- Python (pandas) or R (read.csv); no specialized tools needed
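For readers who prefer to inspect the table without any third-party package, the following sketch reads a CSV with the Python standard library and converts the NA code to a missing value. It assumes the CSV column headers match the variable names listed under "Files and variables"; the two sample rows and the helper name read_rows are invented for illustration.

```python
import csv
import io


def read_rows(handle, numeric_cols):
    """Read CSV rows as dicts, converting numeric columns and mapping NA to None."""
    rows = []
    for rec in csv.DictReader(handle):
        for col in numeric_cols:
            raw = rec[col].strip()
            rec[col] = None if raw in ("NA", "") else float(raw)
        rows.append(rec)
    return rows


# Made-up sample standing in for NestHeight_SiteMean_raw-1.csv.
sample = io.StringIO(
    "Building_ID,City_ID,Building_width,Building_height,Noise_level\n"
    "B001,1,40,18,NA\n"
    "B002,1,25,NA,61.5\n"
)
rows = read_rows(sample, ["Building_width", "Building_height", "Noise_level"])
```

To use it on the real file, replace the sample with open("NestHeight_SiteMean_raw-1.csv", newline="") and extend the list of numeric columns as needed.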
Geospatial Data (TIFF/SHP):
- QGIS (v3.28+): Open-source GIS platform for raster/climate data and shapefiles
- Alternative: R terra/sf packages or Python geopandas/rasterio
Statistical Outputs:
- R/RStudio for SEM/mixed-model results
- Free PDF viewers for supplementary figures/tables
II. Analysis Software & Packages (with Versions)
Core environment: R v4.1.0+ with the following packages:
Package | Version | Purpose | Critical Functions
- nlme | 3.1-164 | Mixed-effects modeling | lme(), varIdent()
- piecewiseSEM | 2.3.0 | Structural Equation Modeling | psem(), summary()
- interactions | 1.1.5 | Johnson-Neyman analysis | johnson_neyman(), interact_plot()
- raster | 3.5-15 | Geospatial processing | brick(), crop(), mask()
- sf | 1.0-12 | Spatial vector handling | st_read(), spatial ops
- ggplot2 | 3.4.0 | Visualization | Base plotting system
- ggExtra (provides ggMarginal) | 0.3.0 | Marginal distribution plots | ggMarginal()
- gridExtra | 2.3 | Multi-panel layouts | grid.arrange()
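Working directory: Before running the script, set the R working directory to the folder containing the data file and all required spatial datasets. The script assumes all input files are located in this directory. Note that the data file is distributed here as NestHeight_SiteMean_raw-1.csv; if the script refers to NestHeight_SiteMean_raw.csv, rename the file or update the path accordingly.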
III. Code Description & Workflow
The analysis script (Appendix_Core_R_Code_.txt) performs the following steps:
- Data Preparation – Load and crop spatial climate data to the study area.
- Mixed-Effects Modeling – Fit linear mixed-effects models with altitude-stratified variance structures.
- Structural Equation Modeling – Build and reduce a piecewise SEM using the fitted models.
- Interaction Analysis – Conduct Johnson-Neyman tests for interactions (e.g., GDP × altitude).
- Visualization – Generate publication-quality figures with marginal histograms.
The script contains detailed comments explaining each block of code. To reproduce the analyses, run the script line-by-line or source it in R after installing the required packages.
Materials and Methods
2.1 Study Area and Species
We investigated the nest usage and preferences of the Eurasian tree sparrow (Passer montanus, ETS) across 22 cities in northern China (Fig. 1 in associated manuscript). These cities encompass a diverse range of natural environmental and socioeconomic conditions, including variations in elevation (-12 to 2328 m), temperature (12.2 to 20.5 °C during the breeding season), precipitation (24 to 110 mm during the breeding season), and human population density (37 to 1334 persons/km2). This diverse gradient provided a unique opportunity to examine the impact of both natural and anthropogenic factors on ETS nesting behavior.
2.2 Nest Site Surveys
Nest site surveys were conducted during the breeding seasons (April to July) of 2021 and 2022. Each city was systematically surveyed to identify ETS nests in urban areas. Building dimensions (height and width) were recorded, and potential nest sites, including air conditioner inlets and other urban infrastructure, were inspected for ETS nest presence. Surveyors recorded GPS coordinates, height above ground, and the type of structure used for nesting.
The direction of the nests was observed and recorded on-site. Available nest density (NDa) was calculated as the number of potential nest sites (e.g., open air conditioner inlets) per meter along each building (nests/building width). Similarly, occupied nest density (NDo) was determined as the number of nests actively used by adult ETSs, or identified by the presence of nesting materials and eggs, per meter along each building (nests/building width). Nest height preference (NHP) was calculated as the average height of occupied nests, weighted by density for each building.
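The density definitions above reduce to simple ratios, sketched below. This is an illustration only: the building and nest values are hypothetical, and because the exact per-nest weights used for NHP are not spelled out here, the weighted mean is shown with placeholder weights rather than the authors' weighting.

```python
def nest_density(n_nests: int, building_width_m: float) -> float:
    """Nests per meter of building width (used for both NDa and NDo)."""
    if building_width_m <= 0:
        raise ValueError("building width must be positive")
    return n_nests / building_width_m


def weighted_mean(values, weights):
    """Weighted average; raises if the weights sum to zero."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical building: 40 m wide, 16 potential sites, 6 of them occupied.
nda = nest_density(16, 40.0)  # available nest density, nests/m
ndo = nest_density(6, 40.0)   # occupied nest density, nests/m

# Hypothetical occupied-nest heights (m) with placeholder weights.
nest_heights_m = [3.0, 6.0, 9.0]
placeholder_weights = [1.0, 2.0, 3.0]
nhp = weighted_mean(nest_heights_m, placeholder_weights)
```

In the dataset itself, NDa, NDo, and NHP are already provided per building, so this calculation is only needed when checking the derived columns against the raw counts.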
2.3 Environmental Data Collection
Climatic and geographic variables were derived from published global datasets. Mean air temperature, precipitation, and wind speed during the warmest season, which is critical for breeding, along with elevation data, were sourced from the WorldClim2 database (Fick & Hijmans, 2017) (Table S1). The normalized difference vegetation index (NDVI) during the warmest season was calculated from NASA's Vegetation Index and Phenology (VIP) dataset, averaged over the period between 2000 and 2010 (0.05° climate modeling grid, CMG).
Socioeconomic data were collected from authoritative databases. Population density was obtained from the Gridded Population of the World, Version 4 (GPWv4), distributed by the Center for International Earth Science Information Network (CIESIN) at Columbia University. GDP per cell (0.083° resolution) was sourced from a gridded global dataset spanning 1990–2015 (Kummu et al., 2018) (Table S1). Noise levels were measured with the Sound Meter smartphone application (v 2.5.2) at ground level directly below the nest sites, maintaining a vertical distance of 5 meters from the building facade. Each recording session lasted 1 min, and all recordings were made between 8:00 h and 10:00 h. Artificial light at night data were obtained from the harmonized DMSP and VIIRS nighttime light dataset (Li & Zhou, 2017). The unit of measurement for DMSP NTL data is the digital number (DN), which represents the intensity of nighttime lights detected by the satellite sensor; values range from 0 to 63, with 0 indicating no detected light and 63 representing the maximum detectable light intensity. Urban land extent, representing the fraction of urban land, was acquired from the SSP-consistent global spatial urban land projections downscaled from 1/8-degree to 1-km resolution for 2000–2100 (Gao & Pesaresi, 2021). All climatic, geographic, biotic, and socioeconomic data were extracted based on the geographic coordinates of each building.
2.4 Structural Equation Model Building
We used linear mixed-effects models to analyze the relationships between nest-related variables (NDa, NDo, and NHP) and environmental factors. Fixed effects included climatic and geographic factors (temperature, precipitation, wind speed, and altitude), biotic factors (NDVI, NPP, and nest direction), and anthropogenic factors (population density, GDP per cell, artificial light at night, urban land extent, building height, and noise level). City was included as a random effect to account for city-specific variation. To accurately estimate the selection effect of nest use, we weighted the NDo models by 1/NDa and the NHP models by NDo. The data were skewed toward low and mid altitudes, with fewer high-altitude observations. To address the resulting heterogeneity of variances across the altitudinal gradient, we stratified altitude into five quantile-based groups of equal sample size. We then fitted a variance-structured linear mixed model (LMM) with the lme function in the 'nlme' package (Pinheiro et al., 2012), incorporating group-specific residual variances using varIdent(form = ~1 | alt_group_quantile).
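As a reading aid, the variance-structured model can be written out as below. This is a sketch under stated assumptions rather than the authors' exact specification: it shows a random intercept for city i and the varIdent parameterization from nlme, in which each altitude group g has its own residual standard-deviation multiplier relative to the first group. The symbols are introduced here for illustration, and the observation weights (1/NDa or NDo) are omitted for simplicity.

```latex
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i + \varepsilon_{ij}, \\
b_i &\sim \mathcal{N}\!\left(0,\ \sigma_b^{2}\right), \\
\varepsilon_{ij} &\sim \mathcal{N}\!\left(0,\ \sigma^{2}\,\delta_{g(ij)}^{2}\right),
\qquad g(ij) \in \{1,\dots,5\}, \quad \delta_{1} = 1 .
\end{aligned}
```

Here y_ij is the response (NDa, NDo, or NHP) for building j in city i, x_ij holds the fixed-effect predictors, and g(ij) is the altitude quantile group (alt_group_quantile) of that building.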
SEM modification involved systematically screening significant interconnections through piecewise tests of directed separation, using the 'piecewiseSEM' package (Lefcheck, 2016). Non-significant variables were iteratively removed, while all significant predictors were retained in the model. Goodness of fit (GOF) was evaluated using Fisher's C statistic (with P > 0.05 indicating a good fit) and the Akaike Information Criterion (AIC), with lower scores indicating better model performance (Shipley, 2013). Standardized coefficients were used to compare the effects of predictors measured on different scales. Subsequently, a full-factor structural equation model (SEM) was constructed by incorporating all significant predictors identified in the preliminary analysis (Table S2). To simplify the model, factors with partial R2 < 0.3 were iteratively excluded until the simplified SEM met the following criteria: (1) all retained factors exhibited partial R2 > 0.3, and (2) the model achieved the lowest Fisher's C statistic, indicating optimal parsimony (Table S3). The best-fitting SEM achieved the lowest Fisher's C statistic and AIC score (Table S4).
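To make the fit criteria concrete, the sketch below computes Fisher's C from the p-values of the independence claims tested by directed separation, its chi-square p-value on 2k degrees of freedom, and the C-based AIC described by Shipley (2013), AIC = C + 2K. The p-values and the parameter count K in the example are invented, and the function names are not part of piecewiseSEM; in the actual analysis these quantities are reported by the package.

```python
import math


def fisher_c(p_values):
    """Fisher's C = -2 * sum(ln p_i) over the k independence claims."""
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, using the closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def dsep_summary(p_values, n_params):
    """Return C, its df and p-value, and the C-based AIC (C + 2K)."""
    k = len(p_values)
    c = fisher_c(p_values)
    return {
        "C": c,
        "df": 2 * k,
        "P": chi2_sf_even_df(c, 2 * k),
        "AIC": c + 2 * n_params,
    }


# Hypothetical independence-claim p-values and parameter count.
summary = dsep_summary([0.42, 0.18, 0.77], n_params=12)
model_fits = summary["P"] > 0.05  # the P > 0.05 criterion from the text
```

Because C always has an even number of degrees of freedom (two per independence claim), the closed-form tail probability is exact and no statistics library is needed.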
2.5 Interaction Analysis for Nest Use across Altitudes
To explore how altitude modulates NDo and NHP, interaction terms between altitude and the climatic, geographic, biotic, and anthropogenic factors were added to the base models. Interaction models were ranked by AIC, and the most informative model was selected. The Johnson-Neyman technique was used to estimate the range of moderator values over which the slopes were significant, implemented using the 'interactions' package (Johnson & Fay, 1950; Long, 2019).
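The logic of the Johnson-Neyman technique can be sketched as follows for a model with one interaction, y = b0 + b1*x + b2*z + b3*x*z. The slope of x at moderator value z is b1 + b3*z, and the boundaries of the significance region are the z values at which this slope divided by its standard error equals the critical t value. The coefficients, variances, and the critical value of 1.96 below are assumptions chosen for illustration; the analysis itself relied on johnson_neyman() from the interactions package.

```python
import math


def simple_slope_t(z, b1, b3, var_b1, var_b3, cov_b1_b3):
    """t-ratio of the slope of x at moderator value z."""
    slope = b1 + b3 * z
    se = math.sqrt(var_b1 + 2.0 * z * cov_b1_b3 + z * z * var_b3)
    return slope / se


def johnson_neyman_bounds(b1, b3, var_b1, var_b3, cov_b1_b3, t_crit=1.96):
    """Moderator values where |t| of the simple slope equals t_crit.

    Solves (b1 + b3*z)^2 = t_crit^2 * Var(b1 + b3*z), a quadratic in z.
    Returns a sorted list with zero, one, or two boundary values.
    """
    t2 = t_crit ** 2
    a = b3 ** 2 - t2 * var_b3
    b = 2.0 * (b1 * b3 - t2 * cov_b1_b3)
    c = b1 ** 2 - t2 * var_b1
    if a == 0:
        return [] if b == 0 else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []  # no boundary: significance does not change with z
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])


# Hypothetical estimates, e.g. x = GDP per cell, z = scaled altitude.
est = dict(b1=0.5, b3=-0.4, var_b1=0.04, var_b3=0.01, cov_b1_b3=0.0)
bounds = johnson_neyman_bounds(**est)
```

Evaluating simple_slope_t on either side of each boundary shows whether the slope is significant inside or outside the interval; in a mixed model the critical value would come from the appropriate t distribution rather than the normal approximation used here.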
References
Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37(12), 4302–4315. https://doi.org/10.1002/joc.5086
Gao, J., & Pesaresi, M. (2021). Downscaling SSP-consistent global spatial urban land projections from 1/8-degree to 1-km resolution 2000–2100. Scientific Data, 8, 281. https://doi.org/10.1038/s41597-021-01052-0
Johnson, P. O., & Fay, L. C. (1950). The Johnson-Neyman technique, its theory and application. Psychometrika, 15(4), 349–367. https://doi.org/10.1007/BF02288864
Kummu, M., Taka, M., & Guillaume, J. H. A. (2018). Gridded global datasets for Gross Domestic Product and Human Development Index over 1990–2015. Scientific Data, 5, 180004. https://doi.org/10.1038/sdata.2018.4
Lefcheck, J. S. (2016). piecewiseSEM: Piecewise structural equation modelling in R for ecology, evolution, and systematics. Methods in Ecology and Evolution, 7(5), 573–579. https://doi.org/10.1111/2041-210X.12512
Li, X., & Zhou, Y. (2017). A stepwise calibration of global DMSP/OLS stable nighttime light data (1992–2013). Remote Sensing, 9(6), 637. https://doi.org/10.3390/rs9060637
Long, J. A. (2019). interactions: Comprehensive, user-friendly toolkit for probing interactions. R package version 1.1.0. https://cran.r-project.org/package=interactions
Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., & R Core Team. (2012). nlme: Linear and nonlinear mixed effects models. R package version 3.1-103. https://CRAN.R-project.org/package=nlme
Shipley, B. (2013). The AIC model selection method applied to path analytic models compared using a d‑separation test. Ecology, 94(3), 560–564. https://doi.org/10.1890/12-0976.1
