Data and code for: Tree canopy halves urban heat island effect globally but disproportionately benefits higher incomes and suburbs
Data files (Mar 30, 2026 version, 1.38 GB total):
- DataDryadArchive_McDonald_et_al_final.zip (1.38 GB)
- README.md (25.90 KB)
Abstract
Urban trees are proposed as a nature-based solution for urban heat island (UHI) mitigation. However, there has been no comprehensive global urban assessment of their ability to reduce air temperature (AT), with most multi-city estimates using instead land surface temperature. Here, we combine a global 1-km AT dataset with high-resolution land cover information to estimate the tree canopy cooling for 8,919 urban areas, housing 3.6 billion people. This archive contains data and code from this analysis, conducted by R.I. McDonald and colleagues.
All code and data used in the paper by Robert McDonald and colleagues, "Tree canopy halves urban heat island effect globally but disproportionately benefits higher incomes and suburbs", published in Nature Communications in 2026, is available on Data Dryad under the DOI: 10.5061/dryad.905qfttz0.
Abstract of the paper:
Although tree cover reduces the urban heat island, no global estimate quantifies air temperature reductions by contemporary or future tree cover, currently and with climate change. Here, we estimate these reductions for all 8,919 large urban areas. Current urban tree cover mitigates 41–49% of the maximum potential air temperature urban heat island that would occur in the absence of tree canopy. Tree canopy reduces summer air temperature by a population-weighted mean of 0.15±0.03°C, with wide variation (0.0–2.7°C), benefiting 914 (805–1040, 95% CI) million people by >0.25°C. Cooling benefits are greater in already cooler areas: high-income countries and suburbs. Current and plausible future tree cover mitigate only ~10% (6.7–18% and 6.3–17%, respectively) of the median mid-century climate change warming under a moderate emissions scenario. Our results suggest tree canopy expansion in densely settled low-income urban areas is necessary for equitable urban heat island (UHI) mitigation and climate adaptation.
In this paper we conduct a comprehensive global assessment of air temperature (AT) cooling benefits using a statistical model. Our analysis covers all 8,919 large urban areas globally, capturing the full range of climatic, geographic, and socioeconomic contexts where urban populations reside. Our analysis workflow is shown in Figure 1. We assembled remotely sensed information for all large functional urban areas (FUAs) globally, including measurements of tree canopy cover and land surface temperature (LST), as well as gridded urban-resolving estimates of AT. Input datasets were used to fit a hierarchical statistical model that predicts the local AT anomaly given local tree canopy and other land cover, with parameters varying by climate zone and geographic position. We also supplemented this global analysis with simulations using a process-based model for three case study cities that span a climatic gradient from arid to humid, which allows us to capture the impact of urban trees on a more complete heat stress metric, namely the wet bulb globe temperature (WBGT). Finally, we estimated for all large cities the magnitude of the UHI and the increases in summer AT due to climate change; the magnitude of these quantities was then compared with output from our statistical and process models. Our analysis answers three research questions: How much bigger would the AT UHI be on average without urban tree canopy? How much is AT cooling from current or plausible future urban tree canopy able to mitigate AT warming due to climate change? How equally are AT cooling benefits distributed with respect to climate, level of economic development, and population density?
Before attempting to replicate our analysis, users should carefully read the Methods section in McDonald et al. 2026, which contains citations to the datasets used, descriptions of our analytical methodology and algorithms, and much more technical information. In that methods section, our methodology is described in detail in seven sections: (1) Data sources; (2) GIS processing; (3) the global 1 km empirical model; (4) global scenarios analyzed with the empirical model; (5) the high-resolution process model; (6) estimation of UHI; and (7) estimation of climate change impacts.
This Data Dryad archive contains one Zip archive, called "DataDryadArchive_McDonald_et_al_final.zip".
Once the Zip archive is unpacked, users will find top-level folders for data and code. Within each, there are four folders containing all the input datasets or links needed to run the scripts and replicate our analyses. There are ReadMe files within each subfolder. At the request of Data Dryad, we have also collated much of that information here, but it may be clearer to read the individual ReadMe files. The four folders (within both the code and data folders) are:
1.) Google Earth Engine scripts = Used for LST estimation and collating global datasets.
2.) Climate change analysis = Python scripts used to extract CMIP6 information.
3.) i-Tree Cool analysis = Scripts to use the i-Tree Cool model.
4.) R scripts = Used for descriptive and statistical analysis, and to make figures.
Description of the data and file structure
To replicate our analysis, users should access the data in the following order. Within each folder are all the input datasets or links needed to run the scripts and replicate our analyses.
1.) Google Earth Engine scripts = Used for LST estimation and collating global datasets.
As described in McDonald et al. 2026 in the Methods section, our spatial boundaries for urban areas for this project come from the Functional Urban Area definition of the JRC. Specifically, we used this version of the file:
https://human-settlement.emergency.copernicus.eu/ghs_fua.php
This dataset is used as input to the GEE scripts. Users will need to download this file, and may need to reformat it in a GIS so that it can be loaded as a spatial object in GEE. The GEE code then takes many other geospatial information sources as input, such as AT and LST.
We strongly encourage users who are adapting this code for future work to download the most recent version of the FUA boundaries before beginning.
2.) Climate change analysis
The input dataset needed to run the Climate Change Analysis code is drawn directly from publicly accessible CMIP6 data files on the internet, referenced in the Python scripts, and so is not included here.
This folder contains the output from the Climate Change Analysis in McDonald et al. 2026. The files are in netCDF format: rasters of the median (50th percentile), 5th percentile, and 95th percentile of change in summertime temperatures in degrees C (with summer defined appropriately for the Northern and Southern Hemispheres). Please see the Methods section for details.
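The hemisphere-dependent definition of summer mentioned above can be sketched in a few lines. This is an illustrative helper, not part of the archive's code, and it makes the simplifying assumption that the hemisphere split is at the equator:

```python
def summer_months(latitude):
    """Return the calendar months treated as summer for a given latitude.

    Northern Hemisphere: June-August (JJA); Southern Hemisphere:
    December-February (DJF). Splitting hemispheres at the equator is
    a simplifying assumption for this sketch.
    """
    if latitude >= 0:
        return [6, 7, 8]    # JJA
    return [12, 1, 2]       # DJF

# Example: Gothenburg (~57.7 N) vs. a Southern Hemisphere city (~33.9 S)
print(summer_months(57.7))   # → [6, 7, 8]
print(summer_months(-33.9))  # → [12, 1, 2]
```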
3.) i-Tree Cool analysis = Scripts to use the i-Tree Cool model.
This folder contains input data to run the i-Tree Cool Air model. This model is part of the i-Tree HydroPlus package. Users wishing to replicate our analysis should first read the technical manual: https://www.itreetools.org/documents/871/HydroPlus_TechnicalManual_hXHL6TW.pdf
Note that the "base" case is current land cover, while the "alt" case is the hypothetical case of all tree cover loss. The "base_LAI2" and "alt_LAI2" cases were exploratory cases in which maximum LAI was limited in arid and semi-arid climates, to understand the sensitivity of model outputs to this parameter. These cases were not used in our manuscript.
Here, and in the relevant code, "gsw" is Gothenburg, Sweden; "lpo" is Lisbon, Portugal; and "paz" is Phoenix, Arizona. "ht" refers to high tree cover, "mt" to moderate tree cover, and "lt" to low tree cover.
Users will need to unpack the Zip file, which has four subfolders, one each for the base, alt, base_LAI2, and alt_LAI2 cases. Each is structured identically, with nine subfolders, each referencing a specific city and a specific tree cover treatment (e.g., "gsw_ht" refers to Gothenburg in the high tree cover treatment).
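The naming convention above can be decoded programmatically. The sketch below is an illustration (not part of the archive) that generates the nine expected subfolder names from the city and treatment codes:

```python
# Decode the city/treatment codes used in the i-Tree Cool subfolder names.
CITIES = {"gsw": "Gothenburg, Sweden",
          "lpo": "Lisbon, Portugal",
          "paz": "Phoenix, Arizona"}
TREATMENTS = {"ht": "high tree cover",
              "mt": "moderate tree cover",
              "lt": "low tree cover"}

# The nine expected subfolder names, e.g. "gsw_ht"
subfolders = [f"{c}_{t}" for c in CITIES for t in TREATMENTS]

def describe(subfolder):
    """Expand a subfolder code like 'gsw_ht' into a readable label."""
    city, treatment = subfolder.split("_")
    return f"{CITIES[city]} ({TREATMENTS[treatment]})"

print(len(subfolders))     # → 9
print(describe("gsw_ht"))  # → Gothenburg, Sweden (high tree cover)
```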
Within each subfolder, there are multiple input files. Please see the iTree technical manual for discussion of the units and format of these data files:
blockgroup.asc = An ascii grid of the blockgroup identifier.
dem.asc = An ascii grid of the digital elevation model.
HydroPlusConfig.xml = The configuration parameters used for that model run.
imperviouscover.asc = An ascii grid of the impervious cover.
landcover.asc = An ascii grid of the landcover.
radiation.csv = Information on solar radiation, every 6 hours, over the model run period.
treecover.asc = An ascii grid of the treecover.
Weather.csv = Information on weather at the site, every 6 hours, over the model run period.
z_AspectGround_N_0_rad.asc = An ascii grid of the aspect of each pixel, derived from the DEM.
z_FDorganizer.asc = An ascii grid of the flow direction of each pixel, derived from the DEM.
z_FlowAccum.asc = An ascii grid of the flow accumulation of each pixel, derived from the DEM.
z_SlopeGround_rad.asc = An ascii grid of the slope of each pixel, derived from the DEM.
z_TIorganizer.asc = An ascii grid of the topographic index of each pixel, derived from the DEM.
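The .asc files above follow the common ESRI ASCII grid layout: a short plain-text header followed by rows of cell values. A minimal, illustrative reader for that header is sketched below; it assumes the standard six header keys, and the i-Tree technical manual remains the authoritative reference for the exact format HydroPlus expects:

```python
def read_asc_header(lines):
    """Parse the header of an ESRI ASCII grid (.asc).

    Assumes the standard header keys (ncols, nrows, xllcorner,
    yllcorner, cellsize, NODATA_value). Keys are lowercased so
    lookups are case-insensitive.
    """
    header = {}
    for line in lines[:6]:
        key, value = line.split()
        header[key.lower()] = float(value)
    return header

# Example with a toy 2x3 grid (invented values, for illustration only)
sample = """ncols 3
nrows 2
xllcorner 500000.0
yllcorner 6400000.0
cellsize 30.0
NODATA_value -9999
1 2 3
4 5 6""".splitlines()

h = read_asc_header(sample)
print(int(h["ncols"]), int(h["nrows"]), h["cellsize"])  # → 3 2 30.0
```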
4.) R scripts = Used for descriptive and statistical analysis, and to make figures.
This folder contains R input data used for the statistical analysis of McDonald et al. 2026. Details on the analysis can be found in the Methods section of that paper.
Users will need to rerun our whole sequence of R scripts, beginning with the input files contained here, to completely replicate our analysis. See the Code Readme.txt for details.
Input datasets for the R code series include:
Area_km.csv = Output from GEE code. The land area of each grid cell of the ~1km global grid.
AT_km.csv = Output from GEE code. The air temperature of each grid cell of the ~1km global grid. See Methods for details on the source of this data.
DEM_km.csv = Output from GEE code. The elevation, in m, of each grid cell of the ~1km global grid. See Methods for details on the source of this data.
LC_km.csv = Output from GEE code. The fraction land cover for tree cover and impervious of each grid cell of the ~1km global grid, calculated from Worldcover data. See Methods for details on the source of this data.
LST_km.csv = Output from GEE code. The Land Surface Temperature (in degrees C) of each grid cell of the ~1km global grid. See Methods for details on the source of this data.
Pop_km.csv = Output from GEE code. The population of each grid cell of the ~1km global grid. See Methods for details on the source of this data.
NDVI_km.csv = Output from GEE code. The normalized difference vegetation index (NDVI) of each grid cell of the ~1km global grid. Not used in the final manuscript, but included here for thoroughness.
ESA_km_other.csv = Output from GEE code. The fraction land cover for other land covers of each grid cell of the ~1km global grid, calculated from Worldcover data. See Methods for details on the source of this data.
Rur_km.csv = Output from GEE code. The rural reference temperature in Celsius for each FUA, used in calculating the urban heat island effect for Figure 7. See Methods for details on the source of this data.
EVI_km_global.csv = Global contextual information for each grid cell of the ~1km global grid.
Naturescapes_30_FUA_table_just_eFUA_ID.csv = At one point in the code, we selected out 30 FUAs for visualization; these were study sites for another project, called Naturescapes. Not used in the final manuscript, but included here for thoroughness.
EVI_kn_AI_and_Koppen.csv = Aridity Index and Koppen values of each grid cell of the ~1km global grid. See Methods for details on the source of this data. This data was derived from GIS work in ArcGIS Pro.
world-bank-income-groups.csv = the World Bank income group categories for each country globally. See Methods for details on the source of this data.
income_ppp_2020.csv = the World Bank income, in per-capita USD2020 PPP, for each country globally. See Methods for details on the source of this data.
EVI_km_point_w_tasmax_constrained.csv = the results of the Climate Change Analysis code, processed through ArcGIS Pro, to calculate the daily max temperature in summer months in Celsius for a constrained set of GCM ensemble members, for each grid cell of the ~1km global grid. See Methods for details on the source of this data.
iTCA_Output (folder) = Output from the i-Tree Cool Air model for the base case, used in the script to process that data.
iTCA_Output_alt (folder) = Output from the i-Tree Cool Air model for the alt case (i.e., no tree cover), used in the script to process that data.
Metadata.csv = Metadata from the i-Tree Cool Air model output.
For completeness, we also include in this folder the geospatial information that came from the GEE analysis. The attribute values are the same as in many of the above datafiles, but the geospatial information allows spatial patterns to be visualized. Files include:
EVI_km.geojson
Rur_km.geojson
Sharing/Access information
All code and data used in the paper by Robert McDonald and colleagues, "Tree canopy halves urban heat island effect globally but disproportionately benefits higher incomes and suburbs", published in Nature Communications in 2026, is available on Data Dryad under the DOI: 10.5061/dryad.905qfttz0. Data is publicly available under a Creative Commons license. Please see the methodology section of the paper first if anything is unclear. If questions remain, feel free to ask Robert McDonald, corresponding author of the study (rob_mcdonald@tnc.org).
Code/Software
The Code folder contains the code used in the analysis of McDonald et al. 2026. The code is interpreted, written in R, Python, or JavaScript (for Google Earth Engine, GEE).
To replicate our analysis, users should run the scripts in the following order.
1.) Google Earth Engine scripts = Used for LST estimation and collating global datasets.
This folder contains code used to estimate Land Surface Temperatures (LST) from satellite imagery, as well as code to collate global information from multiple global datasets to a common roughly 1 km grid. The code runs on the Google Earth Engine (GEE) platform and is written in JavaScript. Users will need to have gone through the steps to get Earth Engine access, as described here:
https://developers.google.com/earth-engine/guides/access
The basics of running code on GEE are described here:
https://developers.google.com/earth-engine/guides/getstarted
Note that since these scripts draw input data from the GEE platform, we do not include much input data with this archive specifically for use in GEE. The exception is a polygon file giving the boundaries of the Functional Urban Areas (FUAs); the extent of these FUAs is used to set up the common analysis grid within GEE. For comprehensiveness, we have included this file in the data package, although users are encouraged to download the most recent version of the FUA boundaries for future projects.
Most users will want to run the scripts in the following order:
1km_variables.js = Sets up the common analysis grid and extracts information.
ESAall.js = Extracts land cover information from the Worldcover dataset, for all land cover types.
RuralATLST.js = Estimates the rural background temperature for both air temperature (AT) and land surface temperature (LST) using the SUE algorithm.
Buffers.js = Estimates the rural background temperature for both air temperature (AT) and land surface temperature (LST) using the buffer algorithm.
Note that the output tables and geospatial information created are shown in the input data folder for the R code, because they are used as input files to our data processing in R.
Processing time on Google Earth Engine depends on the number of other commands executing on the platform at the time, among other factors. In our experience, it can take up to 10-20 hours for the GEE platform to fully complete some of these scripts.
2.) Climate change analysis = Python scripts used to extract CMIP6 information.
This analysis used Jupyter Notebook to run Python code to extract CMIP6 information. Users will need, at a minimum, a working implementation of Python, and ideally the ability to run Jupyter notebooks.
Python and Jupyter Notebooks can typically be installed in under 5 minutes. There are different ways to install this software, but one common way to install them can be found here:
https://jupyter.org/install
CMIP6_ECS_TCR_FIrstSheetOnly.xlsx is a list of GCM models used in our ensemble analysis. See McDonald et al. 2026 for details on why this ensemble was chosen. It is called in the code.
Most users will likely want to run the code in this order:
cmip6_nexgddp_hist_ssp245_tasmax_hemispheric_summers_change.ipynb = Extracts relevant temperature data from CMIP6, for the summer months.
Load_Map_delta_tasmax_cmip6_nexgddp_constrain_ecs_likely.ipynb = Processes extracted data.
All input datafiles needed to run the code sequence are included in this data archive. Individual Python scripts take less than 3 minutes to run on a normal desktop computer, although this speed can vary depending on your internet connection and how busy the data server is. The output files created (in netCDF format) are included in the ClimateChangeAnalysis data folder, for completeness.
3.) i-Tree Cool analysis = Scripts to use the i-Tree Cool model.
The code in this folder relates to the i-Tree Cool Air analysis, described in McDonald et al. 2026. Please see the Methods section for details of this analysis.
Users will need to download the i-Tree HydroPlus suite, revision 1659 or similar. This is freely available at: https://www.itreetools.org/tools/research-suite/hydro-plus.
In this case the model is run using Python code, so users will need a working Python implementation. Python can typically be installed in under 5 minutes; there are different ways to install it, but one common way (together with Jupyter) can be found here:
https://jupyter.org/install
The i-Tree HydroPlus suite should download and install in less than 5 minutes.
Users will need to download the input datasets located in the data section of this archive. They will then likely want to run the code in the following order, updating the paths to point to the correct location of their input datasets.
p_tnc_worldview_resample.py.txt = Sets up input grids for use in i-Tree Cool.
p_tnc_TC_Zero_IC_Increase.py.txt = Runs the i-Tree Cool Air model for the base and alt cases.
p_iTCA_BlockGroupDaily_Statistics_degC_multi.py.txt = Processes i-Tree Cool Air model output to calculate summary statistics.
Note that the "base" case is current land cover, while the "alt" case is the hypothetical case of all tree cover loss. The "alt_LAI2" case was an exploratory case where maximum LAI was limited in arid and semi-arid climates, to understand the sensitivity of model outputs to this parameter. This case was not used in our manuscript.
Here, and in the relevant code, "gsw" is Gothenburg, Sweden; "lpo" is Lisbon, Portugal; and "paz" is Phoenix, Arizona. "ht" refers to high tree cover, "mt" to moderate tree cover, and "lt" to low tree cover.
All input datafiles needed to run the code sequence are included in this data archive. Individual python scripts can take 5-10 minutes to run on a typical desktop computer. Output files are included in the R data folder, as they are input to the R code.
4.) R scripts = Used for descriptive and statistical analysis, and to make figures.
This folder contains R code used for the statistical analysis of McDonald et al. 2026. Details on the analysis can be found in the Methods section of that paper.
This code was written in R version 4.5.0, using RStudio version 2025.05.1 Build 513. Users will need a working version of R. While RStudio is optional, some figures may look different in base R than in RStudio.
R can typically be installed in under 3 minutes. RStudio may take an additional 5 minutes to install.
Instructions on installing RStudio and R are here:
https://posit.co/download/rstudio-desktop/
Users should download the input datasets from the data section of this archive. They will then need to alter the paths and filenames in the code so that the data are read correctly into R.
Note that the R codes reference numerous R packages and libraries. These are all freely available for download through R, and are each loaded in the scripts they are needed in. Libraries used include:
tidyverse = A collection of packages for data science. Includes many of the packages below. https://www.tidyverse.org/
Hmisc = Harrell Miscellaneous functions for data manipulation. https://cran.r-project.org/web/packages/Hmisc/index.html
plyr = Tools for splitting, applying, and combining data. https://cran.r-project.org/web/packages/plyr/index.html
lubridate = Working with dates and times in R. https://lubridate.tidyverse.org/
data.table = Used to perform aggregations by group. https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
DescTools = Tools for descriptive statistics. https://cran.r-project.org/web/packages/DescTools/index.html
ggplot2 = Used for graphics for several of our figures. https://ggplot2.tidyverse.org/
dplyr = data frame manipulation package. https://dplyr.tidyverse.org/
gridExtra = Used for multi-panel graphics. https://cran.r-project.org/web/packages/gridExtra/index.html
spatstat.geom = Spatial statistics. https://cran.r-project.org/web/packages/spatstat.geom/index.html
zoo = Time and other data series manipulation. https://cran.r-project.org/web/packages/zoo/index.html
ggrepel = Used to make labels not overlap in figures. https://ggrepel.slowkow.com/
stringr = functions for working with strings. https://stringr.tidyverse.org/
forcats = tools for working with categorical variables. https://forcats.tidyverse.org/
hrbrthemes = Used for aesthetic considerations in figures in ggplot2. https://github.com/hrbrmstr/hrbrthemes
viridis = Colorblind-friendly color maps for R. https://cran.r-project.org/web/packages/viridis/index.html
Most users will want to run the code in the following sequence. Later codes in this sequence import objects or CSV files created by earlier codes:
- merge_TC_data_files_v3_0.R = Inputs the files from GEE created by TC Chakraborty and merges them.
- analysis_descriptive_v2_23_for_FigureA.R = Descriptive analysis of the data, and creation of a figure.
- anomally_analysis_v0_96.R = Calculation of the anomaly (the observed temperature minus the average in a FUA) and a regression relating this anomaly to predictive variables. The lm object and other outputs are saved for later scripts.
- total_loss_scenariov0_97_no_crop_expansion.R = Takes the lm object from the anomaly analysis script and calculates temperatures under the total loss scenario.
- 10_per_increase_scenario_v0_95_no_crop_expansion.R = Takes the lm object from the anomaly analysis script and calculates temperatures under the 10% gain scenario to calculate TCE.
- max_realistic_increase_scenario_v0_98_no_crop_expansion.R = Takes the lm object from the anomaly analysis script and calculates temperatures under the maximum plausible gain scenario.
- new_FigureE_v1_31.R = A script to create the waterfall plot figure.
- CoolAir_analysisv0_6.R = A script to input multiple files from i-Tree Cool Air output and make associated figures and summaries for the paper.
- new_FigureD_v1_01.R = A script to create a figure for the paper.
- SupplementaryFigure_UHI_vs_Latitude_v1_01.R = A script to create a figure for the paper.
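The anomaly used in the scripts above (observed temperature minus the FUA mean) can be illustrated with a small sketch. The data structure and numbers here are invented for illustration; the actual computation lives in the R anomaly-analysis script:

```python
from statistics import mean

# Toy example: per-cell air temperatures (deg C) grouped by FUA identifier.
# The anomaly for each cell is its temperature minus its FUA's mean,
# mirroring the definition described for the anomaly-analysis script.
cells = [
    {"fua": "A", "at": 30.0},
    {"fua": "A", "at": 32.0},
    {"fua": "B", "at": 25.0},
    {"fua": "B", "at": 27.0},
]

# Mean air temperature per FUA
fua_means = {}
for fua in {c["fua"] for c in cells}:
    fua_means[fua] = mean(c["at"] for c in cells if c["fua"] == fua)

# Attach the anomaly to each cell
for c in cells:
    c["anomaly"] = c["at"] - fua_means[c["fua"]]

print([c["anomaly"] for c in cells])  # → [-1.0, 1.0, -1.0, 1.0]
```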
All input datafiles needed to run the code sequence are included in this data archive. The data objects created can range in size from 5 GB to 30 GB, and because of how R manages memory, computers with less than 8 GB of RAM may run this code significantly more slowly. The code was developed on a Windows computer with 64 GB of RAM, on which individual R scripts took 5-10 minutes of run time to complete, particularly when reading, writing, or merging large datasets.
Note that while developing the manuscript, we used letters to refer to figures/tables, since we were not clear what order they would appear in the manuscript. Users should be able to compare the figure generated in R with the published manuscript and easily see the correspondence.
This series of scripts creates as output the figures and tables of McDonald et al. 2026, as well as the facts and numbers reported in the manuscript text. In some cases further processing of figures was needed; for maps, for instance, tabular data was joined to the GeoJSON files to produce spatial representations of the data.
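The map-making join described above (tabular attributes onto GeoJSON features) can be sketched as follows. The feature structure mirrors the GeoJSON standard, but the key name "cell_id" and the attribute values are invented for illustration:

```python
import json

# Toy GeoJSON FeatureCollection, one feature per grid cell,
# keyed by a hypothetical "cell_id" property.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"cell_id": 1},
         "geometry": {"type": "Point", "coordinates": [11.97, 57.71]}},
        {"type": "Feature", "properties": {"cell_id": 2},
         "geometry": {"type": "Point", "coordinates": [-9.14, 38.72]}},
    ],
}

# Tabular results to join (e.g., a per-cell cooling estimate).
table = {1: {"cooling_c": 0.4}, 2: {"cooling_c": 0.1}}

# Join on the shared key, adding attributes to each feature's properties.
for feature in geojson["features"]:
    attrs = table.get(feature["properties"]["cell_id"], {})
    feature["properties"].update(attrs)

# The enriched FeatureCollection can now be serialized for a GIS.
print(geojson["features"][0]["properties"]["cooling_c"])  # → 0.4
```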
The analysis proceeded in several main steps. First, we assembled global data from a variety of sources. Second, the data were processed in a GIS to a common analysis grid. Third, we conducted a statistical analysis of the relationship between tree cover and temperature, estimating the effectiveness of tree canopy for cooling and the total cooling intensity. Fourth, we compiled the most recent projections of climate change's impact on temperature. Each step of the analysis is described in more detail in the sections above.
