Replication data for impact evaluation of two large-scale forestry incentive programs in Guatemala
Data files
Aug 30, 2023 version files 76.68 KB
- README.md
- SI_Figure1_Figure2.R
- SI_Figure6.R
Abstract
This study evaluates the impacts of over 16,000 individual payments for ecosystem services (PES) projects funded through two incentive programs in Guatemala, using synthetic controls to understand the impacts and trade-offs of large-scale forest conservation and tree planting programs. Data for the impact estimation were extracted from remotely sensed or modeled datasets, with datasets on forest extent, height, and loss extending from 2000 to 2020. Projects evaluated in this study were enrolled in incentives on a rolling basis, beginning in 2010, with most projects ending by 2018. Program impacts were estimated by comparing post-treatment changes in forest cover, forest height, and forest loss between the treated units and their synthetic control.
Methods
Initial data on project locations were collected from the Guatemalan Forestry Institute GIS portal (https://sig.inab.gob.gt/portal/home/gallery.html). Project sites were accessed as point location data, and we estimated the area of each project site using available metadata. To generate a valid control, we randomly generated 100,000 untreated 3.1 hectare sites. Relevant data on forest cover and on covariates such as population, accessibility to towns, elevation, and rainfall were extracted from gridded datasets for each treated and untreated site. We accessed these gridded datasets via Google Earth Engine where possible, and otherwise downloaded the raw data from the source (access details for each dataset are in the README file). The extracted data were combined into a panel dataset, which was cleaned before untreated sites were weighted to create the synthetic control. We provide initial raster files in .tif format and treated and untreated polygons as .shp files. Extracted tables, before they were combined into a panel dataset, are stored in .csv files. Final panel datasets for untreated and treated sites are also stored as .csv files. Outputs of the analysis are provided as .Rdata files, and we provide the scripts that combine and analyze the data.
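The weighting step described above can be illustrated with a minimal Python sketch. It is not the code used in this study, and the estimator and software the authors used may differ: it assumes a plain least-squares synthetic control with non-negative weights that sum to one, fitted here with a simple Frank-Wolfe loop. The function names, the toy values, and the single post-treatment period are all hypothetical.

```python
def weighted_series(donors, weights):
    """Weighted average of donor series, period by period."""
    n_periods = len(donors[0])
    return [sum(w * d[k] for w, d in zip(weights, donors)) for k in range(n_periods)]


def sq_error(a, b):
    """Sum of squared differences between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def synthetic_control_weights(donors, target, n_iter=500):
    """Hypothetical helper: Frank-Wolfe on the simplex.

    Searches for non-negative weights summing to one so that the weighted
    donor (untreated) series tracks the treated pre-treatment series.
    """
    n = len(donors)
    weights = [1.0 / n] * n  # start from equal weights
    for _ in range(n_iter):
        synth = weighted_series(donors, weights)
        resid = [s - t for s, t in zip(synth, target)]
        # Gradient of 0.5 * squared error with respect to each weight
        grad = [sum(d[k] * resid[k] for k in range(len(target))) for d in donors]
        best = min(range(n), key=lambda j: grad[j])
        # Move toward the donor with the steepest descent direction
        direction = [donors[best][k] - synth[k] for k in range(len(target))]
        denom = sum(x * x for x in direction)
        if denom == 0.0:
            break
        # Exact line search, clipped so the weights stay on the simplex
        step = -sum(r * x for r, x in zip(resid, direction)) / denom
        step = max(0.0, min(1.0, step))
        if step == 0.0:
            break
        weights = [(1.0 - step) * w for w in weights]
        weights[best] += step
    return weights


# Toy values for illustration only (not taken from the dataset)
donors_pre = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [10.0, 10.0, 10.0]]
treated_pre = [2.0, 3.0, 4.0]
weights = synthetic_control_weights(donors_pre, treated_pre)

# Estimated effect: treated outcome minus synthetic control outcome after treatment
donors_post = [[4.0], [6.0], [10.0]]
treated_post = [6.0]
synthetic_post = weighted_series(donors_post, weights)
effect = [t - s for t, s in zip(treated_post, synthetic_post)]
```

In practice the fit would also use covariates such as elevation and rainfall, and a dedicated synthetic control package would be a more natural choice than a hand-written loop.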
Usage notes
Much of our unprocessed data is stored in .tif and shapefile formats. We extracted this data to .csv tables using ArcPro, which is proprietary software; however, the data formats are open, and there are many open-source options for visualizing and processing this data, including QGIS, R (sf and terra packages), and Python (geopandas and fiona packages).
Tables in this analysis are all stored in .csv format, and we generally use R to clean, combine, and analyze these files. We use Python via Jupyter notebooks in one instance because the geopandas and matplotlib packages are useful for producing figures with spatial datasets. Our results files (synthetic control results) are stored in .Rdata files, which can be read into R using the load() function.
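For readers who want to see what combining extracted tables into a panel might look like, here is a minimal sketch using only the Python standard library. The authors perform this step in R, so this is an illustration rather than their code. It assumes each extracted table is wide, with one row per site and one column per year. The column names (`site_id`, `forest_cover`, `rainfall_mm`), the helper functions, and the values are hypothetical; check the README for the actual table layouts.

```python
import csv
import io


def wide_to_long(csv_text, id_col, value_name):
    """Reshape a wide table (one column per year) into site-year rows."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        for col, val in record.items():
            if col == id_col:
                continue
            rows.append({id_col: record[id_col], "year": int(col), value_name: float(val)})
    return rows


def merge_panels(*long_tables, keys=("site_id", "year")):
    """Join long tables on the key columns into one row per site-year."""
    merged = {}
    for table in long_tables:
        for row in table:
            key = tuple(row[c] for c in keys)
            merged.setdefault(key, {}).update(row)
    return [merged[k] for k in sorted(merged)]


# Toy tables for illustration only (not taken from the dataset)
forest_cover_csv = "site_id,2009,2010\nA,0.80,0.78\nB,0.55,0.60\n"
rainfall_csv = "site_id,2009,2010\nA,1500,1420\nB,2100,2050\n"

panel = merge_panels(
    wide_to_long(forest_cover_csv, "site_id", "forest_cover"),
    wide_to_long(rainfall_csv, "site_id", "rainfall_mm"),
)
```

To read the real files, replace the in-memory strings with the contents of the .csv files. With pandas installed, `melt` and `merge` would do the same reshaping and joining more concisely.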