Evaluating natural experiments in ecology: using synthetic controls in assessments of remotely-sensed land-treatments

Cite this dataset

Fick, Stephen; Nauman, Travis; Brungard, Colby; Duniway, Michael (2020). Evaluating natural experiments in ecology: using synthetic controls in assessments of remotely-sensed land-treatments [Dataset]. Dryad. https://doi.org/10.5061/dryad.1jwstqjt5

Abstract

Many important ecological phenomena occur on large spatial scales and/or are unplanned and thus do not easily fit within analytical frameworks which rely on randomization, replication, and interspersed a priori controls for statistical comparison. Analyses of such large-scale, natural experiments are common in the health and econometrics literature, where techniques have been developed to derive insight from large, noisy observational datasets. Here, we apply a technique from this literature, synthetic control, to assess landscape change with remote sensing data. The basic data requirements for synthetic control include: (1) a discrete set of treated and un-treated units, (2) a known date of treatment intervention, and (3) timeseries response data that includes both pre- and post-treatment outcomes for all units. Synthetic control generates a response metric for treated units relative to a no-action alternative based on prior relationships between treated and unexposed groups. Using simulations and a case study involving a large-scale brush clearing management event, we show how synthetic control can intuitively infer treatment effect sizes from satellite data, even in the presence of confounding noise from climate anomalies, long-term vegetation dynamics, or sensor errors. We find that accuracy depends on the number and quality of potential control units, highlighting the importance of selecting appropriate control populations. Although we consider the synthetic control approach in the context of natural experiments with remote sensing data, we expect the methodology to have wider utility in ecology, particularly for systems with large, complex, and poorly replicated experimental units.

Methods

Data were generated using the simulation code archived under DOI 10.5281/zenodo.4274935.

For each simulated dataset (an NDVI timeseries for a focal pixel and its controls), four synthetic control methods were applied and the resulting post-treatment effect sizes were recorded. The attached data are aggregations of the per-simulation error for each method, grouped by true effect size ('sigBin') and level of confounder intensity ('conBin').
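As an illustration of how a post-treatment effect size can be derived from a focal pixel and its controls, the following minimal Python sketch implements a plain difference-in-differences estimate (the DD method listed in the column descriptions). It is not the simulation or analysis code from the repository; the function name `did_effect` and the example series are hypothetical.

```python
from statistics import mean

def did_effect(treated, controls, t0):
    """Difference-in-differences estimate of a treatment effect.

    treated  : NDVI values for the focal pixel, one per time step
    controls : list of NDVI series, one per control pixel
    t0       : index of the first post-treatment time step
    """
    # Average the control pixels at each time step
    control_mean = [mean(vals) for vals in zip(*controls)]
    # Pre- to post-treatment change in the treated pixel
    treated_change = mean(treated[t0:]) - mean(treated[:t0])
    # Change in the controls over the same periods (the shared trend)
    control_change = mean(control_mean[t0:]) - mean(control_mean[:t0])
    return treated_change - control_change

# Made-up series: a shared climate signal plus a drop of 0.2 after treatment
shared = [0.50, 0.52, 0.48, 0.45, 0.47, 0.46]
controls = [[v + 0.02 for v in shared], [v - 0.02 for v in shared]]
treated = [v - (0.2 if i >= 3 else 0.0) for i, v in enumerate(shared)]

effect = did_effect(treated, controls, t0=3)
print(round(effect, 3))
```

Because the controls share the climate signal with the treated pixel, subtracting their change removes that trend and leaves the treatment effect.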

Usage notes

These files are used by the 'analysis.R' script in the GitHub repository.

Descriptions of files and columns:

'datFile.RData' : RData file containing a data.table object named 'D'

    - columns:

         - ID : Simulation ID, foreign key to merge with data.table 'R' in file `RUN.RData`

         - Method : CI (CausalImpact), DD (Difference-in-Differences), GS (gsynth), or IT (Interrupted Timeseries)

         - SigBin : Categorical bin for the 'true' effect (signal) size. NA represents a placebo (no effect)

         - conBin : Categorical bin for confounder intensity

         - N : Number of post-treatment observations within a bin

         - CIgt0.ave : Average proportion of cases where the credible interval exceeds 0

         - ERR.abs : Average absolute error

         - ERR.ave : Average error

         - InCI.ave : Average proportion of cases where the true effect is within the credible interval
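
The aggregated columns above could be derived from per-simulation results roughly as in the following Python sketch. This is only an illustration: the input field names (`true`, `est`, `lower`, `upper`) and the example rows are hypothetical, `N` here simply counts the rows in a bin, and "credible interval exceeds 0" is assumed to mean that the lower bound is above zero.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-simulation results (made-up values)
results = [
    {"Method": "DD", "SigBin": "low", "conBin": "low",
     "true": 0.10, "est": 0.12, "lower": 0.02, "upper": 0.22},
    {"Method": "DD", "SigBin": "low", "conBin": "low",
     "true": 0.10, "est": 0.04, "lower": -0.06, "upper": 0.14},
    {"Method": "CI", "SigBin": "low", "conBin": "low",
     "true": 0.10, "est": 0.11, "lower": 0.05, "upper": 0.17},
]

def aggregate(rows):
    """Group rows by (Method, SigBin, conBin) and summarise the error."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["Method"], r["SigBin"], r["conBin"])].append(r)
    out = {}
    for key, g in groups.items():
        errs = [r["est"] - r["true"] for r in g]
        out[key] = {
            "N": len(g),
            "ERR.ave": mean(errs),
            "ERR.abs": mean([abs(e) for e in errs]),
            # Share of cases where the true effect lies inside the interval
            "InCI.ave": mean([1.0 if r["lower"] <= r["true"] <= r["upper"] else 0.0 for r in g]),
            # Assumption: "exceeds 0" means the lower bound is above zero
            "CIgt0.ave": mean([1.0 if r["lower"] > 0 else 0.0 for r in g]),
        }
    return out

summary = aggregate(results)
print(summary[("DD", "low", "low")])
```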

`RUN.RData` : RData file with data.table object `R` containing parameterization for each simulation run

          - sim : Simulation number. This value is passed to set.seed() for reproducibility

          - type : landscape classification (grassland or forest)

          - sdNoise : Standard deviation of random noise added to signal

          - distrubance : Magnitude of initial drop in NDVI due to treatment

          - nControl : Number of control pixels available

          - misMatch : 0, 0.5, or 1; the fraction of control pixels with mismatched landscape type

          - climSD : Climate effect SD

          - climCenter : Climate effect center

          - satLambdat : lambda value for satellite noise

          - rwSD : sd value for random walk

          - affinitySD : sd of affinity pixels have with each other

          - timeVaryingAffinitySD : sd of affinity over time

          - randConstantSD : sd of random constant to add to each pixel

          - overrideNoise : enable departure from landscape-type default sdNoise

          - auto_range : range of spatial autocorrelation function

          - auto_type : 
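
Since `ID` in 'D' is described as a foreign key into 'R', the two tables can be joined to relate error summaries to the simulation parameters. The Python sketch below shows the idea on small made-up tables represented as lists of dictionaries. It assumes that `ID` matches the `sim` column of 'R', which the descriptions above suggest but do not state outright; `left_join` is a hypothetical helper.

```python
# Made-up rows using a subset of the documented columns
D = [
    {"ID": 1, "Method": "GS", "ERR.abs": 0.03},
    {"ID": 2, "Method": "GS", "ERR.abs": 0.08},
    {"ID": 3, "Method": "GS", "ERR.abs": 0.05},  # no matching run below
]
R = [
    {"sim": 1, "type": "grassland", "nControl": 50},
    {"sim": 2, "type": "forest", "nControl": 10},
]

def left_join(left, right, left_key, right_key):
    """Attach the columns of `right` to each row of `left` with a matching key."""
    lookup = {row[right_key]: row for row in right}
    merged = []
    for row in left:
        match = lookup.get(row[left_key], {})
        # Drop the duplicate key column from the right-hand table
        extra = {k: v for k, v in match.items() if k != right_key}
        merged.append({**row, **extra})
    return merged

merged = left_join(D, R, "ID", "sim")
for row in merged:
    print(row)
```

Rows of 'D' without a matching run keep their own columns and gain no parameter columns.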

 

Additional synthetic control run data for appendix:

caseStudyData.csv:

  - unid : pixel unique id

  - id : treatment type (B = Burned, P = PileBurned, C = Control, M = Mastication)

  - SATVI*10000 : response variable

  - D : Whether disturbance has occurred (D = 1) or not (D = 0)

  - time : time point relative to disturbance

  - dayt : Date

  - ndvi : NDVI score

  - savi : SAVI score

  - CIpoint.effect : CausalImpact estimated effect size

  - CIpoint.effect.upper : upper bound of the credible interval for the effect size

  - CIpoint.effect.lower : lower bound of the credible interval for the effect size

  - CIcum.effect : cumulative CausalImpact estimated effect

  - CIcum.effect.upper : upper bound of the credible interval for the cumulative effect

  - CIcum.effect.lower : lower bound of the credible interval for the cumulative effect

  - x.EPSG5070 : pixel x coordinate in Albers projection

  - y.EPSG5070 : pixel y coordinate in Albers projection
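
As a usage illustration for the case study file, the Python sketch below averages the post-disturbance `CIpoint.effect` values per treatment type. The sample rows are made up and use only a subset of the columns; the sketch assumes that `D` is stored as the literal text `1` and that missing effect estimates appear as empty fields, which may not match the actual file. `mean_post_effect` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up rows in the documented column layout (subset of columns)
sample = """unid,id,D,time,CIpoint.effect
1,M,0,-1,0.0
1,M,1,1,-120.0
1,M,1,2,-80.0
2,B,0,-1,5.0
2,B,1,1,-300.0
3,C,0,1,
"""

def mean_post_effect(handle):
    """Mean CIpoint.effect per treatment type, post-disturbance rows only."""
    by_type = defaultdict(list)
    for row in csv.DictReader(handle):
        # Assumption: D is written as "1" and missing effects are empty strings
        if row["D"].strip() == "1" and row["CIpoint.effect"].strip() != "":
            by_type[row["id"]].append(float(row["CIpoint.effect"]))
    return {treatment: mean(vals) for treatment, vals in by_type.items()}

summary = mean_post_effect(io.StringIO(sample))
print(summary)
```

To run it on the real file, the same function can be given an open file handle, for example `with open("caseStudyData.csv", newline="") as fh: mean_post_effect(fh)`.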