Data from: Meteorological versus spatial drivers of the spatial synchrony of forest insect pest outbreaks in North America
Data files
Dec 16, 2025 version files 654.29 MB
-
comparative-synchrony.zip
654.27 MB
-
README.md
24.87 KB
Abstract
Population spatial synchrony has major consequences for the impacts of forest insect pest outbreaks at regional scales. We tested the predictions that the strength and drivers of this synchrony would differ among species according to their dispersal abilities and feeding guild. Using a matrix regression approach, we statistically partitioned the importance of spatial and environmental drivers of outbreak synchrony in six species of phloem-feeding bark beetles and six species of defoliating Lepidoptera in North America. Potential drivers included in the regressions were synchrony of weather conditions and spatial proximity. Overall, model selection operations on the matrix regressions indicated that synchrony in the outbreaks of forest insect pests arises from a combination of spatial drivers such as dispersal and synchrony of weather, also known as Moran effects. Moran effects appeared to be more important in driving the synchrony of bark beetles than defoliators, possibly because weather (e.g., drought) has stronger impacts on bark beetle outbreak dynamics. Nonparametric spatial correlation functions showed that defoliators exhibited stronger synchrony over short distances than bark beetles, possibly because the cyclical nature of defoliator populations allows them to be more easily synchronized. The greater influence of Moran effects on the synchrony of bark beetles compared to defoliators, coupled with climate-change-driven increases in synchrony of weather, may lead to more widespread events of high tree mortality due to bark beetle epidemics.
Dryad DOI: https://doi.org/10.5061/dryad.ht76hdrqc
General information
These data and scripts were used to produce results and figures for the manuscript titled "Factors driving the spatial synchrony of forest insect pest outbreaks in North America."
Spatial synchrony in outbreaks was assessed based on annual aerial surveys of defoliation or tree mortality from bark beetles, conducted from 1997-2020 in the US (except 1997 to 2019 for the spongy moth and 1999-2020 for Douglas-fir tussock moth) and 1999-2020 in Canada. The defoliation and tree mortality data were obtained from the US Forest Service Forest Health Protection National Insect and Disease Detection Survey
(https://www.fs.usda.gov/foresthealth/applied-sciences/mapping-reporting/detection-surveys.shtml) and the National Forest Pest Strategy Information System (Canada). The defoliation and tree mortality data were obtained from the National Insect and Disease Detection Survey (US) and the National Forestry Database (Canada). Data from 1997 and 1998 from Canada were excluded due to inconsistencies in surveying during a transition from federally administered surveys to provincial surveys. For the data in both the U.S. and Canada, all damage maps were initially recorded as polygons but these data were then converted to raster layers representing the presence or absence of defoliation or tree mortality within 250 × 250 m (US) or 500 × 500 m cells (Canada), which were then used aggregated to a common scale of 25 × 25 km cells with the total number of cells damaged computed for each 25 × 25 km cell. The number of cells damaged per 25 × 25 km cell was considered a proxy of pest density. In Manitoba and the Northwest Territories, areas within National Parks are not surveyed. Therefore, cells overlapping with National Park land in these two provinces were excluded. To avoid singular correlation matrices when computing synchrony, we excluded all cells for a given species with < 3 years of damage. First-order differencing of the time series of damage was then performed, yielding annual changes in damage levels, to remove any long-term trends because such trends shared between time series could inflate estimates of synchrony.
Spatial synchrony in weather was assessed using historical estimates of monthly precipitation and mean monthly minimum and maximum temperatures in each of the cells. Weather data for both the USA and Canada were obtained from a Natural Resources Canada dataset generated by the ANUSPLIN climate mapping model (McKenney et al. 2011). These data were collected in raster format at a resolution of 2 × 2 km and then averaged across the same 25 × 25 km cells in which pest species damage was quantified. Separate annual time series were prepared for each of 36 weather variables comprising the means (temperatures) or totals (precipitation) of the three weather variables for each month. Principal components analysis was used to collapse this large number of weather variables into a smaller number of uncorrelated variables. We used the time series of the (first-order differenced) scores from the three dominant principal components of weather in our analyses of the synchrony of weather. Three explanatory matrices were used to represent the synchrony in weather. The elements in each of these matrices were
the pairwise synchronies (Spearman rank correlations) of the scores of the (differenced) first, second, and third principal components of weather, respectively.
The spatial structure of the sample locations was represented in proximity (distance) matrices. Pairwise proximity values were calculated as the inverse power function 𝑐𝑑𝑖,𝑗−𝑏, where 𝑑𝑖,𝑗 is the Euclidean distance between cells i and j, c is a scaling constant, and b determines the rate of decay with increasing distance. The values of c were obtained as regression coefficients from the MRM models, with the value of b arbitrarily set at 1.
We compared synchrony of pest damage at lag distances of 25 km (the shortest distance between any two sample locations), 100 km, 200 km, and 300 km. Synchrony at these lag distances was estimated using the “Sncf” function from the R package ‘ncf’ (Bjørnstad 2022). Synchrony values for lag distances beyond a threshold of one-half of the maximum distance between sample locations were omitted.
Note: Throughout this repository, we refer to defoliation or tree mortality as "defoliation" for shorthand.
Note: Some of these analyses are time-consuming or are not able to run on a normal computer, esp. the MRM analysis. Therefore, we have included most of our derived products here as well as our raw data.
References
Bjørnstad, O. N. 2022. ncf: Spatial Covariance Functions.
McKenney, D. W., Hutchinson, M. F., Papadopol, P., Lawrence, K., Pedlar, J., Campbell, K., Milewska, E., Hopkinson, R. F., Price, D. and Owen, T. 2011. Customized Spatial Climate Models for North America. - Bulletin of the American Meteorological Society 92: 1611–1622.
Folder Organization
- All R and Slurm scripts are stored in the "scripts" folder
- Raw and derived data are stored in the "data" folder. Subfolders are named for the types of data they contain, and each folder contains a metadata readme file.
- Results of the primary analyses are stored in the "results" folder. Subfolders break down results into the stages of our analyses (PCA, sncf, and mrm).
- Figures created in these scripts are exported to the "figures" folder. Subfolders here are similar to the results folder.
Throughout, the folder will indicate file contents, while the file names indicate the data source (USA or Canada), an underscore, followed by their species code as found in "./data/species_codes.csv". As an example, "ca_dfb.csv" under the folder
./data/defoliation/synchrony_matrices/ would contain the synchrony matrices for the Douglas fir beetle in Canada. Data folders also contain a text metadata readme file explaining their contents in more detail.
Also note that data from Canada and the USA were provided in different projections. More information can be found in the README files within the data subfolder.
Species information/file naming codes
| File code | Common Name | Scientific Name | ITIS TSN | Damage Type |
|---|---|---|---|---|
| mpb | Mountain pine beetle | Dendroctonus ponderosae | 114918 | Bark beetle |
| sb | Spruce beetle | Dendroctonus rufipennis | 114921 | Bark beetle |
| dfb | Douglas-fir beetle | Dendroctonus pseudotsugae | 114919 | Bark beetle |
| wpb | Western pine beetle | Dendroctonus brevicomis | 114913 | Bark beetle |
| wbbb | Western balsam bark beetle | Dryocoetes confusus | 114927 | Bark beetle |
| esbw | Eastern spruce budworm | Choristoneura fumiferana | 117862 | Defoliator |
| wsbw | Western spruce budworm | Choristoneura freemani | Defoliator | |
| tcbw | Two-year cycle budworm | Choristoneura biennis | Defoliator | |
| jpbw | Jack pine budworm | Choristoneura pinus | 117864 | Defoliator |
| fe | Fir engraver | Scolytus ventralis | 114953 | Bark beetle |
| dftm | Douglas fir tussock moth | Orgyia pseudotsugata | 939675 | Defoliator |
| ftc | Forest tent caterpillar | Malacosoma disstria | 117544 | Defoliator |
| sm | Spongy moth | Lymantria dispar | Defoliator |
*ITIS TSN - Integrated Taxonomic Information System
Instructions
This software requires R to run, and was initially created in R 4.4.0.
Scripts are all in the folder "scripts", and are numbered roughly in the order they should be run. Each script contains a header explaining what it does, its inputs, and its outputs.
Optional: "install_packages.R" - Exactly what it says. In particular, "mms" is installed directly from GitHub rather than an R repository.
Optional: Scripts 0 to 4 download and set up the data for the primary analyses. This includes the Principal Components Analysis of weather data, which is used as input for later analyses. These have already been completed with the data you downloaded, so you only need to run these scripts if you are interested in these steps. In that case, start with script 0 to download the raw data you will need.
Scripts 5 to 10 perform the main sncf and mrm analyses and compile the results. These are likely of main interest to most readers.
Note that script 9 was run on the UVA computer cluster via the ".slurm" script. This analysis is unlikely to run properly on your own computer due to memory constraints.
"helper_functions.R" contains functions used in script 9. It does not need to be run on its own.
Data Descriptions
Folder: Data
File: species_codes.csv
A simple table containing the information on each species displayed above, but in a machine-readable format. This file is used by some of the R scripts.
Variable List:
- code: The 3-4 letter species code used throughout these scripts and files to indicate each species.
- commonName: The common name of the species used in the manuscript.
- scientificName: The scientific name of the species used in the manuscript.
- tsn: The ITIS Taxonomic Serial Number (TSN) for the species, if one exists.
- type: The type of forest damage caused by these insects, either Bark Beetle or defoliator.
Folder: Defoliation
Folder: 25km_GT2Years_diff
Files in this folder are named _.csv, following the naming conventions described above. Each file contains forest damage data specific to that country and species combination, aggregated to 25 km2 grid cells and then differenced across years. Additionally, we have only included grid cells that have more than 2 years of damage in these files and in our analyses.
The script named "2_usa_defoliation_agg_diff_dist.R" produced the US files. Raw US data were extracted from the US Forest Service Forest Health Protection National Insect and Disease Detection Survey (https://www.fs.usda.gov/foresthealth/applied-sciences/mapping-reporting/detection-surveys.shtml).
The script named "1_canada_defoliation_diff_dist.R" created these files. Raw data for Canada were derived from the National Forest Pest Strategy Information System, National Forestry Database. Note that in addition to the above processing, we dropped years 1997-1998 as they were inconsistently collected across Canada.
Variable List:
- SPECIES - Species code. Same as the filename and constant within each file. See species_codes.csv for info.
- SOURCE - "CA" or "USA" for data from Canada or the United States, respectively.
- X - X-coordinate, projection provided below in spatial information.
- Y - Y-coordinate, projection provided below in spatial information.
- DEF - # of years the cell had >0 defoliation during the study period.
- Remaining - Columns titled by year, which vary depending on species: Year-to-year change in count of defoliated cells within the 25km grid cell.
Spatial information (Canada):
Projected Coordinate System: Canada Albers Equal Area Conic
Projection: Albers
WKID: 102001
Authority: ESRI
Linear Unit: Meters (1.0)
False Easting: 0.0
False Northing: 0.0
Central Meridian: -96.0
Standard Parallel 1: 50.0
Standard Parallel 2: 70.0
Latitude of Origin: 40.0
Spatial information (USA):
XY_Coordinate_System: NAD_1983_Albers
Linear_Unit: Meter (1.000000)
Angular_Unit: Degree (0.0174532925199433)
False_Easting: 0
False_Northing: 0
Central_Meridian: -96
Standard_Parallel_1: 20
Standard_Parallel_2: 60
Latitude_Of_Origin: 40
Datum: D_North_American_1983 \
Folder: synchrony_matrices
For each species and country, these files contain the synchrony matrices (correlation through time among rows) for the files contained in the files in ./data/defoliation/25km_GT2Years_diff.
Rows and columns here do not have names, but they both correspond to the rows in order contained within the corresponding defoliation files. If those raw data files are changed, these matrices will need to be recalculated.
Values are Spearman's rank correlation coefficients.
Files follow the normal naming convention for country and species, and were calculated in scripts "1_canada_defoliation_diff_dist.R" or "2_usa_defoliation_agg_diff_dist.R" for each respective country.
Folder: Distance Matrices
For each species and country, these files contain the distance matrices representing the distance in km between center points of each cell in the corresponding file in ./data/defoliation/25km_GT2Years_diff.
Rows and columns here do not have names, but they both correspond to the rows in order contained within the corresponding defoliation files. If those raw data files are changed, these matrices will need to be recalculated.
Values are distances in kilometers.
Files follow the normal naming convention for country and species, and were calculated in scripts "1_canada_defoliation_diff_dist.R" or "2_usa_defoliation_agg_diff_dist.R" for each respective country.
Folder: Weather
Raw ANUSPLIN data are available at ftp site below, or by running script 0 in this package. ftp://ftp.nrcan.gc.ca/pub/outgoing/North_America_Historical_Grids/geotif/
These data are not included in the package because they are large files. If you run that script, data will first be downloaded to the folder ./data/weather/raw_download/, and then unpacked into folders by year in the folder ./data/weather/raw/. Those folders contain metadata for the weather files.
These data are used to extract data for our analysis. The raw extracted data is available in the folder ./data/weather/25km.
Folder: 25km
This folder contains subfolders named in the typical _ manner.
Each folder contains weather data extracted specifically for the grid cells and years in the defoliation data in ./data/defoliation/25km_GT2Years_diff. We saved these as the extraction process is time-consuming; these data are rearranged in the next "PCA" subfolder for analysis.
Files in those folders are named .csv, where weather variables are:
- maxt: Mean monthly maximum temperature in degrees Celsius
- mint: Mean monthly minimum temperature in degrees Celsius
- pcp: Monthly total precipitation in millimeters
Variable List:
- SPECIES - Species code. Same as the filename and constant within each file. See species_codes.csv for info.
- SOURCE - "CA" or "USA" for the initial defoliation data from Canada or the United States, respectively.
- X - X-coordinate, projection provided below in spatial information.
- Y - Y-coordinate, projection provided below in spatial information.
- dist - If present, the tree damage cell fell outside the range of the weather data layer. The value is the distance in meters from the damage cell to the nearest weather cell, from which these corresponding weather values were extracted and used in analysis.
- Remaining - Columns titled by year, which vary depending on species: Year-to-year values for the weather variable and month indicated in the filename. Units for precipitation variables are mm, and temperature variables are degrees Celsius.
Spatial information (Canada):
Projected Coordinate System: Canada Albers Equal Area Conic
Projection: Albers
WKID: 102001
Authority: ESRI
Linear Unit: Meters (1.0)
False Easting: 0.0
False Northing: 0.0
Central Meridian: -96.0
Standard Parallel 1: 50.0
Standard Parallel 2: 70.0
Latitude of Origin: 40.0
Spatial information (USA):
XY_Coordinate_System: NAD_1983_Albers
Linear_Unit: Meter (1.000000)
Angular_Unit: Degree (0.0174532925199433)
False_Easting: 0
False_Northing: 0
Central_Meridian: -96
Standard_Parallel_1: 20
Standard_Parallel_2: 60
Latitude_Of_Origin: 40
Datum: D_North_American_1983
Folder: PCA
This folder contains the extracted weather information above, rearranged into one file per country and species combination, named in a similar manner to other files.
Variable List:
- SOURCE - "CA" or "USA" for the initial defoliation data from Canada or the United States, respectively.
- X - X-coordinate, projection provided below in spatial information.
- Y - Y-coordinate, projection provided below in spatial information.
- YEAR - Calendar year of the data contained in this row.
- Remaining - 36 more columns titled by , as described for the file names above. E.g., maxtJan. These contain the same weather data present in the 25km2 folder, just aggregated and rearranged for analysis. Units for precipitation are mm. Units for temperatures are degrees Celsius.
Spatial information (Canada):
Projected Coordinate System: Canada Albers Equal Area Conic
Projection: Albers
WKID: 102001
Authority: ESRI
Linear Unit: Meters (1.0)
False Easting: 0.0
False Northing: 0.0
Central Meridian: -96.0
Standard Parallel 1: 50.0
Standard Parallel 2: 70.0
Latitude of Origin: 40.0
Spatial information (USA):
XY_Coordinate_System: NAD_1983_Albers
Linear_Unit: Meter (1.000000)
Angular_Unit: Degree (0.0174532925199433)
False_Easting: 0
False_Northing: 0
Central_Meridian: -96
Standard_Parallel_1: 20
Standard_Parallel_2: 60
Latitude_Of_Origin: 40
Datum: D_North_American_1983
Folder: synchrony_matrices
To reduce the dimensionality of weather data, we performed a PCA on 36 weather variables (months x precipitation, tmax, tmin) amongst locations that experienced defoliation by the species in question. These files contain the synchrony matrices for the scores of the resulting principal components (PCs) #1-3, which explained most variance in weather across our study.
Files are named for .csv, where x is in 1-3 and represents the principal component examined in the file.
Rows and columns here do not have names, but they both correspond to the rows in order contained within the corresponding defoliation files. If those raw data files are changed, these matrices will need to be recalculated.
Values are Spearman's rank correlation coefficients. See the script titled "4_weather_PCA.R" for exact methods and descriptions of the calculations.
Folder: Results
Folder: sncf
This folder contains results of the sncf performed with measurements recorded at fixed distances. There is one csv file per country and species combination. These files were generated by the script "5_defoliation_sncf.R".
Variable List:
- dist: Distance in kilometers.
- sync: Level of synchrony at this distance.
- lower_CI: Lower bound of the synchrony confidence interval at this distance.
- upper_CI: Upper bound of the synchrony confidence interval at this distance.
Folder: PCA
Results from the PCA analysis, which are results on their own and also as input into sncf models and the multiple matrix regression. See script "4_weather_PCA.R".
File: weather_pc_variance.csv
The proportion of weather variance explained by each PC for each species and country combination.
Variable List:
- file: File name corresponding to this country/species combination.
- source: "usa" or "ca" for the United States and Canada, respectively. The country the information in this row corresponds to.
- species: Species for the information in this row. Codes follow those in the table above.
- PC1: Proportion of variance in weather explained by PC1.
- PC2: Proportion of variance in weather explained by PC2.
- PC3: Proportion of variance in weather explained by PC3.
Folder: mrm_diff
Differenced PCA scores results saved as csv files, with years as rows and sites as columns, prepared for the main regression analyses. Files are named <source>__PC.csv, where source is CA or USA for Canada and the United States, respectively. Species codes are listed below. indicates the principal component information contained in the file. Scores were differenced across years at each site.
- YEAR - Calendar year of the data contained in this row.
- Remaining columns: Column names indicating the grid cell for the data contained in this column. Columns start with the country code, followed by the X and Y coordinates in the pattern .
Spatial information (Canada):
Projected Coordinate System: Canada Albers Equal Area Conic
Projection: Albers
WKID: 102001
Authority: ESRI
Linear Unit: Meters (1.0)
False Easting: 0.0
False Northing: 0.0
Central Meridian: -96.0
Standard Parallel 1: 50.0
Standard Parallel 2: 70.0
Latitude of Origin: 40.0
Spatial information (USA):
XY_Coordinate_System: NAD_1983_Albers
Linear_Unit: Meter (1.000000)
Angular_Unit: Degree (0.0174532925199433)
False_Easting: 0
False_Northing: 0
Central_Meridian: -96
Standard_Parallel_1: 20
Standard_Parallel_2: 60
Latitude_Of_Origin: 40
Datum: D_North_American_1983
Folder: sncf_diff
Differenced PCA scores results saved as csv files, with sites as rows and scores as columns, formatted. Files are named <source>__PC.csv, where source is CA or USA for Canada and the United States, respectively. Species codes are listed below. indicates the principal component information contained in the file. Scores were differenced across years at each site.
Variable List:
- SOURCE - "CA" or "USA" for the initial defoliation data from Canada or the United States, respectively.
- X - X-coordinate, projection provided below in spatial information.
- Y - Y-coordinate, projection provided below in spatial information.
- Remaining columns: Years for the PCA data contained within this column.
Folder: MRM
MRM analyses were run on insect damage data derived from the US Forest Service Forest Health Protection National Insect and Disease Detection Survey or the National Forest Pest Strategy Information System (Canada) and weather data derived from ANUSPLIN, as defined in the Readme, scripts, and metadata within this repository.
These analyses were conducted using a large computer cluster (see script 9), which created many small text output files. These were aggregated by script 10 into these two CSV files.
File: mrm_p_values.csv
P-values from the matrix regression analysis. See the R-package mms for more information.
Variable List:
- country: "usa" or "ca" for the United States and Canada, respectively. The country the information in this row corresponds to.
- species: Species for the information in this row. Codes follow those in the table above.
- prox.p: Significance test for proximity (i.e., space) in each model.
- PC1.p: Significance test for PC1 of weather in each model.
- PC2.p: Significance test for PC2 of weather in each model.
- PC3.p: Significance test for PC3 of weather in each model.
File: mrm_weights.csv
Model weight information from the matrix regression analysis. See the R-package mms for more information.
Variable List:
- species: Species for the information in this row. Codes follow those in the table above.
- country: "usa" or "ca" for the United States and Canada, respectively. The country the information in this row corresponds to.
- model: The model for the information in each row, "S" for spatial, "E" for environmental, or "C" for combined (both spatial and environmental variables included).
- freq.top: The number of resamplings (of the
r nrandperformed) on which each model was ranked the best (i.e., had the lowest out-of-sample error rate). The fraction of resamplings for which a model is ranked highest is a number between 0 and 1 and can be interpreted as a model importance weight, similar to an AIC weight [@walter_2017]. The fraction is interpretable as the degree of support of the data for each model. - num.pos: The number of possible leave-n-outs that could have been performed across all resamplings
- num.att: The number of leave-n-outs that were attempted, less than the previous column because
maxrunswasr maxruns, less thanr dim(mats[[1]])[1]chooser n. Regressions with rank deficiency problems do not yield well-determined regression coefficients and cannot be used for out-of-sample predictions. - num.rnk: The number of leave-n-outs attempted for which rank deficiency problems did not occur.
- num.usd: The number of leave-n-outs that succeeded, providing an estimate of out-of-sample forecast accuracy. This can be less than
num.rnkdue to technicalities on how resampling and leave-n-out interact in the algorithm when the same site is selected more than once in the resampling.
- Haynes, Kyle J.; Pureswaran, Deepa S.; Allstadt, Andrew J. et al. (2025). Meteorological versus spatial drivers of the spatial synchrony of forest insect pest outbreaks in North America. Oikos. https://doi.org/10.1002/oik.11833
