
Russian olive distribution and invasion dynamics along the Powder River, Montana and Wyoming, USA

Cite this dataset

Courtney, Karissa et al. (2024). Russian olive distribution and invasion dynamics along the Powder River, Montana and Wyoming, USA [Dataset]. Dryad.


The invasive shrub Russian olive (Elaeagnus angustifolia) is widely established in riparian areas across the western United States (U.S.). Limited information on its distribution and invasion dynamics in northern regions has hampered understanding and management efforts. Given this lack of spatial and ecological information, we worked with local stakeholders to develop two main objectives: 1) map the distribution of Russian olive along the Powder River (Montana and Wyoming, U.S.) using field data and remote sensing; and 2) relate that distribution to environmental variables to understand its habitat suitability and community and invasion dynamics. In the study watershed, field data showed that Russian olive has reached nearly equal canopy cover (18.3%) to native plains cottonwood (Populus deltoides; 19.1%), with higher cover closer to the channel and across a broader range of elevations. At the basin scale, we modeled Russian olive distribution with a random forest model trained on field surveys and ocular sampling of aerial imagery, using spectral variables from the Sentinel-2 MultiSpectral Instrument. A statistical model linking the resulting Russian olive percent cover map (RMSE = 15.42, R² = 0.64) to environmental variables across the entire watershed indicated that Russian olive cover increased with flow accumulation and groundwater depth, decreased with elevation, and was associated with poorer soil types. We attribute the success of Russian olive to its broad habitat suitability combined with changing hydrologic conditions that favor it over native species. Because Sentinel-2 imagery is available worldwide, this study provides a repeatable Russian olive detection methodology, and it offers insight into the species' ecological relationships and success that is relevant for management in areas with similar environmental conditions.

README: Modeling the distribution of Russian olive and invasion dynamics: a case study from the Powder River, USA

Four datasets are included here. The first is the field-sampled plot data from the Powder River, used as training data in the Russian olive detection model. The second is the ocular sampling data, also used as training data in the detection model. The third is the raster output of the final Russian olive detection model for the study area. The fourth is a shapefile of the field plot cover data, with physiographic variables and grouping analysis results in the attribute table (projected in NAD_1983_UTM_Zone_13N).

Description of the data and file structure

The field data have their own metadata sheet, which includes a table describing each species name abbreviation. Plots were placed on transects running perpendicular to the river to capture the topographic gradient and shifts in the river's position over time. Plots had a 10 m radius, and cover was estimated ocularly for all woody species across the entire plot. The height of the tallest woody species in each plot (sometimes two species) was estimated with a clinometer or measured with a survey rod. Note: in 3RB, Tran 5, plots 10 and 11 do not exist. This dataset includes species heights and percent cover, which could be used in a number of ways.

The ocular sampling data consist of six columns in the Excel file. FID and PLOT_ID are identifiers, X and Y are the longitude and latitude of each point, F_RO is the percent Russian olive cover within the plot, and F_OTHER is the percent cover of everything else within the plot (so the two sum to 100%).
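Because F_RO and F_OTHER are defined to sum to 100%, a quick consistency check is a useful first step when loading the ocular table. The sketch below is a minimal illustration, assuming rows have already been read into Python; the `OcularPlot` class, the `check_plot` helper, and all row values are hypothetical and are not taken from the published file.

```python
from dataclasses import dataclass


@dataclass
class OcularPlot:
    """One row of the ocular sampling table (hypothetical example values)."""
    fid: int
    plot_id: str
    x: float        # X column
    y: float        # Y column
    f_ro: float     # percent Russian olive cover (F_RO)
    f_other: float  # percent cover of everything else (F_OTHER)


def check_plot(p: OcularPlot, tol: float = 1e-6) -> bool:
    """True if both percentages lie in [0, 100] and sum to 100 within tol."""
    in_range = 0.0 <= p.f_ro <= 100.0 and 0.0 <= p.f_other <= 100.0
    return in_range and abs(p.f_ro + p.f_other - 100.0) <= tol


# Invented rows for illustration; not taken from the published dataset.
plots = [
    OcularPlot(0, "P001", -106.10, 45.20, 35.0, 65.0),
    OcularPlot(1, "P002", -106.00, 45.30, 0.0, 100.0),
    OcularPlot(2, "P003", -105.90, 45.40, 40.0, 70.0),  # sums to 110: flagged
]

flagged = [p.plot_id for p in plots if not check_plot(p)]
print(flagged)  # ['P003']
```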

The raster output of the detection model is in .tif format and is viewable within GIS software. This could be used to look at a subset of the study area, or included in other analyses as its own layer.

The shapefile of field plot cover data provides georeferenced points that correspond to the field data Excel sheet. These points are mappable and could be used in various GIS projects where the locations of specific species are needed. The shapefile contains the same information as the Excel sheet but also includes the Canopy Height Model (CHM), Topographic Position Index (TPI), and lidar elevation values for each point.


Model Training Data

To predict Russian olive percent cover across the Powder River Basin, we created a spectral detection model for the year 2020. The model was trained with data from two collection methods: (1) field data and (2) ocular samples from NAIP 2019 aerial imagery. Field data were collected in June 2021 (Figure 1A). Ten-meter-radius plots were placed on transects (25 on the east bank and 17 on the west bank) that ran perpendicular to the river and were about 50 m apart, for a total of 276 plots (Figure 1A). Within each plot, we estimated cover for each woody species, including Russian olive, plains cottonwood (Populus deltoides), and tamarisk (Tamarix ramosissima), and measured the height of the tallest woody plant with a survey rod or clinometer. Of the 276 field plots, 185 contained Russian olive.

To increase the dataset size and spatial representation, we conducted randomized ocular image sampling using NAIP 2019 true- and false-color imagery, following a sampling procedure similar to that described by Woodward et al. (2018a, b). The false-color imagery helped with species identification. We used Google Earth Engine (GEE) to collect 10 m radius plots, matching the size of the field plots. Within each plot, we visually estimated Russian olive cover on a scale of 0-100 %, where 0 % indicates no Russian olive and 100 % indicates full Russian olive cover (Figure 2). Before making formal observations, all five observers completed a calibration process to improve consistency and reduce bias. Because Russian olive was rare in our random sample, we also opportunistically collected 478 additional plots containing Russian olive. Most opportunistic ocular sampling points fell along the Powder River between Clear Creek and Crazy Woman Creek in Wyoming.

In preliminary model runs, low to moderate Russian olive cover was unrealistically predicted in cropland areas such as Barley (Categorization Code “21”), Winter Wheat (“24”), Alfalfa (“36”), and Other Hay (“37”), so we created a simple mask to remove most crops from the final analysis. The mask was created from land cover classifications in the 2020 USDA National Agricultural Statistics Service Cropland Data Layer (NASSCD; 2021). Land cover types where Russian olive is known to occur, such as Shrubland (NASSCD attribute code “152”), Grassland/Pasture (“176”), Woody Wetlands (“190”), and Herbaceous Wetlands (“195”), were retained. Table S1.2 lists the land cover types that were not masked from the final model. All NASSCD agricultural land cover types with codes 0-61, 66-77, and 204-254 were excluded from the Russian olive model. The same mask was applied to the ocular samples so that the model was built only on sampled points outside agricultural areas. We built our model on 2,160 points (1,407 random ocular samples, 477 opportunistic ocular samples, and 276 field samples), 595 of which had Russian olive present (419 of them ocular samples).
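The crop mask reduces to a range test on the NASSCD class code. The sketch below encodes only the excluded code ranges stated in the text and checks them against the example codes named there; the `is_masked` helper and the sample list are hypothetical, and in practice the mask was applied to rasters in GEE rather than to a Python list.

```python
# Excluded NASSCD (Cropland Data Layer) code ranges, inclusive, as listed in the text.
EXCLUDED_RANGES = [(0, 61), (66, 77), (204, 254)]


def is_masked(cdl_code: int) -> bool:
    """True if a CDL class code falls in one of the excluded ranges."""
    return any(lo <= cdl_code <= hi for lo, hi in EXCLUDED_RANGES)


# Crop codes named in the text should be masked ...
for code in (21, 24, 36, 37):      # Barley, Winter Wheat, Alfalfa, Other Hay
    assert is_masked(code)
# ... and the habitat classes named in the text should be retained.
for code in (152, 176, 190, 195):  # Shrubland, Grassland/Pasture, Woody and Herbaceous Wetlands
    assert not is_masked(code)

# Hypothetical training samples as (sample_id, cdl_code) pairs.
samples = [("s1", 21), ("s2", 176), ("s3", 190), ("s4", 36), ("s5", 152)]
kept = [sid for sid, code in samples if not is_masked(code)]
print(kept)  # ['s2', 's3', 's5']

# The reported sample breakdown adds up to the stated total.
assert 1_407 + 477 + 276 == 2_160
```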

Random Forest Model

We used GEE to create a mosaic of 2020 Copernicus Sentinel-2 MultiSpectral Instrument Level-1C imagery covering the Powder River Basin study area. We filtered for images with low cloud cover (<20-30 %), then created a median composite image for each of three seasons (spring, 2020-04-01 to 2020-05-15; summer, 2020-05-16 to 2020-07-31; fall, 2020-08-01 to 2020-09-30) to account for seasonal phenological variation (Gorelick et al. 2017). Spectral bands and vegetation indices were derived from the composites, including the Normalized Difference Vegetation Index (NDVI), Normalized Difference Moisture Index (NDMI), Normalized Burn Ratio (NBR), Simple Ratio (SR), Tasseled Cap transformation, and others (Table S1.1). The resulting Tasseled Cap brightness, greenness, and wetness (BGW) indices, named for the features they emphasize, improve vegetation classifications because they are sensitive to phenological changes (Crist and Cicone 1984). We also differenced indices between summer and spring and between summer and fall to capture seasonal variation among species and aid Russian olive detection (Evangelista et al. 2009).
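For a single pixel, the compositing and differencing steps can be sketched as follows. This assumes the standard Sentinel-2 band roles (B4 red, B8 NIR, B11 SWIR1, B12 SWIR2), common textbook formulas for NDVI, NDMI, NBR, and SR (Table S1.1 may define them differently), and a 20 % cloud cutoff chosen from the stated range. The reflectance values are invented, and the code uses plain Python rather than the GEE API.

```python
from statistics import median

BANDS = ("B4", "B8", "B11", "B12")  # red, NIR, SWIR1, SWIR2 on Sentinel-2


def nd(a: float, b: float) -> float:
    """Normalized difference (a - b) / (a + b)."""
    return (a - b) / (a + b)


def indices(px: dict) -> dict:
    """Common formulations of the indices named in the text (assumed, see Table S1.1)."""
    nir, red, swir1, swir2 = px["B8"], px["B4"], px["B11"], px["B12"]
    return {
        "NDVI": nd(nir, red),
        "NDMI": nd(nir, swir1),
        "NBR": nd(nir, swir2),
        "SR": nir / red,
    }


def median_composite(scenes: list, max_cloud: float = 20.0) -> dict:
    """Per-band median over scenes whose cloud cover (%) is below max_cloud."""
    clear = [s for s in scenes if s["cloud_pct"] < max_cloud]
    return {b: median(s[b] for s in clear) for b in BANDS}


# Hypothetical top-of-atmosphere reflectance values for one pixel.
spring = [
    {"cloud_pct": 5,  "B4": 0.08, "B8": 0.25, "B11": 0.20, "B12": 0.15},
    {"cloud_pct": 50, "B4": 0.30, "B8": 0.35, "B11": 0.30, "B12": 0.28},  # too cloudy
    {"cloud_pct": 10, "B4": 0.10, "B8": 0.27, "B11": 0.22, "B12": 0.17},
    {"cloud_pct": 12, "B4": 0.09, "B8": 0.26, "B11": 0.21, "B12": 0.16},
]
summer = [
    {"cloud_pct": 3, "B4": 0.05, "B8": 0.40, "B11": 0.18, "B12": 0.10},
    {"cloud_pct": 8, "B4": 0.06, "B8": 0.42, "B11": 0.19, "B12": 0.11},
    {"cloud_pct": 1, "B4": 0.04, "B8": 0.38, "B11": 0.17, "B12": 0.09},
]

spring_idx = indices(median_composite(spring))
summer_idx = indices(median_composite(summer))
# Seasonal difference (summer minus spring) for each index.
diff = {k: summer_idx[k] - spring_idx[k] for k in summer_idx}
print(round(diff["NDVI"], 3))  # 0.292
```

Taking the per-band median before computing indices keeps a single bright or shadowed scene from dominating a season, which is one reason median composites are commonly used.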

We modeled Russian olive percent cover and evaluated predictor variable performance with the ‘randomForest’ package in R, run in RStudio (Liaw et al. 2002), using spectral bands and vegetation indices from GEE. Our dependent (response) variable was Russian olive percent cover, and all 61 predictor variables are listed in Table S1.1. The number of trees (ntree) and the number of variables randomly sampled at each split (mtry) were set to 1,000 and 3, respectively (Liaw et al. 2002). We favored a model with fewer predictor variables, removing predictors that did not improve the model, to achieve better performance and greater interpretability (Evans et al. 2010). We first ran a model with all predictor variables to evaluate initial out-of-bag performance using R² and the root mean squared error (RMSE), a standard measure of the magnitude of model error; a higher R² and a lower RMSE indicate better performance. We then evaluated correlations between variables and removed one variable from each pair correlated at greater than 0.7 (Dormann et al. 2013), leaving 18 variables after the initial run. With these 18 uncorrelated variables, we ran 1276 additional models using backward selection, at each step removing the one or two variables with the lowest importance as measured by the increase in mean squared error. Variables whose partial dependence plots suggested over-fitting or a weak relationship were also removed (Friedman 2001). The final model had six variables. Finally, we summarized the random forest results by 5 km hexagons to show trends in Russian olive cover across the study area.
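The correlation screen and the two performance metrics can be sketched in a few lines. The text does not say which member of a correlated pair was dropped, so this sketch assumes a priority order (for example, ranked by importance from the initial run) and keeps a variable only if its absolute Pearson correlation with every already-kept variable is at most 0.7. Variable names and values are hypothetical, and R² here uses the common 1 − SS_res/SS_tot definition, which may differ slightly from the out-of-bag variance explained that randomForest reports.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_correlated(columns, ranked_names, threshold=0.7):
    """Greedy screen: keep a variable if |r| <= threshold with all kept ones."""
    kept = []
    for name in ranked_names:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def rmse(obs, pred):
    """Root mean squared error."""
    return sqrt(mean((o - p) ** 2 for o, p in zip(obs, pred)))


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mo = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


# Hypothetical predictor columns, listed in assumed priority order.
columns = {
    "NDVI_su": [1, 2, 3, 4, 5],
    "SR_su": [2, 4, 6, 8, 10.5],  # nearly collinear with NDVI_su
    "NBR_fa": [3, 1, 4, 1, 5],    # r of about 0.35 with NDVI_su
}
print(drop_correlated(columns, ["NDVI_su", "SR_su", "NBR_fa"]))  # ['NDVI_su', 'NBR_fa']

obs, pred = [10, 20, 30], [12, 18, 33]  # hypothetical percent cover values
print(round(rmse(obs, pred), 3), round(r_squared(obs, pred), 3))  # 2.38 0.915
```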

Species Composition and Russian Olive Habitat Suitability at the Watershed Scale 

Plot data collected along transects perpendicular to the Powder River channel allowed additional insights when paired with lidar and topographic data, particularly because a robust suite of woody species was recorded in addition to Russian olive. Topographic position index (TPI) was derived from 2016 lidar data (Ackerman 2016). TPI measures relative topographic position by comparing the elevation at a given point to the mean elevation in a surrounding window (Weiss 2001). Here, a 100-cell (100 m) radius was used, so TPI can be interpreted as position relative to the detrended channel. We also derived a Canopy Height Model (CHM), calculated as the digital surface model minus the digital terrain model (calculated in …). We extracted mean TPI, CHM, and distance from the channel centerline to the field plots and examined how these varied by dominant plot species using their distributions (i.e., boxplots and basic statistical moments). Additionally, we used the complete suite of species and cover data, together with distance from the channel, in a k-means grouping analysis in ArcGIS Map. K-means grouping is an unsupervised classification method in which each point is assigned to a group based on its similarity to other points (Davies and Bouldin, 1979). The pseudo F-statistic was used to determine how many groups to include in the final analysis. This allowed inference about the spatial relationships among Russian olive, cottonwood, and tamarisk, which was not possible in the watershed-scale modeling.
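TPI and CHM are both simple raster operations. The sketch below computes them on tiny hypothetical grids, assuming a circular neighborhood that excludes the center cell (TPI implementations vary on both points) and a one-cell radius in place of the 100-cell radius used here. The `tpi` and `chm` helpers are illustrative only; a real analysis would use GIS or raster tooling on the lidar surfaces.

```python
from statistics import mean


def tpi(dem, row, col, radius):
    """Elevation at (row, col) minus the mean of cells within a circular radius.

    Assumes the center cell is excluded and out-of-bounds cells are skipped.
    """
    z = dem[row][col]
    neighbors = []
    for r in range(max(0, row - radius), min(len(dem), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(dem[0]), col + radius + 1)):
            if (r, c) != (row, col) and (r - row) ** 2 + (c - col) ** 2 <= radius ** 2:
                neighbors.append(dem[r][c])
    return z - mean(neighbors)


def chm(dsm, dtm):
    """Canopy height: digital surface model minus digital terrain model."""
    return [[s - t for s, t in zip(srow, trow)] for srow, trow in zip(dsm, dtm)]


# Hypothetical 3 x 3 DEM with a low channel down the middle column.
dem = [[10, 8, 10],
       [10, 8, 10],
       [10, 8, 10]]
print(tpi(dem, 1, 1, 1))  # -1: the channel cell sits below its surroundings

dsm = [[15, 8], [10, 12]]  # hypothetical first-return surface (m)
dtm = [[10, 8], [10, 9]]   # hypothetical bare-earth terrain (m)
print(chm(dsm, dtm))       # [[5, 0], [0, 3]]
```

Negative TPI values mark cells lower than their neighborhood (such as the channel), and positive values mark higher ground such as terraces.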

Previous work (Nagler et al. 2011) describes factors at continental to reach scales that are known to influence Russian olive distribution. Robust species cover data for an entire watershed are rare, so here we have a unique opportunity to bridge the reach and continental scales (Nagler et al. 2011). At watershed scales, surface and groundwater flow conditions, and their regulation, are known to influence native and invasive riparian species distributions across North America (McShane et al. 2015). Surface flows in the Powder River have declined over time, and the aquifer has experienced increased drawdown. Nagler et al. (2011) also identify soil type as a likely influence on Russian olive distribution, although Russian olive remains less studied than other invasives such as tamarisk. At continental scales, by contrast, habitat suitability models include factors such as mean annual minimum temperature (Friedman et al. 2005) and distance to surface water (Jarnevich et al., 2011; Perry et al., 2022). In urban locations within Russian olive’s native range in Iran, Karimian and Farashi (2021) found climatic factors, soil, and lithology to be important predictors of habitat suitability.

Here, we analyzed Russian olive percent cover as a function of a robust suite of variables that the literature suggests influence Russian olive distribution. Datasets included annual minimum, mean, and maximum temperature (PRISM), slope-weighted solar radiation (PRISM), elevation (10 m), precipitation-weighted flow accumulation (precipitation from PRISM; PRISM Climate Group 2004), dominant land cover class (Perry et al. 2018; Dewitz 2021), dominant soil type (Nagler et al. 2011, 2018; classified with standard U.S. soil taxonomy from the Web Soil Survey, produced by the National Cooperative Soil Survey), and depth to groundwater (Lopez-Iglesias et al. 2014). Correlations among datasets were calculated in R (4.0.3) with the raster package (Hijmans 2021). Based on Spearman rank correlations, variables were included in candidate additive linear models (lm function in R) predicting Russian olive percent cover, with no model containing two variables correlated with each other at > 0.7 (see Data Availability for a link to the Dryad repository containing the correlations and model development). We fit 62 candidate models (Table S3.2) and selected among them using AIC (Akaike, 1974) corrected for small sample size (AICc; Mazerolle, 2015). Models were considered plausible if their AICc value differed from that of the lowest-AICc model (ΔAICc) by less than 3 (Richards, 2005). Model results are presented using the effects package (Fox & Weisberg, 2018), in which a computed effect absorbs the lower-order terms marginal to the term in question and averages over the other terms in the model.
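The screening and selection rules can be sketched directly. Spearman correlation is Pearson correlation computed on ranks (ties receive their average rank), and AICc adds the standard small-sample correction 2k(k + 1)/(n − k − 1) to AIC = 2k − 2 ln L. The model names, log-likelihoods, parameter counts, and sample size below are hypothetical; the analysis itself was done in R, and this is only an illustration of the ΔAICc < 3 rule.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def aicc(log_lik, k, n):
    """AIC = 2k - 2 lnL, plus the small-sample correction 2k(k+1)/(n-k-1)."""
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)


def plausible(models, n, cutoff=3.0):
    """Names of models whose AICc is within `cutoff` of the best (lowest) AICc."""
    scores = {name: aicc(ll, k, n) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return sorted(name for name, s in scores.items() if s - best < cutoff)


# Hypothetical candidate models: name -> (log-likelihood, number of parameters).
models = {
    "flow+gw": (-300.0, 4),
    "flow+gw+elev": (-299.5, 5),
    "elev": (-310.0, 3),
}
print(plausible(models, n=100))  # ['flow+gw', 'flow+gw+elev']
print(ranks([10, 20, 20, 30]))   # [1.0, 2.5, 2.5, 4.0]
```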


National Aeronautics and Space Administration, Award: NNL16AA05C

United States Geological Survey, Award: G21AC10021