Data from: Quantifying the impacts of management and herbicide resistance on regional plant population dynamics in the face of missing data

Goodsell, Robert; Comont, David; Hicks, Helen; Lambert, James; Hull, Richard; Crook, Laura; Fraccaro, Paolo; Reusch, Katharina; Freckleton, Robert; Childs, Dylan

Published Nov 28, 2023 on Dryad. https://doi.org/10.5061/dryad.9cnp5hqn5

Data files

Nov 28, 2023 version files (11.06 GB total):

data_and_code.zip (11.06 GB)

README.md (6.19 KB)

Abstract

A key challenge in the management of populations is to quantify the impact of interventions in the face of environmental and phenotypic variability. However, in large-scale ecological research, accurate estimation of the effects of management and environment is often limited by the expense of data collection, the inherent trade-off between quality and quantity, and missing data.

In this paper we develop a novel modelling framework, and demographically informed imputation scheme, to comprehensively account for the uncertainty generated by missing population, management, and herbicide resistance data. Using this framework and a large dataset (178 sites over 3 years) on the densities of a destructive arable weed (Alopecurus myosuroides) we investigate the effects of environment, management, and evolved herbicide resistance on weed population dynamics.

In this study we quantify the marginal effects of a suite of common management practices, including cropping, cultivation, and herbicide pressure, and of evolved herbicide resistance, on weed population dynamics.

Using this framework, we provide the first empirically backed demonstration that herbicide resistance is a key driver of population dynamics in arable weeds at regional scales. Whilst cultivation type had minimal impact on weed density, crop rotation and earlier cultivation and drill dates consistently reduced infestation severity.

Synthesis and applications: As we demonstrate that high herbicide resistance levels can produce extremely severe weed infestations, monitoring of herbicide resistance is a priority for farmers across western Europe. Furthermore, developing non-chemical control methods is essential to control current weed populations and prevent further resistance evolution. We recommend planning interventions that center on crop rotation and incorporate spring sowing and cultivation to provide the best reductions in weed densities. More generally, by directly accounting for missing data, our framework permits the analysis of management practices with data that would otherwise be severely compromised.

This archive contains the datasets and code required to replicate the analyses in Goodsell et al. (2023), "Quantifying the impacts of management and herbicide resistance on regional plant population dynamics in the face of missing data".

Description of the data and file structure

Data: Contains data required to run all stages in the analysis.

Many files contain the same variable names; each important variable is described under the first object in which it appears.

all_imputation_data.rds - the data required to run the imputation scheme. This is an R list containing the following:

$Management - data frame containing missing and observed values for management imputation

FF & FFY: the specific field, and field year.

year: the year.

crop: crop

cult_cat : cultivation category

a_gly: number of autumn (post September 1st) glyphosate applications

spray_days_total: total number of herbicide applications of all modes of action

spray_days_gw: number of grass weed specific herbicide applications

d_date: drill date (in year-days)

d_year: drill year

c_date: cultivation date (in year-days)

h_date: harvest date (in year-days)

soil_group: soil category

X & Y: spatial location of each field.

drop_years: years to drop from the analysis.

$Resistance - data frame containing missing and observed values for resistance imputation

FF: field ID

mean_mort: average mortality

$raw_ds: data frame with raw density state data containing only observed values

QID: quadrat ID

x & y: quadrat location within the field

DensityState: density state

$unobs_ds: data frame with missing and observed density state data

is_obs: indicator of whether the density state was observed.

$init_ds : data frame with density state data with missing values filled with random draws. Used in the initialisation of the imputation.
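As a minimal sketch of how the list described above might be inspected in R, the example below builds a small stand-in object using a subset of the documented element and column names, round-trips it through saveRDS() and readRDS(), and converts a year-day drill date to a calendar date. The toy values, the temporary file path, and the assumption that year-day 1 corresponds to 1 January of `year` are illustrative only and are not taken from the archive.

```r
# Toy stand-in for all_imputation_data.rds; all values are invented.
toy <- list(
  Management = data.frame(
    FF = c("F1", "F2"),
    year = c(2015, 2016),
    crop = c("wheat", NA),   # NA marks a value still to be imputed
    d_date = c(280, NA)      # drill date in year-days
  ),
  Resistance = data.frame(FF = c("F1", "F2"), mean_mort = c(0.35, NA)),
  unobs_ds = data.frame(
    FF = c("F1", "F1", "F2"),
    DensityState = c(1, NA, 3),
    is_obs = c(TRUE, FALSE, TRUE)
  )
)

# Round-trip through an .rds file, as one would with the archived object.
path <- tempfile(fileext = ".rds")
saveRDS(toy, path)
imp <- readRDS(path)

names(imp)                      # elements of the list
colSums(is.na(imp$Management))  # missing values per management variable
mean(imp$unobs_ds$is_obs)       # share of density states that were observed

# Assumption: year-day 1 is 1 January of the value in `year`.
mgmt <- imp$Management
mgmt$drill_date <- as.Date(paste0(mgmt$year, "-01-01")) + (mgmt$d_date - 1)
```

With the real archive, the `toy` object and temporary file would be replaced by a call to readRDS() on the path of all_imputation_data.rds inside the unzipped data folder.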

all_imptation_index.rds: an index of missing fields required for imputation of missing density state data.

complete_cases.rds: a data frame containing only the complete cases of management variables.

rotation_freqs.rds: an object containing frequencies of different rotations used to produce supplementary figures.

Within imputation results are the extracted results from the cross-validation procedure, as well as the multiply-imputed data and models fit to multiply-imputed data (suffixed _DATA and _MODS respectively).

_DATA objects are R lists, containing:

$conv - an index of imputation run number and mean density state by field to check for convergence.

$management_data - imputed management data by imputation run.

$cv - cross validation scores for each imputation run.

$ds_data - imputed density state data for each run.

$all_data - imputed management and density state data. This is the most important object for the subsequent analysis and contains many of the variables found in all_imputation_data. It contains variables suffixed by the year of transition (_t1 / _t2); variables without a suffix are not indexed by year. New and/or derived variables include:

d_diff - difference between drill dates and seasonal medians (spring & summer), given in Julian days

transition_year - factor variable for the years in which density state measurements were taken (e.g. 2015-2016)

rotation - sequence of crops between years in which measurements were taken.

cv_results.rds: a list containing log-loss scores by hold-out fold and imputation for each model.
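For readers unfamiliar with the metric, the following sketch shows the standard multiclass log-loss for categorical predictions such as density states. The function name `log_loss`, the clipping constant `eps`, and the toy probabilities are assumptions made for illustration; the exact form used to produce the archived scores may differ.

```r
# Multiclass log-loss: mean negative log probability assigned to the observed class.
# probs: matrix with one row per held-out observation and one column per density state.
# obs:   integer vector giving the observed state (column index) for each row.
log_loss <- function(probs, obs, eps = 1e-15) {
  p_obs <- probs[cbind(seq_along(obs), obs)]  # probability given to the true class
  p_obs <- pmin(pmax(p_obs, eps), 1 - eps)    # guard against log(0)
  -mean(log(p_obs))
}

# Hypothetical predictions for three quadrats and three density states.
probs <- rbind(c(0.7, 0.2, 0.1),
               c(0.1, 0.8, 0.1),
               c(0.3, 0.3, 0.4))
obs <- c(1L, 2L, 3L)
log_loss(probs, obs)
```

Lower scores indicate that more probability was placed on the states that were actually observed in the hold-out fold.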

Within simulation results are the results required to reproduce the sensitivity analysis (of weed responses to management) included in the manuscript.

raw_coeffs_all_management.rds - the raw (i.e. mean) coefficients for the model including all management variables.

simulated_coeffs_all_management.rds - the coefficients re-simulated (using MAP posterior simulation implemented in ‘mgcv’) to account for sampling uncertainty.

simRes_all_management.rds - the simulation results for plotting figures.

coef_summary_all_management.rds - a summary of coefficients used for plotting.
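The relationship between the raw and re-simulated coefficients can be pictured with a short sketch. It assumes that re-simulation amounts to drawing coefficient vectors from a multivariate normal centred on the fitted coefficients with the model's coefficient covariance matrix; whether the archived scripts do exactly this is not stated here. The coefficient names, values, and covariance matrix below are hypothetical, and the draw uses a base R Cholesky factor rather than any particular package function.

```r
set.seed(1)

# Hypothetical fitted quantities; with a fitted mgcv model these would
# typically come from coef(model) and vcov(model).
beta_hat <- c(intercept = -0.5, rotation = -0.3, resistance = 0.8)
V <- matrix(c(0.04, 0.01, 0.00,
              0.01, 0.09, 0.02,
              0.00, 0.02, 0.16), nrow = 3, byrow = TRUE)

# Draw n_sim coefficient vectors from N(beta_hat, V) via a Cholesky factor.
n_sim <- 1000
R <- chol(V)                                   # upper triangular, t(R) %*% R equals V
z <- matrix(rnorm(n_sim * length(beta_hat)), nrow = n_sim)
sims <- sweep(z %*% R, 2, beta_hat, "+")       # one simulated vector per row
colnames(sims) <- names(beta_hat)

colMeans(sims)  # should be close to beta_hat
cov(sims)       # should be close to V
```

Each row of `sims` can then be passed through a downstream simulation so that uncertainty in the coefficients propagates into the results.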

Validation results contains the results (imputed models and data) of the validation exercise for each management / resistance variable, in which 20% of each variable was amputed (deliberately set to missing) and then imputed to assess imputation performance.
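The idea behind the validation exercise can be sketched as follows: hide a known 20% of a variable, impute it, and compare the imputed values against the hidden truth. Everything in the example is an assumption for illustration: the simulated drill dates, the use of a simple mean fill as a placeholder imputation, and the choice of RMSE as the score. The archive's own imputation scheme would take the place of the mean fill.

```r
set.seed(42)

# Hypothetical complete variable, e.g. drill dates in year-days.
d_date <- round(rnorm(200, mean = 280, sd = 15))

# Ampute: set a random 20% of values to missing, keeping the truth aside.
amp_idx <- sample(length(d_date), size = round(0.2 * length(d_date)))
d_amp <- d_date
d_amp[amp_idx] <- NA

# Placeholder imputation (mean of the remaining observed values).
d_imp <- d_amp
d_imp[amp_idx] <- mean(d_amp, na.rm = TRUE)

# Score the imputation on the amputed entries only.
rmse <- sqrt(mean((d_imp[amp_idx] - d_date[amp_idx])^2))
rmse
```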

Figures: Output folder for main and supplementary figures.

Sharing/Access information

Management, resistance and weed density data were all collected from a network of UK farms. The Blackgrass network studied was established with funding from the Biotechnology and Biological Sciences Research Council (BBSRC; BB/L001489/1) and the Agriculture and Horticulture Development Board (AHDB), and is currently supported through the BBSRC “Growing Health” (BB/X010953/1) Institute Strategic Programme.

Code/Software

R: contains all R scripts required to replicate the analysis and is divided into two folders, imputation and simulation.

Imputation contains a script to check the cross-validation results (check_cv.R) and an example of how to run the imputation (example_imputation.R). The full workflow is not included because the full exercise took several hundred compute hours on an HPC cluster. Functions to run the imputation are contained in the script imputation_helpers.R. validate_imputation.R and validation_helpers.R contain the code required to replicate the validation exercise. The scripts variable_relationships.R and missingness_patterns plot figures for the main text and supplementary material.

Simulation contains the scripts to replicate the sensitivity analysis. simulate_coefficients.R re-simulates coefficients to account for sampling uncertainty. simulate_steps_rs.R then takes these simulated coefficients and runs the population dynamic model (two-step projection) to assess weed sensitivity to management. plot_step_sim.R plots the results, and simulation_functions.R contains the functions required to run the simulation.
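One way to picture a two-step projection of categorical density states is to push an initial state distribution through two successive transition matrices. The sketch below does only that and is not the archived model: the three state labels, the row-stochastic matrices `P_rotation` and `P_continuous`, and the function `project_two_steps` are all hypothetical, whereas in the analysis the transition probabilities would come from the fitted models and simulated coefficients.

```r
# Hypothetical row-stochastic transition matrices between three density
# states for one year-to-year transition under two management scenarios.
states <- c("low", "medium", "high")
P_rotation <- matrix(c(0.8, 0.15, 0.05,
                       0.5, 0.35, 0.15,
                       0.2, 0.40, 0.40), nrow = 3, byrow = TRUE,
                     dimnames = list(states, states))
P_continuous <- matrix(c(0.6, 0.3, 0.1,
                         0.2, 0.5, 0.3,
                         0.1, 0.3, 0.6), nrow = 3, byrow = TRUE,
                       dimnames = list(states, states))

# Project an initial state distribution through two transitions.
project_two_steps <- function(p0, P1, P2) {
  p1 <- p0 %*% P1   # distribution after the first transition
  p2 <- p1 %*% P2   # distribution after the second transition
  drop(p2)          # return a named vector rather than a 1 x 3 matrix
}

p0 <- c(low = 0.2, medium = 0.5, high = 0.3)
res_rotation <- project_two_steps(p0, P_rotation, P_rotation)
res_continuous <- project_two_steps(p0, P_continuous, P_continuous)
res_rotation
res_continuous
```

Comparing the projected share of quadrats in the highest state between scenarios gives a simple measure of sensitivity to the management choice.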