Data and code from: Pest host expansion as a scale-free stepwise process across the host phylogeny
Data files
Oct 03, 2025 version files 169.87 KB
- N_American_pests_native_and_nonnative.csv (9.07 KB)
- NAHostPestGenus.csv (157.84 KB)
- README.md (2.95 KB)
Abstract
This dataset contains data and code for estimating and analyzing the phylogenetic constraint of pests and pathogens on North American tree genera. It includes two .csv data files, derived from Wang et al. 2022, with information on known pest host genera and pest nativity to North America, and four R scripts that download the necessary files, run simulations and machine learning models, and plot figures. The scripts should be run in numerical order. The data and files are published under a CC0 license and are available for reuse.
Description of the data and file structure
This code reproduces the analyses performed in Pest Host Expansion as a Scale-Free Stepwise Process Across the Host Phylogeny (Kruger et al. 2025, DOI: 10.1098/rspb.2025.1793), using data from Wang et al. (https://doi.org/10.1111/1365-2745.13995). The code simulates host ranges of pests under different patterns of phylogenetic constraint, where the strength of the constraint is set by the exponent parameter of a power-law function. It then estimates parameters for the known host ranges of pests in North America, compares them between pest types, and estimates the risk to plant genera upon further host range expansion.
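To make the idea of a power-law constraint concrete, here is a minimal sketch in base R. It is not taken from the dataset's scripts: the weighting (distance to the nearest current host raised to `-alpha`), the toy distance matrix, the genus names, and the function `simulate_host_range` are all assumptions made for illustration, and the published code may define the process differently.

```r
# Illustrative only: the weighting below (distance^-alpha) is an assumed form,
# not the implementation in 2_ParamEstimation.R.
set.seed(1)

# Toy symmetric distance matrix among five hypothetical host genera.
genera <- paste0("genus_", LETTERS[1:5])
coords <- c(0, 1, 2, 10, 20)
dist_mat <- abs(outer(coords, coords, "-"))
dimnames(dist_mat) <- list(genera, genera)

# Add hosts one step at a time. Each candidate genus is weighted by
# (distance to its nearest current host)^(-alpha), so a larger alpha
# means a stronger phylogenetic constraint.
simulate_host_range <- function(dist_mat, start, n_hosts, alpha) {
  hosts <- start
  while (length(hosts) < n_hosts) {
    candidates <- setdiff(rownames(dist_mat), hosts)
    nearest <- apply(dist_mat[candidates, hosts, drop = FALSE], 1, min)
    weights <- nearest^(-alpha)
    hosts <- c(hosts, sample(candidates, 1, prob = weights))
  }
  hosts
}

# alpha = 0 gives unconstrained (uniform) expansion; large alpha keeps
# expansion close to existing hosts.
simulate_host_range(dist_mat, start = "genus_A", n_hosts = 3, alpha = 0)
simulate_host_range(dist_mat, start = "genus_A", n_hosts = 3, alpha = 2)
```

In this toy setup the distances are all nonzero; a real phylogenetic distance matrix would need the same property among distinct genera for the weights to stay finite.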
Files and variables
1_filesetup.R downloads files from Wang et al. 2022 and modifies them for use by the later scripts. Running this script is optional, as the files it generates are also included directly in this dataset.
- "N_American_pests_native_and_nonnative.csv" is in wide format and contains four columns: Pest Species (pest species binomials), Pest Type (insect or pathogen), Nativity (Native or Non-native), and Nhosts (number of observed host genera in the dataset).
- "NAHostPestGenus.csv" is in long format and contains three columns: pest (pest species binomials), acceptedHost (plant genera that the pest uses as a host), and hostFamily (the plant family of the acceptedHost).
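As a quick orientation to the long-format file, the sketch below counts distinct host genera per pest in base R. The column names `pest`, `acceptedHost`, and `hostFamily` come from the description above; the toy rows, the pest names, and the helper `count_hosts` are hypothetical and only stand in for the real file when it is not in the working directory.

```r
# Toy stand-in for NAHostPestGenus.csv (long format); values are illustrative only.
host_long <- data.frame(
  pest         = c("Pestus alpha", "Pestus alpha", "Pestus beta"),
  acceptedHost = c("Quercus", "Fagus", "Pinus"),
  hostFamily   = c("Fagaceae", "Fagaceae", "Pinaceae")
)

# If the real file is in the working directory, use it instead.
if (file.exists("NAHostPestGenus.csv")) {
  host_long <- read.csv("NAHostPestGenus.csv", stringsAsFactors = FALSE)
}

# Count distinct host genera per pest (hypothetical helper).
count_hosts <- function(long_df) {
  n_hosts <- tapply(long_df$acceptedHost, long_df$pest,
                    function(x) length(unique(x)))
  data.frame(pest = names(n_hosts), n_hosts = as.integer(n_hosts),
             row.names = NULL)
}

host_counts <- count_hosts(host_long)
head(host_counts)
```

These counts could then be compared with the Nhosts column of the wide-format file. The exact header spellings in that CSV are not confirmed here, and `read.csv` may alter headers that contain spaces.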
2_ParamEstimation.R performs simulations and machine learning to estimate parameters from pest host range data, and additionally compares parameters between pest types. This code requires "N_American_pests_native_and_nonnative.csv" and "NAHostPestGenus.csv", which are both included in this dataset and can also be regenerated from the original source using 1_filesetup.R.
3_RiskEstimates.R estimates plant risk from estimated parameters and produces Figures 2–5. This code requires files generated by 2_ParamEstimation.R.
4_AlternativeSimulations.R repeats the analysis in 2_ParamEstimation.R, but uses only known hosts to begin each simulation. It additionally plots Supplementary Figures 2 and 3.
plantphylo.txt is a phylogeny of plant genera constructed in V.PhyloMaker (Jin & Qian 2019).
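The run order described above can be sketched as a small driver script. This is an assumption about how one might chain the scripts, not something shipped with the dataset: `run_in_order` is a hypothetical helper, and it assumes the four scripts sit in the working directory. It stops at the first script it cannot find so that later scripts never run without their inputs.

```r
scripts <- c("1_filesetup.R", "2_ParamEstimation.R",
             "3_RiskEstimates.R", "4_AlternativeSimulations.R")

# Source each script in the given order (hypothetical helper).
# Scripts are evaluated in the global environment, as they would be
# when run interactively.
run_in_order <- function(scripts, dir = ".", skip = character()) {
  ran <- character()
  for (s in setdiff(scripts, skip)) {
    path <- file.path(dir, s)
    if (!file.exists(path)) {
      message("Script not found, stopping: ", path)
      break
    }
    message("Running ", s)
    source(path)
    ran <- c(ran, s)
  }
  invisible(ran)
}

# 1_filesetup.R is optional because its output files ship with the dataset.
run_in_order(scripts, skip = "1_filesetup.R")
```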
Code/software
Analyses were performed in R version 4.3.2. Code should be run in numerical order.
The kitchen package can be installed from GitHub with devtools: devtools::install_github("https://github.com/avery-kruger/kitchen").
Packages used are as follows:
- ape (5.7-1)
- cli (3.6.2)
- kitchen (0.1.0)
- phytools (2.1-1)
- readxl (1.4.3)
- tidyverse (2.0.0)
- viridis (0.6.4)
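Before running the scripts, it may help to check which of the listed packages are installed and whether their versions match. The sketch below uses only base R and utils; `report_packages` is a hypothetical helper, and a version mismatch does not necessarily mean the scripts will fail.

```r
# Package versions as listed above.
pinned <- c(ape = "5.7-1", cli = "3.6.2", kitchen = "0.1.0",
            phytools = "2.1-1", readxl = "1.4.3", tidyverse = "2.0.0",
            viridis = "0.6.4")

# Report the installed version of each package, if any, and whether it
# equals the listed version (hypothetical helper).
report_packages <- function(pinned) {
  rows <- lapply(names(pinned), function(pkg) {
    found <- nzchar(system.file(package = pkg))
    installed <- if (found) utils::packageVersion(pkg) else NULL
    data.frame(
      package   = pkg,
      listed    = pinned[[pkg]],
      installed = if (found) as.character(installed) else NA_character_,
      matches   = if (found) installed == package_version(pinned[[pkg]]) else NA
    )
  })
  do.call(rbind, rows)
}

status <- report_packages(pinned)
print(status)

missing_pkgs <- status$package[is.na(status$installed)]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

Note that `as.character()` on a package version prints `5.7-1` as `5.7.1`, which is why the comparison is done on version objects rather than on strings.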
Access information
Data were derived from Wang et al. 2022 (https://doi.org/10.5063/F14M930J), which was primarily based on the data from Potter et al. 2019 (https://doi.org/10.3390/f10040304).
