Data from: Disease from leaves to landscapes: Viral hotspots are determined by spatial arrangement and phytochemistry of host plants in specialist caterpillars
Data files
Jan 30, 2025 version files: 820.79 KB total
- draws_s.csv: 578.03 KB
- global_data_exp.csv: 126.71 KB
- global_data_merge.csv: 7.18 KB
- June_gps.csv: 4.61 KB
- mrm_big_df.csv: 32.21 KB
- my_data_m.csv: 644 B
- my_data.csv: 732 B
- README.md: 12.51 KB
- site_dailies.csv: 50.68 KB
- site_normals.csv: 2.13 KB
- vir_jun_fcp.csv: 761 B
- vir_jun_mmp.csv: 356 B
- vir_jun_vc.csv: 1.01 KB
- vir_jun_vmc.csv: 736 B
- vir_sep_fcp.csv: 678 B
- vir_sep_mmp.csv: 604 B
- vir_sep_vc.csv: 692 B
- vir_sep_vmc.csv: 538 B
Abstract
Although infectious diseases play a critical role in population regulation, the drivers of disease prevalence in insects have most often been investigated in isolation; as a result, our knowledge of disease dynamics in insects is severely limited. We conducted a field study on Baltimore checkerspots (Euphydryas phaeton) to investigate the roles of host plants, phytochemistry, ontogeny, and larval spatial associations in determining viral prevalence at the landscape scale. We analyzed individuals for viral presence and load, and quantified secondary phytochemistry in their native and novel host plants. We found that caterpillar groups that were closer together had more similar infection prevalence, with areas of high prevalence indicating the presence of viral hotspots. Post-diapause caterpillars had higher infection rates than other developmental stages. Infection prevalence was linked to phytochemistry on both host plants: prevalence was similar on the two plants at low phytochemical concentrations but diverged at high concentrations, with low prevalence on the native plant and high prevalence on the exotic plant. Altogether, our findings reveal that spatial proximity, ontogeny, larval host plant species, and phytochemistry are important in structuring infection risk, and thus offer insight into the causal drivers of disease prevalence in complex plant-insect systems.
README: Disease from leaves to landscapes: Viral hotspots are determined by spatial arrangement and phytochemistry of host plants in specialist caterpillars
https://doi.org/10.5061/dryad.qnk98sfrf
Description of the data and file structure
Project description: This dataset contains the R scripts and data files for the project, including the data analysis for all reported results and figure creation using ggplot2. The project involved two main analytical approaches. The first used multiple regression on distance matrices (MRM) to examine the relative effects of spatial distance and differences in underlying host plant chemistry on viral prevalence in the larvae of Baltimore checkerspot butterflies. The second used natural history traits of the caterpillars and their host plants to predict viral prevalence and viral loads. Two additional analyses are included in the SI for the article: a site-level MRM conducted at each site, and a structural equation model (SEM) that includes climate variables.
The .R files contain the scripts and the .csv files contain the data. In the data files, missing values are coded as NA.
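Because missing values are stored as the literal string NA, R's read.csv() and pandas.read_csv() both treat them as missing by default. If the files are parsed with Python's standard csv module instead, the token has to be converted by hand. A minimal sketch, assuming every file uses exactly the string NA for missing values; the helper name read_csv_with_na and the sample rows are hypothetical, not taken from the dataset:

```python
import csv
import io


def read_csv_with_na(handle, na_token="NA"):
    """Read a CSV into a list of dicts, replacing the missing-value token with None."""
    reader = csv.DictReader(handle)
    return [
        {key: (None if value == na_token else value) for key, value in row.items()}
        for row in reader
    ]


# Hypothetical in-memory example; for a real file, pass
# open("my_data.csv", newline="") as the handle instead.
sample = io.StringIO("plot,vl,freq\nP1,NA,0.5\nP2,-3.2,0.25\n")
rows = read_csv_with_na(sample)
print(rows[0]["vl"])  # None
print(rows[1]["vl"])  # -3.2
```

Non-missing values stay as strings, so numeric columns still need float() conversion before analysis.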
The scripts are as follows:
"models_resubmission.R" contains the scripts for all Bayesian models fitted with rjags, including the analysis of spatial and chemical distances on viral prevalence and the structural equation models for viral prevalence and viral loads.
Data used in "models_resubmission.R":
"mrm_big_df.csv" contains data for the multiple regression on a distance matrix (MRM), and for figures 3b-c
- "gps" is the standardized pairwise spatial distance (meters) for every plot combination at each site
- "vir" is the standardized pairwise distance in viral prevalence (proportion infected/uninfected) for every plot combination at each site
- "chem" is the standardized pairwise distance in plant IG concentration (% dry weight) for every plot combination at each site
- "site" is the site code: FC = MA1, MM = MA2, VC = VT1, VM = VT2
- "month" is the month in which sampling took place, which also indicates caterpillar life stage (Jun = 6th instar, Aug/Sep = 4th instar)
- "site_num" is the dummy-coded value for site, in alphabetical order
- "month_num" is the dummy-coded value for month, in chronological order (Jun = 1, Aug/Sep = 2)
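The MRM response and predictors are pairwise distances between plots. As a rough frequentist illustration (not the authors' Bayesian rjags model, which also includes site intercepts), the sketch below builds the three distance vectors for one site and fits them with ordinary least squares. It assumes that "standardized" means z-scored, that spatial distance is Euclidean on projected coordinates in meters, and that prevalence and IG distances are absolute differences between plots; the function names and plot values are hypothetical. Because pairs share plots, they are not independent, so ordinary least-squares standard errors are not valid here (classical MRM uses permutation tests for this reason); the sketch computes point estimates only.

```python
import itertools
import math
import statistics


def zscore(values):
    """Standardize to mean 0 and SD 1 (assumed meaning of 'standardized')."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def pairwise_distances(plots):
    """plots: list of (x_m, y_m, prevalence, plant_IG) tuples for one site.

    Returns z-scored (gps, vir, chem) vectors, one entry per plot pair.
    """
    gps, vir, chem = [], [], []
    for a, b in itertools.combinations(plots, 2):
        gps.append(math.dist(a[:2], b[:2]))  # Euclidean distance in meters
        vir.append(abs(a[2] - b[2]))         # difference in prevalence
        chem.append(abs(a[3] - b[3]))        # difference in plant IG (% dry weight)
    return zscore(gps), zscore(vir), zscore(chem)


def ols(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0, *row] for row in zip(*predictors)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical plots at one site: (x_m, y_m, prevalence, plant_IG)
plots = [(0, 0, 0.2, 1.5), (10, 0, 0.3, 1.7), (0, 15, 0.5, 2.5),
         (20, 20, 0.6, 3.0), (5, 30, 0.1, 1.0)]
gps, vir, chem = pairwise_distances(plots)
intercept, b_gps, b_chem = ols(vir, gps, chem)
print(f"beta_gps={b_gps:.3f}, beta_chem={b_chem:.3f}")
```

With z-scored columns, the two slopes are on a common scale, which is what makes a direct comparison of spatial versus chemical effects meaningful.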
"global_data_exp.csv" contains individual-level data that were collapsed into family-group-level averages for the SEM
- "site" is the plot ID in this file; it contains a site code and a numeric plot ID
- "population" is the site code (FC = MA1, MM = MA2, VC = VT1, VM = VT2); the last letter (C or P) indicates the host plant used at that site
- "date" is the date sampling took place
- "individual" is the individual caterpillar ID code
- "lifestage" indicates whether the caterpillars were 4th or 6th instar
- "dela.ct.log" is the log-transformed viral load (relative amount of viral DNA/insect DNA)
- "host.plant.bin" is a binary indicator of host plant (0 = native (C. glabra), 1 = novel (P. lanceolata))
- "outcome.bin" is a binary indicator of whether the individual tested positive for JcDV (0 = negative, 1 = positive)
- "loc" is the site code without the host-plant letter
- "dens" is the number of individual caterpillars counted within a plot
- "IG" is the total concentration of sequestered iridoid glycosides in the caterpillars from that plot (% dry weight)
- "plt_IG" is the total concentration of iridoid glycosides in plants sampled in the plot (% dry weight)
- "dist_to_cent" is the distance from the plot to the centroid of all plots (meters)
- "percent_cover" is the percent cover of the host plant in the plot
- "plt_AUC" is the concentration of aucubin (one type of IG) in the plants sampled for the plot (% dry weight)
- "plt_CAT" is the concentration of catalpol (one type of IG) in the plants sampled for the plot (% dry weight)
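The collapse from individuals to family-group (plot) averages can be sketched as a group-by on the "site" (plot ID) column, taking prevalence as the mean of "outcome.bin" and mean viral load as the mean of "dela.ct.log". This is an assumption about how the averages were formed: the README does not say, for example, whether uninfected individuals contribute to the load average. The function name and sample rows are hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict


def collapse_to_plots(rows):
    """Collapse individual rows to per-plot summaries, skipping NA values."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["site"]].append(row)  # "site" holds the plot ID in this file

    summary = {}
    for plot, members in groups.items():
        outcomes = [int(m["outcome.bin"]) for m in members if m["outcome.bin"] != "NA"]
        loads = [float(m["dela.ct.log"]) for m in members if m["dela.ct.log"] != "NA"]
        summary[plot] = {
            "n": len(members),
            "prevalence": statistics.fmean(outcomes) if outcomes else None,
            "mean_log_load": statistics.fmean(loads) if loads else None,
        }
    return summary


# Hypothetical rows using the column names of global_data_exp.csv
sample = io.StringIO(
    "site,outcome.bin,dela.ct.log\n"
    "VC1,1,-2.0\n"
    "VC1,0,NA\n"
    "VC2,1,-1.0\n"
    "VC2,1,-3.0\n"
)
summary = collapse_to_plots(csv.DictReader(sample))
print(summary["VC1"])  # {'n': 2, 'prevalence': 0.5, 'mean_log_load': -2.0}
```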
"figures_resubmission.R" contains scripts for creating all figures made in R (manuscript figures 2-5)
Data used in "figures_resubmission.R":
"June_gps.csv" contains location information for the sites and plots sampled during June 2021, used to create figure 1.
- "Number" refers to the point ID assigned by the GPS device.
- "Name" is the plot name designated by the researchers. It consists of a site code and numeric value for each plot within a site.
- "GeometryType" is the geometry type of the location (in this case, points)
- "Latitude" is the latitude of the plot
- "Longitude" is the longitude of the plot
- "Altitude" is the elevation of the plot
"my_data.csv" contains plot-level data summaries at site VT1 for making figure 2a
- "plot" is the plot ID
- "vl" is the relative viral load, log-transformed (relative amount of viral DNA/insect DNA)
- "freq" is the viral prevalence (frequency of infected/non-infected individuals)
- "latitude" is the latitude of the plot
- "longitude" is the longitude of the plot
- "plant_IG" is the average percent dry weight of iridoid glycosides from the plot
- "dens" is the total number of caterpillars counted within the plot
"my_data_m.csv" contains plot-level data summaries at site MA2 for making figure 2a
- "plot" is the plot ID
- "vl" is the relative viral load (relative amount of viral DNA/insect DNA), log-transformed
- "freq" is the viral prevalence (frequency of infected/non-infected individuals)
- "latitude" is the latitude of the plot
- "longitude" is the longitude of the plot
- "plant_IG" is the average percent dry weight of iridoid glycosides from the plot (%dry weight)
- "dens" is the total number of caterpillars counted within the plot
"draws_s.csv" contains samples from the posterior distribution of the Bayesian MRM model, which were used to create figure 3a
- "beta1" are posterior samples for parameter "spatial distance"
- "beta2" are posterior samples for parameter "chemical distance"
- "beta3" are posterior samples for parameter "chemical distance"
- "u[1]" are posterior samples of the site intercept for site MA1
- "u[2]" are posterior samples of the site intercept for site MA2
- "u[3]" are posterior samples of the site intercept for site VT1
- "u[4]" are posterior samples of the site intercept for site VT2
- ".chain" refers to the chain number in the sampling by JAGS
- ".iteration" refers to the iteration number in the sampling process
- ".draw" is the unique index of a sample across all chains
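Posterior draws like these are commonly summarized by a mean, an equal-tailed 95% credible interval, and the probability that a coefficient is positive. The sketch below computes those quantities for one column; whether figure 3a displays exactly these summaries is not stated here, and the function names are hypothetical. The quantile rule is linear interpolation, matching R's default (type 7). Draws from all chains are pooled; filter on ".chain" first to inspect chains separately.

```python
import csv
import math
import statistics


def quantile(sorted_vals, q):
    """Linearly interpolated quantile (same rule as R's default, type 7)."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(draws, level=0.95):
    """Posterior mean, equal-tailed credible interval, and P(parameter > 0)."""
    vals = sorted(draws)
    tail = (1 - level) / 2
    return {
        "mean": statistics.fmean(vals),
        "lower": quantile(vals, tail),
        "upper": quantile(vals, 1 - tail),
        "p_positive": sum(v > 0 for v in vals) / len(vals),
    }


def load_column(path, column):
    """Read one numeric column (e.g. "beta1" or "u[1]") from a draws file."""
    with open(path, newline="") as handle:
        return [float(row[column]) for row in csv.DictReader(handle)]
```

For example, summarize(load_column("draws_s.csv", "beta1")) would summarize the spatial-distance coefficient across all chains.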
"global_data_merge.csv" contains family-group-level data used to make Figure 4c-f
- "site" is actually the plot ID in this case, which contains a site code and numeric plot ID
- "loc" is in fact the site location, FC = MA1, MM = MA2, VC = VT1, VM = VT2
- "lifestage" indicates whether the caterpillars were 4th or 6th instar
- "dens" is the number of caterpillars counted in the plot
- "IG" is the average sequestered IG for caterpillars tested from the plot (% dry weight)
- "plt_IG" is the average total IG concentration contained in the plant leaves sampled from the plot (% dry weight)
- "plt_AUC" is the concentration of aucubin, one of the IG compounds (% dry weight)
- "plt_CAT" is the concentration of catalpol, one of the IG compounds (% dry weight)
- "vl" is the average viral load (relative amount of viral DNA/insect DNA), log-transformed, for caterpillars in the plot
- "percent" is the proportion of individuals in the family group that tested positive
"individual_MRMs.R" contains the script for the site-level multiple regression on distance matrices analyses, which are detailed in the SI.
Data used in "individual_MRMs.R": All data files for this analysis are named in the format "vir_month_site.csv", meaning that each file contains virus information for one sampling period at one site.
"vir_jun_fcp.csv" contains plot location, viral load, viral prevalence, and plant IG information for site FCP in June
- "site" is the plot ID
- "long" is the longitude of the plot
- "lat" is the latitude of the plot
- "vl" is the average viral load (relative amount of viral DNA/insect DNA), log-transformed, among caterpillars in the plot
- "freq" is the viral prevalence, or frequency of infected/non-infected individuals
"vir_jun_mmp.csv" contains plot location, viral load, viral prevalence, and plant IG information for site MMP in June
- "site" is the plot ID
- "long" is the longitude of the plot
- "lat" is the latitude of the plot
- "vl" is the average viral load (relative amount of viral DNA/insect DNA), log-transformed, among caterpillars in the plot
- "freq" is the viral prevalence, or frequency of infected/non-infected individuals
"vir_jun_vc.csv" contains plot location, viral load, viral prevalence, and plant IG information for site VC in June
- "site" is the plot ID
- "long" is the longitude of the plot
- "lat" is the latitude of the plot
- "vl" is the average viral load (relative amount of viral DNA/insect DNA), log-transformed, among caterpillars in the plot
- "freq" is the viral prevalence, or frequency of infected/non-infected individuals
"vir_jun_vmc.csv" contains plot location, viral load, viral prevalence, and plant IG information for site VMC in June
- "site" is the plot ID
- "long" is the longitude of the plot
- "lat" is the latitude of the plot
- "vl" is the average viral load (relative amount of viral DNA/insect DNA), log-transformed, among caterpillars in the plot
- "freq" is the viral prevalence, or frequency of infected/non-infected individuals
"vir_sep_fcp.csv" contains plot location, viral load, viral prevalence, and plant IG information for site FCP in Aug/Sep
- "site" is the plot ID
- "long" is the longitude of the plot
- "lat" is the latitude of the plot
- "vl" is the average viral load (relative amount of viral DNA/insect DNA), log-transformed, among caterpillars in the plot
- "freq" is the viral prevalence, or frequency of infected/non-infected individuals
"vir_sep_mmp.csv" contains plot location, viral load, viral prevalence, and plant IG information for site MMP in Aug/Sep
- "site" is the plot ID
- "long" is the longitude of the plot
- "lat" is the latitude of the plot
- "vl" is the average viral load (relative amount of viral DNA/insect DNA), log-transformed, among caterpillars in the plot
- "freq" is the viral prevalence, or frequency of infected/non-infected individuals
"vir_sep_vc.csv" contains plot location, viral load, viral prevalence, and plant IG information for site VC in Aug/Sep
- "site" is the plot ID
- "long" is the longitude of the plot
- "lat" is the latitude of the plot
- "vl" is the average viral load (relative amount of viral DNA/insect DNA), log-transformed, among caterpillars in the plot
- "freq" is the viral prevalence, or frequency of infected/non-infected individuals
"vir_sep_vmc.csv" contains plot location, viral load, viral prevalence, and plant IG information for site VMC in Aug/Sep
- "site" is the plot ID
- "long" is the longitude of the plot
- "lat" is the latitude of the plot
- "vl" is the average viral load (relative amount of viral DNA/insect DNA), log-transformed, among caterpillars in the plot
- "freq" is the viral prevalence, or frequency of infected/non-infected individuals
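For the site-level MRMs, spatial distances between plots have to be derived from the "lat" and "long" columns. One common choice is the haversine great-circle distance; the sketch below assumes decimal-degree coordinates and a spherical Earth with a mean radius of 6,371 km, and it is not necessarily how individual_MRMs.R computes distances. Treating the Earth as a sphere typically introduces errors of well under 1% compared with ellipsoidal methods. The function names and example coordinates are hypothetical.

```python
import itertools
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def plot_distances(rows):
    """rows: dicts with "site" (plot ID), "lat", and "long" columns.

    Returns {(plot_a, plot_b): distance_m} for every pair of plots.
    """
    out = {}
    for a, b in itertools.combinations(rows, 2):
        out[(a["site"], b["site"])] = haversine_m(
            float(a["lat"]), float(a["long"]), float(b["lat"]), float(b["long"])
        )
    return out


# Hypothetical plot coordinates, as strings the way csv.DictReader returns them
rows = [
    {"site": "P1", "lat": "42.000", "long": "-72.000"},
    {"site": "P2", "lat": "42.001", "long": "-72.000"},
    {"site": "P3", "lat": "42.000", "long": "-72.001"},
]
for pair, dist in plot_distances(rows).items():
    print(pair, round(dist, 1))
```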
"supp_climate_mod.R" contains the script for the additional analysis that includes climate variables, detailed in the SI.
Data used in "supp_climate_mod.R": "global_data_merge.csv" (described above) and two types of climate data for each site.
"site_dailies.csv" contains daily climate data for all sites for the full year prior to the last date of sampling. Data are derived from the PRISM climate group.
- "date" is the date in year-month-day format
- "precip" is the amount of precipitation for that day in inches
- "tmin" is the minimum temperature recorded that day, in Fahrenheit
- "tmean" is the mean temperature recorded that day, in Fahrenheit
- "tmax" is the maximum temperature recorded that day, in Fahrenheit
- "site" is the site code
"site_normals.csv" contains daily climate data for all sites for the full year prior to the last date of sampling. Data are derived from the PRISM climate group.
- "date" is the date in year-month-day format
- "precip" is the amount of precipitation for that day in inches
- "tmin" is the minimum temperature recorded that day, in Fahrenheit
- "tmean" is the mean temperature recorded that day, in Fahrenheit
- "tmax" is the maximum temperature recorded that day, in Fahrenheit
- "solar" is the average solar energy for the month, in MJ/m^2/day
- "site" is the site code
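The PRISM values in these files are reported in inches and degrees Fahrenheit. A minimal sketch for converting them to metric units and summarizing the daily file per site, assuming the column names listed above and that total precipitation and mean daily temperature are the summaries of interest (the SEM in supp_climate_mod.R may use different summaries). The function names and sample rows are hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def inches_to_mm(inches):
    """Convert inches to millimeters (1 in = 25.4 mm exactly)."""
    return inches * 25.4


def summarize_dailies(rows):
    """Per-site total precipitation (mm) and mean daily temperature (deg C), skipping NA."""
    precip = defaultdict(float)
    temps = defaultdict(list)
    for row in rows:
        if row["precip"] != "NA":
            precip[row["site"]] += inches_to_mm(float(row["precip"]))
        if row["tmean"] != "NA":
            temps[row["site"]].append(f_to_c(float(row["tmean"])))
    sites = set(precip) | set(temps)
    return {
        site: {
            "precip_mm": precip.get(site, 0.0),
            "tmean_c": statistics.fmean(temps[site]) if temps.get(site) else None,
        }
        for site in sites
    }


# Hypothetical rows using the column names of site_dailies.csv
sample = io.StringIO(
    "date,precip,tmin,tmean,tmax,site\n"
    "2021-01-01,0.5,20,32,44,VC\n"
    "2021-01-02,1.5,30,50,70,VC\n"
)
summary = summarize_dailies(csv.DictReader(sample))
print(summary["VC"])
```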