Canopy insect communities are shaped by the genes and phenotypes of their aspen hosts

Morrow, Clay 1; Lind-Riehl, Jennifer 1; Cole, Christopher T. 1; Rubert-Nason, Kennedy 1; Lindroth, Richard 1; Ané, Cécile 1

Published Jul 03, 2025 on Dryad. https://doi.org/10.5061/dryad.98sf7m0v8

Data files

Jul 03, 2025 version files 728 MB

code-and-data.zip

727.99 MB
README.md

14.87 KB

Abstract

We quantified heritability for 13 tree traits, including phenology, defense chemistry, reproduction, and morphology, and for 18 associated insect species. We performed genomic association analyses to identify genetic links to heritable aspen (Populus tremuloides) traits and insect community structure. By linking intraspecific variation to community composition and structure through probable genomic mechanisms, this data set helps demonstrate the salience of the genes-to-ecosystems paradigm in plant-insect systems.

Clay Morrow

Introduction

This file documents the data and code for the community genetics project
conducted on a common garden of aspen (Populus tremuloides). The
nested headings match the nested directory structure of this repository.

We recommend that interested users start with
code/full-wisasp-community-analysis.Rmd.

Files

README

This readme file. The .Rmd file is an Rmarkdown file used to create the
.md file.

code/

This folder contains all the code for this project.

full-wisasp-community-analysis.Rmd

This file contains the full analysis for our paper, including R code,
commentary on the process, and table and figure generation.

The code uses relative paths and should be run from the root directory.
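
As a minimal sketch of running the analysis from the repository root, the document could be rendered as shown below. This assumes the rmarkdown package is installed and that the working directory is the unzipped repository root. The `knit_root_dir` argument is a suggestion for keeping relative paths anchored at the root; the README does not say whether the original workflow used it.

```r
# Path to the main analysis document, relative to the repository root
rmd_path <- file.path("code", "full-wisasp-community-analysis.Rmd")

# Only attempt to render when the file and the rmarkdown package are available
can_render <- file.exists(rmd_path) &&
  requireNamespace("rmarkdown", quietly = TRUE)

if (can_render) {
  # Evaluate code chunks relative to the current (root) directory
  rmarkdown::render(rmd_path, knit_root_dir = getwd())
} else {
  message("Run this from the repository root with rmarkdown installed: ", rmd_path)
}
```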

2020-GWA-analyses.R

This file contains the R code for summarizing the GWA output (see
GWA-pipeline-CHTC). It is used by
code/full-wisasp-community-analysis.Rmd.

nmds-function.R

This file contains a custom R function to perform the community NMDS
analysis. It is used by code/full-wisasp-community-analysis.Rmd.

GWA-pipeline-CHTC

insect-and-trait-gwa.R

This is an R script for conducting GWA analysis. From a command (bash)
shell, execute
Rscript code/GWA-pipeline-CHTC/insect-and-trait-gwa.R --help for a
list of options. This script is meant to run in batches, and is called
by the process manager (e.g., Condor via the GWA .sub files).

model-selection-script.R

This R script is for performing model selection on all of the GWA
models. This file is called by the model-running shell script
code/GWA-pipeline-CHTC/run-model-selection.sh, coordinated by the
process manager (e.g., Condor via the model selection .sub files).

HTCondor submission files (.sub)

These files specify the jobs submitted to HTCondor, which were run
through the University of Wisconsin-Madison’s Center for High
Throughput Computing (CHTC).

build-R.sub: This file tells the CHTC build server to build the
portable R installation that can be deployed on each server/core to
run the analyses.
GWA submission files: These files (trait-gwa-submission.sub,
MDS-gwa.sub, InsTrait-gwa.sub, gwa-submission.sub, diversity-gwa.sub)
tell the server to perform the GWA analyses on each of the traits.
Different files analyze different groups of response variables.

Shell scripts (.sh)

These are shell scripts used by the CHTC server to perform the
analyses. The primary ones are build-R-portable.sh, which builds the
portable R installation, and run-gwa-rscript.sh, which is the interface
through which CHTC executes the GWA R script (insect-and-trait-gwa.R).

data/

csv files

These comma-delimited files contain data of various forms:

additive-SNP-data.csv: contains all of the single nucleotide
polymorphism (SNP) data for the common garden, encoded as the
number of minor alleles present at a base-pair location (column) for
a given genet (row).
phenos-and-covars.csv: is the main data file and contains the
observations on each aspen tree, including measurements of tree ID and
source information (e.g., Unique.ID-Genet, Latitude-Sex.TEMP),
insect and disease counts (Venturia, Harmandia-Unidentified),
observer (Observer.initials_uncorrected), grouped insect metrics
(all.Insects_cnt-endo_sp.cnt), traits (e.g., SLA-Age),
sampling effort (e.g., Min.Per.Tree), weather
(high.temp_F-fog.event), and tree-level diversity
(richness-MDS4). A full description of each variable (including
units) can be found in the file phenos-and-covars-metadata.csv. Note
that insect column names sometimes use common names or shorthand, so
it is also important to use the insect-col-metadata.csv file as a
cross reference (i.e., by “R.column”) for scientific names. Most
columns of phenos-and-covars.csv were not used in the final analysis
for our manuscript.
All-GWA_SNP-annotations.csv: a table of all the SNPs and what was
known about them at the time, including the base.pair number, the name
of the gene within which the SNP is located, and descriptions of the
genes. Since chromosomal information was not well known at the time of
this study, the alignment scaffold is also provided.
insect-col-metadata.csv: a table containing information for each
of the insect taxa identified (to the smallest possible taxonomic
unit). The key columns are the taxonomic classifications
(Order:Species). The R.Column field is important for matching
insect taxa to their column names within the phenos-and-covars.csv
file.
selected-insect-models_AIC.csv and
selected-insect-models_BIC.csv: tables that show which variables
are included in the best (by AIC or BIC, respectively) models for each
insect response.
trait-gwa-models-table.csv: contains information for fitting
responses in mixed models including the formula and model family.
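
The additive encoding described for additive-SNP-data.csv can be illustrated with a small sketch. The genotype calls, genet names, and SNP names below are made up for illustration, and the helper `count_minor` is hypothetical. It is not the code used to build the data file.

```r
# Toy genotype calls: rows are genets, columns are SNP positions (hypothetical)
geno <- matrix(
  c("AA", "AG", "GG",
    "AA", "AA", "AG",
    "AG", "AA", "AA",
    "AA", "AA", "AA"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("genet", 1:4), c("snp1", "snp2", "snp3"))
)

# Count copies of the minor (less frequent) allele in each genotype call
count_minor <- function(calls) {
  split_calls <- strsplit(calls, "")
  freq <- table(unlist(split_calls))
  # A monomorphic SNP has no minor allele, so every count is zero
  if (length(freq) < 2) return(integer(length(calls)))
  minor <- names(freq)[which.min(freq)]
  vapply(split_calls, function(a) sum(a == minor), integer(1))
}

# Apply column by column to get a genet-by-SNP matrix of 0/1/2 counts
additive <- apply(geno, 2, count_minor)
rownames(additive) <- rownames(geno)
additive
```

Each cell then holds 0, 1, or 2, matching the row (genet) by column (base-pair location) layout described above.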

R data objects (.RData, .rda)

These files contain data objects in a format to be loaded with the R
programming language. It is not recommended to use these objects
directly, as the important elements of each are accessed and utilized by
the analysis scripts. However, they are documented here for
completeness.

(GROUP)-mods (rda): fitted heritability random effects models.
Each of these .rda files contains an R object that is a list of fitted
models of the class lmerMod or glmerMod (from lme4 package).
(GROUP) represents the variable grouping for which the models were
fitted. There are separate files for abundance, diversity metrics
(div), insect incidence (insect), NMDS axes (mds), and tree traits
(trait). Additional models include insect models with the inclusion of
tree trait covariates (insect_trait) and different versions of the
trait models where separate models were fit for each time step
(temporal-trait) either with or without tree age as a covariate
(trait-mods-noage). For example, insect-abundance-mods_H2.rda contains
fitted models of environmental sources of variation in insect
abundance for all 18 common insects (by common name):

# load the fitted models for insect abundance
load("data/insect-abundance-mods_H2.rda")
# list the names of each sub-object
names(abun.mods)

##  [1] "Petiole.Gall"          "Harmandia"             "Leaf.Edge.Mine"       
##  [4] "Casebearer.Moth"       "Cottonwood.Leaf.Mine"  "Lombardy.Mine"        
##  [7] "Blotch.Mine"           "Weevil.Mine"           "Blackmine"            
## [10] "Phyllocolpa"           "Smokey.Aphids"         "Green.Aphids"         
## [13] "Leafhoppers"           "Cotton.Scale"          "Pale.Green.Notodontid"
## [16] "Green.Sawfly"          "Aspen.Leaf.Beetle"     "Ants"

# Summary of model results for Harmandia spp.
summary(abun.mods$Harmandia)

## Linear mixed model fit by REML ['lmerMod']
## Formula: Harmandia ~ (1 | Block) + (1 | Dist.Edge) + (1 | Survey.Year) +  
##     (1 | Survey.Month) + (1 | Genet)
##    Data: df
## 
## REML criterion at convergence: 17593.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.5536 -0.3717 -0.2102 -0.0182 19.0794 
## 
## Random effects:
##  Groups       Name        Variance Std.Dev.
##  Genet        (Intercept) 0.070164 0.26488 
##  Dist.Edge    (Intercept) 0.000000 0.00000 
##  Block        (Intercept) 0.002244 0.04738 
##  Survey.Month (Intercept) 0.021659 0.14717 
##  Survey.Year  (Intercept) 0.010392 0.10194 
##  Residual                 0.916837 0.95752 
## Number of obs: 6272, groups:  
## Genet, 492; Dist.Edge, 7; Block, 4; Survey.Month, 2; Survey.Year, 2
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 0.001046   0.129975   0.008
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
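
To connect the variance components above to a heritability estimate, here is a minimal sketch. It assumes heritability is taken as the Genet variance divided by the total variance (all random effects plus residual). This README does not state the formula, so treat it as an assumption. The helper `variance_share` is hypothetical, and the numbers are copied from the Harmandia summary above.

```r
# Variance components copied from the Harmandia model summary
vc <- data.frame(
  grp  = c("Genet", "Dist.Edge", "Block", "Survey.Month", "Survey.Year", "Residual"),
  vcov = c(0.070164, 0.000000, 0.002244, 0.021659, 0.010392, 0.916837)
)

# Share of the total variance attributed to one grouping factor
variance_share <- function(vc, group = "Genet") {
  vc$vcov[vc$grp == group] / sum(vc$vcov)
}

h2_harmandia <- variance_share(vc)
round(h2_harmandia, 4)
```

Under this assumption the result is about 0.069, which is in line with the bootstrapped estimates for Harmandia stored in abundance-H2s.rda. With the fitted object loaded, as.data.frame(lme4::VarCorr(abun.mods$Harmandia)) returns a table with the same grp and vcov columns.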

(GROUP)_H2s (rda): heritability bootstrap results. Each of these
.rda files contains an R object that is a list of results objects of
class bootMer (lme4) corresponding to a variable. They contain the
bootstrapped heritability estimates. (GROUP) represents the variable
grouping for which bootstraps were calculated. There are separate files
for abundance, diversity metrics (div), insect incidence (insect),
NMDS axes (mds), and tree traits (trait). For example,
abundance-H2s.rda contains bootstraps of heritability (calculated from
models contained within insect-abundance-mods_H2.rda) for all 18
common insects (by common name):

# load R object
load("data/abundance-H2s.rda")
# list the names of each sub-object
names(abun.H2boot)

##  [1] "Petiole.Gall"          "Harmandia"             "Leaf.Edge.Mine"       
##  [4] "Casebearer.Moth"       "Cottonwood.Leaf.Mine"  "Lombardy.Mine"        
##  [7] "Blotch.Mine"           "Weevil.Mine"           "Blackmine"            
## [10] "Phyllocolpa"           "Smokey.Aphids"         "Green.Aphids"         
## [13] "Leafhoppers"           "Cotton.Scale"          "Pale.Green.Notodontid"
## [16] "Green.Sawfly"          "Aspen.Leaf.Beetle"     "Ants"

# get summary statistics for the bootstrapped heritability estimates (Harmandia spp.)
summary(abun.H2boot$Harmandia$t)

##        h2         
##  Min.   :0.04675  
##  1st Qu.:0.06210  
##  Median :0.06845  
##  Mean   :0.06894  
##  3rd Qu.:0.07503  
##  Max.   :0.09890
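
Beyond summary statistics, the bootstrap replicates can be turned into an interval estimate. The sketch below computes a simple percentile interval. The helper `percentile_ci` is hypothetical, and the replicates are simulated stand-ins rather than real results. Whether the manuscript used percentile intervals is not stated here.

```r
# Percentile interval from a one-column matrix of bootstrap replicates,
# shaped like the `t` element of a bootMer object
percentile_ci <- function(t, level = 0.95) {
  alpha <- (1 - level) / 2
  stats::quantile(as.numeric(t), probs = c(alpha, 1 - alpha), na.rm = TRUE)
}

# Simulated stand-in for abun.H2boot$Harmandia$t (not real results)
set.seed(1)
fake_t <- matrix(rbeta(1000, 20, 270), ncol = 1, dimnames = list(NULL, "h2"))

ci <- percentile_ci(fake_t)
ci
```

With the real object loaded, the same helper could be called as percentile_ci(abun.H2boot$Harmandia$t).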

mds-output.RData: community NMDS results. This contains an R list
object whose elements are:
- data: a nested list of the data objects used to fit the NMDS. Its
  sub elements are groups, com.dat, and grp.dat each of which
  contain elements for the four time points and contain group
  (genotype) labels for each point, tree-level community data
  (densities), and genotype-level community data (average densities),
  respectively.
- D.mats: a nested list of community distances at the individual
  (comm.D) and genotype (grp.D) level. Both comm.D and grp.D
  are further nested by time point.
- MDS: a nested list of MDS results at the individual (comm.mds)
  and genotype (grp.mds) levels. Both comm.mds and grp.mds are
  further nested by time point. Each of these elements contains the
  results of an NMDS and is of class metaMDS (vegan). For
  example, here is a quick ordination plot of the individual-level
  communities in June 2016:

load("data/mds-output.RData")
vegan::ordiplot(mds.obs$MDS$comm.mds$jun16)
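
Because the NMDS results are nested by level and then by time point, a small helper can collect one value per time point. The sketch below uses a toy list that mirrors the documented nesting. The stress values are invented, the time point names other than jun16 are assumed from the survey event labels, and `stress_by_time` is a hypothetical helper.

```r
# Toy stand-in mirroring the documented nesting: MDS -> comm.mds -> time point.
# Real elements are vegan metaMDS objects, which store their stress in `$stress`.
toy_mds <- list(
  MDS = list(
    comm.mds = list(
      jun16 = list(stress = 0.21), aug16 = list(stress = 0.19),
      jun17 = list(stress = 0.23), aug17 = list(stress = 0.20)
    )
  )
)

# Pull the stress value from every time point at one level of the hierarchy
stress_by_time <- function(mds_list, level = "comm.mds") {
  vapply(mds_list$MDS[[level]], function(fit) fit$stress, numeric(1))
}

stress_by_time(toy_mds)
```

With the real data loaded, stress_by_time(mds.obs) or stress_by_time(mds.obs, level = "grp.mds") should work the same way, provided the object follows the structure described above.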

variable-groups.RData: a collection of objects that group the
variables in the main data file (phenos-and-covars.csv) according to
their type. For example, a list of the tree traits that were used for
this analysis is included in the tree.traits.of.interest object:

load("data/variable-groups.RData")
print(tree.traits.of.interest)

##  [1] "Hobs"         "Age"          "BA.2012sqrt"  "Sex.TEMP"     "SLA"         
##  [6] "ALA"          "CT"           "PG"           "Npct"         "CN"          
## [11] "BAsqrt"       "Vol"          "EFNMean"      "BBreakDegDay" "Flprev"

r-objects.RData: a collection of data objects (e.g., insect
surveys, insect metadata, weather, etc.). These are simply subsets of
the phenos-and-covars.csv data file, with updated formatting. For
example, insect.surveys contains only variables collected during the
insect surveys:

load("data/r-objects.RData")
tibble::tibble(insect.surveys) |> head()

## # A tibble: 6 × 184
##   Unique.ID survey.event Tree.ID SerialNo Date       Observer.initials Block
##   <fct>     <fct>        <fct>   <fct>    <date>     <fct>             <fct>
## 1 1_aug16   aug16        1_A_1   1        2016-08-02 CM                1    
## 2 1_aug17   aug17        1_A_1   1        2017-07-28 CC                1    
## 3 1_jun16   jun16        1_A_1   1        2016-06-23 CM                1    
## 4 1_jun17   jun17        1_A_1   1        2017-06-21 NP                1    
## 5 10_aug16  aug16        1_A_10  10       2016-08-02 CM                1    
## 6 10_aug17  aug17        1_A_10  10       2017-07-28 CC                1    
## # ℹ 177 more variables: Row <fct>, Position <fct>, Genet <fct>, Venturia <fct>,
## #   Min.per.Tree <int>, Harmandia <int>, Phyllocolpa <int>,
## #   Phyllocolpa.larvae.present <int>, Petiole.Gall <int>, Leaf.Edge.Mine <int>,
## #   Blotch.Mine <int>, Lombardy.Mine <int>, Sawfly.Mine <int>,
## #   Weevil.Mine <int>, Leaf.Vein.Mine <int>, Blackmine <int>,
## #   Cottonwood.Leaf.Mine <int>, Leaf.Rolls <int>, Leaf.Ties <int>,
## #   Oblique.Banded.Leafroller <int>, Poplar.Spotted.Leafroller <int>, …
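
As a rough usage sketch for a table shaped like insect.surveys, counts can be summarized per survey event with base R. The rows below are invented and only imitate three of the columns shown above. They are not taken from the data set.

```r
# Toy rows imitating a few insect.surveys columns (values are made up)
toy_surveys <- data.frame(
  survey.event = c("jun16", "jun16", "aug16", "aug16"),
  Genet        = c("g1", "g2", "g1", "g2"),
  Harmandia    = c(0L, 3L, 1L, 5L)
)

# Mean Harmandia count per survey event
event_means <- aggregate(Harmandia ~ survey.event, data = toy_surveys, FUN = mean)
event_means
```

Replacing toy_surveys with insect.surveys (after loading r-objects.RData) should give the per-event means for the real surveys.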

text files (.txt)

These files contain names of certain variable groups and are used by the
Condor submission files.

master-gwa-results/

This folder contains subfolders for each of the GWA analyses conducted.
Each subfolder contains the individual results files. There is also the
GWA-results_2020.RData R object, which contains all of the cleaned and
summarized results.

figures/

This folder holds only a sub-directory with 2 figures used by the
analysis document output (code/full-wisasp-community-analysis.Rmd).

tar files (.tar.gz)

These compressed files are used by the CHTC job scheduler to perform the
batched GWA analyses.

Not included

Many of the scripts contained within code/GWA-pipeline-CHTC/ require a
file called “compiled-R.tar.gz”. This is a self-contained version of R
compiled for the servers on which Condor was utilized. This compiled R
version is not included but was created with the
code/GWA-pipeline-CHTC/build-R.sub Condor submission file, which was
executed on a Condor build server and calls the
code/GWA-pipeline-CHTC/build-R-portable.sh shell file. Together, these
scripts install and build R (and packages required for the analyses)
from R-3.6.0.tar.gz. This R source tarball is also not included here,
but can be downloaded from https://cran.r-project.org/src/base/R-3/.