Data from: Climate and regional plant richness drive diet specialization in butterfly caterpillars
Data files
Apr 25, 2026 version files (3.33 GB)
- Gross_et_al_Dryad.zip (3.33 GB)
- README.md (13.17 KB)
Abstract
Studies of coevolution, ecosystem processes, and latitudinal diversity gradients are improved by understanding variation in resource specialization. Insect herbivory is one of the most ubiquitous terrestrial ecological associations, and is important for understanding the evolution of both plants and insects, yet the processes underlying global variation in diet breadth remain poorly understood. Here, we use global datasets of butterfly and plant distributions to investigate the patterns and drivers of butterfly larval diet breadth. Diet breadth showed a negative relationship with plant family richness, but this was offset by a direct effect of temperature acting in the opposite direction. Islands generally harbor species with broader diets, but islands with higher endemism had narrower diets than average. Our study provides a global baseline for understanding how plant and herbivore interactions structure ecological communities in the face of global environmental changes.
Description of the data and file structure (Gross_et_al_Dryad.zip)
The dataset consists of all input data required to execute the analyses included under the section "Code/Software".
1. all_lep_traits.csv
Dataframe covering traits for 10,372 species of butterflies worldwide. This file can be viewed and manipulated in Microsoft Excel or R.
- genus: Butterfly genus
- family: Butterfly family (Papilionidae, Hedylidae, Hesperiidae, Pieridae, Nymphalidae, Riodinidae, or Lycaenidae)
- n_hosts_2: Number of hostplant families observed to be fed upon by a given butterfly species, restricted to pairwise interactions between plant families and butterfly species for which at least 2 records exist.
- n_hosts_10: Number of hostplant families observed to be fed upon by a given butterfly species, restricted to pairwise interactions between plant families and butterfly species for which at least 10 records exist.
- strict_pd_2: Phylogenetic diversity (Faith's PD: sum of phylogeny branch lengths in millions of years) of hostplants, restricted to pairwise interactions between plant families and butterfly species for which at least 2 records exist.
- strict_pd_10: Phylogenetic diversity (Faith's PD: sum of phylogeny branch lengths in millions of years) of hostplants, restricted to pairwise interactions between plant families and butterfly species for which at least 10 records exist.
- strict_ses_mpd: Hostplant phylogenetic dispersion measured as the standardized effect size of the mean pairwise distance (in millions of years) between host families
- strict_ses_mntd: Hostplant phylogenetic dispersion measured as the standardized effect size of the mean nearest taxon distance (in millions of years) among host families
- range_type: Is the species an island endemic (1) or not (0)?
- range_area: Total range polygon area measured in km^2
- centroid_latitude: Latitude of the range polygon centroid
- columns 12-30: Mean values of the 19 bioclimatic variables (Fick and Hijmans 2017) averaged across the range of each butterfly species.
- Ann.Mean.Temp: BIO1, annual mean temperature (ºC)
- Mean.Diur.T.Range: BIO2, mean diurnal temperature range (mean of monthly (maximum temperature - minimum temperature), ºC)
- Isothermality: BIO3, mean diurnal temperature range/temperature annual range (unitless)
- Temp.Seas: BIO4, temperature seasonality (standard deviation of temperature x 100, unitless)
- Max.Temp.Warmest: BIO5, Maximum temperature of the warmest month (ºC)
- Max.Temp.Coldest: BIO6, minimum temperature of the coldest month (ºC)
- Temp.Ann.Range: BIO7, temperature annual range (maximum temperature of the warmest month - minimum temperature of the coldest month, ºC)
- Mean.Temp.Wettest: BIO8, Mean temperature of the wettest quarter (ºC)
- Mean.Temp.Driest: BIO9, Mean temperature of the driest quarter (ºC)
- Mean.Temp.Warmest: BIO10, Mean temperature of the warmest quarter (ºC)
- Mean.Temp.Coldest: BIO11, Mean temperature of the coldest quarter (ºC)
- Ann.Prec: BIO12, Annual precipitation (mm)
- Prec.Wet.Month: BIO13, Precipitation of the wettest month (mm)
- Prec.Dri.Month: BIO14, Precipitation of the driest month (mm)
- Prec.Seas: BIO15, Precipitation seasonality (coefficient of variation, unitless)
- Prec.Wet.Quart: BIO16, precipitation of the wettest quarter (mm)
- Prec.Dri.Quart: BIO17, precipitation of the driest quarter (mm)
- Prec.Warmest: BIO18, precipitation of the warmest quarter (mm)
- Prec.Coldest: BIO19, precipitation of the coldest quarter (mm)
- plant_sp_richness: Mean species richness of plants across all 100 x 100 km^2 grid cells in each range polygon.
- plant_fam_richness: Mean family richness of plants across all 100 x 100 km^2 grid cells in each range polygon.
- host_richness: Mean number of species in the hostplant family of a given butterfly, averaged across all 100 x 100 km^2 grid cells in its range polygon
- elevation: Mean elevation (in m) across the range of each butterfly species, from Dubayah et al. (2021)
- NPP: Mean net primary productivity (kg C m^-2) across the range of each butterfly species, from Running et al. (2021)
- columns 34-36: Mean percent vegetation cover across the range of each butterfly species, from Townsend and DiMiceli (2015)
- n_fam_in_range: Total number of plant families summed across the range of each butterfly species
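The strict_ses_mpd column is described above as a standardized effect size, which is conventionally computed as (observed value - mean of null values) / standard deviation of null values. The following Python sketch illustrates that calculation for mean pairwise distance. It is not the authors' R code: the null model (random draws of the same number of families from a pool), the family names, the distances, and the function names are all assumptions made for illustration.

```python
import random
from itertools import combinations
from statistics import mean, stdev

def mean_pairwise_distance(taxa, dist):
    """Mean distance over all unordered pairs of taxa."""
    return mean(dist[frozenset(pair)] for pair in combinations(taxa, 2))

def ses_mpd(hosts, pool, dist, n_null=999, seed=1):
    """Standardized effect size of MPD under an assumed null model:
    draw len(hosts) families at random from the pool, n_null times."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(hosts, dist)
    null = [
        mean_pairwise_distance(rng.sample(pool, len(hosts)), dist)
        for _ in range(n_null)
    ]
    return (observed - mean(null)) / stdev(null)

# Hypothetical pool of four plant families with invented distances
# (millions of years): Fam_A and Fam_B are close, all other pairs are far.
pool = ["Fam_A", "Fam_B", "Fam_C", "Fam_D"]
dist = {frozenset(pair): 100.0 for pair in combinations(pool, 2)}
dist[frozenset(("Fam_A", "Fam_B"))] = 10.0

hosts = ["Fam_A", "Fam_B"]
observed_mpd = mean_pairwise_distance(hosts, dist)
ses = ses_mpd(hosts, pool, dist)
```

In this toy case the two hosts are the most closely related pair in the pool, so the observed distance falls below the null mean and the effect size is negative, the usual signature of phylogenetically clustered hosts.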
2. comm_sparse_matrix.rds
Sparse grid-cell-by-species matrix cataloguing the presence of butterfly species in 100 x 100 km^2 grid cells worldwide. This file can be viewed, manipulated, and analyzed in R.
3. distinct_comm.csv
Data frame of climate and mean trait values for each of the 13,316 grid cells (100 x 100 km^2) worldwide. Variable names identical to those in all_lep_traits.csv represent individual grid-cell-level centroid values of climate, vegetation, elevation, and latitude. This file can be viewed and manipulated in Microsoft Excel or R. The following variables are not represented in all_lep_traits.csv:
- site: The unique name of the grid cell
- alpha_n_2: Host specificity, given as the shape parameter (alpha) of a discrete truncated Pareto power distribution of diet breadths in an assemblage, restricted to pairwise interactions between plant families and butterfly species for which at least 2 records exist.
- avg_pd_2: Mean host family phylogenetic diversity (Faith's PD) across all butterfly species present in the cell, restricted to pairwise interactions between plant families and butterfly species for which at least 2 records exist.
- alpha_n_10: Host specificity, given as the shape parameter (alpha) of a discrete truncated Pareto power distribution of diet breadths in an assemblage, restricted to pairwise interactions between plant families and butterfly species for which at least 10 records exist.
- avg_pd_10: Mean host family phylogenetic diversity (Faith's PD) across all butterfly species present in the cell, restricted to pairwise interactions between plant families and butterfly species for which at least 10 records exist.
- avg_ssmpd: Mean host family phylogenetic dispersion measured as the standardized effect size of the mean pairwise distance between host families, averaged across all butterfly species present in the cell
- avg_ssmntd: Mean host family phylogenetic dispersion measured as the standardized effect size of the mean nearest taxon distance among host families, averaged across all butterfly species present in the cell
- lep_richness: Total number of butterfly species present in the grid cell
- lep_mpd: Mean pairwise phylogenetic distance among butterfly species present in the grid cell
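Several of the grid-cell variables above (for example lep_richness and avg_pd_2) are summaries taken over the butterfly species present in a cell. The following Python sketch shows that aggregation on invented data. The cell names, species names, trait values, and the function summarize_cells are hypothetical, and the authors' R scripts may handle missing values differently.

```python
from statistics import mean

# Hypothetical sparse presence data: each grid cell maps to the set of
# species recorded there (cells and species are invented).
presence = {
    "cell_1": {"sp_a", "sp_b", "sp_c"},
    "cell_2": {"sp_b"},
}

# Hypothetical species-level trait values standing in for strict_pd_2.
strict_pd_2 = {"sp_a": 200.0, "sp_b": 100.0, "sp_c": 300.0}

def summarize_cells(presence, trait):
    """Return {cell: (richness, mean trait)}.

    Assumption: species without a trait value count toward richness
    but are skipped when averaging.
    """
    summary = {}
    for cell, species in presence.items():
        values = [trait[s] for s in species if s in trait]
        avg = mean(values) if values else None
        summary[cell] = (len(species), avg)
    return summary

cell_summary = summarize_cells(presence, strict_pd_2)
```

Storing only the species present in each cell mirrors the idea of a sparse presence matrix, where absent cell-species combinations take up no space.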
4. full_grids_100km
Shapefile (.shp) of 100 x 100 km^2 grid cells covering the globe, including the main file .shp and companion files: .dbf, .prj, .sbx, .shx.
The file extensions are described as follows:
.shp: The main geospatial data file that contains feature geometry.
.dbf: The dBASE table that contains the attributes of features.
.prj: The file that contains the coordinate system and map projection information.
.shx: The file containing the index of feature geometry.
The main .shp file can be opened and analyzed in R and many other programming languages, as well as in open-source geospatial software such as QGIS, SAGA GIS, GRASS GIS, and GeoDa.
5. Hostplant_families_aug.csv
Dataframe of hostplant family records for butterflies in our geographic dataset. This file can be viewed and manipulated in Microsoft Excel or R.
- Tree_label: Tip label in the original time-calibrated phylogeny from Kawahara et al. (2023), if applicable, otherwise NA
- Lep_family: Butterfly family name (Papilionidae, Hedylidae, Hesperiidae, Pieridae, Nymphalidae, Riodinidae, or Lycaenidae)
- Lep_accepted_name: Butterfly species
- Host_order: Hostplant order name
- Host_family: Hostplant family name
- Count_of_Lep_accepted_name: Number of records for which the given butterfly-hostplant pair is confirmed.
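The n_hosts_2 and n_hosts_10 columns in all_lep_traits.csv are defined as host family counts restricted by a minimum number of records per butterfly-hostplant pair. The following Python sketch shows that filtering logic on rows shaped like this file. The sample rows and the function count_host_families are invented for illustration, the sketch assumes one row per species-family pair, and the authors' R pipeline may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows using the column names documented above;
# the species names and record counts are made up.
SAMPLE = """Lep_accepted_name,Host_family,Count_of_Lep_accepted_name
Species a,Fabaceae,12
Species a,Rosaceae,3
Species a,Poaceae,1
Species b,Poaceae,25
"""

def count_host_families(rows, min_records):
    """Count distinct host families per species, keeping only
    species-family pairs with at least `min_records` records.

    Assumption: each species-family pair appears in a single row.
    """
    families = defaultdict(set)
    for row in rows:
        if int(row["Count_of_Lep_accepted_name"]) >= min_records:
            families[row["Lep_accepted_name"]].add(row["Host_family"])
    return {species: len(fams) for species, fams in families.items()}

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
n_hosts_2 = count_host_families(rows, min_records=2)
n_hosts_10 = count_host_families(rows, min_records=10)
```

To try this on the real file, the in-memory sample could be replaced by an opened Hostplant_families_aug.csv. Species with no pair meeting the threshold are simply absent from the result.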
6. phylogenies
6.1. double_family_tree.tre
Family-level tree of seed plants derived from Smith and Brown's (2018) phylogeny, obtained from the V.PhyloMaker2 package (Jin & Qian 2022). Each family in the tree has two tips representing the two most distantly related species within the family that share a common ancestor. This phylogeny was used to obtain hostplant PD, SESMPD, and SESMNTD values for caterpillars documented to feed on just one hostplant family. The phylogeny can be viewed and analyzed in R using the ape package or specialized phylogenetic tree manipulation software such as FigTree.
6.2. family_tree.tre
Family-level tree of seed plants derived from Smith and Brown's (2018) phylogeny, obtained from the V.PhyloMaker2 package (Jin & Qian 2022). Each family is represented by a single tip in the tree. This phylogeny was used to obtain hostplant PD, SESMPD, and SESMNTD values for caterpillars documented to feed on more than one hostplant family. The phylogeny can be viewed and analyzed in R using the ape package or specialized phylogenetic tree manipulation software such as FigTree.
6.3. full_phylogeny_100trees.tre
One hundred species-level phylogenetic trees generated by adding the additional species in our dataset to the Kawahara et al. (2023) backbone phylogeny under Jin and Qian's (2022) Scenario 2. This is a multiPhylo object that can be viewed and analyzed in R using the phytools package or specialized phylogenetic tree manipulation software such as FigTree.
6.4. test_tree.tre
A single tree with 50 species randomly selected from one of the trees in the full_phylogeny_100trees.tre multiPhylo object, suitable for testing on a personal computer.
7. plant_family_comm.RDS
Sparse grid-cell-by-family matrix cataloguing the presence of plant families in 100 x 100 km^2 grid cells worldwide. This file can be viewed, manipulated, and analyzed in R.
8. polygons
Geopackages of range polygons for all butterflies included in our analyses, created by Daru (2024b). Polygons can be viewed and analyzed in R using the terra package, in many other programming languages, and in open-source geospatial software such as QGIS, SAGA GIS, GRASS GIS, and GeoDa.
9. wrld_simpl
Shapefile of national borders circa 2010, including the main file .shp and companion files: .dbf, .prj, .sbx, .shx as described above. The shapefile (.shp) can be viewed and analyzed in R using the terra package, in many other programming languages, and in open-source geospatial software such as QGIS, SAGA GIS, GRASS GIS, and GeoDa.
Sharing/Access information
Data were derived from the following sources:
- Climate, elevation, and climate change .tif files are derived from Fick & Hijmans (2017), Dubayah et al. (2021), and Sandel et al. (2011)
- The initial butterfly phylogeny backbone is derived from Kawahara et al. (2023)
- Butterfly range polygons were created by Daru (2024b)
- Plant distribution data are derived from Daru (2024a)
- The family-level plant phylogeny was obtained from Jin & Qian (2022), which is ultimately based on a phylogeny of seed plants from Smith & Brown (2018) and Zanne et al.’s (2014) phylogeny of pteridophytes.
- Larval hostplant data are derived from Kawahara et al. (2023), LepTraits v1.0 (Shirey et al. 2022), and the NHM HOSTS database (Robinson et al. 2023)
- The wrld_simpl shapefile is available from the R package maptools (Bivand & Lewin-Koh 2023)
Code/Software
Code is written in R v.4.4.1 (1_Latitude_modeling.R, 4_Island_mainland_assemblages.R) and R v.4.2.0 (2_Assemblage_SEM.R, 3_Species_SEM.R). R can be downloaded from https://www.r-project.org/ by selecting a local CRAN mirror and clicking the download link for your operating system. Install time is typically under 10 minutes on a personal computer.
The code and output plots can be displayed and manipulated in the RStudio GUI, which is available for download at https://posit.co/download/rstudio-desktop/. Installation time is typically under 10 minutes on a personal computer.
2.1. 1_Latitude_modeling.R
This script models diet breadth and plant family richness as a function of latitude and maps four metrics of diet breadth across the globe.
2.2. 2_Assemblage_SEM.R
This script selects predictors of assemblage-level average diet breadth and models diet breadth as a function of climate, vegetation, and plant diversity in structural equation models. It also plots the interactive effects of annual mean temperature and precipitation seasonality on plant family richness, net primary productivity, and percent tree cover.
2.3. 3_Species_SEM.R
This script selects predictors of species-level diet breadth and models diet breadth as a function of mean species-range-wide climate, vegetation, and plant diversity in structural equation models, including phylogenetic generalized least squares models with diet breadth as a response.
2.4. 4_Island_mainland_assemblages.R
This script tests for differences between island and mainland assemblages in diet breadth, and examines how the proportion of island-endemic species affects the diet breadth of island assemblages.
Butterfly and plant distribution polygons were modeled as detailed in Daru 2024a and Daru 2024b (linked datasets). Butterfly hostplant associations were compiled from data in LepTraits v1.0 (Shirey et al. 2022), the Natural History Museum HOSTS dataset (Robinson et al. 2023), and Kawahara et al. (2023).
