Ecological network structure in response to community assembly processes over evolutionary time

Graham, Natalie 1 ; Krehenwinkel, Henrik2; Lim, Jun Ying3; Staniczenko, Phillip4; Callaghan, Jackson5; Andersen, Jeremy5; Gruner, Dan6; Gillespie, Rosemary1

Published Nov 22, 2022 on Dryad. https://doi.org/10.6078/D1DX4T

Abstract

The dynamical structure of ecological communities results from interactions among taxa that change with shifts in species composition in space and time. However, our ability to study the interplay of ecological and evolutionary processes on community assembly remains relatively unexplored due to the difficulty of measuring community structure over long temporal scales. Here, we made use of a geological chronosequence across the Hawaiian Islands, representing 50 years to 4.15 million years of ecosystem development, to sample 11 communities of arthropods and their associated plant taxa using semi-quantitative DNA metabarcoding. We then examined how ecological communities changed with community age by calculating quantitative network statistics for bipartite networks of arthropod-plant associations. The average number of interactions per species (linkage density), ratio of plant to arthropod species (vulnerability), and uniformity of energy flow (interaction evenness) increased significantly in concert with community age. The index of specialization H₂’ has a curvilinear relationship with community age. Our analyses suggest that younger communities are characterized by fewer but stronger interactions, while biotic associations become more even and diverse as communities mature. These shifts in structure became especially prominent on East Maui (~0.5 my) and older volcanos, after enough time had elapsed for adaptation and specialization to act on populations in situ. Such natural progression of specialization during community assembly is likely impeded by the rapid infiltration of non-native species, with special risk to younger or more recently disturbed communities that are composed of fewer specialized relationships.

Materials and Methods

Site selection methods

The Hawaiian Islands are formed as the Pacific plate moves northwestward across a stationary volcanic hotspot, therefore the archipelago represents a chronosequence of geological age from the youngest island (Hawaii, ~0–0.5 Myr), to the oldest high island of Kauai (~5 Myr). We selected 14 sites of varying geologic age, ranging from 50 to 4.15 x 10⁶ years old, across four islands of the archipelago: Hawaii, Maui, Molokai, Kauai. To control for abiotic and biotic differences the sites were constrained by elevation, precipitation, disturbance, and forest structure (lidar). 20 randomized candidate plots were generated for each site, with the intention of ultimately selecting six, 15 m-radius plots, each representing a substrate age. Substrate age was then used as a proxy for community age in the downstream ecological network analyses described below.

DNA sequencing

We collected arthropods from the understory plants in Hawaiian wet forest. We sampled each plant species in proportion to their relative abundance. We used vegetation beat sampling. Arthropods dislodged by the agitation which drop onto the beating sheet are aspirated into a vial containing 95% ethanol. We generated 2276 metabarcode amplicon libraries by amplified a primer combination (ArF1/ Fol-degen-rev) that targets a 418 bp fragment in the barcode region of the Cytochrome Oxidase I (COI) gene. Each library represents the total arthropods collected for each plant genus for each plot (a sampling event), sorted into one of four size categories (a sequencing pool). Specimens from public and private collections were also used to generate a DNA barcode reference library for 57 species. We rarified the sequence data based on expected specimen abundance from count data of individual arthropod in each library.

Bioinformatics

We generated zero-radius OTUs (zOTUs), from the rarefied raw reads with the unoise3 command following the recommended protocols in the USEARCH v11 pipeline (Edgar, 2010). The resulting zOTUs were compared against the NCBI Genbank database and our custom-made DNA reference library for Hawaiian taxa using BLASTn with a maximum of 10 target sequences. All non-arthropod zOTUs were removed after which 5,046 zOTUs remained. To remove putative pseudogenes from the zOTU dataset we ran metaMATE with default specifications and the example specifications file (Graham et al. 2021).

Taxonomic matching and abundance estimates

About a quarter of the zOTUs (n = 901) were matched to the Blast or voucher DNA reference library with less than 85 percent similarity. To validate the taxonomic identification for each zOTU at higher taxonomic levels (e.g. order, family) we compared the top 10 blast and reference library hits with phylogenetic clustering from a ML tree. Taxonomic assignment was considered trustworthy if the percent similarity of the metabarcoding sequence to the NCBI GenBank or DNA reference voucher was: between 88-94% for family, between 94%-98% for genus and greater than 98% percent similarity for species, while matches below 88% similarity were made only to order.

To create a table with OTU abundances for community analyses we mapped a query set of raw reads to the filtered and taxonomically identified search database of zOTUs in USEARCH. After OTU mapping and non-target read removal the number of OTUs was 3785.

To review, (1) for each site (community) there was six plots, (2) within each plot each plant taxon was sampled by seconds of time corresponding to its relative abundance in the plot, (3) we sorted bulk arthropod samples by size and counted the individuals, (4) sequences were generated using a DNA region with demonstrated success for Hawaiian arthropod taxa and false reads (pseudogenes) were removed, (5) we randomly sampled the sequencing reads based on the count of individuals in each size class, and finally, (6) sequence reads were summed across size classes and plots.

Calculation of quantitative ecological network metrics

Data processing and statistical analyses were performed in R version 4.0.2. To distinguish between taxa that have colonized the archipelago historically or more recently, we characterized the probable native and non-native composition for each aged community based solely on sequence characteristics using NIClassify (https://github.com/tokebe/niclassify).

We aggregated the sequence abundance for each arthropod OTU according to its association with a particular plant genus within a site. We configured the arthropod-plant abundance data as a matrix with arthropods as columns and plants as rows; there were 11 matrices, one for each site of different substrate age. We plotted the ecological network matrix for each community age using the ‘plotweb’ command in the R package bipartite.

Using the ‘networklevel’ commands in the R package bipartite (Dormann et al., 2008) we calculated six quantitative indices for our bipartite networks of arthropods and associated plants: (1) linkage density, (2) connectance, (3) generality, (4) vulnerability, (5) interaction evenness, and (6) the index of specialization H₂’(Table 1), that we reasoned would be associated with network specialization (Table 2). We converted each matrix to a binary presence-absence matrix and calculated the qualitative equivalent of: (1) linkage density, (2) connectance, (3) generality, and (4) vulnerability. We additionally calculated the ratio of resource species to consumers for the qualitative matrices, which is the ratio of plant genera to arthropod OTUs. These are fundamental food web metrics (Bersier et al., 2002) that have values that are interpretable with respect to their effect on specialization over time.

Tests of network metric significance and correlation between network properties

We used null models (Vázquez & Aizen, 2006) to test the statistical significance of empirical network metric values for the weighted data. For each weighted empirical network, we generated 1000 synthetic networks so that the total number of interactions and the identity of interaction partners is maintained while the weight associated with each interaction is shuffled (Staniczenko et al., 2013). We calculated p-values and z-scores for each combination of empirical network and metric by comparing the observed metric value calculated from the empirical network to the distribution of metric values calculated from synthetic matrices generated by the null model, i.e., the p-value quantifies how unlikely the observed, empirical metric value is to have been generated by the null model.

To compare the effect of community assembly on network size, arthropod diversity, and network metrics, we regressed the dependent variables by mean substrate age for each collection site. We tested the significance of the correlation between network size and community age, each network metric and community age, and each network metric and network size, using Spearman's correlation tests. Additionally, we fit a second-degree polynomial equation for the curvilinear relationship between the index of specialization H₂’ and community age.

Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460 2461.

Graham, N. R., Gillespie, R. G., & Krehenwinkel, H. (2021). Towards Eradicating the Nuisance of Numts and Noise in Molecular Biodiversity Assessment. Molecular Ecology Resources, n/a(n/a). https://doi.org/10.1111/1755-0998.13414

Dormann, C. F., Gruber, B., & Fründ, J. (2008). Introducing the bipartite package: Analysing ecological networks. Interaction, 1(0.2413793).

Bersier, L.-F., Banašek-Richter, C., & Cattin, M.-F. (2002). Quantitative Descriptors of Food-Web Matrices. Ecology, 83(9), 2394–2407.Vázquez & Aizen, 2006

Staniczenko, P. P., Kopp, J. C., & Allesina, S. (2013). The ghost of nestedness in ecological networks. Nature communications, 4(1), 1–6.

beatingRedone.csv – Main file with sample information and abundances of taxa per plant

The necessary data to run each script are discussed at the top under "load data" and are provided as CSV or fasta files in this dryad depository.

The R scripts for species interaction networks:

HDIM_arth-plant_networks_Script_1.Rmd
HDIM arth-plant networks Script 2 null models.Rmd
HDIM arth-plant networks Script 3 correlations.Rmd

The following R markdowns were used for cleaning and analyzing the DNA metabarcoding data:

OTU_cleanup.Rmd
filterFastabyRemainingOTUs.Rmd

DNA sequence data:

ARF_arthropodASVs_aligned.fasta – 5046 arthropod zOTUs from USEARCH processing. *caution* pseudogenes and very low abundance OTUs (<5 reads) not removed
taxonomynamed.fasta – 3785 OTUs with taxonomy assigned from blast search to NCBI and HI Barcode Reference Library. Naming convention = OTU#_percent match_taxonomy to cutoff confidence
knownASVs.fasta – HI Barcode Reference Library. Includes 57 species and multiple OTUs per species (often based on geography).

HawaiianBarcodeRef_Nov2020.csv – information for barcode reference collection ids

We provide site characteristics csv files:

dimensions_plots_envData.csv
DimensionsSites.csv

The results from the poisson tree process PTP for species delimitation:

All files are named as "ptp_arthropodorder.csv" (e.g., ptp_araneae.csv)

The results from NiClassify for native and non-native species assignment

NIClassify-predictions_replacedfalseendemics.csv

We also provide the R scripts for site selection methods: We do not provide the lidar data in this repository.

find_pixels.R - function to generate a random sample of size `n' of pixels for potential use as actual
find_rng.R - function to determine the acceptable range of canopy heights
generate_plots.R - function to generate randomized plots within pre-selected site polygons
HDIM_plotselection_descriptivestats_lidar - Creating histograms and extracting summary statistics from CAO lidar data

Some additional intermediate files from DNA metabarcode data processing, along with notes about the UNIX commands, are found in the files download and described in the README

Ecological network structure in response to community assembly processes over evolutionary time

Data files

Abstract

Methods

Usage notes

Works referencing this dataset