COI metabarcoding data from arthropod pollinator communities in burned and unburned sites of California
Data files
Jan 09, 2024 version files 1.35 GB
-
fastq_gz.zip
1.35 GB
-
fire_dist.csv
257.81 KB
-
mco_97_wsites.csv
1.15 MB
-
mco_fire_all.csv
3.88 MB
-
README.md
3.26 KB
Abstract
Novel wildfire regimes are rapidly changing global ecosystems and pose significant challenges for biodiversity conservation and ecosystem management. In this study, we used DNA metabarcoding to assess the response of arthropod pollinator communities to large-scale wildfires across diverse habitat types in California. We sampled six reserves within the University of California Natural Reserve System (UCNRS), each of which was partially burned in the 2020 Lightning Complex wildfires in California. Using yellow pan traps to target pollinators, we collected arthropods from burned and unburned sites across multiple habitat types including oak woodland, redwood, scrub, chamise, grassland, forest, and serpentine habitats. We found no significant difference in alpha diversity values between burned and unburned sites; instead, seasonal variations played a significant role in arthropod community dynamics, with the emergence of plant species in Spring promoting increased pollinator richness at all sites. When comparing all sites, we found that burn status was not a significant grouping factor. Instead, compositional differences were largely explained by geographic differences, with distinct communities within each reserve. Within a geographic area, the response of arthropods to fire was dependent on habitat type. While communities in grasslands and oak woodlands exhibited recovery following burn, scrublands experienced substantial changes in community composition. Our study highlights the importance of examining community responses to wildfires across broad spatial scales and diverse habitat types. By understanding the nuanced dynamics of arthropod communities in response to fire disturbances, we can develop effective conservation strategies that promote resilience and maintain biodiversity in the face of increasing wildfire frequency and severity driven by climate change.
README: COI metabarcoding of arthropod pollinator communities in burned and unburned sites of California
https://doi.org/10.5061/dryad.fttdz090v
Description of the data and file structure
Data includes:
a) mco_fire_all.csv - Output sequences from DADA2 without any additional filtering. Columns include: the sequence (seq), the number of reads associated with the sequence (asv_size), an ASV identifier (asv), the sample in which the sequence was detected (sample), the number of reads detected for a particular sample (count), the marker (marker), year collected (year), which PCR replicate the specific sequence was from (rep), the UC reserve (reserve), the burn status (status) and the site identifier (site). Reserves are abbreviated as follows - HT: Hastings, QR: Quail Ridge, AN: Ano Nuevo, BO: Blue Oak, ML: McLaughlin and BR: Big Creek.
b) mco_97_wsites.csv - Sequence data after additional curation and clustering at 97% and the associated site data. Columns include the ASV identifier (asv), information on BLAST match (percent, Accession, genus_blast, species_blast, kingdom, phylum, subphylum, class, subclass, infraclass, cohort, order, suborder, infraorder, superfamily, family, superorder, subfamily, tribe, genus, subkingdom, parvorder, superclass, subgenus), the sequence associated with the ASV (asv_seq), the number of reads associated with the sequence across the data set (asv_size), the original file name (sample), the number of reads detected for a particular sample (count), primer used (marker), the revised sample name with PCR replicate information (sample_corr), year sample was collected (year), which PCR replicate the specific sequence was from (rep), sample information (including reserve, status, site and sample_name), the number of PCR replications that were performed (total_reps), the combined reads across replicates (rep_combined), the number of replicates an ASV was detected in (num_reps), the information on the OTU in which the ASV was clustered (otu, otu_seq, threshold), consensus species of OTU sequence (species) and information associated with the sites themselves (site_name and habitat). Availability of taxonomic data varies; BLAST matches with NA values or empty cells did not contain information for that taxonomic level.
c) fire_dist.csv - Distance between all sites, produced using QGIS. Columns include the site (InputID), the comparison site (TargetID) and the distance in kilometers (Distance).
d) fasta_gz.zip - Compressed directory of .fastq files produced following CutAdapt primer trimming, referred to in sample column of mco_fire_all.csv
Sharing/Access information
Links to other publicly accessible locations of the data:
Code/Software
Docker containers and code are available along with the associated data at the above Zenodo link.
Methods
Sampling in the University of California Reserve System (UCNRS)
The UC Natural Reserve System comprises 47,000 acres distributed across most of the major California ecosystems. Between August 15-16, 2020, an exceptionally fierce electrical storm crossed northern California, peaking at 200 lightning strikes in 30 minutes. Over the following 72–96 hours, more than 12,000 lightning strikes sparked multiple wildfires around the Greater Bay Area. Many of the smaller fires merged into four large fire complexes, including two in Monterey County. Three of these fire complexes are considered to be the second, third, and fourth largest fires in California history (CAL Fire, 2022). These fires burned eight of the UC Natural Reserve System’s 41 reserves and consumed more than 20,000 acres of reserve lands. Affected NRS reserves include Año Nuevo Island Reserve (CZU Lightning Complex), Blue Oak Ranch Reserve (SCU Lightning Complex), Landels-Hill Big Creek Reserve (Dolan Fire), Hastings Natural History Reservation (River Fire), McLaughlin Natural Reserve (LNU Lightning Complex), Point Reyes Field Station (Woodward Fire), Quail Ridge Reserve (LNU Lightning Complex), and Stebbins Cold Canyon Reserve (LNU Lightning Complex). A UC Davis property affiliated with Stebbins Cold Canyon, the Cahill Reserve, also burned extensively due to the LNU Lightning Complex. The current study sampled six burned reserves: Quail Ridge, McLaughlin, Hastings, Blue Oak, Big Creek, and Ao Nuevo.
Field collection
A joint effort between the University of California: Berkeley and the University of California: Santa Cruz was undertaken to sample plant and arthropod communities across the six burned reserves. At each reserve, burned and unburned sites were selected within eight habitat types: oak woodland, redwood, scrub, chamise, grassland, forest, and serpentine. The habitat types sampled were those that occurred at all reserves. Potential sites were selected by using aerial imagery captured both pre- and post-burn. Proposed sites were assessed on the ground by the sampling team to determine suitability and confirm burn status; four sites in Big Creek Reserve, outlined in grey, were classified as unburned although they appear in burned area data as published by CALFire. Environmental variability in habitat type was controlled as much as possible in the site selection process; for example, scrub sites were chosen only if located on S/SW facing slopes. The final sites were located at least 40 meters apart.
Arthropods were collected at each site using one yellow pan trap filled with water and soap which breaks the surface tension. The yellow pan traps were standard plastic bowls with a 7-inch diameter and 12-ounce volume. Yellow pan traps typically attract pollinators, but can attract multiple other insect types; all taxa caught in pan traps were used in the study. Fall 2020 pan traps were placed in October – December and Spring 2021 pan traps were placed in May – June. Each trap was deployed for one day and moved to whirlpaks following the sampling period. Samples were sorted and stored in ethanol at -20°C until further processing.
Molecular procedures
Samples were counted and then subdivided into hard-bodied and soft-bodied arthropods to allow varied lysing time. Once sorted, samples were stored in 95% EtOH at -20°C in racks of Qiagen collection microtubes kit (Qiagen, Hilden, Germany) totaling 96 wells. Specimens from each site were pooled in an individual microtube, with tissue filling a maximum of 50% of each well to allow adequate volume for cell lysis buffer. For sites with high abundances of taxa, multiple microtubes were filled. During the plating process, specimens were roughly sorted by size to reduce the domination of any one pool by larger specimens and to prevent amplification bias following DNA isolation.
Before DNA extraction, ethanol was removed using a BenchSmart 96 (Mettler-Toledo Rainin, Colombus, OH, USA) after spinning plates down briefly. The remaining ethanol was evaporated using a vacuum centrifuge and checked at short intervals (every 15 minutes) until no ethanol remained. DNA extractions were performed using the Qiagen PureGene extraction kit (Qiagen, Hilden, Germany). Specimens were extracted non-destructively by filling wells containing whole specimens with cell lysis buffer and proteinase K. The volume of cell lysis buffer and proteinase K were altered depending on tissue volume: 600 µl of cell lysis buffer and 3 µl of proteinase K were used for wells filled to a max volume of 50% while 300 µl of cell lysis buffer and 1.5 µl of proteinase K was used for wells filled with less tissue. Because extraction success varies based on the degree of arthropod sclerotization (Carew et al., 2018), hard-bodied insects were incubated in buffer for 24 hours while soft-bodied insects were incubated for 3 hours, both at 55°C; the incubation was shortened for soft-bodied insects to preserve morphological structures. Following incubation, all lysate was transferred to a second plate. The wells containing specimens were repeatedly filled with DI water to remove remaining lysis buffer from specimens and then filled with 70% EtOH to preserve specimens for future taxonomic work.
RNAse was added to the lysate according to the PureGene protocol and incubated at 37°C for 30 minutes to remove RNA. Plates were then placed on ice and, once cool, Protein Precipitation Solution was added. Plates were spun for 10 minutes at 2000rcf to form protein pellets, and the supernatant was transferred to isopropanol; 1 µl of glycogen was added to assist in DNA precipitation. Samples were incubated at room temperature overnight and then spun down to form DNA pellets. Isopropanol was removed and pellets were washed with freshly prepared, cold 70% ethanol. Ethanol was removed and, once the remaining ethanol evaporated, elution buffer was added; plates were incubated at 65°C for 1 hour. The resulting DNA eluate was stored at -20°C until further use.
Amplification of COI was performed using the Qiagen multiplex kit (Qiagen, Hilden, Germany). Two sets of primers were used – MCO and BF3 (Elbrecht et al., 2019; Leray et al., 2013; Yu et al., 2012); primers additionally had a 5` TruSeq tail for indexing. Amplification was performed using 96-well plates and consisted of 5 µl of the Qiagen PCR MasterMix (MM) (Qiagen, Hilden, Germany), 3 µl of H20, 0.5 µl of each primer (10mM), and 1 µl of template DNA. Three PCR replicates were performed. An annealing temperature of 46°C was used and ran for 30 cycles. A negative PCR control was included on each plate, consisting of MM, H2O, and primers. PCR products were visualized using gel electrophoresis on a 3% agarose gel run at 120V. A dual indexing strategy was implemented using a second round of PCR to attach 8bp indexes. The annealing temperature for indexing PCR was 55°C and was run for 6 cycles. PCR products were visualized once again using a 3% agarose gel and ran at a lower voltage (100V). Final libraries were constructed by pooling PCR products proportionally based upon band strength – all gel images were compared and 1, 2, or 3 µl of product was used dependent on band intensity in comparison to all others. Further processing was conducted at Berkeley QB3 Genomics (QB3, Berkeley, CA USA). Libraries were sequenced using Illumina MiSeq with v3 chemistry and paired-end sequencing; demultiplexing was performed by QB3. The library was pooled with a separately indexed library; the pooling ratios allotted 0.791 of the run for sequencing of the wildfire samples.
Bioinformatic processing
Reads were trimmed of primer adapters using CutAdapt (Martin, 2011). Reads were processed using the DADA2 denoising algorithm (Callahan et al., 2016) to form amplicon sequence variants (ASVs). The removeBimeraDenovo function included in the DADA2 package was used to remove chimeras. The BF3 primer pair produced low-quality sequences with over half being discarded through normal filtering procedures, and was not included in further analysis. The PCR negative controls were utilized for decontamination, using the package decontam (Davis et al., 2018). Following decontamination, the curation algorithm LULU (Frøslev et al., 2017) was used to remove erroneous sequences. Sequences were aligned in Geneious Prime v.2022.0.2 to detect any remaining pseudogenes, identified by stop codons or shifts in the reading frame. Following removal, BLAST was performed using megablast. A custom script in R using the package rentrez (Winter, 2017) was then used to generate full lineage information from accession numbers. Only phylum Arthropoda was retained, and sequences with below 80% percent identity matches were also removed. The final filtering step retained only ASVs found in two of three replicates. PCR replicates can allow for the detection of rare taxa that may amplify less efficiently as well as assist with detecting contaminants. By retaining ASVs that occur in at least two replicates, we were reducing the risk of erroneous sequences while additionally keeping sequences that may be from less abundant taxa or taxa that amplified less effectively. Operational taxonomic units (OTUs) were generated using otu in the package kmer (Wilkinson, S, 2018), clustered at 97% with kmer = 5. This cluster threshold was used to examine taxa at a coarser level, closer to genus than species.
Statistical analysis
The BF3/BR2 primer pair was found to generate high errors with over half of the reads associated with the primer set being discarded when using DADA2. Because of this, only amplicons produced from the MCO primer pair were used. Reads were summed for each ASV detected across replicates for a particular sample, then condensed into OTUs, and a community matrix was created. Because reads do not translate directly to abundance count, read data was transformed using Hellinger standardization; this method has been widely adopted in metabarcoding. Standardization was performed using decostand in vegan (Oksanen et al., 2009).
Quail Ridge had no unburned sites and was excluded from the analysis. The number of sites in redwood (2 burned and 1 unburned), chamise (3 burned and 2 unburned), forest (2 burned and 2 unburned), and serpentine (2 burned and 2 unburned) were low; scrub, grassland, and oak woodland habitat types were the best sampled, so the data were narrowed to include only these habitat types for richness and community composition analyses.
We used OTU richness, the Shannon index, and the Simpson index to calculate alpha diversity metrics, with the standardized read matrix as input. F-tests were performed to compare variances; the non-parametric Mann-Whitney U test was used to assess median differences between alpha diversity of burned and unburned sites within the same reserve, habitat type, and season. For compositional comparisons, only sites with ≥ 3 OTUs were used. To isolate the effect of wildfire on community differences, comparisons were made between unburned and burned sites found within the same reserve, habitat type, and the same season (Fall 2020 or Spring 2021). All analyses were performed using both incidence and Hellinger-standardized read data. Nonmetric multidimensional scaling was performed using Hellinger distance, by applying Euclidean distance to the Hellinger-standardized data; this was done using metaMDS in vegan with up to 1,000 random starts and 3 dimensions. The argument -noshare was used which creates extended dissimilarities for sites that share no species. Outlier samples with entirely distinct communities prevented convergence and were removed from NMDS, but all statistical tests were run using the data with and without the outlier data and results did not change. Permutational multivariate analysis of variation (PERMANOVA) was used to test group differences based on burn, habitat type, reserve, and year using the Hellinger distance. This included testing interactions between variables. This was done using adonis2 in vegan (Oksanen et al., 2009). Differences in group dispersions were tested using PERMDISP2 with the function betadisper. Additionally, we split the data by habitat type and performed NMDS, PERMANOVA, and PERMDISP2 for each habitat type.
To calculate compositional differences in an additional way, we used beta in the package BAT (Cardoso et al., 2015), which partitions beta diversity into differences driven by species replacement and by richness differences. We performed the non-parametric Kruskal-Wallis test to assess if burned and unburned sites had higher beta diversity values than sites within burned habitat and sites within unburned habitat. We compared differences in beta diversity against distances between sites within reserves to examine spatial autocorrelation and the key components driving dissimilarities. Spatial distance between sites was calculated using QGIS v.3.22.4 and the standard setting under distance matrix calculation. We performed linear regression to assess the association with distance. Using ANOVA, we tested the difference between beta diversity values across habitat types.