Soil metabarcoding helps identify recalcitrant taxa from chaparral seed banks
Data files
Jan 30, 2026 version files 533.29 KB
- Attrition.csv (1.24 KB)
- Attrition.Rmd (2.48 KB)
- Data.csv (14.39 KB)
- Greenhouse.Rmd (13.56 KB)
- Piru_ITS.Rmd (53.20 KB)
- Piru_ITS2f_123.fasta (57.79 KB)
- Piru_ITS2f_meta_104.csv (6.69 KB)
- Piru_ITS2f200_123_104_new.csv (28.42 KB)
- Piru_ITS2f200_taxa_123.csv (4.82 KB)
- Piru_rbcL_127.fasta (39.73 KB)
- Piru_rbcL_ASVt_113_104.csv (25.88 KB)
- Piru_rbcL_meta_127_104.csv (6.88 KB)
- Piru_rbcL_taxa_113_v2.csv (5.52 KB)
- Piru_rbcl.Rmd (53.04 KB)
- README.md (17.98 KB)
- Test_ITS2_113taxa.csv (7.40 KB)
- Test_ITS2_ASV_table_113.csv (22.45 KB)
- Test_ITS2.Rmd (21.36 KB)
- Test_ITS2f_133_new.fasta (31.60 KB)
- Test_ITS2f_meta.csv (5.01 KB)
- Test_rbcL_61.fasta (22.04 KB)
- Test_rbcL_ASVt_61.csv (11.94 KB)
- Test_rbcL_meta_80_84.csv (3.82 KB)
- Test_rbcl_taxa_61.csv (2.62 KB)
- Test_rbcl.Rmd (29.86 KB)
- Test_trnL_31.csv (5.85 KB)
- Test_trnL_31.fasta (2.99 KB)
- Test_trnL_meta_34.csv (3.66 KB)
- Test_trnL_taxa_31.csv (1.68 KB)
- Test_trnL.Rmd (29.38 KB)
Abstract
Evaluating seed bank composition by germinating seeds from soil cores is a common technique used in ecological studies to identify the plant biodiversity reservoir of a site. However, failure to meet required germination cues or to correctly detect uncommon species are major hurdles to creating a comprehensive plant list from the soil seed bank. Identifying plant species from genetic material within the soil environment (eDNA or eRNA) via metabarcoding offers a potential solution that has not yet been widely utilized at least in part because interpretations of results are not always straightforward. To address this issue, we first assessed extraction and amplification protocols in a series of proof-of-concept experiments where we controlled the soil seed bank and soil environments. We found that barcodes from DNA were more consistently amplified than from RNA and adding a germination stimulant, such as water, did not significantly influence sequencing yield. We then compared our molecular methods to traditional methods of germinating seed banks using soil samples collected from a degraded chaparral site in southern California where germinating native plants ex situ is challenging. We found that the rbcL barcode identified the largest number of plant families while the ITS2 barcode identified the most plant genera. Species that are traditionally challenging to germinate, such as fire-followers and hemiparasitic plants, were among those identified by metabarcoding but not by traditional methods. Pairing molecular tools with ecological site familiarity will make the species identification process more efficient, complete, and especially conducive for identifying the recalcitrant species of soil seed banks.
https://doi.org/10.5061/dryad.hdr7sqvsw
Description of the data and file structure
Identifying plant species from genetic material within the soil environment (eDNA or eRNA) via metabarcoding offers a potential solution to the limitations of germination-based seed bank surveys, but it has not yet been widely utilized, at least in part because interpretations of results are not always straightforward. To address this issue, we first assessed extraction and amplification protocols in a series of proof-of-concept experiments where we controlled the soil seed bank and soil environments. We then compared our molecular methods to traditional methods of germinating seed banks using soil samples collected from a degraded chaparral site in southern California where germinating native plants ex situ is challenging. Here, we provide the files necessary to perform the statistical analyses included in a submitted manuscript, currently titled "Soil metabarcoding helps identify recalcitrant taxa from chaparral seed banks". Sequence data come from Illumina MiSeq runs that followed standard metabarcoding procedures for the rbcL, ITS2, and trnL barcodes amplified from soil samples.
Files and variables
File: Test_ITS2.Rmd
Description: R code that reads Test_ITS2_ASV_table_113.csv, Test_ITS2f_meta.csv, and Test_ITS2_113taxa.csv and analyzes the efficiency and accuracy with which soil samples can be analyzed for plant community presence with ITS2 metabarcoding. Test_ITS2f_133_new.fasta contains the raw data.
File: Test_ITS2_ASV_table_113.csv
Description: Community ASV table in which the columns are sample names, described in the meta file Test_ITS2f_meta.csv, and rows are plant taxa, described in the taxa file Test_ITS2_113taxa.csv. This ASV table describes the ITS2 amplicon community from the proof-of-concept study.
Variables:
- Variables are described in Test_ITS2f_meta.csv
File: Test_ITS2f_meta.csv
Description: Metadata file describing the treatments that each soil sample received or was derived from.
Variables:
- Sample: Raw sample name included in the MiSeq Library manifest
- Sample_name: Sample name of the actual sample (may be different from how it was labeled in the MiSeq Library manifest). Typically, the naming follows a group label that describes the treatment combinations, barcodes, and nucleotide extraction type (see below). It is followed by a letter at the end to denote barcode, r = rbcL, I = ITS, t = trnL
- DNA_RNA: DNA or RNA extraction
- media: Describes the media from which the plant DNA were extracted; sterile sand, sterile (field) soil, air-dried (field) soil
- bio: Whether there were biological plant materials included or not, SALE = Salvia leucophylla, BRDI = Bromus diandrus
- DI_addition: Sample was incubated in deionized water for 24 hours (DI) or not (none)
- replicate: There were five replicates total, labeled from a-e
- sample: Group label to describe the unique combination of treatments, e.g., D6 = sterile soil with Salvia leucophylla incubated in water and extracted for DNA
- sample_trt: Group label to describe the unique combination of treatments followed by the replicate designation (a-e)
- extraction: Group label to describe the unique combination of treatments followed by the replicate designation (a-e), but without the nucleotide designation (i.e., "D" or "R" missing from the sample_trt name).
File: Test_ITS2_113taxa.csv
Description: Taxonomic assignments of the ITS2 ASV sequences based on BLAST results
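As a rough illustration of how the three ITS2 tables relate to one another, the sketch below joins a toy ASV table to toy metadata and taxonomy using only the Python standard library. The ASV identifiers, sample names, read counts, and the `Genus` column are invented for illustration and are not taken from the deposited files; the actual analyses are in Test_ITS2.Rmd (R).

```python
import csv
import io

# Hypothetical miniature stand-ins for the three deposited tables.
# Columns of the ASV table are samples; rows are taxa (ASVs).
ASV_CSV = """ASV,D6aI,D6bI,R6aI
ASV1,120,95,0
ASV2,0,4,30
"""
META_CSV = """Sample_name,DNA_RNA,media,bio,DI_addition
D6aI,DNA,sterile soil,SALE,DI
D6bI,DNA,sterile soil,SALE,DI
R6aI,RNA,sterile soil,SALE,DI
"""
TAXA_CSV = """ASV,Genus
ASV1,Salvia
ASV2,Bromus
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def reads_by_group(asv_text, meta_text, taxa_text, meta_key, group_col):
    """Sum read counts per (metadata group, genus) pair."""
    meta = {row[meta_key]: row for row in read_rows(meta_text)}
    genus = {row["ASV"]: row["Genus"] for row in read_rows(taxa_text)}
    totals = {}
    for row in read_rows(asv_text):
        asv = row.pop("ASV")
        for sample, count in row.items():
            key = (meta[sample][group_col], genus[asv])
            totals[key] = totals.get(key, 0) + int(count)
    return totals


totals = reads_by_group(ASV_CSV, META_CSV, TAXA_CSV, "Sample_name", "DNA_RNA")
```

The same join applies to the rbcL and trnL tables, provided the key column that matches the ASV table's column names is chosen for each metadata file.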
File: Test_rbcl.Rmd
Description: R code that reads Test_rbcL_ASVt_61.csv, Test_rbcL_meta_80_84.csv, and Test_rbcl_taxa_61.csv and analyzes the efficiency and accuracy with which soil samples can be analyzed for plant community presence with rbcL metabarcoding. Raw sequence file is available in Test_rbcL_61.fasta
File: Test_rbcL_ASVt_61.csv
Description: Community ASV table in which the columns are sample names, described in the meta file Test_rbcL_meta_80_84.csv, and rows are taxa, described in the taxa file Test_rbcl_taxa_61.csv. This ASV table describes the rbcL amplicon community from the proof-of-concept study.
File: Test_rbcL_meta_80_84.csv
Description: Metadata file describing the treatments that each soil sample received or was derived from.
Variables:
- Sample: Sample name; typically, the naming follows a group label that describes the treatment combinations, barcodes, and nucleotide extraction type (see below). It is followed by a letter at the end to denote barcode, r = rbcL, I = ITS, t = trnL.
- DNA_RNA: DNA or RNA extraction
- media: Describes the media from which the plant DNA were extracted; sterile sand, sterile (field) soil, air-dried (field) soil
- bio: Whether there were biological plant materials included or not, SALE = Salvia leucophylla, BRDI = Bromus diandrus
- DI_addition: Sample was incubated in deionized water for 24 hours (DI) or not (none)
- replicate: There were five replicates total, labeled from a-e
- sample: Group label to describe the unique combination of treatments, e.g., D6 = sterile soil with Salvia leucophylla incubated in water and extracted for DNA. Each number typically represents a unique combination of media x bio x DI_addition. D or R represents DNA or RNA.
- sample_trt: The unique name to describe the unique combination of treatments followed by the replicate designation (a-e)
File: Test_rbcl_taxa_61.csv
Description: Taxonomic assignments of the rbcL ASV sequences based on BLAST results for the proof-of-concept study
File: Test_trnL.Rmd
Description: R code that reads Test_trnL_31.csv, Test_trnL_meta_34.csv, and Test_trnL_taxa_31.csv and analyzes the efficiency and accuracy with which soil samples can be analyzed for plant community presence with trnL metabarcoding. Raw sequence files used to make the taxonomic assignments are available in Test_trnL_31.fasta
File: Test_trnL_31.csv
Description: Community ASV table in which the columns are sample names, described in the meta file Test_trnL_meta_34.csv, and rows are taxa, described in the taxa file Test_trnL_taxa_31.csv. This ASV table describes the trnL amplicon community from the proof-of-concept study.
File: Test_trnL_meta_34.csv
Description: Metadata file describing the treatments that each soil sample received or was derived from.
Variables:
- (unnamed first column): Sample name. The first column has no header, but it holds the sample names; typically, the naming follows a group label that describes the treatment combinations, barcodes, and nucleotide extraction type (see below). It is followed by a letter at the end to denote barcode, r = rbcL, I = ITS, t = trnL.
- DNA_RNA: DNA or RNA extraction
- media: Describes the media from which the plant DNA were extracted; sterile sand, sterile (field) soil, air-dried (field) soil
- bio: Whether there were biological plant materials included or not, SALE = Salvia leucophylla, BRDI = Bromus diandrus
- DI_addition: Sample was incubated in deionized water for 24 hours (DI) or not (none)
- replicate: There were five replicates total, labeled from a-e
- sample: Group label to describe the unique combination of treatments, e.g., D6 = sterile soil with Salvia leucophylla incubated in water and extracted for DNA. Each number typically represents a unique combination of media x bio x DI_addition. D or R represents DNA or RNA.
- sample_trt: The unique name to describe the unique combination of treatments followed by the replicate designation (a-e)
File: Test_trnL_taxa_31.csv
Description: Taxonomic assignments of the trnL ASV sequences based on BLAST results for the proof-of-concept study
File: Greenhouse.Rmd
Description: R code that reads Data.csv and analyzes the identification of plant species from a germination study.
File: Data.csv
Description: Species survey from germination study in which soil samples were watered and seedlings were identified to species. The species list was compared to what could be sequenced with rbcL or ITS2 directly.
Variables:
- Sample: Sample name, typically made from the combination of Plot, Depth and Treatment (see below).
- Plot: Different areas of a restoration site were studied, there were ten plots total (numbered from 1-10), but five were chosen a priori for this study and renamed A-E for the manuscript. However, they are referenced by their numbers (e.g., 'Two', or 'Six') during the statistical analyses and in the R codes.
- Depth: Top 4 cm (i.e., "A") or Bottom, between 4-20 cm (i.e., "B")
- Plot_Depth: Plot and depth information is the combination of the Plot (designated by their numbers still, e.g., '2', '6') and the Depth ('A' or 'B'). Hence, '2A' is top soil in Plot 2, which is Plot A in the manuscript.
- Treatment: Soils received treatments of heat, charate, heat + charate, or no treatment (control)
- species: Plant species name of the seedling
- Family: Family plant taxa to which the species belongs
- Abundance: Abundance of the seedling in the soil sample
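One way the long-format germination table could be summarized is by counting distinct species per group. The sketch below does this with the standard library, using the column names listed above; the rows, the sample-name format, and the abundance values are hypothetical, and the analyses themselves are in Greenhouse.Rmd (R).

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using the Data.csv column names described above.
DATA_CSV = """Sample,Plot,Depth,Plot_Depth,Treatment,species,Family,Abundance
2A_heat,Two,A,2A,heat,Salvia leucophylla,Lamiaceae,3
2A_heat,Two,A,2A,heat,Bromus diandrus,Poaceae,12
2A_control,Two,A,2A,control,Bromus diandrus,Poaceae,7
6B_control,Six,B,6B,control,Bromus diandrus,Poaceae,0
"""


def richness_by(rows, group_col):
    """Count distinct species with Abundance > 0 within each group."""
    seen = defaultdict(set)
    for row in rows:
        group = seen[row[group_col]]  # touching the key keeps empty groups at 0
        if float(row["Abundance"]) > 0:
            group.add(row["species"])
    return {name: len(species) for name, species in seen.items()}


rows = list(csv.DictReader(io.StringIO(DATA_CSV)))
by_plot_depth = richness_by(rows, "Plot_Depth")
by_treatment = richness_by(rows, "Treatment")
```

Grouping by `Plot_Depth` gives one richness value per soil sample, which is the unit that would be compared against the taxa detected by rbcL or ITS2 sequencing.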
File: Piru_ITS.Rmd
Description: R code that reads Piru_ITS2f200_123_104_new.csv, Piru_ITS2f_meta_104.csv, and Piru_ITS2f200_taxa_123.csv and analyzes the efficiency and accuracy with which field soil samples can be analyzed for plant community presence with ITS2 metabarcoding. Soil samples come from a chaparral restoration site in Piru, CA. Raw sequence data for taxonomic assignments are available at Piru_ITS2f_123.fasta
File: Piru_ITS2f200_123_104_new.csv
Description: Community ASV table in which the columns are sample names, described in the meta file Piru_ITS2f_meta_104.csv, and rows are taxa, described in the taxa file Piru_ITS2f200_taxa_123.csv. This ASV table describes the ITS2 amplicon community from the Piru field study.
File: Piru_ITS2f_meta_104.csv
Description: Metadata file describing the treatments that each soil sample received or was derived from. Samples are from the field site in Piru, California. This file was used in Piru_ITS.Rmd for analyses.
Variables:
- Sample: Sample name; typically, the naming follows a group label that describes the treatment combinations, barcodes, and nucleotide extraction type (see below), e.g., 2A2ADI. The first position designates the Plot (e.g., '2', see below). The second position designates Depth (Top = A, Bottom = B). The third position designates DI_addition treatment (1 = no DI treatment, 2 = DI treatment). The fourth position designates the replicate, with four replicates at most (A-D). The fifth position designates DNA or RNA extraction (DNA = D, RNA = R). The last position denotes barcode type, r = rbcL, I = ITS, t = trnL.
- DNA_RNA: DNA or RNA extraction
- Plot: Different areas of a restoration site were studied, there were ten plots total (numbered from 1-10), but five were chosen a priori for this study and renamed A-E for the manuscript. However, they are referenced by their numbers (e.g., 'Two', or 'Six') during the statistical analyses and in the R codes. They are changed to numerical shorthand (e.g., '2' or '6') in labels.
- Depth: Top 4 cm or Bottom, between 4-20 cm
- DI_addition: Sample was incubated in deionized water for 24 hours (DI) or not (none)
- replicate: There were four replicates per treatment type (A-D)
- marker: rbcL or ITS2; ITSf = forward only of ITS2
- soil_sample: Plot and depth combination (e.g., 2A = Top depth of plot 2). A = Top depth, B = Bottom depth. We investigated plots 2, 3, 6, 7, and 9, which became Plots A, B, C, D, and E respectively in the manuscript to not confuse readers for the missing plots.
- soil_prep: Depth x DI_addition category combination (e.g., A2 = Top depth with DI treatment). A = Top depth, B = Bottom depth. 1 = no DI treatment, 2 = DI treatment
- treatment: Plot x Depth x replicate x DNA_RNA combinations x marker, e.g., 2AADI = Plot 2 + Top depth (A) + replicate A + DNA extraction (D) + ITSf (I)
- extraction: Plot x Depth x DI_addition x replicate, e.g., 2A2A = Plot 2 + Top depth (A) + DI treatment (2) + replicate A
- sample_trt: Plot x Depth x DI_addition x replicate x DNA_RNA, e.g., 2A2AD = Plot 2 + Top depth (A) + DI treatment (2) + replicate A + DNA extraction (D)
- replicate_1: replicate x DI_addition, e.g., a + DI = DI treatment + replicate A (not new information, just needed for statistical grouping). When there is no "+ DI", that means there was no DI addition.
- order: order in which to make the figure (not important for analysis)
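The positional naming scheme described for the Sample column can be decoded mechanically. The following is a minimal sketch, assuming every field sample name follows the six-position pattern exactly; the regular expression and the function name `parse_sample` are illustrative and do not come from the deposited .Rmd files.

```python
import re

# Pattern for names such as "2A2ADI": plot, depth, DI code, replicate,
# nucleic acid, barcode. The regex is an assumption based on the positional
# description above; replicate letters A-E are accepted to be permissive.
SAMPLE_RE = re.compile(r"^(\d+)([AB])([12])([A-E])([DR])([rIt])$")

DEPTH = {"A": "Top", "B": "Bottom"}
DI = {"1": "none", "2": "DI"}
NUCLEIC = {"D": "DNA", "R": "RNA"}
BARCODE = {"r": "rbcL", "I": "ITS", "t": "trnL"}
# Plot numbers used in the R code mapped to the manuscript's plot letters.
MANUSCRIPT_PLOT = {"2": "A", "3": "B", "6": "C", "7": "D", "9": "E"}


def parse_sample(name):
    """Split a field sample name into the variables listed above."""
    match = SAMPLE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized sample name: {name!r}")
    plot, depth, di, rep, nucleic, barcode = match.groups()
    return {
        "Plot": plot,
        "manuscript_plot": MANUSCRIPT_PLOT.get(plot),
        "Depth": DEPTH[depth],
        "DI_addition": DI[di],
        "replicate": rep,
        "DNA_RNA": NUCLEIC[nucleic],
        "barcode": BARCODE[barcode],
        "soil_sample": plot + depth,
        "extraction": plot + depth + di + rep,
        "sample_trt": plot + depth + di + rep + nucleic,
    }


parsed = parse_sample("2A2ADI")
```

Names that do not fit the pattern raise a `ValueError` rather than being silently mis-parsed, which seems the safer default given that the description says the naming is only "typically" followed.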
File: Piru_ITS2f200_taxa_123.csv
Description: Taxonomic assignments of the ITS2 ASV sequences from the Piru field study based on BLAST results. The data use only forward reads from a MiSeq 500-cycle (2 x 250 bp) run. Reads shorter than 200 base pairs were excluded from downstream analyses.
File: Piru_rbcl.Rmd
Description: R code that reads Piru_rbcL_ASVt_113_104.csv, Piru_rbcL_meta_127_104.csv, and Piru_rbcL_taxa_113_v2.csv and analyzes the efficiency and accuracy with which field soil samples can be analyzed for plant community presence with rbcL metabarcoding. Soil samples come from a chaparral restoration site in Piru, CA. Raw sequences for taxonomic assignments can be found at Piru_rbcL_127.fasta
File: Piru_rbcL_ASVt_113_104.csv
Description: Community ASV table in which the columns are sample names, described in the meta file Piru_rbcL_meta_127_104.csv, and rows are taxa, described in the taxa file Piru_rbcL_taxa_113_v2.csv. This ASV table describes the rbcL amplicon community from the Piru field study.
File: Piru_rbcL_meta_127_104.csv
Description: Metadata file describing the treatments that each soil sample received or was derived from. Samples are from the field site in Piru, California. This file was used in Piru_rbcl.Rmd for analyses.
Variables:
- Sample: Sample name; typically, the naming follows a group label that describes the treatment combinations, barcodes, and nucleotide extraction type (see below), e.g., 2A2ADr. The first position designates the Plot (e.g., '2', see below). The second position designates Depth (Top = A, Bottom = B). The third position designates DI_addition treatment (1 = no DI treatment, 2 = DI treatment). The fourth position designates the replicate, with four replicates at most (A-D). The fifth position designates DNA or RNA extraction (DNA = D, RNA = R). The last position denotes barcode type, r = rbcL, I = ITS, t = trnL.
- DNA_RNA: DNA or RNA extraction
- Plot: Different areas of a restoration site were studied, there were ten plots total (numbered from 1-10), but five were chosen a priori for this study and renamed A-E for the manuscript. However, they are referenced by their numbers (e.g., 'Two', or 'Six') during the statistical analyses and in the R codes. They are changed to numerical shorthand (e.g., '2' or '6') in labels.
- Depth: Top 4 cm or Bottom, between 4-20 cm
- DI_addition: Sample was incubated in deionized water for 24 hours (DI) or not (none)
- replicate: There were four replicates per treatment type (A-D)
- marker: rbcL or ITS2 (important only when combining rbcL and ITS2 data into one dataset for nmds)
- soil_sample: Plot and depth combination (e.g., 2A = Top depth of plot 2). A = Top depth, B = Bottom depth. We investigated plots 2, 3, 6, 7, and 9, which became Plots A, B, C, D, and E respectively in the manuscript to not confuse readers for the missing plots.
- soil_prep: Plot x Depth x DI_addition category combination (e.g., 9B1 = Bottom depth of plot 9 with no DI treatment). A = Top depth, B = Bottom depth. 1 = no DI treatment, 2 = DI treatment
- treatment: Plot x Depth x replicate x DNA_RNA combinations, e.g., 9BADNA = Plot 9 + Bottom depth + replicate A + DNA extraction
- extraction: Plot x Depth x DI_addition x replicate, e.g., 2A2A = Plot 2 + Top depth + DI treatment + replicate A
- sample_trt: Plot x Depth x DI_addition x replicate x DNA_RNA, e.g., 2A2AD = Plot 2 + Top depth + DI treatment + replicate A + DNA extraction
- replicate_1: Replicate x DI_addition, e.g., a + DI = DI treatment + replicate A (not new information, just needed for statistical grouping). When there is no "+ DI", that means there was no DI addition.
- order: order in which to make the figure (not important for analysis)
File: Piru_rbcL_taxa_113_v2.csv
Description: Taxonomic assignments of the rbcL ASV sequences from the Piru field study based on BLAST results.
File: Attrition.Rmd
Description: R code that reads Attrition.csv and analyzes how many samples are successfully processed depending on treatment type.
File: Attrition.csv
Description: Summary of the number of samples that passed different stages to successful sequencing in each treatment group of samples.
Variables:
- Treatment: DNA or RNA extraction, DI or no DI treatment
- Primer: rbcL or ITS2
- X: categories of samples that passed different stages of the analysis; Total, Extracted successfully, Amplified successfully, Sequenced successfully
- Y: number of samples that were in Total, Extracted successfully, Amplified successfully, or Sequenced successfully
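Because Attrition.csv stores one row per stage (X) with its count (Y), the fraction of samples retained at each stage can be derived by dividing by the Total row of the same group. The sketch below shows one way to do that; the treatment labels and counts are hypothetical, and the analysis itself is in Attrition.Rmd (R).

```python
import csv
import io

# Hypothetical counts in the long format described above (X = stage, Y = count).
ATTRITION_CSV = """Treatment,Primer,X,Y
DNA + DI,rbcL,Total,20
DNA + DI,rbcL,Extracted successfully,20
DNA + DI,rbcL,Amplified successfully,18
DNA + DI,rbcL,Sequenced successfully,15
RNA + DI,rbcL,Total,20
RNA + DI,rbcL,Extracted successfully,16
RNA + DI,rbcL,Amplified successfully,8
RNA + DI,rbcL,Sequenced successfully,6
"""


def retention(rows):
    """Return {(Treatment, Primer): {stage: fraction of Total retained}}."""
    counts = {}
    for row in rows:
        key = (row["Treatment"], row["Primer"])
        counts.setdefault(key, {})[row["X"]] = int(row["Y"])
    return {
        key: {stage: n / stages["Total"] for stage, n in stages.items()}
        for key, stages in counts.items()
    }


rows = list(csv.DictReader(io.StringIO(ATTRITION_CSV)))
fractions = retention(rows)
```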
Files: Test_trnL_31.fasta, Test_rbcL_61.fasta, Test_ITS2f_133_new.fasta, Piru_rbcL_127.fasta, Piru_ITS2f_123.fasta
Description: Sequences generated from the MiSeq runs. The file name refers to the study (Test = proof-of-concept; Piru = field study of the five plots compared against the greenhouse results) and to the barcode (trnL, rbcL, or ITS2f). ITS2f indicates that only forward reads were used (no reverse reads).
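For readers who want to inspect the FASTA files outside of R, a small standard-library reader is sketched below. It assumes conventional FASTA formatting (a `>` header line followed by one or more sequence lines); the record names and sequences in the example are made up, and the header format of the deposited files has not been checked here.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record example; real use would pass an open file handle,
# e.g. open("Test_trnL_31.fasta").
EXAMPLE = """>ASV1
ACGTACGT
ACGT
>ASV2
GGCC
"""

records = list(read_fasta(io.StringIO(EXAMPLE)))
lengths = {name: len(seq) for name, seq in records}
```

The resulting `lengths` dictionary can be used, for instance, to check sequence lengths against a threshold such as the 200 bp cutoff mentioned for the Piru ITS2 data.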
Code/software
R is needed to run the .Rmd files. Each .Rmd file lists the packages needed to run its analyses.
MATERIALS and METHODS
Sequencing known seed bank (Proof-of-concept)
We tested the accuracy and efficiency of metabarcoding methods across a combination of different soil complexities, added plant species, and soil preparations, for a total of 12 treatment groups with five replicates each. We first tested how efficiently Salvia leucophylla seeds could be sequenced from the simplest soil matrix (sterilized sand) under three germination treatments applied prior to nucleotide extraction: no treatment, a 24-hour incubation with deionized (DI) water, and a 24-hour incubation with gibberellic acid (GA). We expected that stimulating germination with water or GA would increase RNA transcription and yield. S. leucophylla was chosen as our control species since its seeds had been recently collected from the wild and were abundantly available for testing.
Next, we tested how accurately and efficiently S. leucophylla seeds could be sequenced with metabarcoding from sterilized field soils, which were collected from Piru, CA, a chaparral ecosystem undergoing restoration (site history described in next section). The site occurs on a mix of ~30% Lodo and ~25% Botella soils, plus similar soil series, and is associated with ~25% rock outcrops occurring on 30-60% slopes (Web Soil Survey). The soils were collected from a depth of 12 cm. The sand and field soil (<100 g) were both sterilized at 125 ℃ in the autoclave for 30 minutes.
We also tested whether metabarcoding would preferentially sequence S. leucophylla seeds over non-seed material by adding air-dried Bromus diandrus root tissue to some of the samples. Bromus diandrus, like other non-native Bromus species, is a common annual grass with a fibrous root system, generally found near the soil surface, that can outcompete shrub seedlings (Park 2020) and invade chaparral stands (Haidinger and Keeley 1993). Bromus diandrus roots were freshly collected from senesced stands in July 2019 from Isla Vista, CA, and air-dried for three weeks on the lab bench. Large quantities of B. diandrus were readily available and suitable to compare against S. leucophylla since the two species are taxonomically distant (Poaceae vs. Lamiaceae). Hence, we tested four combinations of added plant materials: S. leucophylla seeds only, B. diandrus roots only, S. leucophylla seeds with B. diandrus roots, and no plant material. These four treatment groups were tested with and without a 24-hour incubation in DI water. Finally, we sequenced the field soil without sterilizing (only air-dried and no incubation treatment) to assess potential sources of contamination. All 12 treatments were extracted for both DNA and RNA and subsequently amplified for three barcode regions: rbcL (large subunit of ribulose-bisphosphate carboxylase), ITS2 (second internal transcribed spacer of nuclear ribosomal DNA), and trnL (a non-coding chloroplast intron). We assessed accuracy based on the identification of S. leucophylla reads where seeds were added, and efficiency based on the number of replicates that were successfully amplified and sequenced.
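The count of 12 treatment groups can be checked by enumerating the design described above. The sketch below is one reading of that design: it assumes the four plant-material combinations were all tested in sterilized field soil, and the labels are shorthand borrowed from the metadata variables (SALE, BRDI, DI) rather than values copied from the deposited files.

```python
from itertools import product

# Sterilized sand with S. leucophylla seeds under three germination treatments.
sand = [("sterile sand", "SALE", trt) for trt in ("none", "DI", "GA")]

# Four plant-material combinations, each with and without DI incubation.
# Placing all of these in sterilized field soil is an assumption made here.
materials = ("SALE", "BRDI", "SALE + BRDI", "none")
soil = [("sterile soil", bio, di) for bio, di in product(materials, ("none", "DI"))]

# Unsterilized, air-dried field soil with no incubation treatment.
field = [("air-dried soil", "none", "none")]

treatments = sand + soil + field
replicates = "abcde"
samples = [(trt, rep) for trt in treatments for rep in replicates]

# Each sample was extracted for DNA and RNA and amplified for three barcodes,
# which gives the maximum number of extraction x barcode combinations.
attempted = list(product(samples, ("DNA", "RNA"), ("rbcL", "ITS2", "trnL")))
```

Under these assumptions the design yields 3 + 8 + 1 = 12 treatment groups and 60 replicate samples; the number actually sequenced is lower wherever extraction or amplification failed.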
Collecting chaparral soils
Soil samples were collected in June 2019 from two degraded chaparral stands as part of an ongoing restoration study at the Los Padres National Forest in Piru, California. One stand (Stand 1: Sites A and B) burned in a 2007 wildfire while the other (Stand 2: Sites C, D, and E) burned in 2003 and 2007. At both stands, native chaparral shrubs have declined in diversity and abundance and the canopy is co-dominated by a native, early-successional shrub, Malacothamnus fasciculatus, and various non-native grasses and forbs (e.g., Avena barbata, Bromus madritensis, Centaurea melitensis). The native shrubs Salvia leucophylla, Rhus ovata, Baccharis pilularis, and Artemisia californica are also present at the northern Stand 1. Stands 1 and 2 were subdivided into a total of ten sites (approx. 350 m² per site). From each site, 4.5 cm diameter soil cores were collected from 0-4 cm (top soil) and 4-12 cm (bottom soil) for a total of two soil samples per site. Deeper soil cores were predicted to include older seed banks with more native species, whereas the shallower soil cores were predicted to capture more non-native species because of their recent introduction. Soil samples were transported to UC Santa Barbara, where they were air-dried for three weeks and sieved through 4 mm and 2 mm sieves to remove rocks and homogenize the soil. They were then either prepared for the germination study or stored at 3℃ (37.4°F) prior to metabarcode sequencing.
Based on the results from the completed germination study (methods described below), five sites (Sites A, B, C, D, E) with the greatest and lowest plant species richness were chosen a posteriori for metabarcode sequencing in order to test correlations in species richness between the two methods. For each soil sample, two subsamples of 20 g of soil were each placed in a 15 cm diameter Petri dish and incubated in a growth chamber at 35℃ for 18 hours to simulate a very hot day. Subsequently, one Petri dish of soil was incubated in DI water for 24 hours (+ DI), while the other received no treatment.
Nucleic Acid Extraction, Amplification, and Sequencing
All soil samples were ground in liquid nitrogen with a mortar and pestle and stored at -80°C prior to isolation of DNA or RNA. The Qiagen RNeasy PowerSoil Total RNA Kit and the Qiagen RNeasy PowerSoil DNA Elution Kit (Qiagen, Maryland, United States) were used to simultaneously isolate DNA and RNA from the same replicate samples. The extracted nucleotides were quantified with dsDNA or RNA Qubit Assays (Qiagen, Maryland, United States).
All molecular work was performed with sterile techniques in a biosafety cabinet. Three ng of DNA and RNA were used for amplification of the target barcode regions. We tested the barcode regions rbcL, ITS2, and trnL for the proof-of-concept study, but we did not amplify the trnL region for the field study since trnL is considerably shorter, used less in other studies, and therefore unlikely to provide additional taxonomic resolution to the study. RNA samples were reverse-transcribed and amplified with the Qiagen OneStep RT-PCR Kit (Qiagen, Maryland, United States). Cleaned amplified products were standardized to 5 nM and pooled. The pooled libraries were sequenced with paired-end sequencing (2 × 250 bp) on an Illumina MiSeq sequencer at the California Nanosystems Institute at the University of California - Santa Barbara, CA, with a 15% spike-in of PhiX.
Sequences were filtered and denoised into amplicon sequence variants (ASVs), which target variation within the finest level possible within a given barcode region (Callahan et al. 2017). ASVs were generated using a custom dada2 pipeline by combining identical reads, or nucleotide sequences (Callahan et al. 2016). Plant taxonomic identities were manually checked with BLAST (Altschul et al. 1990). The samples were sequenced across three separate MiSeq libraries with some of the same samples sequenced twice (Accession PRJNA1126262).
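The idea of "combining identical reads" can be illustrated with a toy dereplication step. This is only a sketch of that single idea: dada2 additionally applies an error model to decide which unique sequences represent real variants, and that denoising is not reproduced here. The reads are hypothetical.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into unique sequences with their abundances."""
    counts = Counter(reads)
    # Most abundant first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical reads from one sample.
reads = ["ACGT", "ACGT", "ACGA", "ACGT", "TTGC", "ACGA"]
unique = dereplicate(reads)
```

Each unique sequence and its count corresponds conceptually to one row of an ASV table for that sample, before any denoising or taxonomic assignment.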
