
Genetic diversity of Shaw's agave and soil associated microbes in Southern California preserve

Cite this dataset

Bozinovic, Goran; Vu, Jeanne; Vasquez, Miguel; Lombardo, Keith (2021). Genetic diversity of Shaw's agave and soil associated microbes in Southern California preserve [Dataset]. Dryad.


Shaw’s Agave (Agave shawii ssp. shawii) is an endangered maritime succulent that grows along the coast of California and northern Baja California. The population on the Point Loma Peninsula has a complicated, undocumented history of transplantation. The low effective population size in California prompted the transplanting of agaves from the U.S. Naval Base site (NB) to Cabrillo National Monument (CNM). No agave sprouts have been identified at the CNM site since 2008, and concerns have been raised about the genetic diversity of this population. We sequenced two barcoding loci, rbcL and matK, from 27 individual plants in 5 geographically distinct populations, including 12 individuals from Point Loma, California (NB and CNM). Phylogenetic analysis revealed that the three US and two Mexican agave populations are closely related and show similar genetic variation at the two barcoding regions, which suggests that the Point Loma agave population is not clonal. Agave-associated soil microbes used significantly more carbon sources in CNM soil samples than in NB soil samples, likely because of higher pH and moisture content. Analysis of soil type and soil chemistry, including phosphorus, nitrate nitrogen, organic matter, and metals, revealed significant correlations between microbial diversity and base saturation (p < 0.05, r² = 0.3676), lime buffer capacity (p < 0.01, r² = 0.7055), equilibrium lime buffer capacity (p < 0.01, r² = 0.7142), and zinc (p < 0.01, r² = 0.7136). Soil microbiome analysis within the CNM population revealed the overall richness expected for Agave species (H’ = 5.647 to 6.982). However, the diversity range (1 - D = 0.003392 to 0.014108) suggests relatively low diversity with high variation among individuals. The most prominent remaining US population of this rare species is not clonal and does not appear to be threatened by a lack of genetic or microbial diversity. These results call for further investigation of the factors that affect the reproduction and fitness of Shaw’s Agave.



Shaw’s Agave leaf tissues were collected from 27 individual plants (Figure 1G, 1H) across the habitat range (Figure 1G). The sites were Point Loma (CNM, NB), Border Field State Park (BC), Rosarito (RS), and Arroyo Hondo (AH). An individual is defined as one tightly grouped cluster of rosettes (Figure 1A-C) or a lone rosette (Figure 1D). Samples were collected from three sites in California, USA: CNM, CA (7 individuals; June 2017); NB, CA (5; June 2017); and BC, CA (5; March 2008). Samples were also collected from two sites in Mexico: AH, MX (5; October 2008) and RS, MX (5; June 2008). At CNM and NB, samples were collected from sites predetermined by the U.S. National Park Service (NPS). Leaf tissue surfaces were cleaned with 70% ethanol, and tissue was excised with sterile scissors. Three replicates from each individual’s rosette (Figure 1H) were collected at CNM and NB, placed on ice, and stored at -80°C. Leaf tissues collected from BC, AH, and RS (Figure 1G) were stored in silica gel.

At CNM and NB, soil samples were collected with ethanol-treated aluminum coring devices. Cores were taken to a maximum depth of 30 cm, within 1 m of the center of Shaw’s Agave rosettes along the lines of the root system, and within 5 cm of peripheral roots. The rhizosphere was not sampled, to avoid disturbing the shallow root systems of these endangered plants. Three soil replicates per agave cluster were obtained within 2 m of each other and placed on ice before storage at -20°C.


Approximately 100 mg of spineless Shaw’s Agave leaf tissue was snap-frozen in liquid nitrogen and homogenized with a mortar and pestle. DNA was isolated using a DNeasy 96 Plant Kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions. The rbcL and matK barcoding regions were amplified with GoTaq Green in an Eppendorf Mastercycler. Primers for rbcL and matK were designed in Primer3Plus: rbcL-F 5’-CTGCGAATTCCCCCTGCTTA-3’, rbcL-R 5’-GATCGCGTCCCTCATTACGA-3’, matK-F 5’-CAAAAGAGGTTCGTTGGGCA-3’, and matK-R 5’-ATTGGCCCAGATCGGCTTAC-3’. PCR products were purified with a PCR clean-up kit (LAMDA Biotech, USA) and sequenced by Eton Bioscience Inc., San Diego.
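As a quick sanity check on the primer sequences listed above, the sketch below computes each primer's length, GC content, and a Wallace-rule melting temperature. The helper names are hypothetical. The Wallace rule (2 °C per A/T plus 4 °C per G/C) is only a rough screen. Primer3-based tools use a thermodynamic nearest-neighbor model, so their Tm values will differ.

```python
# Hypothetical helpers for screening the rbcL/matK primers quoted above.
# The Wallace-rule Tm is a rough estimate for short oligos, not Primer3's model.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

PRIMERS = {
    "rbcL-F": "CTGCGAATTCCCCCTGCTTA",
    "rbcL-R": "GATCGCGTCCCTCATTACGA",
    "matK-F": "CAAAAGAGGTTCGTTGGGCA",
    "matK-R": "ATTGGCCCAGATCGGCTTAC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace-rule Tm in °C: 2 per A/T plus 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}\t{len(seq)} nt\tGC={gc_fraction(seq):.0%}\tTm~{wallace_tm(seq)} °C")
```

All four primers are 20 nt long. Their GC content falls in the 50 to 55% range usually recommended for PCR primers.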

Phylogenetic tree construction

DNA sequences and chromatograms for rbcL and matK were inspected in FinchTV for quality. Multiple sequence alignments were performed in GUIDANCE2, and DNA segments with low confidence or ambiguous reads were manually removed. Phylogenetic trees were generated in Clustal Omega using the neighbor-joining method and edited in the Interactive Tree of Life (iTOL; Letunic & Bork, 2019).
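The neighbor-joining step can be illustrated with a minimal Python sketch. It assumes the trimmed alignment has already been reduced to a pairwise distance matrix. The sketch uses uncorrected p-distances (the proportion of differing sites, skipping gapped columns) and the standard Saitou and Nei Q-criterion. The function names are hypothetical. This is not Clustal Omega's implementation, which may apply a different distance correction.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in sites) / len(sites)


def neighbor_joining(labels, matrix):
    """Return an unrooted Newick string built by neighbor-joining.

    `matrix` is a symmetric distance matrix ordered like `labels`.
    Branch lengths are printed with the 'g' format (6 significant digits).
    """
    if len(labels) < 3:
        raise ValueError("neighbor-joining needs at least three taxa")
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair that minimizes Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"  # new internal node, stored as a Newick subtree
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes remain: connect them through a single central node.
    x, y, z = nodes
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    lz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"
```

Consider a standard five-taxon textbook matrix with rows a to e: [0,5,9,9,8], [5,0,10,10,9], [9,10,0,8,7], [9,10,8,0,3], [8,9,7,3,0]. For this matrix, `neighbor_joining(list("abcde"), M)` returns `(d:2,e:1,(c:4,(a:2,b:3):3):2);`. On real, non-additive data, neighbor-joining can produce slightly negative branch lengths. Labels containing Newick-reserved characters would need quoting.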

16S rRNA Microbial Sequencing and Analysis

Genomic DNA was extracted from soil samples using a Quick-DNA Fecal/Soil Microbe kit (Zymo Research, USA) according to the manufacturer’s instructions. Libraries were prepared with the NEXTflex™ 16S V4 Amplicon-Seq Kit 2.0 and quantified with an Agilent 2100 Bioanalyzer and an Agilent 2200 TapeStation. They were then pooled on a MiSeq flow cell at a target depth of 100,000 reads per sample for 36 samples (12 sites on Point Loma, three replicates each). The resulting 16S rRNA sequences (~7 million) were imported into Galaxy (Afgan et al., 2018) and analyzed with the mothur toolset (version 1.39.5; Kozich et al., 2013). Paired forward and reverse reads were aligned and grouped by sample site. Sequences were then aligned to the SILVA 132 bacterial reference (Quast et al., 2013; Yilmaz et al., 2014). Sequences aligning to positions 10,357 to 25,452 of the SILVA reference were retained for further analysis and denoised with the pre.cluster command. Chimeric sequences were identified with chimera.vsearch (default settings) and removed. The remaining sequences were taxonomically classified (classify.seqs) against the SILVA 132 taxonomy reference. Sequences were assigned to operational taxonomic units (OTUs) with the cluster.split command at a cutoff of 0.03 (97% identity). Alpha diversity was assessed with Shannon’s diversity index, Simpson’s diversity index, and observed species richness (OSR). Beta diversity was assessed with the Bray-Curtis dissimilarity index and visualized as an NMDS plot.
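The alpha and beta diversity measures named above can be sketched from per-sample OTU count vectors, for example the rows of a mothur shared file. This minimal sketch uses plug-in formulas: Shannon's H' with natural logarithms, Simpson's D = Σp², observed richness, and Bray-Curtis dissimilarity. The function names are hypothetical. mothur's own calculators may apply bias corrections or different conventions, so results need not match its output exactly. The NMDS ordination itself is not shown.

```python
import math

# Plug-in diversity estimators computed from OTU count vectors.
# These are textbook formulas, not mothur's calculators; mothur may apply
# bias corrections (e.g., for Simpson's index), so values can differ.


def observed_richness(counts):
    """Number of OTUs with at least one read (observed species richness)."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon's H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_d(counts):
    """Simpson's D = sum(p_i^2); 1 - D is the Gini-Simpson diversity."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return sum((c / total) ** 2 for c in counts)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("samples must list the same OTUs in the same order")
    denom = sum(a + b for a, b in zip(x, y))
    if denom == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / denom


def bray_curtis_matrix(samples):
    """Pairwise Bray-Curtis matrix for a {sample_name: counts} mapping."""
    names = list(samples)
    return [[bray_curtis(samples[a], samples[b]) for b in names] for a in names]
```

The Bray-Curtis matrix is the kind of input that an NMDS ordination embeds in two dimensions. Samples with similar OTU profiles then plot close together.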

Statistical Analysis

A McDonald-Kreitman test was performed on both the rbcL and matK sequences to assess divergence between the US populations (CNM, NB, BC) and the Mexican populations (RS, AH). Sequence alignments generated in Clustal Omega were analyzed per locus in SNAP v2.1.1. This analysis calculated dS (synonymous substitutions per synonymous site) and dN (nonsynonymous substitutions per nonsynonymous site) across all samples and between US and MX Shaw’s Agave samples. Variation in pH and moisture content among sites was analyzed by one-way ANOVA with post-hoc Bonferroni correction for multiple comparisons at p < 0.001. Welch’s t-tests, which assume unequal variances, were used to compare the richness (S) and evenness (E) of microbial activity between sites. AMOVA (analysis of molecular variance) and HOMOVA (homogeneity of molecular variance) were calculated between geographic clusters of Shaw’s Agave for both the barcoding and 16S rRNA data. Pearson’s linear correlation coefficients were calculated between all soil-related indices and measurements.
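Several of the tests above reduce to short formulas. The sketch below uses hypothetical function names and standard-library Python only. It shows a McDonald-Kreitman 2×2 comparison, which contrasts fixed differences between the US and MX groups (Dn, Ds) with polymorphisms within them (Pn, Ps). The comparison is scored with a two-sided Fisher exact test and the neutrality index NI = (Pn/Ps)/(Dn/Ds). The sketch also computes Welch's t statistic with Welch-Satterthwaite degrees of freedom, and Pearson's r.

The sketch makes the following assumptions:

* Substitution counts have already been tallied from the alignments.
* Fisher's exact test is used for the MK table. The section does not say which significance test the authors used; Fisher exact tests and G-tests are both common.
* Converting Welch's t to a p-value needs a t-distribution CDF, for example from SciPy. That step is omitted here.

```python
import math
import statistics


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = table_prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_prob(x)
        if p <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            p_value += p
    return min(1.0, p_value)


def mcdonald_kreitman(dn, ds, pn, ps):
    """MK test on fixed differences (dn, ds) vs polymorphisms (pn, ps).

    Returns (neutrality_index, p_value), with NI = (pn/ps) / (dn/ds);
    NI is NaN when a denominator count is zero.
    """
    ni = (pn / ps) / (dn / ds) if dn and ds and ps else float("nan")
    return ni, fisher_exact_two_sided(dn, ds, pn, ps)


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def pearson_r(x, y):
    """Pearson's linear correlation coefficient; square it to get r^2."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

An NI below 1 indicates an excess of nonsynonymous fixed differences relative to polymorphism. However, with only a handful of substitutions per barcoding locus, the Fisher test has little power.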


National Park Service