Of Mojave milkweed and mirrors: The population genomic structure of a species impacted by solar energy development
Data files
Oct 30, 2023 version files 66.04 MB
-
All_Sample_LatLong_Info.txt
-
f15.recode.vcf
-
MMW_Analysis.Rmd
-
README_file.txt
Abstract
A rapid renewable energy transition has facilitated the development of large, ground‐mounted solar energy facilities worldwide. Deserts, and other sensitive aridland ecosystems, are the second most common land‐cover type for solar energy development globally. Thus, it is necessary to understand existing diversity within environmentally sensitive desert plant populations to understand spatiotemporal effects of solar energy siting and design. Overall, few population genomic studies of desert plants exist, and much of their biology is unknown. To help fill this knowledge gap, we sampled Mojave milkweed (Asclepias nyctaginifolia) in and around the Ivanpah Solar Electric Generating Station (ISEGS) in the Mojave Desert of California to understand the species' population structure, standing genetic variation, and how that intersects with solar development. We performed Restriction‐site Associated Sequencing (RADseq) and discovered 9942 single nucleotide polymorphisms (SNPs). Using these data, we found clear population structure over small spatial scales, suggesting each site sampled comprised a genetically distinct population of Mojave milkweed. While mowing, in lieu of blading, the vegetation across the solar energy facility's footprint prevented the immediate loss of the ISEGS Mojave milkweed population, we show that the effects of land‐cover change, especially those impacting desert washes, may impact long‐term genetic diversity and persistence. Potential implications of this include a risk of overall loss of genetic diversity, or even hastened extirpation. These findings highlight the need to consider the genetic diversity of impacted species when predicting the impact and necessary conservation measures of large‐scale land‐cover changes on species with small population sizes.
Methods
Sample Collection
In 2015, we sampled leaf tissue of all vegetative Mojave milkweed plants at four locations: three sites undisturbed by facility construction and throughout ISEGS. The first of these sites (“Excelsior”) is approximately 21 kilometers west of ISEGS. The other two locations are approximately five kilometers north (“Umberci),” and 60 kilometers south (“Bobcat”) of the solar facility (Fig.1). We cut small sections of green leaf tissue from mature plants and stored them in individually labeled coin envelopes with desiccant packets to promote drying. When present, we collected seeds for subsequent growth in a greenhouse, where leaves were cut and similarly stored once the plants reached sufficient maturity. It is important to note that while we collected all the plants present at the time, there is the possibility that some plants died back prior to our ability to collect them or remained dormant as rhizomes that season.
We collected additional samples in the fall of 2018 from plants previously identified within the ISEGS halos (n= 8) as well as plants that emerged in halos after the 2015 sampling (n= 30). We also collected any previously unidentified plants that grew within the facility but outside of designated halo areas (n= 51). We acquired additional samples from the Umberci site of previously identified but unsampled plants (n= 32) and newly emerged plants (n= 16). We designated Individuals (genets) based on the distance from other plants and sampled multiple ramets per genet if possible. We collected leaves from juvenile or adult individuals. For both years, we recorded the location of all plants present, even if they were too small to sample, to establish a census size. See Table 1 for a summary of our sampling scheme and Figure 1 for a map of the sites.
Sequencing
For both the 2015 and 2018 samples, we disrupted the dried plant tissue with steel beads using a bead mill prior to extracting DNA. We performed DNA extractions using the DNeasy Plant Mini kit (QIAGEN Inc., Valencia, CA, USA) and quantified the resulting concentrations of DNA using a Qubit fluorometer (Invitrogen, Carlsbad, CA, USA). We diluted the purified DNA to a concentration of 10.0 ng/ul using low TE in preparation for Restriction site Associated DNA Sequencing (RADSeq) using the Best-RAD method (Ali et al. 2016). A modification to Ali et al. (2016) is that we used the restriction endonuclease pstI to digest the DNA due to the more favorable number of cut sites given the GC content and size of the Asclepias syriaca reference genome (Weitemier et al. 2019) (Genbank accession GFXT01000000). We sonicated samples to a fragment length of 200 base pairs for the 2015 samples and 300 base pairs for the 2018 samples using a Covaris m220 (Covaris, Woburn, MA, USA). Following library preparation with the NEBnext Ultra DNA kit for Illumina (New England Biolabs, Inc., Ipswich, MA), we performed library trace analysis using a Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA). We sequenced the 2015 samples on the Illumina HiSeq3000 platform at the University of California Davis DNA Technologies Core (PE-2x100bp). For the 2018 samples, we sequenced on the Illumina HiSeq X platform (PE-2x150bp) at the UC Davis Sequencing Center (Novogene Corporation Inc.). The longer read length for the 2018 samples was due to the technical specifications of the sequencing platform. For analyses combining datasets from both sampling years, we trimmed the 2018 samples to the same length as the 2015 samples (100bp) using trimmomatic v0.38 (Bolger et al. 2014).
SNP Discovery
Following sequencing, we demultiplexed data for all individuals (n=233) using -process_radtags in STACKS v2.4 (Catchen et al. 2011, 2013) and the following tags: --bestrad, -c, -r, -D. We aligned the files to the Asclepias syriaca reference genome (Weitemier et al. 2019) (Genbank accession GFXT01000000) using the --very-sensitive-local wrapper in Bowtie2 v2.3.4 (Langmead & Salzberg 2012). We used the -ref_map.pl pipeline in STACKS v2.4 to call random SNPs (--write_random_snp) inhe dataset. We retained loci that were present in at least 30% of individuals per population within a single population and proceeded with quality filtration on the resulting VCF file.
We quality filtered the resulting file using VCFtools v0.1.15 (Danecek et al. 2011). Initially, we identified and removed individuals that were not genotyped at greater than 95% of loci and genotypes with a minimum read depth of less than 5. We then filtered out all genotypes with a gene quality score of less than 20. We subsequently removed loci with a minor allele count (MAC) of less than three [see (O’Leary et al. 2018)], followed by filtering out SNPs with a call rate of less than 90%. The final filtration step again identified and removed individuals with less than 85% of loci genotyped. We performed the SNP discovery on all samples combined and separated out the individual sampling years during for downstream analysis. We removed any remaining monomorphic loci using informloci in the R (R Core Team, 2020) package poppr (Kamvar et al. 2014, 2015) prior to proceeding with further analyses.
Genetic Diversity
We analyzed the resulting sequence data to determine genetic diversity using allelic richness, effective population size (Ne), inbreeding coefficients (Fis), population differentiation, and observed/expected heterozygosity. We calculated heterozygosity, Fis, and allelic richness using the basic.stats and allelic.richness functions of hierfstat (Goudet 2005). Private alleles were determined using the private_alleles function in the R package poppr (Kamvar et al. 2014, 2015). In the private allele calculations, we used datasets from 2015 and a combined 2015 and 2018 dataset that excluded clones. We calculated the effective population size using NeEstimator v2.1 (Do et al. 2014) under the linkage disequilibrium model with an allele frequency of 0.05 using our clone-free dataset from 2015 (see below).
Population Structure
When determining population structure, we removed clonal ramets from the 2015 dataset according to their multilocus genotypes, using the mlg.filter function with a genetic distance threshold of 0.04 as calculated by the bitwise.dist function in poppr (Kamvar et al. 2014). We incorporated relatedness by calculating pairwise φ among all samples using the relatedness2 estimator in VCFtools (Manichaikul et al. 2010; Danecek et al. 2011). We removed individuals with pairwise φ values greater than 0.177, which corresponds with first-degree relatives such as full siblings and parent-offspring pairs, as clustering algorithms can be influenced by close relatives (Rodríguez‐Ramilo & Wang 2012; Rodríguez-Ramilo et al. 2014). For each dataset, we evaluated the population structure of the plants using discriminant analysis of principle components via the R package adegenet (Jombart 2008; Jombart et al. 2010). We coss-referenced these results using the program structure (Pritchard et al. 2000; Falush et al. 2003; FALUSH et al. 2007; HUBISZ et al. 2009) and Structure Harvester (Earl & vonHoldt 2011). We also assessed population differentiation (Fst) using the method outlined by Weir and Cockerham (Weir & Cockerham 1984). To further investigate the relationships between individuals, we calculated Minimum Spanning Networks (MSN) in poppr using the bitwise.dist and poppr.msn functions. Finally, we tested isolation by distance in the samples using the R package conStruct (Bradburd et al. 2018).