
Data from: Diversity among rare and common congeneric plant species from the Garry oak and Okanagan shrub-steppe ecosystems in British Columbia: implications for conservation

Cite this dataset

Hersh, Evan et al. (2022). Data from: Diversity among rare and common congeneric plant species from the Garry oak and Okanagan shrub-steppe ecosystems in British Columbia: implications for conservation [Dataset]. Dryad.


Using universal non-coding chloroplast DNA (cpDNA) markers, we investigated genetic diversity and genetic structure in four pairs of rare and common congeneric plant species inhabiting threatened ecosystems (Garry oak and Okanagan shrub-steppe) in British Columbia. The species found in the Garry oak ecosystem are: Sanicula bipinnatifida (purple sanicle; Apiaceae; rare), Sanicula crassicaulis (Pacific sanicle; Apiaceae; common), and Balsamorhiza deltoidea (deltoid balsamroot; Asteraceae; rare). The species found in the Okanagan shrub-steppe ecosystem are: Balsamorhiza sagittata (arrowleaf balsamroot; Asteraceae; common), Orthocarpus barbatus (Grand Coulee owl-clover; Orobanchaceae; rare), Orthocarpus luteus (yellow owl-clover; Orobanchaceae; common), Phacelia ramosissima (branching phacelia; Hydrophyllaceae; rare), and Phacelia linearis (thread-leaved phacelia; Hydrophyllaceae; common). Eight cpDNA regions were sequenced for each study species. Sequences were aligned and concatenated within each species, and single nucleotide polymorphisms (SNPs) were used to analyze patterns of regional genetic diversity and phylogeographic structure within genera and species. Results include: total gene diversity (Ht), nucleotide diversity (π), number of private alleles, haplotype networks, isolation by distance, and analysis of molecular variance.



Sample Collection

We sampled single individuals from a total of 95 populations. Seventy-three samples were leaf excisions, taken with permission from herbarium specimens representing populations in western North America including the provinces/states British Columbia, California, Idaho, Oregon, Nevada, and Washington. In addition, we collected 22 fresh leaf samples from well-documented and previously vouchered populations of rare species in regional, provincial, and national parks in British Columbia. We placed fresh leaf tissue in paper envelopes inside a sealed plastic bag containing silica gel desiccant and stored them at room temperature. We assigned each of the 95 samples to northern or southern regional categories based on the location of the population sampled in relation to the last glacial maximum (LGM) of the Cordilleran ice sheet.

DNA extraction and sequencing

We ground 10 mg of dried leaf tissue (herbarium) or lyophilized fresh tissue of each sample using a Qiagen TissueLyser II (Qiagen, Valencia, CA), and extracted DNA with a NucleoSpin® Plant II kit (Macherey-Nagel, Bethlehem, PA, USA) with an extended incubation time of 1 hour at 65°C, as recommended for dry herbarium samples. We measured DNA concentration and quality with a NanoDrop 2000c (Fisher Scientific, Toronto, ON, Canada) and visualized the DNA using agarose gel electrophoresis. We amplified eight non-coding cpDNA regions for all samples using primer sequences published by Shaw et al. (2014; see Table 1 below). We conducted polymerase chain reactions (PCR) in 50 µl reaction volumes consisting of 0.2 mM dNTP (New England Biolabs, Ipswich, MA, USA), 1X PCR buffer (Stratagene, La Jolla, CA, USA), 2.5 units of Paq5000 (Stratagene), 50 pmol each of forward and reverse primers (Integrated DNA Technologies, Skokie, IL, USA), 0.5 µg/µl BSA, and 1 mM MgCl2, with 10 ng DNA template. We used the following PCR protocol for amplification: 80°C for 5 min, 30 cycles of 95°C for 1 min, 50°C for 1 min (ramp + 0.3°C/sec), 65°C for 5 min, followed by a final extension of 65°C for 5 min. We purified PCR products using the Wizard® PCR Preps DNA Purification System (Promega) and sequenced them with the Thermo Sequenase cycle sequencing kit (Thermo Fisher Scientific, Waltham, MA, USA) on a T100 Thermal Cycler (Bio-Rad, Hercules, CA, USA) with the following conditions: 95°C for 3 min, then 24 cycles of 95°C for 30 sec, 58°C for 30 sec, and 70°C for 30 sec. We loaded sequencing products on a 5% polyacrylamide gel (UBC Genetic Data Centre protocol) and ran electrophoresis for 10 h on a LI-COR 4200 automated sequencer (LI-COR Inc., Lincoln, NE, USA). We visualized sequences using eSeq DNA Sequencing and Analysis software version 2.0 (LI-COR Inc., Nebraska, USA).
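As a quick check of the reaction arithmetic, the short Python sketch below converts the stated primer amount into a final concentration and sums the nominal hold times of the amplification protocol. All numbers are taken from the paragraph above; the function and variable names are hypothetical, and the time estimate ignores ramping between temperatures, so the real run would take longer.

```python
def final_concentration_uM(amount_pmol: float, volume_ul: float) -> float:
    """pmol per µl is numerically equal to µmol per litre (µM)."""
    return amount_pmol / volume_ul

# 50 pmol of each primer in a 50 µl reaction.
primer_uM = final_concentration_uM(50, 50)

# Nominal hold times in minutes (ramp times are ignored).
initial_hold_min = 5                 # 80 °C
cycle_steps_min = [1, 1, 5]          # 95 °C, 50 °C, 65 °C
n_cycles = 30
final_extension_min = 5              # 65 °C

total_hold_min = (
    initial_hold_min + n_cycles * sum(cycle_steps_min) + final_extension_min
)

print(f"primer concentration: {primer_uM} µM each")
print(f"nominal hold time: {total_hold_min} min")
```

Under these assumptions each primer is present at 1 µM, and the programmed holds add up to 220 minutes.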

We inspected chromatograms for quality and trimmed them using Geneious v. 10.2.6 (Biomatters, Inc., San Diego, CA, USA). Herbarium samples generally produced shorter reads and lower sequence quality than fresh samples and required extensive trimming. As a result, it was often not possible to assemble our forward and reverse sequences into a single consensus sequence. Our initial analysis of sequences from herbarium samples showed that unidirectional sequences generated with forward primers were polymorphic. To reduce sequencing costs under a limited budget, we therefore used only forward primers to generate sequence data for fresh tissue specimens and for subsequent analysis of all samples (the exception is rpl32-trnL, for which we used reverse primers). We aligned sequences within each species and cpDNA region using the MUSCLE alignment algorithm (Edgar 2004) in Geneious with default parameters. We inspected alignments for ambiguous base calls, with special attention to polymorphic sites, and trimmed them to the length of the shortest sequence for each region within each species to minimize missing data. We concatenated sequence data for the eight primer regions for each individual prior to analysis.
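The trimming and concatenation steps can be illustrated with a minimal Python sketch. It is not the Geneious workflow used for this dataset: the function names and toy sequences are hypothetical, the input is assumed to be already aligned, every individual is assumed to be present in every region, and trimming is simplified to cutting the right-hand end only.

```python
def trim_to_shortest(region):
    """Trim the aligned sequences of one region to the shortest length."""
    shortest = min(len(seq) for seq in region.values())
    return {name: seq[:shortest] for name, seq in region.items()}

def concatenate(regions):
    """Join the trimmed regions, in order, for each individual."""
    trimmed = [trim_to_shortest(region) for region in regions]
    names = list(trimmed[0])
    return {name: "".join(region[name] for region in trimmed) for name in names}

# Two toy regions for two individuals (made-up sequences).
regions = [
    {"ind1": "ACGTAC", "ind2": "ACGT"},
    {"ind1": "GGA", "ind2": "GTATT"},
]
concatenated = concatenate(regions)
print(concatenated)
```

Trimming every region to its shortest read trades alignment length for completeness, which matches the stated goal of minimizing missing data.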

Sequence analysis

We exported concatenated sequence alignments for each species in FASTA format and imported them as ‘DNAbin’ objects into R v. 4.0.4 (R Core Team 2019) using the read.dna() function from the ape package (Paradis and Schliep 2019). We excluded insertions and deletions (indels) from analysis, as the majority were present in only a single individual within each species and may have been the result of inaccurate base calls due to the low sequence quality of herbarium samples. Unless otherwise specified, missing data were handled using the default options in the R functions described below; in most cases, R functions use pairwise deletion of sites with missing values. We extracted single nucleotide polymorphisms (SNPs) from DNAbin objects and stored them as ‘genind’ objects for use with the analysis packages adegenet (Jombart 2008), poppr (Kamvar et al. 2014), and hierfstat (Goudet 2005).

To describe patterns of genetic diversity in our dataset, we calculated nucleotide diversity (π) for each species using the pegas package (Paradis 2010) and total gene diversity (Ht) using the basic.stats() function (with the “diploid=FALSE” setting) from the hierfstat package. Both metrics account for the number of samples sequenced to produce unbiased estimates of diversity. To assess whether our results were biased by unequal sampling, we used linear models to test for a relationship between diversity measures and sample size within species using the lm() function in R. To estimate the uniqueness of each species and differences between northern and southern populations, we calculated the number of private alleles using the private_alleles() function in poppr, and calculated the proportion of private alleles per individual in each species and each geographic region.
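To make the two main descriptive measures concrete, the following Python sketch computes nucleotide diversity as the mean proportion of differing sites over all pairs of sequences (with pairwise deletion of missing sites) and counts private alleles per regional group. It is an illustration of the general definitions only, not the pegas or poppr code used for this dataset; the function names, the set of missing-data symbols, and the toy sequences are assumptions.

```python
from itertools import combinations

MISSING = set("N?-")  # symbols treated as missing data (an assumption)

def pairwise_diff(a, b):
    """Proportion of differing sites, skipping sites missing in either sequence."""
    compared = [(x, y) for x, y in zip(a, b)
                if x not in MISSING and y not in MISSING]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)

def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)

def private_alleles(groups):
    """Count alleles (site, base) that occur in exactly one group."""
    counts = {group: 0 for group in groups}
    n_sites = len(next(iter(groups.values()))[0])
    for site in range(n_sites):
        seen = {g: {s[site] for s in seqs} - MISSING
                for g, seqs in groups.items()}
        for g, alleles in seen.items():
            others = set().union(*(a for h, a in seen.items() if h != g))
            counts[g] += len(alleles - others)
    return counts

# Toy aligned sequences (made up) for two regional categories.
groups = {
    "north": ["ACGT", "ACGT", "ACGA"],
    "south": ["ACGT", "TCGT", "ACNT"],
}
pi_north = nucleotide_diversity(groups["north"])
pi_south = nucleotide_diversity(groups["south"])
private = private_alleles(groups)
print(round(pi_north, 4), round(pi_south, 4), private)
```

Averaging over pairs rather than over individuals is what makes the estimate independent of sample size in expectation. Dividing each private allele count by the number of individuals in the group gives a per-individual proportion of the kind described above.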

We used paired t-tests to determine if rare species had lower diversity (proportion of polymorphism, proportion of private alleles per individual, π, and Ht) than their common congeners. We also tested if populations in the north had lower diversity (proportion of private alleles per individual, π, and Ht) than populations in the south within Garry oak and Okanagan shrub-steppe ecosystems separately. We relied on the probability from a one-tailed distribution as we hypothesized that diversity might only be lower (not higher) in the northern populations or among rare species. Means are presented with standard errors. We further assessed the degree of genetic differentiation between northern (recently glaciated) and southern regional categories within species, using analysis of molecular variance (AMOVA). We performed AMOVA using the poppr.amova() function in poppr, and tested for significant differentiation using the randtest() function (with 1000 randomizations). We applied the additional options “within=FALSE” (disable calculation of within-individual variation, as cpDNA data are haploid) and “cutoff=0.1” (allow loci with 10% missing data) to poppr.amova().

To visualize relatedness among samples and genetic structure within species, we constructed haplotype networks from DNAbin objects using the haplotype() and haploNet() functions in pegas. To help us better understand the effects of geographically limited dispersal on the degree of genetic differentiation, we assessed isolation by distance (IBD) within species. We calculated pairwise genetic and geographic distances between all individuals within species using the base R dist() function, and tested for significance of IBD using Mantel tests with the mantel.randtest() function (1000 replicates) in ade4.
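The logic of a Mantel permutation test for isolation by distance can be sketched in plain Python as follows. This is a generic illustration, not the ade4 implementation used for this dataset: the function names, the one-sided p-value convention, the random seed, and the toy distance matrices are all assumptions.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lower_triangle(matrix, order):
    """Below-diagonal entries after reordering rows and columns by `order`."""
    return [matrix[order[i]][order[j]]
            for i in range(len(order)) for j in range(i)]

def mantel(gen, geo, n_perm=1000, seed=1):
    """Return the observed correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    idx = list(range(len(gen)))
    geo_vec = lower_triangle(geo, idx)
    observed = pearson(lower_triangle(gen, idx), geo_vec)
    hits = 0
    for _ in range(n_perm):
        perm = idx[:]
        rng.shuffle(perm)  # relabel individuals in the genetic matrix
        if pearson(lower_triangle(gen, perm), geo_vec) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy example: four individuals along a line, with genetic distance
# exactly proportional to geographic distance.
positions = [0, 1, 2, 4]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.01 * d for d in row] for row in geo]
r_obs, p_value = mantel(gen, geo)
print(r_obs, p_value)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling individual labels preserves that dependence under the null hypothesis of no association.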





Usage notes

Please refer to the README.txt file.


British Columbia Ministry of Environment

Environment and Climate Change Canada

Priority Places Funding