Data from: Hybrid enrichment of adaptive variation revealed by genotype-environment associations in montane sedges
Cite this dataset
Hodel, Richard; Massatti, Rob; Knowles, Lacey (2022). Data from: Hybrid enrichment of adaptive variation revealed by genotype-environment associations in montane sedges [Dataset]. Dryad. https://doi.org/10.5061/dryad.sbcc2fr8f
The role of hybridization in diversification is complex and may result in many possible outcomes. Not only can hybridization produce new lineages, but those lineages may contain unique combinations of adaptive genetic variation derived from parental taxa that allow hybrid-origin lineages to occupy unique environmental space relative to one (or both) parents. We document such a case of hybridization between two sedge species, Carex nova and Carex nelsonii (Cyperaceae), that occupy partially overlapping environmental space in the southern Rocky Mountains, USA. In the region hypothesized to be the origin of the hybrid lineage, one parental taxon (C. nelsonii) is at the edge of its environmental tolerance. Hybrid-origin individuals display mixed ancestry between the parental taxa – of nearly 7,000 unlinked loci sampled, almost 30% showed evidence of excess ancestry from one parental lineage – approximately half displayed a genomic background skewed towards one parent, and half skewed towards the other. To test whether excess ancestry loci may have conferred an adaptive advantage to the hybrid-origin lineage, we conducted genotype-environment association analyses on different combinations of loci – with and without excess ancestry – and with multiple contrasts between the hybrids and parental taxa. Loci with skewed ancestry showed significant environmental associations distinguishing the hybrid lineage from one parent (C. nelsonii), whereas loci with relatively equal representation of parental ancestries showed no such environmental associations. Moreover, the overwhelming majority of candidate adaptive loci with respect to environmental gradients also had excess ancestry from a parental lineage, implying these loci have facilitated the persistence of the hybrid lineage in an environment unsuitable to at least one parent.
Field sampling and genomic characterization
Samples were field-collected from eight C. nelsonii localities, 15 C. nova localities, and five putative hybrid sites (Fig. 1). Between two and ten individuals (mean = 8.5) separated by at least five meters were collected at each sampling locality (Supplemental Table S1, Supplemental Table S2). Putative hybrid-origin individuals collected and keyed out to C. nelsonii were only identified as hybrid-origin during the course of population genetic analyses. A total of 237 individuals – 54 C. nelsonii, 136 C. nova, and 47 putative hybrids – were used for genomic analyses (Fig. 1, Supplemental Table S1, Supplemental Table S2).
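The sample counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text. The dictionary names are illustrative, and it assumes the reported mean is total individuals divided by total localities.

```python
# Counts as stated in the sampling description above.
localities = {"C. nelsonii": 8, "C. nova": 15, "putative hybrid": 5}
individuals = {"C. nelsonii": 54, "C. nova": 136, "putative hybrid": 47}

total_localities = sum(localities.values())
total_individuals = sum(individuals.values())

# Assumption: the reported mean is individuals per sampling locality.
mean_per_locality = total_individuals / total_localities

print(total_localities, total_individuals, round(mean_per_locality, 1))
# prints: 28 237 8.5
```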
We performed double-digest RAD-Seq (Peterson et al. 2012) to obtain anonymous genomic loci. We used two restriction enzymes, EcoRI and MseI, to digest genomic DNA, and ligated Illumina adaptor sequences and unique 10 bp barcodes to restriction sites. The ligation products for each library were pooled and PCR-amplified for 12 cycles, and 400-500 bp fragments were size selected using a Pippin Prep (Sage Science, Beverly, MA). All libraries were sequenced on single-end Illumina HiSeq 2500 runs at The Centre for Applied Genomics (Hospital for Sick Children, Toronto, Ontario). Raw sequencing reads were processed using ipyrad v0.9.42 (Eaton 2014, Eaton & Overcast 2020) in a single assembly that combined both species and the putative hybrids (Supplemental Table S2). The combined assembly approach facilitated the tracking of allele frequencies across all individuals and populations in downstream analyses. The ipyrad assembly was filtered using vcftools (Danecek et al. 2011) to allow up to 40% missing data per site (--max-missing 0.6) and to require a minor allele count of two or greater (--mac 2). The resulting data matrix consisted of 6,946 unlinked SNPs (i.e., one randomly selected SNP per RAD-Seq locus). Because we were interested in the effects of putative adaptive loci associated with hybridization and not loci associated with selection within one of the parental species, we used PCAdapt (Luu et al. 2017) to remove FST outliers identified within either C. nelsonii or C. nova; in total, we removed 206 loci, which left 6,740 unlinked SNPs that were used in most subsequent analyses, except where noted. To verify that removing FST outliers did not bias our downstream results, we ran several analyses both with and without FST outliers, including analyses of excess ancestry, all genetic diversity and differentiation statistics, and redundancy analyses, as noted below.
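The two site filters and the one-SNP-per-locus thinning described above can be illustrated with a minimal sketch in plain Python. This is not the authors' pipeline, which used vcftools and ipyrad. The function names, the toy data, and the genotype encoding (alternate-allele counts 0, 1, 2 for diploid individuals, None for missing) are all assumptions made for illustration. The sketch also assumes that sites at exactly the 60% call-rate threshold are retained and that thinning is applied after filtering.

```python
import random

def passes_site_filters(genotypes, min_call_rate=0.6, min_mac=2):
    """Return True if a biallelic site passes both filters.

    genotypes: alt-allele counts per diploid individual (0, 1, 2),
    or None where the genotype is missing (illustrative encoding).
    """
    called = [g for g in genotypes if g is not None]
    # Missing-data filter: at least 60% of individuals must be genotyped.
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False
    # Minor allele count: the rarer allele must be seen at least min_mac times.
    alt = sum(called)
    ref = 2 * len(called) - alt
    return min(alt, ref) >= min_mac

def one_snp_per_locus(snps, rng):
    """Pick one SNP at random from each locus.

    snps: list of (locus_id, position, genotypes) tuples.
    """
    by_locus = {}
    for snp in snps:
        by_locus.setdefault(snp[0], []).append(snp)
    return [rng.choice(group) for group in by_locus.values()]

def filter_and_thin(snps, seed=0):
    """Apply the site filters, then keep one random SNP per locus."""
    rng = random.Random(seed)
    kept = [s for s in snps if passes_site_filters(s[2])]
    return one_snp_per_locus(kept, rng)

# Toy data: five SNPs on three hypothetical loci, five individuals each.
toy = [
    ("locus1", 10, [0, 1, 1, 2, None]),         # passes both filters
    ("locus1", 25, [0, 0, 0, 1, 0]),            # fails: minor allele count is 1
    ("locus2", 7, [None, None, None, 1, 1]),    # fails: only 40% genotyped
    ("locus3", 3, [0, 2, 0, 2, 0]),             # passes both filters
    ("locus3", 40, [1, 1, 0, 0, None]),         # passes both filters
]

result = filter_and_thin(toy)
print(sorted(locus for locus, _, _ in result))
```

In this toy example, locus2 drops out entirely because its only SNP fails the missing-data filter, and locus3 contributes a single SNP chosen at random from its two passing sites.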
A separate ipyrad assembly was performed to add an outgroup (C. chalciolepis) suitable for rooting the phylogeny of C. nelsonii + C. nova + hybrids and for use in downstream analyses. This assembly included the 237 individuals referenced above plus 152 individuals of C. chalciolepis and used the same filtering strategy as the first assembly. The resulting SNP data matrix consisted of 6,069 unlinked SNPs, which were used in the SVDQuartets and TreeMix analyses. The authors of HyDe recommend including invariant sites in the genetic dataset, so we used all 69,373 sites from the assembly with the outgroup in the HyDe analysis.
Usage notes are provided in the file "README.txt".
National Science Foundation, Award: 06608147
National Science Foundation, Award: 1655607