Iris lacustris, a northern Great Lakes endemic, is a rare species known from 165 occurrences across Lakes Michigan and Huron in the United States and Canada. The species is threatened by multiple factors, including habitat loss, lack of seed dispersal, patterns of reproduction, and forest succession. Early population genetic studies using isozymes and allozymes recovered little to no genetic variation within the species. To better explore genetic variation across the geographic range of I. lacustris and to identify units for conservation, we used tunable Genotyping-by-Sequencing (tGBS) with 171 individuals across 24 populations from Michigan and Wisconsin. Because the species is polyploid, we filtered the single nucleotide polymorphism (SNP) matrices using polyRAD to recognize diploid and tetraploid loci. Based on multiple population genetic approaches, we resolved three to four population clusters that are geographically structured across the two ranges of the species. The species migrated from west to east across its geographic range, and minimal genetic exchange has occurred among populations. Four units for conservation are recognized, but nine adaptive units were identified, providing evidence for local adaptation across the geographic range of the species. Population genetic analyses with all, diploid, and tetraploid loci recovered similar results, which suggests that the methods may be robust to variation in ploidy level.

Paired-end tGBS libraries were created using the restriction enzyme Bsp1286I and subsequently sequenced with an Illumina HiSeq X (Illumina Inc., San Diego, CA, USA). Based on all sequence data, consensus reference sequences were generated with CD-HIT-454 after sequencing depth was normalized to 50X, and sequencing errors were corrected using Fiona. Low-quality reads were discarded (PHRED quality <15 and error rates ≥3%) and trimmed, and GSNAP was employed to map reads to the reference sequences with the following parameters: ≤2 mismatches per 36 bp and fewer than five total per 75 bp for tails. SNPs were identified based on the following criteria: the two most common alleles supported by at least 30% of the aligned bases, at least five unique reads, the sum of the one or two most common alleles covering at least 80% of the aligned reads, and no polymorphisms in the first or last three base pairs of each read. From the SNPs, two datasets were created: MCR90, with up to 10% missing data, and MCR50, with up to 50% missing data.
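
The difference between the MCR90 and MCR50 datasets can be illustrated with a minimal Python sketch of a minimum-call-rate filter. This is not the pipeline used to produce the files. It assumes that "up to 10% missing data" is evaluated per SNP across samples, and the function names, the dictionary layout, and the use of None for a missing genotype are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional

# Assumption: one genotype call per sample, with None marking a missing call.
Genotype = Optional[str]


def call_rate(calls: List[Genotype]) -> float:
    """Return the fraction of samples with a non-missing call at one SNP."""
    if not calls:
        return 0.0
    return sum(call is not None for call in calls) / len(calls)


def filter_by_min_call_rate(
    snps: Dict[str, List[Genotype]], min_call_rate: float
) -> Dict[str, List[Genotype]]:
    """Keep only the SNPs whose call rate is at least min_call_rate."""
    return {
        snp_id: calls
        for snp_id, calls in snps.items()
        if call_rate(calls) >= min_call_rate
    }


def build_mcr_datasets(snps: Dict[str, List[Genotype]]) -> Dict[str, Dict[str, List[Genotype]]]:
    """Build hypothetical MCR90 (<=10% missing) and MCR50 (<=50% missing) subsets."""
    return {
        "MCR90": filter_by_min_call_rate(snps, 0.90),
        "MCR50": filter_by_min_call_rate(snps, 0.50),
    }
```

Under this reading, every SNP in MCR90 also appears in MCR50, so MCR50 trades more missing data for a larger number of loci.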

Because I. lacustris is a putative polyploid and many population genetic methods assume that species are (at most) diploid, polyRAD was used to identify and filter loci that are diploid and tetraploid. The MCR90 and MCR50 datasets were processed with the IteratePopStruct function to estimate genotypes, and the H_ind/H_E statistic was then used to recognize diploid loci (H_ind/H_E <0.5) and tetraploid loci (H_ind/H_E >0.75). Separate datasets were created for each set of loci. The number of SNPs in the diploid and tetraploid datasets does not sum to the number in the initial datasets because of filtering with polyRAD.
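
The thresholds above can be sketched as a simple classification step. The following Python example is an illustration only, not the polyRAD workflow itself: it assumes the per-locus H_ind/H_E ratios have already been computed upstream, and the function names and the "unassigned" label are hypothetical. It also shows one reason the diploid and tetraploid SNP counts need not add up to the initial total, since loci with ratios between 0.5 and 0.75 fall into neither set.

```python
from typing import Dict, List

# Thresholds taken from the text: diploid if ratio < 0.5, tetraploid if ratio > 0.75.
DIPLOID_MAX = 0.5
TETRAPLOID_MIN = 0.75


def classify_locus(ratio: float) -> str:
    """Assign a locus to a ploidy class from its H_ind/H_E ratio."""
    if ratio < DIPLOID_MAX:
        return "diploid"
    if ratio > TETRAPLOID_MIN:
        return "tetraploid"
    # Ratios from 0.5 to 0.75 inclusive match neither criterion.
    return "unassigned"


def split_loci(ratios: Dict[str, float]) -> Dict[str, List[str]]:
    """Group locus identifiers by ploidy class."""
    groups: Dict[str, List[str]] = {"diploid": [], "tetraploid": [], "unassigned": []}
    for locus, ratio in ratios.items():
        groups[classify_locus(ratio)].append(locus)
    return groups
```

For example, calling split_loci on a dictionary that maps locus names to ratios returns three lists, and only the "diploid" and "tetraploid" lists would correspond to the separate datasets described here.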

The files can be opened with any text editor.

The conservation genetics of Iris lacustris (Dwarf Lake Iris), a Great Lakes endemic

Data files

Abstract

Description of the data and file structure
