The conservation genetics of Iris lacustris (Dwarf Lake Iris), a Great Lakes endemic
Data files
Feb 25, 2024 version files, 1.18 GB
- Iris.MCR50.snps_2N.vcf (121.97 MB)
- Iris.MCR50.snps_4N.vcf (200.04 MB)
- Iris.MCR50.snps.vcf (830.44 MB)
- Iris.MCR90.snps_2N.vcf (5.44 MB)
- Iris.MCR90.snps_4N.vcf (3.57 MB)
- Iris.MCR90.snps.vcf (13.75 MB)
- README.md (1.01 KB)
Abstract
Iris lacustris, a northern Great Lakes endemic, is a rare species known from 165 occurrences across Lakes Michigan and Huron in the United States and Canada. The species is threatened by multiple factors, including habitat loss, lack of seed dispersal, patterns of reproduction, and forest succession. Early population genetic studies using isozymes and allozymes recovered little to no genetic variation within the species. To better explore genetic variation across the geographic range of I. lacustris and to identify units for conservation, we used tunable Genotyping-by-Sequencing (tGBS) with 171 individuals across 24 populations from Michigan and Wisconsin. Because the species is polyploid, we filtered the single nucleotide polymorphism (SNP) matrices using polyRAD to recognize diploid and tetraploid loci. Based on multiple population genetic approaches, we resolved three to four population clusters that are geographically structured across the two ranges of the species. The species migrated from west to east across its geographic range, and minimal genetic exchange has occurred among populations. Four units for conservation are recognized, but nine adaptive units were identified, providing evidence for local adaptation across the geographic range of the species. Population genetic analyses with all, diploid, and tetraploid loci recovered similar results, which suggests that methods may be robust to variation in ploidy level.
The datasets were used for population genomic analyses of the rare species, Iris lacustris. Each VCF file includes 169 or 171 individuals from 24 populations across Michigan and Wisconsin. The number of individuals per population ranges from one to twelve, depending on the suitability of the population for collection. Given that the species is rare, geographic coordinates are not included as part of the dataset.
Description of the data and file structure
Six datasets are included: two full SNP datasets (MCR90 and MCR50) and a diploid and a tetraploid subset of each. MCR90 has up to 10% missing data, and MCR50 has up to 50% missing data. From each of these two datasets, one subset was filtered to include only diploid loci and another to include only tetraploid loci (denoted by 2N and 4N, respectively). Locus ploidy was determined by the Hind/HE statistic of Clark et al. (2012, 2022), as implemented in polyRAD, with diploid loci having Hind/HE <0.5 and tetraploid loci having Hind/HE >0.75.
Paired-end tGBS libraries were created using the restriction enzyme Bsp1286I and subsequently sequenced with an Illumina HiSeq X (Illumina Inc., San Diego, CA, USA). Based on all sequence data, consensus reference sequences were generated with CD-HIT-454 after sequencing depth was normalized to 50X, and sequencing errors were corrected using Fiona. Low-quality reads were discarded (PHRED quality <15 and error rates ≥3%) and trimmed, and GSNAP was employed to map reads to the reference sequences based on the following parameters: ≤2 mismatches per 36 bp and fewer than five in total per 75 bp for tails. SNPs were identified based on the following criteria: the two most common alleles supported by at least 30% of the aligned bases, at least five unique reads, the sum of the one or two most common alleles covering at least 80% of the aligned reads, and no polymorphisms in the first or last three base pairs of each read. From the SNPs, two datasets were created: MCR90 with up to 10% missing data, and MCR50 with up to 50% missing data.
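The SNP-calling criteria above can be illustrated with a minimal Python sketch. It is not the pipeline that produced these files. The function name, its arguments, and the reading that each of the two most common alleles must reach the 30% threshold are assumptions made for illustration, and the rule excluding polymorphisms in the first or last three base pairs of a read is not modeled because it requires read-level positions.

```python
def passes_snp_criteria(base_counts, unique_reads,
                        min_allele_frac=0.30,
                        min_unique_reads=5,
                        min_top_frac=0.80):
    """Check one site against the stated allele-support thresholds.

    base_counts  -- dict mapping allele (e.g. "A") to the number of
                    aligned bases supporting it (hypothetical input)
    unique_reads -- number of unique reads covering the site
    """
    total = sum(base_counts.values())
    if total == 0 or unique_reads < min_unique_reads:
        return False

    # Counts of the two most common alleles, largest first.
    top_two = sorted(base_counts.values(), reverse=True)[:2]
    if len(top_two) < 2:
        return False  # only one allele observed, so not polymorphic

    # Assumed reading: each of the two alleles needs >= 30% support.
    each_supported = all(c / total >= min_allele_frac for c in top_two)
    # Together the top alleles must cover >= 80% of the aligned bases.
    top_cover = sum(top_two) / total >= min_top_frac
    return each_supported and top_cover
```

For example, a site with counts `{"A": 6, "G": 4}` and ten unique reads passes, whereas `{"A": 9, "G": 1}` fails because the minor allele has only 10% support.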
Because I. lacustris is a putative polyploid and many population genetic methods assume that species are (at most) diploid, polyRAD was used to identify and filter loci that are diploid and tetraploid. The MCR90 and MCR50 datasets were filtered using the IteratePopStruct command to identify genotypes, and then the Hind/HE statistic was employed to recognize diploid loci with Hind/HE <0.5 and tetraploid loci with Hind/HE >0.75. Datasets were created for each set of loci. The number of SNPs in the diploid and tetraploid datasets does not equal the value in the initial datasets because of filtering with polyRAD.
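The threshold step can be sketched in Python as follows. This only illustrates the cutoffs stated above: the per-locus Hind/HE values would come from polyRAD (an R package) and are not computed here. The function names, the input dictionary, and the "unassigned" label for loci that meet neither cutoff are hypothetical.

```python
def classify_locus(hind_he, diploid_max=0.5, tetraploid_min=0.75):
    """Assign a locus to a ploidy class from its Hind/HE value."""
    if hind_he < diploid_max:
        return "2N"
    if hind_he > tetraploid_min:
        return "4N"
    # Assumption: loci meeting neither cutoff go into neither subset.
    return "unassigned"


def split_loci(hind_he_by_locus):
    """Group locus names by ploidy class.

    hind_he_by_locus -- dict mapping locus name to its Hind/HE value
                        (hypothetical input exported from polyRAD)
    """
    groups = {"2N": [], "4N": [], "unassigned": []}
    for locus, value in hind_he_by_locus.items():
        groups[classify_locus(value)].append(locus)
    return groups
```

Under this reading, loci with values from 0.5 to 0.75 inclusive fall into neither subset, which would be one reason the 2N and 4N SNP counts do not sum to the count in the initial dataset.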
The files can be opened with any text editor.
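Because VCF is tab-delimited text, the files can also be read with a short script. The sketch below lists the sample names and computes, for each site, the proportion of individuals with a called genotype. It assumes the standard VCF layout (a `#CHROM` header line, sample columns starting at the tenth field, and GT as the first FORMAT entry) and treats any genotype containing `.` as missing. Whether the missing-data limits of MCR90 and MCR50 were applied per site in exactly this way is an assumption.

```python
def vcf_samples_and_call_rates(handle):
    """Return (sample_names, per_site_call_rates) from an open VCF.

    handle -- any iterable of text lines, e.g. an open file object
    """
    samples = []
    call_rates = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        # Genotype is assumed to be the first colon-separated subfield.
        genotypes = [f.split(":")[0] for f in fields[9:]]
        # A genotype such as "./." or "." counts as missing; polyploid
        # calls such as "0/0/1/1" count as called.
        called = sum(1 for g in genotypes if "." not in g)
        call_rates.append(called / len(genotypes) if genotypes else 0.0)
    return samples, call_rates
```

A call such as `with open("Iris.MCR90.snps.vcf") as fh: samples, rates = vcf_samples_and_call_rates(fh)` would then give the number of individuals as `len(samples)`. Under the per-site assumption, a call rate of 0.9 or higher corresponds to at most 10% missing data.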