Data from: Genetic and evolutionary divergence of harbor seals (Phoca vitulina) in Iliamna Lake, Alaska
Data files
Jul 27, 2024 version files (74.79 KB)
- Iliamna_Lake_seals_ms_-_Supplemetary_data__1.docx
- Iliamna_Lake_seals_ms_-_Supplemetary_data__2.docx
- README.md
Abstract
Freshwater populations of typically marine species present unique opportunities to investigate biodiversity, evolutionary divergence, and the adaptive potential and niche width of species. A few pinniped species have populations that reside solely in freshwater. The harbor seals inhabiting Iliamna Lake, Alaska, constitute one such population. Their remoteness, however, has long hindered scientific inquiry. We used DNA from seal scat and from tissue samples provided by Indigenous hunters to screen for mitochondrial DNA and microsatellite variation within Iliamna Lake and eight regions across the Pacific Ocean. The Iliamna seals (1) were substantially and significantly distinct from all other populations (Fst-mtDNA = 0.544, Φst-mtDNA = 0.541, Fst-microsatellites = 0.308), (2) formed a discrete genetic cluster separate from all marine populations (modal ∆K = 2, PC1 = 14.8%), and had (3) lower genetic diversity (Hd, π, Hexp) and (4) higher inbreeding (F) than marine populations. These findings are both striking and unexpected, revealing that Iliamna seals have likely been on a separate evolutionary trajectory for some time and may represent a unique evolutionary legacy for the species. Attention must now be given to the selective processes driving evolutionary divergence from harbor seals in marine habitats, and to ensuring the future of the Iliamna seal.
README: Genetic and evolutionary divergence of harbor seals (Phoca vitulina) in Iliamna Lake, Alaska
https://doi.org/10.5061/dryad.6hdr7sr8f
There are two files containing the raw genetic data, one for mitochondrial DNA and one for microsatellites, on individual harbor seals sampled at 9 locations or regions across the Pacific Ocean. The authors state that work is continuing on the samples and we ask that anyone interested in using the data for reports or publications contact the corresponding author who will provide updates on continuing work.
Description of the data and file structure
Each file contains the title of the associated manuscript, an author list, and a request to contact the authors before using the data in any future reports, publications, etc. The first data file is a Word file that provides the mtDNA haplotype for each individual seal from each of the 9 locations/regions. Each individual has a sample ID number, its respective haplotype, and an indication of whether associated microsatellite data exist. Those data are presented in the second Word file, which provides the genotypes in a stacked format, with missing data represented by "-9".
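As a minimal sketch of reading the second file, assume the genotype table has been copied out of Word into plain text with whitespace-separated columns: a header row with an ID column followed by locus names, then two consecutive rows per individual, one allele per row. This two-row reading of "stacked format" is an assumption, as are the function name `parse_stacked_genotypes` and the example sample IDs and allele sizes; if the real file carries extra columns (for example a location code), the column slicing would need adjusting.

```python
MISSING = -9  # missing-data code used in the genotype file


def parse_stacked_genotypes(lines):
    """Parse a stacked genotype table into {sample_id: {locus: (a1, a2) or None}}.

    Assumed layout (hypothetical): a header row "ID locus1 locus2 ...",
    then two consecutive rows per individual, one allele per row.
    """
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    loci = header[1:]
    if len(body) % 2:
        raise ValueError("stacked format expects two rows per individual")
    genotypes = {}
    for first, second in zip(body[0::2], body[1::2]):
        if first[0] != second[0]:
            raise ValueError(f"mismatched sample IDs: {first[0]} vs {second[0]}")
        calls = {}
        for locus, a, b in zip(loci, first[1:], second[1:]):
            a, b = int(a), int(b)
            # A locus is treated as missing if either allele carries the -9 code.
            calls[locus] = None if MISSING in (a, b) else (a, b)
        genotypes[first[0]] = calls
    return genotypes


# Hypothetical two-individual example using locus names from the panel.
example = """ID Hg3.6 Hg4.2
S001 154 -9
S001 158 -9
S002 150 210
S002 150 214
"""
print(parse_stacked_genotypes(example.splitlines()))
# {'S001': {'Hg3.6': (154, 158), 'Hg4.2': None}, 'S002': {'Hg3.6': (150, 150), 'Hg4.2': (210, 214)}}
```

Representing a missing locus as `None` rather than keeping `-9` keeps the missing-data code from being counted as an allele in later calculations.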
Sharing/Access information
Data were derived from the following source: primary data were generated de novo from tissue and scat samples from harbor seals. Contact the corresponding author regarding data sharing.
Methods
Samples were collected from six locations between 1996 and 2017 (Figure 1). DNA was extracted from 222 ILS scat samples and 13 tissue samples using the QIAamp® Fast DNA Stool Mini Kit and the DNeasy® Blood and Tissue purification kit (Qiagen), respectively. A 435-bp fragment of the mitochondrial control region and adjacent proline tRNA gene (mtDNA) was amplified, and both forward and reverse strands were sequenced on a Genetic Analyzer 3130 (Applied Biosystems) according to published methods [13]. Twelve independent microsatellite loci, originally developed for pinnipeds and previously optimized for P. vitulina [14,15], were multiplexed and screened for genotypic variation: Hg3.6, Hg4.2, Hg6.1, Hg6.3, Hg8.9, Hg8.10, BG, SGPv9, SGPv11, Pvc19, Pvc78, and Lc28 [14,16,17,18,19] (Table S1). This 12-locus panel was successfully used to identify duplicate samples. The quality and quantity of DNA recovered from scat were much lower than from tissue, often resulting in poorer allele amplification and lower-quality sequences. A rigorous QA/QC protocol was therefore developed to ensure that only high-quality genetic profiles were used in subsequent analyses. It comprised (1) high peak-to-noise requirements for allele and haplotype calls and (2) identical genotype scores across multiple replicate PCRs and screenings of the same individual.
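The replicate-agreement rule (2) and the duplicate-sample check can be sketched as below, using a {locus: (allele1, allele2) or None} representation per sample. This is an illustrative sketch, not the authors' pipeline: the function names and the `min_shared_loci` threshold are hypothetical, exact matching at every locus scored in both samples is assumed as the duplicate criterion, and the peak-to-noise filter (1) is assumed to happen upstream during allele calling, so it is not modelled here.

```python
from itertools import combinations


def _normalize(call):
    """Order the two alleles so (158, 154) and (154, 158) compare equal."""
    return None if call is None else tuple(sorted(call))


def consensus_genotype(replicates):
    """Combine replicate scorings of one sample into a consensus genotype.

    replicates: list of {locus: (allele1, allele2) or None}, one dict per
    replicate PCR/screening. A locus is kept only if every replicate scored
    it and all scores agree; otherwise it is set to None (unresolved).
    """
    loci = set().union(*replicates)
    consensus = {}
    for locus in sorted(loci):
        calls = [_normalize(rep.get(locus)) for rep in replicates]
        if None in calls or len(set(calls)) != 1:
            consensus[locus] = None
        else:
            consensus[locus] = calls[0]
    return consensus


def find_duplicates(profiles, min_shared_loci=8):
    """Return pairs of sample IDs whose genotypes match at every shared locus.

    profiles: {sample_id: {locus: (allele1, allele2) or None}}.
    min_shared_loci is a hypothetical threshold guarding against matches
    that rest on too few scored loci; it is not a value from the study.
    """
    duplicates = []
    for id1, id2 in combinations(sorted(profiles), 2):
        p1, p2 = profiles[id1], profiles[id2]
        shared = [locus for locus in p1
                  if p1[locus] is not None and p2.get(locus) is not None]
        if len(shared) >= min_shared_loci and all(
            _normalize(p1[locus]) == _normalize(p2[locus]) for locus in shared
        ):
            duplicates.append((id1, id2))
    return duplicates
```

Requiring exact agreement across replicates is deliberately conservative: a locus that shows allelic dropout in any one replicate is left unresolved rather than guessed, which matches the intent of keeping only high-quality profiles.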
Additionally, 205 samples chosen from eight regions across the Pacific (based on earlier studies [13,20,21]) were sequenced for mtDNA and genotyped at 10 of the 12 microsatellite loci. The eight locations span the ranges of the two currently recognized Pacific subspecies: (1) Hokkaido, Japan; (2) the Commander Islands, Russia; (3) the Pribilof Islands, Alaska; (4) Bristol Bay, Alaska; (5) Kodiak, Alaska; (6) Prince William Sound, Alaska; (7) Southeast Alaska; and (8) Monterey Bay, California (Figure 1A). The boundary between the western P. v. stejnegeri and the eastern P. v. richardii is presumed to lie somewhere along the Commander-Aleutian ridge (i.e., between locations 2 and 3, Figure 1A); however, no clear boundary has been established [20,21].
Minimum spanning networks of mtDNA haplotypes were generated using PopART [22] to reconstruct the phylogenetic relationships among haplotypes and their geographic distribution. Arlequin v. 3.5 [23] was used to estimate genetic diversity within geographic strata, including frequency-based indices (mtDNA haplotype diversity, Hd; nDNA expected heterozygosity, Hexp) and indices that incorporate the distance (i.e., number of mutations) among variants (mtDNA nucleotide diversity, π; nDNA mean square distance, msd). Genetic differentiation among strata was estimated using frequency-based mtDNA and nDNA heterogeneity (Fst) and their distance-based equivalents (Φst and Rst, respectively). Homogeneity tests (50,000 iterations) were conducted to assess the statistical significance of the observed heterogeneity. Mega v. 6 [24] was used to determine pairwise differences among mtDNA haplotypes, while Coancestry v. 1.01 [25] was used to calculate the triadic likelihood estimate (TrioML) of individual inbreeding coefficients (F) from the nDNA data.
The Bayesian model-based clustering algorithm in Structure v. 2.3.4 [26] was used to infer the likely number of populations (K). Various parameter settings were evaluated, including whether sampling location was used as a prior and whether admixture among population clusters was allowed, and each setting was run five times for each value of K to ensure convergence. For each parameter set, a burn-in period of 100,000 iterations was followed by 2 × 10⁶ iterations for data collection, and the modal value of the ad hoc statistic ∆K [27] determined the likely number of populations, K. Principal Component Analysis (PCA) of individual genotypes was performed using the R package adegenet. Statistical outliers within the data were assessed relative to the upper and lower bound values.
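For readers who want to check the indices by hand, the sketch below computes Nei's unbiased haplotype diversity (Hd), nucleotide diversity (π) as the mean uncorrected p-distance among aligned sequences, unbiased expected heterozygosity (Hexp) averaged across loci, and the Evanno et al. ∆K from replicate Structure log-likelihoods. These are textbook forms, not Arlequin's or Structure's own code: Arlequin can apply other distance models and treats gaps and missing data in its own way, and the ∆K function follows the common convention of using the mean ln P(D) at each K. Function names and input layouts are assumptions.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, stdev


def haplotype_diversity(haplotypes):
    """Nei's unbiased Hd = n/(n-1) * (1 - sum(p_i^2)); needs n >= 2."""
    n = len(haplotypes)
    freqs = [c / n for c in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(sequences):
    """Mean proportion of differing sites over all pairs of aligned sequences.

    Uncorrected p-distance: gaps and ambiguous bases are compared like
    any other character, which a real analysis may handle differently.
    """
    length = len(sequences[0])
    diffs = [sum(a != b for a, b in zip(s1, s2))
             for s1, s2 in combinations(sequences, 2)]
    return mean(diffs) / length


def expected_heterozygosity(genotypes):
    """Unbiased gene diversity per locus, averaged over loci.

    genotypes: {sample_id: {locus: (a1, a2) or None}}; None calls are
    skipped, and loci with fewer than two scored gene copies are ignored.
    """
    loci = {locus for calls in genotypes.values() for locus in calls}
    per_locus = []
    for locus in sorted(loci):
        alleles = [a for calls in genotypes.values()
                   if calls.get(locus) for a in calls[locus]]
        m = len(alleles)  # number of gene copies scored (2N)
        if m < 2:
            continue
        freqs = [c / m for c in Counter(alleles).values()]
        per_locus.append(m / (m - 1) * (1 - sum(p * p for p in freqs)))
    return mean(per_locus)


def evanno_delta_k(log_likelihoods):
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)).

    log_likelihoods: {K: [ln P(D) from each replicate run]}; at least two
    runs per K are needed for the standard deviation. Only values of K
    with both neighbours K-1 and K+1 present receive a Delta K.
    """
    means = {k: mean(v) for k, v in log_likelihoods.items()}
    delta = {}
    for k in sorted(means):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(log_likelihoods[k])
            # Identical runs give sd = 0; report infinity rather than divide by zero.
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta
```

With five replicate runs per K, as described above, `evanno_delta_k` would receive five ln P(D) values per K, and the K with the largest ∆K is the modal choice.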
Swim distances among locations were measured in Google Maps as the minimal distances across continental shelf waters and along the main course of the Kvichak River.