Data from: Genetic identification of lamprey genera and anadromous ecotypes in watersheds of the Northeastern Pacific Ocean
Data files
May 22, 2025 version files 652.03 KB
-
Dryad_Repository_EVA-2025-010-OA_R1Data2.xlsx
644.15 KB
-
README.md
7.88 KB
Abstract
Non-parasitic, non-migratory Western Brook Lamprey (WBL; Lampetra ayresii) and parasitic, anadromous Western River Lamprey (WRL; L. ayresii) are sympatric lampreys that likely represent different life history variations of a single species. Novel genetic tools are critical for differentiating WBL and WRL, whose larvae preclude morphological identification (ID), and will enable comprehensive assessment of imperiled native lampreys of the Northeastern Pacific (including WBL, WRL, and Pacific Lamprey, Entosphenus tridentatus). We developed 47 candidate single nucleotide polymorphism (SNP) markers using whole genome resequencing of WBL (N=24) and WRL (N=15) from Ksi Ts’oohl Ts’ap Creek (Nass River, British Columbia, Canada), which are likely ecotypes distinguished by few divergent SNPs across multiple chromosomes. We used five novel candidate SNPs to perform genetic ID of WBL and WRL ecotypes in collections of mixed native lampreys from lower Columbia River tributaries (N=1,474), Ksi Ts’oohl Ts’ap Creek (N=352), and ocean phase WRL from the Georgia Basin (Salish Sea, British Columbia, Canada; N=91). Two previously published SNPs were used to ID genera, Entosphenus versus Lampetra. Morphological ID utilized photographs collected from a subset of genotyped lampreys, and high concordance was demonstrated between ID methods for genera (99%) and Lampetra ecotypes (>98%). We characterized spatial and temporal composition of lamprey genera and ecotypes surveyed across NE Pacific tributaries under the expectation these compositions would be similar across nearby sites and across years at the same site. Proportions of lamprey genera were highly variable within regions and across years; however, Lampetra ecotypic proportions were spatially and temporally stable. WRL were rare in lower Columbia tributaries (~1% average rate among Lampetra) and common further north (>40% of Lampetra).
Genetic ID methods are powerful monitoring tools that create the novel ability to ascertain genera and ecotypes regardless of life stage, while increasing efficiency of surveys by eliminating time-intensive morphological data collection.
Dataset DOI: 10.5061/dryad.b2rbnzss7
Description of the data and file structure
This dataset accompanies the article of the same title, which aims to develop genetic ID applications for lampreys. Sample collections comprised the following overlapping subsets of samples (a – e), which were used in various combinations to address our objectives: a) morphologically confirmed ocean-phase Western River Lamprey (Lampetra ayresii; WRL) collected at sea by Fisheries and Oceans Canada between 2013 and 2019 from “site 1” in the Georgia Basin (Salish Sea, British Columbia, Canada; N = 91), which served as a positive control collection of the anadromous ecotype WRL for testing purposes (Table 1, Fig. 1); b) WBL and WRL voucher specimens (N = 39), a portion of the confidently morphologically identified voucher specimens from Ksi Ts’oohl Ts’ap Creek (Nass River, northern B.C., Canada), that could be used for candidate SNP discovery; c) broadly distributed specimens with both morphological photo documentation and genetic data for testing genus ID concordance (N = 1170); d) broadly distributed specimens with both morphological photo documentation and genetic data for testing Lampetra ecotype ID concordance (N = 514); and e) genotyped individuals lacking morphological ID (N = 747) that were combined with all other genotyped individuals for characterizing the spatial distributions of genera and ecotypes across the entire geographic area of our study (Table 1). The grand total of unique individuals (N = 1917, Table 1) was organized into 24 collections that represented 15 unique geographical sites and multiple collection years (Table 1).
Files and variables
File: Dryad_Repository_EVA-2025-010-OA_R1Data.xlsx: Sampling data tab
Variables
- Order- individual order #
- Individual name- unique database ID
- Collection Group- standard group naming convention (collection agency, collection method, collection subbasin, collection site and year, collection Rkm for stream sites or latitude interval)
- Location- site description and latitude/longitude coordinates in decimal degrees
- Collection Year- Date or year only
- Life Stage- (Adult, juvenile, or larva)
- Total length- body length in mm
- LampSD- the allele 1 and allele 2 for the SNPs for genus ID
- Genetic ID Genus- The determination of genus (Lampetra or Entosphenus) based on genotypes
- Region- name of the collection region
- Region#- collection regions numbered 1 - 7
- Site- sites numbered 1 - 15
- Collection#- collections (site + collection year) numbered 1 - 24
- Morph ID Genus- the morphological ID of lamprey genera (Lampetra or Entosphenus)
- Morph ID SPP- the morphological ID of lamprey species/ecotypes (WRL, WBL, Lampetra spp, Entosphenus tridentatus)
- a, b, c, d, e - the type of collection described in Table 1 based on the objective for its collection. The letters indicate which individuals were in each of the following overlapping subsamples (a – e) used to address multiple objectives: a) WRL positive control, b) voucher specimens from Ksi Ts’oohl Ts’ap Creek used for candidate SNP discovery, c) specimens with both morphological photo documentation and genetic data for testing genus ID concordance, d) specimens with both morphological photo documentation and genetic data for testing Lampetra ecotype ID concordance, and e) individuals with only genetic data.
- %_WRL - the percentage of WRL alleles in the 5-SNP candidate assay
- Genetic ID Ecotype - classification of ecotypes (WRL or WBL) based on the "%_WRL" value
- Missing - number of loci (total 5) missing from the 5-SNP candidate assay
- N_alleles - number of alleles successfully genotyped of the 5-SNP candidate assay (total 10)
- Missing Chr2 - the number of loci (total 2) missing from chromosome 2 loci in the 5-SNP candidate assay
Missing data are indicated as "NA"
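As an illustration of how the "Morph ID Genus" and "Genetic ID Genus" columns might be compared, the following minimal sketch computes a concordance rate while skipping "NA" entries. It assumes each spreadsheet row has already been loaded as a dictionary keyed by column header; the function name `concordance` and the example rows are hypothetical and are not taken from the dataset or the article.

```python
# Minimal sketch: concordance between morphological and genetic genus ID.
# ASSUMPTION: rows are dicts keyed by the column headers of the Sampling data tab.
# The example values below are invented for illustration only.
NA = "NA"  # missing-data code used in the spreadsheet


def concordance(rows, morph_key="Morph ID Genus", genetic_key="Genetic ID Genus"):
    """Return (n_compared, n_agree, fraction_agree) over rows with both IDs present."""
    compared = [
        r for r in rows
        if r.get(morph_key, NA) != NA and r.get(genetic_key, NA) != NA
    ]
    agree = sum(1 for r in compared if r[morph_key] == r[genetic_key])
    fraction = agree / len(compared) if compared else None
    return len(compared), agree, fraction


example_rows = [
    {"Individual name": "ind1", "Morph ID Genus": "Lampetra", "Genetic ID Genus": "Lampetra"},
    {"Individual name": "ind2", "Morph ID Genus": "Entosphenus", "Genetic ID Genus": "Entosphenus"},
    {"Individual name": "ind3", "Morph ID Genus": "Lampetra", "Genetic ID Genus": "Entosphenus"},
    {"Individual name": "ind4", "Morph ID Genus": "NA", "Genetic ID Genus": "Lampetra"},
]

n_compared, n_agree, fraction = concordance(example_rows)
print(n_compared, n_agree, fraction)
```

Rows lacking either ID are excluded from the denominator, so individuals with only genetic data (subset e) would not affect the rate.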
File: Dryad_Repository_EVA-2025-010-OA_R1Data.xlsx: SNP_genotypes tab
Variables
- Order- individual order # (same as sampling_data tab)
- Region#- Region numbers (same as sampling_data tab)
- Site- Site numbers (same as sampling_data tab)
- Collection#- Collection numbers (same as sampling_data tab)
- Genetic ID Genus - (data same as sampling_data tab)
- Morph ID Genus - (data same as sampling_data tab)
- Morph ID SPP - (data same as sampling_data tab)
- Individual Name - (Specimen IDs same as sampling_data tab)
- Lri2P16333005 - Genotype for this Chromosome 2 SNP locus, which is part of the 5-SNP candidate assay (blue genotypes are homozygous for WRL alleles; gray genotypes are heterozygous; red genotypes are homozygous for WBL alleles)
- Lri2P16367064 - Genotype for this Chromosome 2 SNP locus, which is part of the 5-SNP candidate assay (blue genotypes are homozygous for WRL alleles; gray genotypes are heterozygous; red genotypes are homozygous for WBL alleles)
- Lri5P4640927 - Genotype for this Chromosome 5 SNP locus, which is part of the 5-SNP candidate assay (blue genotypes are homozygous for WRL alleles; gray genotypes are heterozygous; red genotypes are homozygous for WBL alleles)
- Lri12P6060911 - Genotype for this Chromosome 12 SNP locus, which is part of the 5-SNP candidate assay (blue genotypes are homozygous for WRL alleles; gray genotypes are heterozygous; red genotypes are homozygous for WBL alleles)
- Lri14P4079387 - Genotype for this Chromosome 14 SNP locus, which is part of the 5-SNP candidate assay (blue genotypes are homozygous for WRL alleles; gray genotypes are heterozygous; red genotypes are homozygous for WBL alleles)
Missing data are indicated as "NA"
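The summary variables in the sampling data tab ("Missing", "N_alleles", "Missing Chr2", and "%_WRL") can in principle be recomputed from these five genotype columns. The sketch below shows one way this might be done, under several assumptions that should be checked against the spreadsheet and the article: genotypes are taken to be two-letter strings such as "AG", "%_WRL" is taken to be the share of WRL alleles among successfully genotyped alleles, and the `WRL_ALLELE` lookup is a placeholder, since the actual WRL allele at each locus is indicated only by cell color here.

```python
# Minimal sketch: recompute per-individual summaries for the 5-SNP candidate assay.
LOCI = ["Lri2P16333005", "Lri2P16367064", "Lri5P4640927", "Lri12P6060911", "Lri14P4079387"]
CHR2_LOCI = LOCI[:2]  # the two Chromosome 2 loci

# HYPOTHETICAL placeholder: the real WRL allele per locus must come from the data/article.
WRL_ALLELE = {locus: "A" for locus in LOCI}


def summarize(genotypes, wrl_allele=WRL_ALLELE):
    """genotypes maps locus name -> two-letter genotype (assumed format) or "NA"."""
    called = [(locus, genotypes.get(locus, "NA")) for locus in LOCI]
    called = [(locus, g) for locus, g in called if g != "NA"]
    missing = len(LOCI) - len(called)
    n_alleles = 2 * len(called)  # diploid: two alleles per genotyped locus
    missing_chr2 = sum(1 for locus in CHR2_LOCI if genotypes.get(locus, "NA") == "NA")
    wrl_count = sum(g.count(wrl_allele[locus]) for locus, g in called)
    # ASSUMPTION: denominator is the number of successfully genotyped alleles.
    pct_wrl = 100.0 * wrl_count / n_alleles if n_alleles else None
    return {
        "Missing": missing,
        "N_alleles": n_alleles,
        "Missing Chr2": missing_chr2,
        "%_WRL": pct_wrl,
    }


example = {
    "Lri2P16333005": "AA",
    "Lri2P16367064": "NA",
    "Lri5P4640927": "AG",
    "Lri12P6060911": "GG",
    "Lri14P4079387": "AA",
}
summary = summarize(example)
print(summary)
```

The threshold used to convert "%_WRL" into "Genetic ID Ecotype" is not given in this description, so the sketch deliberately stops short of classifying individuals.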
File: Dryad_Repository_EVA-2025-010-OA_R1Data.xlsx: SNPListlowcoverage_genotypes tab
Variables
- Order- locus order # (as ordered in SNPlowcoverage_genotypes tab)
- Locus- locus name
- Scaffold- scaffold number or chromosome number (the two terms are used synonymously)
- position- bp position on the reference genome
- GenomeOrder- Order of loci based on scaffold and position
- RetainFILTER- Y if retained, N if excluded from the 349 SNP filtered dataset (see methods in article)
- Distance-value- the distance in bp between a SNP and its neighboring SNP on the same scaffold. One SNP from each pair of SNPs within 1,000 bp of each other was excluded from the filtered dataset, which trimmed it to 349 total SNP loci.
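The distance-based filter described for "RetainFILTER" and "Distance-value" can be sketched as follows. This is only an approximation under stated assumptions: it uses a greedy rule that keeps the first SNP on each scaffold and drops any later SNP lying within 1,000 bp of the last retained one, whereas the article's methods define which member of each close pair was actually excluded. The function name `thin_snps` and the example loci are hypothetical.

```python
# Minimal sketch: order SNPs by scaffold and position, then thin by distance.
# ASSUMPTION: greedy rule (keep the first SNP, drop later SNPs within min_distance
# of the last retained SNP on the same scaffold). The article may have used a
# different rule for choosing which SNP of a close pair to exclude.
def thin_snps(snps, min_distance=1000):
    """snps: iterable of (locus, scaffold, position) tuples."""
    ordered = sorted(snps, key=lambda snp: (snp[1], snp[2]))
    previous_position = {}  # position of the preceding SNP on each scaffold
    last_retained = {}      # position of the last retained SNP on each scaffold
    table = []
    for genome_order, (locus, scaffold, position) in enumerate(ordered, start=1):
        prev = previous_position.get(scaffold)
        distance = "NA" if prev is None else position - prev
        kept = last_retained.get(scaffold)
        retain = kept is None or position - kept > min_distance
        if retain:
            last_retained[scaffold] = position
        previous_position[scaffold] = position
        table.append({
            "Locus": locus,
            "Scaffold": scaffold,
            "position": position,
            "GenomeOrder": genome_order,
            "RetainFILTER": "Y" if retain else "N",
            "Distance-value": distance,
        })
    return table


# Invented example loci, not taken from the dataset.
example_snps = [
    ("s3", 1, 1500), ("s1", 1, 100), ("s2", 1, 600),
    ("s4", 1, 3000), ("s6", 2, 400), ("s5", 2, 50),
]
thinned = thin_snps(example_snps)
retained = [row["Locus"] for row in thinned if row["RetainFILTER"] == "Y"]
print(retained)
```

In this toy input, s2 and s6 fall within 1,000 bp of a retained SNP on the same scaffold and are flagged "N".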
File: Dryad_Repository_EVA-2025-010-OA_R1Data.xlsx: SNPlowcoverage_genotypes tab
Variables
- MorphID- whether the specimen was morphologically classified as WRL or WBL ecotype
- Individual.Name- specimen database ID
- Order- same order as the specimens in Table 2 of the article
- LPT_scaf_8_3352124.A1 - allele 1 for this locus. All loci from the SNP list for this low-coverage whole genome resequencing dataset are listed in columns
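Because this tab stores one column per allele, it may help to parse the column headers back into locus, scaffold, and position and to pair the alleles into genotypes. The sketch below assumes that headers follow the pattern of the one example shown (`LPT_scaf_8_3352124.A1`) and that each locus has a companion `.A2` column; both points are assumptions to verify against the spreadsheet. The helper names are hypothetical.

```python
import re

# ASSUMPTION: headers look like "<prefix>_<scaffold>_<position>.A1" / ".A2",
# inferred from the single example "LPT_scaf_8_3352124.A1".
COLUMN_PATTERN = re.compile(
    r"^(?P<locus>.+_(?P<scaffold>\d+)_(?P<position>\d+))\.A(?P<allele>[12])$"
)


def parse_allele_column(name):
    """Split an allele column header into its parts, or return None if it does not match."""
    match = COLUMN_PATTERN.match(name)
    if match is None:
        return None
    return {
        "locus": match.group("locus"),
        "scaffold": int(match.group("scaffold")),
        "position": int(match.group("position")),
        "allele": int(match.group("allele")),
    }


def genotypes_from_row(row):
    """Pair allele columns of one specimen row into locus -> (allele 1, allele 2)."""
    alleles = {}
    for column, value in row.items():
        parsed = parse_allele_column(column)
        if parsed is None:
            continue  # metadata columns such as MorphID or Individual.Name
        alleles.setdefault(parsed["locus"], {})[parsed["allele"]] = value
    return {locus: (pair.get(1), pair.get(2)) for locus, pair in alleles.items()}


# Invented example row; allele values are illustrative only.
example_row = {
    "MorphID": "WBL",
    "Individual.Name": "ind1",
    "Order": 1,
    "LPT_scaf_8_3352124.A1": "A",
    "LPT_scaf_8_3352124.A2": "G",
}
parsed = parse_allele_column("LPT_scaf_8_3352124.A1")
genotypes = genotypes_from_row(example_row)
print(parsed, genotypes)
```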
File: Dryad_Repository_EVA-2025-010-OA_R1Data.xlsx: SNP_primers tab
Variables
- MarkerType- Species ID, Genus ID, or one of the 47 candidate Ecotype ID SNPs (Table S3)
- Assay- Name of the locus
- A1- letter code for allele 1
- A2- letter code for allele 2
- A1-Probe- the probe allele 1 sequence for genotyping-by-sequencing pipeline
- A2-Probe- the probe allele 2 sequence for genotyping-by-sequencing pipeline
- Forward primer sequence
- reverse primer sequence
Access information
Other publicly accessible locations of the data:
- Lampetra richardsoni (recently synonymized as Lampetra ayresii; this particular specimen is designated as Western Brook Lamprey ecotype) whole genome sequence: GenBank Accession Number: JARYGF000000000, JASBGX000000000
- WBL and WRL whole genome resequencing data: GenBank Accession Number PRJNA953978
