Population genetics and origins of rainbow smelt (Osmerus mordax) in the Laurentian Great Lakes

Hazra, Kiran Shamir¹; Therrien, Christian²; Neff, Bryan¹

Published Jul 22, 2025 on Dryad. https://doi.org/10.5061/dryad.gqnk98sz7

Data files

Jul 22, 2025 version files (8.62 KB total)

- clades_for_lengths.csv (1.09 KB)
- hijop.csv (4.33 KB)
- props.csv (301 B)
- README.md (2.90 KB)

Abstract

Rainbow smelt (Osmerus mordax) are a small predatory fish first recorded in the Great Lakes in 1906. Despite the major ecological and economic impacts of smelt in the Great Lakes, most information on the origins of these populations comes from second-hand accounts written decades after the first smelt were recorded. These accounts are based on circumstantial evidence and include speculation about natural migration and reproduction of smelt in Lake Ontario, as well as secondary anthropogenic introductions to the Great Lakes. Here, we use mtDNA sequencing and RFLP to demonstrate that the single recorded government introduction of smelt to the Great Lakes, in Michigan around 1912, accounts for only about half of the ancestry of Great Lakes smelt. The remaining ancestry appears to be from an anadromous source population. Furthermore, the absence of a longitudinal cline in haplotype frequency indicates that gene flow and dispersal are high among smelt in the Great Lakes. Our results suggest a multiple invasion pathway within the Great Lakes and provide insights into the genetic diversity of the extant Great Lakes populations.


Description of the data and file structure

hijop.csv contains all the haplotypes found, props.csv lists the proportions of each clade in each sampling location, and clades_for_lengths.csv lists the lengths of the fish from each clade used for the size comparison analysis.

Files and variables

File: hijop.csv

Description: Haplotypes from our study

Variables

Haplotype Name: Arbitrary designation meant to denote each haplotype with a single letter (e.g. Haplotype O was found in all sites)
Haplotype: DNA sequence of haplotype
Frequency: Count of the haplotype across all sampling locales, including the reference populations in Quebec and Massachusetts.
Specimens: List of specimens with each haplotype. Codes consist of sampling site, and then a numerical identifier (e.g. 1, 01, x01, 01-1).
- OSW = Oswego
- EB = East Basin Lake Ontario
- E40/E16 = East Basin Lake Erie
- MTN = Mtn Bay, Lake Superior
- WAWA = Wawa Lake
- WAN = Wanapitei
- NIPIGON = Nipigon
- RichelieuRiver = Richelieu River
- JDA/JDB = St. Lawrence Estuary
- BostonFore = Fore River
SNP: Allele at the diagnostic SNP TGA(A/G)GCC, used to determine clade and to verify the RFLP results. A = Atlantic, G = Acadian.
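
As an illustration of how the SNP column relates to the Haplotype column, the sketch below looks for the TGA(A/G)GCC motif in a sequence string and reports the matching clade. The function name `clade_from_sequence` is hypothetical, and the sketch assumes the sequence is stored in the same strand orientation as the motif. It returns no call when the motif is absent or when it appears with both alleles, since a short motif can occur elsewhere in a sequence by chance.

```python
import re

# Motif from the variable description: TGA(A/G)GCC, where A = Atlantic and G = Acadian.
SNP_MOTIF = re.compile(r"TGA([AG])GCC")
ALLELE_TO_CLADE = {"A": "Atlantic", "G": "Acadian"}


def clade_from_sequence(sequence):
    """Return 'Atlantic', 'Acadian', or None when no unambiguous call is possible.

    Hypothetical helper: assumes the sequence uses the same strand as the motif.
    """
    # Normalize case and drop alignment gaps or stray spaces before searching.
    cleaned = sequence.upper().replace("-", "").replace(" ", "")
    alleles = set(SNP_MOTIF.findall(cleaned))
    if len(alleles) != 1:
        # Motif missing, or found with both alleles: refuse to guess.
        return None
    return ALLELE_TO_CLADE[alleles.pop()]
```

Applied to each row of hijop.csv, the result could be compared against the SNP column as a consistency check.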

File: props.csv

Description: Geographical distribution of clades

Variables

Region: Sampling site. East Basin = East Basin of Lake Ontario
Acadian: Count of Acadian clade smelt
Atlantic: Count of Atlantic clade smelt
Latitude: Latitude of sampling site
Longitude: Longitude of sampling site
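
Because props.csv stores counts rather than proportions, a small conversion step is needed before sites can be compared. The sketch below assumes the column headers match the variable names listed above (with any stray whitespace stripped). The helper names `clade_proportions` and `longitude_trend` are hypothetical. The Spearman correlation is only one possible way to look for a longitudinal trend and is not necessarily the test used in the study.

```python
import pandas as pd
from scipy.stats import spearmanr


def clade_proportions(props):
    """Add per-site sample size and clade proportions to a props.csv-style table."""
    out = props.copy()
    # Assumption: headers may carry stray spaces, so normalize them first.
    out.columns = out.columns.str.strip()
    total = out["Acadian"] + out["Atlantic"]
    out["N"] = total
    out["Acadian proportion"] = out["Acadian"] / total
    out["Atlantic proportion"] = out["Atlantic"] / total
    return out


def longitude_trend(table):
    """Spearman rank correlation between longitude and Acadian proportion."""
    rho, p_value = spearmanr(table["Longitude"], table["Acadian proportion"])
    return rho, p_value
```

With the file in the working directory, `clade_proportions(pd.read_csv("props.csv"))` would return the original columns plus `N` and the two proportion columns.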

File: clades_for_lengths.csv

Description: Sizes for trawl caught fish

Variables

Code: Individual identifier of smelt
Region: Sampling site: East Basin [of Lake Ontario]; Lake Erie; Oswego (Lake Ontario)
Total length (mm): Length from nose to tip of tail. "na" indicates samples for which no measurements could be taken.
Haplogroup: Atlantic or Acadian clade
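
Since missing lengths are coded as "na", the length column has to be converted to numbers before any summary is computed. The following is a minimal sketch of that cleaning step, assuming the column headers match the variable names above. The function name `summarize_lengths` is hypothetical.

```python
import pandas as pd


def summarize_lengths(lengths):
    """Per-clade count, mean, and standard deviation of total length (mm)."""
    out = lengths.copy()
    out.columns = out.columns.str.strip()
    # "na" (and anything else non-numeric) becomes NaN, then is dropped.
    out["Total length (mm)"] = pd.to_numeric(out["Total length (mm)"], errors="coerce")
    measured = out.dropna(subset=["Total length (mm)"])
    return measured.groupby("Haplogroup")["Total length (mm)"].agg(["count", "mean", "std"])
```

For example, `summarize_lengths(pd.read_csv("clades_for_lengths.csv"))` would give one row per clade, with unmeasured fish excluded from the counts.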

Code/software

You can use Microsoft Excel (2021) or Google Sheets to view the data. For our analyses, we used pandas 2.2.3, seaborn 0.13.2, and scipy 1.15.0 in Python 3.

T-test calculations for difference in smelt lengths between clades followed the format:

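import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as stats
import seaborn as sns

# Assumed loading step: treat "na" as missing and drop fish without a length measurement
df = pd.read_csv("clades_for_lengths.csv", na_values="na").dropna(subset=["Total length (mm)"])

# Box plot of total length by clade
sns.boxplot(x='Haplogroup', y='Total length (mm)', data=df).set(title='Ontario and Erie')
plt.show()

# Two-sample t-test on total length, Acadian vs. Atlantic
print(stats.ttest_ind(a=df["Total length (mm)"][df.Haplogroup == "Acadian"], b=df["Total length (mm)"][df.Haplogroup == "Atlantic"]))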

Access information

Other publicly accessible locations of the data:

Lac St. Jean mtDNA template: https://www.ncbi.nlm.nih.gov/nuccore/NC_015246.1?report=genbank
