Population genetics and origins of rainbow smelt (Osmerus mordax) in the Laurentian Great Lakes
Data files
Jul 22, 2025 version files (8.62 KB)
- clades_for_lengths.csv (1.09 KB)
- hijop.csv (4.33 KB)
- props.csv (301 B)
- README.md (2.90 KB)
Abstract
Rainbow smelt (Osmerus mordax) are a small predatory fish first recorded in the Great Lakes in 1906. Despite the major ecological and economic impacts of smelt in the Great Lakes, most information on the origins of these populations comes from second-hand accounts written decades after the first smelt were recorded. These accounts are based on circumstantial evidence and include speculation about natural migration and reproduction of smelt in Lake Ontario as well as secondary anthropogenic introductions to the Great Lakes. Here, we use mtDNA sequencing and RFLP to demonstrate that the single, recorded government introduction of smelt to the Great Lakes in Michigan around 1912 accounts for only about half of the ancestry of Great Lakes smelt. The remaining ancestry appears to be from an anadromous source population. Furthermore, the absence of a longitudinal cline in haplotype frequency indicates that gene flow and dispersal are high amongst smelt in the Great Lakes. Our results suggest a multiple invasion pathway within the Great Lakes and provide insights into the genetic diversity of the extant Great Lakes populations.
https://doi.org/10.5061/dryad.gqnk98sz7
Description of the data and file structure
hijop.csv contains all the haplotypes found, props.csv lists the proportions of each clade in each sampling location, and clades_for_lengths.csv lists the lengths of the fish from each clade used for the size comparison analysis.
Files and variables
File: hijop.csv
Description: Haplotypes from our study
Variables
- Haplotype Name: Arbitrary designation meant to denote each haplotype with a single letter (e.g. Haplotype O was found in all sites)
- Haplotype: DNA sequence of haplotype
- Frequency: Count of the haplotype across all sampling locales, including reference populations in Quebec and Massachusetts.
- Specimens: List of specimens with each haplotype. Codes consist of a sampling-site prefix followed by a numerical identifier (e.g. 1, 01, x01, 01-1).
- OSW = Oswego
- EB = East Basin Lake Ontario
- E40/E16 = East Basin Lake Erie
- MTN = Mtn Bay, Lake Superior
- WAWA = Wawa Lake
- WAN = Wanapitei
- NIPIGON = Nipigon
- RichelieuRiver = Richelieu River
- JDA/JDB = St. Lawrence Estuary
- BostonFore = Fore River
- SNP: refers to SNP TGA(A/G)GCC, used to determine clade and verify results of RFLP. A = Atlantic, G = Acadian
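The site codes and SNP alleles listed above can be decoded programmatically. The following is a minimal sketch, assuming that each specimen code begins with one of the site prefixes exactly as written in this README and that the SNP column holds a single letter. The function names and the example codes in the usage note are hypothetical and do not come from hijop.csv.

```python
# Site prefixes as listed in this README.
SITE_CODES = {
    "OSW": "Oswego",
    "EB": "East Basin Lake Ontario",
    "E40": "East Basin Lake Erie",
    "E16": "East Basin Lake Erie",
    "MTN": "Mtn Bay, Lake Superior",
    "WAWA": "Wawa Lake",
    "WAN": "Wanapitei",
    "NIPIGON": "Nipigon",
    "RichelieuRiver": "Richelieu River",
    "JDA": "St. Lawrence Estuary",
    "JDB": "St. Lawrence Estuary",
    "BostonFore": "Fore River",
}

# SNP allele to clade, as described for the SNP column.
SNP_TO_CLADE = {"A": "Atlantic", "G": "Acadian"}


def site_from_specimen(code):
    """Return the sampling site for a specimen code, or None if no prefix matches."""
    # Try longer prefixes first so a short prefix never shadows a longer one.
    for prefix in sorted(SITE_CODES, key=len, reverse=True):
        if code.startswith(prefix):
            return SITE_CODES[prefix]
    return None


def clade_from_snp(allele):
    """Map the SNP allele (A or G) to its clade name, or None if unrecognised."""
    return SNP_TO_CLADE.get(allele.strip().upper())
```

For example, a hypothetical code such as `OSW01` would resolve to Oswego, and an allele of `G` to the Acadian clade. If the real codes use a separator or different capitalisation, the prefix match would need adjusting.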
File: props.csv
Description: Geographical distribution of clades
Variables
- Region: Sampling site. East Basin = East Basin of Lake Ontario
- Acadian: Count of Acadian clade smelt
- Atlantic: Count of Atlantic clade smelt
- Latitude: Latitude of sampling site
- Longitude: Longitude of sampling site
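Because props.csv stores counts rather than proportions, the share of each clade per site has to be derived. The sketch below shows one way to do this with the standard library, assuming the column headers match the variable names above (stray whitespace in headers is stripped). The function name `clade_proportions` and the two example rows are placeholders for illustration, not values from the dataset.

```python
import csv
import io


def clade_proportions(rows):
    """Return {region: proportion of Acadian clade smelt} from count rows."""
    proportions = {}
    for row in rows:
        # Normalise header names in case the file has padding around them.
        row = {key.strip(): value for key, value in row.items() if key is not None}
        acadian = int(row["Acadian"])
        atlantic = int(row["Atlantic"])
        total = acadian + atlantic
        # Sites with no typed fish get None rather than a division error.
        proportions[row["Region"]] = acadian / total if total else None
    return proportions


# Placeholder rows for illustration only; these are NOT values from props.csv.
example = io.StringIO(
    "Region,Acadian,Atlantic,Latitude,Longitude\n"
    "Site X,3,1,0.0,0.0\n"
    "Site Y,2,2,0.0,0.0\n"
)
print(clade_proportions(csv.DictReader(example)))
```

With the real file, the same function can be fed `csv.DictReader(open("props.csv", newline=""))`. The Atlantic proportion is one minus the returned value.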
File: clades_for_lengths.csv
Description: Sizes for trawl-caught fish
Variables
- Code: Individual identifier of smelt
- Region: Sampling site: East Basin [of Lake Ontario]; Lake Erie; Oswego (Lake Ontario)
- Total length (mm): Length from nose to tip of tail. "na" indicates samples for which no measurements could be taken.
- Haplogroup: Atlantic or Acadian clade
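Since unmeasured fish are coded as "na", the length column cannot be read directly as numbers. The following sketch, which uses only the standard library, groups measured lengths by clade and skips the "na" entries. It assumes the column headers are exactly those listed above. The function name and the toy rows in the usage lines are hypothetical.

```python
from statistics import mean


def lengths_by_haplogroup(rows):
    """Group total lengths (mm) by haplogroup, skipping rows coded as 'na'."""
    groups = {}
    for row in rows:
        raw = row["Total length (mm)"].strip()
        if raw.lower() == "na":
            continue  # no measurement could be taken for this fish
        groups.setdefault(row["Haplogroup"].strip(), []).append(float(raw))
    return groups


# Toy rows for illustration only; these are NOT values from clades_for_lengths.csv.
toy_rows = [
    {"Code": "a", "Region": "Oswego", "Total length (mm)": "100", "Haplogroup": "Acadian"},
    {"Code": "b", "Region": "Oswego", "Total length (mm)": "na", "Haplogroup": "Acadian"},
    {"Code": "c", "Region": "Lake Erie", "Total length (mm)": "120", "Haplogroup": "Atlantic"},
]
for clade, lengths in lengths_by_haplogroup(toy_rows).items():
    print(clade, len(lengths), mean(lengths))
```

The same function accepts a `csv.DictReader` opened on clades_for_lengths.csv.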
Code/software
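You can use Microsoft Excel (2021) or Google Sheets to view the data. For our analyses, we used pandas 2.2.3, seaborn 0.13.2 and scipy 1.15.0 in Python 3.
T-test calculations for the difference in smelt lengths between clades followed this format:
import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as stats
import seaborn as sns
# Assumed loading step: read the lengths file, treat "na" as missing, and drop unmeasured fish
df = pd.read_csv("clades_for_lengths.csv", na_values=["na"]).dropna(subset=["Total length (mm)"])
sns.boxplot(x='Haplogroup', y='Total length (mm)', data=df).set(title='Ontario and Erie')
plt.show()
# Two-sample t-test comparing total length between the Acadian and Atlantic clades
stats.ttest_ind(a=df["Total length (mm)"][df.Haplogroup == "Acadian"], b=df["Total length (mm)"][df.Haplogroup == "Atlantic"])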
Access information
Other publicly accessible locations of the data:
- Lac St. Jean mtDNA template: https://www.ncbi.nlm.nih.gov/nuccore/NC_015246.1?report=genbank
All of the clade data were collected using RFLP. As outlined in the manuscript, some of the analyses were followed up with sequencing, and these data are also included.
Rainbow smelt were collected from Lake Ontario, Lake Erie, Jesse Lake, Wawa Lake, Wanapitei Lake, and Lake Nipigon. Coordinates are given in the haplogroup table.
Laboratory analysis
The total length of each thawed trawl-caught fish was recorded in millimetres, nose-to-tail. DNA extraction was performed using Proteinase K (Thermo Fisher, Waltham, USA) digestion followed by successive phenol and chloroform-isoamyl alcohol extraction steps, or by the DNeasy Blood and Tissue Kit (QIAGEN, Hilden, Germany). PCR amplification of mitochondrial NADH subunit 6 (ND6) was performed following the protocol of Pigeon et al. (1998). Primers used were ND56R250 (ACTGGTCGTGTTTGTATAC) and ND56R450V (GGACTACAAACAAAGTCAATAAG). This resulted in the amplification of a 288 bp amplicon, which was then incubated for 3 hours at 37°C in Tango buffer (Thermo Fisher, Waltham, USA) with the restriction enzyme DdeI (Thermo Fisher, Waltham, USA), according to the manufacturer’s usage directions.
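The RFLP step relies on DdeI cutting the amplicon only where its recognition site is present, so clades that differ at that site give different fragment patterns. As an illustration, the sketch below performs an in-silico DdeI digest of an arbitrary sequence. DdeI recognises C^TNAG (N is any base) and cuts after the first C. The function name `ddei_fragments` and the toy sequences in the usage lines are made up, and which clade is cut in the real 288 bp amplicon is not stated here.

```python
import re

# DdeI recognition site: CTNAG, cut between C and T (C^TNAG).
# The site is palindromic, so scanning one strand is sufficient.
DDEI_SITE = re.compile(r"(?=CT[ACGT]AG)")


def ddei_fragments(seq):
    """Return fragment lengths (5' to 3') after a complete DdeI digest."""
    seq = seq.upper()
    # The cut falls one base after the start of each recognition site.
    cuts = [match.start() + 1 for match in DDEI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy sequences for illustration only (not the ND6 amplicon).
print(ddei_fragments("AAAACTGAGTTTT"))  # contains one CTNAG site
print(ddei_fragments("AAAACTGAATTTT"))  # site abolished by a single base change
```

An uncut sequence comes back as a single fragment equal to its full length, which corresponds to an undigested band on a gel.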
For a subset of the smelt, Sanger sequencing was performed on the mitochondrial NADH subunit 5 (ND5) gene. A 911 bp fragment beginning at position 965 of NCBI Accession AF034752.1.1 was amplified using 21-base primers designed for this study, FWD_911 (TTGGCCTCAATCAACCTCAGC) and REV_911 (GCTAACTCGGGGGTTAAGTCG). Our thermocycler program was a modified version of Pigeon et al.’s (1998) protocol with a 1-minute extension time. A consensus sequence of 835 bp was retained from these reads.
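To check where the sequencing primers land on a reference sequence, an in-silico PCR can be sketched as follows. It uses the two primer sequences given above and assumes exact, mismatch-free binding, with the reverse primer matching the reverse complement of the template strand. The function names and the toy template in the usage lines are hypothetical. A real check would use the downloaded reference sequence instead.

```python
FWD_911 = "TTGGCCTCAATCAACCTCAGC"
REV_911 = "GCTAACTCGGGGGTTAAGTCG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon(template, fwd=FWD_911, rev=REV_911):
    """Return the primer-to-primer product on the given strand, or None."""
    template = template.upper()
    start = template.find(fwd)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so search for its
    # reverse complement downstream of the forward primer.
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Toy template for illustration only: primers flanking a short made-up insert.
toy = "AAAA" + FWD_911 + "ACGT" * 5 + reverse_complement(REV_911) + "TTTT"
print(len(amplicon(toy)))
```

The returned product includes both primer sequences. Whether the 911 bp figure above is counted with or without the primers is not specified, so the length comparison is left to the reader.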