Data from: Origin of the Laurentian Great Lakes fish fauna through upward adaptive radiation cascade prior to the Last Glacial Maximum
Data files
Aug 19, 2024 version files 107.90 GB
Abstract
The evolutionary histories of adaptive radiations can be marked by dramatic demographic fluctuations. However, the demographic histories of ecologically-linked co-diversifying lineages remain understudied. The Laurentian Great Lakes provide a unique system of two such radiations that are dispersed across depth gradients with a predator-prey relationship. We show that the North American Coregonus species complex (“ciscoes”) radiated rapidly prior to the Last Glacial Maximum (80-90 kya), a globally warm period, followed by rapid expansion in population size. Similar patterns of demographic expansion were observed in the predator species, Lake Charr (Salvelinus namaycush), following a brief time lag, which we hypothesize was driven by predator-prey dynamics. Diversification of prey into deepwater created ecological opportunities for the predators, facilitating their demographic expansion, which is consistent with an upward adaptive radiation cascade. This study provides a new timeline and environmental context for the origin of the Laurentian Great Lakes fish fauna and firmly establishes this system as a driver of ecological diversification and rapid speciation through cyclical glaciation.
README: Data from: Origin of the Laurentian Great Lakes fish fauna through upward adaptive radiation cascade prior to the Last Glacial Maximum
Access this dataset on Dryad https://doi.org/10.5061/dryad.n02v6wx59
Description of data generation
For population genomic analyses, genome resequencing data for individuals of each Coregonus species were mapped to the unmasked Coregonus artedi reference genome assembly, and SNPs were called with VCF tools. An alignment was created for phylogenetic inference. For the historical demography analyses (PSMC and SMC++), resequencing data were mapped to a repeat-masked version of the reference before SNPs were called. Additional files cover the various mutation rates explored in the manuscript. The same methodology was applied to the Salvelinus namaycush samples, with the addition of a PSMC analysis that tested sensitivity to different generation times.
Description of the data and file structure
Coregonus_PCA_biallelic_no_missing.maf0.05.noSingletons.min4.phy
- phylip formatted alignment containing Coregonus genome-wide SNPs for IQTree2 phylogenetic inference
- used by the "run_IQTree2.sh" script
Coregonus_PCA_biallelic_no_missing.maf0.05.vcf
- VCF format file containing Coregonus genome-wide SNPs for PCA
- used by the "PCA_maf0.05.R" script
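A PCA on SNP data usually starts from a matrix of alternate-allele counts (0, 1 or 2) per sample and site. The following is a minimal sketch of that conversion for a biallelic VCF with no missing genotypes, as the file name suggests. It is not the code in "PCA_maf0.05.R"; the function name and the toy records (samples S1 and S2) are hypothetical.

```python
def genotype_matrix(vcf_lines):
    """Return (sample_names, rows); each row holds alt-allele counts per sample.

    Assumes biallelic sites and no missing genotypes, so every GT field
    is one of 0/0, 0/1, 1/0 or 1/1 (phased "|" separators also accepted).
    """
    samples = []
    rows = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        gt_index = fields[8].split(":").index("GT")
        counts = []
        for entry in fields[9:]:
            gt = entry.split(":")[gt_index].replace("|", "/")
            counts.append(sum(int(allele) for allele in gt.split("/")))
        rows.append(counts)
    return samples, rows


# Toy input with hypothetical sample names, not taken from the dataset.
toy_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    "chr1\t10\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\t1|1:8\n",
    "chr1\t25\t.\tC\tT\t.\tPASS\t.\tGT:DP\t0/0:10\t1/0:9\n",
]
samples, rows = genotype_matrix(toy_vcf)
```

For the real file, an open text handle (for example `open("Coregonus_PCA_biallelic_no_missing.maf0.05.vcf")`) can be passed in place of the toy list.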
CA_combined_biallelic.vcf.gz, CH_combined_biallelic.vcf.gz, CK_combined_biallelic.vcf.gz, CN_combined_biallelic.vcf.gz, SN-lean_combined_biallelic.vcf.gz, SN-siscowet_combined_biallelic.vcf.gz
- variant call files used for SMC++, which are grouped by species
CA01.masked.vcf.gz, CA02.masked.vcf.gz, CA03.masked.vcf.gz, CA04.masked.vcf.gz, CH01.masked.vcf.gz, CH02.masked.vcf.gz, CH03.masked.vcf.gz, CH04.masked.vcf.gz, CK01.masked.vcf.gz, CK02.masked.vcf.gz, CK03.masked.vcf.gz, CK04.masked.vcf.gz, CN01.masked.vcf.gz, CN02.masked.vcf.gz, LS01.masked.vcf.gz, LS02.masked.vcf.gz, LS05.masked.vcf.gz, LS07.masked.vcf.gz, LS08.masked.vcf.gz, LS09.masked.vcf.gz
- variant call files used for PSMC
- each file represents an individual sample
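Because each PSMC input is a single-sample file, it can help to group the file names by their letter prefix before batch processing. The sketch below treats the prefix as an opaque group label and makes no claim about what each prefix stands for; the helper name and regular expression are assumptions based only on the file names listed above.

```python
import re
from collections import defaultdict

# Pattern inferred from the listed names, e.g. "CA01.masked.vcf.gz".
SAMPLE_RE = re.compile(r"^([A-Z]+)(\d+)\.masked\.vcf\.gz$")


def group_by_prefix(filenames):
    """Map each letter prefix to the sorted list of its sample numbers."""
    groups = defaultdict(list)
    for name in filenames:
        match = SAMPLE_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        groups[match.group(1)].append(match.group(2))
    return {prefix: sorted(numbers) for prefix, numbers in groups.items()}


# The per-sample files listed in this README.
psmc_files = [
    "CA01.masked.vcf.gz", "CA02.masked.vcf.gz", "CA03.masked.vcf.gz", "CA04.masked.vcf.gz",
    "CH01.masked.vcf.gz", "CH02.masked.vcf.gz", "CH03.masked.vcf.gz", "CH04.masked.vcf.gz",
    "CK01.masked.vcf.gz", "CK02.masked.vcf.gz", "CK03.masked.vcf.gz", "CK04.masked.vcf.gz",
    "CN01.masked.vcf.gz", "CN02.masked.vcf.gz",
    "LS01.masked.vcf.gz", "LS02.masked.vcf.gz", "LS05.masked.vcf.gz",
    "LS07.masked.vcf.gz", "LS08.masked.vcf.gz", "LS09.masked.vcf.gz",
]
groups = group_by_prefix(psmc_files)
```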
psmc_masked_bs_all.xlsx
- PSMC results for Coregonus spp. and Salvelinus namaycush samples in an Excel spreadsheet. Each tab represents a result of a PSMC run.
- The tabs marked "main" are the results of the whole genome. The tabs marked "bs" are the concatenated bootstrap replicates.
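When working with these workbooks programmatically, the whole-genome and bootstrap tabs can be separated by name. This is a minimal sketch that relies only on the "main" and "bs" markers described above; the exact tab names are not listed here, so the names in the example are hypothetical.

```python
def split_tabs(sheet_names):
    """Sort tab names into whole-genome ("main"), bootstrap ("bs") and other."""
    result = {"main": [], "bs": [], "other": []}
    for name in sheet_names:
        lowered = name.lower()
        if "main" in lowered:
            result["main"].append(name)
        elif "bs" in lowered:
            result["bs"].append(name)
        else:
            result["other"].append(name)
    return result


# Hypothetical tab names for illustration only.
tabs = split_tabs(["CA01_main", "CA01_bs", "LS01_main", "LS01_bs", "notes"])
```

If pandas and an Excel engine such as openpyxl are installed, `pandas.read_excel(path, sheet_name=None)` returns a dictionary of data frames keyed by tab name, and its keys can be passed to `split_tabs`.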
psmc_Coregonus_masked_mut_rate_bs_all.xlsx
- PSMC results for Coregonus spp. at different mutation rates. Each tab represents a result of a PSMC run.
- The tabs marked "[sample]_main" are the results of the whole genome. The tabs marked "[sample]_bs" are the concatenated bootstrap replicates.
- The "low" mutation rate is 2.5e-09 and the "high" rate is 8.23e-09 mutations per site per generation.
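To see why the choice of mutation rate matters, recall the usual PSMC scaling convention: with N0 = theta0 / (4 * mu * s), time in years is 2 * N0 * t * g and effective population size is N0 * lambda. The sketch below applies that convention to the "low" and "high" rates given above. The bin size s = 100, the theta0 value, the generation time and the function name are illustrative assumptions, not values taken from this dataset, and the README does not state whether the spreadsheets hold raw or already scaled values.

```python
def scale_psmc(t_k, lambda_k, theta0, mu, gen_time, bin_size=100):
    """Convert PSMC's scaled time and population size to years and Ne.

    t_k, lambda_k : scaled time and relative population size from PSMC output
    theta0        : scaled mutation rate per bin
    mu            : mutation rate per site per generation
    gen_time      : generation time in years
    bin_size      : bases per bin (assumed 100 here)
    """
    n0 = theta0 / (4.0 * mu * bin_size)
    years = 2.0 * n0 * t_k * gen_time
    ne = n0 * lambda_k
    return years, ne


LOW_MU = 2.5e-09    # "low" rate from this README
HIGH_MU = 8.23e-09  # "high" rate from this README

# Hypothetical inputs: theta0 = 0.001, t_k = 0.5, lambda_k = 2, 5-year generations.
years_low, ne_low = scale_psmc(0.5, 2.0, 0.001, LOW_MU, gen_time=5)
years_high, ne_high = scale_psmc(0.5, 2.0, 0.001, HIGH_MU, gen_time=5)
```

Both the time axis and the population-size axis scale with 1/mu, so the low rate stretches estimates by a factor of 8.23 / 2.5 relative to the high rate. Time also scales linearly with generation time.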
psmc_Salvelinus_masked_generation_time_bs_all.xlsx
- PSMC results for Salvelinus namaycush sample LS01 for generation times of 6-20 years in two-year increments.
- The tabs marked "main" are the results of the whole genome. The tabs marked "bs" are the concatenated bootstrap replicates.
smc++_masked_bs_all.xlsx
- SMC++ results for Coregonus spp. and Salvelinus namaycush samples in an Excel spreadsheet. Each tab represents a result of an SMC++ run.
- Tabs marked "[sample]7 26 e-9t10-10000 gen" are the results of the whole genome. Tabs marked "*[sample]_bs_all_gen" are the concatenated bootstrap replicates.
smc++_masked_mut_rate_all.xlsx
- SMC++ results for Coregonus species at 2.5e-09 and 8.23e-09 mutations per site per generation (6).
Sharing/Access information
Sequence and assembly data can be found at the following sources:
Coregonus genome assembly and resequencing data.
- NCBI BioProject: PRJNA1062807
Salvelinus namaycush resequencing data.
- NCBI BioProject: PRJNA1077361
Code/Software
Code available at https://github.com/KrabbenhoftLab/Coregonus_demography.git