Population genomic and morphological datasets from: An evolutionary mosaic challenges traditional monitoring of a foundation species in a coastal environment - the Baltic Fucus vesiculosus
Data files
Feb 20, 2025 version, files total 394.81 MB
- Fucus_Morphology.xlsx (46.38 KB)
- Fucus_vesiculosus_SNP_data.vcf.zip (394.76 MB)
- README.md (5.01 KB)
Abstract
Genetic diversity in foundation species is critical for ecosystem functions and resilience under environmental change but remains largely overlooked in environmental monitoring. In the Baltic Sea, a key species for monitoring is the brown seaweed Fucus vesiculosus, which forms sublittoral 3D habitats providing shelter and food for fish and invertebrates. Ecological distribution models predict a significant loss of Baltic F. vesiculosus due to ocean warming unless populations can adapt. Genetic variation and recombination during sexual reproduction are essential for adaptation, but studies have revealed large-scale clonal reproduction within the Baltic Sea. We analyzed genome-wide SNP data from the east Atlantic, the "Transition zone", and the Baltic Sea, and found a mosaic of divergent lineages in the Baltic Sea contrasting an outside dominance of a few genetic groups. We determined that the previously described endemic species Fucus radicans is predominantly a large female clone of F. vesiculosus in its northern Baltic distribution. In two Estonian sites, however, individuals are sexual but reproductively isolated from Baltic F. vesiculosus, revealing a separate diverged lineage that predates the formation of the Baltic Sea. Monitoring Baltic Fucus without considering this genetic complexity will fail to prioritise populations with adaptive potential to new climate conditions. From our genomic data, we can extract informative and diagnostic genetic markers that differentiate major genetic entities. Such an SNP panel will provide a straightforward tool for spatial and temporal monitoring and informing management decisions and actions.
https://doi.org/10.5061/dryad.q83bk3jsh
Description of the data and file structure
We have submitted the Fucus_vesiculosus_SNP_data.vcf (Variant Call Format) file resulting from reference-based variant calling and filtering, performed with a pipeline modified from Mikhail Matz and available at https://github.com/z0on/2bRAD_GATK. Reads were mapped to an F. vesiculosus draft genome assembly previously used for population genomic studies (Kinnby et al., 2020; Pereyra et al., 2023; NCBI Bioproject No. PRJNA629489). This file is the primary population genomic dataset used for all genetic analyses described in our article, “An evolutionary mosaic challenges traditional monitoring of a foundation species in a coastal environment - the Baltic Fucus vesiculosus”. In that article we investigated the genetic and morphological structure of the brown seaweed populations of the Baltic Sea, Kattegat and Skagerrak, and assessed both the level of clonality of these populations and the evolutionary history underlying them. In addition, we have included the Fucus_Morphology.xlsx file, which contains morphological measurements of putative diagnostic traits often used to distinguish Fucus species.
Files and variables
File: Fucus_vesiculosus_SNP_data.vcf.zip
Description: This Variant Call Format (VCF) file contains Single Nucleotide Polymorphism (SNP) data from 982 individuals across 55 sampling localities. It comprises 97913 SNPs after thinning the dataset to one SNP per tag and before filtering for missing data, monomorphic loci or outlier individuals due to biases. Each locus had an average coverage of 17.2x per individual (median=16.3).
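Because the file is provided before filtering for missing data and monomorphic loci, users may want to apply such filters themselves. The following is a minimal sketch of a per-site filter, not the filtering used in the article. The function names and the `max_missing=0.2` threshold are hypothetical, and genotypes are assumed to be standard VCF `GT` strings such as `0/1` or `./.`.

```python
def parse_gt(gt_field):
    """Return allele indices from a VCF sample field; None marks a missing allele."""
    alleles = gt_field.split(":")[0].replace("|", "/").split("/")
    return [None if a == "." else int(a) for a in alleles]


def site_summary(gt_fields):
    """Return (missing_rate, is_monomorphic) for one VCF record."""
    called = []
    n_missing = 0
    for field in gt_fields:
        alleles = parse_gt(field)
        if any(a is None for a in alleles):
            n_missing += 1  # partially missing calls are counted as missing
        else:
            called.extend(alleles)
    missing_rate = n_missing / len(gt_fields) if gt_fields else 0.0
    # A site is monomorphic when all called alleles are identical (or none is called)
    monomorphic = len(set(called)) <= 1
    return missing_rate, monomorphic


def keep_site(gt_fields, max_missing=0.2):
    """Keep polymorphic sites whose missing rate does not exceed max_missing.

    max_missing is an illustrative threshold, not a value taken from the study.
    """
    missing_rate, monomorphic = site_summary(gt_fields)
    return missing_rate <= max_missing and not monomorphic
```

In practice the same filters are usually applied with vcftools or BCFtools; the sketch only makes the two criteria explicit.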
File: Fucus_Morphology.xlsx
Description: This file contains the traits measured to characterise each individual morphologically. The file includes data from 318 individuals from seven localities in the Gulf of Bothnia, Baltic Sea and six Estonian sites. These localities were selected covering areas where F. radicans has previously been found or could potentially be present. Each individual was characterised using the average of three measurements, taken from different branches with the image analysis software ImageJ64 (Schneider et al., 2012). The traits measured were frond width (FW), distance between dichotomies (DBD) and undulation index (UX).
Variables
- Sample code: Individual sample identification code
- POP: Sample locality code (see Supplemental information file)
- Species: Fucus vesiculosus, F. radicans or undescribed FucusX
- Clone/Unique: Whether the genotype is unique (sexually derived) or a clone (genotype copy derived from asexual reproduction)
- FW1: Frond width measurement 1
- FW2: Frond width measurement 2
- FW3: Frond width measurement 3
- FW: Average frond width from the 3 measurements
- DBD1: Distance between dichotomies measurement 1
- DBD2: Distance between dichotomies measurement 2
- DBD3: Distance between dichotomies measurement 3
- DBD: Average distance between dichotomies from the 3 measurements
- Undulation1: Distance of the thallus margin along undulated branch measurement 1
- Undulation2: Distance of the thallus margin along undulated branch measurement 2
- Undulation3: Distance of the thallus margin along undulated branch measurement 3
- Straight line (cm): Straight-line length measured along 3 cm of the same branch, used as the reference for the ratio with Undulation1, Undulation2 and Undulation3
- UX: Undulation index, the ratio between Undulation measurement and Straight line.
- Receptacles: Number of reproductive conceptacles, if present.
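The derived columns FW, DBD and UX can be recomputed from the raw measurements as a consistency check. The sketch below assumes that each row is available as a dictionary keyed by the column names listed above, and that UX is the mean of the three undulation measurements divided by the straight-line length. That reading of the UX definition is an assumption, and `derived_traits` is a hypothetical helper name.

```python
from statistics import mean


def derived_traits(row):
    """Recompute FW, DBD and UX from one row of raw measurements.

    Assumes UX = mean(Undulation1..3) / straight-line length.
    """
    fw = mean(row[f"FW{i}"] for i in (1, 2, 3))
    dbd = mean(row[f"DBD{i}"] for i in (1, 2, 3))
    undulation = mean(row[f"Undulation{i}"] for i in (1, 2, 3))
    ux = undulation / row["Straight line (cm)"]
    return {"FW": fw, "DBD": dbd, "UX": ux}
```

Since a single straight-line value is shared by the three undulation measurements, averaging the three ratios gives the same UX as dividing the averaged undulation by the straight line.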
Code/software
The VCF file can be opened and analyzed with various bioinformatics tools. Some of the most common are:
- IGV (Integrative Genomics Viewer): Useful for visualizing genomic data.
- BCFtools: Command-line tool for working with VCF/BCF files, including filtering, viewing, and converting.
- GATK (Genome Analysis Toolkit): Provides tools to analyze high-throughput sequencing data with a focus on variant discovery.
- vcftools: Command-line program designed to work specifically with VCF files to perform various types of analyses.
The spreadsheet can be opened with Microsoft Excel or Apple Numbers.
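For a quick look at the VCF without installing any of the tools above, the zipped file can also be scanned with the Python standard library. This is a minimal sketch: `vcf_overview` is a hypothetical helper, and it assumes the archive holds one plain-text, tab-delimited `.vcf` member.

```python
import io
import zipfile


def vcf_overview(zip_path, member=None):
    """Return (sample_names, n_records) for a VCF stored inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        if member is None:
            # Assumption: one .vcf member; skip macOS resource-fork entries
            member = next(
                name for name in archive.namelist()
                if name.endswith(".vcf") and not name.startswith("__MACOSX")
            )
        samples, n_records = [], 0
        with archive.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                if line.startswith("##"):
                    continue  # meta-information lines
                if line.startswith("#CHROM"):
                    # Sample names follow the nine fixed VCF columns
                    samples = line.rstrip("\n").split("\t")[9:]
                elif line.strip():
                    n_records += 1  # one data line per variant record
    return samples, n_records
```

Calling `vcf_overview("Fucus_vesiculosus_SNP_data.vcf.zip")` streams the file line by line, so the full dataset does not need to fit in memory.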
Access information
Other publicly accessible locations of the data:
- Sequence data are available from NCBI SRA under BioProject accession PRJNA629489.
- Scripts from the bioinformatic analysis are available at GitHub (https://github.com/crustaceana/Population-genomics-for-Clonal-organisms).
Population genomic data were obtained from individually barcoded 2b-RAD libraries constructed following the procedures in Pereyra et al. (2023). The libraries were sequenced on an Illumina NovaSeq 6000 platform at SciLifeLab, Uppsala. A total of 962 individuals from 55 different sites were sequenced. Removal of PCR duplicates, quality filtering, and variant calling were performed on the computer cluster Albiorix (M. Töpel, IVL, Sweden), following a reference-based pipeline modified from Mikhail Matz, available at https://github.com/z0on/2bRAD_GATK. Reads were mapped to an F. vesiculosus draft genome assembly previously used for population genomic studies (Kinnby et al., 2020; Pereyra et al., 2023; NCBI Bioproject No. PRJNA629489). Variant calling calibration was carried out with technical replicates, labeled with the individual name followed by the suffix "Rep". These replicates are included in the dataset.
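Because the technical replicates remain in the dataset, users may want to pair them with their originals, for example to exclude them or to check genotype agreement. The sketch below is an illustration, not the calibration procedure used in the pipeline. The function names are hypothetical, and it assumes the suffix "Rep" is appended to the individual name, optionally after a `_` or `-` separator.

```python
def replicate_pairs(sample_names, suffix="Rep"):
    """Return (original, replicate) name pairs based on the replicate suffix."""
    names = set(sample_names)
    pairs = []
    for name in sample_names:
        if name.endswith(suffix):
            # Assumption: an optional "_" or "-" may separate name and suffix
            base = name[: -len(suffix)].rstrip("_-")
            if base in names:
                pairs.append((base, name))
    return pairs


def genotype_key(gt_field):
    """Unphased genotype as a sorted allele tuple, or None if any allele is missing."""
    alleles = gt_field.split(":")[0].replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(sorted(alleles))


def discordance(gts_a, gts_b):
    """Fraction of sites called in both samples where the genotypes differ."""
    compared = mismatched = 0
    for a, b in zip(gts_a, gts_b):
        key_a, key_b = genotype_key(a), genotype_key(b)
        if key_a is None or key_b is None:
            continue  # skip sites missing in either sample
        compared += 1
        mismatched += key_a != key_b
    return mismatched / compared if compared else None
```

Here `gts_a` and `gts_b` are the per-site genotype fields of an original and its replicate, in the same site order.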
Morphological data comprise three morphometric characters: frond width (FW), distance between dichotomies (DBD), and undulation index (UX). The first two characters have been used as diagnostic traits to differentiate F. radicans from F. vesiculosus (Bergström et al., 2005; Pereyra et al., 2013). Each individual was characterized using the average of three measurements taken from different branches with the image analysis software ImageJ64 (Schneider et al., 2012).
The data includes a list of Sample ID codes and corresponding localities and geographic coordinates.