Shared and distinct patterns of genetic structure in two sympatric large decapods

Cite this dataset

Ellis, Charlie D. et al. (2024). Shared and distinct patterns of genetic structure in two sympatric large decapods [Dataset]. Dryad. https://doi.org/10.5061/dryad.zgmsbccdz

Abstract

Aim: Comparing genetic structure in species with shared spatial ranges and ecological niches can help identify how dissimilar aspects of biology can shape differences in population connectivity. Similarly, where species are widely distributed across heterogeneous environments and major topographic barriers, knowledge of the structuring of populations can help reveal the impacts of factors that limit dispersal and/or drive divergence, aiding conservation management.

Location: European seas of the northeast Atlantic and Mediterranean.

Taxa: European clawed lobster (Homarus gammarus) and European crawfish (Palinurus elephas), two sympatric, heavily-fished decapods with extensive dispersal potential.

Methods: By RAD-sequencing 214 H. gammarus from 32 locations, and 349 P. elephas from 15 locations, we isolated 6,340 and 7,681 SNP loci, respectively. Using these data to characterise contemporary population structuring, we investigate potential spatial and environmental drivers of genomic heterogeneity.

Results: We found higher levels of differentiation among clawed lobsters than crawfish, both globally and within basins, and demonstrate where known hydrographic and topographic barriers generate shared patterns of divergence, such as a genetic break between the Atlantic and Mediterranean basins. Genetic structure not common to both species is principally apparent in the Atlantic portions of their range, where clawed lobster exhibits a genetic cline and increased differentiation towards range margins, while crawfish appear effectively panmictic throughout this region.

Main Conclusions: We attribute the comparative lack of crawfish population structuring to their greater dispersal tendencies via a longer pelagic larval duration and sporadic adult movements. In contrast, genetic connectivity in clawed lobster is relatively restricted, with the correlation of site of origin and temperature to geographic heterogeneity at many divergent loci indicative of both neutral and adaptive processes. Our results help inform how contemporary management can account for likely demographic connectivity and marry the conservation of genomic variation with sustainable fisheries in these ecologically and economically important crustaceans.

README: Shared and distinct patterns of genetic structure in two sympatric large decapods

#START

This README file was generated by Charlie Ellis on 19/03/2023 and updated on 24/01/2024.

Files and materials herein should ensure the analytical pipeline undertaken for the relevant study is fully reproducible (or adaptable to your own datasets of the same format).

GENERAL INFORMATION

  1. Citation of linked publication:

    Ellis CD, MacLeod KL, Jenkins TL, Rato LD, Jézéquel Y, Pavičić M, Díaz D, Stevens JR. (2023). Shared and distinct patterns of genetic structure in two sympatric large decapods. Journal of Biogeography. https://doi.org/10.1111/jbi.14623

  2. Contact Information:

    Lead & Corresponding Author;
    Name: Charlie Ellis
    Institution: University of Exeter
    Address: Department of Biosciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
    Email: c.ellis@exeter.ac.uk

  3. Principal Investigator / Group Leader;
    Name: Jamie Stevens
    Institution: University of Exeter
    Address: Department of Biosciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
    Email: j.r.stevens@exeter.ac.uk

  4. Citation of this Dataset:

Ellis, Charlie D. et al. (2023) Shared and distinct patterns of genetic structure in two sympatric large decapods [Dataset]. Dryad. https://doi.org/10.5061/dryad.zgmsbccdz

  5. Method Types and Data Generation:

-- RADseq-derived SNP data obtained from range-wide samples of two lobster species, European lobster (Homarus gammarus) and European spiny lobster (Palinurus elephas)

-- BioOracle environmental data: raw ASCII files of marine environmental parameters obtained from BioOracle's open-source digital data repository, https://bio-oracle.org/

-- Reproducible scripts in R for the subsequent analytical pipeline used to address questions of population genetic diversity, structure, selection and adaptation.

  6. Usage Notes:

R / RStudio, numerous R packages, and BioOracle environmental dataset downloads (all open-source).

  7. Funding:

European Commission, Award: ENG4300

European Commission, Award: 05R16P00366

SHARING & ACCESS INFORMATION

  1. Licenses/restrictions placed on the data: CC0 1.0 Universal (CC0 1.0) Public Domain

  2. Links to publications that cite or use the data:
    Ellis CD, MacLeod KL, Jenkins TL, Rato LD, Jézéquel Y, Pavičić M, Díaz D, Stevens JR.
    Shared and distinct patterns of genetic structure in two sympatric large decapods.
    Journal of Biogeography. 2023 May 2. https://doi.org/10.1111/jbi.14623

  3. The raw RAD sequence data from which SNP loci were derived are available via the NCBI SRA repository, at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA954007

FILE AND DATA OVERVIEW

This data archive consists of:

  1. this README file
  2. Ellis-etal2022_JBiogeog_CrawfishPelephasDataAnalysis.zip folder with analytical resources for European spiny lobsters (Palinurus elephas)
  3. Ellis-etal2022_JBiogeog_LobsterHgammarusDataAnalysis.zip folder with analytical resources for European lobsters (Homarus gammarus)
  4. BioOracleEnvData.zip folder containing raw ASCII files of marine environmental parameters used in Redundancy Analysis

Each Ellis-*DataAnalysis.zip folder (items 2 and 3, above) contains sufficient resources to analyse the RADseq-derived SNP genotypes and reproduce the analytical pipeline of Ellis et al., 2023, via:

  • populations.snps*.gen -- a file with raw SNP genotype data output from the Stacks v2.0 pipeline, in a 'Genepop' format, used by R script 01 to generate an initial unfiltered genind object;

  • other raw files required to run parts of the analysis pipeline;

    • coord*.csv -- a .csv table of latitude and longitude coordinates corresponding to the location of sites, denoted by their site ID codes, as used in R scripts 05c and 06.
    • plot_bayescan_function.txt -- a .txt file containing a function to call from R script 03 in order to plot the results of Bayescan analyses, using the BaysOut*_fst.txt file outputs created by Bayescan (run on an external HPC)
  • a series of 13 R scripts, each with a descriptive filename prefixed by a numeral denoting the order in which it should be run, and each comprehensively annotated to explain the analytical pipeline;

  • 01 InitialImport_*.R

    • a script to import the Stacks-derived genotype file, convert it to a genind object, filter for missing data (by individuals and by loci) and inspect basic statistics
  • 02 DAPC_fulldata_*.R

    • a script to produce a Discriminant Analysis of Principal Components and associated plot, using the full SNP dataset
  • 03 OutlierDetection_*.R

    • a script to detect outliers using Bayescan (via an HPC) and OutFLANK, and highlight the overlap between detection methods
  • 04a DAPC_neutral_*.R

    • a script to produce a Discriminant Analysis of Principal Components and associated plot, using only neutral SNPs

  • 04b DAPC_outliers_*.R

    • a script to produce a Discriminant Analysis of Principal Components and associated plot, using only outlier SNP candidate loci
  • 05a RDA_geneticdata_*.R

    • a script to reformat SNP genotype data into allele frequency data, formatted for use in Redundancy Analysis
  • 05b RDA_envirodata_*.R

    • a script to reformat environmental data (obtained from BioOracle) into site-specific environmental gradients, formatted for use in Redundancy Analysis
  • 05c prelimlc_distances.R

    • a preliminary script (to be run prior to the other 05c*.R script) to calculate a matrix of least-cost marine distance paths between sites
  • 05c RDA_spatialdata_*.R

    • a script to convert pairwise geographic site distances into distance-based Moran's Eigenvector Maps (dbMEMs), for use in Redundancy Analysis
  • 05d RDA_GEA_*.R

    • a script to perform a basic Genotype Environment Association analysis, in which correlations between genotypic and environmental gradients are assessed via Redundancy Analysis
  • 05e RDA_dbMEMs_*.R

    • a script to perform a full distance-based Redundancy Analysis, in which genotypic gradients are correlated to both environmental and spatial variables, controlled as factors.
  • 06 FstIBD_*.R

    • a script to calculate and plot classic population genetic differentiation statistics, the fixation index (Fst) and isolation by distance (IBD)
  • 07 Snapclust_*.R

    • a script to run Snapclust algorithms, to assess the optimal number of ancestral populations (k) and assign membership of individuals to the resulting compartmentation of ancestral populations.
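
As a rough illustration of the first step in this workflow (importing a Genepop file and filtering for missing data), the sketch below parses Genepop-formatted text and drops loci and then individuals whose missing-data rates exceed a threshold. It is written in Python rather than R and is not the authors' script 01: the function names, the thresholds, the order of filtering and the toy genotypes are all hypothetical, and it assumes two-digit allele codes with 00 marking a missing allele.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Optional[Tuple[str, str]]  # None marks a missing genotype


def parse_genepop(text: str, ncode: int = 2):
    """Parse Genepop text into (loci, genotypes by sample, population index by sample).

    Assumes `ncode` digits per allele and an all-zero code for a missing allele.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    body = lines[1:]  # the first line is a free-text title
    loci: List[str] = []
    i = 0
    # Locus names may be listed one per line or comma-separated on one line.
    while i < len(body) and body[i].lower() != "pop":
        loci.extend(name.strip() for name in body[i].split(",") if name.strip())
        i += 1
    genotypes: Dict[str, List[Genotype]] = {}
    pops: Dict[str, int] = {}
    pop_index = -1
    for ln in body[i:]:
        if ln.lower() == "pop":  # each "pop" line starts a new sampling site
            pop_index += 1
            continue
        sample, calls = ln.rsplit(",", 1)
        row: List[Genotype] = []
        for call in calls.split():
            a, b = call[:ncode], call[ncode:]
            row.append(None if int(a) == 0 or int(b) == 0 else (a, b))
        if len(row) != len(loci):
            raise ValueError(f"{sample.strip()}: expected {len(loci)} genotypes")
        genotypes[sample.strip()] = row
        pops[sample.strip()] = pop_index
    return loci, genotypes, pops


def filter_missing(loci, genotypes, max_locus_missing=0.2, max_sample_missing=0.2):
    """Drop loci, then individuals, whose missing-data rate exceeds a threshold.

    Thresholds and filtering order are illustrative assumptions.
    """
    samples = list(genotypes)
    keep = [
        j for j in range(len(loci))
        if sum(genotypes[s][j] is None for s in samples) / len(samples)
        <= max_locus_missing
    ]
    reduced = {s: [genotypes[s][j] for j in keep] for s in samples}
    kept_samples = {
        s: row for s, row in reduced.items()
        if not row or sum(g is None for g in row) / len(row) <= max_sample_missing
    }
    return [loci[j] for j in keep], kept_samples


# Toy input, invented for illustration only (not taken from the archive).
EXAMPLE = """toy title line
L1,L2,L3
pop
A1, 0101 0103 0000
A2, 0103 0303 0000
pop
B1, 0101 0000 0000
B2, 0303 0103 0204
"""

loci, genotypes, pops = parse_genepop(EXAMPLE)
kept_loci, kept_samples = filter_missing(
    loci, genotypes, max_locus_missing=0.5, max_sample_missing=0.25
)
```

In this toy run, locus L3 is missing in three of four individuals and is removed first; individual B1 is then removed because half of its remaining genotypes are missing.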

The folders also contain a series of other .txt, .csv and .RData files, which are created by the R script workflow and are included to demonstrate the repeatability of our analysis and to provide example file structures for others:

  • outlierBAY*.csv
    • a comma separated data table file created in R script 03, giving the locus IDs of outlier SNPs identified as selection candidates by Bayescan, for downstream identification of the outlier (and neutral) loci data subset
  • allele_freqs_filt.csv
    • a comma separated data table file created in R script 05a, giving the minor allele frequencies of filtered SNP loci, for downstream use in redundancy analyses
  • lc_distances_km.csv
    • a comma separated data table file created in R script 05c, giving a matrix of least cost seaward distances between pairwise sites, for downstream use in redundancy analyses and calculations of Isolation By Distance
  • BaysOutfst.txt
    • a text file created by Bayescan (run separately on an HPC) with the results of all SNP loci across all samples, for downstream plotting and interrogation of outlier SNP loci.
  • BaysOutATL_fst.txt
    • a text file created by Bayescan (run separately on an HPC) with the results of all SNP loci across only Atlantic-origin samples, for downstream plotting and interrogation of outlier SNP loci.
  • BaysOutMED_fst.txt
    • a text file created by Bayescan (run separately on an HPC) with the results of all SNP loci across only Mediterranean-origin samples, for downstream plotting and interrogation of outlier SNP loci.
  • Pelephas.RData
    • a genind object created in R script 01 featuring genotypes of all crawfish samples at all filtered SNP loci, used by several other R scripts downstream
  • LobPopGen_.RData
    • a genind object created in R script 01 featuring genotypes of all lobster samples at all filtered SNP loci, used by several other R scripts downstream
  • Pelephas_ATL.RData
    • a genind object created in R script 02 featuring genotypes of Atlantic-origin crawfish samples at all filtered SNP loci, used by several other R scripts downstream
  • LobPopGen_ATL*.RData
    • a genind object created in R script 02 featuring genotypes of Atlantic-origin lobster samples at all filtered SNP loci, used by several other R scripts downstream
  • Pelephas_MED*.RData
    • a genind object created in R script 02 featuring genotypes of Mediterranean-origin crawfish samples at all filtered SNP loci, used by several other R scripts downstream
  • LobPopGen_MED*.RData
    • a genind object created in R script 02 featuring genotypes of Mediterranean-origin lobster samples at all filtered SNP loci, used by several other R scripts downstream
  • NeutralData_.RData
    • a genind object created in R script 03 featuring genotypes of all samples at only neutral filtered SNP loci (without those flagged as outliers), used by several other R scripts downstream
  • Outlier.RData
    • a genind object created in R script 03 featuring genotypes of all samples at only filtered SNP loci identified as outliers, used by several other R scripts downstream. For each species, there are several of these Outlier*.RData genind objects, each with different naming that reflects the different selection of samples and detection methodologies used to identify them.
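
To illustrate how a pairwise distance matrix such as lc_distances_km.csv can feed a test of Isolation By Distance, the sketch below runs a simple one-sided Mantel permutation test between geographic distance and linearised Fst, Fst / (1 - Fst). It is a generic Python illustration, not necessarily the test used in R script 06: the function names are hypothetical, the layout of the archived .csv is not assumed, and both matrices hold made-up values for four imaginary sites.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_values(matrix, order):
    """Values for each unordered pair of sites, with sites relabelled by `order`."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def mantel(dist, gen, permutations=999, seed=1):
    """One-sided Mantel test: is genetic distance positively correlated with distance?"""
    identity = list(range(len(dist)))
    x = pairwise_values(dist, identity)
    observed = pearson(x, pairwise_values(gen, identity))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel sites in the genetic matrix only
        if pearson(x, pairwise_values(gen, order)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Made-up values for illustration only, not results from the study.
dist_km = [
    [0, 100, 400, 900],
    [100, 0, 300, 800],
    [400, 300, 0, 500],
    [900, 800, 500, 0],
]
fst = [
    [0.00, 0.01, 0.03, 0.06],
    [0.01, 0.00, 0.02, 0.05],
    [0.03, 0.02, 0.00, 0.04],
    [0.06, 0.05, 0.04, 0.00],
]
# Linearise Fst as Fst / (1 - Fst) before relating it to distance.
linearised = [[f / (1 - f) for f in row] for row in fst]

r, p = mantel(dist_km, linearised)
```

With only four sites there are just 24 possible relabellings, so the permutation p-value here is coarse; real datasets with many sites give a much finer null distribution.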

SOFTWARE AND REPRODUCIBILITY

--- All analyses were run in R v4.0.0, using RStudio
--- For the version numbers of individual R packages, please see the Methods section of the published study, via https://doi.org/10.1111/jbi.14623
--- For any compatibility issues or queries / troubleshooting relating to reproducibility, please contact Charlie Ellis - c.ellis@exeter.ac.uk

#END

Methods

RADseq-derived SNP data obtained from range-wide samples of two lobster species, European lobster (Homarus gammarus) and European spiny lobster (Palinurus elephas), and R scripts for the subsequent analytical pipeline used to address questions of population genetic diversity, structure, selection and adaptation.

Usage notes

R and numerous R packages (all open-source).
