Population structure and demographic analyses of Acanthocybium solandri from the Indo-Pacific and Atlantic oceans
Data files
Jun 29, 2022 version (files: 41.50 MB)
- Thia_2021_Wahoo_PopGenomics.zip (41.50 MB)
Nov 12, 2024 version (files: 41.06 MB)
- Project_info.pdf (231.94 KB)
- README.md (3.73 KB)
- Thia_2021_Wahoo_PopGenomics.zip (40.83 MB)
Abstract
This repository contains scripts, data and results for a population genomics study of genetic structure and demography of wahoo, Acanthocybium solandri, published in Journal of Biogeography:
Haro-Bilbao et al. (2021) Global connections with some genomic differentiation occur between Indo-Pacific and Atlantic Ocean wahoo, a large circumtropical pelagic fish. Journal of Biogeography. doi.org/10.1111/jbi.14135
In this work, we generated population allele frequencies for wahoo sampled at 11 locations around the globe using a pooled ezRAD approach. Using thousands of genome-wide SNPs, we demonstrated a significant (but subtle) genetic divide between wahoo from the Indo-Pacific and those from the Atlantic. This genetic differentiation likely occurs against a background of high gene flow throughout the evolutionary history of wahoo, as we inferred from demographic analysis of select population pairs within and between oceanic regions. Analyses contained in this repository are for: (1) filtering pooled ezRAD allele counts (assembled with dDocent and imputed using poolne_estim); (2) estimation of genetic differentiation among globally sampled wahoo populations; (3) estimation of site frequency spectra from joint allele frequencies among select population pairs; (4) inference of demographic parameters (using δaδi); and (5) generation of demographic simulation summary statistics. Most of the analyses are performed in R and can be run directly from within the repository directory; these include allele filtering, estimation of genetic differentiation, estimation of site frequency spectra, and generation of demographic summary statistics. Demographic inference using δaδi requires setup of a Unix environment: input data files and execution scripts are provided, but their implementation needs to be customised.
https://doi.org/10.5061/dryad.dncjsxkz4
Files and variables
Download the Thia_2021_Wahoo_PopGenomics.zip file to access all the data related to the R analyses. Analytical scripts include a combination of R and Python scripts. These data include everything after the bioinformatic analysis of pooled ezRAD data to generate pooled allele frequencies.
Read the Project_info.pdf file for more information about the scripts and data.
Code/software
Wahoo_CODE_01_FST_Analyses.R is used to perform analyses of population genetic structure.
Wahoo_CODE_03a_Dadi_Sims_HPC_PBS_Gen.R and Wahoo_CODE_03b_Dadi_Sims_HPC_Slurm_Gen.R are used to generate automated scripts for running δaδi simulations on a high performance computer (HPC) cluster.
Wahoo_CODE_04_Dadi_Outputs.R is used to summarise δaδi simulations.
Wahoo_CODE_05_Dadi_Plot_SFS.py is used to generate plots of site frequency spectra from δaδi simulations.
Wahoo_CODE_06_Posthoc_FST_Shared_Loci.R is used to perform posthoc analyses of loci used in this pipeline.
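To illustrate the kind of quantity estimated in the FST analyses listed above, the following is a minimal Python sketch, not the repository's R code. It assumes a simple ratio-of-sums estimator, FST = (HT - HS) / HT, for two populations at biallelic loci, and it ignores pool size and sequencing depth, which pool-seq estimators normally correct for. The function name and the example frequencies are hypothetical.

```python
def fst_two_pops(freqs_a, freqs_b):
    """Ratio-of-sums FST = (HT - HS) / HT for two populations.

    freqs_a, freqs_b: allele frequencies (0..1) at the same biallelic loci.
    Simplified illustration only: no correction for pool size or read depth.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("Both populations need the same number of loci")
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        # Expected heterozygosity in the total (combined) population
        h_total = 2 * p_bar * (1 - p_bar)
        # Mean expected heterozygosity within the two populations
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        numerator += h_total - h_within
        denominator += h_total
    # Undefined when every locus is monomorphic across both populations
    return numerator / denominator if denominator > 0 else float("nan")


# Hypothetical frequencies for three loci in two populations
pop_1 = [0.20, 0.55, 0.90]
pop_2 = [0.25, 0.50, 0.85]
print(round(fst_two_pops(pop_1, pop_2), 4))
```

Summing numerators and denominators across loci before dividing keeps low-diversity loci from dominating the estimate, which is one common reason to prefer a ratio of sums over a mean of per-locus ratios.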
Access information
The genomic data used in this study are derived from pooled DNA samples of wahoo, with each sample representing a population. Each sample was sequenced in duplicate to provide an estimate of variance in pool-seq allele frequency estimation.
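As a rough illustration of how duplicate sequencing runs can inform on estimation variance, the sketch below computes the per-locus mean and sample variance of allele frequencies across technical replicates. This is an assumed, simplified summary in Python, not the procedure used in the study; the function name and the frequencies are hypothetical.

```python
from statistics import mean, variance


def replicate_summary(replicate_freqs):
    """Per-locus (mean, sample variance) across technical replicates.

    replicate_freqs: one list of allele frequencies per replicate,
    with loci in the same order in every replicate.
    """
    if len(replicate_freqs) < 2:
        raise ValueError("At least two replicates are needed for a variance")
    # Transpose so that each tuple holds one locus across all replicates
    per_locus = zip(*replicate_freqs)
    return [(mean(freqs), variance(freqs)) for freqs in per_locus]


# Hypothetical frequencies at three loci from two replicate libraries
replicate_1 = [0.20, 0.50, 0.81]
replicate_2 = [0.40, 0.50, 0.79]
for locus_mean, locus_var in replicate_summary([replicate_1, replicate_2]):
    print(f"mean={locus_mean:.3f} variance={locus_var:.5f}")
```

With exactly two replicates, the sample variance at a locus reduces to half the squared difference between the two frequency estimates.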
Raw ezRAD pool-seq reads and the relevant BioSample information have been uploaded to NCBI's SRA under BioProject PRJNA683059. Additional metadata for these pool-seq reads are available through GeOME (https://n2t.net/ark:/21547/DhI2).
Allele frequency data were obtained through a pooled ezRAD approach. De novo assembly of RAD contigs and variant calling were performed using the dDocent pipeline. Population allele frequencies were imputed using poolne_estim. Additional quality filtering was performed in R. Analysis of genetic differentiation was performed in R and included estimates of FST and AMOVA (analysis of molecular variance). Generation of site frequency spectra and summary of demographic analyses were performed in R. Demographic inference was performed using δaδi, originally on an HPC.
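The step from joint allele frequencies to a site frequency spectrum can be pictured with the following Python sketch. It is an assumption-laden illustration, not the repository's R implementation: each population frequency is converted to an allele count in a hypothetical sample of n chromosomes by simple rounding, and loci are tallied into a two-dimensional table. The function name, sample sizes and frequencies are all hypothetical.

```python
def joint_sfs(freqs_a, freqs_b, n_a, n_b):
    """Tally loci into an (n_a + 1) x (n_b + 1) joint site frequency spectrum.

    freqs_a, freqs_b: allele frequencies (0..1) at the same loci.
    n_a, n_b: assumed number of sampled chromosomes per population.
    Counts are obtained by rounding frequency * n (Python rounds exact
    .5 ties to the nearest even integer).
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("Both populations need the same number of loci")
    sfs = [[0] * (n_b + 1) for _ in range(n_a + 1)]
    for p_a, p_b in zip(freqs_a, freqs_b):
        if not (0 <= p_a <= 1 and 0 <= p_b <= 1):
            raise ValueError("Allele frequencies must lie between 0 and 1")
        count_a = round(p_a * n_a)
        count_b = round(p_b * n_b)
        sfs[count_a][count_b] += 1
    return sfs


# Hypothetical frequencies at four loci, projected to 10 chromosomes each
pop_1 = [0.10, 0.10, 0.52, 0.98]
pop_2 = [0.00, 0.00, 0.31, 1.00]
spectrum = joint_sfs(pop_1, pop_2, n_a=10, n_b=10)
print(spectrum[1][0])  # number of loci with 1 copy in pop 1 and 0 in pop 2
```

Rounding is the crudest possible projection; a probabilistic scheme (for example, binomial sampling of counts) would spread each locus across neighbouring cells instead of assigning it to a single one.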
All R code can be run from within the repository directory using the R project file, Wahoo_PROJ.Rproj.
Demographic analyses using δaδi must be run in a Unix environment. The scripts Wahoo_DADI_Demog_Models.py and Wahoo_DADI_Generic_Execute.py can be used to set up a pipeline for executing demographic simulations on a local system or on an HPC cluster.
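One way such a pipeline might be customised for a Slurm cluster is to generate one job script per demographic run. The Python sketch below only illustrates that general idea; the repository's own generators are the R scripts listed under Code/software. The function name, resource values and job names are hypothetical, and the command is shown without arguments because the execution script's interface is not described here (see Project_info.pdf for more information about the scripts).

```python
def slurm_script(job_name, command, hours=1, mem_gb=4, cpus=1):
    """Return the text of a minimal Slurm batch script that runs `command`."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={hours:02d}:00:00",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --cpus-per-task={cpus}",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical run labels; the real model names are defined by the repository
for label in ["run_01", "run_02"]:
    script_text = slurm_script(
        job_name=f"dadi_{label}",
        # Arguments deliberately omitted: they depend on the script's interface
        command="python Wahoo_DADI_Generic_Execute.py",
        hours=12,
        mem_gb=8,
    )
    print(script_text)
```

Each generated text could be written to its own file and submitted with `sbatch`; resource requests would need adjusting to the cluster and to the size of each optimisation run.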