Data for paper: The making of a genetic cline: introgression of oceanic genes into coastal cod populations in the North East Atlantic
Data files
Mar 05, 2021 version files 707.61 KB
- coastalcoddata.csv
- connectivity_matrix.txt
- IBD.dat
- popfile.dat
- README.txt
- report_pairwise_Fst_Vikingbank.R
- simloop.R
- simmat
- simmat.c
Abstract
The files in this data package include data, scripts and a simulation program to replicate the
simulations of genetic divergence patterns and their comparison with observations, as described in the paper.
Filename: content
coastalcoddata.csv: genotypes and metadata, described in detail below
IBD.dat: data, pairwise observed Fst among cod samples
report_pairwise_Fst_Vikingbank.R: R-script to extract relevant data from IBD.dat
connectivity_matrix.txt: output from oceanographic modelling, for simulations
popfile.dat: population data, for simulations
simmat: compiled simulation program (linux 64 bit, dynamically linked)
simmat.c: C source code for simmat simulation program
simloop.R: R-script for running simulations
(The R scripts use the R computing environment: https://www.r-project.org/)
The coastalcoddata.csv file contains SNP genotypes and metadata for individual cod (one line per individual),
coastal cod as well as reference samples:
Column name: Explanation
RefSamp: 0 = coastal cod sample, 1 = North Sea reference sample, 2 = NEAC reference sample
Region: County (prior to 2020 county revision)
Locality: Name of sample locality
Year: Year of sampling
Lat: Latitude of sample locality
Long: Longitude of sample locality
SampleID: Internal ID of sample locality and year
IndividualID: Individual number
Otype: Otolith type of individual (1 = certain CC/NS, 2 = uncertain CC/NS, 4 = uncertain NEAC, 5 = certain NEAC, where NS = North Sea cod, CC = coastal cod, NEAC = North East Arctic cod)
Missing: Number of missing genotypes for individual
cgpGmoS1026 (and so on): SNP genotype in Genepop 3-digit format (missing genotype = 000000)
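The genotype columns described above can be decoded with a few lines of code. The following is a minimal Python sketch, not part of the data package: it assumes each genotype is a 6-digit string holding two 3-digit allele codes, and that a CSV reader may have dropped leading zeros. The function names decode_genotype and count_missing are hypothetical, and the example strings are made up for illustration rather than taken from coastalcoddata.csv.

```python
from typing import Iterable, Optional, Tuple


def decode_genotype(code: str) -> Optional[Tuple[int, int]]:
    """Split a Genepop 3-digit genotype into its two allele codes.

    Returns None for the missing-genotype code 000000.
    """
    # Assumption: leading zeros may have been lost on import, so pad back to 6 digits.
    code = code.strip().zfill(6)
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"not a 3-digit Genepop genotype: {code!r}")
    if code == "000000":
        return None
    return int(code[:3]), int(code[3:])


def count_missing(genotypes: Iterable[str]) -> int:
    """Count genotypes coded as missing (000000) for one individual."""
    return sum(decode_genotype(g) is None for g in genotypes)


# Made-up genotype strings, for illustration only.
example = ["001002", "000000", "002002"]
print([decode_genotype(g) for g in example])
print(count_missing(example))
```

A per-individual count obtained this way could be compared with the Missing column as a consistency check on how the file was read.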
Methods
SNP genotyping, oceanographic modelling, computer simulations
Usage notes
Genotype and position data from coastalcoddata.csv are used to calculate pairwise genetic divergence (Fst) and geographic distances (Dist) among samples, either including all individuals or after removal of suspect NEAC individuals (those with otolith types 4 or 5). The calculated values are reported in the IBD.dat file as Fst (all individuals) and Fst.NCC (excluding suspect NEAC), respectively, with lower (CL025) and upper (CL975) confidence limits based on jackknifing over loci.
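The removal of suspect NEAC individuals can be illustrated as a simple filter on the Otype column. This is a hedged Python sketch, not the procedure used for the paper: it assumes a comma-delimited file whose header matches the column names listed above, and it keeps any row whose Otype is neither 4 nor 5 (including blank values, which is an assumption). The function name exclude_suspect_neac and the inline sample rows are hypothetical.

```python
import csv
import io

# Otolith types treated as suspect NEAC in the usage notes (4 = uncertain, 5 = certain).
SUSPECT_NEAC_OTYPES = {"4", "5"}


def exclude_suspect_neac(rows):
    """Return only the rows whose Otype is not 4 or 5."""
    return [r for r in rows if r["Otype"].strip() not in SUSPECT_NEAC_OTYPES]


# Made-up rows for illustration; the real file has many more columns.
sample = io.StringIO(
    "SampleID,IndividualID,Otype\n"
    "A,1,1\n"
    "A,2,5\n"
    "A,3,2\n"
    "B,1,4\n"
)
rows = list(csv.DictReader(sample))
kept = exclude_suspect_neac(rows)
print(len(rows), "individuals read,", len(kept), "kept")
```

To apply the same filter to the real data, the io.StringIO object would be replaced by an open file handle for coastalcoddata.csv, after checking the delimiter actually used in that file.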
Computer simulations are run from inside R by calling simloop.R, which in turn calls report_pairwise_Fst_Vikingbank.R and starts the simulation program simmat. The latter is provided both as an executable (Linux 64 bit, dynamically linked) and as C source code (simmat.c), and may need to be recompiled for the intended computer system. Programs and scripts have been tested on the Linux operating system only (Fedora distribution: https://getfedora.org/).