Data from: Genome analysis reveals three distinct lineages of the cosmopolitan white shark
Data files (version of Jul 22, 2024; 1.48 GB total):
- README.md
- Sample_info_complete.xlsx
- TGC_GWS_snps.vcf
Abstract
The white shark (Carcharodon carcharias) (Linnaeus, 1758), an iconic apex predator occurring in all oceans, is classified as Vulnerable globally, with global abundance having dropped to 63% of 1970s estimates, and as Critically Endangered in Europe. Identification of Evolutionarily Significant Units and their management are crucial for conservation, especially as the white shark faces various, often region-specific, anthropogenic threats. Here, combining target gene capture sequencing (89 individuals, 4,000 SNPs) and whole-genome re-sequencing (17 individuals, 391,000 SNPs), with worldwide sampling across most of the distributional range, we identify three genetically distinct allopatric lineages (North Atlantic, Indo-Pacific, North Pacific).
README: Data from: Genome analysis reveals three distinct lineages of the cosmopolitan white shark
https://doi.org/10.5061/dryad.8pk0p2nwh
This folder contains the data, computer code and document files for the article:
Wagner et al. 2024. Seeing the bigger picture: Genome analysis reveals three distinct lineages of the cosmopolitan white shark. Current Biology.
Comments and requests should be addressed to Galice Hoarau: galice.g.hoarau@nord.no. All material is free to use; I would appreciate being informed of its use, and ask that this dataset and the associated paper be cited where appropriate.
Description of the data and file structure
The file named Sample_info_complete.xlsx contains information on the samples used in the publication, with the following columns:
| Column | Description |
|---|---|
| Sample ID | Unique identifier, also used in the VCF file |
| Sample Group | Geographic region in which the individual was sampled |
| Sampling Location | Location from which the specimen was obtained |
| Sex | M (male), F (female), U (unknown) |
| Total Length | Total length in cm, or U (unknown) |
| DNA repaired | Indicates whether the DNA was repaired prior to library preparation (see Wagner et al. 2024) |
| WGS total read number | Total number of reads from Illumina whole-genome sequencing |
| TGC total read | Total number of reads from Illumina target gene capture sequencing |
The file named TGC_GWS_snps.vcf contains the SNP genotypes (from target gene capture) for all individuals described in Sample_info_complete.xlsx, using the same Sample IDs.
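Because the VCF and the spreadsheet share Sample IDs, a common first step is to pull the sample columns and genotypes out of the VCF. The following is a minimal, standard-library-only sketch, assuming an uncompressed, tab-delimited VCF whose FORMAT column contains a GT key; the function name `read_vcf_genotypes` and the demo sample IDs are hypothetical and not part of the dataset.

```python
import io
from typing import Dict, List, TextIO, Tuple


def read_vcf_genotypes(
    handle: TextIO,
) -> Tuple[List[str], List[Tuple[str, int]], Dict[str, List[str]]]:
    """Return sample IDs, (CHROM, POS) for each site, and per-sample GT strings.

    Assumes an uncompressed, tab-delimited VCF whose FORMAT column contains GT.
    """
    samples: List[str] = []
    sites: List[Tuple[str, int]] = []
    genotypes: Dict[str, List[str]] = {}
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if fields[0] == "#CHROM":
            samples = fields[9:]  # sample columns follow the 9 fixed VCF columns
            genotypes = {s: [] for s in samples}
            continue
        sites.append((fields[0], int(fields[1])))
        gt_index = fields[8].split(":").index("GT")  # raises ValueError if GT is absent
        for sample, value in zip(samples, fields[9:]):
            genotypes[sample].append(value.split(":")[gt_index])
    return samples, sites, genotypes


if __name__ == "__main__":
    # Tiny made-up VCF with hypothetical sample IDs S1 and S2 (not from the dataset).
    demo = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:10\t1/1:8\n"
    )
    print(read_vcf_genotypes(io.StringIO(demo)))
```

Opening TGC_GWS_snps.vcf and passing the file handle should return Sample IDs that match the Sample ID column of Sample_info_complete.xlsx, so metadata such as Sample Group can be joined on them.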
Code/Software
The file named ANGSD_LD.py is a Python script used for thinning the data, as described in Wagner et al. 2024.
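This README does not state how ANGSD_LD.py thins the data; the script name presumably refers to linkage disequilibrium (LD), and the actual procedure is described in Wagner et al. 2024. Purely to illustrate what "thinning" means, the sketch below uses a simpler, swapped-in technique: distance-based thinning, which keeps the first SNP on each scaffold and drops any later SNP lying closer than a hypothetical `min_distance` (in bp) to the last kept one. It is not the authors' script and not an LD-based method.

```python
from typing import Iterable, List, Optional, Tuple


def thin_by_distance(
    sites: Iterable[Tuple[str, int]], min_distance: int
) -> List[Tuple[str, int]]:
    """Keep SNPs so that kept SNPs on the same scaffold are >= min_distance bp apart.

    Hypothetical distance-based thinning, NOT the LD-based approach of ANGSD_LD.py.
    Input is assumed sorted by scaffold and position, as in a coordinate-sorted VCF.
    """
    kept: List[Tuple[str, int]] = []
    last_chrom: Optional[str] = None
    last_pos = 0
    for chrom, pos in sites:
        # A new scaffold always starts a new kept SNP; otherwise check the spacing.
        if chrom != last_chrom or pos - last_pos >= min_distance:
            kept.append((chrom, pos))
            last_chrom, last_pos = chrom, pos
    return kept
```

A greedy left-to-right pass like this is simple and deterministic, but unlike LD pruning it ignores whether nearby SNPs are actually correlated.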
Methods
Target capture was performed on all 89 prepared genomic libraries using a bait set designed from the white shark genome and transcriptome. The captured libraries were split into two pools, each sequenced on one quarter of an Illumina NovaSeq S4 flow cell (2x150 bp).
Target gene capture sequencing data were trimmed as described in Wagner et al. 2024, except that base-call quality thresholds were set to a Phred score of 30 and no length filter was applied. Read mapping and cleaning, single nucleotide polymorphism (SNP) calling, and hard filtering were performed following a pipeline developed in Wagner et al. 2024, using the white shark reference genome (NCBI accession number GCF_017639515.1).
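Under the standard Phred definition, Q = -10 log10(P_error), so the Q30 threshold used here corresponds to an expected error rate of 1 in 1,000 per base call. The sketch below shows that conversion and a simple 3'-end quality trim at Q30, assuming the Phred+33 (Sanger / Illumina 1.8+) FASTQ encoding. The helper names are hypothetical, and the trimming software and its other settings used in the study are those described in Wagner et al. 2024, not this code.

```python
from typing import List, Tuple

PHRED_OFFSET = 33  # assumed Phred+33 (Sanger / Illumina 1.8+) FASTQ encoding


def phred_to_error(q: float) -> float:
    """Expected base-call error probability for Phred score q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def qualities(quality_string: str) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in quality_string]


def trim_trailing(seq: str, quality_string: str, min_q: int = 30) -> Tuple[str, str]:
    """Remove bases from the 3' end while their Phred score is below min_q.

    Illustrative trailing-quality trim only; not the study's trimming pipeline.
    """
    scores = qualities(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality_string[:end]
```

For example, with qualities "II?5#" (Phred 40, 40, 30, 20, 2) the last two bases fall below Q30 and are removed, leaving a three-base read.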