Data from: Genome analysis reveals three distinct lineages of the cosmopolitan white shark
Data files
Jul 22, 2024 version files 1.48 GB
Abstract
The white shark (Carcharodon carcharias) (Linnaeus, 1758), an iconic apex predator occurring in all oceans, is classified as Vulnerable globally, with global abundance having dropped to 63% of 1970s estimates, and as Critically Endangered in Europe. Identification of Evolutionarily Significant Units and their management are crucial for conservation, especially as the white shark faces various but often region-specific anthropogenic threats. Here, combining target gene capture sequencing (89 individuals, 4,000 SNPs) and whole genome re-sequencing (17 individuals, 391,000 SNPs) with worldwide sampling across most of the distributional range, we identify three genetically distinct allopatric lineages (North Atlantic, Indo-Pacific, North Pacific).
README: Data from: Genome analysis reveals three distinct lineages of the cosmopolitan white shark
https://doi.org/10.5061/dryad.8pk0p2nwh
This folder contains the data, computer code and document files for the article:
Wagner et al 2024. Seeing the bigger picture: Genome analysis reveals three distinct lineages of the cosmopolitan white shark. Current Biology
Comments and requests should be addressed to Galice Hoarau: galice.g.hoarau@nord.no. All material is free to use, but I would appreciate being told, and having this dataset and the matching paper cited where appropriate.
Description of the data and file structure
The file named Sample_info_complete.xlsx contains the information relative to the samples used in the publication.
Column | Description |
---|---|
Sample ID | Unique identifier, also used in the VCF file |
Sample Group | Geographic region where the individual was sampled |
Sampling Location | Location where the specimen was obtained |
Sex | M(ale), F(emale), U(nknown) |
Total Length | In cm; U(nknown) |
DNA repaired | Indicates whether the DNA was repaired prior to library preparation (see Wagner et al 2024) |
WGS total read number | Total number of reads from Illumina whole genome sequencing |
TGC total read | Total number of reads from Illumina target gene capture sequencing |
The file named TGC_GWS_snps.vcf contains the SNP genotypes (target gene capture) for all the individuals described in Sample_info_complete.xlsx, using the same Sample ID.
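Because the VCF and the sample sheet share the same Sample ID, a quick consistency check can be useful before analysis. The following is a minimal sketch, not part of the deposited code: it reads the sample names from the `#CHROM` header line of a VCF (the first nine columns are fixed by the VCF format, and sample names follow) and compares them with a list of IDs. The function names and the demo IDs (`GWS_A`, `GWS_B`, `GWS_C`) are hypothetical and only serve the illustration.

```python
import io


def vcf_sample_ids(handle):
    """Return the sample IDs listed on the #CHROM header line of a VCF."""
    for line in handle:
        if line.startswith("#CHROM"):
            # Columns 1-9 are CHROM..FORMAT; everything after is a sample name.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached data records without seeing a header line
    raise ValueError("no #CHROM header line found")


def compare_ids(vcf_ids, sheet_ids):
    """Return (IDs only in the VCF, IDs only in the sample sheet), both sorted."""
    vcf_set, sheet_set = set(vcf_ids), set(sheet_ids)
    return sorted(vcf_set - sheet_set), sorted(sheet_set - vcf_set)


# Tiny in-memory VCF with hypothetical sample names, for demonstration only.
demo_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tGWS_A\tGWS_B\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n"
)
ids = vcf_sample_ids(demo_vcf)
only_in_vcf, only_in_sheet = compare_ids(ids, ["GWS_A", "GWS_B", "GWS_C"])
print(ids, only_in_vcf, only_in_sheet)
```

With the real files, the handle would come from `open("TGC_GWS_snps.vcf")`, and the sheet IDs could be read from Sample_info_complete.xlsx with a spreadsheet reader of your choice.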
The file named ANGSD_LD.py is a Python script used for thinning the data as described in Wagner et al 2024
Code/Software
The file named ANGSD_LD.py is a Python script used for thinning the data
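For readers unfamiliar with thinning, the sketch below illustrates one generic approach, greedy pruning on pairwise linkage disequilibrium. It is an assumption-laden illustration and does not reproduce ANGSD_LD.py: the function name, the input layout (a dictionary of pairwise r² values) and the 0.2 threshold are all hypothetical. Consult the script itself and Wagner et al 2024 for the procedure actually used.

```python
def greedy_ld_thin(snps, pair_r2, max_r2=0.2):
    """Greedily keep SNPs that are not in strong LD with any SNP already kept.

    snps    -- SNP identifiers in genomic order
    pair_r2 -- dict mapping frozenset({snp_a, snp_b}) to their r^2 value;
               pairs absent from the dict are treated as unlinked (r^2 = 0)
    max_r2  -- hypothetical threshold above which two SNPs count as linked
    """
    kept = []
    for snp in snps:
        linked = any(
            pair_r2.get(frozenset((snp, other)), 0.0) > max_r2 for other in kept
        )
        if not linked:
            kept.append(snp)
    return kept


# Hypothetical toy input: s2 is tightly linked to s1, s3 to s2, s4 weakly to s3.
toy_r2 = {
    frozenset(("s1", "s2")): 0.9,
    frozenset(("s2", "s3")): 0.8,
    frozenset(("s3", "s4")): 0.1,
}
thinned = greedy_ld_thin(["s1", "s2", "s3", "s4"], toy_r2)
print(thinned)
```

In this toy case s2 is dropped because of its link to s1, while s3 is retained because s2, its only strongly linked neighbour, was already removed.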
Methods
Target capture was performed on all 89 prepared genomic libraries using a bait set designed on the white shark genome and transcriptome. The captured libraries were split into two pools, which were sequenced on 1/4 of an Illumina NovaSeq S4 2x150 bp flow cell each.
Target gene capture sequencing data were trimmed as described in Wagner et al. 2024, except that the base call quality threshold was set to a Phred score of 30 and no length filter was applied. Read mapping and cleaning, single nucleotide polymorphism (SNP) calling, and hard filtering were performed following a pipeline developed in Wagner et al. 2024, using the white shark reference genome (NCBI accession number GCF_017639515.1).
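To make the quality threshold concrete: a Phred score Q corresponds to an estimated base-calling error probability of 10^(-Q/10), so Q = 30 means roughly one error in 1,000 bases. The short sketch below shows that conversion and how scores are decoded from a FASTQ quality string. It assumes the Phred+33 ASCII offset commonly used for current Illumina FASTQ files, and the helper names are hypothetical. It is not the trimming tool used in the study.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the estimated error probability."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming the Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def bases_below(quality_string, threshold=30):
    """Return 0-based positions whose quality is below the threshold."""
    scores = decode_phred33(quality_string)
    return [i for i, q in enumerate(scores) if q < threshold]


print(phred_to_error_prob(30))   # error probability at Q30
print(decode_phred33("?I5"))     # '?' -> 30, 'I' -> 40, '5' -> 20
print(bases_below("?I5"))        # only the last base falls below Q30
```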