Humpback whale genomes reflect the increased efficiency of commercial whaling
Data files
Nov 19, 2025 version files 2.48 GB
downsampled.temporal.all.vcf.gz (283.13 MB)
downsampled.temporal.all.vcf.gz.tbi (1.35 MB)
downsampled.temporal.final.filtered.vcf.gz (295.58 MB)
NAT_contemporary_snps.map (130.02 MB)
NAT_contemporary_snps.ped (235.10 MB)
NAT.01h.estimates.gone (1.41 MB)
NAT.05h.estimates.gone (4.16 MB)
NAT.10h.estimates.gone (1.38 MB)
NAT.contemporary.filter.vcf.gz (777.98 MB)
README.md (4.27 KB)
SOU_contemporary_data.snps.vcf.gz (479.55 MB)
SOU_contemporary_snps.map (79.36 MB)
SOU_contemporary_snps.ped (184.54 MB)
SOU.01h.estimates.gone (1.34 MB)
SOU.05h.estimates.gone (4.03 MB)
SOU.10h.estimates.gone (1.33 MB)
Abstract
Genetic diversity is declining globally, a trend that may particularly impact exploited populations that must adapt to rapid environmental change and other threats. Estimated genomic changes in effective population size mirrored known whaling history and shifts in technology. In the Southern Ocean, a comparison of genomes from historical and contemporary populations indicated that the contemporary genomes have less diversity and an elevated realised mutation load for moderately deleterious mutations, likely due to the effects of whaling. Our results demonstrate that the relatively recent, brief, and drastic depletion of humpback whale populations by whaling likely had subtle but discernible, negative, and lasting effects on the whales’ genomes. Thus, even as some humpback whale populations are now recovering to pre-exploitation numbers, they likely do so with a diminished adaptive capacity in the face of future conditions and threats.
Dataset DOI: 10.5061/dryad.37pvmcvwx
Description of the data and file structure
Data collected for the humpback whale genomic assessment in the North Atlantic and Southern Ocean.
SNP VARIANT FILES
File: SOU_contemporary_data.snps.vcf.gz
Description: Variant file for the contemporary Southern Ocean genomes only.
File: NAT.contemporary.filter.vcf.gz
Description: Variant file for the contemporary North Atlantic genomes only.
File: downsampled.temporal.all.vcf.gz (tabix index: downsampled.temporal.all.vcf.gz.tbi)
Description: Variant file for temporal comparisons using the downsampled dataset, including all samples (used for genetic clustering estimates).
File: downsampled.temporal.final.filtered.vcf.gz
Description: Variant file for temporal comparisons using the downsampled dataset, excluding samples with lower average depth (used for genetic load, heterozygosity, and ROH estimates).
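As a rough orientation for readers opening the variant files, the following Python sketch reads VCF text and reports, for each sample, the fraction of called diploid genotypes that are heterozygous. It is not the pipeline used in the study: the function names are hypothetical, it assumes a standard VCF layout with a GT entry in the FORMAT column, and it only sees sites present in the VCF, so the result is not a genome-wide heterozygosity estimate.

```python
import gzip
from collections import defaultdict


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def heterozygous_fraction(lines):
    """Return {sample: heterozygous calls / called genotypes} from VCF lines."""
    samples = []
    het = defaultdict(int)
    called = defaultdict(int)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            continue
        gt_index = fields[8].split(":").index("GT")
        for name, entry in zip(samples, fields[9:]):
            genotype = entry.split(":")[gt_index].replace("|", "/")
            alleles = genotype.split("/")
            if len(alleles) != 2 or "." in alleles:
                continue  # skip missing or non-diploid calls
            called[name] += 1
            if alleles[0] != alleles[1]:
                het[name] += 1
    return {s: het[s] / called[s] for s in samples if called[s] > 0}
```

For a file on disk, the two helpers can be combined as `with open_vcf("downsampled.temporal.final.filtered.vcf.gz") as handle: result = heterozygous_fraction(handle)`.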
GONE RESULTS
Demographic estimates from GONE.
Each file contains three columns: the first column indicates the replicate run, the second column indicates the number of generations back in time, and the third column indicates the effective population size (Ne) estimate.
File: NAT.01h.estimates.gone
Description: GONE estimates for the North Atlantic using hc=0.01.
File: NAT.05h.estimates.gone
Description: GONE estimates for the North Atlantic using hc=0.05.
File: NAT.10h.estimates.gone
Description: GONE estimates for the North Atlantic using hc=0.10.
File: SOU.01h.estimates.gone
Description: GONE estimates for the Southern Ocean using hc=0.01.
File: SOU.05h.estimates.gone
Description: GONE estimates for the Southern Ocean using hc=0.05.
File: SOU.10h.estimates.gone
Description: GONE estimates for the Southern Ocean using hc=0.10.
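The three-column layout described above (replicate, generations back in time, Ne) can be summarised with a short Python sketch such as the one below. It assumes the columns are whitespace-delimited and silently skips any line that does not parse as numbers (for example a header, if one is present); both points are assumptions, since the exact delimiter and header are not stated here. The function names are hypothetical, and the median across replicates is just one convenient summary, not necessarily the one used in the study.

```python
from collections import defaultdict
from statistics import median


def parse_gone_estimates(lines):
    """Parse (replicate, generation, Ne) records from three-column text lines."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        try:
            generation = int(float(parts[1]))
            ne = float(parts[2])
        except ValueError:
            continue  # assumed to be a header or comment line
        records.append((parts[0], generation, ne))
    return records


def median_ne_by_generation(records):
    """Return {generation: median Ne across replicates}, ordered by generation."""
    by_generation = defaultdict(list)
    for _replicate, generation, ne in records:
        by_generation[generation].append(ne)
    return {g: median(values) for g, values in sorted(by_generation.items())}
```

Reading one of the files then amounts to `with open("NAT.05h.estimates.gone") as handle: summary = median_ne_by_generation(parse_gone_estimates(handle))`.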
PLINK FILES
PLINK PED and MAP variant files for the different datasets used in the study.
File: NAT_contemporary_snps.map
Description: North Atlantic genomic coordinates for the contemporary dataset (used as input for GONE).
File: NAT_contemporary_snps.ped
Description: North Atlantic genomic variants for the contemporary dataset (used as input for GONE).
File: SOU_contemporary_snps.map
Description: Southern Ocean genomic coordinates for the contemporary dataset (used as input for GONE).
File: SOU_contemporary_snps.ped
Description: Southern Ocean genomic variants for the contemporary dataset (used as input for GONE).
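Before passing a PED/MAP pair to downstream tools, it can help to confirm that the two files agree. The Python sketch below assumes the standard PLINK text layout: a four-column MAP file (chromosome, variant ID, genetic distance, base-pair position) and a PED file with six leading columns (family ID, individual ID, father, mother, sex, phenotype) followed by two allele columns per variant. Whether the deposited files follow exactly this layout is an assumption, and the function names are hypothetical.

```python
def read_map(lines):
    """Parse PLINK MAP lines into (chrom, variant_id, cm, bp) tuples."""
    variants = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 4:
            raise ValueError(f"expected 4 MAP columns, found {len(parts)}")
        chrom, variant_id, cm, bp = parts
        variants.append((chrom, variant_id, float(cm), int(bp)))
    return variants


def parse_ped_line(line, n_variants):
    """Return (individual_id, genotypes) after checking the column count."""
    parts = line.split()
    expected = 6 + 2 * n_variants  # 6 leading columns + 2 alleles per variant
    if len(parts) != expected:
        raise ValueError(f"expected {expected} PED columns, found {len(parts)}")
    alleles = parts[6:]
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return parts[1], genotypes
```

Because the PED files are large, iterating over them line by line (rather than loading them whole) keeps memory use modest.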
Scripts and Code
Scripts used in the study are deposited in the associated Zenodo repository.
File: alignment.sh
Description: script including all steps to obtain alignment files from raw reads
File: calling.sh
Description: script including all steps to obtain SNP variants from alignment files (unfiltered VCF files)
File: filtering_genotypes.py
Description: script including all steps to obtain filtered SNP variants (filtered VCF files)
File: clustering.sh
Description: script including steps to obtain clustering results. It includes conversion steps to PLINK files and clustering analyses.
File: gone_runs.sh
Description: script including steps to obtain demographic estimates using GONE from PLINK files.
File: mutation_load_count.py
Description: script including steps to calculate mutation load counts from VCF files.
File: allele_count_sites.py
Description: script including steps to calculate derived and total alleles from VCF files.
File: rxy_calculation.py
Description: script including steps to calculate Rxy statistics from allelic counts.
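To illustrate how an Rxy statistic can be derived from per-site allelic counts, the Python sketch below uses a commonly cited formulation: with derived-allele frequencies f_x and f_y in populations X and Y, L_xy is the sum over sites of f_x(1 - f_y), L_yx is the sum of f_y(1 - f_x), and Rxy = L_xy / L_yx. This is an assumption about the general approach, not a copy of rxy_calculation.py, which may normalise or resample differently. The function name and the (derived, total) input layout are hypothetical.

```python
def rxy(counts_x, counts_y):
    """Compute Rxy from per-site (derived_count, total_count) pairs.

    counts_x and counts_y must list the same sites in the same order.
    Sites with no called alleles in either population are skipped.
    """
    l_xy = 0.0
    l_yx = 0.0
    for (derived_x, total_x), (derived_y, total_y) in zip(counts_x, counts_y):
        if total_x == 0 or total_y == 0:
            continue  # no data for this site in one population
        f_x = derived_x / total_x
        f_y = derived_y / total_y
        l_xy += f_x * (1.0 - f_y)  # derived in X, not in Y
        l_yx += f_y * (1.0 - f_x)  # derived in Y, not in X
    if l_yx == 0.0:
        raise ValueError("Rxy is undefined: denominator L_yx is zero")
    return l_xy / l_yx
```

Under this formulation, a value above 1 indicates a relative excess of derived alleles of the chosen category in population X, and a value below 1 a relative deficit.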
Access information
Other publicly accessible locations of the data:
Raw data is available under the NCBI project code: PRJNA1250286 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1250286).
Outgroup raw data is available under the NCBI BioSample code: SAMN43059233 (https://www.ncbi.nlm.nih.gov/biosample/SAMN43059233).
