Data and code from: Whole-genome-sequencing reveals demographic history and patterns of parallel adaptive evolution in Indo-Pacific bottlenose dolphins (Tursiops aduncus) across coastal Australian seascapes

Published May 26, 2026 on Dryad. https://doi.org/10.5061/dryad.wwpzgmt0q

Data files

May 26, 2026 version files 2.57 GB

README.md (4.10 KB)
SampleMetadata_TableS1_n164.csv (22.39 KB)
VCF_Taduncus_hardfiltering_n156.vcf.gz (897.22 MB)
VCF_Taduncus_nonrel_n146.vcf.gz (849.08 MB)
VCF_Taduncus_nonrel_noSA_n142.vcf.gz (825.89 MB)

Abstract

Understanding how demographic dynamics interact with environmental heterogeneity is central to explaining patterns of genomic variation in the marine realm. Indo-Pacific bottlenose dolphins (Tursiops aduncus) occur along most of the Australian coastline, spanning tropical to temperate environments with pronounced differences in temperature, salinity, and productivity. Using whole-genome sequencing, we examined population genetic structure, demographic history, and adaptive divergence at a continental scale. Genome-wide variation reflected geographic patterns consistent with a northern origin and subsequent colonization along both coastlines, while putatively adaptive loci suggested parallel responses to environmental conditions across regions.

This dataset contains variant call format (VCF) files used in these analyses.

Description of the data and file structure

Overview

This dataset contains SNP data derived from whole-genome sequencing of Indo-Pacific bottlenose dolphins (Tursiops aduncus) sampled across the Australian coastline. The data were used to investigate population structure, demographic history, and adaptive divergence across tropical to temperate marine environments.

Files

1. Sample metadata

File: SampleMetadata_TableS1_n164.csv

Metadata for all sequenced individuals (n = 164).

The metadata file corresponds to the sample metadata associated with the raw sequencing data deposited on NCBI. Missing information is denoted by "n/a".

GPS coordinates have been generalised to the centroid of the nearest 0.1° grid cell following GBIF sensitive species best practices (https://docs.gbif.org/sensitive-species-best-practices/master/en/#s-generalization).
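To illustrate what this generalisation implies for the precision of the coordinates, here is a minimal sketch that snaps a coordinate to the centre of a 0.1° cell. It assumes cells aligned to whole tenths of a degree and uses the cell containing the point. The function name and example coordinates are hypothetical, and the procedure actually applied to this dataset may differ in detail.

```python
import math

CELL_DEG = 0.1  # grid resolution stated in this README


def generalise(coord_deg):
    """Snap a decimal-degree coordinate to the centre of its 0.1-degree cell.

    Assumption: cells are aligned to whole tenths of a degree. Inputs lying
    exactly on a cell edge can be sensitive to floating-point rounding.
    """
    cell_index = math.floor(coord_deg / CELL_DEG)
    # Round to two decimals to remove floating-point noise from the product.
    return round((cell_index + 0.5) * CELL_DEG, 2)


# Made-up coordinates, not taken from the dataset.
print(generalise(-34.927), generalise(138.601))
```

Coordinates in the metadata file should therefore be read as accurate to roughly 0.1°, not as exact sampling positions.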

Columns:

sample_id – unique identifier used in VCF files and all analyses (primary ID)
GPS.S – latitude of sampling location
GPS.E – longitude of sampling location
location_short – abbreviated sampling site name
location_long – full sampling site name
sex – sex of the individual (if known)
state – Australian state or region
coast – coastline grouping (e.g. west, east, south)
date.taken – sampling date
ABTC-Number – identifier from the Australian Bio Tissue Collection (ABTC)
SA_Museum ID – museum identifier (if applicable)
Sample_ID – museum sample identifier (distinct from the primary sample_id column)
Sequencing Run – sequencing batch/run information
BioProject – NCBI BioProject accession
failed_filter_step – filtering step at which a sample was excluded ("n/a" if the sample passed all filtering steps)
reason – reason for exclusion (e.g. low coverage, relatedness)
IMCRA Bioregion – bioregional classification based on IMCRA
IMCRA Water Type – water type classification (e.g. tropical, subtropical, temperate)
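As a usage sketch for the columns above, the following reads the metadata with Python's standard csv module, maps "n/a" to None, and lists the samples that passed all filtering steps. It assumes the file is comma-delimited and that its header row uses the column names exactly as listed here. The function names and demo rows are hypothetical.

```python
import csv
import io


def read_metadata(handle):
    """Parse the metadata CSV into a list of dicts, mapping "n/a" to None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({key: (None if value == "n/a" else value)
                     for key, value in row.items()})
    return rows


def retained_ids(rows):
    """Return sample_id values for rows with no recorded failed filter step."""
    return [row["sample_id"] for row in rows
            if row["failed_filter_step"] is None]


# Tiny made-up example (not real records from the dataset).
demo = io.StringIO(
    "sample_id,failed_filter_step,reason\n"
    "demo_A,n/a,n/a\n"
    "demo_B,hypothetical_step,low coverage\n"
)
print(retained_ids(read_metadata(demo)))
```

With the real file, the in-memory demo would be replaced by something like `open("SampleMetadata_TableS1_n164.csv", newline="")`. The file encoding is not stated in this README, so it may need to be set explicitly.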

2. VCF: post-QC dataset (n = 156)

File: VCF_Taduncus_hardfiltering_n156.vcf.gz

Quality- and linkage-filtered SNP dataset including all individuals retained after initial filtering (n = 156).

Low-coverage and poor-quality samples were excluded prior to genotype calling (see metadata columns failed_filter_step and reason).

3. VCF: main analysis dataset (n = 146)

File: VCF_Taduncus_nonrel_n146.vcf.gz

Dataset used for main analyses (n = 146).

Excludes closely related individuals.

4. VCF: dataset excluding South Australia (n = 142)

File: VCF_Taduncus_nonrel_noSA_n142.vcf.gz

Dataset excluding South Australian samples (n = 142).

Used for sensitivity analyses.

Data processing summary

Whole-genome sequencing followed by variant calling
Low-quality and low-coverage samples were excluded prior to genotype calling
SNP datasets were filtered for quality and linkage
Additional filtering steps: removal of closely related individuals (main dataset); exclusion of South Australian samples (sensitivity analysis)
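The file names suggest that each dataset is a subset of the previous one (164 sequenced, 156 post-QC, 146 non-related, 142 without South Australia), so 8, 10, and 4 samples are dropped at successive steps. The nesting is an assumption inferred from the names and counts, not something this README states explicitly. A small sketch for checking it on lists of sample IDs, with a hypothetical function name and made-up IDs:

```python
def dropped_between(larger, smaller):
    """Return IDs present in `larger` but absent from `smaller`.

    Raises ValueError if `smaller` holds IDs that `larger` lacks, which would
    contradict the nesting assumed here.
    """
    larger_set, smaller_set = set(larger), set(smaller)
    unexpected = smaller_set - larger_set
    if unexpected:
        raise ValueError(f"not a subset; unexpected IDs: {sorted(unexpected)}")
    return sorted(larger_set - smaller_set)


# Made-up IDs standing in for the sample lists of two successive datasets.
post_qc = ["demo_A", "demo_B", "demo_C"]
non_related = ["demo_A", "demo_C"]
print(dropped_between(post_qc, non_related))
```

Applied to the real sample lists, the lengths of the returned lists can be compared against the expected differences of 8, 10, and 4.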

Full details of data processing and filtering are provided in the associated manuscript.

External resources

Raw sequencing data

NCBI Sequence Read Archive (SRA)

BioProjects:

PRJNA1287052
PRJNA1291464

Analysis scripts and workflows

https://github.com/svenjamar/TursiopsAduncusWGS

Notes

Sample IDs are consistent across all VCF and metadata files
The sample_id column should be used to match individuals between metadata and VCF files
Only SNPs passing filtering criteria are included
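A minimal sketch of the matching step, using only the Python standard library: it reads the sample names from the `#CHROM` header line of a VCF (where, per the VCF format, sample columns follow the nine fixed columns) and compares them with IDs from the metadata. The function names and demo content are hypothetical, and the tiny in-memory VCF is made up for illustration.

```python
import gzip
import io


def vcf_sample_ids(handle):
    """Return sample names from the #CHROM header line of a text-mode VCF."""
    for line in handle:
        if line.startswith("#CHROM"):
            # The first nine tab-separated columns are fixed; samples follow.
            return line.rstrip("\r\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached variant records without seeing the header line
    raise ValueError("no #CHROM header line found")


def compare_ids(vcf_ids, metadata_ids):
    """Return (IDs only in the VCF, IDs only in the metadata), both sorted."""
    vcf_set, meta_set = set(vcf_ids), set(metadata_ids)
    return sorted(vcf_set - meta_set), sorted(meta_set - vcf_set)


# Made-up gzip-compressed VCF held in memory, standing in for a .vcf.gz file.
demo_vcf = gzip.compress(
    b"##fileformat=VCFv4.2\n"
    b"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tdemo_A\tdemo_B\n"
)
with gzip.open(io.BytesIO(demo_vcf), "rt") as handle:
    ids = vcf_sample_ids(handle)
print(compare_ids(ids, ["demo_A", "demo_B", "demo_C"]))
```

For the real data, `gzip.open("VCF_Taduncus_nonrel_n146.vcf.gz", "rt")` would replace the in-memory demo. Because the metadata covers all 164 sequenced individuals, IDs found only in the metadata are expected and correspond to samples excluded from that VCF.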

Contact

Svenja Marfurt – svenja.marfurt@iea.uzh.ch
Adrien Tran Lu Y – adrien.tranluy@iea.uzh.ch