Data and code from: Whole-genome-sequencing reveals demographic history and patterns of parallel adaptive evolution in Indo-Pacific bottlenose dolphins (Tursiops aduncus) across coastal Australian seascapes
Data files
Version of May 26, 2026 (total size 2.57 GB):
- README.md (4.10 KB)
- SampleMetadata_TableS1_n164.csv (22.39 KB)
- VCF_Taduncus_hardfiltering_n156.vcf.gz (897.22 MB)
- VCF_Taduncus_nonrel_n146.vcf.gz (849.08 MB)
- VCF_Taduncus_nonrel_noSA_n142.vcf.gz (825.89 MB)
Abstract
Understanding how demographic dynamics interact with environmental heterogeneity is central to explaining patterns of genomic variation in the marine realm. Indo-Pacific bottlenose dolphins (Tursiops aduncus) occur along most of the Australian coastline, spanning tropical to temperate environments with pronounced differences in temperature, salinity, and productivity. Using whole-genome sequencing, we examined population genetic structure, demographic history, and adaptive divergence at a continental scale. Genome-wide variation reflected geographic patterns consistent with a northern origin and subsequent colonization along both coastlines, while putatively adaptive loci suggested parallel responses to environmental conditions across regions.
This dataset contains variant call format (VCF) files used in these analyses.
Description of the data and file structure
Overview
This dataset contains SNP data derived from whole-genome sequencing of Indo-Pacific bottlenose dolphins (Tursiops aduncus) sampled across the Australian coastline. The data were used to investigate population structure, demographic history, and adaptive divergence across tropical to temperate marine environments.
Files
1. Sample metadata
File: SampleMetadata_TableS1_n164.csv
Metadata for all sequenced individuals (n = 164).
The metadata file corresponds to the sample metadata associated with the raw sequencing data deposited on NCBI. Missing values are denoted by "n/a".
GPS coordinates have been generalised to the centroid of the nearest 0.1° grid cell following GBIF sensitive species best practices (https://docs.gbif.org/sensitive-species-best-practices/master/en/#s-generalization).
Columns:
- sample_id – unique identifier used in VCF files and all analyses (primary ID)
- GPS.S – latitude of sampling location
- GPS.E – longitude of sampling location
- location_short – abbreviated sampling site name
- location_long – full sampling site name
- sex – sex of the individual (if known)
- state – Australian state or region
- coast – coastline grouping (e.g. west, east, south)
- date.taken – sampling date
- ABTC-Number – identifier from the Australian Bio Tissue Collection (ABTC)
- SA_Museum ID – museum identifier (if applicable)
- Sample_ID – museum sample identifier (not to be confused with sample_id, the primary ID)
- Sequencing Run – sequencing batch/run information
- BioProject – NCBI BioProject accession
- failed_filter_step – filtering step at which a sample was excluded ("n/a" if the sample passed all filter steps)
- reason – reason for exclusion (e.g. low coverage, relatedness)
- IMCRA Bioregion – bioregional classification based on IMCRA
- IMCRA Water Type – water type classification (e.g. tropical, subtropical, temperate)
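The column layout above can be read with Python's standard library. The following is a minimal sketch, not part of the original workflow: it assumes the file is comma-delimited, UTF-8 encoded, and has a header row whose names match the list above. The function name `load_metadata` is hypothetical.

```python
import csv


def load_metadata(path):
    """Return {sample_id: row}, with "n/a" cells converted to None."""
    samples = {}
    # Assumption: UTF-8 text; "utf-8-sig" also tolerates an optional byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            # The README states that missing values are written as "n/a".
            cleaned = {
                key: (None if value == "n/a" else value)
                for key, value in row.items()
            }
            samples[cleaned["sample_id"]] = cleaned
    return samples
```

Calling `load_metadata("SampleMetadata_TableS1_n164.csv")` should return one entry per sequenced individual, keyed by sample_id, provided the assumptions above hold.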
2. VCF: post-QC dataset (n = 156)
File: VCF_Taduncus_hardfiltering_n156.vcf.gz
Quality- and linkage-filtered SNP dataset including all individuals retained after initial filtering (n = 156).
Low-coverage and poor-quality samples were excluded prior to genotype calling (see metadata columns failed_filter_step and reason).
3. VCF: main analysis dataset (n = 146)
File: VCF_Taduncus_nonrel_n146.vcf.gz
Dataset used for main analyses (n = 146).
Excludes closely related individuals.
4. VCF: dataset excluding South Australia (n = 142)
File: VCF_Taduncus_nonrel_noSA_n142.vcf.gz
Dataset excluding South Australian samples (n = 142).
Used for sensitivity analyses.
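To confirm which individuals a given VCF contains without loading any genotypes, the sample names can be read from the header. The sketch below relies on the standard VCF layout, in which the `#CHROM` header line lists nine fixed columns followed by one column per sample. It assumes the files can be read with Python's `gzip` module, and the function name `vcf_sample_ids` is hypothetical.

```python
import gzip


def vcf_sample_ids(path):
    """Return the sample names from the #CHROM header line of a gzipped VCF."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Nine fixed columns (CHROM through FORMAT) precede the sample names.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached variant records without seeing a header line
    raise ValueError(f"no #CHROM header line found in {path}")
```

If the files match the descriptions above, the returned lists should contain 156, 146, and 142 names for the three VCFs respectively.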
Data processing summary
- Whole-genome sequencing followed by variant calling
- Low-quality and low-coverage samples were excluded prior to genotype calling
- SNP datasets were filtered for quality and linkage
- Additional filtering steps:
  - Removal of closely related individuals (main dataset)
  - Exclusion of South Australian samples (sensitivity analysis)
Full details of data processing and filtering are provided in the associated manuscript.
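As a quick check on the filtering summary above, the exclusions recorded in the metadata can be tallied by step. This is a minimal sketch under stated assumptions: the metadata file is a comma-delimited CSV with a failed_filter_step column as described, the values in that column are whatever labels the authors used (they are not documented here), and the label "passed" is introduced only for this illustration. The function name `tally_filter_steps` is hypothetical.

```python
import csv
from collections import Counter


def tally_filter_steps(path):
    """Count samples per failed_filter_step value; "n/a" is reported as "passed"."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            step = row["failed_filter_step"]
            # "passed" is our own label for samples that cleared every filter step.
            counts["passed" if step == "n/a" else step] += 1
    return counts
```

The counts returned for `SampleMetadata_TableS1_n164.csv` should sum to 164 if the file contains one row per sequenced individual.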
External resources
Raw sequencing data
NCBI Sequence Read Archive (SRA)
BioProjects:
- PRJNA1287052
- PRJNA1291464
Analysis scripts and workflows
https://github.com/svenjamar/TursiopsAduncusWGS
Notes
- Sample IDs are consistent across all VCF and metadata files
- The sample_id column should be used to match individuals between metadata and VCF files
- Only SNPs passing filtering criteria are included
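The matching described in these notes can be sketched as a set comparison between the sample_id values in the metadata and the sample names in a VCF header. The function name `compare_sample_sets` is hypothetical, and how the two ID lists are obtained is left to the reader.

```python
def compare_sample_sets(metadata_ids, vcf_ids):
    """Split IDs into those shared, found only in the VCF, or only in the metadata."""
    metadata_ids, vcf_ids = set(metadata_ids), set(vcf_ids)
    return {
        "shared": sorted(metadata_ids & vcf_ids),
        "vcf_only": sorted(vcf_ids - metadata_ids),
        "metadata_only": sorted(metadata_ids - vcf_ids),
    }
```

Because the metadata covers all 164 sequenced individuals while each VCF holds a filtered subset, a non-empty "metadata_only" list is expected. A non-empty "vcf_only" list would point to an ID mismatch worth investigating.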
Contact
- Svenja Marfurt – svenja.marfurt@iea.uzh.ch
- Adrien Tran Lu Y – adrien.tranluy@iea.uzh.ch
