Data and code from: Genetic and ecological divergence between northwest Atlantic killer whale populations
Data files
Version: May 25, 2026 (375.32 MB total)
- d13C_AA.csv (29.32 KB)
- d15N_AA.csv (31.94 KB)
- killerwhale_genomics_sample_info.csv (11.39 KB)
- killerwhale4_snps.ID.dups_kin_removed.filter1.miss25.biallel.min100kb.autosomes.hwe.maf.LDprunedr08.vcf.gz (375.21 MB)
- kw_differentiation_code.zip (33.20 KB)
- README.md (7.72 KB)
Abstract
Killer whales (Orcinus orca) exhibit substantial genetic and ecological variation across their global distribution. Differentiation between neighbouring or sympatric populations is thought to be driven by foraging specialization and social organization, which can lead to reproductive isolation and facilitate the emergence of distinct ecotypes or morphotypes. Here, we use whole-genome resequencing and compound-specific stable isotope analysis of amino acids to investigate links between genetic and ecological differentiation in two genetically distinct killer whale populations in the northwest Atlantic, specifically in the eastern Canadian Arctic and Greenland (ECAG1 and ECAG2). Essential amino acid stable carbon isotope ratios (δ13C) suggest that the populations maintain largely distinct distributions or habitat use patterns. Amino acid-specific stable nitrogen isotope ratios (δ15N) indicate ECAG1 has a higher trophic level diet than ECAG2. Previously undetected genetic substructure within the ECAG1 population revealed finer-scale genetic differentiation between individuals sampled in the eastern Canadian Arctic and those sampled in more temperate northwest Atlantic waters. However, small sample sizes prevented exploration of isotopic differentiation among them. Within ECAG1, considerable inter-annual variation in δ13C and δ15N amino acid values of seven individuals sampled across different years suggests some degree of ecological plasticity. Concurrent genetic and ecological differentiation suggests that northwest Atlantic killer whales have diverged ecologically, possibly in allopatry, and are now reproductively isolated under secondary contact, comparable to population-level differences observed in other regions. However, their degree of ecological plasticity and secondary contact within expanding Arctic ranges raise questions about whether current levels of divergence will be maintained or eroded with ongoing Arctic warming.
Dataset DOI: 10.5061/dryad.s1rn8pkpc
Description of the data and file structure
These datasets were generated from skin samples collected from free-ranging killer whales at several locations in the northwest Atlantic. They include amino acid-specific stable nitrogen and carbon isotope measurements and whole-genome resequencing data (filtered SNPs) from DNA extracted from the skin samples. Samples were collected under the permits listed in the manuscript.
Files and variables
File: d13C_AA.csv
Description: Amino acid-specific stable carbon isotope ratios.
Variables
- Location: The location (community) near which the sample was collected
- Year: The year in which the sample was collected
- Sample_date: The date (yyyy-mm-dd) on which the sample was collected
- Tissue: The tissue of the sample (for laboratory analysis of compound-specific stable isotope ratios). All stable isotope (SI) ratios were measured in skin.
- UCDavis: The sample ID at UC Davis for laboratory analysis.
- DFO_sample_id: Fisheries and Oceans Canada (DFO) sample ID.
- csia_sample_id: Alternative sample ID (some samples were re-named after initial collection for internal consistency).
- sex: individual sex determined genomically.
- duplicate_in_year: Some individuals were unintentionally re-sampled; these duplicates were later identified genomically. This column indicates whether an individual was sampled twice in the same year. 0 = no genomic duplicate, 1 = first sample collected of genomic duplicate in same year, 2 = second sample collected of genomic duplicate in same year
- duplicate_other_year: This column indicates whether an individual was sampled twice in different years. 0 = no genomic duplicate, 1 = first sample collected of genomic duplicate, 2 = second sample collected of genomic duplicate in a later year, 3 = third sample collected of genomic duplicate in a later year
- duplicate_id: DFO ID of genomic duplicate
- wgs: 0 = not sequenced, 1 = whole genome sequencing completed for this sample
- genetic_pop: indicates the genetic population (ECAG1 or ECAG2) to which the sample belongs
- pop_analysis: 1 indicates samples that were used in the population analysis, while no value indicates samples that were not (see killerwhale_genomics_sample_info.csv for more information)
- mean_se: whether the value in the subsequent amino acid columns is the sample mean or standard error
- Ala, Asx, Glx, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Tyr, Val: compound-specific stable carbon isotope ratio value measured for each amino acid (AA). Empty cells indicate that the AA was not measured for this sample (this only occurred for AAs not used in the analysis).
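Because each sample appears on a mean row and a standard-error row, and pop_analysis flags which samples enter the population analysis, a reader will usually want to filter before analysis. The sketch below is a minimal, hypothetical helper (load_mean_rows and parse_value are not part of the deposited code), assuming mean rows carry the literal label "mean" in the mean_se column; check the file and adjust mean_label if it differs. The demo rows are made-up values, not data from this study.

```python
import csv
import io


def parse_value(cell):
    """Convert a CSV cell to float; an empty cell means the AA was not measured."""
    cell = (cell or "").strip()
    return float(cell) if cell else None


def load_mean_rows(handle, aa_columns, mean_label="mean"):
    """Map DFO_sample_id -> {amino acid: value or None} for population-analysis samples.

    Assumption: mean rows hold `mean_label` (case-insensitive) in the mean_se column.
    """
    samples = {}
    for row in csv.DictReader(handle):
        if (row.get("pop_analysis") or "").strip() != "1":
            continue  # empty pop_analysis: sample not used in the population analysis
        if (row.get("mean_se") or "").strip().lower() != mean_label:
            continue  # skip standard-error rows
        samples[row["DFO_sample_id"]] = {aa: parse_value(row.get(aa)) for aa in aa_columns}
    return samples


# In-memory stand-in with made-up values; for the real file use
# open("d13C_AA.csv", newline="") instead of io.StringIO.
demo = io.StringIO(
    "DFO_sample_id,pop_analysis,mean_se,Glx,Phe\n"
    "S1,1,mean,-15.2,-20.1\n"
    "S1,1,se,0.2,0.3\n"
    "S2,,mean,-14.0,-19.0\n"
    "S3,1,mean,-14.0,\n"
)
means = load_mean_rows(demo, ["Glx", "Phe"])
```

The same helper should work for d15N_AA.csv, since both files share these metadata columns.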
File: d15N_AA.csv
Description: Amino acid-specific stable nitrogen isotope ratios.
Variables
- Location, Year, Sample_date, Tissue, UCDavis, DFO_sample_id, csia_sample_id, sex, duplicate_in_year, duplicate_other_year, duplicate_id, wgs, genetic_pop, pop_analysis, mean_se: See d13C_AA.csv for variable descriptions.
- Ala, Asx, Glx, Gly, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Tyr, Val: compound-specific stable nitrogen isotope ratio value measured for each amino acid. Empty cells indicate that the AA was not measured for this sample (this only occurred for AAs not used in the analysis).
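Trophic comparisons from amino acid δ15N are commonly based on the offset between a "trophic" amino acid (Glx) and a "source" amino acid (Phe). The sketch below shows that offset and the widely used single-TDF trophic position formula, TP = (δ15N_Glx − δ15N_Phe − β) / TDF + 1. It is an illustration only: the function names are hypothetical, and the defaults β = 3.4 and TDF = 7.6 are the commonly cited values of Chikaraishi et al. (2009), which are assumptions here and may not suit marine mammals or match the parameters used in this study.

```python
def glx_phe_offset(d15n_glx, d15n_phe):
    """Glx minus Phe d15N (per mil): 'trophic' minus 'source' amino acid."""
    return d15n_glx - d15n_phe


def trophic_position(d15n_glx, d15n_phe, beta=3.4, tdf=7.6):
    """Single-TDF trophic position estimate from Glx and Phe d15N.

    beta and tdf are placeholder literature defaults, not this study's parameters.
    """
    if tdf <= 0:
        raise ValueError("tdf must be positive")
    return (glx_phe_offset(d15n_glx, d15n_phe) - beta) / tdf + 1
```

Passing beta and tdf explicitly keeps the choice of calibration visible wherever the estimate is used.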
File: killerwhale_genomics_sample_info.csv
Description: Metadata file for the SNP data.
Variables
- genome_sample_ID: DFO sample ID
- location_name: The location (community) near which the sample was collected
- year: The year in which the sample was collected
- tissue: The tissue of the sample (for laboratory purposes). All DNA was extracted from skin.
- duplicates: DFO ID of genomic duplicate. Cell empty if sample has no genomic duplicate.
- remove_duplicates: x indicates genomic duplicates that were removed for genetic population analysis. Cell empty if sample has no genomic duplicate to remove.
- remove_closekin: x indicates close kin that were removed for genetic population analysis. Cell empty if sample has no close kin to remove.
- remove_ECAG2: Marks ECAG2 whales, which were removed for the PCA of ECAG1 only. Cell is empty if individual is part of ECAG1.
- genome_sex: individual sex determined genomically.
- est_latitude: estimated latitude of sample collection (latitude of the community near which the sample was collected)
- est_longitude: estimated longitude of sample collection (longitude of the community near which the sample was collected)
- sortedbam_size_GB: size (GB) of sorted BAM file.
- finalbam_size_GB: size (GB) of final BAM file.
- modal_coverage: modal coverage of the BAM file
- mean_coverage: mean coverage of the BAM file
- downsample: to avoid large imbalances in coverage among samples, samples with high coverage were downsampled to 19x coverage. x = downsampled samples, cell is empty if sample was not downsampled.
- downsampled_coverage: modal coverage of downsampled file
- vcf_order: sample order in VCF file
- filtered_freq_miss: sample missingness in the filtered SNP data
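The removal columns above define which genomes enter each analysis. As a minimal sketch (analysis_sets and flagged are hypothetical helpers, not from the deposited code), the following splits samples into the population-analysis set (no "x" in remove_duplicates or remove_closekin) and an ECAG1-only subset (additionally, remove_ECAG2 empty). Applying the duplicate and close-kin removals to the ECAG1-only PCA as well is an assumption. The demo rows are made up.

```python
import csv
import io


def flagged(cell):
    """A removal column counts as set when it holds 'x'; an empty cell means keep."""
    return (cell or "").strip().lower() == "x"


def analysis_sets(handle):
    """Return (population_samples, ecag1_only_samples) as lists of genome_sample_ID."""
    population, ecag1_only = [], []
    for row in csv.DictReader(handle):
        if flagged(row.get("remove_duplicates")) or flagged(row.get("remove_closekin")):
            continue  # genomic duplicate or close kin: excluded from population analysis
        sample = row["genome_sample_ID"]
        population.append(sample)
        # Assumption: ECAG1-only analyses also exclude duplicates and close kin.
        if not (row.get("remove_ECAG2") or "").strip():
            ecag1_only.append(sample)  # empty remove_ECAG2 = ECAG1 individual
    return population, ecag1_only


# Made-up rows; for the real file use open("killerwhale_genomics_sample_info.csv", newline="").
demo = io.StringIO(
    "genome_sample_ID,remove_duplicates,remove_closekin,remove_ECAG2\n"
    "A,,,\n"
    "B,x,,\n"
    "C,,x,\n"
    "D,,,x\n"
)
population, ecag1_only = analysis_sets(demo)
```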
File: killerwhale4_snps.ID.dups_kin_removed.filter1.miss25.biallel.min100kb.autosomes.hwe.maf.LDprunedr08.vcf.gz
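Description: Filtered SNP data in compressed VCF format. The file name lists the filtering steps applied (e.g., duplicates and close kin removed, biallelic autosomal SNPs, HWE and MAF filters, LD pruning); see 02.6_snp_filter_pipeline_pop_structure.sh for the exact parameters.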
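A quick way to check the VCF against the metadata (for example, comparing its sample order with the vcf_order column) is to read the sample names from the #CHROM header line and count the variant records. The sketch below uses only the Python standard library; parse_vcf_lines and read_vcf_summary are hypothetical helper names. It relies on the VCF layout of eight fixed columns plus FORMAT before the sample columns, and on bgzip output being readable as multi-member gzip.

```python
import gzip


def parse_vcf_lines(lines):
    """Return (sample_names, n_records) from an iterable of VCF text lines."""
    samples, n_records = [], 0
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information header lines
        if line.startswith("#CHROM"):
            # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT, then one column per sample.
            samples = line.rstrip("\r\n").split("\t")[9:]
        elif line.strip():
            n_records += 1  # one variant record per data line
    return samples, n_records


def read_vcf_summary(path):
    """Stream a gzip- or bgzip-compressed VCF without decompressing it to disk."""
    with gzip.open(path, "rt") as fh:
        return parse_vcf_lines(fh)


# Usage (path as deposited):
# samples, n_snps = read_vcf_summary(
#     "killerwhale4_snps.ID.dups_kin_removed.filter1.miss25.biallel.min100kb"
#     ".autosomes.hwe.maf.LDprunedr08.vcf.gz")
```

Streaming line by line keeps memory use low for the 375 MB file, at the cost of a full pass through it.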
File: kw_differentiation_code.zip
Description: Annotated code for Kucheravy et al. (2026), "Genetic and ecological divergence between northwest Atlantic killer whale populations".
Code/software
Relevant code is available here and at https://github.com/cailakucheravy/kw_differentation_ms.
1_sequence_prep folder:
- 01.2_trim_fastq.sh: Trim fastq files using program Trimmomatic (cited in manuscript).
- 01.3_merge_fastq.sh: Merge trimmed fastq files.
- 01.4_map_fastq.sh: Map merged fastq files to the killer whale reference genome (cited in manuscript) using programs bwa v0.7.17 and samtools v1.12.
- 01.5_index_bams.sh: Index BAM files using programs bwa v0.7.17 and samtools v1.12.
- 01.6_delete_duplicates.sh: Remove duplicate reads from BAM files using program Picard 2.20.6.
- 01.7_add_read_groups.sh: Add read group information to BAM files using programs Picard 2.20.6 and samtools v1.12.
- 01.8_check_coverages.sh: Check modal coverage of BAM files using programs bwa v0.7.17 and samtools v1.12.
- 01.9_downsample_bams.sh: Downsample BAM files (if necessary) using program GATK v4.1.2.0, and re-check modal coverage.
- bam_coverage.sh: Additional file required to check modal coverage.
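The coverage steps above rely on two simple quantities: the modal (most frequent) per-site depth, and the fraction of reads to keep so that a high-coverage sample drops to the 19x target mentioned for the downsample column. The sketch below is not the logic of bam_coverage.sh or 01.9_downsample_bams.sh; modal_coverage and downsample_probability are hypothetical names, and ignoring zero-depth sites when taking the mode is an assumption. Per-site depths could come, for example, from samtools depth output.

```python
from collections import Counter


def modal_coverage(depths, min_depth=1):
    """Most frequent per-site depth; sites below min_depth (e.g. zero) are ignored (assumption)."""
    counts = Counter(d for d in depths if d >= min_depth)
    if not counts:
        raise ValueError("no sites at or above min_depth")
    # Highest count wins; ties go to the lower depth so the result is deterministic.
    return min(counts, key=lambda d: (-counts[d], d))


def downsample_probability(current_coverage, target_coverage=19.0):
    """Fraction of reads to keep so expected coverage falls to target; 1.0 = no downsampling."""
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    return min(1.0, target_coverage / current_coverage)
```

Keeping each read independently with probability p scales the expected coverage by p, which is why the target divided by the current coverage gives the retention fraction.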
2_snp_prep folder:
- 02.1_call_variants.sh: Call genomic variants from BAM files against the reference genome (cited in manuscript) using programs gcc 7.3.0 and platypus 0.8.1.
- 02.2_snps_stats.sh: Check SNP statistics using program vcftools.
- 02.3_check_snps_stats.R: Check SNP statistics using output of "02.2_snps_stats.sh" in R v4.4.1.
- 02.4_kinship_files.sh: Calculate pairwise kinship coefficients between genomes using program Plink.
- 02.5_examine_kinship.R: Check kinship using output of "02.4_kinship_files.sh" in R v4.4.1.
- 02.6_snp_filter_pipeline_pop_structure.sh: Filter SNP files for quality using programs bcftools v1.9, gatk v4.1.9, and Plink.
- 02.7_difcover.sh: Determine sex genomically using DifCover.
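Once pairwise kinship coefficients are available, close kin must be thinned so that no retained pair is too closely related. The sketch below shows one greedy approach: repeatedly drop the individual involved in the most above-threshold pairs. It is not the procedure in 02.5_examine_kinship.R; close_kin_to_remove is a hypothetical name, and the default threshold of 0.177 (the lower bound for first-degree relatives on the KING kinship scale) is an assumption rather than the study's cutoff.

```python
from collections import defaultdict


def close_kin_to_remove(pairs, threshold=0.177):
    """Greedily pick individuals to drop so no retained pair has kinship >= threshold.

    pairs: iterable of (id1, id2, kinship). Threshold is an assumed cutoff.
    """
    edges = [(a, b) for a, b, kinship in pairs if kinship >= threshold]
    removed = set()
    while True:
        active = [(a, b) for a, b in edges if a not in removed and b not in removed]
        if not active:
            return removed
        degree = defaultdict(int)
        for a, b in active:
            degree[a] += 1
            degree[b] += 1
        # Drop the individual in the most close-kin pairs; ties broken alphabetically.
        removed.add(min(degree, key=lambda s: (-degree[s], s)))
```

The greedy rule tends to keep more individuals than dropping one member of each pair arbitrarily, but it is not guaranteed to find the smallest possible removal set.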
3_analysis folder:
- 03.1_kw_popstructure.Rmd: Analysis of killer whale genetic population structure in R v4.4.1.
- 03.2_d13C_CSIA_analysis.Rmd: Analysis of killer whale d13C CSIA-AA data in R v4.4.1.
- 03.3_d15N_CSIA_analysis.Rmd: Analysis of killer whale d15N CSIA-AA data in R v4.4.1.
- 03.4_CSIA_LDA.Rmd: Linear discriminant analysis of killer whale d13C and d15N CSIA-AA data in R v4.4.1.
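The abstract's habitat inference rests on essential amino acid (EAA) δ13C values. As an illustration only, the sketch below summarises one sample's EAA δ13C as a mean and as mean-centred values, a normalization used in some EAA fingerprinting work to remove a shared offset so patterns among amino acids can be compared. The EAA set lists the amino acids in d13C_AA.csv that are nutritionally essential for mammals; the subset used in 03.2_d13C_CSIA_analysis.Rmd may differ, and eaa_profile is a hypothetical name.

```python
# Mammalian essential amino acids among the d13C_AA.csv columns (tryptophan is not measured).
# Assumption: the study's own EAA subset may differ.
ESSENTIAL_AAS = ("His", "Ile", "Leu", "Lys", "Met", "Phe", "Thr", "Val")


def eaa_profile(values, eaas=ESSENTIAL_AAS):
    """Return (mean EAA d13C, mean-centred EAA values) for one sample.

    values: mapping of amino acid -> d13C (per mil), or None when not measured.
    EAAs that are missing or None are skipped.
    """
    present = {aa: values[aa] for aa in eaas if values.get(aa) is not None}
    if not present:
        raise ValueError("no essential amino acid values available")
    mean = sum(present.values()) / len(present)
    return mean, {aa: v - mean for aa, v in present.items()}
```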
Access information
Other publicly accessible locations of the data:
- Raw genomic data is available on NCBI BioProject PRJNA986581.
