North Pacific harbor porpoise SNP and microhaplotype genotypes, mitochondrial control region haplotype sequences

Cite this dataset

Morin, Phillip et al. (2021). North Pacific harbor porpoise SNP and microhaplotype genotypes, mitochondrial control region haplotype sequences [Dataset]. Dryad.


Harbor porpoises in the North Pacific are found in coastal waters from southern California to Japan, but population structure is poorly known outside of a few local areas. We used multiplexed amplicon sequencing of 292 loci and genotyped clusters of SNPs as microhaplotypes (N=271 samples) in addition to mtDNA sequence data (N=413 samples), to examine the genetic structure from samples collected along the Pacific coast and inland waterways from California to southern British Columbia. We confirmed an overall pattern of strong isolation-by-distance, suggesting that individual dispersal is restricted. We also found evidence of regions where genetic differences are larger than expected based on geographic distance alone, implying current or historical barriers to gene flow. In particular, the southernmost population in California is genetically distinct (FST = 0.02 (microhaplotypes); 0.31 (mtDNA)), with both reduced genetic variability and high frequency of an otherwise rare mtDNA haplotype. At the northern end of our study range, we found significant genetic differentiation of samples from the Strait of Georgia, previously identified as a potential biogeographic boundary or secondary contact zone between harbor porpoise populations. Association of microhaplotypes with remotely-sensed environmental variables indicated potential local adaptation, especially at the southern end of the species’ range. These results inform conservation and management for this nearshore species, illustrate the value of genomic methods for detecting patterns of genetic structure within a continuously distributed marine species, and highlight the power of microhaplotype genotyping for detecting genetic structure in harbor porpoises despite reliance on poor-quality samples.


Amplicon libraries were prepared following the GT-seq protocol, including the optional Exo-SAP pre-treatment of the samples (Campbell et al., 2015), and pooled libraries were sequenced on an Illumina NextSeq500 sequencer with 1x150 bp reads. Custom scripts for processing GT-seq data (Campbell et al., 2015) were used to demultiplex the sample files and conduct preliminary genotyping. Genotypes were quality-checked for duplicate samples, percent missing genotypes per locus and per sample, and percent homozygosity using the strataG package in R.
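The per-locus and per-sample quality checks described above can be illustrated with a minimal sketch. This is not the strataG code used for this dataset: the data layout, the function names, and the toy genotypes below are all assumptions made for illustration, and the sketch is written in plain Python rather than R.

```python
# Hypothetical toy data: sample -> locus -> (allele1, allele2),
# or None where the genotype call is missing.
genotypes = {
    "s1": {"loc1": ("A", "A"), "loc2": ("A", "G"), "loc3": None},
    "s2": {"loc1": ("A", "T"), "loc2": ("G", "G"), "loc3": ("C", "C")},
    "s3": {"loc1": None, "loc2": ("A", "G"), "loc3": None},
}


def pct_missing_per_sample(genotypes):
    """Percent of loci with no genotype call, for each sample."""
    return {
        sample: 100.0 * sum(call is None for call in calls.values()) / len(calls)
        for sample, calls in genotypes.items()
    }


def pct_missing_per_locus(genotypes):
    """Percent of samples with no genotype call, for each locus."""
    loci = sorted({locus for calls in genotypes.values() for locus in calls})
    n_samples = len(genotypes)
    return {
        locus: 100.0
        * sum(calls.get(locus) is None for calls in genotypes.values())
        / n_samples
        for locus in loci
    }


def pct_homozygous_per_sample(genotypes):
    """Percent of called genotypes that are homozygous, for each sample."""
    result = {}
    for sample, calls in genotypes.items():
        called = [call for call in calls.values() if call is not None]
        # Samples with no called genotypes get None rather than a percentage.
        result[sample] = (
            100.0 * sum(a == b for a, b in called) / len(called) if called else None
        )
    return result


missing_by_sample = pct_missing_per_sample(genotypes)
missing_by_locus = pct_missing_per_locus(genotypes)
homozygosity = pct_homozygous_per_sample(genotypes)
```

Samples or loci whose values exceed a chosen cutoff would then be candidates for exclusion; the cutoffs applied in this study are not stated here, so none are assumed in the sketch.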

Microhaplotypes were generated for all loci using the R package MicrohaPlot (Baetscher et al., 2017). The MicrohaPlot algorithm inserts N’s for missing sequence data at SNPs within haplotypes, so we used custom R scripts (supplemental materials) to identify SNPs with >10% N’s. The identified SNPs were removed from the original VCF file using VCFtools, and MicrohaPlot was used to generate new microhaplotypes with the remaining variable SNP positions. The unfiltered haplotypes were exported for subsequent filtering with custom scripts to view and call genotypes.
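The step of flagging SNPs with more than 10% N’s can be sketched as follows. This is an illustration only, not the custom R script from the supplemental materials: the input layout (a list of haplotype strings per locus), the function name, and the choice of observed haplotypes as the denominator are assumptions.

```python
def snps_exceeding_n_threshold(haplotypes_by_locus, max_n_fraction=0.10):
    """Return {locus: [SNP indices]} for SNP positions where the fraction
    of haplotypes carrying 'N' is strictly greater than max_n_fraction.

    Assumes all haplotype strings for a locus have the same length,
    with one character per SNP position.
    """
    flagged = {}
    for locus, haplotypes in haplotypes_by_locus.items():
        if not haplotypes:
            continue
        n_positions = len(haplotypes[0])
        bad_positions = []
        for i in range(n_positions):
            n_count = sum(hap[i] == "N" for hap in haplotypes)
            if n_count / len(haplotypes) > max_n_fraction:
                bad_positions.append(i)
        if bad_positions:
            flagged[locus] = bad_positions
    return flagged


# Hypothetical toy input: 10 haplotypes at locA, 2 at locB.
toy_haplotypes = {
    "locA": ["ACG", "ANG", "ANG", "ACN", "ACG",
             "ACG", "ACG", "ACG", "ACG", "ACG"],
    "locB": ["AT", "AT"],
}

flagged_snps = snps_exceeding_n_threshold(toy_haplotypes)
```

In the toy input, the second SNP of locA is N in 2 of 10 haplotypes (20%) and is flagged, while the third SNP is N in exactly 1 of 10 (10%) and is kept because the comparison is strict. The flagged positions would then be mapped back to their VCF coordinates for removal before regenerating the microhaplotypes.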

Mitochondrial DNA control region haplotype sequences were generated using Sanger dideoxy sequencing of PCR products, sequenced in both directions.

Usage notes

Sample IDs, collection locations (latitude, longitude), and a priori geographic stratification are provided in Table S1 of the supplemental materials.