Sea otter sequence capture project data files
Beichman, Annabel et al. (2021), Sea otter sequence capture project data files, Dryad, Dataset, https://doi.org/10.5068/D1ZD4D
Extinction or severe population contractions are rarely uniform across an entire species. However, because of the rapid onset of the fur trade in the 18th and 19th centuries, sea otters (Enhydra lutris) were systematically hunted to near extinction across their entire Northern Pacific range. Many sea otter populations were driven fully extinct, and the populations that survived suffered a rapid decline from 10-20,000 individuals per population to fewer than one hundred survivors. Each surviving remnant sea otter population represents a replicate of an extreme population bottleneck event impacting genetic diversity and fitness into the future. Here, we designed sequence capture probes of the sea otter exome and neutral regions to examine the population structure and demographic history of five surviving sea otter populations from throughout the species’ former range, including three ancient Californian samples from ~1500 and ~200 years ago. We show that southern sea otters in California are the last survivors of a divergent lineage that has been isolated from northern and Asian populations for thousands of years, highlighting the need for their separate conservation. We detect a signal of extreme population decline in every surviving sea otter population and use simulations to demonstrate that these contractions may have lowered the fitness of recovering populations. However, we also infer historically low effective population sizes prior to the fur trade bottleneck which paradoxically could have led to the purging of highly deleterious mutations and mitigated the effects of population decline on the burden of harmful genetic variants, countering the conventional wisdom that large populations are most robust to decline. 
Nonetheless, future bottlenecks caused by existing external threats may act to maintain the negative genetic impacts of the fur trade for hundreds of generations, illustrating how human exploitation can leave a species vulnerable long after its nominal recovery.
We carried out sequence capture on DNA from 122 sea otters, sampled in the 1970s-90s from the Kuril Islands, Commander Islands, Aleutian Islands, south central Alaska, California and Baja California. We designed the sequence capture probes based on the annotation of the southern sea otter genome. In total, 50 Mb of sequence was captured, including all annotated exonic sequence passing custom filters (see SI Methods), 1 kb regions upstream of genes, and 10 Mb of neutral regions far from genes. All sequencing fastq files are available on the Sequence Read Archive (SRA), BioProject accession PRJNA629776, and sample information is available in modernSampleInfo.xlsx. We mapped reads to the domestic ferret reference genome (Mustela putorius furo, accession GCA_000215625.1) in order to have an outgroup genome for all sea otter populations and to make use of the existing Ensembl Variant Effect Predictor (VEP) database. Read mapping was carried out using the PALEOMIX pipeline. Genotypes were called and filtered using GATK (v. 3.7). On average, each sequenced individual had 6.14 × 10^7 called genotypes (61.4 Mb). Approximately 60,000 SNPs were discovered. Details of read mapping, genotype calling, and genotype filtering are in the SI Methods of Beichman et al. SNPs in coding regions were annotated as synonymous or missense using VEP. We generated the neutral site frequency spectrum (SFS) from SNPs in putatively neutral regions far from genes (SI Methods), using hypergeometric projection to maximize the number of SNPs.
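As an illustration of the hypergeometric projection mentioned above, the following is a minimal Python sketch, not the pipeline used for this dataset (see the SI Methods of Beichman et al. for that). It assumes an SFS stored as a plain list in which entry i is the number of sites where the allele is seen on i of n sampled chromosomes; the function names project_site and project_sfs are hypothetical. Projecting down to m chromosomes spreads each site over the counts j = 0..m with the hypergeometric probability C(i, j) C(n - i, m - j) / C(n, m), which lets sites with some missing genotypes contribute at a common, smaller sample size.

```python
from math import comb


def project_site(n, i, m):
    """Hypergeometric probabilities of seeing j = 0..m copies of an allele
    when m chromosomes are drawn without replacement from n chromosomes,
    i of which carry the allele."""
    if not 0 < m <= n:
        raise ValueError("projection size m must satisfy 0 < m <= n")
    if not 0 <= i <= n:
        raise ValueError("allele count i must satisfy 0 <= i <= n")
    total = comb(n, m)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes get probability 0 automatically.
    return [comb(i, j) * comb(n - i, m - j) / total for j in range(m + 1)]


def project_sfs(sfs, m):
    """Project an SFS with n = len(sfs) - 1 chromosomes down to m chromosomes.
    Entry i of `sfs` is the number of sites with allele count i."""
    n = len(sfs) - 1
    projected = [0.0] * (m + 1)
    for i, n_sites in enumerate(sfs):
        if n_sites == 0:
            continue
        for j, prob in enumerate(project_site(n, i, m)):
            projected[j] += n_sites * prob
    return projected


if __name__ == "__main__":
    # Toy example (made-up counts): 10 singletons observed in 4 chromosomes,
    # projected down to 2 chromosomes.
    print(project_sfs([0, 10, 0, 0, 0], 2))
```

The total number of sites is preserved by the projection; only their distribution across allele-count bins changes.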
In addition to these modern sea otter samples, we carried out sequence capture on three historical/ancient samples from shell middens in California (details in Beichman et al. and in aDNA.SampleInfo.xlsx). To enrich for endogenous content, the ancient samples were subjected to two serial sequence captures. This led to very high levels of clonality, but a successful 30-fold enrichment of target regions after PCR duplicates were removed. Adapter removal and read mapping to both the domestic ferret (Acc. GCA_000215625.1) and southern sea otter (Acc. GCA_006410715.1) reference genomes were carried out using the PALEOMIX pipeline. mapDamage was used to detect DNA-damage profiles and to rescale the quality scores of bases that are likely to be misincorporations. We found that the modern samples mapped well to both the domestic ferret and sea otter genomes, which validates our choice to carry out the majority of modern analyses on reads mapped to the domestic ferret genome. However, the ancient samples showed a mapping preference, with more reads mapping to the sea otter genome, likely due to lower coverage and read quality. We therefore carried out all ancient analyses mapped to both reference genomes, with similar qualitative results. We selected three modern samples from each population (California, Alaska, Aleutians, Commanders, Kurils) to be analyzed alongside the ancient samples. Genotype likelihoods, genotype posteriors and per-site depths were determined using ANGSD.
Finally, to ensure that our choice of a diverged outgroup reference genome would not bias estimates of heterozygosity or demographic history, we also mapped all individuals to the southern sea otter reference genome (accession GCA_006410715.1), called variants as described in the SI Methods of Beichman et al. and generated neutral SFSs from putatively neutral regions in the southern sea otter genome. Note that the set of putatively neutral regions in the southern sea otter genome is not identical to the set of regions in the ferret genome due to different annotations (see Methods section of Beichman et al. for details on these regions).
Here we present (1) VCF genotype files mapped to the domestic ferret reference genome for the modern data that were used for PCA and structure analysis and to generate the site frequency spectrum, (2) the folded neutral site frequency spectrum for each population, as well as the folded joint SFS for California and Alaska, (3) bam and ANGSD output files for the ancient+modern comparison analyses (3 ancient samples paired with 15 modern samples, 3 from each modern remnant population), (4) sets of VCF genotype files mapped to the southern sea otter reference genome for comparison of reference genomes, (5) site frequency spectra from sites mapped to the southern sea otter reference genome, for comparison of reference genomes. Population identifiers are California (CA), south central Alaska (AK), Aleutian Islands (AL), Commander Islands (COM), Kuril Islands (KUR).
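For readers unfamiliar with the term, a folded SFS counts sites by minor allele count rather than derived allele count, so no ancestral state is needed. The spectra deposited here are already folded; the short Python sketch below only illustrates the operation on a one-dimensional spectrum. It assumes the SFS is a plain list whose entry i is the number of sites with allele count i out of n chromosomes, and the function name fold_sfs is hypothetical.

```python
def fold_sfs(unfolded):
    """Fold a 1D SFS: sites with allele count i and n - i are pooled into
    the bin for the minor allele count min(i, n - i)."""
    n = len(unfolded) - 1  # number of sampled chromosomes
    folded = [0] * (n // 2 + 1)
    for i, n_sites in enumerate(unfolded):
        folded[min(i, n - i)] += n_sites
    return folded


if __name__ == "__main__":
    # Toy example with made-up counts for n = 5 chromosomes.
    print(fold_sfs([0, 8, 4, 2, 1, 0]))
```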
All sequencing fastq files are available on the Sequence Read Archive (SRA), BioProject accession PRJNA629776, and sample information is available in modern.SampleInfo.xlsx and aDNA.SampleInfo.xlsx. Details of methods are available in Beichman et al.
Please see README.txt for more details.
National Science Foundation, Award: 1556705
Monterey Bay Aquarium
University of California, Award: Conservation Genomics Consortium
National Institutes of Health, Award: R35GM119856
University of California, Los Angeles, Award: Academic Senate
National Institutes of Health, Award: T32 HG-002536: Training Grant
National Science Foundation Graduate Research Fellowship Program
Consejo Nacional de Ciencia y Tecnología, Award: 724094: Postdoctoral Fellowship
Saint Petersburg State University, Award: 1.52.1647.2016