
A pan-cetacean MHC amplicon sequencing panel developed and evaluated in combination with genome assemblies

Cite this dataset

Heimeier, Dorothea et al. (2024). A pan-cetacean MHC amplicon sequencing panel developed and evaluated in combination with genome assemblies [Dataset]. Dryad. https://doi.org/10.5061/dryad.wh70rxwvb

Abstract

The major histocompatibility complex (MHC) is a highly polymorphic gene family that is crucial in immunity, and its diversity can be effectively used as a fitness marker for populations. Despite this, MHC remains poorly characterised in non-model species (e.g., cetaceans: whales, dolphins and porpoises) as high gene copy number variation, especially in the fast-evolving class I region, makes analyses of genomic sequences difficult. To date, only small sections of class I and IIa genes have been used to assess functional diversity in cetacean populations. Here, we undertook a systematic characterisation of the MHC class I and IIa regions in available cetacean genomes. We extracted full-length gene sequences to design pan-cetacean primers that amplified the complete exon 2 from MHC class I and IIa genes in one combined sequencing panel. We validated this panel in 19 cetacean species and described 354 alleles for both classes. Furthermore, we identified likely assembly artefacts for many MHC class I assemblies based on the presence of class I genes in the amplicon data compared to missing genes from genomes. Finally, we investigated MHC diversity using the panel in 25 humpback and 30 southern right whales, including four paternity trios for humpback whales. This revealed copy-number variable class I haplotypes in humpback whales, which is likely a common phenomenon across cetaceans. These MHC alleles will form the basis for a cetacean branch of the Immuno-Polymorphism Database (IPD-MHC), a curated resource intended to aid in the systematic compilation of MHC alleles across several species, to support conservation initiatives.

README: Merged paired-end Illumina reads from five MHC loci for 85 cetaceans and their class I and assumed non-functional DRB alleles

https://doi.org/10.5061/dryad.wh70rxwvb

The dataset contains 85 fastq files, one per individual cetacean. Each file contains amplicon reads from five MHC loci (DQA, DQB, DRA, DRB, and class I genes), combined across separate sequencing runs. Details on individual cetacean sample abbreviations can be found in the manuscript. Read pairs have been merged and the Illumina adapter removed.
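As a quick sanity check after download, the reads in one of these files can be counted and their merged lengths tabulated. The following is a minimal sketch in Python, assuming the files follow the standard four-line-per-record fastq layout and are uncompressed. The function names `read_fastq` and `summarise_reads` are illustrative and not part of the dataset, and the two records in the example are invented for demonstration.

```python
import io
from collections import Counter
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record fastq stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed fastq record near: {header.strip()!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in: {header.strip()!r}")
        yield header[1:].strip(), sequence, quality


def summarise_reads(handle: TextIO) -> Tuple[int, Counter]:
    """Return the number of reads and a Counter of merged read lengths."""
    lengths: Counter = Counter()
    total = 0
    for _read_id, sequence, _quality in read_fastq(handle):
        total += 1
        lengths[len(sequence)] += 1
    return total, lengths


# Invented two-record example standing in for one of the per-individual files.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGTAC\n+\nIIIIII\n")
total_reads, length_counts = summarise_reads(example)
print(total_reads, dict(length_counts))
```

For a real file, pass an open text handle instead of the in-memory example. Because each file pools amplicons from five loci, the length distribution may show several peaks rather than one.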

The dataset also contains one fastq file with all class I alleles found and one fastq file with the non-functional DRB alleles found. Alleles are labeled with a four-letter species abbreviation followed by the locus designation (DRB, or N for class I) and are numbered in the order in which they were discovered.
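The labelling scheme described above can be split into its three parts programmatically. The sketch below is an assumption-laden illustration: the README does not state the exact punctuation of the labels, so the pattern tolerates an optional `-`, `_`, or space after the species abbreviation and an optional `-`, `_`, or `*` before the number. The labels used in the example (`Xxxx-DRB*03`, `XxxxN01`) are hypothetical placeholders, not alleles from the dataset, and `parse_allele_label` is an illustrative name.

```python
import re
from typing import NamedTuple

# Assumed layout: four letters, optional separator, locus (DRB or N),
# optional separator, then the discovery-order number.
LABEL_PATTERN = re.compile(r"^([A-Za-z]{4})[-_ ]?(DRB|N)[-_*]?(\d+)$")


class AlleleLabel(NamedTuple):
    species: str  # four-letter species abbreviation
    locus: str    # "DRB", or "N" for class I
    number: int   # order in which the allele was discovered


def parse_allele_label(label: str) -> AlleleLabel:
    """Split an allele label into species, locus, and number."""
    match = LABEL_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"Label does not fit the assumed pattern: {label!r}")
    species, locus, number = match.groups()
    return AlleleLabel(species, locus, int(number))


# Hypothetical labels, for illustration only.
for example_label in ("Xxxx-DRB*03", "XxxxN01"):
    print(parse_allele_label(example_label))
```

If the actual labels use different punctuation, only `LABEL_PATTERN` needs adjusting. Grouping parsed labels by `species` and `locus` then gives a per-species allele count for each class.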

Methods

A total of 85 tissue samples were taken from individual animals across several cetacean species. Tissue was obtained from either strandings or biopsies. Stranding samples in New Zealand were taken by the Department of Conservation New Zealand and sent to the New Zealand Cetacean Tissue Archive (NZCeTA), housed at the University of Auckland Waipapa Taumata Rau, with approval from mana whenua (Māori indigenous groups). Biopsy samples from New Zealand cetaceans include two Hector’s dolphins (Cephalorhynchus hectori) (Hamner et al., 2017) and two bottlenose dolphins (Tursiops truncatus) (Tezanos-Pinto et al., 2009). Further biopsies include two rough-toothed dolphins (Steno bredanensis) and two Blainville's beaked whales (Mesoplodon densirostris) from French Polynesia (Albertson et al., 2017; Oremus et al., 2012). Details on samples and associated permit numbers can be found in the published manuscript.

DNA was extracted from tissue samples, and the genomic DNA was amplified by PCR at five major histocompatibility complex (MHC) loci. PCR products were pooled for each individual, indexed with Nextera indexes supplied by IDT, and sequenced on Illumina NanoSeq and MiSeq. Each individual's amplicon pool was sequenced multiple times on different sequencing runs. The reads provided here are the merged read pairs from these sequencing runs, combined into a single fastq file per individual.

Funding

Royal Society, Award: RGF\R1\181014