Mitochondrial genome sequencing of marine leukemias reveals cancer contagion between clam species in the Seas of Southern Europe

Cite this dataset

Garcia-Souto, Daniel et al. (2021). Mitochondrial genome sequencing of marine leukemias reveals cancer contagion between clam species in the Seas of Southern Europe [Dataset]. Dryad.


Clonally transmissible cancers are tumour lineages that spread between individuals through the transfer of living cancer cells. In marine bivalves, leukemia-like transmissible cancers, called hemic neoplasias, have demonstrated the ability to infect individuals from different species. We performed whole-genome sequencing on eight V. verrucosa clams diagnosed with hemic neoplasia, collected from two sampling points more than 1,000 nautical miles apart on the Atlantic and Mediterranean coasts of Spain. Mitochondrial genome sequencing of tumour tissues from neoplastic animals revealed the coexistence of haplotypes from two different clam species. Phylogenies estimated from mitochondrial and nuclear markers confirmed that this leukemia originated in C. gallina (or a closely related taxon) and was later transmitted to V. verrucosa, in which it survived as a contagious cancer. The analysis of mitochondrial and nuclear gene sequences supports the conclusion that all the studied tumours belong to a single neoplastic C. gallina lineage that spread through the seas of Southern Europe.


We performed whole-genome sequencing on 23 samples from 16 clam specimens, comprising eight neoplastic and eight non-neoplastic animals (Table 1), using Illumina paired-end libraries with a 350 bp insert size and 150 bp reads. We then ran MITObim v1.9.1 (Hahn, et al. 2013) to assemble the full mitochondrial genome of every sequenced sample, using baits from the following Cox1 and 16S reference genes to prime the assembly of clam mitochondrial genomes: V. verrucosa (Cox1: KC429139, 16S: C429301; GenBank accession numbers), C. gallina (Cox1: KY547757, 16S: KY547777) and C. striatula (Cox1: KY547747, 16S: KY547767). The draft sequences were polished twice with Pilon v1.23 (Walker, et al. 2014), and conflicting repetitive fragments from the mitochondrial control region were resolved by long-read sequencing with Oxford Nanopore Technologies (ONT) on a set of representative samples from each species and from the tumours. ONT reads were assembled with Miniasm v0.3 (Li 2016) and corrected using Racon v1.3.1 (Vaser, et al. 2017). Protein-coding genes, rDNAs and tRNAs were annotated on the curated mitochondrial genomes using the MITOS2 web server (Bernt, et al. 2013) and manually curated to fit the ORFs predicted by ORF-FINDER (Rombel, et al. 2002). The reconstructed sequences are provided in VVE_CGA_Mitochondrial_DNA.fasta and the annotations in VVE_CGA_Mitochondrial_DNA.gff.
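As a quick sanity check on the reconstructed genomes, the FASTA file can be read with a few lines of standard-library Python. The sketch below is only illustrative: the function name `read_fasta` is hypothetical, and it assumes the file is ordinary FASTA with one `>` header line per sequence.

```python
def read_fasta(path):
    """Return a dict mapping each sequence ID to its full sequence.

    Assumes plain FASTA: a '>' header line followed by one or more
    sequence lines. The ID is the first whitespace-delimited token
    of the header.
    """
    records = {}
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Store the previous record before starting a new one.
                if name is not None:
                    records[name] = "".join(chunks)
                header = line[1:].strip()
                name = header.split()[0] if header else ""
                chunks = []
            else:
                chunks.append(line)
    # Store the last record in the file.
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

For example, `{name: len(seq) for name, seq in read_fasta("VVE_CGA_Mitochondrial_DNA.fasta").items()}` would list the length of each assembled mitochondrial genome.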

We then mapped the paired-end sequencing data from healthy and neoplastic tissues of all neoplastic animals onto the V. verrucosa and C. gallina reference mitochondrial genomes using BWA-mem v0.7.17-r1188 (Li and Durbin 2009) with default parameters. Duplicate reads were marked with Picard v2.18.14 and removed from the analysis. The reference genome used is provided in Ref_CGA_VVE.fa, and each BAM file corresponds to a given animal (e.g. PLVV18_2249F-N0-alnCGA_VVE.sorted.detup.bam). The code in the file name encodes the country (P, Portugal) and locality (L, Lisbon) of origin, the species (VV, Venus verrucosa), the year of collection (18, 2018), an internal code for the animal (2249) and the sequenced tissue (F, foot; H, haemolymph; G, gill). Some files were obtained from whole-genome amplified (WGA) samples. The name also records the intensity of the neoplasia detected in the animal (N0 = healthy; N1, N2, N3 = different levels of neoplasia; N? = neoplasia of unknown level).
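The naming scheme can be decoded programmatically. The following is a minimal sketch based only on the single example file name given above: the function name `parse_sample_name` and the regular expression are illustrative assumptions, and other files in the dataset (for instance WGA samples, whose marker position is not described here) may deviate from this pattern.

```python
import os
import re

# Pattern inferred from the one documented example,
# PLVV18_2249F-N0-alnCGA_VVE.sorted.detup.bam (an assumption, not a spec).
SAMPLE_RE = re.compile(
    r"^(?P<country>[A-Z])"      # country code, e.g. P = Portugal
    r"(?P<locality>[A-Z])"      # locality code, e.g. L = Lisbon
    r"(?P<species>[A-Z]{2})"    # species code, e.g. VV = Venus verrucosa
    r"(?P<year>\d{2})_"         # two-digit collection year
    r"(?P<animal>\d+)"          # internal animal code
    r"(?P<tissue>[FHG])-"       # sequenced tissue
    r"(?P<grade>N[0-3?])"       # neoplasia grade
)

TISSUES = {"F": "foot", "H": "haemolymph", "G": "gill"}
GRADES = {
    "N0": "healthy",
    "N1": "neoplasia, level 1",
    "N2": "neoplasia, level 2",
    "N3": "neoplasia, level 3",
    "N?": "neoplasia, level unknown",
}


def parse_sample_name(filename):
    """Decode the sample code at the start of a BAM file name."""
    base = os.path.basename(filename)
    match = SAMPLE_RE.match(base)
    if match is None:
        raise ValueError(f"unrecognised sample name: {base}")
    fields = match.groupdict()
    return {
        "country": fields["country"],
        "locality": fields["locality"],
        "species": fields["species"],
        # Assumes every collection year falls in the 2000s.
        "year": 2000 + int(fields["year"]),
        "animal": fields["animal"],
        "tissue": TISSUES[fields["tissue"]],
        "grade": GRADES[fields["grade"]],
        # Assumption: WGA samples carry the literal string "WGA" in the name.
        "wga": "WGA" in base,
    }
```

Country, locality and species codes are returned unexpanded because only P, L and VV are documented here; a lookup table can be added once the full list of codes is known.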

COI sequences from Veneridae were retrieved from GenBank and BOLD Systems, aligned with our mitochondrial reconstructions, and are provided in the COX_all_Venerids.phy file. This alignment was then used to reconstruct the maximum-likelihood (ML) molecular phylogeny shown as supplementary data in our manuscript. DEAH12 and TFIIH sequences, alignments and molecular phylogenies are also included here (alignments and trees).

Reconstructions of the satellite monomer clusters CL04 and CL17, as recovered with RepeatExplorer (Novak et al. 2010) from C. gallina NGS data, can be found in Satellite_references.fasta, and coverage statistics for each specimen with NGS data are provided in Satellite_coverage.txt.


European Research Council, Award: Starting Grant 716290 SCUBA CANCERS

Ministerio de Asuntos Económicos y Transformación Digital, Award: BES-2016-078166

Xunta de Galicia, Award: ED481B/2018/091

European Research Council, Award: ERC-617457-PHYLOCANCER

Ministerio de Asuntos Económicos y Transformación Digital, Award: PID2019-106247GB-I00

ASSEMBLE Plus project, Award: 730984