Mitochondrial genome sequencing of marine leukemias reveals cancer contagion between clam species in the Seas of Southern Europe
Data files
Dec 21, 2021 version files 4.04 GB
- COX_all_Venerids.phy (780.90 KB)
- COX_all_Venerids.phy_phyml_boot_stats.txt (12.83 KB)
- COX_all_Venerids.phy_phyml_boot_trees.txt (7.75 MB)
- COX_all_Venerids.phy_phyml_stats.txt (2.07 KB)
- COX_all_Venerids.phy_phyml_tree.txt (150.75 KB)
- CSVV18_1052F-N0-alnCGA_VVE.sorted.detup.bam (8.37 MB)
- DEAH12_alignment.nex (21.46 KB)
- DEAH12_tree.nex (127.26 KB)
- EMVV18_376F-N1-alnCGA_VVE.sorted.detup.bam (66.47 MB)
- EMVV18_376H-N1-alnCGA_VVE.sorted.detup.bam (29.02 MB)
- EMVV18_385F-N0-alnCGA_VVE.sorted.detup.bam (9.57 MB)
- EMVV18_391F-N3-alnCGA_VVE.sorted.detup.bam (17.43 MB)
- EMVV18_391Hwga-N3-alnCGA_VVE.sorted.detup.bam (1.08 GB)
- EMVV18_395F-N3-alnCGA_VVE.sorted.detup.bam (88.68 MB)
- EMVV18_395Hwga-N3-alnCGA_VVE.sorted.detup.bam (1.95 GB)
- EMVV18_400F-N2-alnCGA_VVE.sorted.detup.bam (39.89 MB)
- EMVV18_400H-N2-alnCGA_VVE.sorted.detup.bam (37 MB)
- ERVV17_2993F-N0-alnCGA_VVE.sorted.detup.bam (8.62 MB)
- ERVV17_2995F-N3-alnCGA_VVE.sorted.detup.bam (135.39 MB)
- ERVV17_2995H-N3-alnCGA_VVE.sorted.detup.bam (19.84 MB)
- ERVV17_2997F-N1-alnCGA_VVE.sorted.detup.bam (54.36 MB)
- ERVV17_2997Hwga-N1-alnCGA_VVE.sorted.detup.bam (155.29 MB)
- ERVV17_3193F-N3-alnCGA_VVE.sorted.detup.bam (83.28 MB)
- ERVV17_3193H-N3-alnCGA_VVE.sorted.detup.bam (25.76 MB)
- EVVV11_02G-N_-alnCGA_VVE.sorted.detup.bam (147.90 MB)
- FGVV18_193_F2-N0-alnCGA_VVE.sorted.detup.bam (48.02 MB)
- IGVV19_666F-N0-alnCGA_VVE.sorted.detup.bam (21.50 MB)
- PLVV18_2249F-N0-alnCGA_VVE.sorted.detup.bam (6.85 MB)
- Ref_CGA_VVE.fa (35.78 KB)
- Satellite_coverage.txt (3.20 KB)
- Satellite_references.fasta (800 B)
- TFIIH_alignment.nex (27.33 KB)
- TFIIH_tree.nex (144.13 KB)
- VVE_CGA_Mitochondrial_DNA.fasta (398.04 KB)
- VVE_CGA_Mitochondrial_DNA.gff (59.85 KB)
Abstract
Clonally transmissible cancers are tumour lineages that are transmitted between individuals via the transfer of living cancer cells. In marine bivalves, leukemia-like transmissible cancers, called hemic neoplasias, have demonstrated the ability to infect individuals from different species. We performed whole-genome sequencing on eight V. verrucosa clams diagnosed with hemic neoplasia, from two sampling points located more than 1,000 nautical miles apart on the Atlantic and Mediterranean coasts of Spain. Mitochondrial genome sequencing of tumour tissues from neoplastic animals revealed the coexistence of haplotypes from two different clam species. Phylogenies estimated from mitochondrial and nuclear markers confirmed that this leukemia originated in C. gallina (or a closely related taxon) and was later transmitted to V. verrucosa, in which it survived as a contagious cancer. The analysis of mitochondrial and nuclear gene sequences supports the conclusion that all the studied tumours belong to a single neoplastic C. gallina lineage that spread in the Seas of Southern Europe.
We performed whole-genome sequencing on 23 samples from 16 clam specimens, comprising eight neoplastic and eight non-neoplastic animals (Table 1), using Illumina paired-end libraries with a 350 bp insert size and 150 bp reads. We then ran MITObim v1.9.1 (Hahn, et al. 2013) to assemble the full mitochondrial genome of every sequenced sample, using the following Cox1 and 16S reference genes (GenBank accession numbers) as baits to prime the assembly of the clam mitochondrial genomes: V. verrucosa (Cox1: KC429139, 16S: C429301), C. gallina (Cox1: KY547757, 16S: KY547777) and C. striatula (Cox1: KY547747, 16S: KY547767). The draft sequences were polished twice with Pilon v1.23 (Walker, et al. 2014), and conflicting repetitive fragments from the mitochondrial control region were resolved by long-read sequencing with Oxford Nanopore Technologies (ONT) on a set of representative samples from each species and from tumours. ONT reads were assembled with Miniasm v0.3 (Li 2016) and corrected with Racon v1.3.1 (Vaser, et al. 2017). Protein-coding genes, rDNAs and tRNAs were annotated on the curated mitochondrial genomes using the MITOS2 web server (Bernt, et al. 2013), and manually curated to fit the ORFs predicted by ORF-FINDER (Rombel, et al. 2002). The reconstructed sequences can be found in VVE_CGA_Mitochondrial_DNA.fasta and the annotations in VVE_CGA_Mitochondrial_DNA.gff.
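As a quick way to inspect the reconstructed mitochondrial genomes, the following minimal sketch reads a FASTA file and reports the length of each sequence. It assumes only the standard FASTA layout (a ">" header line followed by sequence lines). The function names read_fasta and sequence_lengths are illustrative and are not part of the dataset or of any tool cited above.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequence_lengths(path):
    """Return {header: sequence length} for every record in a FASTA file."""
    with open(path) as handle:
        return {header: len(seq) for header, seq in read_fasta(handle)}
```

Calling sequence_lengths("VVE_CGA_Mitochondrial_DNA.fasta") from the directory that holds the downloaded file would give one length per reconstructed genome, which can be compared against the coordinates in the GFF annotation.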
We then mapped the paired-end sequencing data from healthy and neoplastic tissues from all neoplastic samples onto the V. verrucosa and C. gallina reference mitochondrial genomes using BWA-MEM v0.7.17-r1188 (Li and Durbin 2009) with default parameters. Duplicate reads were marked with Picard 2.18.14 and removed from the analysis. The reference genome used can be found in Ref_CGA_VVE.fa. Each BAM file corresponds to one sequenced sample from a given animal (e.g. PLVV18_2249F-N0-alnCGA_VVE.sorted.detup.bam), and its name encodes the country (P, Portugal) and locality (L, Lisbon) of origin, the species (VV, Venus verrucosa), the year of collection (18, 2018), an internal code for the animal (2249) and the sequenced tissue (F, foot; H, haemolymph; G, gill). Some samples were obtained by whole-genome amplification (WGA). The name also records the intensity of the neoplasia detected in the animal (N0 = healthy; N1, N2, N3 = different levels of neoplasia; N? = neoplasia of unknown level).
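The naming scheme above can be decoded programmatically. The sketch below is an assumption-laden illustration rather than an official specification: the regular expression was inferred from the description and from the file names in this dataset, the function name parse_sample_name is hypothetical, two-digit years are assumed to fall in the 2000s, a "wga" suffix after the tissue letter is assumed to mark whole-genome amplified samples, and "N_" in a file name is assumed to stand for an unknown neoplasia level.

```python
import re

# Pattern inferred from the file names in this dataset (an assumption).
SAMPLE_RE = re.compile(
    r"^(?P<country>[A-Z])(?P<locality>[A-Z])(?P<species>[A-Z]{2})"
    r"(?P<year>\d{2})_(?P<animal>\d+)_?"
    r"(?P<tissue>[FHG])(?P<suffix>[A-Za-z0-9]*)"
    r"-N(?P<grade>[0-3_?])-aln"
)

TISSUES = {"F": "foot", "H": "haemolymph", "G": "gill"}


def parse_sample_name(filename):
    """Split a BAM file name into the fields described in the text."""
    match = SAMPLE_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised sample name: {filename}")
    grade = match["grade"]
    return {
        "country": match["country"],
        "locality": match["locality"],
        "species": match["species"],
        "year": 2000 + int(match["year"]),  # assumes 21st-century sampling
        "animal": match["animal"],
        "tissue": TISSUES[match["tissue"]],
        "wga": match["suffix"].lower() == "wga",  # assumed WGA marker
        "neoplasia": f"N{grade}" if grade.isdigit() else "unknown",
    }


info = parse_sample_name("EMVV18_391Hwga-N3-alnCGA_VVE.sorted.detup.bam")
print(info["animal"], info["tissue"], info["wga"], info["neoplasia"])
```

Grouping the parsed records by the animal field is one way to pair foot and haemolymph samples from the same specimen.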
COI sequences from Veneridae were retrieved from GenBank and BOLD Systems and aligned with our mitochondrial reconstructions; the resulting alignment is in COX_all_Venerids.phy. This alignment was then used to reconstruct the maximum-likelihood (ML) molecular phylogeny shown as supplementary data in our manuscript. The DEAH12 and TFIIH sequences, alignments and molecular phylogenies are also provided here (DEAH12_alignment.nex, DEAH12_tree.nex, TFIIH_alignment.nex and TFIIH_tree.nex).
Reconstructions of the satellite monomer clusters CL04 and CL17, recovered with RepeatExplorer (Novak et al. 2010) from C. gallina NGS data, can be found in Satellite_references.fasta. Coverage statistics for each specimen with NGS data are in Satellite_coverage.txt.
