Dryad

Cosmopolites sordidus genome assemblies

Abstract

PacBio HiFi sequencing was used in combination with metagenomic binning to produce a high-quality reference genome of Cosmopolites sordidus. We compared k-mer and alignment reference-based pre-binning and post-binning approaches for removing contamination. We also asked whether the post-binning approach left bacterial contamination interspersed within intragenic regions of Arthropoda-binned contigs. Our analyses identified 3,433 genes that contained reads identified as being of putative bacterial origin. The pre-binning approach yielded a C. sordidus genome of 1.07 Gb composed of 3,089 contigs, with complete and single-copy BUSCO scores of 98.6% and 97.1% for the genome and protein sets, respectively. We demonstrate that, in this case, the pre-binning approach does not sacrifice assembly quality for more stringent metagenomic filtering. We also determine that post-binning allows for increased intragenic contamination: the amount of intragenic contamination increased with increasing coverage, but the frequency of gene contamination increased with lower coverage. Finally, NCBI’s new FCS-GX program was used as a final post-assembly classification approach and identified contamination in both the pre- and post-binning assemblies. This indicates that both pre- and post-binning approaches are required to fully remove contamination. Future work should focus on developing reference-free pre-binning approaches for HiFi reads produced from eukaryotic-based metagenomic samples.