Dryad

Chromonomer: a tool set for repairing and enhancing assembled genomes through integration of genetic maps and conserved synteny

Cite this dataset

Catchen, Julian; Amores, Angel; Bassham, Susan (2020). Chromonomer: a tool set for repairing and enhancing assembled genomes through integration of genetic maps and conserved synteny [Dataset]. Dryad. https://doi.org/10.5061/dryad.gtht76hjm

Abstract

The pace of the sequencing and computational assembly of novel reference genomes is accelerating. Though DNA sequencing technologies and assembly software tools continue to improve, biological features of genomes such as repetitive sequence as well as molecular artifacts that often accompany sequencing library preparation can lead to fragmented or chimeric assemblies. If left uncorrected, defects like these trammel progress on understanding genome structure and function, or worse, positively mislead this research. Fortunately, integration of additional, independent streams of information, such as a marker-dense genetic map and conserved orthologous gene order from related taxa, can be used to scaffold together unlinked, disordered fragments and to restructure a reference genome where it is incorrectly joined. We present a tool set for automating these processes, one that additionally tracks any changes to the assembly and to the genetic map, and which allows the user to scrutinize these changes with the help of web-based, graphical visualizations. Chromonomer takes a user-defined reference genome, a map of genetic markers, and, optionally, conserved synteny information to construct an improved reference genome of chromosome models: a “chromonome”. We demonstrate Chromonomer’s performance on genome assemblies and genetic maps that have disparate characteristics and levels of quality.

Methods

This dataset consists of several genome integrations, each combining an assembled genome with a genetic map to create a chromosome-level assembly. The Chromonomer software was used to perform each integration. The dataset includes three integrations of teleost fish:

  1. Gulf pipefish (Syngnathus scovelli)
  2. Platyfish (Xiphophorus maculatus)
  3. Black rockcod (Notothenia coriiceps)

The gulf pipefish integration began with data from the Sequence Read Archive (SRA), NCBI BioProject accession PRJNA355893, while the Syngnathus acus reference genome (NCBI accession GCA_901709675.1, BioProject PRJEB32741) was used as a base for the Syngnathus scovelli gene annotation. The platyfish integration began with SRA data from NCBI accession GCA_002775205.2, BioProject PRJNA72525, with raw read data obtained from SRA accessions SRR7207855 - SRR7207868. The black rockcod genome assembly was obtained from NCBI accession GCA_000735185.1, BioProject PRJNA66471.

The genetic map for each fish is included in this repository, named *_unprocessed_consensus_genetic_map.tsv, and the RADseq markers corresponding to the map are provided as all_markers.fa.gz in each respective directory.
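
As a quick sanity check after downloading, the files described above can be located and inspected with a few lines of Python. This is a minimal sketch, not part of the dataset: the function names are hypothetical, the directory passed in is whatever folder an integration was unpacked to, and the only assumptions about file contents are that the map file name ends in _unprocessed_consensus_genetic_map.tsv and that all_markers.fa.gz is a gzipped FASTA file whose records begin with ">". The column layout of the map file is not assumed.

```python
import gzip
from pathlib import Path


def find_genetic_map(integration_dir):
    """Return the single *_unprocessed_consensus_genetic_map.tsv in a directory.

    integration_dir is a hypothetical path to one unpacked integration.
    """
    pattern = "*_unprocessed_consensus_genetic_map.tsv"
    matches = sorted(Path(integration_dir).glob(pattern))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected one genetic map in {integration_dir}, found {len(matches)}"
        )
    return matches[0]


def count_fasta_records(fasta_gz_path):
    """Count FASTA records (header lines starting with '>') in a gzipped file."""
    with gzip.open(fasta_gz_path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Calling find_genetic_map on an integration directory and count_fasta_records on its all_markers.fa.gz gives the map path and the number of marker sequences, which can then be compared against the number of markers listed in the map.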

Usage notes

Each integration includes a shell script (chrr.sh) outlining how the chromonomer software was executed.

Funding

National Science Foundation, Award: 1645087

National Science Foundation, Award: 1543383

National Cancer Institute, Award: R01 OD011116

National Cancer Institute, Award: R24 RR032670
