Data from: Improvement of the threespine stickleback genome using a Hi-C-based Proximity-Guided Assembly
Cite this dataset
Peichel, Catherine L.; Sullivan, Shawn T.; Liachko, Ivan; White, Michael A. (2017). Data from: Improvement of the threespine stickleback genome using a Hi-C-based Proximity-Guided Assembly [Dataset]. Dryad. https://doi.org/10.5061/dryad.h7h32
Scaffolding genomes into complete chromosome assemblies remains challenging even with the rapidly increasing sequence coverage generated by current next-generation sequencing technologies. Even with scaffolding information, many genome assemblies remain incomplete. The genome of the threespine stickleback (Gasterosteus aculeatus), a fish model system in evolutionary genetics and genomics, is not completely assembled despite scaffolding with high-density linkage maps. Here, we first test the ability of a Hi-C-based Proximity-Guided Assembly to perform a de novo genome assembly from relatively short contigs. Using Hi-C-based Proximity-Guided Assembly, we generated complete chromosome assemblies from a distribution of short contigs (20–100 kb). We found that 96.40% of contigs were correctly assigned to linkage groups, with ordering nearly identical to the previous genome assembly. Using available bacterial artificial chromosome (BAC) end sequences, we provide evidence that some of the few discrepancies between the Hi-C assembly and the existing assembly are due to structural variation between the populations used for the two assemblies or errors in the existing assembly. This Hi-C assembly also allowed us to improve the existing assembly, assigning over 60% (13.35 Mb) of the previously unassigned (~21.7 Mb) contigs to linkage groups. Together, our results highlight the potential of the Hi-C-based Proximity-Guided Assembly method to be used in combination with short-read data to perform relatively inexpensive de novo genome assemblies. This approach will be particularly useful in organisms in which it is difficult to perform linkage mapping or to obtain the high-molecular-weight DNA required for other scaffolding methods.
Paxton Lake Benthic population