Dryad

Data from: Improved genome assembly of the whiteleg shrimp Penaeus (Litopenaeus) vannamei using long- and short-read sequences from public databases

Abstract

A genome assembly contains the complete DNA sequence of a particular organism. This information is needed to understand the organism's gene functions and the genetic variability of its populations. In this study, the genome of the Pacific whiteleg shrimp Penaeus (Litopenaeus) vannamei was assembled using long- and short-read sequence data from GenBank, the publicly accessible DNA sequence repository of the National Institutes of Health of the USA. The three tables and two figures provide supplementary information for article JOH-2023-155.R1 and are relevant for the analysis of the new reference-guided genome assembly of the whiteleg shrimp. Supplementary Table 1 compares observed and expected chromosome sizes. Supplementary Table 2 gives the location of genetic markers, which will be particularly relevant for future genome-wide association studies looking for associations between markers and/or genes and traits of interest for aquaculture, such as disease resistance, growth or fecundity. Supplementary Table 3 shows that many markers align to several parts of the genome, indicating the large number of repeated regions in the shrimp genome. Supplementary Figure 1 shows the results of genome size estimation based on counting k-mers (substrings of length k contained within a DNA sequence). Supplementary Figure 2 depicts the linear correlation between the observed and expected lengths of the assembled chromosomes. The Supplementary Materials 1 file contains the Perl script used to remove mitochondrial DNA sequences from the raw data; these sequences are not needed for, and may interfere with, the genome assembly.
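To illustrate the k-mer counting idea behind Supplementary Figure 1, the following is a minimal sketch in Python. It is not the procedure used in the article, whose tools and parameters are not described here. The function names `count_kmers` and `estimate_genome_size`, the `min_depth` cutoff, and the simple "total k-mers divided by peak depth" estimator are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every substring of length k across a collection of reads.

    K-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def estimate_genome_size(counts, min_depth=2):
    """Rough genome size estimate from k-mer counts.

    Assumption: k-mers seen fewer than min_depth times are sequencing errors,
    and the most common depth among the rest approximates the k-mer coverage.
    Genome size is then (total k-mer occurrences) / (peak depth).
    """
    # Histogram: depth -> number of distinct k-mers observed at that depth.
    histogram = Counter(counts.values())
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth
```

This sketch ignores reverse-complement strands, heterozygosity and repeat content, all of which matter for a repeat-rich genome such as the shrimp's; dedicated k-mer counting tools handle these cases and scale to real read sets.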