Data from: Genome-wide identification of microsatellites and transposable elements in the dromedary camel genome using whole genome sequencing data
Khalkhali-Evrigh, Reza et al. (2019), Data from: Genome-wide identification of microsatellites and transposable elements in the dromedary camel genome using whole genome sequencing data, Dryad, Dataset, https://doi.org/10.5061/dryad.10h185k
Transposable elements (TEs) and simple sequence repeats (SSRs) are prevalent in eukaryotic genomes, especially in mammals. Repetitive sequences make up approximately one-third of camelid genomes, so studying this fraction can provide deeper insight into the genome and its evolutionary history. Here, to improve our understanding of camel genome architecture, the whole genomes of two dromedaries (Yazdi and Trodi camels) were sequenced. In total, 92 and 84.3 Gb of sequence data were obtained and assembled into 137,772 and 149,997 contigs with N50 lengths of 54,626 and 54,031 bp in the Yazdi and Trodi camels, respectively. TEs covered 30.58% of the Yazdi camel genome and 30.50% of the Trodi camel genome. In contrast to the genomes of cattle, sheep, horse, and pig, no endogenous retrovirus-K (ERVK) elements were found in the camel genome. The distribution pattern of DNA transposons was similar in the dromedary, Bactrian camel, and cattle genomes, whereas the patterns of the LINE, SINE, and long terminal repeat (LTR) families differed. Elements such as RTE-BovB, which belong to the LINE family, are dramatically more abundant in the cattle and sheep genomes than in the dromedary genome. However, LINE1 (L1) and LINE2 (L2) elements account for a higher percentage of the LINE family in the dromedary genome than in the cattle genome. In addition, 540,133 and 539,409 microsatellites were identified in the assembled contigs of the Yazdi and Trodi dromedary camels, respectively. Across both samples, di- (393,196) and tri- (65,313) nucleotide repeats accounted for about 42.5% of the microsatellites. The findings of the present study indicate that the non-repetitive content of mammalian genomes is approximately similar. SSRs make up 9.1 Mb (0.47% of the whole assembled genome) of the Iranian dromedary genome.
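The microsatellite percentages quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the di- and tri-nucleotide counts are totals over both camels. This is an interpretation, not something the abstract states outright, but under that reading the reported 42.5% is reproduced. It also derives the assembled genome size implied by the SSR length and percentage. All variable names are illustrative.

```python
# Figures quoted in the abstract.
ssr_yazdi = 540_133
ssr_trodi = 539_409
di_repeats = 393_196
tri_repeats = 65_313

# Assumption: the di-/tri-nucleotide counts are summed over both samples,
# so they are compared against the combined microsatellite total.
total_ssrs = ssr_yazdi + ssr_trodi
di_tri_share = 100 * (di_repeats + tri_repeats) / total_ssrs

# 9.1 Mb of SSRs is reported as 0.47% of the whole assembled genome,
# which implies an assembly size of (9.1 Mb / 0.0047).
ssr_length_mb = 9.1
ssr_fraction = 0.47 / 100
implied_assembly_gb = ssr_length_mb / ssr_fraction / 1000

print(f"di + tri share of SSRs: {di_tri_share:.1f}%")
print(f"implied assembly size: {implied_assembly_gb:.2f} Gb")
```

Measured against a single sample's total instead, the same counts would give a much larger share, which is why the combined reading is assumed here.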
Annotation of the repetitive content of the Iranian dromedary camel genome revealed that 9,068 and 11,544 genes contain different types of TEs and SSRs, respectively. The SSR markers identified in the present study can serve as a valuable resource for genetic diversity investigations and marker-assisted selection (MAS) in camel-breeding programs.
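For readers unfamiliar with how microsatellites are located in assembled contigs, the following is a minimal sketch of perfect-SSR detection using regular expressions. It is not the tool or the settings used in the study, which this description does not specify. The function names and the minimum repeat thresholds are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical minimum repeat counts per motif length (illustrative only,
# not the thresholds used in the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """Return True if the motif is not a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs such as 'ATAT', which belong to a shorter class.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


# Toy contig containing one di- and one tri-nucleotide repeat.
contig = "GGC" + "AT" * 8 + "CCGT" + "CAG" * 5 + "TT"
for start, motif, count in find_ssrs(contig):
    print(start, motif, count)
```

The sketch reports only perfect repeats of 2 to 6 bases and ignores compound or interrupted microsatellites, which dedicated SSR-mining software typically handles.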