Dryad

Data from: Transposable element annotation in non-model species - on the benefits of species-specific repeat libraries using semi-automated EDTA and DeepTE de novo pipelines

Abstract

Transposable elements (TEs) are significant genomic components that can be detected either through sequence homology against existing databases or de novo, with the latter potentially reducing underestimates of TE abundance. Here, we describe the semi-automated generation of a de novo TE library that combines the newly described EDTA pipeline and the DeepTE classifier in a non-model teleost (Corydoras sp. C115). We assess performance using both genomic and transcriptomic input by five metrics: (i) abundance, (ii) composition, (iii) fragmentation, (iv) age distributions, and (v) capture of potentially horizontally transferred TEs. We identified notable differences in these metrics between TE libraries and highlight how library choice can have a major impact on TE content estimates in non-model species.

This repository contains six raw (unparsed) RepeatMasker (RM) output files for two genomes (Corydoras sp. C115 and Corydoras maculifer) and one transcriptome (C. maculifer), as well as two repeat libraries: one based on the RepBase Danio rerio library and one de novo library built from the C. sp. C115 genome. For each of the three inputs there are two RM output files: one from a homology-based transposon search using the D. rerio library and one from a species-specific search using the de novo library. The repository also includes a script accompanying the horizontal transfer analysis and a script for renaming transposable elements.
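Because the RM output files are provided raw, any of the metrics above has to start from parsing them. The following is a minimal sketch, not the authors' pipeline: it assumes the standard whitespace-delimited RepeatMasker `.out` layout (SW score, perc. div., perc. del., perc. ins., query, begin, end, (left), strand, repeat, class/family, repeat positions, ID, optional `*`) and computes simplified versions of two metrics, abundance as summed hit length per top-level TE class and a histogram of the raw perc. div. column as a rough age proxy. The function names are hypothetical, overlapping hits are not merged (so abundance can be overestimated), and divergence is not Kimura-corrected.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# One parsed hit: (query, begin, end, divergence, repeat name, class/family)
Hit = Tuple[str, int, int, float, str, str]


def parse_rm_out(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield hits from a RepeatMasker .out file (assumed standard layout)."""
    for line in lines:
        fields = line.split()
        # Data rows start with an integer SW score and have at least 15
        # columns; the header lines and blank lines are skipped.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        yield (
            fields[4],          # query sequence (scaffold/contig/transcript)
            int(fields[5]),     # begin position in query
            int(fields[6]),     # end position in query
            float(fields[1]),   # raw percent divergence from the consensus
            fields[9],          # matching repeat name
            fields[10],         # repeat class/family, e.g. "DNA/hAT"
        )


def abundance_by_class(lines: Iterable[str]) -> Dict[str, int]:
    """Sum hit lengths in bp per top-level class ("DNA/hAT" -> "DNA").

    Overlapping hits are counted separately; no interval merging is done.
    """
    totals: Dict[str, int] = defaultdict(int)
    for _, begin, end, _, _, repeat_class in parse_rm_out(lines):
        totals[repeat_class.split("/")[0]] += end - begin + 1
    return dict(totals)


def divergence_histogram(lines: Iterable[str], bin_width: float = 1.0) -> Dict[int, int]:
    """Count hits per divergence bin (raw perc. div., not Kimura-corrected)."""
    hist: Dict[int, int] = defaultdict(int)
    for _, _, _, div, _, _ in parse_rm_out(lines):
        hist[int(div // bin_width)] += 1
    return dict(sorted(hist.items()))
```

Usage would look like `with open(path) as fh: abundance_by_class(fh)`, where `path` is one of the `.out` files in this repository (no specific file name is assumed here). Running the same functions on the D. rerio-based and de novo-based outputs for one assembly gives a quick side-by-side comparison of library choice.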