Transposable element libraries from 101 fish
Data files
Sep 28, 2023 version files 84.80 MB
- fish_1.scf.fasta.seq.denovolib.classified.Arctogadus_glacialis (735.06 KB)
- fish_10.scf.fasta.seq.denovolib.classified.Molva_molva (537.21 KB)
- fish_100.scf.fasta.seq.denovolib.classified.Clupea_harengus (441.15 KB)
- fish_101.scf.fasta.seq.denovolib.classified.Cyprinus_carpio (798.52 KB)
- fish_102.scf.fasta.seq.denovolib.classified.Electrophorus_electricus (517.32 KB)
- fish_103.scf.fasta.seq.denovolib.classified.Epinephelus_aeneus (773.34 KB)
- fish_104.scf.fasta.seq.denovolib.classified.Monopterus_albus (517.16 KB)
- fish_106.scf.fasta.seq.denovolib.classified.Pimephales_promelas (1.15 MB)
- fish_107.chr.fasta.seq.denovolib.classified.Salmo_salar (1.48 MB)
- fish_108.chr.fasta.seq.denovolib.classified.Cynoglossus_semilaevis (304.20 KB)
- fish_109.scf.fasta.seq.denovolib.classified.Danio_rerio (2.67 MB)
- fish_11.scf.fasta.seq.denovolib.classified.Lota_lota (695.30 KB)
- fish_110.chr.fasta.seq.denovolib.classified.Esox_lucius (2 MB)
- fish_111.chr.fasta.seq.denovolib.classified.Lepisosteus_oculatus (630.54 KB)
- fish_112.chr.fasta.seq.denovolib.classified.Oryzias_latipes (878.15 KB)
- fish_113.chr.fasta.seq.denovolib.classified.Takifugu_rubripes (349.17 KB)
- fish_114.scf.fasta.seq.denovolib.classified.Amphilophus_citrinellus (728.75 KB)
- fish_115.ctg.fasta.seq.denovolib.classified.Gasterosteus_aculeatus (815.74 KB)
- fish_116.ctg.fasta.seq.denovolib.classified.Pseudopleuronectes_yokohamae (328.68 KB)
- fish_117.chr.fasta.seq.denovolib.classified.Tetraodon_nigroviridis (473.12 KB)
- fish_118.ctg.fasta.seq.denovolib.classified.Thunnus_orientalis (716.42 KB)
- fish_119.scf.fasta.seq.denovolib.classified.Takifugu_flavidus (184.56 KB)
- fish_12.scf.fasta.seq.denovolib.classified.Brosme_brosme (555.36 KB)
- fish_120.scf.fasta.seq.denovolib.classified.Anguilla_anguilla (496.59 KB)
- fish_121.scf.fasta.seq.denovolib.classified.Anguilla_japonica (478.21 KB)
- fish_122.scf.fasta.seq.denovolib.classified.Astatotilapia_burtoni (533.34 KB)
- fish_123.scf.fasta.seq.denovolib.classified.Astyanax_mexicanus (729.66 KB)
- fish_124.scf.fasta.seq.denovolib.classified.Boleophthalmus_pectinirostris (655.84 KB)
- fish_125.scf.fasta.seq.denovolib.classified.Cyprinodon_variegatus (760.58 KB)
- fish_126.scf.fasta.seq.denovolib.classified.Cyprinodon_nevadensis (660.97 KB)
- fish_127.scf.fasta.seq.denovolib.classified.Dicentrarchus_labrax (764.82 KB)
- fish_128.scf.fasta.seq.denovolib.classified.Larimichthys_crocea (412.18 KB)
- fish_129.scf.fasta.seq.denovolib.classified.Metriaclima_zebra (1.88 MB)
- fish_13.scf.fasta.seq.denovolib.classified.Merluccius_merluccius (1.23 MB)
- fish_130.scf.fasta.seq.denovolib.classified.Neolamprologus_brichardi (554.95 KB)
- fish_131.scf.fasta.seq.denovolib.classified.Notothenia_coriiceps (922.70 KB)
- fish_132.scf.fasta.seq.denovolib.classified.Oreochromis_niloticus (745.41 KB)
- fish_133.scf.fasta.seq.denovolib.classified.Periophthalmodon_schlosseri (494.94 KB)
- fish_134.scf.fasta.seq.denovolib.classified.Periophthalmus_magnuspinnatus (580.27 KB)
- fish_135.scf.fasta.seq.denovolib.classified.Poecilia_formosa (506.84 KB)
- fish_136.scf.fasta.seq.denovolib.classified.Poecilia_reticulata (646.43 KB)
- fish_137.scf.fasta.seq.denovolib.classified.Pundamilia_nyererei (541.43 KB)
- fish_138.scf.fasta.seq.denovolib.classified.Scartelaos_histophorus (435.44 KB)
- fish_139.scf.fasta.seq.denovolib.classified.Sebastes_nigrocinctus (691.98 KB)
- fish_140.scf.fasta.seq.denovolib.classified.Stegastes_partitus (674.51 KB)
- fish_141.scf.fasta.seq.denovolib.classified.Xiphophorus_maculatus (444.24 KB)
- fish_142.scf.fasta.seq.denovolib.classified.Scleropages_formosus (256.05 KB)
- fish_143.scf.fasta.seq.denovolib.classified.Fundulus_heteroclitus (825.32 KB)
- fish_144.scf.fasta.seq.denovolib.classified.Syngnathus_typhle (301.36 KB)
- fish_15.scf.fasta.seq.denovolib.classified.Merluccius_polli (1.21 MB)
- fish_16.scf.fasta.seq.denovolib.classified.Melanonus_zugmayeri (844.39 KB)
- fish_17.scf.fasta.seq.denovolib.classified.Macrourus_berglax (827.82 KB)
- fish_18.scf.fasta.seq.denovolib.classified.Malacocephalus_occidentalis (718.60 KB)
- fish_19.scf.fasta.seq.denovolib.classified.Bathygadus_melanobranchus (430.97 KB)
- fish_2.scf.fasta.seq.denovolib.classified.Boreogadus_saida (653.49 KB)
- fish_20.scf.fasta.seq.denovolib.classified.Muraenolepis_marmoratus (686.44 KB)
- fish_21.scf.fasta.seq.denovolib.classified.Bregmaceros_cantori (796.50 KB)
- fish_22.scf.fasta.seq.denovolib.classified.Mora_moro (576.10 KB)
- fish_24.scf.fasta.seq.denovolib.classified.Polymixia_japonica (831.04 KB)
- fish_26.scf.fasta.seq.denovolib.classified.Percopsis_transmontana (705.52 KB)
- fish_27.scf.fasta.seq.denovolib.classified.Typhlichthys_subterraneus (977.91 KB)
- fish_28.scf.fasta.seq.denovolib.classified.Zeus_faber (1.40 MB)
- fish_3.scf.fasta.seq.denovolib.classified.Trisopterus_minutus (810.75 KB)
- fish_30.scf.fasta.seq.denovolib.classified.Cyttopsis_rosea (1.15 MB)
- fish_31.scf.fasta.seq.denovolib.classified.Lamprogrammus_exutus (570.62 KB)
- fish_32.scf.fasta.seq.denovolib.classified.Brotula_barbata (365.48 KB)
- fish_33.scf.fasta.seq.denovolib.classified.Carapus_acus (440.21 KB)
- fish_34.scf.fasta.seq.denovolib.classified.Myripristis_jacobus (581.24 KB)
- fish_35.scf.fasta.seq.denovolib.classified.Holocentrus_rufus (515.62 KB)
- fish_36.scf.fasta.seq.denovolib.classified.Trachyrincus_scabrus (647.49 KB)
- fish_4.scf.fasta.seq.denovolib.classified.Pollachius_virens (1.02 MB)
- fish_40.scf.fasta.seq.denovolib.classified.Chatrabus_melanurus (946.16 KB)
- fish_41.scf.fasta.seq.denovolib.classified.Opsanus_beta (1.02 MB)
- fish_42.scf.fasta.seq.denovolib.classified.Parasudis_fraserbrunneri (895.71 KB)
- fish_45.scf.fasta.seq.denovolib.classified.Synodus_synodus (662.40 KB)
- fish_47.scf.fasta.seq.denovolib.classified.Regalecus_glesne (734.55 KB)
- fish_48.scf.fasta.seq.denovolib.classified.Lampris_guttatus (1.35 MB)
- fish_5.scf.fasta.seq.denovolib.classified.Melanogrammus_aeglefinus (1.10 MB)
- fish_50.scf.fasta.seq.denovolib.classified.Guentherus_altivela (933.81 KB)
- fish_51.scf.fasta.seq.denovolib.classified.Lophius_vaillanti (648.55 KB)
- fish_52.scf.fasta.seq.denovolib.classified.Antennarius_striatus (320.35 KB)
- fish_54.scf.fasta.seq.denovolib.classified.Osmerus_eperlanus (346.46 KB)
- fish_55.scf.fasta.seq.denovolib.classified.Perca_fluviatilis (1.03 MB)
- fish_56.scf.fasta.seq.denovolib.classified.Sebastes_norvegicus (620.05 KB)
- fish_6.scf.fasta.seq.denovolib.classified.Merlangius_merlangus (852.71 KB)
- fish_61.scf.fasta.seq.denovolib.classified.Chaenocephalus_aceratus (764.36 KB)
- fish_65.scf.fasta.seq.denovolib.classified.Borostomias_antarcticus (883.87 KB)
- fish_66.scf.fasta.seq.denovolib.classified.Benthosema_glaciale (606.77 KB)
- fish_67.scf.fasta.seq.denovolib.classified.Cetomimus_sp (782.75 KB)
- fish_68.scf.fasta.seq.denovolib.classified.Rondeletia_loricata (944.88 KB)
- fish_69.scf.fasta.seq.denovolib.classified.Beryx_splendens (789.49 KB)
- fish_7.scf.fasta.seq.denovolib.classified.Theragra_chalcogramma (739.40 KB)
- fish_70.scf.fasta.seq.denovolib.classified.Neoniphon_sammara (604.55 KB)
- fish_71.scf.fasta.seq.denovolib.classified.Anoplogaster_cornuta (612.77 KB)
- fish_72.scf.fasta.seq.denovolib.classified.Diretmus_argenteus (778.30 KB)
- fish_73.scf.fasta.seq.denovolib.classified.Diretmoides_pauciradiatus (779.12 KB)
- fish_74.scf.fasta.seq.denovolib.classified.Monocentris_japonica (676.05 KB)
- fish_75.scf.fasta.seq.denovolib.classified.Gephyroberyx_darwinii (675.60 KB)
- fish_76.scf.fasta.seq.denovolib.classified.Hoplostethus_atlanticus (614.71 KB)
- fish_79.scf.fasta.seq.denovolib.classified.Acanthochaenus_luetkenii (932.71 KB)
- fish_8.scf.fasta.seq.denovolib.classified.Gadiculus_argenteus (935.87 KB)
- fish_80.scf.fasta.seq.denovolib.classified.Stylephorus_chordatus (1.13 MB)
- fish_81.scf.fasta.seq.denovolib.classified.Spondyliosoma_cantharus (546.37 KB)
- fish_83.scf.fasta.seq.denovolib.classified.Thunnus_albacares (705.50 KB)
- fish_84.scf.fasta.seq.denovolib.classified.Helostoma_temminkii (655.30 KB)
- fish_85.scf.fasta.seq.denovolib.classified.Anabas_testudineus (289.87 KB)
- fish_86.scf.fasta.seq.denovolib.classified.Selene_dorsalis (416.31 KB)
- fish_87.scf.fasta.seq.denovolib.classified.Chromis_chromis (603.50 KB)
- fish_88.scf.fasta.seq.denovolib.classified.Parablennius_parvicornis (602.24 KB)
- fish_89.scf.fasta.seq.denovolib.classified.Symphodus_melops (417.71 KB)
- fish_9.scf.fasta.seq.denovolib.classified.Phycis_phycis (550.41 KB)
- fish_90.scf.fasta.seq.denovolib.classified.Pseudochromis_fuscus (379.60 KB)
- fish_91.scf.fasta.seq.denovolib.classified.Myoxocephalus_scorpius (493.63 KB)
- fish_95.scf.fasta.seq.denovolib.classified.Phycis_blennoides (292.53 KB)
- fish_96.scf.fasta.seq.denovolib.classified.Lesueurigobius_cf._sanzoi (796.54 KB)
- fish_97.scf.fasta.seq.denovolib.classified.Gadus_morhua (478.57 KB)
- fish_98.scf.fasta.seq.denovolib.classified.Caranx_ignobilis (298.76 KB)
- fish_99.scf.fasta.seq.denovolib.classified.Caranx_melampygus (282.04 KB)
- README.md (11.43 KB)
Abstract
Repetitive DNA makes up a considerable fraction of most eukaryotic genomes. In fish, transposable element (TE) activity has coincided with rapid species diversification. Here, we annotated the repetitive content in 100 genome assemblies, covering the major branches of the diverse lineage of teleost fish. We investigated whether TE content correlates with family-level net diversification rates and found support for a weak negative correlation. Further, we found that TE proportion correlates with genome size, but not with the proportion of short tandem repeats (STRs), which implies independent evolutionary paths. Marine and freshwater fish show large differences in STR content. The most extreme propagation was found in the genomes of codfish species and Atlantic herring. Such a high density of STRs is likely to increase the mutational load, which we propose could be counterbalanced by high fecundity as seen in codfishes and herring.
This repository contains de novo libraries of transposable element (TE) consensus sequences (FASTA files), one per genome assembly. The results of masking each genome assembly with RepeatMasker using these de novo libraries can be found at https://doi.org/10.6084/m9.figshare.8280800.
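A masking run with one of these libraries might be set up roughly as in the following sketch. It only assembles a command line and does not run anything. `-lib`, `-pa` and `-dir` are standard RepeatMasker options, but the options actually used to produce the published masking results are not stated here, so this combination is an assumption. The function name, the genome file name and the output directory are hypothetical.

```python
import shlex
from typing import List


def build_repeatmasker_cmd(library: str, genome: str, out_dir: str,
                           threads: int = 1) -> List[str]:
    """Assemble (but do not run) a RepeatMasker call that uses a custom library."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        "RepeatMasker",
        "-lib", library,      # custom repeat library, e.g. one file from this dataset
        "-pa", str(threads),  # number of parallel jobs
        "-dir", out_dir,      # directory that receives the output files
        genome,               # genome assembly in FASTA format
    ]


if __name__ == "__main__":
    cmd = build_repeatmasker_cmd(
        library="fish_107.chr.fasta.seq.denovolib.classified.Salmo_salar",
        genome="Salmo_salar.genome.fa",  # hypothetical local file name
        out_dir="masked/Salmo_salar",    # hypothetical output directory
        threads=4,
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

Returning the argument list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.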
Description of the data and file structure
Each file is one TE library (FASTA file) that can be used to mask a genome assembly. To generate the libraries, we used a variant of the computational pipeline described in more detail in Tørresen et al. (2017), available at https://github.com/uio-cels/Repeats. The pipeline combines multiple TE detection steps using different tools, steps for removing non-TEs from the detected sequences, and steps for classifying the elements.
For the initial detection step, we used RepeatModeler (v. 1.0.8) (Smit & Hubley 2008-2015) and LTRharvest (part of GenomeTools v. 1.5.7) (Ellinghaus et al. 2008). RepeatModeler detects repetitive sequences of all kinds, whereas LTRharvest is specialized for detecting LTR retrotransposons (LTR-RTs). Using BLASTX, TEs with sequences matching known non-TEs in UniProtKB/Swiss-Prot were removed.
To classify the TEs, we used RepeatClassifier, which is part of the RepeatModeler software. Because this tool could not classify all of the remaining sequences, we performed additional similarity searches between the sequences and a curated library of TE sequences (RepBase v. 20150807), using nucleotide BLAST. Finally, we built hidden Markov model (HMM) profiles from the detected sequences using HMMER (v. 3.1b1) (Wheeler & Eddy 2013) and compared them with HMM profiles from databases downloaded from GyDB.org (Llorens et al. 2011) and dfam.org (Hubley et al. 2016), using the nhmmer program included in HMMER. This allowed additional sequences to be classified at the class and subclass level. The pipeline produced one de novo library per assembly, containing the consensus sequences of the interspersed repeats detected in that assembly.
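As a starting point for working with the libraries, the sketch below splits a library file name into its parts and tallies the classification labels found in FASTA headers. Two things are assumptions rather than documented facts: that `scf`, `chr` and `ctg` denote scaffold-, chromosome- and contig-level assemblies, and that headers follow the RepeatModeler-style `name#Class/Family` convention. All function and type names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple

# Pattern inferred from the file names in this dataset, e.g.
# fish_107.chr.fasta.seq.denovolib.classified.Salmo_salar
NAME_RE = re.compile(
    r"^fish_(?P<num>\d+)\.(?P<level>scf|chr|ctg)"
    r"\.fasta\.seq\.denovolib\.classified\.(?P<species>.+)$"
)


class LibraryName(NamedTuple):
    number: int   # numeric identifier after "fish_"
    level: str    # "scf", "chr" or "ctg" (meaning assumed, see lead-in)
    species: str  # species name with underscores replaced by spaces


def parse_library_name(filename: str) -> LibraryName:
    """Split a library file name into identifier, level and species."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected library file name: {filename!r}")
    species = match["species"].replace("_", " ")
    return LibraryName(int(match["num"]), match["level"], species)


def count_classes(lines: Iterable[str]) -> Counter:
    """Count classification labels in FASTA headers.

    Assumes headers of the form '>name#Class/Family ...'; headers
    without a '#' are counted under 'unlabelled'.
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        identifier = fields[0] if fields else ""
        _, sep, label = identifier.partition("#")
        counts[label if sep else "unlabelled"] += 1
    return counts


if __name__ == "__main__":
    name = "fish_107.chr.fasta.seq.denovolib.classified.Salmo_salar"
    print(parse_library_name(name))
    # With a downloaded library in the working directory:
    # with open(name) as handle:
    #     print(count_classes(handle).most_common(10))
```

Because `count_classes` accepts any iterable of lines, it works on an open file handle as well as on an in-memory list, which keeps it easy to test.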
Sharing/Access information
The source genome assemblies used to generate the consensus libraries were retrieved from the following sources:
SOURCE	SPECIES
Malmstrom et al. 2017	Acanthochaenus luetkenii
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000751415.1_Midas_v5/GCA_000751415.1_Midas_v5_genomic.fna.gz	Amphilophus citrinellus
Malmstrom et al. 2017	Anabas testudineus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000695075.1_Anguilla_anguilla_v1_09_nov_10/GCA_000695075.1_Anguilla_anguilla_v1_09_nov_10_genomic.fna.gz	Anguilla anguilla
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000470695.1_japanese_eel_genome_v1_25_oct_2011_japonica_c401b400k25m200_sspacepremiumk3a02n24_extra.final.scaffolds/GCA_000470695.1_japanese_eel_genome_v1_25_oct_2011_japonica_c401b400k25m200_sspacepremiumk3a02n24_extra.final.scaffolds_genomic.fna.gz	Anguilla japonica
Musilova et al. 2018	Anoplogaster cornuta
Malmstrom et al. 2017	Antennarius striatus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000239415.1_AstBur1.0/GCF_000239415.1_AstBur1.0_genomic.fna.gz	Astatotilapia burtoni
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000372685.1_Astyanax_mexicanus-1.0.2/GCF_000372685.1_Astyanax_mexicanus-1.0.2_genomic.fna.gz	Astyanax mexicanus
Malmstrom et al. 2017	Bathygadus melanobranchus
Malmstrom et al. 2017	Benthosema glaciale
Malmstrom et al. 2017	Beryx splendens
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000788275.1_BP.fa/GCA_000788275.1_BP.fa_genomic.fna.gz	Boleophthalmus pectinirostris
Malmstrom et al. 2017	Borostomias antarcticus
Malmstrom et al. 2017	Brosme brosme
Malmstrom et al. 2017	Brotula barbata
SRX360276, GenBank, see Musilova et al. 2018	Caranx ignobilis
SRX360285, GenBank, see Musilova et al. 2018	Caranx melampygus
Malmstrom et al. 2017	Carapus acus
Musilova et al. 2018	Cetomimus sp
Malmstrom et al. 2017	Chaenocephalus aceratus
Malmstrom et al. 2017	Chatrabus melanurus
Malmstrom et al. 2017	Chromis chromis
SRX203077, GenBank, see Musilova et al. 2018	Clupea harengus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000523025.1_Cse_v1.0/GCF_000523025.1_Cse_v1.0_genomic.fna.gz	Cynoglossus semilaevis
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000776015.1_ASM77601v1/GCA_000776015.1_ASM77601v1_genomic.fna.gz	Cyprinodon nevadensis
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000776015.1_ASM77601v1/GCA_000776015.1_ASM77601v1_genomic.fna.gz	Cyprinodon variegatus
SRX317090, GenBank, see Musilova et al. 2018	Cyprinus carpio
Malmstrom et al. 2017	Cyttopsis rosea
ftp://ftp.ensembl.org/pub//release-78/fasta/danio_rerio/dna/Danio_rerio.Zv9.dna.toplevel.fa.gz	Danio rerio
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000689215.1_seabass_V1.0/GCA_000689215.1_seabass_V1.0_genomic.fna.gz	Dicentrarchus labrax
Musilova et al. 2018	Diretmoides pauciradiatus
Musilova et al. 2018	Diretmus argenteus
SRX554947, GenBank, see Musilova et al. 2018	Electrophorus electricus
ERX432347, GenBank, see Musilova et al. 2018	Epinephelus aeneus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000721915.2_ASM72191v2/GCF_000721915.2_ASM72191v2_genomic.fna.gz	Esox lucius
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000826765.1_Fundulus_heteroclitus-3.0.2/GCF_000826765.1_Fundulus_heteroclitus-3.0.2_genomic.fna.gz	Fundulus heteroclitus
Malmstrom et al. 2017	Gadus morhua
ftp://ftp.ensembl.org/pub//release-78/fasta/gasterosteus_aculeatus/dna/Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz	Gasterosteus aculeatus
Musilova et al. 2018	Gephyroberyx darwinii
Malmstrom et al. 2017	Guentherus altivela
Malmstrom et al. 2017	Helostoma temminkii
Malmstrom et al. 2017	Holocentrus rufus
Musilova et al. 2018	Hoplostethus atlanticus
Malmstrom et al. 2017	Lampris guttatus
Malmstrom et al. 2017	Lamprogrammus exutus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000742935.1_ASM74293v1/GCF_000742935.1_ASM74293v1_genomic.fna.gz	Larimichthys crocea
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000242695.1_LepOcu1/GCF_000242695.1_LepOcu1_genomic.fna.gz	Lepisosteus oculatus
Malmstrom et al. 2017	Lesueurigobius cf. sanzoi
Musilova et al. 2018 (unpublished)	Lophius vaillanti
Malmstrom et al. 2017	Macrourus berglax
Malmstrom et al. 2017	Merluccius polli
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000238955.2_M_zebra_UMD1/GCF_000238955.2_M_zebra_UMD1_genomic.fna.gz	Metriaclima zebra
Malmstrom et al. 2017	Monocentris japonica
SRX218060, GenBank, see Musilova et al. 2018	Monopterus albus
Malmstrom et al. 2017	Mora moro
Malmstrom et al. 2017	Myoxocephalus scorpius
Malmstrom et al. 2017	Myripristis jacobus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000239395.1_NeoBri1.0/GCF_000239395.1_NeoBri1.0_genomic.fna.gz	Neolamprologus brichardi
Malmstrom et al. 2017	Neoniphon sammara
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000735185.1_NC01/GCF_000735185.1_NC01_genomic.fna.gz	Notothenia coriiceps
Musilova et al. 2018	Opsanus beta
ftp://ftp.ensembl.org/pub//release-78/fasta/oreochromis_niloticus/dna/Oreochromis_niloticus.Orenil1.0.dna.toplevel.fa.gz	Oreochromis niloticus
ftp://ftp.ensembl.org/pub//release-78/fasta/oryzias_latipes/dna/Oryzias_latipes.MEDAKA1.dna.toplevel.fa.gz	Oryzias latipes
Malmstrom et al. 2017	Osmerus eperlanus
Malmstrom et al. 2017	Parablennius parvicornis
Malmstrom et al. 2017	Parasudis fraserbrunneri
Malmstrom et al. 2017	Perca fluviatilis
Malmstrom et al. 2017	Percopsis transmontana
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000787095.1_PS.fa/GCA_000787095.1_PS.fa_genomic.fna.gz	Periophthalmodon schlosseri
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000787105.1_PM.fa/GCA_000787105.1_PM.fa_genomic.fna.gz	Periophthalmus magnuspinnatus
SRX423854, GenBank, see Musilova et al. 2018	Pimephales promelas
ftp://ftp.ensembl.org/pub//release-78/fasta/poecilia_formosa/dna/Poecilia_formosa.PoeFor_5.1.2.dna.toplevel.fa.gz	Poecilia formosa
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000633615.1_Guppy_female_1.0_MT/GCF_000633615.1_Guppy_female_1.0_MT_genomic.fna.gz	Poecilia reticulata
Malmstrom et al. 2017	Polymixia japonica
Malmstrom et al. 2017	Pseudochromis fuscus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000787555.1_Pyoko_1.0/GCA_000787555.1_Pyoko_1.0_genomic.fna.gz	Pseudopleuronectes yokohamae
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000239375.1_PunNye1.0/GCF_000239375.1_PunNye1.0_genomic.fna.gz	Pundamilia nyererei
Malmstrom et al. 2017	Regalecus glesne
Malmstrom et al. 2017	Rondeletia loricata
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000233375.1_ICSASG_v2/GCF_000233375.1_ICSASG_v2_genomic.fna.gz	Salmo salar
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000787155.1_SH.fa/GCA_000787155.1_SH.fa_genomic.fna.gz	Scartelaos histophorus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_001005745.2_aro_v2/GCA_001005745.2_aro_v2_genomic.fna.gz	Scleropages formosus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000475235.1_Snig1.0/GCA_000475235.1_Snig1.0_genomic.fna.gz	Sebastes nigrocinctus
Malmstrom et al. 2017	Sebastes norvegicus
Malmstrom et al. 2017	Selene dorsalis
Malmstrom et al. 2017	Spondyliosoma cantharus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000690725.1_Stegastes_partitus-1.0.2/GCF_000690725.1_Stegastes_partitus-1.0.2_genomic.fna.gz	Stegastes partitus
Malmstrom et al. 2017	Stylephorus chordatus
Malmstrom et al. 2017	Symphodus melops
Musilova et al. 2018 (unpublished)	Syngnathus typhle
Musilova et al. 2018 (unpublished)	Synodus synodus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000400755.1_version_1_of_Takifugu_flavidus_genome/GCA_000400755.1_version_1_of_Takifugu_flavidus_genome_genomic.fna.gz	Takifugu flavidus
ftp://ftp.ensembl.org/pub//release-78/fasta/takifugu_rubripes/dna/Takifugu_rubripes.FUGU4.dna.toplevel.fa.gz	Takifugu rubripes
ftp://ftp.ensembl.org/pub//release-78/fasta/tetraodon_nigroviridis/dna/Tetraodon_nigroviridis.TETRAODON8.dna.toplevel.fa.gz	Tetraodon nigroviridis
Malmstrom et al. 2017	Thunnus albacares
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000418415.1_Thunnus_orientalis_ver_Ba_1.0/GCA_000418415.1_Thunnus_orientalis_ver_Ba_1.0_genomic.fna.gz	Thunnus orientalis
Malmstrom et al. 2017	Trisopterus minutus
Malmstrom et al. 2017	Typhlichthys subterraneus
ftp://ftp.ensembl.org/pub//release-78/fasta/xiphophorus_maculatus/dna/Xiphophorus_maculatus.Xipmac4.4.2.dna.toplevel.fa.gz	Xiphophorus maculatus
Malmstrom et al. 2017	Zeus faber
Malmstrom et al. 2017	Arctogadus glacialis
Malmstrom et al. 2017	Molva molva
Malmstrom et al. 2017	Lota lota
Malmstrom et al. 2017	Merluccius merluccius
Malmstrom et al. 2017	Melanonus zugmayeri
Malmstrom et al. 2017	Malacocephalus occidentalis
Malmstrom et al. 2017	Boreogadus saida
Malmstrom et al. 2017	Muraenolepis marmoratus
Malmstrom et al. 2017	Bregmaceros cantori
Malmstrom et al. 2017	Trachyrincus scabrus
Malmstrom et al. 2017	Pollachius virens
Malmstrom et al. 2017	Melanogrammus aeglefinus
Malmstrom et al. 2017	Merlangius merlangus
Malmstrom et al. 2017	Theragra chalcogramma
Malmstrom et al. 2017	Gadiculus argenteus
Malmstrom et al. 2017	Phycis phycis
Malmstrom et al. 2017	Phycis blennoides
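The SOURCE column mixes several kinds of entries: NCBI and Ensembl FTP URLs, sequencing experiment accessions (SRX/ERX), and literature citations. The following is a minimal sketch of how a script might tell them apart and pull out the NCBI assembly accession. The function name and the category labels are hypothetical, and backslash escapes before underscores, which can appear when the README is shown as plain text, are removed before matching.

```python
import re
from typing import Optional, Tuple

# NCBI assembly accessions look like GCA_000751415.1 or GCF_000233375.1.
ASSEMBLY_RE = re.compile(r"GC[AF]_\d{9}\.\d+")
# Sequencing experiment accessions such as SRX360276 or ERX432347.
EXPERIMENT_RE = re.compile(r"^[SED]RX\d+")


def classify_source(source: str) -> Tuple[str, Optional[str]]:
    """Return (kind, accession) for one entry of the SOURCE column.

    The kind labels are illustrative only; accession is None when the
    entry carries no accession that this sketch knows how to extract.
    """
    text = source.strip().replace("\\", "")  # drop markdown-style escapes
    if text.startswith("ftp://ftp.ncbi.nlm.nih.gov/"):
        match = ASSEMBLY_RE.search(text)
        return ("ncbi_assembly", match.group(0) if match else None)
    if text.startswith("ftp://ftp.ensembl.org/"):
        return ("ensembl", None)
    match = EXPERIMENT_RE.match(text)
    if match:
        return ("experiment", match.group(0))
    return ("citation", None)


if __name__ == "__main__":
    examples = [
        "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000233375.1_ICSASG_v2/"
        "GCF_000233375.1_ICSASG_v2_genomic.fna.gz",
        "SRX360276, GenBank, see Musilova et al. 2018",
        "Malmstrom et al. 2017",
    ]
    for entry in examples:
        print(classify_source(entry))
```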
