Transposable element libraries from 101 fish
Data files
Sep 28, 2023 version files 84.80 MB
-
fish_1.scf.fasta.seq.denovolib.classified.Arctogadus_glacialis
735.06 KB
-
fish_10.scf.fasta.seq.denovolib.classified.Molva_molva
537.21 KB
-
fish_100.scf.fasta.seq.denovolib.classified.Clupea_harengus
441.15 KB
-
fish_101.scf.fasta.seq.denovolib.classified.Cyprinus_carpio
798.52 KB
-
fish_102.scf.fasta.seq.denovolib.classified.Electrophorus_electricus
517.32 KB
-
fish_103.scf.fasta.seq.denovolib.classified.Epinephelus_aeneus
773.34 KB
-
fish_104.scf.fasta.seq.denovolib.classified.Monopterus_albus
517.16 KB
-
fish_106.scf.fasta.seq.denovolib.classified.Pimephales_promelas
1.15 MB
-
fish_107.chr.fasta.seq.denovolib.classified.Salmo_salar
1.48 MB
-
fish_108.chr.fasta.seq.denovolib.classified.Cynoglossus_semilaevis
304.20 KB
-
fish_109.scf.fasta.seq.denovolib.classified.Danio_rerio
2.67 MB
-
fish_11.scf.fasta.seq.denovolib.classified.Lota_lota
695.30 KB
-
fish_110.chr.fasta.seq.denovolib.classified.Esox_lucius
2 MB
-
fish_111.chr.fasta.seq.denovolib.classified.Lepisosteus_oculatus
630.54 KB
-
fish_112.chr.fasta.seq.denovolib.classified.Oryzias_latipes
878.15 KB
-
fish_113.chr.fasta.seq.denovolib.classified.Takifugu_rubripes
349.17 KB
-
fish_114.scf.fasta.seq.denovolib.classified.Amphilophus_citrinellus
728.75 KB
-
fish_115.ctg.fasta.seq.denovolib.classified.Gasterosteus_aculeatus
815.74 KB
-
fish_116.ctg.fasta.seq.denovolib.classified.Pseudopleuronectes_yokohamae
328.68 KB
-
fish_117.chr.fasta.seq.denovolib.classified.Tetraodon_nigroviridis
473.12 KB
-
fish_118.ctg.fasta.seq.denovolib.classified.Thunnus_orientalis
716.42 KB
-
fish_119.scf.fasta.seq.denovolib.classified.Takifugu_flavidus
184.56 KB
-
fish_12.scf.fasta.seq.denovolib.classified.Brosme_brosme
555.36 KB
-
fish_120.scf.fasta.seq.denovolib.classified.Anguilla_anguilla
496.59 KB
-
fish_121.scf.fasta.seq.denovolib.classified.Anguilla_japonica
478.21 KB
-
fish_122.scf.fasta.seq.denovolib.classified.Astatotilapia_burtoni
533.34 KB
-
fish_123.scf.fasta.seq.denovolib.classified.Astyanax_mexicanus
729.66 KB
-
fish_124.scf.fasta.seq.denovolib.classified.Boleophthalmus_pectinirostris
655.84 KB
-
fish_125.scf.fasta.seq.denovolib.classified.Cyprinodon_variegatus
760.58 KB
-
fish_126.scf.fasta.seq.denovolib.classified.Cyprinodon_nevadensis
660.97 KB
-
fish_127.scf.fasta.seq.denovolib.classified.Dicentrarchus_labrax
764.82 KB
-
fish_128.scf.fasta.seq.denovolib.classified.Larimichthys_crocea
412.18 KB
-
fish_129.scf.fasta.seq.denovolib.classified.Metriaclima_zebra
1.88 MB
-
fish_13.scf.fasta.seq.denovolib.classified.Merluccius_merluccius
1.23 MB
-
fish_130.scf.fasta.seq.denovolib.classified.Neolamprologus_brichardi
554.95 KB
-
fish_131.scf.fasta.seq.denovolib.classified.Notothenia_coriiceps
922.70 KB
-
fish_132.scf.fasta.seq.denovolib.classified.Oreochromis_niloticus
745.41 KB
-
fish_133.scf.fasta.seq.denovolib.classified.Periophthalmodon_schlosseri
494.94 KB
-
fish_134.scf.fasta.seq.denovolib.classified.Periophthalmus_magnuspinnatus
580.27 KB
-
fish_135.scf.fasta.seq.denovolib.classified.Poecilia_formosa
506.84 KB
-
fish_136.scf.fasta.seq.denovolib.classified.Poecilia_reticulata
646.43 KB
-
fish_137.scf.fasta.seq.denovolib.classified.Pundamilia_nyererei
541.43 KB
-
fish_138.scf.fasta.seq.denovolib.classified.Scartelaos_histophorus
435.44 KB
-
fish_139.scf.fasta.seq.denovolib.classified.Sebastes_nigrocinctus
691.98 KB
-
fish_140.scf.fasta.seq.denovolib.classified.Stegastes_partitus
674.51 KB
-
fish_141.scf.fasta.seq.denovolib.classified.Xiphophorus_maculatus
444.24 KB
-
fish_142.scf.fasta.seq.denovolib.classified.Scleropages_formosus
256.05 KB
-
fish_143.scf.fasta.seq.denovolib.classified.Fundulus_heteroclitus
825.32 KB
-
fish_144.scf.fasta.seq.denovolib.classified.Syngnathus_typhle
301.36 KB
-
fish_15.scf.fasta.seq.denovolib.classified.Merluccius_polli
1.21 MB
-
fish_16.scf.fasta.seq.denovolib.classified.Melanonus_zugmayeri
844.39 KB
-
fish_17.scf.fasta.seq.denovolib.classified.Macrourus_berglax
827.82 KB
-
fish_18.scf.fasta.seq.denovolib.classified.Malacocephalus_occidentalis
718.60 KB
-
fish_19.scf.fasta.seq.denovolib.classified.Bathygadus_melanobranchus
430.97 KB
-
fish_2.scf.fasta.seq.denovolib.classified.Boreogadus_saida
653.49 KB
-
fish_20.scf.fasta.seq.denovolib.classified.Muraenolepis_marmoratus
686.44 KB
-
fish_21.scf.fasta.seq.denovolib.classified.Bregmaceros_cantori
796.50 KB
-
fish_22.scf.fasta.seq.denovolib.classified.Mora_moro
576.10 KB
-
fish_24.scf.fasta.seq.denovolib.classified.Polymixia_japonica
831.04 KB
-
fish_26.scf.fasta.seq.denovolib.classified.Percopsis_transmontana
705.52 KB
-
fish_27.scf.fasta.seq.denovolib.classified.Typhlichthys_subterraneus
977.91 KB
-
fish_28.scf.fasta.seq.denovolib.classified.Zeus_faber
1.40 MB
-
fish_3.scf.fasta.seq.denovolib.classified.Trisopterus_minutus
810.75 KB
-
fish_30.scf.fasta.seq.denovolib.classified.Cyttopsis_rosea
1.15 MB
-
fish_31.scf.fasta.seq.denovolib.classified.Lamprogrammus_exutus
570.62 KB
-
fish_32.scf.fasta.seq.denovolib.classified.Brotula_barbata
365.48 KB
-
fish_33.scf.fasta.seq.denovolib.classified.Carapus_acus
440.21 KB
-
fish_34.scf.fasta.seq.denovolib.classified.Myripristis_jacobus
581.24 KB
-
fish_35.scf.fasta.seq.denovolib.classified.Holocentrus_rufus
515.62 KB
-
fish_36.scf.fasta.seq.denovolib.classified.Trachyrincus_scabrus
647.49 KB
-
fish_4.scf.fasta.seq.denovolib.classified.Pollachius_virens
1.02 MB
-
fish_40.scf.fasta.seq.denovolib.classified.Chatrabus_melanurus
946.16 KB
-
fish_41.scf.fasta.seq.denovolib.classified.Opsanus_beta
1.02 MB
-
fish_42.scf.fasta.seq.denovolib.classified.Parasudis_fraserbrunneri
895.71 KB
-
fish_45.scf.fasta.seq.denovolib.classified.Synodus_synodus
662.40 KB
-
fish_47.scf.fasta.seq.denovolib.classified.Regalecus_glesne
734.55 KB
-
fish_48.scf.fasta.seq.denovolib.classified.Lampris_guttatus
1.35 MB
-
fish_5.scf.fasta.seq.denovolib.classified.Melanogrammus_aeglefinus
1.10 MB
-
fish_50.scf.fasta.seq.denovolib.classified.Guentherus_altivela
933.81 KB
-
fish_51.scf.fasta.seq.denovolib.classified.Lophius_vaillanti
648.55 KB
-
fish_52.scf.fasta.seq.denovolib.classified.Antennarius_striatus
320.35 KB
-
fish_54.scf.fasta.seq.denovolib.classified.Osmerus_eperlanus
346.46 KB
-
fish_55.scf.fasta.seq.denovolib.classified.Perca_fluviatilis
1.03 MB
-
fish_56.scf.fasta.seq.denovolib.classified.Sebastes_norvegicus
620.05 KB
-
fish_6.scf.fasta.seq.denovolib.classified.Merlangius_merlangus
852.71 KB
-
fish_61.scf.fasta.seq.denovolib.classified.Chaenocephalus_aceratus
764.36 KB
-
fish_65.scf.fasta.seq.denovolib.classified.Borostomias_antarcticus
883.87 KB
-
fish_66.scf.fasta.seq.denovolib.classified.Benthosema_glaciale
606.77 KB
-
fish_67.scf.fasta.seq.denovolib.classified.Cetomimus_sp
782.75 KB
-
fish_68.scf.fasta.seq.denovolib.classified.Rondeletia_loricata
944.88 KB
-
fish_69.scf.fasta.seq.denovolib.classified.Beryx_splendens
789.49 KB
-
fish_7.scf.fasta.seq.denovolib.classified.Theragra_chalcogramma
739.40 KB
-
fish_70.scf.fasta.seq.denovolib.classified.Neoniphon_sammara
604.55 KB
-
fish_71.scf.fasta.seq.denovolib.classified.Anoplogaster_cornuta
612.77 KB
-
fish_72.scf.fasta.seq.denovolib.classified.Diretmus_argenteus
778.30 KB
-
fish_73.scf.fasta.seq.denovolib.classified.Diretmoides_pauciradiatus
779.12 KB
-
fish_74.scf.fasta.seq.denovolib.classified.Monocentris_japonica
676.05 KB
-
fish_75.scf.fasta.seq.denovolib.classified.Gephyroberyx_darwinii
675.60 KB
-
fish_76.scf.fasta.seq.denovolib.classified.Hoplostethus_atlanticus
614.71 KB
-
fish_79.scf.fasta.seq.denovolib.classified.Acanthochaenus_luetkenii
932.71 KB
-
fish_8.scf.fasta.seq.denovolib.classified.Gadiculus_argenteus
935.87 KB
-
fish_80.scf.fasta.seq.denovolib.classified.Stylephorus_chordatus
1.13 MB
-
fish_81.scf.fasta.seq.denovolib.classified.Spondyliosoma_cantharus
546.37 KB
-
fish_83.scf.fasta.seq.denovolib.classified.Thunnus_albacares
705.50 KB
-
fish_84.scf.fasta.seq.denovolib.classified.Helostoma_temminkii
655.30 KB
-
fish_85.scf.fasta.seq.denovolib.classified.Anabas_testudineus
289.87 KB
-
fish_86.scf.fasta.seq.denovolib.classified.Selene_dorsalis
416.31 KB
-
fish_87.scf.fasta.seq.denovolib.classified.Chromis_chromis
603.50 KB
-
fish_88.scf.fasta.seq.denovolib.classified.Parablennius_parvicornis
602.24 KB
-
fish_89.scf.fasta.seq.denovolib.classified.Symphodus_melops
417.71 KB
-
fish_9.scf.fasta.seq.denovolib.classified.Phycis_phycis
550.41 KB
-
fish_90.scf.fasta.seq.denovolib.classified.Pseudochromis_fuscus
379.60 KB
-
fish_91.scf.fasta.seq.denovolib.classified.Myoxocephalus_scorpius
493.63 KB
-
fish_95.scf.fasta.seq.denovolib.classified.Phycis_blennoides
292.53 KB
-
fish_96.scf.fasta.seq.denovolib.classified.Lesueurigobius_cf._sanzoi
796.54 KB
-
fish_97.scf.fasta.seq.denovolib.classified.Gadus_morhua
478.57 KB
-
fish_98.scf.fasta.seq.denovolib.classified.Caranx_ignobilis
298.76 KB
-
fish_99.scf.fasta.seq.denovolib.classified.Caranx_melampygus
282.04 KB
-
README.md
11.43 KB
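The library file names listed above appear to follow one pattern: a numeric fish ID, a short token (`scf`, `chr` or `ctg`), a fixed suffix, and the species name. A minimal sketch for splitting a name into those parts is given below. The pattern is inferred from the file list and is not documented in the README. The reading of the token as assembly level (scaffold, chromosome, contig) is an assumption. The names `LibraryFile` and `parse_library_name` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed naming pattern, inferred from the file list (not documented):
#   fish_<id>.<level>.fasta.seq.denovolib.classified.<Genus_species>
LIBRARY_NAME = re.compile(
    r"^fish_(?P<id>\d+)\.(?P<level>scf|chr|ctg)"
    r"\.fasta\.seq\.denovolib\.classified\.(?P<species>.+)$"
)


class LibraryFile(NamedTuple):
    fish_id: int   # numeric ID from the "fish_<id>" prefix
    level: str     # "scf", "chr" or "ctg" (presumably the assembly level)
    species: str   # species name with underscores replaced by spaces


def parse_library_name(filename: str) -> LibraryFile:
    """Split a library file name into ID, level token and species name."""
    match = LIBRARY_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected library file name: {filename!r}")
    species = match["species"].replace("_", " ")
    return LibraryFile(int(match["id"]), match["level"], species)
```

For example, `parse_library_name("fish_97.scf.fasta.seq.denovolib.classified.Gadus_morhua")` yields the ID 97, the token `scf` and the species name `Gadus morhua`.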
Abstract
Repetitive DNA makes up a considerable fraction of most eukaryotic genomes. In fish, transposable element (TE) activity has coincided with rapid species diversification. Here, we annotated the repetitive content in 100 genome assemblies, covering the major branches of the diverse lineage of teleost fish. We investigated whether TE content correlates with family-level net diversification rates and found support for a weak negative correlation. Further, we found that TE proportion correlates with genome size, but not with the proportion of short tandem repeats (STRs), which implies independent evolutionary paths. Marine and freshwater fish have large differences in STR content. The most extreme propagation was found in the genomes of codfish species and Atlantic herring. Such a high density of STRs is likely to increase the mutational load, which we propose could be counterbalanced by high fecundity as seen in codfishes and herring.
README: Teleost TE libraries
This repository contains de novo libraries of transposable element (TE) consensus sequences (FASTA files), one per genome assembly. The results of masking each genome assembly with RepeatMasker using these de novo libraries can be found at https://doi.org/10.6084/m9.figshare.8280800.
Description of the data and file structure
Each file is one TE library (FASTA file) that can be used to mask a genome assembly. To generate the libraries, we used a variant of the computational pipeline described in more detail by Tørresen et al. (2017), available at https://github.com/uio-cels/Repeats. The pipeline includes multiple TE detection steps using different tools, steps for removing non-TEs from the detected sequences, and steps for classifying the elements. For the initial detection step, we used RepeatModeler (v. 1.0.8) (Smit & Hubley 2008-2015) and LTRharvest (part of GenomeTools v. 1.5.7) (Ellinghaus et al. 2008). RepeatModeler detects all kinds of repetitive sequences, whereas LTRharvest is specialized for detecting LTR retrotransposons (LTR-RTs). Using BLASTX, TEs with sequences matching known non-TEs in UniProtKB/Swiss-Prot were removed. To classify the TEs, we used RepeatClassifier, which is part of the RepeatModeler software. As the tool did not manage to classify all of the remaining sequences, additional similarity searches were performed between the sequences and a curated library of TE sequences (RepBase v. 20150807), using nucleotide BLAST. Finally, we built Hidden Markov Model (HMM) profiles from the detected sequences using HMMER (v. 3.1b1) (Wheeler & Eddy 2013) and compared them with HMM profiles from databases downloaded from GyDB.org (Llorens et al. 2011) and dfam.org (Hubley et al. 2016), using the nhmmer feature included in HMMER. This resulted in additional sequences being classified at the class and subclass level. The pipeline produced one de novo library per assembly, containing the consensus sequences of the interspersed repeats detected in that assembly.
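Because each library is a FASTA file of classified consensus sequences, a quick first check is to count how many sequences fall into each TE class. The sketch below assumes the headers follow the RepeatModeler/RepeatClassifier convention `>name#Class/Superfamily`. The README does not state the header format, so this should be checked against the actual files. The function name `count_te_classes` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_te_classes(lines: Iterable[str]) -> Counter:
    """Count consensus sequences per TE class in a FASTA library.

    Assumes headers of the form ">name#Class/Superfamily" (an assumption,
    not confirmed by the README). Headers without a "#" part are counted
    as "Unclassified".
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        tokens = line[1:].split()
        seq_id = tokens[0] if tokens else ""
        # Text after "#" is the classification, e.g. "LINE/L2".
        _, _, classification = seq_id.partition("#")
        te_class = classification.split("/")[0] or "Unclassified"
        counts[te_class] += 1
    return counts
```

A file can be passed directly as an iterable of lines, for instance `with open(path) as handle: print(count_te_classes(handle).most_common())`.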
Sharing/Access information
The source genome assemblies used to generate the consensus libraries were retrieved from the following sources:
SOURCE SPECIES
Malmstrom et al. 2017 Acanthochaenus luetkenii
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000751415.1_Midas_v5/GCA_000751415.1_Midas_v5_genomic.fna.gz Amphilophus citrinellus
Malmstrom et al. 2017 Anabas testudineus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000695075.1_Anguilla_anguilla_v1_09_nov_10/GCA_000695075.1_Anguilla_anguilla_v1_09_nov_10_genomic.fna.gz Anguilla anguilla
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000470695.1_japanese_eel_genome_v1_25_oct_2011_japonica_c401b400k25m200_sspacepremiumk3a02n24_extra.final.scaffolds/GCA_000470695.1_japanese_eel_genome_v1_25_oct_2011_japonica_c401b400k25m200_sspacepremiumk3a02n24_extra.final.scaffolds_genomic.fna.gz Anguilla japonica
Musilova et al. 2018 Anoplogaster cornuta
Malmstrom et al. 2017 Antennarius striatus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000239415.1_AstBur1.0/GCF_000239415.1_AstBur1.0_genomic.fna.gz Astatotilapia burtoni
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000372685.1_Astyanax_mexicanus-1.0.2/GCF_000372685.1_Astyanax_mexicanus-1.0.2_genomic.fna.gz Astyanax mexicanus
Malmstrom et al. 2017 Bathygadus melanobranchus
Malmstrom et al. 2017 Benthosema glaciale
Malmstrom et al. 2017 Beryx splendens
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000788275.1_BP.fa/GCA_000788275.1_BP.fa_genomic.fna.gz Boleophthalmus pectinirostris
Malmstrom et al. 2017 Borostomias antarcticus
Malmstrom et al. 2017 Brosme brosme
Malmstrom et al. 2017 Brotula barbata
SRX360276, GenBank, see Musilova et al. 2018 Caranx ignobilis
SRX360285, GenBank, see Musilova et al. 2018 Caranx melampygus
Malmstrom et al. 2017 Carapus acus
Musilova et al. 2018 Cetomimus sp
Malmstrom et al. 2017 Chaenocephalus aceratus
Malmstrom et al. 2017 Chatrabus melanurus
Malmstrom et al. 2017 Chromis chromis
SRX203077, GenBank, see Musilova et al. 2018 Clupea harengus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000523025.1_Cse_v1.0/GCF_000523025.1_Cse_v1.0_genomic.fna.gz Cynoglossus semilaevis
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000776015.1_ASM77601v1/GCA_000776015.1_ASM77601v1_genomic.fna.gz Cyprinodon nevadensis
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000776015.1_ASM77601v1/GCA_000776015.1_ASM77601v1_genomic.fna.gz Cyprinodon variegatus
SRX317090, GenBank, see Musilova et al. 2018 Cyprinus carpio
Malmstrom et al. 2017 Cyttopsis rosea
ftp://ftp.ensembl.org/pub//release-78/fasta/danio_rerio/dna/Danio_rerio.Zv9.dna.toplevel.fa.gz Danio rerio
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000689215.1_seabass_V1.0/GCA_000689215.1_seabass_V1.0_genomic.fna.gz Dicentrarchus labrax
Musilova et al. 2018 Diretmoides pauciradiatus
Musilova et al. 2018 Diretmus argenteus
SRX554947, GenBank, see Musilova et al. 2018 Electrophorus electricus
ERX432347, GenBank, see Musilova et al. 2018 Epinephelus aeneus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000721915.2_ASM72191v2/GCF_000721915.2_ASM72191v2_genomic.fna.gz Esox lucius
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000826765.1_Fundulus_heteroclitus-3.0.2/GCF_000826765.1_Fundulus_heteroclitus-3.0.2_genomic.fna.gz Fundulus heteroclitus
Malmstrom et al. 2017 Gadus morhua
ftp://ftp.ensembl.org/pub//release-78/fasta/gasterosteus_aculeatus/dna/Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz Gasterosteus aculeatus
Musilova et al. 2018 Gephyroberyx darwinii
Malmstrom et al. 2017 Guentherus altivela
Malmstrom et al. 2017 Helostoma temminkii
Malmstrom et al. 2017 Holocentrus rufus
Musilova et al. 2018 Hoplostethus atlanticus
Malmstrom et al. 2017 Lampris guttatus
Malmstrom et al. 2017 Lamprogrammus exutus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000742935.1_ASM74293v1/GCF_000742935.1_ASM74293v1_genomic.fna.gz Larimichthys crocea
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000242695.1_LepOcu1/GCF_000242695.1_LepOcu1_genomic.fna.gz Lepisosteus oculatus
Malmstrom et al. 2017 Lesueurigobius cf. sanzoi
Musilova et al. 2018 (unpublished) Lophius vaillanti
Malmstrom et al. 2017 Macrourus berglax
Malmstrom et al. 2017 Merluccius polli
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000238955.2_M_zebra_UMD1/GCF_000238955.2_M_zebra_UMD1_genomic.fna.gz Metriaclima zebra
Malmstrom et al. 2017 Monocentris japonica
SRX218060, GenBank, see Musilova et al. 2018 Monopterus albus
Malmstrom et al. 2017 Mora moro
Malmstrom et al. 2017 Myoxocephalus scorpius
Malmstrom et al. 2017 Myripristis jacobus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000239395.1_NeoBri1.0/GCF_000239395.1_NeoBri1.0_genomic.fna.gz Neolamprologus brichardi
Malmstrom et al. 2017 Neoniphon sammara
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000735185.1_NC01/GCF_000735185.1_NC01_genomic.fna.gz Notothenia coriiceps
Musilova et al. 2018 Opsanus beta
ftp://ftp.ensembl.org/pub//release-78/fasta/oreochromis_niloticus/dna/Oreochromis_niloticus.Orenil1.0.dna.toplevel.fa.gz Oreochromis niloticus
ftp://ftp.ensembl.org/pub//release-78/fasta/oryzias_latipes/dna/Oryzias_latipes.MEDAKA1.dna.toplevel.fa.gz Oryzias latipes
Malmstrom et al. 2017 Osmerus eperlanus
Malmstrom et al. 2017 Parablennius parvicornis
Malmstrom et al. 2017 Parasudis fraserbrunneri
Malmstrom et al. 2017 Perca fluviatilis
Malmstrom et al. 2017 Percopsis transmontana
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000787095.1_PS.fa/GCA_000787095.1_PS.fa_genomic.fna.gz Periophthalmodon schlosseri
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000787105.1_PM.fa/GCA_000787105.1_PM.fa_genomic.fna.gz Periophthalmus magnuspinnatus
SRX423854, GenBank, see Musilova et al. 2018 Pimephales promelas
ftp://ftp.ensembl.org/pub//release-78/fasta/poecilia_formosa/dna/Poecilia_formosa.PoeFor_5.1.2.dna.toplevel.fa.gz Poecilia formosa
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000633615.1_Guppy_female_1.0_MT/GCF_000633615.1_Guppy_female_1.0_MT_genomic.fna.gz Poecilia reticulata
Malmstrom et al. 2017 Polymixia japonica
Malmstrom et al. 2017 Pseudochromis fuscus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000787555.1_Pyoko_1.0/GCA_000787555.1_Pyoko_1.0_genomic.fna.gz Pseudopleuronectes yokohamae
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000239375.1_PunNye1.0/GCF_000239375.1_PunNye1.0_genomic.fna.gz Pundamilia nyererei
Malmstrom et al. 2017 Regalecus glesne
Malmstrom et al. 2017 Rondeletia loricata
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000233375.1_ICSASG_v2/GCF_000233375.1_ICSASG_v2_genomic.fna.gz Salmo salar
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000787155.1_SH.fa/GCA_000787155.1_SH.fa_genomic.fna.gz Scartelaos histophorus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_001005745.2_aro_v2/GCA_001005745.2_aro_v2_genomic.fna.gz Scleropages formosus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000475235.1_Snig1.0/GCA_000475235.1_Snig1.0_genomic.fna.gz Sebastes nigrocinctus
Malmstrom et al. 2017 Sebastes norvegicus
Malmstrom et al. 2017 Selene dorsalis
Malmstrom et al. 2017 Spondyliosoma cantharus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000690725.1_Stegastes_partitus-1.0.2/GCF_000690725.1_Stegastes_partitus-1.0.2_genomic.fna.gz Stegastes partitus
Malmstrom et al. 2017 Stylephorus chordatus
Malmstrom et al. 2017 Symphodus melops
Musilova et al. 2018 (unpublished) Syngnathus typhle
Musilova et al. 2018 (unpublished) Synodus synodus
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000400755.1_version_1_of_Takifugu_flavidus_genome/GCA_000400755.1_version_1_of_Takifugu_flavidus_genome_genomic.fna.gz Takifugu flavidus
ftp://ftp.ensembl.org/pub//release-78/fasta/takifugu_rubripes/dna/Takifugu_rubripes.FUGU4.dna.toplevel.fa.gz Takifugu rubripes
ftp://ftp.ensembl.org/pub//release-78/fasta/tetraodon_nigroviridis/dna/Tetraodon_nigroviridis.TETRAODON8.dna.toplevel.fa.gz Tetraodon nigroviridis
Malmstrom et al. 2017 Thunnus albacares
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000418415.1_Thunnus_orientalis_ver_Ba_1.0/GCA_000418415.1_Thunnus_orientalis_ver_Ba_1.0_genomic.fna.gz Thunnus orientalis
Malmstrom et al. 2017 Trisopterus minutus
Malmstrom et al. 2017 Typhlichthys subterraneus
ftp://ftp.ensembl.org/pub//release-78/fasta/xiphophorus_maculatus/dna/Xiphophorus_maculatus.Xipmac4.4.2.dna.toplevel.fa.gz Xiphophorus maculatus
Malmstrom et al. 2017 Zeus faber
Malmstrom et al. 2017 Arctogadus glacialis
Malmstrom et al. 2017 Molva molva
Malmstrom et al. 2017 Lota lota
Malmstrom et al. 2017 Merluccius merluccius
Malmstrom et al. 2017 Melanonus zugmayeri
Malmstrom et al. 2017 Malacocephalus occidentalis
Malmstrom et al. 2017 Boreogadus saida
Malmstrom et al. 2017 Muraenolepis marmoratus
Malmstrom et al. 2017 Bregmaceros cantori
Malmstrom et al. 2017 Trachyrincus scabrus
Malmstrom et al. 2017 Pollachius virens
Malmstrom et al. 2017 Melanogrammus aeglefinus
Malmstrom et al. 2017 Merlangius merlangus
Malmstrom et al. 2017 Theragra chalcogramma
Malmstrom et al. 2017 Gadiculus argenteus
Malmstrom et al. 2017 Phycis phycis
Malmstrom et al. 2017 Phycis blennoides
Methods
For TE annotation, we used a variant of the computational pipeline described in more detail by Tørresen et al. (2017), available at https://github.com/uio-cels/Repeats. The pipeline includes multiple TE detection steps using different tools, steps for removing non-TEs from the detected sequences, and steps for classifying the elements. For the initial detection step, we used RepeatModeler (v. 1.0.8) (Smit & Hubley 2008-2015) and LTRharvest (part of GenomeTools v. 1.5.7) (Ellinghaus et al. 2008). RepeatModeler detects all kinds of repetitive sequences, whereas LTRharvest is specialized for detecting LTR retrotransposons (LTR-RTs). Using BLASTX, TEs with sequences matching known non-TEs in UniProtKB/Swiss-Prot were removed. To classify the TEs, we used RepeatClassifier, which is part of the RepeatModeler software. As the tool did not manage to classify all of the remaining sequences, additional similarity searches were performed between the sequences and a curated library of TE sequences (RepBase v. 20150807), using nucleotide BLAST. Finally, we built Hidden Markov Model (HMM) profiles from the detected sequences using HMMER (v. 3.1b1) (Wheeler & Eddy 2013) and compared them with HMM profiles from databases downloaded from GyDB.org (Llorens et al. 2011) and dfam.org (Hubley et al. 2016), using the nhmmer feature included in HMMER. This resulted in additional sequences being classified at the class and subclass level. The pipeline produced one de novo library per assembly, containing the consensus sequences of the interspersed repeats detected in that assembly.
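The non-TE removal step can be illustrated with a small sketch. It is not the authors' implementation (that lives in the linked pipeline repository), only an outline of the idea: candidate sequences that have a BLASTX hit against a non-TE protein are dropped. It assumes BLAST tabular output (`-outfmt 6`, query ID in column 1 and e-value in column 11). The e-value cutoff of `1e-5` is a placeholder, since the text above does not state the threshold used. The function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def ids_with_protein_hits(blast_rows: Iterable[str],
                          max_evalue: float = 1e-5) -> Set[str]:
    """Return query IDs with at least one hit at or below max_evalue.

    Expects BLAST tabular rows (-outfmt 6): 12 tab-separated columns,
    query ID first and e-value in column 11. The default cutoff is an
    assumed placeholder, not the value used in the pipeline.
    """
    flagged: Set[str] = set()
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed rows
        if float(fields[10]) <= max_evalue:
            flagged.add(fields[0])
    return flagged


def drop_flagged(records: Dict[str, str], flagged: Set[str]) -> Dict[str, str]:
    """Keep only the sequences whose ID was not flagged as a likely non-TE."""
    return {seq_id: seq for seq_id, seq in records.items()
            if seq_id not in flagged}
```

Here `records` is a plain mapping from sequence ID to sequence, so the sketch stays independent of any particular FASTA parser.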
Usage notes
This repository contains one de novo library (FASTA file) per genome assembly. The results of masking each genome assembly with RepeatMasker using these de novo libraries can be found at https://doi.org/10.6084/m9.figshare.8280800.
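To mask an assembly with one of these libraries yourself, the library can be passed to RepeatMasker through its `-lib` option. The sketch below only builds the command line. The thread count, output directory, GFF option and assembly file name are illustrative assumptions, and the RepeatMasker settings used for the published masking results are not given here. The function name `repeatmasker_command` is hypothetical.

```python
from typing import List


def repeatmasker_command(library: str, assembly: str,
                         out_dir: str, threads: int = 4) -> List[str]:
    """Build a RepeatMasker command that masks `assembly` with a custom library.

    All option values are illustrative, not the authors' settings.
    """
    return [
        "RepeatMasker",
        "-lib", library,       # custom TE library (one file from this repository)
        "-pa", str(threads),   # number of parallel jobs
        "-dir", out_dir,       # directory for the output files
        "-gff",                # also write annotations in GFF format
        assembly,              # genome assembly to mask (FASTA)
    ]


# Hypothetical assembly file name, used for illustration only.
cmd = repeatmasker_command(
    "fish_97.scf.fasta.seq.denovolib.classified.Gadus_morhua",
    "Gadus_morhua.assembly.fasta",
    "masked_out",
)
```

With RepeatMasker installed and on the `PATH`, the command could then be run with `subprocess.run(cmd, check=True)`.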