Data for: Mobilomes of four flower-breeding Drosophila species
Data files
Apr 21, 2023 version files 3.36 MB
- Dbromeliae_IV_Mobilome.fasta
- Dbromeliae_Mobilome.fasta
- Dbromelioides_Mobilome.fasta
- Dlutzii_Mobilome.fasta
- README_Dataset-Mobilomes_of_four_flower-breeding_Drosophile_species.txt
Abstract
Understanding the mechanisms that shape the architecture, diversity and adaptations of genomes, along with their ecological and genetic interfaces, is of utmost importance for understanding biological evolution. Transposable elements (TEs) play an important role in genome evolution because of their ability to transpose within and between genomes, providing sites of non-allelic recombination. Here we investigate patterns and processes of TE-driven genome evolution associated with niche diversification. Specifically, we compared TE content, TE landscapes, and the frequency of horizontal transposon transfers (HTTs) across the genomes of five flower-breeding Drosophila (FBD) with different levels of specialization on flowers. Further, we investigated whether niche breadth and ecological and geographical overlaps are associated with the potential for HTTs. The mobilome landscapes recovered for all species presented a bell-shaped curve, revealing an equilibrium between transposition and excision over evolutionary history. This pattern agrees with results previously recovered for other Drosophilidae species of the same subgenera studied here, suggesting that lineage-specific factors shape the mobilome of this lineage. Nevertheless, the abundance and richness of TE superfamilies were associated with niche breadth. Furthermore, the two most widespread species, the specialist D. incompta and the generalist D. lutzii, presented the highest frequency of HTT events. Our analyses also revealed that HTT opportunities are positively influenced by abiotic niche overlap but are not associated with phylogenetic relationships, niche breadth, or biotic niche overlap. This suggests the existence of intermediate vectors promoting HTTs between species with non-overlapping biotic niches.
Methods
A de novo search for repetitive elements was performed on the filtered raw Illumina reads using the RepeatExplorer 2.0 tool (Novák et al., 2013) on the Galaxy platform (Afgan et al., 2016). This algorithm facilitates TE discovery and characterization by applying graph-based sequence clustering and then evaluating the similarity of each hit to reference databases (Novák et al., 2020). As a first step, the paired-end FASTQ reads were preprocessed on the Galaxy platform by interlacing and quality filtering. The resulting file was used as input for the graph-based clustering. The detected read similarities were then used to build a virtual graph that was partitioned into clusters, returning contigs related to individual repeat superfamilies. Supplementary information was also obtained for each supercluster, such as the proportion of paired-end reads shared among clusters. To detect and remove false overlaps, the contigs obtained from RepeatExplorer were reassembled in CAP3 (Huang & Madan, 1999) using the parameters: -a 20 -b 20 -c 12 -d 200 -e 30 -f 20 -g 6 -m 2 -n -5 -p 80 -r 1 -s 900 -t 300 -u 3 -v 2 -o 40. The final contigs, annotated by supercluster, were then used in a homology-based search with RMBlast 2.10.0 against a curated TE database (Repbase v. 20181026) to evaluate TE identity.
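The interlacing step mentioned in the preprocessing can be illustrated with a minimal Python sketch. It is not the Galaxy tool used for this dataset: the function names and the toy reads are hypothetical, it assumes plain four-line FASTQ records with both mates in the same order, and it performs no quality filtering.

```python
from itertools import islice


def read_fastq(lines):
    """Yield FASTQ records as 4-tuples (header, sequence, separator, quality).

    Assumes strict four-line records (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interlace(forward_lines, reverse_lines):
    """Alternate forward and reverse records so that mates are adjacent."""
    forward = list(read_fastq(forward_lines))
    reverse = list(read_fastq(reverse_lines))
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse files differ in read count")
    merged = []
    for fwd_record, rev_record in zip(forward, reverse):
        merged.extend(fwd_record)
        merged.extend(rev_record)
    return merged


# Hypothetical toy reads, for illustration only.
fwd = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "GGCC", "+", "IIII"]
rev = ["@read1/2", "TTAA", "+", "IIII", "@read2/2", "CCGG", "+", "IIII"]
merged = interlace(fwd, rev)
```

Each pair ends up as two consecutive records in a single stream, which is the layout an interlaced paired-end input file has.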
A second homology-based search was performed with the draft genome assembly recovered for each species against the Repbase v. 20181026 database to obtain a complementary TE library. In this case, the best query recovered for each contig was manually selected based on the highest bit score, with a threshold of at least 160 bits and a minimum length of 130 bp, considering the minimum start and maximum end positions of the query sequences. Lastly, the combined TE libraries of each species were processed with the CD-HIT-EST clustering algorithm, removing redundant sequences with more than 95% identity (Fu et al., 2012).
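The selection criteria above (bit score of at least 160, length of at least 130 bp, best bit score per contig) can be sketched in Python as follows. The selection for this dataset was done manually, so this is only an automated approximation. The `Hit` record, its field names, and the example hits are hypothetical, and computing the length from the start and end coordinates is one possible reading of the criteria.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One homology hit (hypothetical layout, not a real tool's output)."""
    contig: str
    te_name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive; may be smaller than start on the reverse strand
    bit_score: float


MIN_BIT_SCORE = 160  # threshold stated in the Methods
MIN_LENGTH = 130     # minimum length in bp stated in the Methods


def hit_length(hit):
    """Length spanned by the hit, regardless of strand."""
    return abs(hit.end - hit.start) + 1


def best_hits(hits):
    """Keep, for each contig, the passing hit with the highest bit score."""
    best = {}
    for hit in hits:
        if hit.bit_score < MIN_BIT_SCORE or hit_length(hit) < MIN_LENGTH:
            continue
        current = best.get(hit.contig)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.contig] = hit
    return best


# Hypothetical hits, for illustration only.
hits = [
    Hit("contig_1", "TE_A", 1, 500, 350.0),
    Hit("contig_1", "TE_B", 10, 400, 610.0),   # higher bit score, replaces TE_A
    Hit("contig_2", "TE_C", 1, 120, 300.0),    # too short
    Hit("contig_3", "TE_D", 900, 700, 150.0),  # bit score too low
    Hit("contig_4", "TE_E", 700, 500, 200.0),  # reverse strand, passes
]
selected = best_hits(hits)
```

Ties in bit score keep the first hit encountered; a manual curation step may resolve such cases differently.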
Usage notes
Molecular Evolutionary Genetics Analysis - MEGA software
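The FASTA files can also be read programmatically. The following is a minimal Python sketch of a FASTA reader that uses only the standard library. The function name and the example records are hypothetical, and no assumption is made about how the headers in the dataset files are formatted.

```python
def read_fasta(lines):
    """Return a dict mapping each header (without '>') to its full sequence."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]
            if name in parts:
                raise ValueError(f"duplicate header: {name}")
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            parts[name].append(line)
    # Join wrapped sequence lines into one string per record.
    return {header: "".join(chunks) for header, chunks in parts.items()}


# Hypothetical records, for illustration only.
example = [">te_1 hypothetical", "ACGT", "ACG", "", ">te_2", "TTTT"]
library = read_fasta(example)
```

For a file from this dataset, the same function can be applied to an open file handle, for example `with open("Dlutzii_Mobilome.fasta") as handle: library = read_fasta(handle)`.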