Dryad

Data from: Evolutionary innovation through fusion of sequences from across the tree of life

Data files

Oct 27, 2025 version files 17.91 GB


Abstract

We hypothesized that fusion of genes acquired via horizontal gene transfer (HGT) with endogenous sequences in arthropod genomes might generate what we call “HGT-chimeras”: genes with regions of non-metazoan and metazoan descent in the same open reading frame. This dataset supports the study of these HGT-chimeras presented in our manuscript “Evolutionary innovation through fusion of sequences from across the tree of life”. It includes the input data and intermediate output files used in our HGT-chimera detection pipeline and in the downstream bioinformatic characterization of these genes. The repository contains FASTA files of protein sequences, clustering results, phylogenetic trees, and tabular summaries of inferred HGT-chimeras, along with downstream analyses of molecular evolution (dN/dS), phylogenetic origin, gene expression, and domain architecture. Files are organized to correspond with the steps of the associated GitHub pipeline, beginning with the input clustering data (mmseq_cluster_representatives_with_missing.fasta) and concluding with analyses of the representative HGT-chimeras highlighted in the manuscript’s figures. These data can be reused to validate our findings, extend analyses of the discovered HGT-chimeras, or adapt the included pipeline to other genomic datasets. No ethical or legal restrictions apply to the data, which are derived from genome assemblies and annotation data available on NCBI.
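For readers who want to inspect the input clustering file before running the pipeline, the following is a minimal sketch of a FASTA reader in plain Python. It is not part of the authors' pipeline. The function name `read_fasta` and the example records are hypothetical, and the sketch assumes the file follows the usual FASTA convention of `>` header lines followed by one or more sequence lines.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping.

    The record ID is taken as the first whitespace-delimited token of the
    header line (an assumption; the dataset's headers may carry more fields).
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Tiny made-up example (not taken from the dataset).
example = [">seq1 hypothetical protein", "MKT", "LLV", ">seq2", "MAA"]
parsed = read_fasta(example)
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `read_fasta(open("mmseq_cluster_representatives_with_missing.fasta"))`, assuming the file has been downloaded to the working directory. For a file of this size, a streaming parser from an established library may be a better fit than holding every sequence in memory.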