Data from: Hemiptera phylogenomic resources: tree-based orthology prediction and conserved exon identification
Cite this dataset
Owen, Christopher; Stern, David; Hilton, Sarah; Crandall, Keith (2020). Data from: Hemiptera phylogenomic resources: tree-based orthology prediction and conserved exon identification [Dataset]. Dryad. https://doi.org/10.5061/dryad.hqbzkh1cd
High-throughput sequencing of transcriptomes and targeted genomic regions is advancing our knowledge of the Tree of Life. Building phylogenies with regions of the genome requires 1-to-1 ortholog resources of genes and non-coding loci. One organismal group that has received little attention in this area is the Hemiptera, the fifth-largest insect order, represented by approximately 103,590 named species. Here, we present a set of 3,872 Hemiptera 1-to-1 orthogroups based on tree-based orthology inference of 8 Hemiptera species with publicly available genome sequences. We also estimated a set of 406 orthologous exons with similar mRNA splice sites that can be used for Sanger sequencing and for developing enrichment probes for targeted genome sequencing in phylogenomic inference. We show that this novel set of orthologs is informative at the protein, coding sequence, and exon molecular levels and provides robust branch support in both gene tree - species tree methods and concatenated sequence phylogenies. In addition, we demonstrate the utility of these loci for resolving relationships in the whitefly Bemisia tabaci, a large species complex with few phylogenomic resources. Lastly, we compare our Hemiptera phylogeny with previously published phylogenies and other ortholog databases, and we provide suggestions for further improvement of this phylogenomic resource.
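For readers new to the term, the 1-to-1 criterion can be illustrated with a minimal Python sketch. It is not the tree-based orthology inference used to build this dataset. It only shows the final filtering idea, that an orthogroup is kept when every species contributes exactly one gene. The function name, species labels, and gene identifiers are hypothetical placeholders and do not come from the dataset.

```python
from collections import Counter


def single_copy_orthogroups(orthogroups, species):
    """Keep orthogroups in which every listed species has exactly one gene.

    orthogroups: dict mapping a group name to a list of (species, gene_id) pairs.
    species: iterable of species labels that must all be represented.
    """
    required = set(species)
    kept = {}
    for name, members in orthogroups.items():
        # Count how many genes each species contributes to this group.
        counts = Counter(sp for sp, _gene in members)
        # Require full species coverage and no duplicated (paralogous) copies.
        if set(counts) == required and all(n == 1 for n in counts.values()):
            kept[name] = members
    return kept


# Toy input with hypothetical labels (assumption: not taken from the dataset).
species = ["sp1", "sp2", "sp3"]
orthogroups = {
    "OG1": [("sp1", "g1"), ("sp2", "g2"), ("sp3", "g3")],                 # 1-to-1
    "OG2": [("sp1", "g4"), ("sp1", "g5"), ("sp2", "g6"), ("sp3", "g7")],  # sp1 duplicated
    "OG3": [("sp1", "g8"), ("sp2", "g9")],                                # sp3 missing
}
result = single_copy_orthogroups(orthogroups, species)
print(sorted(result))
```

In this toy input only the first group passes the filter: the second contains two copies from one species and the third lacks a species.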