Data from: Arthropod phylotranscriptomics with a special focus on the basal phylogeny of the Myriapoda

Su, Zhi-Hui; Sasaki, Ayako; Minami, Hiroaki; Ozaki, Katsuhisa

Published Sep 02, 2024 on Dryad. https://doi.org/10.5061/dryad.p8cz8w9th

Abstract

Arthropoda is the most diverse animal phylum, but the phylogenetic relationships of arthropods are difficult to determine because many arthropod lineages diverged within a short period of time. To resolve controversies in the deep phylogeny of arthropods, with a focus on Myriapoda, we conducted phylogenetic analyses based on ten supermatrices containing 751-1,233 orthologous genes from transcriptome data of 64 representative arthropod species, 28 of which were newly sequenced in this study. The results unambiguously support monophyly of the higher arthropod taxa Chelicerata, Mandibulata, Myriapoda, Pancrustacea, and Hexapoda. The Crustacea are paraphyletic, with the class Remipedia supported as the lineage most closely related to hexapods. Within Hexapoda, our results largely support the phylogenetic relationships among the deep hexapod lineages proposed to date, except that the Paraneoptera (Hemiptera, Thysanoptera, and Psocodea) were recovered as a monophyletic lineage in some analyses. The results also strongly support the recently proposed phylogenetic framework of the four myriapod classes, in which Symphyla and Pauropoda are sister groups, as are Chilopoda and Diplopoda. These findings provide important insights into the phylogeny and evolution of arthropods.

Amino acid sequence alignments of orthologous genes and tree files

Description of the Data and file structure

HaMStR was used to search the assembled transcriptome data for protein sequences matching two gene sets of arthropod core orthologs: Coreset insecta_hmmer3_2 (Coreset A) and Coreset arthropoda_hmmer3 (Coreset B). Ten datasets (A1-A5 and B1-B5) were then constructed from the ortholog sequence data using different sets of samples, as follows.
Dataset_A1 was constructed from Coreset A with all samples.
Dataset_A2 was constructed from Coreset A with representative samples for phylogenetic analysis among the four arthropod subphyla.
Dataset_A3 was constructed from Coreset A with all myriapod samples and other representative arthropods as outgroups.
Dataset_A4 was constructed from Coreset A with all samples of Diplopoda and Chilopoda, and one sample each of Symphyla and Pauropoda as outgroups.
Dataset_A5 was constructed from Coreset A with all pancrustacean samples and several representative species of myriapods and chelicerates as outgroups.
Dataset_B1 was constructed from Coreset B with all samples.
Dataset_B2 was constructed from Coreset B with representative samples for phylogenetic analysis among the four arthropod subphyla.
Dataset_B3 was constructed from Coreset B with all myriapod samples and other representative arthropods as outgroups.
Dataset_B4 was constructed from Coreset B with all samples of Diplopoda and Chilopoda, and one sample each of Symphyla and Pauropoda as outgroups.
Dataset_B5 was constructed from Coreset B with all pancrustacean samples and several representative species of myriapods and chelicerates as outgroups.
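
The ten datasets listed above follow a regular two-by-five design (core ortholog set crossed with sampling scheme). The following Python sketch encodes that design as a lookup, which may be convenient when scripting over the files. The function name `describe` and the short sampling labels are hypothetical shorthand introduced here and are not part of the deposited data.

```python
# Coreset names are taken from the description above.
CORESETS = {"A": "insecta_hmmer3_2", "B": "arthropoda_hmmer3"}

# Sampling schemes 1-5, paraphrased from the dataset list (shorthand is ours).
SAMPLING = {
    1: "all samples",
    2: "representative samples for the four arthropod subphyla",
    3: "all myriapods plus representative arthropod outgroups",
    4: "all Diplopoda and Chilopoda plus one Symphyla and one Pauropoda outgroup",
    5: "all pancrustaceans plus representative myriapod and chelicerate outgroups",
}


def describe(dataset):
    """Return the coreset and sampling scheme for a name such as 'Dataset_A3' or 'B1'."""
    label = dataset.split("_")[-1]          # 'Dataset_A3' -> 'A3', 'B1' -> 'B1'
    coreset_key, scheme = label[0], int(label[1:])
    if coreset_key not in CORESETS or scheme not in SAMPLING:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return {
        "dataset": f"Dataset_{label}",
        "coreset": CORESETS[coreset_key],
        "sampling": SAMPLING[scheme],
    }


if __name__ == "__main__":
    for key in CORESETS:
        for scheme in SAMPLING:
            print(describe(f"Dataset_{key}{scheme}"))
```

Calling `describe("B4")`, for example, returns the Coreset B name together with the Diplopoda and Chilopoda sampling scheme.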

For each dataset, MAFFT was used to align the sequences and Gblocks to eliminate poorly aligned positions and divergent regions. Each dataset was analyzed phylogenetically with RAxML and MrBayes, except for Datasets A1 and B1, which were used only for tree inference with RAxML.
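
The abstract describes the datasets as supermatrices of orthologous genes. As a minimal sketch of that idea, the Python function below concatenates per-gene trimmed alignments into a single matrix, padding taxa that lack a gene with gap characters. It is an illustration under stated assumptions, not the authors' pipeline: the in-memory dictionary format, the function name `concatenate`, and the choice of `-` as the missing-data symbol are all assumptions made here.

```python
def concatenate(gene_alignments, missing="-"):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: list of dicts mapping taxon name -> aligned sequence,
    one dict per gene (assumed format). Taxa absent from a gene are padded
    with the `missing` character so that all rows end up the same length.
    """
    # Union of all taxa across genes, in a stable order.
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        (width,) = lengths
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


if __name__ == "__main__":
    # Toy input: two genes, three hypothetical taxa.
    genes = [
        {"sp1": "MKV", "sp2": "MRV"},
        {"sp1": "LL-A", "sp3": "LIQA"},
    ]
    for taxon, seq in concatenate(genes).items():
        print(taxon, seq)
```

Padding absent genes keeps every row the same length, which alignment-based tree inference programs generally require.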
