Data from: Arthropod phylotranscriptomics with a special focus on the basal phylogeny of the Myriapoda
Data files
Sep 02, 2024 version files 102.37 MB
Abstract
Arthropoda are the most diverse animal phylum, but the phylogenetic relationships of arthropods are difficult to determine because many arthropod lineages diverged in a short period of time. To resolve controversial questions in the deep phylogeny of arthropods, with a focus on Myriapoda, we conducted phylogenetic analyses based on ten supermatrices, which contain 751-1,233 orthologous genes from transcriptome data of 64 representative arthropod species, of which 28 transcriptomes are newly generated in this study. The results unambiguously support monophyly of the higher arthropod taxa Chelicerata, Mandibulata, Myriapoda, Pancrustacea, and Hexapoda. The Crustacea are paraphyletic, with the class Remipedia supported as the lineage most closely related to hexapods. Within Hexapoda, our results largely support the phylogenetic relationships among the deep hexapod lineages proposed to date, except that the Paraneoptera (Hemiptera, Thysanoptera, and Psocodea) was recovered as a monophyletic lineage in some analyses. The results also strongly support the recently proposed phylogenetic framework of the four myriapod classes, in which Symphyla and Pauropoda are sister groups, as are Chilopoda and Diplopoda. These findings provide important insights into the phylogeny and evolution of arthropods.
README: Data from: Arthropod phylotranscriptomics with a special focus on the basal phylogeny of the Myriapoda
Amino acid sequence alignments of orthologous genes and tree files
Description of the Data and file structure
HaMStR was used to search the assembled transcriptome data for protein sequences matching two gene sets of arthropod core orthologs: Coreset insecta_hmmer3_2 (Coreset A) and Coreset arthropoda_hmmer3 (Coreset B). Ten datasets (A1-A5 and B1-B5) were constructed from the ortholog sequence data, each with a different taxon sampling, as follows.
Dataset_A1 is constructed based on the Coreset A with all samples.
Dataset_A2 is constructed based on the Coreset A with representative samples for the phylogenetic analysis among the four arthropod subphyla.
Dataset_A3 is constructed based on the Coreset A with all myriapod samples and other representative arthropods as outgroups.
Dataset_A4 is constructed based on the Coreset A with all samples of Diplopoda and Chilopoda, and one sample each of Symphyla and Pauropoda as outgroups.
Dataset_A5 is constructed based on the Coreset A with all pancrustacean samples and several representative species of myriapods and chelicerates as outgroups.
Dataset_B1 is constructed based on the Coreset B with all samples.
Dataset_B2 is constructed based on the Coreset B with representative samples for the phylogenetic analysis among the four arthropod subphyla.
Dataset_B3 is constructed based on the Coreset B with all myriapod samples and other representative arthropods as outgroups.
Dataset_B4 is constructed based on the Coreset B with all samples of Diplopoda and Chilopoda, and one sample each of Symphyla and Pauropoda as outgroups.
Dataset_B5 is constructed based on the Coreset B with all pancrustacean samples and several representative species of myriapods and chelicerates as outgroups.
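The ten datasets listed above follow a regular two-by-five design (two coresets, five taxon samplings). As a reading aid, that design can be written down as a small lookup table. This is a minimal sketch only: the names `CORESETS`, `SAMPLINGS`, `Dataset`, and `build_catalogue` are hypothetical and not part of the deposited files, the sampling descriptions are paraphrased from the list above, and the RAxML-only rule for A1 and B1 is taken from this README.

```python
from dataclasses import dataclass

# Coreset labels as given in this README.
CORESETS = {"A": "insecta_hmmer3_2", "B": "arthropoda_hmmer3"}

# Taxon samplings, paraphrased from the dataset list.
SAMPLINGS = {
    1: "all samples",
    2: "representative samples for relationships among the four arthropod subphyla",
    3: "all myriapods, with other representative arthropods as outgroups",
    4: "all Diplopoda and Chilopoda, with one Symphyla and one Pauropoda sample as outgroups",
    5: "all pancrustaceans, with several myriapods and chelicerates as outgroups",
}


@dataclass(frozen=True)
class Dataset:
    name: str        # e.g. "Dataset_A3"
    coreset: str     # HaMStR core-ortholog set
    sampling: str    # taxon sampling description
    analyses: tuple  # tree-inference programs applied


def build_catalogue():
    """Return a dict mapping dataset names to their design (hypothetical helper)."""
    catalogue = {}
    for letter, coreset in CORESETS.items():
        for number, sampling in SAMPLINGS.items():
            # Per the README, A1 and B1 were analysed with RAxML only.
            analyses = ("RAxML",) if number == 1 else ("RAxML", "MrBayes")
            name = f"Dataset_{letter}{number}"
            catalogue[name] = Dataset(name, coreset, sampling, analyses)
    return catalogue
```

For example, `build_catalogue()["Dataset_B4"].coreset` returns `"arthropoda_hmmer3"`, and the catalogue holds ten entries in total.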
MAFFT and Gblocks were used to align the sequences and to eliminate poorly aligned positions and divergent regions in each dataset. Each dataset was analyzed phylogenetically with both RAxML and MrBayes, except for Datasets A1 and B1, which were used only for tree inference with RAxML.
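For readers who want to rerun the alignment and trimming steps on their own ortholog files, the sketch below only assembles the command lines and does not execute anything. The README does not state the program settings used, so the options here are assumptions: `--auto` lets MAFFT choose a strategy, and `-t=p` tells Gblocks the input is protein. The function name `alignment_commands` and the `.aln.fasta` file naming are hypothetical; refer to the paper for the actual parameters.

```python
from pathlib import Path


def alignment_commands(gene_fasta):
    """Build (but do not run) MAFFT and Gblocks command lines for one ortholog file.

    The options are illustrative assumptions, not the authors' settings.
    """
    src = Path(gene_fasta)
    # Hypothetical naming scheme for the aligned file.
    aligned = src.with_name(src.stem + ".aln.fasta")
    return {
        # MAFFT writes the alignment to standard output, so the caller
        # must redirect it into `align_output`.
        "align": ["mafft", "--auto", str(src)],
        "align_output": str(aligned),
        # Gblocks writes the trimmed alignment next to its input,
        # with "-gb" appended to the input file name.
        "trim": ["Gblocks", str(aligned), "-t=p"],
        "trim_output": str(aligned) + "-gb",
    }
```

The returned lists can be passed to `subprocess.run`, with the standard output of the `align` step redirected to the `align_output` path. Gblocks may return a nonzero exit status even after a successful run, so checking for the `-gb` output file is more reliable than checking the return code.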
Methods
A total of 64 species representing all arthropod classes (three chelicerate classes: Pycnogonida, Merostomata, and Arachnida; four myriapod classes: Chilopoda, Diplopoda, Pauropoda, and Symphyla; six crustacean classes: Ostracoda, Hexanauplia, Malacostraca, Branchiopoda, Cephalocarida, and Remipedia; two hexapod classes: Entognatha and Ectognatha) were analyzed in this study. Of these, 28 species (10 crustaceans, 13 myriapods, and 5 chelicerates), collected from the Ryukyu Islands and the Kansai region of Japan, were used for transcriptome sequencing. The transcriptome sequences of the other arthropod species and five outgroup species were retrieved from the NCBI (National Center for Biotechnology Information) sequence database. For details on the construction of the datasets, tree files, etc., refer to the paper.