Supplementary data from: Lacewing-specific universal single-copy orthologs designed towards resolution of backbone phylogeny of Neuropterida
Data files
Oct 29, 2024 version files (464.89 MB)
- ABS75_matrix60.7z (33.76 MB)
- ABS75_matrix70.7z (18.76 MB)
- ABS75_matrix80.7z (7.80 MB)
- aligment_total.tar.gz (44.84 MB)
- gene_tree_total.tar.gz (5.73 MB)
- L_ABS75_matrix80.7z (24.52 MB)
- neuropterida_odb10.tar.gz (329.48 MB)
- README.md (4.52 KB)
Abstract
Universal Single Copy Orthologs (USCO), as a set of markers of nearly universal single-copy genes, show a superiority in phylogenomic inference. Here, we developed a Benchmarking Universal Single Copy Orthologs (BUSCO) dataset, neuropterida_odb10, tailored for Neuropterida, based on high-quality genome assemblies and transcriptome data, comprising 5,438 BUSCOs. A range of 1,524-5,328 complete and single-copy USCOs could be captured from the genome assemblies and transcriptomes of 104 species of Neuropterida. The reconstruction of a higher-level phylogeny of Neuropterida based on a comprehensive sampling and refined genomic data in reference to neuropterida_odb10 validates the efficiency of this BUSCO dataset for phylogenomic inference. We recovered Psychopsidae as the sister group to Ithonidae, and corroborated the sister group relationship between Sisyridae and Nevrorthidae within Osmyloidea and the sister-group relationship between Chrysopidae and Mantispoidea. Furthermore, our findings highlight that focusing on alignments with a higher presence of parsimony-informative sites, rather than on the total number of alignments, can diminish errors in gene tree estimation, a process notably vulnerable to error when using multispecies coalescent methods. The neuropterida_odb10 BUSCO reference dataset holds promise for phylogenetic studies at various hierarchical levels, as well as for comparative genomics and the exploration of species diversity within Neuropterida.
README: Supplementary data from: Lacewing-specific universal single-copy orthologs designed toward resolution of backbone phylogeny of Neuropterida
https://doi.org/10.5061/dryad.fn2z34v4v
Description of the data and file structure
Files and variables
File: gene_tree_total.tar.gz
Description: Gene trees for the 5313 alignments in aligment_total.tar.gz. The gene tree of each alignment was constructed with IQ-TREE v.2.2.2.7 using 1000 ultrafast bootstrap replicates.
File: aligment_total.tar.gz
Description: 5313 alignments of orthologous genes, which can be used for subsequent alignment filtering and matrix construction. Errors in small species-specific stretches of the multiple sequence alignments were masked with TAPER v.1.0.0, and sequences that introduced excessively long branches in the phylogenetic trees were detected using TreeShrink v.1.3.8b.
File: ABS75_matrix80.7z
Description: After decompression, this is a folder containing the files for the matrix ABS75_matrix80.
ABS.alignments: the list of alignments concatenated into ABS75_matrix80. The original alignments can be selected from "aligment_total.tar.gz" based on this list.
FcC_supermatrix.fas: ABS75_matrix80 in FASTA format.
FcC_supermatrix_partition.txt: the partition file of ABS75_matrix80.
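Selecting the original alignments from an extracted copy of "aligment_total.tar.gz" based on ABS.alignments could look roughly like the sketch below. It assumes that the list holds one alignment file name per line and that the archive has already been extracted to a flat folder; neither detail is stated in this README, and the function names are hypothetical.

```python
from pathlib import Path
import shutil


def read_alignment_list(list_file):
    """Return the non-empty, stripped lines of the list file.

    Assumption: one alignment file name (or path) per line.
    """
    with open(list_file, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def select_alignments(list_file, source_dir, target_dir):
    """Copy every listed alignment from source_dir to target_dir.

    Returns (copied, missing) as two lists of file names.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied, missing = [], []
    for name in read_alignment_list(list_file):
        # Keep only the base name in case the list stores paths.
        base = Path(name).name
        candidate = source_dir / base
        if candidate.is_file():
            shutil.copy2(candidate, target_dir / base)
            copied.append(base)
        else:
            missing.append(base)
    return copied, missing
```

A call such as `select_alignments("ABS.alignments", "aligment_total", "selected")` would return the names it copied and the names it could not find. A non-empty second list suggests that the assumed list format or folder layout does not match the actual files.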
File: ABS75_matrix70.7z
Description: After decompression, this is a folder containing the files for the matrix ABS75_matrix70.
ABS.alignments: the list of alignments concatenated into ABS75_matrix70. The original alignments can be selected from "aligment_total.tar.gz" based on this list.
FcC_supermatrix.fas: ABS75_matrix70 in FASTA format.
FcC_supermatrix_partition.txt: the partition file of ABS75_matrix70.
File: L_ABS75_matrix80.7z
Description: After decompression, this is a folder containing the files for the matrix L_ABS75_matrix80.
ABS.alignments: the list of alignments concatenated into L_ABS75_matrix80.
FcC_supermatrix.fas: L_ABS75_matrix80 in FASTA format.
FcC_supermatrix_partition.txt: the partition file of L_ABS75_matrix80.
alignments.tar.gz: the alignments concatenated into L_ABS75_matrix80.
gene_trees.tar.gz: gene trees for the alignments in alignments.tar.gz. The gene tree of each alignment was constructed with IQ-TREE v.2.2.2.7 using 1000 ultrafast bootstrap replicates.
File: ABS75_matrix60.7z
Description: After decompression, this is a folder containing the files for the matrix ABS75_matrix60.
ABS.alignments: the list of alignments concatenated into ABS75_matrix60. The original alignments can be selected from "aligment_total.tar.gz" based on this list.
FcC_supermatrix.fas: ABS75_matrix60 in FASTA format.
FcC_supermatrix_partition.txt: the partition file of ABS75_matrix60.
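As a quick sanity check on any of the FcC_supermatrix.fas files, the alignment length and the share of missing data per taxon can be computed with a few lines of Python. This is only a sketch: the function names are hypothetical, and the set of characters counted as missing (`-`, `?`, `X`) is an assumption that should be adjusted to the actual data type of the matrix.

```python
def read_fasta(path):
    """Parse a FASTA file into {header: sequence}, joining wrapped lines."""
    records, name, chunks = {}, None, []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:], []
            else:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def missing_fraction(sequence, missing_chars="-?X"):
    """Fraction of characters in `sequence` counted as missing.

    Assumption: gaps and ambiguity are coded as '-', '?' or 'X'.
    """
    if not sequence:
        return 0.0
    hits = sum(1 for ch in sequence.upper() if ch in missing_chars)
    return hits / len(sequence)


def summarize_matrix(path):
    """Return (alignment_length, {taxon: missing_fraction})."""
    records = read_fasta(path)
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError("sequences differ in length; not an aligned matrix")
    length = lengths.pop() if lengths else 0
    return length, {name: missing_fraction(seq) for name, seq in records.items()}
```

The length check is deliberate: a concatenated supermatrix should have one length for all taxa, so a mismatch usually points to a truncated download or a file that is not the expected matrix.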
File: neuropterida_odb10.tar.gz
Description: After decompression, this is a lineage-specific BUSCO dataset for Neuropterida, which can be used to identify orthologs for inferring the phylogeny of Neuropterida and to assess the completeness of Neuropterida genomes.
hmms: A folder containing HMM profile files for each BUSCO marker
info: A folder containing the list of species used to create the BUSCO dataset and additional information of the orthologous groups.
In this folder, species.info: information about the species used to design the neuropterida_odb10 BUSCO dataset, and the number of BUSCOs identified from each species.
ogID.map.txt: the ID of each orthologous group and the ID of its corresponding BUSCO marker.
interproscan.annotations.info: the annotation information of each BUSCO marker obtained by InterProScan v5.60-92.0.
prfl: A folder containing one block profile file for each BUSCO marker
ancestral: A FASTA file containing consensus sequences of each BUSCO marker
ancestral_variants: A FASTA file containing consensus sequences and their variants for each BUSCO marker
dataset.cfg: Information about the dataset
lengths_cutoff: The threshold of length for a BUSCO to be classified as complete
refseq_db.faa: A FASTA file containing representative sequences of each BUSCO marker
scores_cutoff: The HMMER score threshold for each gene to be considered as orthologous to BUSCO markers.
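To use the extracted neuropterida_odb10 folder as a local lineage dataset, a BUSCO run can be pointed at its path. The sketch below only assembles the command line and does not execute it. It assumes the BUSCO v5 command-line options (`-i`, `-l`, `-m`, `-o`, `-c`, `--offline`); the input file name, output name, and helper function names are hypothetical placeholders.

```python
import shlex

VALID_MODES = {"genome", "transcriptome", "proteins"}


def build_busco_command(input_fasta, lineage_dir, out_name, mode="genome", cpus=4):
    """Assemble a BUSCO command as an argument list (nothing is executed).

    Assumption: BUSCO v5 option names; lineage_dir is the path to the
    extracted neuropterida_odb10 folder.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return [
        "busco",
        "-i", str(input_fasta),   # assembly, transcripts, or protein set
        "-l", str(lineage_dir),   # local lineage dataset folder
        "-m", mode,
        "-o", out_name,
        "-c", str(cpus),
        "--offline",              # do not try to download datasets
    ]


def format_command(cmd):
    """Render the argument list as a shell-safe string."""
    return shlex.join(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once BUSCO is installed, or printed with `format_command` and pasted into a terminal.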
Note: The species name of "Balmes birmanus" in all data should be changed to "Balmes notabilis"
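The name correction above can be applied to local copies of the text files with a small script. The sketch below is a minimal example under one assumption: the species name appears either with a space or with an underscore between genus and epithet. The exact spelling used in the sequence headers is not stated here, and the function names are hypothetical.

```python
import re

# Match the outdated name with a space or an underscore as separator.
_OLD_NAME = re.compile(r"Balmes([ _])birmanus")


def fix_species_name(text):
    """Replace the outdated name, keeping the original separator."""
    return _OLD_NAME.sub(lambda m: "Balmes" + m.group(1) + "notabilis", text)


def fix_file(src_path, dst_path):
    """Write a corrected copy of src_path to dst_path.

    Returns the number of lines that were changed.
    """
    changed = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            fixed = fix_species_name(line)
            if fixed != line:
                changed += 1
            dst.write(fixed)
    return changed
```

Writing to a separate output file keeps the downloaded data untouched, and a return value of 0 indicates that the name is spelled differently in that file than assumed.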