Data on mitochondrial genome rearrangement patterns, annotation resources, and phylogenetic visualization in Actinopteri (ray-finned fishes)
Data files
Mar 24, 2025 version files (538.33 MB total)
- 0_gb_actinopteri_14092_from_NCBI.gb (508.91 MB)
- 1_fasta_bed_actinopteri.zip (24.28 MB)
- 2_geneious_alignment_validation_files.geneious (1.97 MB)
- 3_1_Fish-10664-Gene_order-All_Samples.csv (2.19 MB)
- 3_2_Fish-10664-Gene_order-Type_Sta.csv (96 B)
- 3_3_Fish-10664-Gene_order-Type.csv (53.93 KB)
- 3_4_Fish-10664-Gene_order-ByFamily-676.csv (23.11 KB)
- 3_5_Fish-10664-Gene_order-ByOrder-73.csv (2.62 KB)
- 3_6_Fish-10664-Gene_order-Hotspot_Area_Sta.csv (1.14 KB)
- 4_Fish-phy.png (884.74 KB)
- README.md (4.40 KB)
Abstract
This dataset integrates structural variations, annotation resources, and phylogenetic analysis results of mitochondrial genomes in ray-finned fishes (Actinopterygii). The raw data were sourced from publicly available mitochondrial genome records in NCBI GenBank (0_gb_actinopteri_14092_from_NCBI.gb), comprising 14,092 original sequences. Standardized annotations were generated using the MITOS online tool, resulting in a compressed package (1_fasta_bed_actinopteri.zip) containing FASTA sequences and BED annotation files, which can be imported into Geneious software for visualizing gene structures and boundaries. For cases where discrepancies were found between NCBI annotations and MITOS results, a Geneious-formatted validation file (2_geneious_alignment_validation_files.geneious) is provided, including manually corrected alignment evidence. The final compiled CSV files (3_1 through 3_6) systematically organize taxonomic information, gene rearrangement patterns (e.g., ND5-ND6 inversion clusters, tRNA translocations), and their associations with order/family-level phylogenetic branches for 10,664 genomes. Additionally, a visualization file (4_Fish-phy.png) is included, displaying the distribution of gene rearrangement events on a phylogenetic tree for intuitive interpretation.
This dataset is suitable for studies on mitochondrial genome structural evolution, annotation pipeline validation, comparative genomics, and molecular phylogenetic analysis. All data comply with NCBI GenBank usage terms and involve no ethical concerns.
Dataset DOI: 10.5061/dryad.m37pvmdd0
Description of the data and file structure
Files and variables
File: 2_geneious_alignment_validation_files.geneious
Description: Manual verification process for discrepancies between NCBI and MITOS annotations.
File: 4_Fish-phy.png
Description: Phylogenetic tree with annotated gene rearrangement events.
File: 0_gb_actinopteri_14092_from_NCBI.gb
Description: Original mitochondrial genome records of ray-finned fishes downloaded from NCBI.
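As a quick integrity check on the GenBank file, the record identifiers can be listed without loading the whole 508.91 MB file into memory. The following is a minimal sketch using only the Python standard library. It relies on the GenBank flat-file convention that every record begins with a `LOCUS` line, and the function names are illustrative, not part of the dataset.

```python
from typing import Iterable, Iterator


def iter_locus_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield the locus name from every LOCUS header line."""
    for line in lines:
        # Each GenBank flat-file record starts with a line beginning "LOCUS".
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                yield fields[1]


def locus_names_in_file(path: str) -> list[str]:
    """Stream a large .gb file line by line and collect its locus names."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(iter_locus_names(handle))
```

Calling `locus_names_in_file("0_gb_actinopteri_14092_from_NCBI.gb")` should return one name per record, so its length can be compared against the 14,092 sequences stated in the abstract.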
File: 1_fasta_bed_actinopteri.zip
Description: FASTA sequences and BED annotation files generated by MITOS, supporting visualization and analysis in Geneious.
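For users who prefer to work outside Geneious, the BED annotations can also be read directly from the archive. The sketch below assumes that the BED members are tab-separated text with the standard column order (chrom, start, end, name, score, strand) and that their names end in `.bed`. The internal layout of the zip file is not documented here, so both points are assumptions to verify against the actual archive, and `BedFeature`, `parse_bed`, and `read_bed_members` are illustrative names.

```python
import io
import zipfile
from typing import NamedTuple


class BedFeature(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # exclusive
    name: str
    strand: str


def parse_bed(text: str) -> list[BedFeature]:
    """Parse BED text, skipping blank, comment, and header lines."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        cols = line.split("\t")
        if len(cols) < 3:
            continue  # not a valid BED row
        name = cols[3] if len(cols) > 3 else ""
        strand = cols[5] if len(cols) > 5 else "."
        features.append(BedFeature(cols[0], int(cols[1]), int(cols[2]), name, strand))
    return features


def read_bed_members(archive) -> dict[str, list[BedFeature]]:
    """Parse every member ending in .bed; `archive` is a path or file-like object."""
    result = {}
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            if member.lower().endswith(".bed"):
                text = zf.read(member).decode("utf-8", errors="replace")
                result[member] = parse_bed(text)
    return result
```

Sorting the features of one genome by `start` and reading off the `name` field gives that genome's gene order, which can then be compared with the gene order columns in the CSV files.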
File: 3_2_Fish-10664-Gene_order-Type_Sta.csv
Description: Counts for each major rearrangement category, i.e. the data behind a pie chart showing the proportion of different rearrangement types within the dataset.
Variables
- (unnamed column):Number assigned according to the change in gene order.
- Type:Major gene rearrangement category, one of five: Short, Type1, Rearranged, GAP, or Unsure.
- Count:The number of data entries in each major category.
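The proportions of the major categories can be recomputed from the `Type` and `Count` columns. This is a minimal sketch that assumes the CSV header matches the variable names listed above. `category_shares` is an illustrative name, and the function takes the file contents as text.

```python
import csv
import io


def category_shares(csv_text: str) -> dict[str, float]:
    """Return each Type's share of the total Count (values sum to 1)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = {row["Type"]: int(row["Count"]) for row in rows}
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}
```

To apply it to the real file, read `3_2_Fish-10664-Gene_order-Type_Sta.csv` with `encoding="utf-8-sig"` (which tolerates a byte-order mark, if one is present) and pass the resulting text to `category_shares`.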
File: 3_1_Fish-10664-Gene_order-All_Samples.csv
Description: A comprehensive summary table listing the basic information for all 10,664 samples.
Variables
- LOCUS:Unique identifier of each record, which can be used to search for it.
- Order:Taxonomic order
- Family:Taxonomic family
- Genus:Genus
- Species_name:Species name
- Length:Full length of the mitochondrial genome, in base pairs.
- GeneOder Type No.:Number assigned to the gene order type, classified by the change in gene order.
- Gene Order (exclude CR) annotation from ncbi or mitos:The gene arrangement order after inspection, excluding the control region (CR).
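The per-order and per-family summaries can be cross-checked against this table by tallying gene order types per taxonomic order. The sketch below assumes the CSV header uses exactly the column names listed above, including the spelling `GeneOder Type No.`. `types_per_order` is an illustrative name, not part of the dataset.

```python
import csv
import io
from collections import Counter, defaultdict

TYPE_COL = "GeneOder Type No."  # spelled as in the variable list above


def types_per_order(csv_text: str) -> dict[str, Counter]:
    """Count how many samples of each gene order type occur in each Order."""
    tally = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        tally[row["Order"]][row[TYPE_COL]] += 1
    return dict(tally)
```

Summing the counts for one order gives that order's sample count, and the keys of its `Counter` are the gene order types observed in it. Both can be compared with the corresponding row of `3_5_Fish-10664-Gene_order-ByOrder-73.csv`.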
File: 3_5_Fish-10664-Gene_order-ByOrder-73.csv
Description: Statistics of rearrangement types, grouped by taxonomic order.
Variables
- Order:Taxonomic order
- Number of families:The number of Families included under each Order.
- Number of samples:The number of samples included under each Order.
- GeneOder Type No.:The gene rearrangement types found under each Order, given as numbers that correspond to the gene rearrangement types in Table 3-3 (3_3_Fish-10664-Gene_order-Type.csv).
File: 3_3_Fish-10664-Gene_order-Type.csv
Description: Defines the gene order types: each distinct gene arrangement is assigned a unique rearrangement number.
Variables
- GeneOder Type No.:Number assigned to the gene order type, classified by the change in gene order.
- NOTE:A brief description of the gene rearrangement type and the rearrangement positions.
- Gene Order (exclude CR) annotation from ncbi or mitos:The gene arrangement order after inspection, excluding the control region (CR).
- Count:The number of data entries for each gene order type.
File: 3_4_Fish-10664-Gene_order-ByFamily-676.csv
Description: Statistics of rearrangement types, grouped by taxonomic family.
Variables
- Order:Taxonomic order
- Family:Taxonomic family
- Count:The number of data entries under each Family.
- GeneOder Type No.:The gene rearrangement types found under each Family, given as numbers that correspond to the gene rearrangement types in Table 3-3 (3_3_Fish-10664-Gene_order-Type.csv).
File: 3_6_Fish-10664-Gene_order-Hotspot_Area_Sta.csv
Description: Statistics on the positions that are frequently rearranged.
Variables
- Hotspot/Mix-hotspot:Code assigned to a region where rearrangements frequently occur.
- NOTE:Description of the region that each code refers to.
- GeneOder Type No.:Number assigned to the gene order type, classified by the change in gene order.
- Count:The number of data entries.