Ion Torrent data for the genome assembly and phylogenomic placement of mitochondrial genomes with a focus on houndsharks (Chondrichthyes: Triakidae)
Data files
Jan 31, 2024 version files (4.25 MB)
- 1_IonTorrent_NGS_Filtered_RawData.zip (4.25 MB)
- README.md (7.63 KB)
Feb 26, 2025 version files (7 MB)
- 1_IonTorrent_NGS_Filtered_RawData.zip (4.25 MB)
- 2_Galeomorphii_mitogenome_sequences.zip (940.84 KB)
- 3_Multiple_sequence_alignments.zip (1.80 MB)
- 4_Partition_files.zip (3.77 KB)
- README.md (14.25 KB)
Abstract
Here, we present the Ion Torrent® next-generation sequencing (NGS) data for five houndsharks (Chondrichthyes: Triakidae): Galeorhinus galeus (17,487 bp; GenBank accession number ON652874), Mustelus asterias (16,708 bp; ON652873), Mustelus mosis (16,755 bp; ON075077), Mustelus palumbes (16,708 bp; ON075076), and Triakis megalopterus (16,746 bp; ON075075). All assembled mitogenomes encode 13 protein-coding genes (PCGs), two ribosomal (r)RNA genes, and 22 transfer (t)RNA genes (tRNALeu and tRNASer are duplicated), except for G. galeus, which contains 23 tRNA genes because tRNAThr is also duplicated. We also present the code and corresponding datasets used to assemble and annotate the mitogenomes, prepare alignments, partition the datasets, assign models of evolution, infer phylogenies using traditional site-homogeneous concatenation approaches as well as the multispecies coalescent model (MSCM) and site-heterogeneous models, and generate statistical data for comparing different topological outcomes. The data and code presented here can assist other researchers in further elucidating the diversification of triakid species and the phylogenetic relationships within Carcharhiniformes (groundsharks) as mitogenomes accumulate in public repositories.
This README file was generated on 2025-02-25 by Jessica Winn
GENERAL INFORMATION
- Title of the journal article that uses this data set: Ion Torrent data for the genome assembly and phylogenomic placement of mitochondrial genomes with a focus on houndsharks (Chondrichthyes: Triakidae).
- Author Information:
A. Principal Investigator contact information
Name: Jessica C. Winn
Institution: Stellenbosch University
Address: Molecular Breeding and Biodiversity Group, Department of Genetics, Stellenbosch University, Stellenbosch, Western Cape, 7602, South Africa.
Email: jessica.winn16@gmail.com
ORCiD: https://orcid.org/0000-0003-1070-1276
B. Co-investigator contact information
Name: Simo N. Maduna
Institution: Norwegian Institute of Bioeconomy Research
Address: Department of Ecosystems in the Barents Region, Svanhovd Research Station, Norwegian Institute of Bioeconomy Research, 9925 Svanvik, Norway.
Email: simo.maduna@nibio.no
ORCiD: https://orcid.org/0000-0002-9372-4360
C. Co-investigator contact information
Name: Aletta E. Bester-van der Merwe
Institution: Stellenbosch University
Address: Molecular Breeding and Biodiversity Group, Department of Genetics, Stellenbosch University, Stellenbosch, Western Cape, 7602, South Africa.
Email: aeb@sun.ac.za
ORCiD: https://orcid.org/0000-0002-0332-7864
- Data collection:
Genomic DNA - Standard CTAB protocol or SDS-based lysis buffer (PL2) from the NucleoSpin Plant II mini kit (MACHEREY-NAGEL, Dueren, Germany).
DNA quality control - Qubit 4.0 fluorometer (ThermoFisher Scientific) and LabChip GXII Touch (PerkinElmer, Waltham, MA, USA).
Library preparation - Ion Plus Fragment Library Kit (ThermoFisher Scientific) according to the manufacturer’s protocol, Ion Xpress Plus gDNA Fragment Library Preparation User Guide (MAN0009847 K.0).
Polymerase chain reaction - 50 ng template DNA (Galeorhinus galeus genomic DNA), 1X GoTaq Buffer, 2.5 mM MgCl2, 200 µM dNTPs, 0.3 µM of each primer [Cytb CC F (5’-ACTTGAATTGGAGGGCAACC-3’) and Dloop Gga R (5’-AGGGTATGTGGGCCATATCA-3’)] and 0.625 U GoTaq DNA polymerase in a total reaction volume of 15 µL using the SimpliAmp™ Thermal Cycler. Cycling parameters: (i) one initial denaturation cycle at 94°C for 3 min, (ii) 35 cycles of denaturation at 94°C for 30 s, annealing for 30 s (starting at 65°C and decreasing in 1°C increments to 55°C over the first 11 cycles, then held at 55°C for a further 24 cycles) and elongation at 72°C for 1 min, and (iii) a final elongation cycle at 72°C for 10 min.
PCR quality control - 1.5% agarose gel electrophoresis at 100 Volts for 1 hour and visualisation using the Gel Doc™ XR+ documentation system (Bio-Rad).
NGS Sequencing - Ion GeneStudio S5 Prime System, with postprocessing in Torrent Suite version 5.16 under default settings, to generate BAM reads; cycle sequencing with standard Sanger sequencing chemistry (BigDye® Terminator v.3.1 cycle sequencing kit) and capillary electrophoresis on the ABI3730xl genetic analyser (Applied Biosystems®) to generate FASTA reads. Sequencing was conducted at the Central Analytical Facility (CAF) at Stellenbosch University.
- Date of data collection (range): Fin clip sampling (2014-2019), Ion Torrent NGS sequencing (2022), Data analysis (2022-2023)
- Geographic location of data collection: Fin clip tissue samples of Galeorhinus galeus, Mustelus palumbes, and Triakis megalopterus were collected along the coast of South Africa. Mustelus asterias and Mustelus mosis were sampled off the coasts of Wales and the Sultanate of Oman respectively.
- Information about funding sources that supported the collection of the data: MND210614611484, National Research Foundation, South Africa.
#########################################################################
SHARING/ACCESS INFORMATION
- Licenses/restrictions placed on the data: CC0 1.0 Universal (CC0 1.0) Public Domain
- Links to publications that cite or use the data:
Winn, J. C., Bester-van der Merwe, A. E., Maduna, S. N. (2024). Ion Torrent data for the genome assembly and phylogenomic placement of mitochondrial genomes with a focus on houndsharks (Chondrichthyes: Triakidae). Data in Brief. In Press.
Winn, J. C., Bester-van der Merwe, A. E., Maduna, S. N. (2024). A comprehensive phylogenomic study unveils evolutionary patterns and challenges in the mitochondrial genomes of Carcharhiniformes: A focus on Triakidae. Genomics. 11(1): 110771. doi:10.1016/j.ygeno.2023.110771.
Winn, J. C., Bester-van der Merwe, A. E. and Maduna, S. N. (2025). Annotated Bioinformatic Pipelines for Genome Assembly and Annotation of Mitochondrial Genomes. Bio-protocol. 15(5): e5231. doi:10.21769/BioProtoc.5231.
Winn, J. C., Bester-van der Merwe, A. E. and Maduna, S. N. (2025). Annotated Bioinformatic Pipelines for Phylogenomic Placement of Mitochondrial Genomes. Bio-protocol. 15(5): e5232. doi:10.21769/BioProtoc.5232.
- Links to other publicly accessible locations of the data:
Assembled mitochondrial genomes
Repository name: GenBank
Data identification number: ON075075, ON075076, ON075077, ON652873, and ON652874
Direct URL to data: https://www.ncbi.nlm.nih.gov/nuccore/ON075075; https://www.ncbi.nlm.nih.gov/nuccore/ON075076; https://www.ncbi.nlm.nih.gov/nuccore/ON075077; https://www.ncbi.nlm.nih.gov/nuccore/ON652873; https://www.ncbi.nlm.nih.gov/nuccore/ON652874.
Raw Ion Torrent® next-generation sequencing (NGS) data
Repository name: BioProject *
Data identification number: PRJNA997468
BioSample accessions: SAMN36680060, SAMN36680061, SAMN36680062, SAMN36680063, SAMN36680064
Direct URL to data: https://www.ncbi.nlm.nih.gov/bioproject/997468; https://www.ncbi.nlm.nih.gov/biosample/36680060; https://www.ncbi.nlm.nih.gov/biosample/36680061; https://www.ncbi.nlm.nih.gov/biosample/36680062; https://www.ncbi.nlm.nih.gov/biosample/36680063; https://www.ncbi.nlm.nih.gov/biosample/36680064.
* The data have been uploaded as a BioProject to the SRA database but remain suppressed until the release of a related manuscript.
- Links/relationships to ancillary data sets: None
- Was data derived from another source? No
- Recommended citation for this dataset:
Winn, Jessica; Bester-van der Merwe, Aletta; Maduna, Simo (2025). Ion Torrent data for the genome assembly and phylogenomic placement of mitochondrial genomes with a focus on houndsharks (Chondrichthyes: Triakidae) [Dataset]. Dryad. https://doi.org/10.5061/dryad.sj3tx969h
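The GenBank records listed under "Assembled mitochondrial genomes" can also be requested programmatically. The following is a minimal sketch, assuming NCBI's E-utilities `efetch` endpoint with its `db`, `id`, `rettype` and `retmode` parameters; the helper name `efetch_url` is hypothetical, and the code only builds the request URLs without contacting NCBI.

```python
from urllib.parse import urlencode

# Accession numbers as listed in this README.
ACCESSIONS = ["ON075075", "ON075076", "ON075077", "ON652873", "ON652874"]

# Assumed endpoint: NCBI E-utilities efetch.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build a URL requesting one nuccore record as GenBank flat text."""
    query = urlencode(
        {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


urls = [efetch_url(acc) for acc in ACCESSIONS]
for url in urls:
    print(url)
```

Each URL can be opened in a browser or passed to a download tool; the records themselves are already included in 2_Galeomorphii_mitogenome_sequences.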
#########################################################################
DATA & FILE OVERVIEW
- File List:
A) 1_IonTorrent_NGS_Filtered_RawData.zip
Data_1_Galeorhinus_galeus_IonTorrent_Filtered_RawData.bam
Data_2_Mustelus_asterias_IonTorrent_Filtered_RawData.bam
Data_3_Mustelus_mosis_IonTorrent_Filtered_RawData.bam
Data_4_Mustelus_palumbes_IonTorrent_Filtered_RawData.bam
Data_5_Triakis_megalopterus_IonTorrent_Filtered_RawData.bam
Data_6_Galeorhinus_galeus_Sanger_Forward
Data_7_Galeorhinus_galeus_Sanger_Reverse
B) 2_Galeomorphii_mitogenome_sequences
Data_8_ON652873_Mustelus_asterias
Data_9_ON652874_Galeorhinus_galeus
Data_10_ON075075_Triakis_megalopterus
Data_11_ON075076_Mustelus_palumbes
Data_12_ON075077_Mustelus_mosis
Data_13_Winn2023_Galeomorphii_mitogenome_seqs
C) 3_Multiple_sequence_alignments
Data_14_Galeomorphii_13PCGs_NT.fasta
Data_15_Galeomorphii_13PCGs_NT.phy
Data_16_Galeomorphii_13PCGs_NT.nex
Data_17_Galeomorphii_13PCGs_2rRNAs_NT.fasta
Data_18_Galeomorphii_13PCGs_2rRNAs_NT.phy
Data_19_Galeomorphii_13PCGs_2rRNAs_NT.nex
Data_20_Galeomorphii_13PCGs_AA.fasta
Data_21_Galeomorphii_13PCGs_AA.phy
Data_22_Galeomorphii_13PCGs_AA.nex
D) 4_Partition_files
0_partition_key
13PCGs_2rRNAs_NT.part
PS01.nex
PS01AA.nex
PS02.nex
PS03.nex
PS04.nex
PS05.nex
PS05AA.nex
PS06.nex
PS07.nex
PS08.nex
- Relationship between files, if important:
The GenBank files for the Triakidae species assembled in this study (2_Galeomorphii_mitogenome_sequences) are the output files created during assembly of the raw Ion Torrent sequences (under 1_IonTorrent_NGS_Filtered_RawData).
The multiple sequence alignment files (3_Multiple_sequence_alignments) were created using the Triakidae GenBank files (2_Galeomorphii_mitogenome_sequences) and additional Galeomorphii mitogenomes and outgroups saved together as Data_13_Winn2023_Galeomorphii_mitogenome_seqs.
The multiple sequence alignment files (3_Multiple_sequence_alignments) and partition files (4_Partition_files) are used together for mitophylogenomic analyses under Maximum Likelihood and Bayesian Inference frameworks.
- Additional related data collected that was not included in the current data package: None
- Are there multiple versions of the dataset? No
A. If yes, name of file(s) that was updated: NA
i. Why was the file updated? NA
ii. When was the file updated? NA
#########################################################################
DATA-SPECIFIC INFORMATION FOR: 1_IonTorrent_NGS_Filtered_RawData
- Data type: Raw Mitogenomic Ion Torrent® NGS data files in BAM format for Galeorhinus galeus, Mustelus asterias, Mustelus mosis, Mustelus palumbes and Triakis megalopterus and Sanger sequence reads in FASTA format for the duplicated portion of the Galeorhinus galeus mitogenome.
- Data processing:
Adaptors and poor-quality bases (Phred score below 20) have been trimmed and reads shorter than 25 base pairs (bp) removed in Torrent Suite Version 5.16.
Raw reads were aligned to the Mustelus mustelus mitogenome (NC_039629.1) using the Geneious read mapper with medium sensitivity settings and five iterations in Geneious Prime v.2023.2.
The reads that mapped to the reference mitogenome were then saved in BAM format as filtered Ion Torrent reads.
The first and last 20-30 nucleotides were trimmed from the Sanger sequence reads in Finch TV v.1.5 and saved in FASTA format.
- Specialized formats or other abbreviations used: NA
DATA-SPECIFIC INFORMATION FOR: 2_Galeomorphii_mitogenome_sequences
- Data type: Mitochondrial genome assemblies in GenBank format for Galeorhinus galeus, Mustelus asterias, Mustelus mosis, Mustelus palumbes and Triakis megalopterus. A combined GenBank file with our new assemblies, additional representative Galeomorphii mitogenomes and outgroups.
- Data processing:
For each species, the three assemblies (reference mapping in Geneious Prime, SPAdes assembly of the mapped reads, and de novo SPAdes assembly of the raw reads) were compared to produce a consensus sequence, which was annotated with MitoAnnotator in MitoFish v.3.72. Mitogenome sequence files in GenBank format were generated and uploaded to NCBI (ON075075, ON075076, ON075077, ON652873, and ON652874) (Data 8-12).
Additional representative mitogenomes from each family in the order Carcharhiniformes and four representative species each from the orders Lamniformes and Orectolobiformes were downloaded from GenBank to complete the Galeomorphii cluster and merged with our newly assembled mitogenomes to create a single GenBank file (Data 13).
- Specialized formats or other abbreviations used: NA
DATA-SPECIFIC INFORMATION FOR: 3_Multiple_sequence_alignments
- Data type: Concatenated multiple sequence alignments in FASTA, NEXUS and PHYLIP format.
- Data processing:
Codon-aware multiple sequence alignments (MSAs) were produced for each of the 13 PCGs in MACSE v.2.07, and the 2 rRNAs were aligned in MAFFT v.7.299 using the Q-INS-i iterative refinement method. Geneious Prime v.2023.2 was used to remove stop codons and check the reading frame of each PCG alignment. Ambiguously aligned regions were cleaned in BMGE v.1.12_1. The 13 PCG alignments were concatenated and saved in FASTA, NEXUS and PHYLIP format to create Data 14-16; the 13 PCG and 2 rRNA alignments were concatenated and saved in the same three formats to create Data 17-19; and the 13 PCGs were translated, concatenated and saved in the same three formats to create Data 20-22. Gene partitions were added to the Data_19_Galeomorphii_13PCGs_2rRNAs_NT.nex file to create an input file (Data 23) for multispecies coalescent modelling with SVDQuartets in PAUP* v.4.0a169.
- Specialized formats or other abbreviations used: NA
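The concatenation step described above can be sketched as follows. This is an illustrative example only, not the authors' workflow: the function name `concatenate`, the gap-padding of missing taxa and the toy sequences are all assumptions made for the sketch.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def concatenate(
    genes: List[Tuple[str, Alignment]]
) -> Tuple[Alignment, List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments.

    Returns the concatenated alignment and the 1-based, inclusive
    coordinates of each gene within it.
    """
    taxa = sorted({taxon for _, aln in genes for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    coords = []
    start = 1
    for name, aln in genes:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences differ in length")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon missing from a gene is padded with gaps.
            supermatrix[taxon] += aln.get(taxon, "-" * length)
        coords.append((name, start, start + length - 1))
        start += length
    return supermatrix, coords


# Toy input (hypothetical sequences, not taken from the dataset).
genes = [
    ("ND1", {"Mustelus_mosis": "ATGGCC", "Triakis_megalopterus": "ATGGCA"}),
    ("COX1", {"Mustelus_mosis": "ATGACCCTA", "Triakis_megalopterus": "ATGACTCTA"}),
]
matrix, coords = concatenate(genes)
print(coords)
```

The returned coordinates are the kind of information needed to define gene partitions for the concatenated alignment.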
DATA-SPECIFIC INFORMATION FOR: 4_Partition_files
- Data type: Gene and codon partition files in NEXUS format.
- Data processing:
Length summaries recording the length and alignment position of each PCG were compiled from the alignment information in Geneious Prime v.2023.2 and used to create codon, gene and gene x codon partition files.
- Specialized formats or other abbreviations used: NA
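As an illustration of how gene x codon partitions can be written from gene coordinates, the sketch below emits a NEXUS sets block using the `start-end\3` stride notation for codon positions. The function names and the two gene coordinates are hypothetical and do not reproduce the contents of the PS01-PS08 files.

```python
def codon_charsets(gene, start, end):
    """Return NEXUS charset lines for the three codon positions of one gene.

    start and end are 1-based, inclusive alignment coordinates.
    """
    if (end - start + 1) % 3 != 0:
        raise ValueError(f"{gene}: length is not a multiple of 3")
    return [
        f"  charset {gene}_pos{i + 1} = {start + i}-{end}\\3;" for i in range(3)
    ]


def partition_block(genes):
    """Assemble a NEXUS sets block from (gene, start, end) tuples."""
    lines = ["#nexus", "begin sets;"]
    for gene, start, end in genes:
        lines.extend(codon_charsets(gene, start, end))
    lines.append("end;")
    return "\n".join(lines)


# Hypothetical coordinates for two genes in a concatenated alignment.
genes = [("ND1", 1, 975), ("ND2", 976, 2019)]
block = partition_block(genes)
print(block)
```

A gene-only scheme would use one `charset` per gene (for example `charset ND1 = 1-975;`), and a codon-only scheme would pool the same codon position across all genes.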
#########################################################################
BIOINFORMATICS PIPELINE
The full bioinformatics pipeline and code describing how to process the data is available in two separate Bio-protocol articles:
Winn, J. C., Bester-van der Merwe, A. E. and Maduna, S. N. (2025). Annotated Bioinformatic Pipelines for Genome Assembly and Annotation of Mitochondrial Genomes. Bio-protocol. 15(5): e5231. doi:10.21769/BioProtoc.5231.
Winn, J. C., Bester-van der Merwe, A. E. and Maduna, S. N. (2025). Annotated Bioinformatic Pipelines for Phylogenomic Placement of Mitochondrial Genomes. Bio-protocol. 15(5): e5232. doi:10.21769/BioProtoc.5232.
Data collection
Genomic DNA extraction: Standard CTAB protocol or SDS-based lysis buffer (PL2) from the NucleoSpin Plant II mini kit (MACHEREY-NAGEL, Dueren, Germany).
DNA quality control: Qubit 4.0 fluorometer (ThermoFisher Scientific) and LabChip® GXII Touch (PerkinElmer, Waltham, MA, USA).
Library preparation: Ion Plus Fragment Library Kit (ThermoFisher Scientific) according to the manufacturer’s protocol, Ion Xpress™ Plus gDNA Fragment Library Preparation User Guide (MAN0009847 K.0).
Polymerase chain reaction: 50 ng template DNA (Galeorhinus galeus genomic DNA), 1X GoTaq Buffer, 2.5 mM MgCl2, 200 µM dNTPs, 0.3 µM of each primer [Cytb CC F (5’-ACTTGAATTGGAGGGCAACC-3’) and Dloop Gga R (5’-AGGGTATGTGGGCCATATCA-3’)] and 0.625 U GoTaq DNA polymerase in a total reaction volume of 15 µL using the SimpliAmp™ Thermal Cycler. Cycling parameters: (i) one initial denaturation cycle at 94°C for 3 min, (ii) 35 cycles of denaturation at 94°C for 30 s, annealing for 30 s (starting at 65°C and decreasing in 1°C increments to 55°C over the first 11 cycles, then held at 55°C for a further 24 cycles) and elongation at 72°C for 1 min, and (iii) a final elongation cycle at 72°C for 10 min.
PCR quality control: 1.5% agarose gel electrophoresis at 100 Volts for 1 hour and visualisation using the Gel Doc™ XR+ documentation system (Bio-Rad).
NGS Sequencing: Ion GeneStudio S5 Prime System, with postprocessing in Torrent Suite version 5.16 under default settings, to generate BAM reads; cycle sequencing with standard Sanger sequencing chemistry (BigDye® Terminator v.3.1 cycle sequencing kit) and capillary electrophoresis on the ABI3730xl genetic analyser (Applied Biosystems®) to generate FASTA reads. Sequencing was conducted at the Central Analytical Facility (CAF) at Stellenbosch University.
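The touchdown annealing profile in the cycling parameters above can be written out cycle by cycle. The sketch below reads the protocol as 11 touchdown cycles (65°C down to 55°C inclusive) followed by 24 cycles at 55°C; the function name and its parameters are hypothetical.

```python
def touchdown_annealing(start_c=65, end_c=55, step_c=1, hold_cycles=24):
    """Return the annealing temperature (in °C) for each PCR cycle."""
    # Touchdown phase: one cycle per temperature, from start_c down to end_c.
    ramp = list(range(start_c, end_c - 1, -step_c))
    # Hold phase: remaining cycles at the final annealing temperature.
    return ramp + [end_c] * hold_cycles


schedule = touchdown_annealing()
print(len(schedule), "cycles in total")
print("touchdown phase:", schedule[:11])
```

Under this reading the programme runs 35 amplification cycles in total, which may help when entering the protocol into a thermal cycler.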
Data processing
For the five houndshark species for which sequencing data were generated, sequence quality was checked in FastQC; adaptors and poor-quality bases (Phred score below 20) were trimmed and reads shorter than 25 bp were removed in Torrent Suite version 5.16. Raw reads were aligned to the Mustelus mustelus mitogenome (NC_039629.1) using the Geneious read mapper with medium sensitivity settings and five iterations in Geneious Prime v.2023.2. The reads that mapped to the reference mitogenome were then saved in BAM format as filtered Ion Torrent reads (Data 1-5) and used for mitogenome assembly in SPAdes v.3.15, with the input set for unpaired Ion Torrent reads, 8 threads, k-mer sizes of 21, 33, 55, 77, 99 and 127, the careful option to reduce the number of mismatches and short indels, and all other parameters left as default. A third assembly was produced by assembling the original raw reads de novo in SPAdes using the same parameters. Sanger sequence reads were generated for the Galeorhinus galeus mitogenome to confirm a structural deviation (tandem duplication and random loss mutation) and were trimmed using Finch TV v.1.5 (Data 6-7). All three assemblies were compared to produce a consensus sequence for each species, and these were annotated with MitoAnnotator in MitoFish v.3.72. Mitogenome sequence files in GenBank format were generated and uploaded to NCBI (ON075075, ON075076, ON075077, ON652873, and ON652874) (Data 8-12).
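The SPAdes settings described above can be expressed as a command line. The sketch below only builds and prints the command; it is a reconstruction from the description, not the authors' exact invocation (see the Bio-protocol articles for that). It assumes the SPAdes options `--iontorrent`, `-s`, `-t`, `-k`, `--careful` and `-o`, and the output directory name is hypothetical.

```python
import shlex


def spades_command(reads_bam, out_dir, threads=8):
    """Build a SPAdes command for unpaired Ion Torrent reads."""
    return [
        "spades.py",
        "--iontorrent",              # Ion Torrent read mode
        "-s", reads_bam,             # unpaired (single) reads
        "-t", str(threads),          # number of threads
        "-k", "21,33,55,77,99,127",  # k-mer sizes
        "--careful",                 # reduce mismatches and short indels
        "-o", out_dir,               # output directory (hypothetical name)
    ]


cmd = spades_command(
    "Data_3_Mustelus_mosis_IonTorrent_Filtered_RawData.bam",
    "spades_Mustelus_mosis",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell where SPAdes is installed, or the list can be passed to `subprocess.run` to launch the assembly.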
The five newly assembled mitogenomes were used in an extensive statistical workflow to expand the phylogeny of Triakidae. First, we downloaded additional representative mitogenomes from each family in the order Carcharhiniformes and four representative species each from the orders Lamniformes and Orectolobiformes from GenBank to complete the Galeomorphii cluster (Data 13), and produced codon-aware multiple sequence alignments (MSAs) for each of the 13 PCGs in MACSE v.2.07 and alignments of the 2 rRNAs in MAFFT v.7.299. We concatenated the PCG alignments (Data 14-16), concatenated the PCG and rRNA alignments (Data 17-19), and translated and then concatenated the PCG alignments (Data 20-22). These concatenated alignments were used for phylogenetic reconstruction under Maximum Likelihood (ML) and Bayesian Inference (BI) approaches in IQ-TREE v.2.1.3 and MrBayes v.3.2.6, as well as for multispecies coalescent modelling in ASTRAL v.5.6.3 and with SVDQuartets in PAUP* v.4.0a169 (a modified input NEXUS file with gene partitions is available as Data 23). Site-heterogeneous models were applied in PHYLOBAYES_MPI v.1.9. For the ML and BI analyses, we designated eight a priori partitioning schemes of varying complexity (PS01-PS08) to determine the effects of partitioning on phylogeny estimation.
BAM: Raw filtered Ion Torrent® NGS data files in BAM format can be viewed in sequence analysis software. We used Geneious Prime v.2023.2 and SPAdes v.3.15.
GenBank: Mitochondrial genome assemblies in GenBank format can be viewed in sequence analysis software. We used Geneious Prime v.2023.2. They can also be viewed and edited in a standard text editor such as Notes.
FASTA: Sanger sequence data files in FASTA format can be trimmed in Finch TV v.1.5 and viewed in sequence analysis software. Multiple sequence alignments in FASTA format can be viewed in sequence analysis software. We used Geneious Prime v.2023.2, but DAMBE v.7.0.35 and MEGA 11 can be used as alternatives. They can also be viewed and edited in a standard text editor such as Notes.
NEXUS: Multiple sequence alignments and partition files in NEXUS format can be viewed in sequence analysis software. We used Geneious Prime v.2023.2, but DAMBE v.7.0.35, PAUP* v.4.0a169 and MEGA 11 can be used as alternatives. They can also be viewed and edited in a standard text editor such as Notes.
PHYLIP: Multiple sequence alignments in PHYLIP format can be viewed in sequence analysis software. We used Geneious Prime v.2023.2, but DAMBE v.7.0.35 and MEGA 11 can be used as alternatives. They can also be viewed and edited in a standard text editor such as Notes.
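Because the FASTA alignments are plain text, they can also be inspected without dedicated software. The sketch below is a minimal, dependency-free example that parses FASTA text and checks that all sequences have the same length; the function names and the toy records are hypothetical.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict of record name -> sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_length(records):
    """Return the shared sequence length, or raise if lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; not a valid alignment")
    return lengths.pop()


# Toy alignment (hypothetical records, not taken from the dataset).
example = io.StringIO(">taxon_A\nATGGCC\nTTA\n>taxon_B\nATGGCA\nTTA\n")
records = read_fasta(example)
print(len(records), "sequences of length", alignment_length(records))
```

To apply it to one of the files in 3_Multiple_sequence_alignments, replace the `io.StringIO` object with an open file handle, for example `open("Data_14_Galeomorphii_13PCGs_NT.fasta")`.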