Raw sequence data and analytical resources for: Mitochondrial genome structure and composition in 70 fishes: a key resource for fisheries management in the South Atlantic
Data files
Jan 29, 2024 version files 2.71 GB
- Ablennes-hians-AI97_S79_L001_R1_001.fastq
- Ablennes-hians-AI97_S79_L001_R2_001.fastq
- Acanthocybium_solandri-AT68_S23_L001_R1_001.fastq
- Acanthocybium_solandri-AT68_S23_L001_R2_001.fastq
- Acanthurus-bahianus-D02_S30_L001_R1_001.fastq
- Acanthurus-bahianus-D02_S30_L001_R2_001.fastq
- Anchoa_tricolor-AI10_S54_L001_R1_001.fastq
- Anchoa_tricolor-AI10_S54_L001_R2_001.fastq
- Anisotremus-surinamensis-AX61_S77_L001_R1_001.fastq
- Anisotremus-surinamensis-AX61_S77_L001_R2_001.fastq
- Astroscopus-sexspinosus-AC02_S89_L001_R1_001.fastq
- Astroscopus-sexspinosus-AC02_S89_L001_R2_001.fastq
- Cantherhines-pullus-AU37_S8_L001_R1_001.fastq
- Cantherhines-pullus-AU37_S8_L001_R2_001.fastq
- Caranx-barholomaei-F48_S86_L001_R1_001.fastq
- Caranx-barholomaei-F48_S86_L001_R2_001.fastq
- Caranx-crysos-AJ89_S40_L001_R1_001.fastq
- Caranx-crysos-AJ89_S40_L001_R2_001.fastq
- Caranx-latus-RZ97-267_S51_L001_R1_001.fastq
- Caranx-latus-RZ97-267_S51_L001_R2_001.fastq
- Centropomus-parallelus-AJ39_S55_L001_R1_001.fastq
- Centropomus-parallelus-AJ39_S55_L001_R2_001.fastq
- Cephalopholis-fulva-D11_S17_L001_R1_001.fastq
- Cephalopholis-fulva-D11_S17_L001_R2_001.fastq
- Chaetodipterus-faber-AC19_S25_L001_R1_001.fastq
- Chaetodipterus-faber-AC19_S25_L001_R2_001.fastq
- Chilomycterus_spinosus-AU38_S93_L001_R1_001.fastq
- Chilomycterus_spinosus-AU38_S93_L001_R2_001.fastq
- Chloroscombrus-chrysurus-E53_S74_L001_R1_001.fastq
- Chloroscombrus-chrysurus-E53_S74_L001_R2_001.fastq
- Conodon-nobilis-AF40_S78_L001_R1_001.fastq
- Conodon-nobilis-AF40_S78_L001_R2_001.fastq
- Cookeolus-japonicus-AF32_S94_L001_R1_001.fastq
- Cookeolus-japonicus-AF32_S94_L001_R2_001.fastq
- Coryphaena-equiselis-B22_S5_L001_R1_001.fastq
- Coryphaena-equiselis-B22_S5_L001_R2_001.fastq
- Cynoscion-jamaicensis-RNP06_S20_L001_R1_001.fastq
- Cynoscion-jamaicensis-RNP06_S20_L001_R2_001.fastq
- Cynoscion-leiarchus-RNP11_S59_L001_R1_001.fastq
- Cynoscion-leiarchus-RNP11_S59_L001_R2_001.fastq
- Cynoscion-striatus-E25_S41_L001_R1_001.fastq
- Cynoscion-striatus-E25_S41_L001_R2_001.fastq
- Dermatolepis-inermis-RZ97-218_S3_L001_R1_001.fastq
- Dermatolepis-inermis-RZ97-218_S3_L001_R2_001.fastq
- Diapterus-auratus-AJ42_S46_L001_R1_001.fastq
- Diapterus-auratus-AJ42_S46_L001_R2_001.fastq
- Epinephelus-adscensionis-RZ97-144_S29_L001_R1_001.fastq
- Epinephelus-adscensionis-RZ97-144_S29_L001_R2_001.fastq
- Epinephelus-marginatus-RZ97-292_S75_L001_R1_001.fastq
- Epinephelus-marginatus-RZ97-292_S75_L001_R2_001.fastq
- Epinephelus-morio-AK22_S76_L001_R1_001.fastq
- Epinephelus-morio-AK22_S76_L001_R2_001.fastq
- Genidens-barbus-AN11_S36_L001_R1_001.fastq
- Genidens-barbus-AN11_S36_L001_R2_001.fastq
- Gymnothorax-miliares-AU100_S82_L001_R1_001.fastq
- Gymnothorax-miliares-AU100_S82_L001_R2_001.fastq
- Haemulon-parra-D20_S15_L001_R1_001.fastq
- Haemulon-parra-D20_S15_L001_R2_001.fastq
- Hemiramphus-brasiliensis-CA24_S68_L001_R1_001.fastq
- Hemiramphus-brasiliensis-CA24_S68_L001_R2_001.fastq
- Hyporthodus-niveatus-AC36_S21_L001_R1_001.fastq
- Hyporthodus-niveatus-AC36_S21_L001_R2_001.fastq
- Isopisthus-parvipinnis-AN78_S39_L001_R1_001.fastq
- Isopisthus-parvipinnis-AN78_S39_L001_R2_001.fastq
- Lobotes-surinamensis-AQ06_S4_L001_R1_001.fastq
- Lobotes-surinamensis-AQ06_S4_L001_R2_001.fastq
- Lophius-gastrophysus-AK42_S91_L001_R1_001.fastq
- Lophius-gastrophysus-AK42_S91_L001_R2_001.fastq
- Lopholatilus-villarii-AE76_S11_L001_R1_001.fastq
- Lopholatilus-villarii-AE76_S11_L001_R2_001.fastq
- Lutjanus-analis-D30_S52_L001_R1_001.fastq
- Lutjanus-analis-D30_S52_L001_R2_001.fastq
- Macrodon-ancylodon-E41_S87_L001_R1_001.fastq
- Macrodon-ancylodon-E41_S87_L001_R2_001.fastq
- Menticirrhus-americanus-E32_S26_L001_R1_001.fastq
- Menticirrhus-americanus-E32_S26_L001_R2_001.fastq
- Merluccius_hubbsi-AB31_S2_L001_R1_001.fastq
- Merluccius_hubbsi-AB31_S2_L001_R2_001.fastq
- Micropogonias-furnieri-A81_S1_L001_R1_001.fastq
- Micropogonias-furnieri-A81_S1_L001_R2_001.fastq
- Mullus-argentinae-AH84_S53_L001_R1_001.fastq
- Mullus-argentinae-AH84_S53_L001_R2_001.fastq
- Mycteroperca-acutirostris-AI03_S62_L001_R1_001.fastq
- Mycteroperca-acutirostris-AI03_S62_L001_R2_001.fastq
- Nemadactylus-bergi-AF95_S19_L001_R1_001.fastq
- Nemadactylus-bergi-AF95_S19_L001_R2_001.fastq
- Ocyurus-chrysurus-D24_S42_L001_R1_001.fastq
- Ocyurus-chrysurus-D24_S42_L001_R2_001.fastq
- Oligoplites-saurus-AH62_S57_L001_R1_001.fastq
- Oligoplites-saurus-AH62_S57_L001_R2_001.fastq
- Opisthonema-oglinum-E1_S61_L001_R1_001.fastq
- Opisthonema-oglinum-E1_S61_L001_R2_001.fastq
- Orthopristis-rubra-AI47_S35_L001_R1_001.fastq
- Orthopristis-rubra-AI47_S35_L001_R2_001.fastq
- Pagrus-pagrus-AB52_S13_L001_R1_001.fastq
- Pagrus-pagrus-AB52_S13_L001_R2_001.fastq
- Paralonchurus-brasiliensis-AC43_S65_L001_R1_001.fastq
- Paralonchurus-brasiliensis-AC43_S65_L001_R2_001.fastq
- Polyprion-americanus-AE84_S58_L001_R1_001.fastq
- Polyprion-americanus-AE84_S58_L001_R2_001.fastq
- Pomatomus-saltatrix-AB41_S88_L001_R1_001.fastq
- Pomatomus-saltatrix-AB41_S88_L001_R2_001.fastq
- Priacanthus-arenatus-AP04_S84_L001_R1_001.fastq
- Priacanthus-arenatus-AP04_S84_L001_R2_001.fastq
- Prionotus-nudigula-A12_S38_L001_R1_001.fastq
- Prionotus-nudigula-A12_S38_L001_R2_001.fastq
- Pseudopercis-numida-CA22_S12_L001_R1_001.fastq
- Pseudopercis-numida-CA22_S12_L001_R2_001.fastq
- Pseudupeneus-maculatus-D06_S16_L001_R1_001.fastq
- Pseudupeneus-maculatus-D06_S16_L001_R2_001.fastq
- Python-Scripts.py
- R-Scripts.R
- Raneya-brasiliensis-AI55_S32_L001_R1_001.fastq
- Raneya-brasiliensis-AI55_S32_L001_R2_001.fastq
- README.md
- Rhomboplites-aurorubens-AH87_S28_L001_R1_001.fastq
- Rhomboplites-aurorubens-AH87_S28_L001_R2_001.fastq
- Rypticus-randalli-AJ59_S44_L001_R1_001.fastq
- Rypticus-randalli-AJ59_S44_L001_R2_001.fastq
- Sarda-sarda-AG07_S64_L001_R1_001.fastq
- Sarda-sarda-AG07_S64_L001_R2_001.fastq
- Sardinella-aurita-Sau1_S18_L001_R1_001.fastq
- Sardinella-aurita-Sau1_S18_L001_R2_001.fastq
- Sardinella-brasiliensis-AP38_S56_L001_R1_001.fastq
- Sardinella-brasiliensis-AP38_S56_L001_R2_001.fastq
- Sardinops-sagax-AP51_S67_L001_R1_001.fastq
- Sardinops-sagax-AP51_S67_L001_R2_001.fastq
- Scomberomorus-brasiliensis-AF20_S48_L001_R1_001.fastq
- Scomberomorus-brasiliensis-AF20_S48_L001_R2_001.fastq
- Scorpaena-brasiliensis-AC18_S10_L001_R1_001.fastq
- Scorpaena-brasiliensis-AC18_S10_L001_R2_001.fastq
- Selene-vomer-AI42_S43_L001_R1_001.fastq
- Selene-vomer-AI42_S43_L001_R2_001.fastq
- Sphyraena-guachancho-AC87_S60_L001_R1_001.fastq
- Sphyraena-guachancho-AC87_S60_L001_R2_001.fastq
- Thyrsites_lepidopodea-AL23_S47_L001_R1_001.fastq
- Thyrsites_lepidopodea-AL23_S47_L001_R2_001.fastq
- Umbrina-conosai-A1_S49_L001_R1_001.fastq
- Umbrina-conosai-A1_S49_L001_R2_001.fastq
- UNIX-Scripts.sh
- Upeneus-parvus-AN96_S7_L001_R1_001.fastq
- Upeneus-parvus-AN96_S7_L001_R2_001.fastq
- Urophycis-brasiliensis-E94_S50_L001_R1_001.fastq
- Urophycis-brasiliensis-E94_S50_L001_R2_001.fastq
Abstract
Background
Phylogenetic gaps in public databases of reference sequences are a major obstacle for comparative genomics and the management of marine resources, particularly in the Global South, where economically important fisheries and conservation flagship species often lack closely related references. We applied target enrichment to obtain complete mitochondrial genomes of marine ichthyofauna from the Brazilian coast, selected based on economic significance, conservation status and lack of phylogenetically close references. These included sardines (Dorosomatidae, Alosidae), mackerels (Scombridae), croakers (Sciaenidae), groupers (Epinephelidae) and snappers (Lutjanidae).
Results
Custom baits were designed to enrich mitochondrial DNA across a broad phylogenetic range of fishes. Sequencing generated approximately 100k reads per sample, which were assembled into a total of 70 complete mitochondrial genomes. These include 52 new additions to GenBank, among them five species with no previous mitochondrial data. Departures from the typical gene content and order occurred in only three taxa and mostly involved tRNA gene duplications. Start codons for all genes, except Cytochrome C Oxidase subunit I (COI), were consistently ATG, whilst a wide range of stop codons deviated from the prevailing TAA. Phylogenetic analysis confirmed assembly accuracy and revealed signs of cryptic diversification within the genus Mullus. Lineage delimitation methods using Sardinella aurita and S. brasiliensis mitochondrial genomes support a single Operational Taxonomic Unit.
Conclusions
Target enrichment was highly efficient, providing complete novel mitochondrial genomes with little sequencing effort. These sequences are deposited in public databases to enable subsequent studies in population genetics and adaptation of Latin American fish species, and to serve as a vital resource for conservation and management programs that rely on molecular data for species- and genus-level identification.
README: Raw sequence data and analytical resources for "Mitochondrial genome structure and composition in 70 fishes: a key resource for fisheries management in the South Atlantic"
https://doi.org/10.5061/dryad.rr4xgxdg4
This repository contains raw sequence data (150 bp Illumina MiSeq reads) and the scripts used for all analyses performed in the paper "Mitochondrial genome structure and composition in 70 fishes: a key resource for fisheries management in the South Atlantic" (Alvarenga & D'Elia et al. 2024).
Analytical Resources 1: Genome Assembly, Mapping, and Pilon Pipeline (MA, FH, AD - September 2021 - September 2022): UNIX-Scripts.sh
Section 1: GENOME ASSEMBLY WITH NOVOPLASTY (MA - September 2021)
- About: Assembly of mitochondrial genomes using NOVOPlasty.
- Requirements: Reads (.fastq) and config files (.txt) in the working folder.
- Config files: A config file template can be downloaded here (https://github.com/ndierckx/NOVOPlasty/blob/master/config.txt)
- Parameters: K-mer = 39, Insert Size = 550. Optional parameters: Insert Range = 1.8, Insert Range strict = 1.3.
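As a rough illustration of how the parameters above could be written into a downloaded template, the Python sketch below rewrites `key = value` lines in a config text. It assumes the template holds one `key = value` pair per line; the helper name `set_config_values` and the exact key spellings passed to it are hypothetical and should be checked against the real config.txt. It is not part of UNIX-Scripts.sh.

```python
import re


def set_config_values(template_text, values):
    """Return template_text with the value of each listed key replaced.

    Hypothetical helper: assumes one 'key = value' pair per line.
    Keys are compared case-insensitively and must match the whole key,
    so 'Insert Range' does not touch 'Insert Range strict'.
    """
    wanted = {key.lower(): str(value) for key, value in values.items()}
    out_lines = []
    for line in template_text.splitlines():
        match = re.match(r"^(\s*)([^=]+?)(\s*=\s*)(.*)$", line)
        if match and match.group(2).strip().lower() in wanted:
            indent, key, sep, _old_value = match.groups()
            # Keep the original key and spacing, swap only the value.
            line = f"{indent}{key}{sep}{wanted[key.strip().lower()]}"
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


# Parameter values listed in this README (key spellings are assumptions).
README_PARAMETERS = {
    "K-mer": 39,
    "Insert size": 550,
    "Insert Range": 1.8,
    "Insert Range strict": 1.3,
}
```

Calling `set_config_values(open("config.txt").read(), README_PARAMETERS)` would return the edited text, which can then be saved as the per-sample config file. Lines whose keys are not listed are left unchanged.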
Section 2: MAPPING AND PILON PIPELINE (FH & MA - September 2021)
- About: Mapping mitochondrial genomes, correcting errors with Pilon, and re-mapping.
- Requirements: Reference assembly (.fasta), reads (.fastq), and Mitofish results (.txt, .fas, _genes.fa, etc).
- Check that the sample names (prefixes) of the reference assemblies and the reads are identical.
- The pipeline creates a new folder “pilon_results” containing the reads remapped to the corrected FASTA assembly.
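The prefix check mentioned above can be sketched in Python before running the pipeline. This is only an illustration under assumptions: it treats the sample prefix as the read file name without its `_R1_001.fastq` or `_R2_001.fastq` suffix, and it assumes each reference is named `<prefix>.fasta`. The function name `check_prefixes` is hypothetical and the real naming used by UNIX-Scripts.sh may differ.

```python
# Suffixes of the paired-end read files as they appear in this dataset.
READ_SUFFIXES = ("_R1_001.fastq", "_R2_001.fastq")


def check_prefixes(ref_names, read_names, ref_ext=".fasta"):
    """Report references lacking a full read pair and reads lacking a reference.

    Assumption: a reference is named '<prefix>.fasta' and its reads are
    '<prefix>_R1_001.fastq' and '<prefix>_R2_001.fastq'.
    """
    refs = {name[: -len(ref_ext)] for name in ref_names if name.endswith(ref_ext)}
    mates = {}
    for name in read_names:
        for suffix in READ_SUFFIXES:
            if name.endswith(suffix):
                mates.setdefault(name[: -len(suffix)], set()).add(suffix)
    # A prefix is complete only when both R1 and R2 are present.
    complete = {prefix for prefix, found in mates.items() if len(found) == 2}
    return {
        "refs_without_read_pair": sorted(refs - complete),
        "reads_without_ref": sorted(set(mates) - refs),
    }
```

Both lists being empty suggests that every reference has a matching R1/R2 pair and that no reads are left without a reference.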
Section 3: MAXIMUM LIKELIHOOD PHYLOGENETIC INFERENCE PIPELINE (AD - September 2022)
- About: Maximum likelihood phylogenetic analysis using local BLAST and RAxML.
- Requirements: Reference database (Mitofish_db) and assembled sequences after quality control (SAMPLES.fa).
- The pipeline creates a new FASTA file (multifasta.fa) containing the reference genomes and the sequences from your .fa files.
- This new FASTA file is used as the input for the ML inference.
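The step of combining reference genomes and sample sequences into a single FASTA file can be sketched in Python as follows. This is a minimal illustration, not the code in UNIX-Scripts.sh: the function name `build_multifasta` is hypothetical, and the duplicate-identifier check is an added precaution, since repeated sequence names tend to cause problems in downstream alignment and tree building.

```python
from pathlib import Path


def build_multifasta(input_paths, output_path):
    """Concatenate FASTA files into one file and return the record count.

    Hypothetical helper: blank lines are dropped and a ValueError is
    raised if the same sequence identifier appears twice.
    """
    seen_ids = set()
    n_records = 0
    with open(output_path, "w") as out:
        for path in input_paths:
            for line in Path(path).read_text().splitlines():
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    # The identifier is the first word after '>'.
                    parts = line[1:].split()
                    seq_id = parts[0] if parts else ""
                    if seq_id in seen_ids:
                        raise ValueError(f"duplicate sequence ID: {seq_id}")
                    seen_ids.add(seq_id)
                    n_records += 1
                out.write(line + "\n")
    return n_records
```

A call such as `build_multifasta(["references.fa", "SAMPLES.fa"], "multifasta.fa")` would write the merged file; the input file names here are placeholders.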
Analytical Resources 2: Extract Coverage Stats and Alignment Rate (MA - April 2022): Python-Scripts.py
Section 1: Extract Coverage Stats with COVERAGE-STATS.PY
- About: Calculate basic coverage statistics.
- Requirements: 'Sample_ID.txt' with genome names and '$0.cov.txt' files from the mapping pipeline.
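For orientation, basic coverage statistics of this kind could be computed along the following lines. The layout of the '$0.cov.txt' files is not described in this README, so the sketch assumes a `samtools depth`-style table (reference name, position, depth, separated by whitespace). The function name `coverage_stats` is hypothetical and this is not the code in COVERAGE-STATS.PY.

```python
import statistics


def coverage_stats(lines):
    """Summarise per-position depth from 'name position depth' lines.

    Assumption: each non-empty line has at least three whitespace-separated
    columns and the third column is an integer depth.
    """
    depths = [int(line.split()[2]) for line in lines if line.strip()]
    if not depths:
        raise ValueError("no coverage values found")
    return {
        "positions": len(depths),
        "mean": statistics.mean(depths),
        "median": statistics.median(depths),
        "min": min(depths),
        "max": max(depths),
        "stdev": statistics.pstdev(depths),  # population standard deviation
    }
```

Looping over the genome names in 'Sample_ID.txt' and passing each file's lines to `coverage_stats` would give one summary per sample.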
Section 2: Extract Percentage of Mapping Stats with ALIGNMENT-RATE.PY
- About: Extract alignment percentage from all genomes.
- Requirements: 'Sample_ID.txt' with genome names and '$0.stat.txt' files from the mapping pipeline.
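Extracting an alignment percentage from a mapping summary can be sketched as below. The content of the '$0.stat.txt' files is not specified here, so the example assumes a Bowtie2-style summary line of the form `97.53% overall alignment rate`; if the files come from another tool, the pattern would need to change. The names `alignment_rate` and `rates_for_samples` are hypothetical and this is not the code in ALIGNMENT-RATE.PY.

```python
import re

# Assumed summary line, e.g. "97.53% overall alignment rate".
RATE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)% overall alignment rate")


def alignment_rate(stat_text):
    """Return the alignment percentage as a float, or None if absent."""
    match = RATE_PATTERN.search(stat_text)
    return float(match.group(1)) if match else None


def rates_for_samples(sample_ids, stat_texts):
    """Map each sample ID to its alignment rate.

    stat_texts is a dict of sample ID to the text of its stat file;
    samples without a stat text get None.
    """
    return {sid: alignment_rate(stat_texts.get(sid, "")) for sid in sample_ids}
```

Samples that map to `None` either lack a stat file or use a summary format the pattern does not cover, which makes them easy to flag for manual inspection.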
Analytical Resources 3: Species Delimitation by GMYC/Splits Package (MA and AKPD - April-July 2022): R-Scripts.R
Section 1: Species Delimitation by GMYC/splits package (AKPD - July 2022)
- Requirements: The 'splits' R package (downloaded and installed) and a Bayesian tree file in NEXUS format.