Mitochondrial DNA (mtDNA) has formed the backbone of phylogeographic research for many years; however, recent trends focus on genome-wide analyses. One method proposed for calibrating inferences from noisy next-generation data, such as RAD sequencing, is to compare these results with analyses of mitochondrial sequences. Most researchers using this approach appear either to be unaware that many single nucleotide polymorphisms (SNPs) identified from genome-wide sequence data are themselves mitochondrial, or to assume that these are too few to bias analyses. Here we demonstrate two methods for mining mitochondrial markers from RAD sequence data, using three South African species of yellowfish, Labeobarbus. First, we use a rigorous SNP discovery pipeline in the program STACKS to identify variant sites in mtDNA, which we then combine into haplotypes. Second, we map sequence reads directly against a mitochondrial genome reference. This second method allowed us to reconstruct up to 98% of the Labeobarbus mitogenome. We validated these mitogenome reconstructions through BLAST database searches and by comparisons with cytochrome b gene sequences obtained through Sanger sequencing. Finally, we investigate the organismal consequences of these data, including ancient genetic exchange and a recent translocation among populations of L. natalensis, as well as interspecific hybridisation between L. aeneus and L. kimberleyensis.
Raw sequence reads: PRJNA493727
This is a link out to the raw RAD data files for both Orange-Vaal
yellowfish libraries, batch 2 and batch 3. The files are given as
paired-end FastQ files, with _1 referring to Read 1 files and _2 Read 2
files. Read 1 files are associated with the enzyme NlaIII, whereas Read 2
files are the paired reads associated with MluCI. Sequence data were
generated on an Illumina HiSeq 2000 for batch 2 and an Illumina HiSeq 4000
for batch 3. Sequencing depth also differs between the libraries. The
sequencing company (Beijing Genomics Institute, Hong Kong) performed some
initial filtering on these data files, but they may require additional
filtering as discussed in the paper. FastQ files are in the standard
format, with all barcodes and adapters trimmed (apart from any adapter
pollution due to overlap). Replicated samples are included in this set of
files - for batch 2 this is sample LaO05 which is split into replicates
LaO05-1 and LaO05-2 (6 files for each replicate instead of the standard 2).
In batch 3 the replicates are 018A3, 029K5, 047K5, and LkA06 where each
replicate pair is specified as either -A or -B (two files per replicate,
the same as the other unreplicated samples). Sample names match those given
in the Supporting Information - please refer to the Supporting Information
if you require further information relating to the specimens.
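The _1/_2 naming convention described above lends itself to a small helper that groups downloaded files by sample before any further processing. The following is a minimal sketch only: the file names and extensions in the example are hypothetical (the actual names are those in the PRJNA493727 project), and the pattern assumes each file ends in _1 or _2 followed by a FastQ extension.

```python
import re
from collections import defaultdict

# ASSUMPTION: file names look like <sample>_1.fq.gz / <sample>_2.fq.gz.
# Adjust the pattern to match the names actually downloaded.
FASTQ_NAME = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.(?:fastq|fq)(?:\.gz)?$")


def pair_fastq(filenames):
    """Group FastQ file names by sample and read number.

    Read "1" files are the NlaIII reads and read "2" files the MluCI reads.
    Each read number maps to a list, because some replicates have several
    files per read.
    """
    pairs = defaultdict(lambda: {"1": [], "2": []})
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # ignore anything that is not a paired FastQ file
        pairs[match["sample"]][match["read"]].append(name)
    return dict(pairs)


# Hypothetical file names for the batch 3 replicate pair of sample LkA06.
example = [
    "LkA06-A_1.fq.gz", "LkA06-A_2.fq.gz",
    "LkA06-B_1.fq.gz", "LkA06-B_2.fq.gz",
    "notes.txt",
]
paired = pair_fastq(example)
```

Keeping a list per read number means the batch 2 replicates LaO05-1 and LaO05-2, which have more than two files each, can be handled by the same function.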
filtering_data_files
This directory contains files relating to filtering the data. These are divided into two subdirectories: FastQC_results and Regular_expressions_removing_adapter_pollution.
FastQC_results:
This directory contains the FastQC output for every Read 1 and Read 2 raw data file for the 69 samples (including separated replicate files). The output can be viewed in a web browser.
Regular_expressions_removing_adapter_pollution:
This file contains information on the regular expressions we used to remove adapter pollution from our sequence data.
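As an illustration of the general idea, a regular expression can locate the start of an adapter within a read and trim both the sequence and its quality string at that point. This is a hedged sketch, not the expressions we deposited: the adapter prefix below is a placeholder assumption, and the function name trim_record is hypothetical. Please use the expressions in the file itself to reproduce our filtering.

```python
import re

# ASSUMPTION: placeholder adapter prefix for illustration only; replace it
# with the adapter sequences given in the deposited file.
ADAPTER = re.compile(r"AGATCGGAAGAGC.*$")


def trim_record(header, seq, plus, qual):
    """Trim one four-line FastQ record at the first adapter match.

    The quality string is cut to the same length as the sequence so that
    the record stays valid. Records without a match are returned unchanged.
    """
    match = ADAPTER.search(seq)
    if match:
        cut = match.start()
        seq, qual = seq[:cut], qual[:cut]
    return header, seq, plus, qual


# A made-up read: 8 bases of insert followed by adapter read-through.
record = ("@read1", "ACGTACGTAGATCGGAAGAGCTTT", "+", "I" * 24)
trimmed = trim_record(*record)
```

A read that is trimmed to zero length would still be returned by this sketch, so a real filter would also need a minimum-length check.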
stacks-based_mtdna_snp_identification_approach
This directory is split into two subfolders - one for KwaZulu-Natal yellowfish and one for Orange-Vaal yellowfish. The same parameters (-m 5 -M 2 -n 2 --max_locus_stacks 7 -r 0) were used for both analyses.
Files are provided for the final analysis, in which the mitochondrial loci identified from the initial Stacks results were rerun separately using a whitelist, so that the results are specific to these candidate mitochondrial loci only.
For further details of the contents of this directory, please refer to the README.
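A Stacks whitelist is a plain-text file listing catalog locus IDs, one per line. The sketch below shows one way such a file could be built from a list of candidate mitochondrial loci. The locus IDs and the output file name are hypothetical and are not taken from our results.

```python
def format_whitelist(locus_ids):
    """Return whitelist text: unique catalog locus IDs, sorted, one per line."""
    unique_ids = sorted({int(locus_id) for locus_id in locus_ids})
    return "".join(f"{locus_id}\n" for locus_id in unique_ids)


# Hypothetical candidate mitochondrial locus IDs (not from the dataset).
candidate_mt_loci = [4512, 87, 4512, 1203]
whitelist_text = format_whitelist(candidate_mt_loci)

# To save the whitelist for a later run, for example:
# with open("mt_whitelist.txt", "w") as handle:
#     handle.write(whitelist_text)
```

Removing duplicates and sorting is a convenience for checking the file by eye; only the one-ID-per-line layout matters.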
mitogenome_mapping_approach_result_files
This directory contains the NEXUS and XML files resulting from the mitogenome mapping approach.
Consensus sequences for each of the 69 samples may be extracted from the NEXUS file. The same NEXUS file was used to generate the XML files, to run TCS to produce haplotype networks, and to run MrBayes to produce a phylogram. It includes the outgroup (Labeobarbus intermedius), which was removed for the MrBayes analysis. KwaZulu-Natal yellowfish and Orange-Vaal yellowfish samples were separated for the TCS analysis.
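For users who want the consensus sequences without installing a phylogenetics package, the matrix block of a NEXUS file can be read with a few lines of code. The following is a minimal sketch under stated assumptions: taxon labels contain no spaces or quotes, and the matrix ends with a semicolon on its own line. The small alignment and the label "outgroup" are made up for illustration; the real labels are those in the deposited file.

```python
def read_nexus_matrix(text):
    """Return {taxon: sequence} from the first MATRIX block of NEXUS text.

    ASSUMPTIONS: unquoted taxon labels without spaces, and a matrix that is
    closed by a line starting with ';'. Repeated labels are concatenated,
    which also covers simple interleaved matrices.
    """
    sequences = {}
    in_matrix = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_matrix:
            in_matrix = stripped.lower() == "matrix"
            continue
        if stripped.startswith(";"):
            break  # end of the matrix block
        if not stripped or stripped.startswith("["):
            continue  # skip blank lines and comment lines
        parts = stripped.split(None, 1)
        if len(parts) != 2:
            continue
        name, seq = parts
        sequences[name] = sequences.get(name, "") + seq.replace(" ", "")
    return sequences


# Made-up alignment for illustration only.
example = """#NEXUS
begin data;
dimensions ntax=3 nchar=8;
format datatype=dna missing=? gap=-;
matrix
sampleA ACGTACGT
sampleB ACGTAC?T
outgroup ACGAACGT
;
end;
"""
sequences = read_nexus_matrix(example)

# Drop the outgroup, as was done before the MrBayes analysis.
ingroup = {name: seq for name, seq in sequences.items() if name != "outgroup"}
```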
The XML files include three replicates of the run used to produce the Yule Process chronogram (Figure 6). These replicates were run to produce .log files which were assessed in Tracer. Tree output files were viewed independently using TreeAnnotator and FigTree, combined in LogCombiner and then reprocessed in TreeAnnotator and viewed in FigTree.
nuclear_comparison_approach
This directory is split into two subfolders - one for KwaZulu-Natal yellowfish and one for Orange-Vaal yellowfish. The KwaZulu-Natal yellowfish dataset is made up of data from Read 1 fragments only, using the parameters -m 3 -M 3 -n 2 -r 0.8 --max_locus_stacks 7 --min_maf 0.03 --write_single_snp. The Orange-Vaal yellowfish dataset also uses only Read 1 fragments, and the parameters -m 3 -M 2 -n 1 -r 0.8 --max_locus_stacks 7 --min_maf 0.015 --write_single_snp.
Within each of these directories there are two further subfolders, which separate the nuclear runs in which mitochondrial SNPs were filtered out (mtDNA_filtered) from those in which they were retained (mtDNA_retained).
For more detailed information on the files contained in this directory, please refer to the README.
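To illustrate what the two --min_maf thresholds mean in practice, the sketch below computes a minor allele frequency for a single biallelic SNP and checks it against 0.03 and 0.015. This is an illustration only and is not Stacks' implementation: the allele counts are invented, and treating the threshold as inclusive (>=) is an assumption.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Return the frequency of the rarer allele at a biallelic site.

    A monomorphic site (fewer than two alleles observed) returns 0.0.
    """
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / sum(counts.values())


def passes_min_maf(alleles, min_maf):
    # ASSUMPTION: the threshold is inclusive; Stacks may treat ties differently.
    return minor_allele_frequency(alleles) >= min_maf


# Invented example: 50 diploid individuals, 98 copies of A and 2 of G.
site = ["A"] * 98 + ["G"] * 2
maf = minor_allele_frequency(site)

kept_orange_vaal = passes_min_maf(site, 0.015)   # Orange-Vaal threshold
kept_kwazulu_natal = passes_min_maf(site, 0.03)  # KwaZulu-Natal threshold
```

In this invented case the SNP would be retained under the Orange-Vaal threshold but removed under the KwaZulu-Natal threshold, which shows why the two datasets are not filtered identically.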