Despite many studies illustrating the perils of using mitochondrial DNA (mtDNA) for phylogenetic inference, it remains one of the most widely used genetic markers for this purpose. Over the last decade, nuclear introns have been proposed as alternative markers for phylogenetic reconstruction. However, the resolving power of mtDNA and nuclear introns has rarely been quantified and compared. In the current study we generated a novel ∼5 kb dataset comprising six nuclear introns and a mtDNA fragment. We assessed the relative resolving power of the six intronic fragments with respect to each other, when used in various combinations, and when compared with the traditionally used mtDNA. As a case study we focused on a major clade of the horseshoe bat family (the Afro-Palaearctic clade; Rhinolophidae). This old, widely distributed and speciose group shows a high degree of morphological conservatism, and this morphological stasis makes reconstructing its phylogeny from traditional morphological characters difficult. We sampled multiple individuals per species to represent their geographic distributions as fully as possible (122 individuals, 24 species, 68 localities). We reconstructed the species phylogeny using several complementary methods (partitioned maximum likelihood, partitioned Bayesian inference and the Bayesian multispecies coalescent) and based our inferences on the consensus across these methods. We computed pairwise Robinson–Foulds tree distances between all Bayesian topologies generated (27,000) for every gene or gene combination and visualised the resulting tree space using multidimensional scaling (MDS) plots. Using the resulting species phylogeny, we estimated the ancestral states of key traits within this group, such as echolocation peak frequency, which has been implicated in speciation.
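The tree-space comparison above rests on pairwise Robinson–Foulds (RF) distances. The Python sketch below is not the authors' pipeline; it illustrates the rooted, clade-based form of the RF distance, assuming plain Newick strings with taxon names on the tips. Bracketed comments such as BEAST's `[&...]` annotations are stripped, and quoted labels are not supported. The function names `clades`, `rf_distance` and `rf_matrix` are hypothetical. The widely used unrooted form of the RF distance compares bipartitions (splits) rather than clades.

```python
import re
from itertools import combinations


def clades(newick: str) -> set[frozenset[str]]:
    """Return the non-trivial clades (sets of tip names) of a rooted Newick tree."""
    s = re.sub(r"\[[^\]]*\]", "", newick).strip().rstrip(";")  # drop [&...] comments
    stack: list[set[str]] = []          # tip sets of the groups currently open
    groups: list[frozenset[str]] = []
    tips: set[str] = set()
    i = 0
    while i < len(s):
        ch = s[i]
        if ch.isspace() or ch == ",":
            i += 1
        elif ch == "(":
            stack.append(set())
            i += 1
        elif ch == ")":
            group = stack.pop()
            groups.append(frozenset(group))
            if stack:
                stack[-1] |= group      # a closed group belongs to its parent
            i += 1
            while i < len(s) and s[i] not in ",()":
                i += 1                  # skip internal node label / branch length
        else:
            j = i
            while j < len(s) and s[j] not in ",():":
                j += 1
            name = s[i:j].strip()
            tips.add(name)
            if stack:
                stack[-1].add(name)
            i = j
            while i < len(s) and s[i] not in ",()":
                i += 1                  # skip the tip's branch length
    # Exclude the root clade (all tips) and any single-tip groups.
    return {g for g in groups if 1 < len(g) < len(tips)}


def rf_distance(t1: str, t2: str) -> int:
    """Rooted RF distance: size of the symmetric difference of the clade sets."""
    return len(clades(t1) ^ clades(t2))


def rf_matrix(trees: list[str]) -> list[list[int]]:
    """Symmetric matrix of pairwise RF distances between all trees."""
    sets = [clades(t) for t in trees]
    n = len(trees)
    m = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = len(sets[i] ^ sets[j])
    return m
```

For example, `((A,B),(C,D));` and `(((A,B),C),D);` share the clade {A, B} and differ in one clade each, giving a distance of 2. The matrix from `rf_matrix` can then be passed to any MDS implementation that accepts a precomputed distance matrix.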
Our results revealed many potential cryptic species within this group, even in taxa where none were suspected a priori, and also provided evidence of mtDNA introgression. We demonstrated that just two introns can recover a better-supported species tree than mtDNA alone, despite the combined introns being shorter overall than the mtDNA fragment. We also showed that combining any single intron with mtDNA yields a tree highly similar to the mtDNA gene tree and far from the true species tree; this approach should therefore be avoided. We caution against the indiscriminate use of mtDNA in phylogenetic studies and advocate pilot studies to select nuclear introns. The choice of marker type and number is a crucial step that is best based on critical examination of preliminary or previously published data. Based on our findings and previous publications, we recommend the following markers for recovering phylogenetic relationships among recently diverged taxa (<20 My) in bats and other mammals: ACOX2, COPS7A, BGN, ROGDI and STAT5A.
FASTA alignment files for all genetic loci used
FASTA alignment files of the sequences for each molecular marker used in the current study. The number of individuals is indicated in the file name (e.g. N=122). Information on the choice of loci, the primers used and alignment optimisation is given in section 2.2 of the paper (2.2 DNA extraction & PCR), in Table S2, and in section 2.3 (Sequencing & alignments). The alignment files provided (and used in the paper) are the optimised alignments: sites at which only one or a few individuals had a 1 bp insertion were typically highlighted as poorly aligned by the T-COFFEE analysis and were removed. The original sequences are available on GenBank (see Table S1).
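The sites removed during optimisation were identified with T-COFFEE. As a simpler, hypothetical stand-in (not the T-COFFEE scoring procedure), the Python sketch below drops alignment columns in which at most a few sequences carry a non-gap character, which is the pattern a 1 bp insertion in one or a few individuals produces. The helper names and the `max_occupied` threshold are illustrative assumptions.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {sequence name: aligned sequence}, keeping input order."""
    seqs: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()     # whole header line used as the name
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line          # sequences may span several lines
    return seqs


def drop_sparse_columns(aln: dict[str, str], max_occupied: int = 1,
                        missing: str = "-?") -> dict[str, str]:
    """Remove columns in which at most `max_occupied` sequences are not gap/missing."""
    widths = {len(s) for s in aln.values()}
    if len(widths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    (width,) = widths
    keep = [col for col in range(width)
            if sum(s[col] not in missing for s in aln.values()) > max_occupied]
    return {name: "".join(s[c] for c in keep) for name, s in aln.items()}
```

With three sequences `ACG-T`, `ACG-T` and `ACGAT`, the fourth column is occupied by a single individual and is removed, leaving `ACGT` for all three. Raising `max_occupied` would also remove insertions shared by a few individuals; whether that is appropriate depends on the dataset.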
1_Fasta_alignment_files.zip
Maximum likelihood tree files
Maximum likelihood tree files generated using RAxML for each nuclear intron, for the mtDNA fragment and for the partitioned 6-nuclear-intron dataset. Please see the ReadME file for the full data package for further information.
3_ML_tree_files.zip
PAML MCMCTREE dated phylogeny
PAML MCMCTREE dated phylogeny based on the Bayesian 6-nuclear-intron topology generated in BEAST. Please see the ReadME file for the full data package for further information.
5_PAML.zip
Bayesian tree files - Gene Trees
Tree files for the Bayesian analyses (NEXUS format). All Bayesian analyses were conducted in BEAST v. 1.8, as described in section 2.4 of the paper (2.4 Phylogenetic reconstruction). The files were generated using LogCombiner and TreeAnnotator, the programs distributed with BEAST, as described in section 2.4. The LogCombiner files ('LogC' files) contain all trees generated from three independent runs of each analysis, with burn-in removed (i.e. 27,000 trees). The TreeAnnotator files ('TAn' files) contain a single consensus topology summarised from these 27,000 trees (a maximum clade credibility tree, keeping target node heights). 'P' in a file name simply stands for 'partitioned'; however, all phylogenetic analyses with more than one locus were partitioned, whether or not this is stated in the file name. See Readme file section 3 for more information.
1_Individual_gene_LogC_TAn_files.zip
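The 'LogC' files pool the post-burn-in samples of independent runs. The Python sketch below mimics that pooling step for BEAST-style NEXUS tree logs, assuming sampled trees appear on lines beginning with `tree`. The function names and the `burnin_fraction` parameter are hypothetical, and the burn-in actually used in the paper is not stated here. As simple arithmetic, if each of the three runs contributed equally, each would supply 9,000 of the 27,000 pooled trees.

```python
def sampled_trees(nexus_text: str) -> list[str]:
    """Return the tree strings (text after '=') from a NEXUS trees block."""
    trees = []
    for line in nexus_text.splitlines():
        stripped = line.strip()
        # BEAST-style lines look like: tree STATE_1000 = [&lnP=...] ((...));
        if stripped.lower().startswith("tree ") and "=" in stripped:
            trees.append(stripped.split("=", 1)[1].strip())
    return trees


def combine_runs(runs: list[str], burnin_fraction: float = 0.1) -> list[str]:
    """Drop the first `burnin_fraction` of each run's trees, then concatenate."""
    if not 0.0 <= burnin_fraction < 1.0:
        raise ValueError("burnin_fraction must be in [0, 1)")
    pooled: list[str] = []
    for text in runs:
        trees = sampled_trees(text)
        start = int(len(trees) * burnin_fraction)   # number of trees discarded
        pooled.extend(trees[start:])
    return pooled
```

Because burn-in is discarded separately for each run before pooling, every run contributes only samples drawn after its own chain has (presumably) converged.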
Bayesian tree files - 6-intron partitioned tree
Tree files (NEXUS format) for the partitioned Bayesian analysis of the six nuclear introns combined, conducted in BEAST v. 1.8 (section 2.4 of the paper, Phylogenetic reconstruction). The 'LogC' files (27,000 post-burn-in trees from three independent runs), the 'TAn' files (maximum clade credibility tree) and the 'P' naming convention are as described under 'Bayesian tree files - Gene Trees' above. See Readme file section 3 for more information.
2_6NucIntronTopology_LogC_TAn_files.zip
Bayesian tree files - 5-intron partitioned topologies
Tree files (NEXUS format) for the partitioned Bayesian analyses of the 5-intron datasets, conducted in BEAST v. 1.8 (section 2.4 of the paper, Phylogenetic reconstruction). The 'LogC' files (27,000 post-burn-in trees from three independent runs), the 'TAn' files (maximum clade credibility tree) and the 'P' naming convention are as described under 'Bayesian tree files - Gene Trees' above. See Readme file section 3 for more information.
3_5NucTopologies_LogC_TAn_files.zip
Bayesian tree files - Combinations-of-two-introns topologies
Tree files (NEXUS format) for the partitioned Bayesian analyses of the datasets combining two introns, conducted in BEAST v. 1.8 (section 2.4 of the paper, Phylogenetic reconstruction). The 'LogC' files (27,000 post-burn-in trees from three independent runs), the 'TAn' files (maximum clade credibility tree) and the 'P' naming convention are as described under 'Bayesian tree files - Gene Trees' above. See Readme file section 3 for more information.
4_Intron_combinations_LogC_TAn_files.zip
Bayesian tree files - Combinations-of-mtDNA+each intron in turn
Tree files (NEXUS format) for the partitioned Bayesian analyses of the mtDNA fragment combined with each intron in turn, conducted in BEAST v. 1.8 (section 2.4 of the paper, Phylogenetic reconstruction). The 'LogC' files (27,000 post-burn-in trees from three independent runs), the 'TAn' files (maximum clade credibility tree) and the 'P' naming convention are as described under 'Bayesian tree files - Gene Trees' above. See Readme file section 3 for more information.
5_mtDNA_with_each_intron_LogC_TAn_files.zip
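The multi-locus datasets in the preceding entries are subsets of the six introns, alone or paired with mtDNA. The short Python sketch below enumerates the possible designs with `itertools.combinations`; the intron labels are hypothetical placeholders, and whether every possible pair or five-intron subset was analysed should be checked against the archived file listings and the Readme.

```python
from itertools import combinations

# Hypothetical placeholder labels for the six nuclear introns.
INTRONS = [f"intron_{k}" for k in range(1, 7)]

two_intron_sets = list(combinations(INTRONS, 2))           # every possible pair
five_intron_sets = list(combinations(INTRONS, 5))          # leave-one-intron-out sets
mtdna_plus_intron = [("mtDNA", name) for name in INTRONS]  # mtDNA + each intron in turn

print(len(two_intron_sets), len(five_intron_sets), len(mtdna_plus_intron))  # 15 6 6
```

With six introns there are 15 possible two-intron combinations, 6 five-intron subsets and 6 mtDNA-plus-one-intron datasets.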
Bayesian tree files - Subsets of the mtDNA locus topologies
Tree files (NEXUS format) for the Bayesian analyses of subsets of the mtDNA locus, conducted in BEAST v. 1.8 (section 2.4 of the paper, Phylogenetic reconstruction). The 'LogC' files (27,000 post-burn-in trees from three independent runs), the 'TAn' files (maximum clade credibility tree) and the 'P' naming convention are as described under 'Bayesian tree files - Gene Trees' above. See Readme file section 3 for more information.
6_parts_of_mtDNA_LogC_TAn_files.zip
Bayesian tree files - 6-intron+mtDNA partitioned topology
Tree files (NEXUS format) for the partitioned Bayesian analysis of the six nuclear introns plus mtDNA, conducted in BEAST v. 1.8 (section 2.4 of the paper, Phylogenetic reconstruction). The 'LogC' files (27,000 post-burn-in trees from three independent runs), the 'TAn' files (maximum clade credibility tree) and the 'P' naming convention are as described under 'Bayesian tree files - Gene Trees' above. See Readme file section 3 for more information.
7_6NucIntronPlusMtDNATopology_LogC_TAn_files.zip
Bayesian tree files - Rhinolophus capensis clade, mtDNA topology
Tree files (NEXUS format) for the Bayesian analysis of the mtDNA fragment for the Rhinolophus capensis clade only, conducted in BEAST v. 1.8 (section 2.4 of the paper, Phylogenetic reconstruction). The 'LogC' files (27,000 post-burn-in trees from three independent runs), the 'TAn' files (maximum clade credibility tree) and the 'P' naming convention are as described under 'Bayesian tree files - Gene Trees' above. See Readme file section 3 for more information.
8_CapensisCladeOnly_mtDNA_LogC_TAn_files.zip
Tree files for the Bayesian multispecies coalescent (*BEAST; NEXUS format)
Tree files for the Bayesian multispecies coalescent analysis (*BEAST; NEXUS format). This analysis is described in section 2.5 of the paper (2.5 Species delimitation & divergence time estimation) and was conducted in BEAST v. 1.8. The files were generated using LogCombiner and TreeAnnotator, the programs distributed with BEAST, as described in section 2.5. The LogCombiner files ('LogC' files) contain all trees generated from five independent runs of each analysis, with burn-in removed (i.e. 180,000 trees). The TreeAnnotator files ('TAn' files) contain a single consensus topology summarised from these 180,000 trees (a maximum clade credibility tree, keeping target node heights).
4_StarBEAST.zip
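The 'TAn' files summarise each posterior sample as a maximum clade credibility (MCC) tree. The Python sketch below illustrates the idea, taking the MCC tree as the sampled tree that maximises the product of its clades' posterior frequencies (computed as a sum of logarithms to avoid numerical underflow). It assumes each sampled tree has already been reduced to its set of clades (frozensets of tip names) and that all trees are fully resolved; the node-height step ('keeping target node heights') is not reproduced, and the function names are hypothetical rather than TreeAnnotator's internals.

```python
import math
from collections import Counter


def clade_frequencies(sample: list[set[frozenset[str]]]) -> dict[frozenset[str], float]:
    """Posterior frequency of each clade: share of sampled trees that contain it."""
    counts = Counter(clade for tree in sample for clade in tree)
    return {clade: k / len(sample) for clade, k in counts.items()}


def mcc_index(sample: list[set[frozenset[str]]]) -> int:
    """Index of the sampled tree with the highest summed log clade frequency."""
    freqs = clade_frequencies(sample)
    scores = [sum(math.log(freqs[c]) for c in tree) for tree in sample]
    return max(range(len(sample)), key=scores.__getitem__)
```

For instance, if three sampled trees have clade sets {AB, ABC}, {AB, ABD} and {AB, ABC}, clade AB has frequency 1, ABC 2/3 and ABD 1/3, so the first tree (index 0) is selected.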