Data from: Mitochondrial DNA for phylogeny building: Assessing individual and grouped mtGenes as proxies for the mtGenome in Platyrrhines
Data files
Mar 13, 2025 version: files total 14.71 MB
- 12S_16S_Cebidae.nex (310.01 KB)
- 12S_16S_Platyrrhini.nex (353.81 KB)
- 12S_Cebidae.nex (122.10 KB)
- 12S_Platyrrhini.nex (136.94 KB)
- 16S_Cebidae.nex (193.86 KB)
- 16S_Platyrrhini.nex (224.71 KB)
- ATP6_Cebidae.nex (88.27 KB)
- ATP6_Platyrrhini.nex (97.73 KB)
- ATP8_Cebidae.nex (30.67 KB)
- ATP8_Platyrrhini.nex (35.29 KB)
- COX_Genes_Cebidae.nex (371.11 KB)
- COX_Genes_Platyrrhini.nex (410.70 KB)
- COX1_Cebidae.nex (193.51 KB)
- COX1_Platyrrhini.nex (218.52 KB)
- COX2_Cebidae.nex (90.07 KB)
- COX2_Platyrrhini.nex (99.71 KB)
- COX3_Cebidae.nex (100.75 KB)
- COX3_Platyrrhini.nex (111.33 KB)
- CYB_Cebidae.nex (143.35 KB)
- CYB_COX3_Cebidae.nex (237.42 KB)
- CYB_COX3_Platyrrhini.nex (261.94 KB)
- CYB_DLOOP_Cebidae.nex (286.38 KB)
- CYB_ND2_Cebidae.nex (268.27 KB)
- CYB_Platyrrhini.nex (161.36 KB)
- DLOOP_Cebidae.nex (149.58 KB)
- mtGenome_Cebidae.nex (2.01 MB)
- mtGENOME_CEBIDAE.tre (14.19 KB)
- mtGENOME_PLATYRRHINE.tre (76.76 KB)
- mtGenome_Platyrrhini.nex (2.35 MB)
- ND_Genes_Cebidae.nex (769.86 KB)
- ND_Genes_Platyrrhini.nex (849.73 KB)
- ND1_Cebidae.nex (121.38 KB)
- ND1_Platyrrhini.nex (134.16 KB)
- ND2_Cebidae.nex (131.46 KB)
- ND2_CYB_Platyrrhini.nex (295.99 KB)
- ND2_Platyrrhini.nex (145.38 KB)
- ND3_Cebidae.nex (48.30 KB)
- ND3_Platyrrhini.nex (53.51 KB)
- ND4_Cebidae.nex (171.54 KB)
- ND4_ND5_Cebidae.nex (388.62 KB)
- ND4_ND5_Platyrrhini.nex (429.32 KB)
- ND4_Platyrrhini.nex (189.74 KB)
- ND4L_Cebidae.nex (42.18 KB)
- ND4L_Platyrrhini.nex (47.04 KB)
- ND5_Cebidae.nex (223.63 KB)
- ND5_Platyrrhini.nex (247.44 KB)
- ND6_Cebidae.nex (70.62 KB)
- ND6_Platyrrhini.nex (79.51 KB)
- README.md (4.63 KB)
- Sample_info.pdf (511.02 KB)
- Sample_info.xlsx (39.37 KB)
- Shortest_Cebidae.nex (255.67 KB)
- Shortest_Platyrrhini.nex (283.72 KB)
- VisualTreeCMP_scores.csv (36.23 KB)
Abstract
Phylogenetic trees are analytic tools used in primate studies to elucidate evolutionary relationships. Because it is relatively easy to sequence and evolves rapidly compared to nuclear genomes, mitochondrial DNA is frequently used for phylogeny building. This project evaluated the effectiveness of using individual or grouped mitochondrial genes (mtGenes) as a proxy for the mitochondrial genome (mtGenome) in phylogeny building within two nested primate datasets with differing divergence dates, Cebidae and Platyrrhini. MtGene utility rankings were determined from congruence values relative to the mtGenome tree. MtGene trees were also assessed on tree resolution and on their ability to sort nested clades. We found that most individual mtGenes, including the ribosomal genes (12S and 16S), the COX genes, most ND genes, and the D-Loop, are not appropriate proxies for the mtGenome when building trees in either the Cebidae or the Platyrrhini set. On average, grouped mtGenes outperformed individual mtGenes in both sets, and the rankings of individual and grouped mtGenes varied between sets. Pairing CYB with COX3, or ND2 with CYB, worked well in both the Cebidae and the Platyrrhini sets. We also found that nucleotide diversity is not a predictor of mtGene performance; instead, the unique evolutionary history of each mtGene or mtGene system may affect its performance.
https://doi.org/10.5061/dryad.q2bvq83w8
Description of the data and file structure
Files and variables
Alignment files: all "mtGene"_"Clade".nex files
Description: NEXUS files of the nucleotide alignments of all mtGenes, grouped mtGenes, and the mtGenomes. Each file name gives the mtGene (e.g. "ND5") or group of mtGenes (e.g. "ND4_ND5"), followed by the analysis set (Cebidae or Platyrrhini). Sequence IDs for the newly assembled genomes are consistent with the mtGenome sample codes listed in the Sample_info spreadsheet. The alignments were extracted from whole mtGenome data, some newly published in this study and available on GenBank (PP454502-PP454561). Gene regions were either taken from existing annotations by previous authors or predicted with the prediction and annotation feature of Geneious Prime v. 2023.1.2 (Biomatters Ltd.), using an annotated genome from a close relative as the reference: Sapajus xanthosternos (Accession no. KC757410) for Sapajus samples and Cebus albifrons (Accession no. AJ309866) for Cebus samples. Entire mtGenomes, rRNA genes (n = 2), and the D-Loop were aligned with the Clustal Omega v. 1.2.2 (Sievers et al., 2011) alignment feature. Protein-coding genes (n = 13) were aligned with the MUSCLE v. 5.1 (Edgar, 2022) multiple alignment feature. All gene alignments were additionally checked by eye and re-aligned where necessary.
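Because the alignment files share a consistent naming pattern, the mtGene label and analysis set can be recovered from a file name in scripts. A minimal Python sketch, assuming the last underscore-separated token is always the analysis set; `parse_data_name` and `CLADES` are hypothetical names, not part of the dataset:

```python
from pathlib import Path

# Analysis-set suffixes that appear in this dataset's file names (matched case-insensitively).
CLADES = {"cebidae": "Cebidae", "platyrrhini": "Platyrrhini", "platyrrhine": "Platyrrhini"}

def parse_data_name(filename):
    """Return (mtGene label, analysis set) for a name such as 'ND4_ND5_Platyrrhini.nex'.

    Assumes the final underscore-separated token is the analysis set and that
    everything before it is the mtGene or mtGene-group label.
    """
    stem = Path(filename).stem                  # drop the .nex / .tre extension
    label, sep, suffix = stem.rpartition("_")   # split at the last underscore
    clade = CLADES.get(suffix.lower())
    if not sep or not label or clade is None:
        raise ValueError(f"not an mtGene/analysis-set file name: {filename}")
    return label, clade

print(parse_data_name("ND4_ND5_Platyrrhini.nex"))   # ('ND4_ND5', 'Platyrrhini')
print(parse_data_name("12S_Cebidae.nex"))           # ('12S', 'Cebidae')
print(parse_data_name("mtGENOME_CEBIDAE.tre"))      # ('mtGENOME', 'Cebidae')
```

Multi-gene labels such as "CYB_COX3" are returned whole and can be split on "_" if the individual genes are needed; family labels such as "COX_Genes" are best kept as a single label.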
File: Sample_info.xlsx
Description: Information on all samples used in this project. The .xlsx and the .pdf (see below) each contain three sheets/sections. The first sheet, 'Platyrrhini info', covers the samples used in the Platyrrhini analysis, and the third sheet, 'Cebidae info', covers the samples used in the Cebidae analysis. Both list the Sample ID as used in the study ('Sample ID'), the GenBank identifier, Species, and the Latitude and Longitude where the samples were collected. The second sheet, 'Newly assembled mtGenomes', describes the mtGenomes that we assembled, annotated, and published to GenBank. It has additional columns: 'Sample ID' (the sample name used in the project), 'Genbank identifier', 'Species', 'Lima Alignment Identifier' (the code used in a previous study of the same sample source; Lima et al., 2018), mtGenome sample code (the original name attached to the raw mtGenome reads), Latitude, Longitude, Voucher or Source, and BioSample Accession number. In the Latitude and Longitude columns, 'Captive' indicates that the animal was captive and 'NA' indicates that locality information is not available.
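When the Latitude and Longitude columns are loaded for mapping or analysis, the 'Captive' and 'NA' markers need to be kept apart from numeric coordinates. A minimal sketch, assuming the numeric entries are decimal degrees (this README does not state the coordinate format); `parse_coordinate` is a hypothetical helper:

```python
def parse_coordinate(value):
    """Convert one Latitude or Longitude cell to (number or None, status).

    'Captive' and 'NA' follow the Sample_info description; treating every
    other value as decimal degrees is an assumption.
    """
    text = str(value).strip()
    if text.lower() == "captive":
        return None, "captive"      # captive animal, no wild locality
    if text.lower() in {"na", "nan", ""}:
        return None, "unknown"      # locality not available ("nan" if a reader converted NA)
    return float(text), "recorded"

print(parse_coordinate("Captive"))  # (None, 'captive')
print(parse_coordinate("NA"))       # (None, 'unknown')
print(parse_coordinate("-3.1"))     # (-3.1, 'recorded')
```

Returning a status alongside the value keeps captive animals distinguishable from samples with missing locality data.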
File: Sample_info.pdf
Description: This is the same file as above (Sample_info.xlsx), but in an easier-to-read .pdf format.
File: VisualTreeCMP_scores.csv
Description: Raw incongruence scores for each set (Cebidae and Platyrrhini) at three posterior-probability thresholds (PPTs: 0.50, 0.75, 0.90). Z-scores are also included for comparison across metrics. These data were obtained with the online tool Visual TreeCmp (https://eti.pg.edu.pl/TreeCmp/), a free and accessible tool for comparing phylogenetic trees with eighteen possible metrics (split into rooted and unrooted tree categories). I highly recommend checking this tool out!
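The z-score for a raw score x on one metric is z = (x - mean) / SD, computed over all mtGenes scored with that metric; this puts metrics with different ranges on a common scale. The sketch below assumes the population SD and, as for distance-type metrics, treats lower incongruence as closer agreement with the mtGenome tree. The dataset's exact standardisation and ranking procedure may differ, and the helper names and scores are hypothetical:

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Standardise one metric: z = (x - mean) / SD over all mtGenes (population SD assumed)."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return {name: 0.0 for name in scores}  # identical scores: no spread to standardise
    return {name: (x - mu) / sd for name, x in scores.items()}

def rank_by_congruence(per_metric):
    """Rank mtGenes by mean z-score across metrics, most congruent (lowest) first.

    per_metric maps metric name -> {mtGene: raw incongruence score}; every
    metric is assumed to score the same mtGenes.
    """
    per_metric_z = [z_scores(scores) for scores in per_metric.values()]
    genes = list(per_metric_z[0])
    average = {g: mean(z[g] for z in per_metric_z) for g in genes}
    return sorted(genes, key=average.get)

# Made-up scores for illustration only (not values from VisualTreeCMP_scores.csv).
example = {
    "metric_1": {"gene_A": 1.0, "gene_B": 3.0, "gene_C": 2.0},
    "metric_2": {"gene_A": 2.0, "gene_B": 6.0, "gene_C": 4.0},
}
print(rank_by_congruence(example))  # ['gene_A', 'gene_C', 'gene_B']
```

Averaging z-scores rather than raw scores prevents a metric with a large numeric range from dominating the ranking.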
File: 'mtGENOME_PLATYRRHINE.tre' and 'mtGENOME_CEBIDAE.tre'
Description: Phylogenetic tree files generated from entire mtGenome alignments using MrBayes.
References:
Edgar, R. C. 2022. High-Accuracy Alignment Ensembles Enable Unbiased Assessments of Sequence Homology and Phylogeny. bioRxiv.
Lima, M. G. M., J. S. Silva-Júnior, D. Černý, et al. 2018. A Phylogenomic Perspective on the Robust Capuchin Monkey (Sapajus) Radiation: First Evidence for Extensive Population Admixture Across South America. Molecular Phylogenetics and Evolution 124: 137-150.
Sievers, F., A. Wilm, D. Dineen, et al. 2011. Fast, Scalable Generation of High-Quality Protein Multiple Sequence Alignments Using Clustal Omega. Molecular Systems Biology 7: 539.
Code/software
Any software that reads NEXUS alignments will open the .nex files; UGENE is a free option. Microsoft Excel opens the .xlsx file, and a .pdf version is provided in case Excel is unavailable. For the .tre files, FigTree is a commonly used free option.
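For scripted access, the sequences can also be pulled out of a .nex file directly. The reader below is a minimal stdlib-only sketch, assuming every MATRIX line begins with a taxon name; it is not a full NEXUS parser. For complete NEXUS support, Biopython's `Bio.AlignIO.read(path, "nexus")` is one option. `read_simple_nexus` and the toy alignment are hypothetical:

```python
import re

def read_simple_nexus(text):
    """Return {taxon: sequence} from the MATRIX command of a NEXUS DATA block.

    Minimal sketch: assumes every matrix line starts with a taxon name
    (sequential or interleaved); rows for a repeated name are concatenated.
    Quoted names containing spaces are not supported.
    """
    text = re.sub(r"\[[^\]]*\]", "", text)           # remove [bracketed comments]
    match = re.search(r"\bmatrix\b(.*?);", text, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no MATRIX command found")
    sequences = {}
    for line in match.group(1).splitlines():
        parts = line.split()
        if not parts:
            continue                                   # skip blank lines
        name = parts[0].strip("'")
        sequences[name] = sequences.get(name, "") + "".join(parts[1:])
    return sequences

# Toy two-taxon, interleaved alignment (hypothetical data).
sample = """#NEXUS
BEGIN DATA;
DIMENSIONS NTAX=2 NCHAR=8;
FORMAT DATATYPE=DNA GAP=- MISSING=? INTERLEAVE;
MATRIX
[block 1]
Cebus_albifrons ACGT
Sapajus_apella  AC-T
Cebus_albifrons GGCC
Sapajus_apella  GGCA
;
END;
"""
print(read_simple_nexus(sample))
# {'Cebus_albifrons': 'ACGTGGCC', 'Sapajus_apella': 'AC-TGGCA'}
```

Checking that every returned sequence has the NCHAR length from the DIMENSIONS line is a quick sanity test after reading a real file.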
This dataset includes four main categories of data: the nexus files, the Visual TreeCmp scores, the mtGenome trees for both Platyrrhini and Cebidae, and the sample info.
(1) The nexus files contain the nucleotide alignments of each mtGene, grouped mtGene, and mtGenome, extracted from whole mtGenome data (some newly published in this study; GenBank PP454502-PP454561). Extraction and alignment methods are described under Alignment files above.
(2) The Visual TreeCmp scores were generated by loading the tree built from each alignment into the Visual TreeCmp online tool (https://eti.pg.edu.pl/TreeCmp/) together with the corresponding mtGenome reference tree. Each mtGene therefore has scores for all rooted metrics in both the Cebidae set and the Platyrrhini set. Some analyses were also run at different posterior-probability thresholds (PPTs), each with its own metric data; the PPT is the posterior-probability cutoff below which a clade is collapsed into a soft polytomy.
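The PPT collapse step can be illustrated on a small tree: any internal clade whose posterior probability falls below the threshold is dissolved, and its children are attached to its parent. This is a minimal sketch, not Visual TreeCmp's implementation; the strict `<` comparison is an assumption (ties may be treated differently), and `Node`, `collapse_below`, `to_newick`, and the example tree are hypothetical:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str = ""
    support: float = 1.0                      # clade posterior probability (tips left at 1.0)
    children: list[Node] = field(default_factory=list)

def collapse_below(node, ppt):
    """Return a copy of the tree in which every internal clade with posterior
    probability below `ppt` is dissolved into its parent (a soft polytomy).
    The root is always kept; tips are never removed."""
    new_children = []
    for child in node.children:
        kept = collapse_below(child, ppt)     # collapse bottom-up first
        if kept.children and kept.support < ppt:
            new_children.extend(kept.children)  # weak clade: attach its children to this node
        else:
            new_children.append(kept)
    return Node(node.name, node.support, new_children)

def to_newick(node):
    """Topology-only Newick string (no branch lengths or supports)."""
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"

# Hypothetical tree: ((A,B)0.95,((C,D)0.60,E)0.80)
tree = Node(children=[
    Node(support=0.95, children=[Node("A"), Node("B")]),
    Node(support=0.80, children=[
        Node(support=0.60, children=[Node("C"), Node("D")]),
        Node("E"),
    ]),
])
for ppt in (0.50, 0.75, 0.90):
    print(ppt, to_newick(collapse_below(tree, ppt)))
# 0.5 ((A,B),((C,D),E))
# 0.75 ((A,B),(C,D,E))
# 0.9 ((A,B),C,D,E)
```

Higher thresholds yield less resolved trees, which is why scores are reported separately for each PPT.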
(3) The mtGenome trees are the reference trees against which the individual and grouped mtGene trees were scored.
(4) The sample info files collect information about the samples used in each dataset.
