Phylogenetic analytical data, measurement analytical code to help identify dental fossils of Oligocene hyracoids from eastern Africa
Data files
Oct 17, 2024 version; files total 159.14 MB
- hyracoid_phylogenetic_analysis_mrbayes.zip (159.10 MB)
- hyracoid_phylogenetic_analysis_tnt.zip (15.03 KB)
- hyracoidea_topernawi_r_code.zip (9.57 KB)
- README.md (7.26 KB)
- Topernawi_hyracoid_published_tooth_sizes.csv (6.15 KB)
Abstract
The Topernawi area of west Turkana, northern Kenya, preserves a number of recently discovered vertebrate fossil localities of mid-Oligocene age. The Topernawi fauna provides important new data on mammalian evolution in equatorial eastern Africa during the mid-Cenozoic. Here, we describe five new species of hyracoids from Topernawi: Nengohyrax josephi, Abdahyrax philipi, Geniohyus ewoii, Thyrohyrax lokutani, and Thyrohyrax ekaii. These species range in reconstructed body mass from ∼8 to ∼150 kg, comparable to the body size range that has been observed at other hyracoid-rich Paleogene sites. We use Bayesian tip-dating phylogenetic analyses to estimate hyracoid relationships. We find that non-Thyrohyrax species from Topernawi are members of Geniohyidae, a clade of bunodont, Paleogene hyracoids. Despite being approximately the same age as some of the youngest and best-sampled horizons in the Jebel Qatrani Formation (Fayum, northern Egypt), the Topernawi hyracoid fauna is distinct, and shows no overlap at the species level; it also shows no species overlap with the ∼1.5–2.5 Ma younger Chilga localities in northern Ethiopia. The hyracoid assemblage from Topernawi adds to a growing body of evidence which suggests that certain distinctive clades known from earlier Oligocene horizons in northern Africa (Saghatherium, Selenohyrax, Titanohyrax) did not persist into the late Oligocene.
https://doi.org/10.5061/dryad.2jm63xsw2
This repository includes
- analytical code to replicate body size estimates
- analytical code to replicate other evaluations of tooth sizes and proportions, including coefficients of variation and upper vs. lower size plots
- analytical code to replicate phylogenetic analyses and process output.
Description of the data and file structure
File list:
- Topernawi_hyracoid_published_tooth_sizes.csv (measurements in millimeters, copy of Table 1 of associated publication)
  - Species refers to each species described in the associated publication.
  - Locus refers to the tooth position of each tooth.
    - Uppercase refers to upper teeth, lowercase refers to lower teeth.
    - 'I' = incisor, 'C' = canine, 'P' = premolar, 'M' = molar, 'd' = deciduous
    - '#' refers to the position of the tooth within a dental field (ex: m1 = first molar, m2 = second molar, m3 = third molar)
  - "L" refers to mesiodistal length of the tooth crown (in mm)
  - "W_M" refers to mesial buccolingual width (in mm)
  - "W_D" refers to distal buccolingual width (in mm)
  - "NA" refers to measurements that could not be taken, usually because damage prevented an accurate measurement.
- Zipped file hyracoidea_topernawi_r_code.zip contains the following scripts, described further in the Code/Software section:
  - hyracoid_mass_estimate.R
  - plot_tree-JVP.R
  - topernawi_dimensions.R
  - topernawi_association_context.R
    - This script calls additional data files associated with a separate publication (Vitek and Princehouse, 2024), reposited here: https://doi.org/10.5061/dryad.n2z34tn40
- Zipped file: hyracoid_phylogenetic_analysis_mrbayes.zip
  - In each of the folders mk and mkv:
    - Input files:
      - hyracoidea_mrbayes.nex is a copy of the nexus input file for either analysis. In addition to a file header, it contains data in matrix format where rows are species, columns are morphological characters, and cells contain character state scores in integer format.
      - hyracoid_run_mk[v].txt contains model settings used in the phylogenetic analysis, including tip dates, character models, etc.
      - hyracoid_mrbayes_mk[v].sh is a shell script for submitting the main part of the analysis as a job to a SLURM workload manager.
    - Output files:
      - Files ending in .log, .out, .mcmc, .lstat, .ckp, and everything in the allcompat and halfcompat folders, are all output files.
- Zipped file: hyracoid_phylogenetic_analysis_tnt.zip
  - Input:
    - hyracoid_tnt_ord.tnt is the input data file. In addition to a file header, it contains data in matrix format where rows are species, columns are morphological characters, and cells contain character state scores in integer format.
    - TNT_commands_ordered.txt is the set of commands input into TNT, with notes and annotations left in between /* and */ symbols.
  - .out files are output log files.
  - .tre and .nex files are phylogenetic tree files in Newick format output by the program.
  - Stats.run is a helper file used when Bremer support values were calculated.
Phylogenetic analysis NEXUS file (matrix) data is also available on a separate repository, MorphoBank, under project P4786 (http://morphobank.org/permalink/?P4786).
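For readers who want to inspect the character matrix outside of MrBayes or TNT, the following is a minimal sketch of how a matrix of this shape (rows are species, columns are characters, cells are integer state scores) could be read. It is written in Python purely for illustration and is not part of this repository. The function name `read_nexus_matrix` and the taxon names in the example are hypothetical, and the sketch assumes a non-interleaved matrix, taxon names without spaces, and one taxon per line; the actual file may need more careful handling.

```python
import re


def read_nexus_matrix(text):
    """Return {taxon: [cell, ...]} from the MATRIX block of a NEXUS string.

    Each cell is kept as a string: a single state such as "0", a missing or
    inapplicable symbol ("?" or "-"), or a polymorphic/uncertain set written
    as "(01)" or "{01}".
    """
    # Everything between the MATRIX keyword and the next semicolon.
    match = re.search(r"\bmatrix\b(.*?);", text, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no MATRIX block found")

    matrix = {}
    for line in match.group(1).splitlines():
        line = line.strip()
        if not line or line.startswith("["):  # skip blanks and comment lines
            continue
        parts = line.split(None, 1)
        if len(parts) < 2:  # a name with no scores on the same line
            continue
        taxon, scores = parts
        # Bracketed sets count as one cell; otherwise one character per cell.
        cells = re.findall(r"\([^)]*\)|\{[^}]*\}|\S", scores)
        matrix.setdefault(taxon, []).extend(cells)
    return matrix


# Made-up example input, NOT taken from hyracoidea_mrbayes.nex.
example = """#NEXUS
begin data;
  dimensions ntax=3 nchar=4;
  format datatype=standard missing=? gap=-;
  matrix
    Taxon_A  01?2
    Taxon_B  1(01)-0
    Taxon_C  {12}100
  ;
end;
"""

chars = read_nexus_matrix(example)
for taxon, cells in chars.items():
    print(taxon, cells)
```

Keeping each cell as a string avoids silently discarding polymorphic or missing scores, which would happen if cells were converted directly to integers.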
Sharing/Access information
This data is associated with the publication: Vitek, N.S., Seiffert, E.R., Heritage, S., Gaiku, M.W., Feibel, C.S., Sousa, F.J., Nengo, I.O., Aoron, E.E., Princehouse, P.E. In Press. Hyracoidea from the Oligocene locality Topernawi, Turkana Basin, Kenya. Journal of Vertebrate Paleontology. doi: 10.1080/02724634.2024.2409326
Please cite that publication if you use the data or code from this repository.
Corresponding Author:
Natasha S. Vitek
Department of Ecology & Evolution, Stony Brook University
natasha.vitek@stonybrook.edu
Code/Software
The code files are organized as follows for analysis:
For analyses of tooth sizes, including estimation of body mass:
Each of the following R scripts can be run on its own. Each calls the data file Topernawi_hyracoid_published_tooth_sizes.csv
- hyracoid_mass_estimate.R: This script estimates body masses of hyracoid species from Topernawi based on the lengths of their second molars using previously published equations.
- topernawi_dimensions.R: This script formats and plots length and width information, and calculates some ratios used for species differentiation.
- topernawi_association_context.R: This script formats and plots additional metrics that can be used to evaluate associations between isolated specimens, such as coefficients of variation for collected specimens, proportions of upper vs lower molars, and proportions of teeth from different loci in the same arcade (ex: m1 vs m3 size).
  - This script calls additional data files associated with a separate publication (Vitek and Princehouse, 2024), reposited here: https://doi.org/10.5061/dryad.n2z34tn40
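As a rough illustration of the kind of calculation these scripts perform, the sketch below decodes a Locus code and computes a coefficient of variation (sample standard deviation divided by the mean, times 100) of crown length for each species and locus. It is written in Python for illustration only and does not reproduce the R scripts in this repository. The function names `parse_locus` and `cv_by_group` are hypothetical, the column headers (Species, Locus, L, W_M, W_D) are assumed to match the descriptions given for the CSV file, and the measurements in the example are made up.

```python
import csv
import io
import re
import statistics
from collections import defaultdict

# Optional 'd' (deciduous), one tooth-class letter, optional position digit.
LOCUS_RE = re.compile(r"^(d?)([ICPMicpm])(\d?)$")
TOOTH_NAMES = {"i": "incisor", "c": "canine", "p": "premolar", "m": "molar"}


def parse_locus(code):
    """Split a locus code such as 'm2', 'P4', or 'dP4' into its parts."""
    m = LOCUS_RE.match(code.strip())
    if m is None:
        raise ValueError(f"unrecognized locus code: {code!r}")
    deciduous, tooth, position = m.groups()
    return {
        "arcade": "upper" if tooth.isupper() else "lower",
        "tooth": TOOTH_NAMES[tooth.lower()],
        "position": int(position) if position else None,
        "deciduous": deciduous == "d",
    }


def cv_by_group(rows, measure="L"):
    """Coefficient of variation (%) of one measurement per (Species, Locus).

    Rows scored "NA" are skipped, and groups with fewer than two usable
    measurements are left out because a sample SD is undefined for them.
    """
    groups = defaultdict(list)
    for row in rows:
        value = row[measure].strip()
        if value == "NA":
            continue
        groups[(row["Species"], row["Locus"])].append(float(value))
    return {
        key: 100 * statistics.stdev(vals) / statistics.mean(vals)
        for key, vals in groups.items()
        if len(vals) >= 2
    }


# Made-up rows for demonstration; NOT values from the published table.
demo = io.StringIO(
    "Species,Locus,L,W_M,W_D\n"
    "sp_A,m2,10.0,8.0,8.2\n"
    "sp_A,m2,12.0,NA,8.9\n"
    "sp_A,M1,9.5,9.0,NA\n"
)
rows = list(csv.DictReader(demo))
cv = cv_by_group(rows)
print(parse_locus("dP4"))
print(cv)
```

To apply the same sketch to the real data, replace the `io.StringIO` object with an open handle to Topernawi_hyracoid_published_tooth_sizes.csv, after checking that its header row matches the column names assumed here.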
For phylogenetic analyses
- Bayesian analyses:
  - Input files were uploaded to the SeaWulf High Performance Computing Cluster at Stony Brook University. The entry point is the shell script hyracoid_mrbayes_mk[v].sh, which calls the .nex and .txt files.
  - The analysis outputs the .log, .out, .mcmc, .lstat, .ckp files automatically.
  - After the primary analysis was finished and output was checked for convergence, the following commands were run in MrBayes on a personal computer to generate all additional output, including folders for different tree types or to test the effect of burnin fraction:
set dir=D:\\Dropbox\Documents\research\Turkana\hyracoidea\phylogeny\hyracoid_mrbayes_v4\mk
execute hyracoidea_mrbayes.nex
sump nrun=2
sump nrun=2 burninfrac =0.5
sumt nrun=2
sumt nrun=2 contype = allcompat filename=hyracoidea_mrbayes.nex [Ended up moving output files to a new directory so they weren't overwritten by next line]
sumt nrun=2 contype = halfcompat
sumt nrun=2 contype = allcompat burninfrac =0.5
- TNT analyses:
  - TNT was used in graphical user interface (GUI) format. Input data hyracoid_tnt_ord.tnt was read into the program, and commands in TNT_commands_ordered.txt were input in blocks to conduct the analysis. The program automatically called the helper script Stats.run.
  - The scripts output .out [output information, including file log] and .tre [phylogenetic tree data in Newick format] files.
- Resulting trees are plotted for publication using the R script plot_tree-JVP.R
  - This script calls output from the phylogenetic analyses.
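The MrBayes summary commands listed above vary two settings: the burn-in fraction (`burninfrac`) and the consensus type (`contype`). The sketch below illustrates what those settings mean conceptually. It is written in Python for illustration only and is not how MrBayes itself is implemented. The function names are hypothetical, the default of 0.25 is assumed to match the MrBayes default, trees are simplified to sets of rooted clades, and details such as how MrBayes rounds the burn-in count or breaks ties between equally frequent clades are not reproduced.

```python
from collections import Counter


def discard_burnin(samples, burninfrac=0.25):
    """Drop the first `burninfrac` of an ordered list of MCMC samples."""
    if not 0 <= burninfrac < 1:
        raise ValueError("burninfrac must be in [0, 1)")
    n_burn = int(len(samples) * burninfrac)
    return samples[n_burn:]


def compatible(a, b):
    """Two rooted clades can share a tree if they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def consensus_clades(trees, contype="halfcompat"):
    """Return [(clade, frequency)] kept in a consensus of `trees`.

    Each tree is a set of clades, and each clade is a frozenset of taxa.
    "halfcompat" keeps clades found in more than half of the trees;
    "allcompat" then also adds less frequent clades, most frequent first,
    as long as they do not conflict with a clade already kept.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    kept = []
    for clade, count in ranked:
        freq = count / len(trees)
        fits = all(compatible(clade, other) for other, _ in kept)
        if freq > 0.5 or (contype == "allcompat" and fits):
            kept.append((clade, freq))
    return kept


def clade(names):
    """Helper: 'AB' -> frozenset({'A', 'B'}) for single-letter taxon labels."""
    return frozenset(names)


# Made-up posterior sample over five hypothetical taxa A-E.
sampled = [
    {clade("AB"), clade("CD"), clade("ABCD")},
    {clade("AB"), clade("CD"), clade("ABCD")},
    {clade("AB"), clade("ABC"), clade("ABCD")},
    {clade("AB"), clade("ABC"), clade("ABCD")},
]

half = consensus_clades(sampled, "halfcompat")
allc = consensus_clades(sampled, "allcompat")
print(sorted("".join(sorted(c)) for c, _ in half))
print(sorted("".join(sorted(c)) for c, _ in allc))
```

In this toy sample the all-compatible consensus is more resolved than the majority-rule one, because it admits a clade found in only half of the trees. Comparing the two summaries is one way to see how much of the topology rests on weakly supported groups.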
Code in the R language was written using R version 4.3.1.
Phylogenetic analysis code was written to run in either (A) MrBayes version 3.2.6, parallel (MPI) version, on the SeaWulf computing cluster at Stony Brook University, or (B) TNT version 1.5 running on a personal Windows computer.
In short, this dataset contains a copy of raw measurement data and a snapshot of the code used in a study identifying, describing, and phylogenetically analyzing a set of fossilized teeth of Hyracoidea from the Oligocene locality of Topernawi in eastern Africa. A live version of the code can be found in the GitHub repository 'hyracoid-locus-example' managed by the lead author (GitHub username: nsvitek). Full citations for associated references can be found in the manuscript associated with this dataset. Raw data consist of linear measurements. For more detailed materials and methods, please see the associated manuscript.