Data for: Phylogenetic analyses reveal that horses deviate from a pervasive pattern of skull shape evolution

Published Mar 26, 2026 on Dryad. https://doi.org/10.5061/dryad.crjdfn3g3

Data files

Mar 26, 2026 version files (285.87 MB total):

- EvolutionOfEquidSkull_dryad_final.zip (285.85 MB)
- README.md (14.07 KB)

Abstract

Here, we investigate the evolution of the equid skull through the lens of craniofacial evolutionary allometry (CREA), a pattern of relative facial elongation in larger mammals that has been hypothesized to result from similar patterns of facial elongation over ontogeny. Using 3D geometric morphometrics and linear measurements, we describe the major axes of shape variation in the equid skull, test whether equids follow or deviate from the CREA pattern over the clade’s evolutionary history, and assess whether the evolution of high-crowned teeth (hypsodonty) is related to relative facial proportions, all in an explicitly phylogenetic context. We find that equids deviate from the CREA pattern and that the evolution of hypsodont dentition did not significantly influence facial proportions in the group. Importantly, these results are only apparent when using statistics appropriate for phylogenetic data. In order to produce a phylogeny for comparative analyses we employed a meta-analytical approach, producing the first formally inferred species-level phylogeny for family Equidae.

Comparison with previously published data on facial proportions from modern equids indicates that ontogenetic patterns of facial elongation do not scale to produce patterns of facial proportions observed at the intraspecific and evolutionary levels. Taken together, our results complicate the historic narrative that a single set of selective factors drove patterns of morphological evolution within the group.

Dataset DOI: https://doi.org/10.5061/dryad.crjdfn3g3

Description of the data and file structure:

This dataset contains all landmark and measurement data needed to conduct the analyses described in the paper. It also contains all source studies and associated inputs used to infer the metatree, as well as the results of those metatree analyses.

File types present within this dataset:

.csv, .xlsx (can be read into R or viewed in Microsoft Excel or text editor)
Image files: .png (can be viewed and measured in ImageJ, which is free)
Landmark data: .pp (can be opened and inspected in standard text editors, or viewed in 3D in MeshLab, which is free)

Tree files: .tre / .trees (can be opened in a text editor, visualized in FigTree, or imported into R)
Phylogenetic analysis files: .nex (NEXUS) and .tnt (TNT) (can be edited and viewed in text editors)

EvolutionOfEquidSkull_dryad_final.zip

The main directory contains the following directories:

equid_metatree: Directory containing the files necessary for, and the output of, the equid metatree analysis

metatree_source_studies.xlsx: details on the morphological source studies used in the metatree analysis. This was mainly used for keeping track of the progress of analyses and for taking notes. This file contains the following columns:
- File_name: name of source study / name of file
- Source study: this column is blank
- 3rd column: notes on where analysis files were obtained (many were downloaded from Graeme T. Lloyd’s website database of morphological character-taxon matrices.)
- NEXUS: whether NEXUS analysis file was generated/obtained (yes for all)
- TNT: whether TNT analysis file was generated/obtained (yes for all)
- MRP: whether MRP file was generated/obtained (yes for all)
- TREES: whether TREES file was generated/obtained (yes for all)
- XML: whether XML file was generated/obtained (yes for all)
- PDF: whether PDF file was downloaded and databased (yes, except for book chapters)
- Outgroup: outgroup taxon specified for each analysis
- Ordered char?: whether characters were treated as ordered and weighted in parsimony analysis

Input_data: Directory containing input files used to create metatree - divided into directories based on file type
- Heintzman_2017: directory containing input files derived from the molecular phylogeny of Heintzman et al., 2017
  - BEAST_mtDNADataset5_equidsOnly_allPartns_rootConstrained_inclEovodovi.mcc.tre: the maximum clade credibility tree in nexus format (tree of the same name from Heintzman et al., 2017)
  - Heintzman_2017a_pruned1.tre: a pruned version of the mcc tree with a single tip per OTU in phylip format
- MRP: a directory containing 17 matrix representations of most parsimonious trees recovered from reanalysis of 16 original morphological datasets plus the MRP representation of the Heintzman tree. All are in nexus format and are required as input for the metatree analysis. See metatree_source_studies.xlsx for reference. Can be opened using a text editor or read into R.
- NEXUS: 16 nexus files containing morphological character data from published studies included in the metatree analysis. These were not used directly in the metatree analysis but were converted into TNT files prior to reanalysis and generation of MRP files. File names correspond to source study names. See metatree_source_studies.xlsx for reference. Can be opened using a text editor or read into R.
- TNT: 16 TNT files generated from the files in NEXUS and analyzed in TNT to generate MPTs for inclusion in the metatree analysis. File names correspond to source study names. See metatree_source_studies.xlsx for reference. Can be opened using a text editor or read into R.
- TREES: .tre files in phylip format containing the most parsimonious trees recovered from reanalysis of the original morphological datasets
- XML: 17 xml files that describe the number of taxa, year of publication, OTU names, and taxonomic reconciliation via the paleobiology database taxonomy for all taxa included in the morphological and molecular analyses. Required to perform Metatree analysis.
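
The MRP files above encode each source tree as a binary character matrix. As a rough illustration of that coding (a minimal sketch of standard matrix representation, not the metatree code used for this dataset), each clade of a source tree becomes one character: taxa inside the clade are scored 1, other taxa in the tree are scored 0, and taxa absent from the tree are scored ?. The function names, the nested-tuple tree format, and the placeholder taxa A to E are assumptions made for this example.

```python
# Minimal sketch of MRP coding. Trees are nested tuples of taxon-name strings.
# All names here (clades, mrp_matrix, taxa A-E) are hypothetical placeholders.

def clades(tree):
    """Return (tips, found): the tip set of `tree` and the tip sets of all its internal nodes."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    tips = frozenset()
    found = []
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found.extend(child_clades)
    found.append(tips)  # the clade subtended by this node
    return tips, found


def mrp_matrix(tree, all_taxa):
    """Code one source tree as MRP characters for every taxon in `all_taxa`."""
    tree_tips, found = clades(tree)
    # The root clade contains every tip in the tree, so it carries no grouping information.
    informative = [c for c in found if c != tree_tips]
    matrix = {}
    for taxon in all_taxa:
        if taxon not in tree_tips:
            matrix[taxon] = "?" * len(informative)  # taxon not sampled by this source tree
        else:
            matrix[taxon] = "".join("1" if taxon in c else "0" for c in informative)
    return matrix


source_tree = ((("A", "B"), "C"), "D")
mrp = mrp_matrix(source_tree, ["A", "B", "C", "D", "E"])
for taxon, row in mrp.items():
    print(taxon, row)
```

Here the two characters correspond to the clades (A, B) and (A, B, C); taxon E is missing from this source tree and is therefore coded as unknown.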

Analysis_files: Folder containing two directories
- Metatree_analysis_files: folder containing the output files from the metatree function. The files are as follows:
  - Equids.nex the full MRP file in nexus format
  - Equids.tnt the full MRP file in TNT format
  - EquidsSTR.nex the MRP matrix after safe taxonomic reduction in nexus format
  - EquidsSTR.tnt the MRP matrix after safe taxonomic reduction in tnt format. This file was used to sample most parsimonious metatrees
  - EquidsEXCTaxonomy.tre a newick representation of the taxonomic tree generated from the Paleobiology database Equid Taxonomy
  - EquidsSTRforEXC.txt a text file containing 3 columns: Junior (the safely removed taxon), Senior (the taxon to which the removed taxon should be joined), and Rule (the rule determining the placement for safely reinserting taxa removed during STR). See the Metatree documentation for details regarding rules for reinsertion. This file is required for building consensus trees
- Metatree_output a folder containing the products of analysis of the MRP matrix:
  - STR_MPTs.tre all unique most parsimonious trees recovered from parsimony analysis of the STR matrix in TNT
  - MPTs.tre all unique most parsimonious trees after reinsertion of STR taxa following the rules in EquidsSTRforEXC.txt
  - SCC.tre the strict consensus tree produced from MPTs.tre
  - MRC.tre the majority rule consensus tree produced from MPTs.tre

BEAST: A folder containing the input files and analysis products from BEAST analysis of the metatree
- Equids2_30_2.xml a BEAST2 xml file containing sequence data, topology constraints, stratigraphic occurrence data for fossils, and specification of topology, molecular evolution, molecular clock, and birth-death process priors and models. Can be run directly from the command line or GUI in BEAST 2
- 100_equid_trees.tre 100 chronograms sampled from the combined posterior distribution of trees and used in comparative analyses to explore the impacts of topological and branch length variation
- medequidtree.tre the median chronogram from the posterior sample, selected as the consensus estimate of topology and branch lengths
- run_100mill*: four folders, where * indicates 1-4, each corresponding to the output of an independent run of the Markov chain Monte Carlo specified in Equids2_30_2.xml. Each folder contains two files representing the output of that particular run:
  - Equids2_30_2-*.log where * represents the random seed used to start that chain and log indicates the log file for the posterior parameters
  - Equids2_30_2-*.trees where * represents the random seed used to start that chain and trees indicates the nexus file for the posterior sample of chronograms

Picked points: contains landmark data for the evolutionary sample of equids. File names trace to specimen IDs in Table 1. Note: “point names” in .pp files correspond to an older landmarking scheme; the first point corresponds to landmark 1, the second to landmark 2, and so on through landmark 18. Landmark numbers and corresponding descriptions can be found in Table S2. File names that include “picked points” are original landmark data; .pp files with abbreviated names are outputs from estimating missing landmark data in equid_skull_script_multivariate_update.R.

Picked points ontogeny: contains landmark data for the ontogenetic sample of Plains zebra. File names trace to specimen IDs in Table S1. Note: “point names” in .pp files correspond to an older landmarking scheme; the first point corresponds to landmark 1, the second to landmark 2, and so on through landmark 18. Landmark numbers and corresponding descriptions can be found in Table S2. File names that include “picked points” are original landmark data; .pp files with abbreviated names are outputs from estimating missing landmark data in ontogeny.R.
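
Because landmark numbers follow point order rather than the stored point names, one way to read a .pp file outside R is to number the points by their position in the file. The sketch below assumes the usual MeshLab PickedPoints layout, in which each landmark is a `point` element with `x`, `y`, and `z` attributes. The embedded sample text, its coordinates, and the function name `read_landmarks` are invented for illustration and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Invented two-point sample in the assumed MeshLab PickedPoints layout.
SAMPLE_PP = """<!DOCTYPE PickedPoints>
<PickedPoints>
 <DocumentData>
  <DataFileName name="example.ply"/>
 </DocumentData>
 <point x="1.0" y="2.0" z="3.0" active="1" name="old_name_a"/>
 <point x="4.0" y="6.0" z="3.0" active="1" name="old_name_b"/>
</PickedPoints>
"""


def read_landmarks(pp_text):
    """Map landmark number (1-based file order) to an (x, y, z) tuple.

    The `name` attribute is ignored on purpose, since point names follow
    an older landmarking scheme.
    """
    root = ET.fromstring(pp_text)
    landmarks = {}
    for number, point in enumerate(root.iter("point"), start=1):
        landmarks[number] = (
            float(point.get("x")),
            float(point.get("y")),
            float(point.get("z")),
        )
    return landmarks


landmarks = read_landmarks(SAMPLE_PP)
print(landmarks)
```

To read a file from disk instead, the same function can be given the file's text, for example `read_landmarks(open(path).read())`, where `path` is whichever .pp file is of interest.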

hypsodonty_values_used.xlsx: contains data on hypsodonty index and the sources for those values. Note that this file is not read into the R scripts; the same data exist as a list within the R analysis files. The file contains the following columns:

Species: Equid species in our dataset (note: this list contains more species than are represented in the final analysis).
HI: Hypsodonty index value (unitless ratio)
Source: Source study that HI value derives from
Note: notes, e.g., if genus average was substituted for unavailable species value, differences in naming, etc.

Retrodeformation: Directory containing files necessary for retrodeformation of Hyracotherium vasacciense PAL 336125.

hyr_landmarks.csv: landmarks (x, y, z coordinates) used in the retrodeformation analysis
hyracotherium_landmarks_curves_strat_export.csv: landmarks, including semilandmark curves
slide_attempt.csv and slide_attempt.pp: landmarks after semilandmark sliding
retrodefhyr_picked_points.pp: landmarks on the retrodeformed model

alt_horse_measurements_formatted_radin.csv: formatted Radinsky measurements, ready to read into the scripts CREA_regression_distribution.R and CREA_regressions.R. The file contains the following columns:

Species (the equid species as originally identified in the museum)
Specimen_number (Museum accession provided for all except those that have species averages)
ID (file name of picked points file)
y_n (identifier in other scripts)
facial length (facial length measured according to Radinsky 1984), in mm
braincase length (braincase length measured according to Radinsky 1984), in mm
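
As a rough illustration of how the two length columns relate to facial allometry, the sketch below fits an ordinary least squares line to log-transformed lengths. This is not the analysis in CREA_regressions.R: it ignores phylogeny, which the abstract notes is essential for the reported results. The measurement values are synthetic, generated from an assumed power law purely so the expected slope is known, and `ols_slope_intercept` is a hypothetical helper.

```python
import math


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic lengths in mm (not from the dataset): face = 0.5 * braincase ** 1.2
braincase_mm = [50.0, 100.0, 200.0, 400.0]
facial_mm = [0.5 * b ** 1.2 for b in braincase_mm]

log_braincase = [math.log(b) for b in braincase_mm]
log_face = [math.log(f) for f in facial_mm]

slope, intercept = ols_slope_intercept(log_braincase, log_face)
print(f"log-log slope: {slope:.3f}")
```

On a log-log scale, a slope above 1 means facial length increases faster than braincase length, and a slope of 1 means the two scale proportionally.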

radinsky_horse_measurements.xlsx: raw Radinsky measurements taken from screenshots in the Geomagic_screenshots folder using ImageJ. This is an intermediate data sheet. Column names and units follow those in alt_horse_measurements_formatted_radin.csv.

Interlandmark_distance_wavg_withhyr.csv: interlandmark distances output by the R script equid_skull_script_multivariate_update.R. Species averages are computed for species with more than two representative specimens. The file contains the following columns:

First unnamed column: name of object in R — corresponds to a unique specimen / picked points object that was read into R.
faceL: interlandmark distance between landmarks 15 and 16, corresponding to facial length as defined by Cardini 2019.
braincaseL: interlandmark distance between landmarks 16 and 17, corresponding to braincase length as defined by Cardini 2019.
Rtoothrow: interlandmark distance between landmarks 9 and 11, corresponding to right-side toothrow length
Ltoothrow: interlandmark distance between landmarks 10 and 11, corresponding to left-side toothrow length
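
Each column above is a straight-line (Euclidean) distance between two 3D landmarks. A minimal sketch of that calculation follows, using invented coordinates and a hypothetical helper `interlandmark_distance`; the landmark pairs match the column definitions above, but this is not the code in equid_skull_script_multivariate_update.R.

```python
import math


def interlandmark_distance(landmarks, a, b):
    """Euclidean distance between landmarks numbered `a` and `b`."""
    return math.dist(landmarks[a], landmarks[b])


# Invented coordinates keyed by landmark number (not from the dataset).
example_landmarks = {
    15: (0.0, 0.0, 0.0),
    16: (3.0, 4.0, 0.0),
    17: (3.0, 4.0, 12.0),
}

faceL = interlandmark_distance(example_landmarks, 15, 16)
braincaseL = interlandmark_distance(example_landmarks, 16, 17)
print(faceL, braincaseL)
```

The result is in whatever units the landmark coordinates use.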

Geomagic_screenshots: Folder containing screenshots (.png files) of 3D models of skulls, which were imported into ImageJ to take measurements. File names correspond to specimens in Table 1, and measurements are recorded in radinsky_horse_measurements.xlsx and alt_horse_measurements_formatted_radin.csv

Heck_2018_2019: this folder contains the following datasets:

Heck_etal_2018a_crania_data: Landmark data from Heck et al., 2018 (Additional File 3)
- The file contains the following columns (for more detailed descriptions, see Heck et al., 2018, Additional File 3). See ModHeckCode.R to read these data into R.
  - ID_String: specimen identifier
  - Museum: Museum Identifier
  - Group: H (Horse), D (Donkey), Z (Zebra)
  - Breed: Breed identifier
  - Morphotype: horse morphotype
  - All other columns are coordinate data
Heckplains_zebra_ontogeny_linear_distances.csv: Contains interlandmark measurements for the ontogenetic sample of Plains zebra.
- The file contains the following columns:
  - First unnamed column: name of object in R — corresponds to a unique specimen / picked points object that was read into R.
  - faceL: interlandmark distance between landmarks 15 and 16, corresponding to facial length as defined by Cardini 2019.
  - braincaseL: interlandmark distance between landmarks 16 and 17, corresponding to braincase length as defined by Cardini 2019.
  - Rtoothrow: interlandmark distance between landmarks 9 and 11, corresponding to right-side toothrow length
  - Ltoothrow: interlandmark distance between landmarks 10 and 11, corresponding to left-side toothrow length

Note (same information as in Data Availability section): Heck et al., 2019 landmark data available here. Rename file to Heck_etal_2019a_coord_data.txt to read into R scripts.

Data Availability

Heck et al., 2019 landmark data available here. Rename file to Heck_etal_2019a_coord_data.txt to read into R scripts.

Morphological data associated with this study can be obtained on MorphoSource (Project ID 000715259)

Code / Software:

R Scripts:

Available on Zenodo: https://doi.org/10.5281/zenodo.18635760