Using machine learning to investigate premolar ecomorphology in anthropoid primates
Data files
Sep 30, 2025 version files (130.29 KB total)
- batch_ariaDNE.py (1.32 KB)
- permutation_feature_importance.R (2.74 KB)
- phylANOVAs.R (5.47 KB)
- rawdata_premolar_manuscript.csv (116.68 KB)
- README.md (4.08 KB)
Abstract
This dataset includes raw linear and topographic measurements taken on the mandibular dentitions of 313 extant anthropoid primates. Measurements include cusp reliefs of mandibular second molars (M2), mandibular fourth premolars (P4), and mandibular honing premolars (P2/P3); positive ariaDNE of M2, P4, and P2/P3; the heights, widths, and breadths of mandibular first incisors (I1); and lengths of mandibular first molars (M1). Demographic data (genus, species, dietary ecology, specimen ID, sex, and institution) are also included for each specimen. This dataset also includes the R scripts for performing phylogenetic ANOVAs and Random Forest permutation feature importance, and the Python script for calculating positive ariaDNE.
Dataset DOI: 10.5061/dryad.xsj3tx9sn
Description of the data and file structure
The rawdata_premolar_manuscript.csv file includes linear and topographic measurements of the mandibular incisors, molars, and premolars of 313 crown anthropoid specimens.
Linear measurements were taken in GeoMagic Wrap 2024 software on 3D scans of mandibular dentitions and, in the case of incisor linear dimensions, using digital calipers on epoxy casts of mandibular dentitions.
Topographic measurements were calculated from PLY files of cropped premolars and molars in PyCharm using the Python script batch_ariaDNE.py.
Phylogenetic ANOVAs were performed on species averages of scaled premolar data using the R script phylANOVAs.R.
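Averaging specimens to species level can be sketched as follows. This is a minimal illustration, not the authors' phylANOVAs.R workflow: it assumes rows read with Python's `csv.DictReader`, column headers matching the variable names listed in this README, and that only `Side == "Avg"` rows should contribute. The README does not state how premolar values were scaled, so the sketch simply averages the values it is given. The helper name `species_averages` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Cell values treated as missing; "NA" and "worn" come from this README,
# treating blank cells the same way is an assumption.
MISSING = {"NA", "worn", ""}


def species_averages(rows, variables):
    """Average each variable per species (Binomial) using only Side == "Avg" rows.

    Missing or worn values are skipped rather than counted as zero.
    Returns {binomial: {variable: mean, or None if no usable values}}.
    """
    values = defaultdict(lambda: defaultdict(list))
    for row in rows:
        if row.get("Side") != "Avg":
            continue  # left/right rows would double-count the same specimen
        for var in variables:
            raw = (row.get(var) or "").strip()
            if raw in MISSING:
                continue
            values[row["Binomial"]][var].append(float(raw))

    result = {}
    for binomial, per_var in values.items():
        result[binomial] = {
            var: (mean(per_var[var]) if per_var[var] else None) for var in variables
        }
    return result
```

For example, `species_averages(list(csv.DictReader(fh)), ["P4 protoconid relief"])`, with `fh` opened on rawdata_premolar_manuscript.csv, would return one mean per species for that column.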
Random Forest permutation feature importance was calculated for the unworn dataset of molar, premolar, and incisor data (i.e., the subset of specimens for which no averaged dental variable is "NA" or "worn") using the R script permutation_feature_importance.R.
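The unworn-subset rule can be expressed as a simple row filter. In this sketch the column names are assumed to match the variable list below, `unworn_subset` and `DENTAL_VARIABLES` are hypothetical names, and blank cells are treated like "NA" (an assumption; the README mentions only "NA" and "worn").

```python
# Hypothetical list of averaged dental variables, copied from this README's
# variable list; adjust to the actual CSV headers if they differ.
DENTAL_VARIABLES = [
    "P4 protoconid relief", "P4 metaconid relief", "P4 positive ariaDNE",
    "P3 protoconid relief", "P3 positive ariaDNE",
    "M2 protoconid relief", "M2 positive ariaDNE",
    "M1 length", "I1 labiolingual breadth", "I1 height", "I1 mesiodistal width",
]


def unworn_subset(rows, variables=DENTAL_VARIABLES):
    """Keep "Avg" rows in which no dental variable is "NA", "worn", or blank."""
    keep = []
    for row in rows:
        if row.get("Side") != "Avg":
            continue  # the rule applies to averaged values only
        cells = [(row.get(v) or "").strip().lower() for v in variables]
        if any(cell in {"na", "worn", ""} for cell in cells):
            continue  # at least one variable is unusable
        keep.append(row)
    return keep
```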
Files and variables
File: rawdata_premolar_manuscript.csv
Description:
Variables
- Genus: Taxonomic genus of specimen
- Species: Taxonomic species of specimen
- Subspecies: Taxonomic subspecies of specimen
- Binomial: Genus and species name of specimen
- SpecimenID: Museum catalogue number of specimen
- Institution: Museum code indicating institution from which the specimen was sourced
- Sex: Biological sex (male [M] or female [F]) of specimen
- Side: Indicates whether the left or right tooth was measured, or gives the average (Avg) of left and right dental measurements. When only one tooth was measured, the "Avg" value is simply the value for that tooth, which streamlines our workflow.
- Diet: Dietary category based on literature review (see paper for more information).
- P4 protoconid relief: Vertical height of the protoconid on the distal mandibular premolar (p4) in millimeters (mm)
- P4 metaconid relief: Vertical height of the metaconid on the distal mandibular premolar (p4) in millimeters (mm)
- P4 positive ariaDNE: Positive aria Dirichlet Normal Energy of the distal mandibular premolar (p4). This is a dimensionless variable with no units.
- P3 protoconid relief: Vertical height of the protoconid on the anterior-most mandibular premolar (p2/p3) in millimeters (mm)
- P3 positive ariaDNE: Positive aria Dirichlet Normal Energy of the anterior-most mandibular premolar (p2/p3). This is a dimensionless variable with no units.
- M2 protoconid relief: Vertical height of the protoconid on the second mandibular molar (m2) in millimeters (mm)
- M2 positive ariaDNE: Positive aria Dirichlet Normal Energy of the second mandibular molar (m2). This is a dimensionless variable with no units.
- M1 length: Mesiodistal length of the first mandibular molar (m1) in millimeters (mm)
- I1 labiolingual breadth: Labiolingual breadth of the first mandibular incisor (i1) in millimeters (mm)
- I1 height: Crown height of the first mandibular incisor (i1) in millimeters (mm)
- I1 mesiodistal width: Mesiodistal width of the first mandibular incisor (i1) in millimeters (mm)
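The rule for the "Avg" value of the Side column can be written as a small function. This is a sketch of the rule as stated above, not code from the dataset's scripts; `side_average` is a hypothetical name, and a side that was not measured is assumed to be passed as `None` (the README does not say how worn sides were handled).

```python
def side_average(left, right):
    """Combine left and right measurements into the "Avg" value.

    Returns the mean of both sides when both were measured, the single
    measured value when only one side is available, and None otherwise.
    """
    measured = [v for v in (left, right) if v is not None]
    if not measured:
        return None
    return sum(measured) / len(measured)
```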
File: phylANOVAs.R
Description: R script to perform phylogenetic ANOVAs for datasets of species averages.
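A simulation-based phylogenetic ANOVA, the approach implemented by phytools' `phylANOVA`, compares the observed F statistic with F values computed on data simulated under Brownian motion along the tree. The sketch below shows that logic in plain Python; it is not a translation of phylANOVAs.R. It assumes the phylogeny is supplied as a variance-covariance matrix `vcv` (shared branch lengths between species, in the same order as `values`), omits post hoc comparisons, and all function names are hypothetical.

```python
import random
from statistics import mean


def f_statistic(values, groups):
    """One-way ANOVA F statistic for values with matching group labels."""
    labels = sorted(set(groups))
    grand = mean(values)
    ss_between = ss_within = 0.0
    for g in labels:
        member = [v for v, lab in zip(values, groups) if lab == g]
        m = mean(member)
        ss_between += len(member) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in member)
    k, n = len(labels), len(values)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def cholesky(c):
    """Lower-triangular L with L L^T == c for a symmetric positive-definite c."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = (c[i][i] - s) ** 0.5
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return low


def phylo_anova(values, groups, vcv, n_sim=999, seed=1):
    """Return (observed F, simulation p-value) under Brownian motion on the tree.

    Null traits are drawn as L z with z ~ N(0, I), so they have covariance vcv.
    """
    rng = random.Random(seed)
    low = cholesky(vcv)
    n = len(values)
    f_obs = f_statistic(values, groups)
    exceed = 0
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sim = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        if f_statistic(sim, groups) >= f_obs:
            exceed += 1
    return f_obs, (exceed + 1) / (n_sim + 1)
```

Because an F statistic is unchanged when every value is multiplied by the same constant or shifted by the same amount, the Brownian-motion rate and root state can be left out of the simulation. With a star phylogeny (an identity `vcv`) the simulated null is the ordinary F distribution, so the result should approximate a standard ANOVA p-value; when groups coincide with clades, the same data yield a larger p-value.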
File: permutation_feature_importance.R
Description: R script to train a Random Forest classification model and check permutation feature importance.
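Permutation feature importance measures how much a fitted model's accuracy drops when one predictor's values are shuffled, which breaks that predictor's link to the class labels. The sketch below illustrates the idea; to stay self-contained it swaps the Random Forest for a hypothetical nearest-centroid classifier and scores accuracy on the data the model was fitted to. It is not the caret-based permutation_feature_importance.R script, and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Nearest-centroid 'model': the mean feature vector of each class."""
    cents = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [mean(col) for col in zip(*rows)]
    return cents


def predict(cents, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, cents[lab])))


def accuracy(cents, X, y):
    return mean(1.0 if predict(cents, x) == lab else 0.0 for x, lab in zip(X, y))


def permutation_importance(cents, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    base = accuracy(cents, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            # Replace only column j; all other features keep their values.
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, col)]
            drops.append(base - accuracy(cents, X_perm, y))
        importances.append(mean(drops))
    return importances
```

The permutation step does not depend on the model, so any classifier could be dropped in. In practice importance is better measured on held-out data; the randomForest package, for example, computes its permutation importance on out-of-bag samples.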
File: batch_ariaDNE.py
Description: Python script to batch calculate positive ariaDNE, negative ariaDNE, and ariaDNE for ply files of 3D mesh objects.
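A batch run over a folder of meshes typically follows the pattern below. The metric is deliberately left as a hypothetical `measure` argument standing in for the signDNE computation, because this sketch does not reproduce that package's interface; the function name `batch_measure` and the output column names are also assumptions.

```python
import csv
from pathlib import Path

FIELDS = ["file", "positive_ariaDNE", "negative_ariaDNE", "ariaDNE"]


def batch_measure(mesh_dir, out_csv, measure):
    """Apply `measure` to every .ply file in mesh_dir and write one CSV row per file.

    `measure` is a hypothetical callable that takes a Path and returns
    (positive, negative, total) ariaDNE values for that mesh.
    """
    paths = [p for p in sorted(Path(mesh_dir).iterdir()) if p.suffix.lower() == ".ply"]
    rows = []
    for path in paths:
        positive, negative, total = measure(path)
        rows.append({"file": path.name, "positive_ariaDNE": positive,
                     "negative_ariaDNE": negative, "ariaDNE": total})
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```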
Code/software
R (or RStudio) is needed, with the caret, geiger, and phytools packages installed.
A Python environment (e.g., PyCharm) is needed, with the signDNE package installed.
Access information
Data was derived from the following sources:
- All dental measurements were taken by the authors; see linked manuscript for comprehensive citations list of dietary literature.
