Dryad

Data from: Learning to see the wood for the trees: machine learning, decision trees and the classification of isolated theropod teeth

Data files

Sep 14, 2020 version files 14.39 KB

Abstract

Taxonomic identification of fossils based on morphometric data traditionally relies on the use of standard linear models to classify such data. Machine learning and decision trees offer powerful alternative approaches to this problem but are not widely used in palaeontology. Here, we apply these techniques to published morphometric data of isolated theropod teeth in order to explore their utility in tackling taxonomic problems. We chose two published datasets consisting of 886 teeth from 14 taxa and 3020 teeth from 17 taxa, respectively, each with five morphometric variables per tooth. We also explored the effects that missing data have on the final classification accuracy. Our results suggest that machine learning and decision trees yield superior classification results over a wide range of data permutations, with decision trees achieving accuracies of 96% in classifying test data in some cases. Missing data, or attempts to generate synthetic data to overcome missing data, seriously degrade all classifiers' predictive accuracy. The results of our analyses also indicate that using ensemble classifiers that combine different classification techniques, together with the examination of posterior probabilities, is a useful aid in checking final class assignments. The application of such techniques to isolated theropod teeth demonstrates that simple morphometric data can be used to yield statistically robust taxonomic classifications and that lower classification accuracy is more likely to reflect preservational limitations of the data or poor application of the methods.
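
The idea of checking class assignments with an ensemble and posterior probabilities can be illustrated with a minimal sketch. It is not the authors' code. It assumes each classifier has already produced a posterior probability per taxon for one tooth, averages those posteriors (soft voting), and flags the tooth for manual review when the averaged posterior is low or the classifiers disagree. The classifier names, taxon labels, probabilities, and the 0.7 threshold are all hypothetical values chosen for illustration.

```python
def soft_vote(posteriors_by_classifier, min_confidence=0.7):
    """Average per-taxon posteriors across classifiers and flag doubtful calls.

    posteriors_by_classifier maps a classifier name to a dict of
    {taxon: posterior probability} for a single specimen.
    min_confidence is an arbitrary illustrative threshold, not a published value.
    """
    names = list(posteriors_by_classifier)
    taxa = sorted({t for p in posteriors_by_classifier.values() for t in p})

    # Soft voting: mean posterior per taxon across all classifiers.
    mean_posterior = {
        t: sum(posteriors_by_classifier[n].get(t, 0.0) for n in names) / len(names)
        for t in taxa
    }

    # Each classifier's own most probable taxon.
    individual_calls = {n: max(p, key=p.get) for n, p in posteriors_by_classifier.items()}

    label = max(mean_posterior, key=mean_posterior.get)
    unanimous = len(set(individual_calls.values())) == 1

    # Flag the specimen if the ensemble is unsure or its members disagree.
    needs_review = mean_posterior[label] < min_confidence or not unanimous
    return label, mean_posterior, needs_review


# Hypothetical posteriors for two teeth (made-up numbers, not from the dataset).
clear_tooth = {
    "decision_tree": {"Taxon A": 0.90, "Taxon B": 0.10},
    "discriminant_analysis": {"Taxon A": 0.80, "Taxon B": 0.20},
    "naive_bayes": {"Taxon A": 0.85, "Taxon B": 0.15},
}
ambiguous_tooth = {
    "decision_tree": {"Taxon A": 0.55, "Taxon B": 0.45},
    "discriminant_analysis": {"Taxon A": 0.40, "Taxon B": 0.60},
    "naive_bayes": {"Taxon A": 0.60, "Taxon B": 0.40},
}

clear_label, clear_mean, clear_review = soft_vote(clear_tooth)
amb_label, amb_mean, amb_review = soft_vote(ambiguous_tooth)
print(clear_label, clear_review)
print(amb_label, amb_review)
```

In this sketch the first tooth is assigned without a flag because all three classifiers agree and the averaged posterior is high, while the second is flagged because one classifier prefers a different taxon and the averaged posterior is close to 0.5.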