Dryad

AI and paleontology: Effects of vertebrate fossil sample size on machine learning image classification

Abstract

With the growing application of artificial intelligence (AI) and machine learning (ML), great potential exists to leverage these technologies in paleontology. Relative to many other scientific fields, one challenge in applying ML to paleontology is small sample size, particularly for fossil vertebrates. Shark teeth, abundant in the fossil record, provide a model system for testing ML across varying sample sizes. Here we apply ML to the taxonomic identification of six classes (taxa) of Neogene shark teeth, using a curated dataset of 3150 images. Each class was evaluated using an 80% training and 20% validation split, with a separate, external test set of 25 samples per class. Pretrained models perform well (accuracy > 90%) and provide a strong baseline for classification; however, fine-tuning the model to identify fossil shark teeth improves performance considerably. Sample size per class also affects classification accuracy. Small samples (n = 50 individuals per class) yielded a mean accuracy of 93.4%, whereas accuracy plateaued at ~99% between 200 and 500 images per class. Confidence likewise increases with sample size, from 81.8% (n = 50 individuals per class) to >90% (n = 300 to 500 individuals per class). Misidentifications followed consistent patterns, reflecting morphological similarities and/or poor preservation. Artificially enlarging the training datasets through data augmentation improves identification confidence. This research indicates that relatively small samples of vertebrate species (~50 to 500 individuals per class) can effectively train an ML model to identify these shark teeth with high accuracy.
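The abstract does not describe the authors' tooling, so the following is only a minimal, hypothetical sketch (Python standard library only) of the evaluation protocol as summarized above: 25 test images held out per class, then an 80/20 training/validation split of the remaining images within each class. The function names are illustrative. Two further assumptions are labeled in the code: the test set is drawn from the same image pool here, although the abstract describes it as separate and external, and "confidence" is taken to mean the mean top-1 softmax probability.

```python
import math
import random

def split_per_class(images_by_class, n_test=25, train_frac=0.8, seed=0):
    """Hypothetical helper: per-class test hold-out, then a stratified train/val split."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, images in images_by_class.items():
        pool = list(images)
        rng.shuffle(pool)
        # ASSUMPTION: test images come from the same pool; the study used an external set.
        test += [(img, label) for img in pool[:n_test]]
        rest = pool[n_test:]
        # 80/20 split of the remaining images, done within each class (stratified).
        n_train = round(len(rest) * train_frac)
        train += [(img, label) for img in rest[:n_train]]
        val += [(img, label) for img in rest[n_train:]]
    return train, val, test

def softmax(logits):
    """Numerically stable softmax over a list of raw model scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def accuracy_and_confidence(logits_list, true_labels):
    """Accuracy = share of correct top-1 predictions.

    ASSUMPTION: confidence = mean probability assigned to the top-1 prediction.
    """
    correct = 0
    conf_sum = 0.0
    for logits, y in zip(logits_list, true_labels):
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)
        correct += pred == y
        conf_sum += probs[pred]
    n = len(true_labels)
    return correct / n, conf_sum / n
```

For example, six hypothetical taxa with 75 images each yield 25 test, 40 training, and 10 validation images per class. Splitting within each class, rather than over the pooled images, keeps every taxon represented in the same proportions in every subset. This matters when per-class samples are as small as those discussed here.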