Code from: Machine learning can accurately assign fossil and extant species to crown toxicoferan (Reptilia: Squamata) groups using inner ear shape data
Data files
Mar 18, 2026 version files (58.27 KB)
- MachineLearningInnerEarDryad.zip (51.84 KB)
- README.md (6.44 KB)
Abstract
Because the inner ear is involved in gaze stabilization, balance, and hearing, fossil inner ear endocast morphology has been used to infer the palaeoecology of extinct species. These results have been used to inform major evolutionary transitions, including the ecological origin of snakes. However, prior studies found only modest correlations between inner ear shape and ecological traits, and did not apply machine learning approaches, which could potentially reveal greater predictive relationships between inner ear morphology and ecology. Here, we combine three-dimensional geometric morphometrics with machine learning to evaluate the performance of inner ear morphology as a predictor of habitat use and phylogenetic affinities across a broad sample of toxicoferans (snakes, anguimorphs, and iguanians) representing 73 extant species and 4 fossil species. We find a weak correlation between habitat and inner ear morphology, but machine learning models cannot accurately predict habitat preference in extant species (44% accuracy). In contrast, we find a strongly predictive relationship (95% accuracy) between inner ear shape and higher-order classification. Our results demonstrate that the inner ear shape data we measured strongly predict evolutionary classifications rather than habitat use in crown toxicoferan squamates. We conclude that machine learning provides a versatile analytical approach to the reconstruction of palaeobiology.
https://doi.org/10.5061/dryad.2fqz612zp
This is the readme file for the supplementary data from "Machine learning with toxicoferan (Squamata) inner ear shape data robustly predicts higher-order evolutionary relationships, but not habitat use" by Forcellati MR, Napoli JG, Meyer D, Watanabe A, Benson RBJ, Raxworthy CJ (2026). Zoological Journal of the Linnean Society. 206(3).
Paper doi: https://doi.org/10.1093/zoolinnean/zlaf188
All code here was authored by Meghan R Forcellati (mforcellati@amnh.org), except for the cited code from Davesne et al. (2021) and Püschel et al. (2018, 2020).
Davesne, D., Friedman, M., Schmitt, A. D., Fernandez, V., Carnevale, G., Ahlberg, P. E., & Benson, R. B. (2021). Fossilized cell structures identify an ancient origin for the teleost whole-genome duplication. Proceedings of the National Academy of Sciences, 118(30), e2101780118.
Püschel, T. A., Marcé-Nogué, J., Gladman, J. T., Bobe, R., & Sellers, W. I. (2018). Inferring locomotor behaviours in Miocene New World monkeys using finite element analysis, geometric morphometrics, and machine-learning classification techniques applied to talar morphology. Journal of The Royal Society Interface, 15(146), 20180520.
Püschel, T. A., Marcé-Nogué, J., Gladman, J., Patel, B. A., Almécija, S., & Sellers, W. I. (2020). Getting Its Feet on the Ground: Elucidating Paralouatta’s Semi-Terrestriality Using the Virtual Morpho-Functional Toolbox. Frontiers in Earth Science, 8.
See: https://www.thomaspuschel.com/post/decision_boundary_plot2/
Please note that the data here are derived from MorphoSource specimens subject to copyright and under fair usage, non-commercial licenses, as described in the main text and the supplementary file, Supplementary Data S1, attached for your convenience. The other supplementary tables are with the main manuscript text.
Contact: Meghan R Forcellati, mforcellati@amnh.org
Secondary Contact: Christopher J Raxworthy, rax@amnh.org
Tertiary Contact: Roger Benson, rbenson@amnh.org
I am particularly interested in whether you run into issues reproducing any of my code. Please reach out if you need help getting it to run.
All necessary Rdata and pts files are uploaded to Zenodo, because these files are under different (non-CC0) licenses: https://zenodo.org/records/19008448
Description of the data and file structure
File: MachineLearningInnerEarDryad.zip
Code - This folder includes analysis code. * indicates a script that outputs a data file, which can be loaded by other scripts downstream of the analyses to save runtime.
- Data_Cleaning_QC.txt - Includes all data preprocessing: calculating sampling error, mirroring right-only specimens so that all inner ears are left-sided, taking mean shapes of effectively bilaterally symmetrical structures, generating training and test sets (including the residual allometry sets), and generating trees. *
- Simulations.txt - Generates the training sets for phylogenetic residuals used in machine learning, and the 1000 potential branch length topologies used for inferential analyses. *
- Inference.txt - Includes phylogenetic and OLS regressions, classic physignal (NOT updated physignal), and data partitioning into cochlear and vestibular landmarks for analyses. Many of these are commented out to reduce runtime. Also includes initialization scripts for the new physignal analysis. *
- Ecology_Validation_Machine_Learning.txt - For models predicting habitat, loads data from Data_Cleaning_QC.txt and Simulations.txt to perform statistical learning validation on our dataset. Outputs training models to be used on the final test set. *
- Evolution_Validation_Machine_Learning.txt - For models predicting evolutionary classification, loads data from Data_Cleaning_QC.txt to perform statistical learning validation on our dataset. Outputs training models to be used on the final test set. *
- Predict_Best_Model_ML.txt - Loads validation models and calculates final accuracy on the holdout test set, and makes projections onto fossil species for both evolutionary classification and habitat. Generates plots assessing the accuracy of different models. Outputs accuracies and probabilities as .csv files in the Results directory; these were used to generate the main text figures. Also generates a Decision-Boundary Morphospace, using code modified from Püschel et al. 2018, 2020.
- treeFigureGeneration.txt - Generates the main text figure visualizing a phylogeny, which was then modified in Adobe Acrobat and Microsoft PowerPoint for graphical illustration.
- Supplement_Physignal folder containing NewK.R: This script measures phylogenetic signal (how much related species resemble each other) in multivariate shape data across 1000 evolutionary trees using an eigenvalue-based version of Blomberg’s K. It compares observed signal (traceK, detK, Kmult, eigenvalues) to randomized data to test significance. Finally, it summarizes results and visualizes which trait dimensions show a stronger evolutionary signal.
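As a rough illustration of three steps named in the script descriptions above (mirroring right-only specimens, generating training and test sets, and scoring accuracy on a holdout set), here is a minimal Python sketch. It is not the authors' code: the scripts in this archive are written in R, and every function name, the choice of mirroring axis, the 20% test fraction, and the example labels below are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def mirror_landmarks(landmarks, axis=0):
    """Reflect 3D landmarks across the plane perpendicular to `axis`.

    Assumption: negating one coordinate is enough to turn a right-side
    configuration into a left-like one; the real scripts may mirror differently.
    """
    return [tuple(-c if i == axis else c for i, c in enumerate(pt))
            for pt in landmarks]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), holding out a share of every class.

    Each class contributes at least one specimen to the test set, so a
    class with a single member ends up with no training specimens.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_fraction))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical usage with made-up coordinates and group labels.
right_ear = [(1.0, 2.0, 3.0), (-0.5, 0.0, 4.0)]
left_like = mirror_landmarks(right_ear)

labels = ["snake"] * 10 + ["anguimorph"] * 5 + ["iguanian"] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=1)

holdout_score = accuracy(["snake", "snake", "iguanian"],
                         ["snake", "iguanian", "iguanian"])
```

Splitting within each class keeps small groups represented in the holdout set, which matters when some groups have only a few specimens. A real morphometric workflow would also superimpose the landmark configurations (for example with Procrustes alignment) before averaging or modelling; that step is omitted here.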
Access information
Other publicly accessible locations of the data:
- URL to MorphoSource: https://www.morphosource.org/projects/000612487
A full list of citations is included in the main text supplement, and in SupplementaryDataS1_AcknowledgmentsReferencesSpecimens.csv
Other data was derived from the following sources:
- https://www.morphosource.org
- Yi, H., & Norell, M. A. (2015). The burrowing origin of modern snakes. Science Advances, 1(10), e1500743.
- Zheng, Y., & Wiens, J. J. (2016). Combining phylogenomic and supermatrix approaches, and a time-calibrated phylogeny for squamate reptiles (lizards and snakes) based on 52 genes and 4162 species. Molecular Phylogenetics and Evolution, 94, 537-547.
