Dryad

Predicting amphibian intraspecific diversity with machine learning: Challenges and prospects for integrating traits, geography, and genetic data

Data files

Nov 12, 2020 version files 65.44 MB

Abstract

The growing availability of genetic datasets, in combination with machine learning frameworks, offers great potential to answer long-standing questions in ecology and evolution. One such question has intrigued population geneticists, biogeographers, and conservation biologists: What factors determine intraspecific genetic diversity? This question is challenging to answer because many factors may influence genetic variation, including life history traits, historical influences, and geography, and the relative importance of these factors varies across taxonomic and geographic scales. Furthermore, interpreting the influence of numerous, potentially correlated variables is difficult with traditional statistical approaches. To address these challenges, we analyzed repurposed data using machine learning and investigated predictors of genetic diversity, focusing on Nearctic amphibians as a case study. We aggregated species traits, range characteristics, and >42,000 genetic sequences for 299 species using open-access scripts and various databases. After identifying important predictors of nucleotide diversity with random forest regression, we conducted follow-up analyses to examine the roles of phylogenetic history, geography, and demographic processes in shaping intraspecific diversity. Although life history traits were not important predictors for this dataset, we found significant phylogenetic signal in genetic diversity within amphibians. We also found that salamander species at northern latitudes have lower genetic diversity. Data repurposing and machine learning provide valuable tools for detecting patterns with relevance for conservation, but concerted efforts are needed to compile meaningful datasets with greater utility for understanding global biodiversity.
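
The response variable named in the abstract, nucleotide diversity, is commonly defined as the average proportion of sites that differ between pairs of aligned sequences. The following is a minimal sketch of that standard definition, assuming equal-length aligned sequences and no special handling of gaps or ambiguous bases. The function name and the toy sequences are illustrative only and are not taken from the authors' scripts, which may compute the statistic differently.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences.

    Assumption: every sequence has the same non-zero length, and every
    character (including gaps) is compared literally.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same non-zero length")

    pairs = list(combinations(seqs, 2))
    # Count mismatched sites for each pair, then sum over all pairs.
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    # Average over pairs and normalize by alignment length.
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment: three sequences of eight sites each.
example = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(example)
```

In a workflow like the one described, a per-species value of this kind would serve as the quantity that the random forest regression tries to predict from traits and range characteristics.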