Modeling pulsed evolution and time-independent variation improves the confidence level of ancestral and hidden state predictions
Gao, Yingnan; Wu, Martin (2022), Modeling pulsed evolution and time-independent variation improves the confidence level of ancestral and hidden state predictions, Dryad, Dataset, https://doi.org/10.5061/dryad.kkwh70s4b
Ancestral state reconstruction is not only a fundamental tool for studying trait evolution, but is also very useful for predicting the unknown trait values (hidden states) of extant species. A well-known problem in ancestral and hidden state prediction is that the uncertainty associated with the predictions can be so large that the predictions themselves are of little use. Therefore, for meaningful interpretation of predicted traits and for hypothesis testing, it is prudent to accurately assess the uncertainty of the predictions. The commonly used constant-rate Brownian motion (BM) model fails to capture the complexity of the tempo and mode of trait evolution in nature, making predictions under the BM model vulnerable to lack-of-fit errors from model misspecification. Using empirical data (mammalian body size and bacterial genome size), we show that the distribution of residual Z-scores under the BM model is neither homoscedastic nor normal, contrary to what the model expects. Consequently, the 95% confidence intervals (CIs) of predicted traits are so unreliable that the actual coverage probability ranges from 33% (strongly permissive) to 100% (strongly conservative). Alternative methods such as BayesTraits and StableTraits, which allow variable rates of evolution, improve the predictions but are computationally expensive. Here we develop RasperGade, a method of ancestral and hidden state prediction that uses the Lévy process to explicitly model gradual evolution, pulsed evolution and time-independent variation. Using the same empirical data, we show that RasperGade outperforms both BayesTraits and StableTraits and is orders of magnitude faster. Our results suggest that, when predicting the ancestral and hidden states of continuous traits, the tempo and mode of evolution should always be assessed and the quality of confidence estimates should always be examined.
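
To make the notions of residual Z-score and coverage probability concrete, the following is a minimal Python sketch of how the two quantities can be computed when predicted values and their predicted standard deviations are available. It is an illustration under stated assumptions, not the RasperGade implementation: all function names are hypothetical, the data are simulated normal draws rather than the empirical data in this dataset, and the 95% CI is assumed to be the symmetric normal interval, prediction ± 1.96 predicted standard deviations.

```python
import random

# Two-sided 95% critical value of the standard normal distribution.
Z_95 = 1.959963984540054


def residual_z_scores(observed, predicted, predicted_sd):
    """Standardize each residual by the standard deviation the model predicted."""
    return [(o - p) / s for o, p, s in zip(observed, predicted, predicted_sd)]


def coverage_probability(z_scores, z_crit=Z_95):
    """Fraction of observations that fall inside their nominal 95% CI."""
    inside = sum(1 for z in z_scores if abs(z) <= z_crit)
    return inside / len(z_scores)


def simulate_coverage(n, true_sd, assumed_sd, seed=0):
    """Hypothetical check: draw residuals with spread true_sd, then score them
    against a model that assumes spread assumed_sd (the prediction is fixed at 0)."""
    rng = random.Random(seed)
    observed = [rng.gauss(0.0, true_sd) for _ in range(n)]
    predicted = [0.0] * n
    sds = [assumed_sd] * n
    return coverage_probability(residual_z_scores(observed, predicted, sds))


if __name__ == "__main__":
    # Well-specified model: coverage should be close to the nominal 95%.
    print("well specified:", simulate_coverage(20000, 1.0, 1.0))
    # Model underestimates the variance: intervals too narrow (permissive).
    print("permissive:    ", simulate_coverage(20000, 3.0, 1.0))
    # Model overestimates the variance: intervals too wide (conservative).
    print("conservative:  ", simulate_coverage(20000, 1.0, 3.0))
```

In this sketch, a coverage well below 95% corresponds to the permissive case described above and a coverage near 100% to the conservative case. Under a well-specified model the Z-scores would also be expected to look approximately standard normal, which is the property the abstract reports as violated under the BM model.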