
Data from: R2s for correlated data: phylogenetic models, LMMs, and GLMMs

Cite this dataset

Ives, Anthony R. (2018). Data from: R2s for correlated data: phylogenetic models, LMMs, and GLMMs [Dataset]. Dryad.


Many researchers want to report an R² to measure the variance explained by a model. When the model includes correlation among data, as in phylogenetic models and mixed models, defining an R² faces two conceptual problems. (i) It is unclear how to measure the variance explained by predictor (independent) variables when the model contains covariances. (ii) Researchers may want the R² to include the variance explained by the covariances, asking questions such as “How much of the data is explained by phylogeny?” Here, I investigate three R²s for phylogenetic and mixed models. R²_resid is an extension of the ordinary least-squares R² that weights residuals by the variances and covariances estimated by the model; it is closely related to R²_glmm presented by Nakagawa and Schielzeth (2013). R²_pred is based on predicting each residual from the fitted model and computing the variance between observed and predicted values. R²_lik is based on the likelihood of fitted models and therefore reflects the amount of information that the models contain. These three R²s are formulated as partial R²s, making it possible to compare the contributions of predictor variables and variance components (phylogenetic signal and random effects) to the fit of models. Because partial R²s compare a full model with a reduced model from which one or more components of the full model have been removed, they are distinct from marginal R²s that partition additive components of the variance. The properties of the R²s for phylogenetic models were assessed using simulations for continuous and binary response data (phylogenetic generalized least squares and phylogenetic logistic regression). Because the R²s are designed broadly for any model for correlated data, they were also compared for LMMs and GLMMs. R²_resid, R²_pred, and R²_lik all have similar performance in describing the variance explained by different components of models.
However, R²_pred gives the most direct answer to the question of how much variance in the data is explained by a model. R²_resid is most appropriate for comparing models fit to different datasets, because it does not depend on sample sizes. And R²_lik is most appropriate to assess the importance of different components within the same model applied to the same data, because it is most closely associated with statistical significance tests.
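A minimal sketch of the partial R²_lik idea for a Gaussian (PGLS-style) model may make the full-versus-reduced comparison concrete. It assumes the likelihood-ratio form R²_lik = 1 − exp(−(2/n)(logL_full − logL_reduced)) (the Cox–Snell form) rather than reproducing the author's code, and it uses a fixed, hypothetical Brownian-motion covariance matrix V for six species. In a real analysis the phylogenetic signal (for example Pagel's λ) would be estimated, which is what makes the no-phylogeny model a properly nested reduction. All function names and data are hypothetical, and only the Python standard library is used.

```python
import math

def cholesky(a):
    """Return lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for z (L lower triangular) by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def back_solve_transpose(L, b):
    """Solve L^T x = b for x (L lower triangular) by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gls_loglik(y, cols, V):
    """Hypothetical helper: maximized log-likelihood of y = X b + e, e ~ N(0, s2 * V), V fixed.

    cols is a list of the columns of X; s2 and b are estimated by maximum likelihood.
    """
    n = len(y)
    L = cholesky(V)
    # Whitening with L^{-1} turns generalized least squares into ordinary least squares.
    z = forward_solve(L, y)
    W = [forward_solve(L, c) for c in cols]
    p = len(W)
    A = [[sum(u * v for u, v in zip(W[i], W[j])) for j in range(p)] for i in range(p)]
    rhs = [sum(u * v for u, v in zip(W[i], z)) for i in range(p)]
    LA = cholesky(A)
    b = back_solve_transpose(LA, forward_solve(LA, rhs))  # solves (W^T W) b = W^T z
    resid = [z[k] - sum(b[i] * W[i][k] for i in range(p)) for k in range(n)]
    ss = sum(r * r for r in resid)  # r^T V^{-1} r: residuals weighted by the model covariances
    s2 = ss / n                     # ML estimate of the scale parameter
    logdet_V = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * n * math.log(2 * math.pi * s2) - 0.5 * logdet_V - 0.5 * n

def r2_lik(ll_full, ll_reduced, n):
    """Partial R^2 from log-likelihoods, assuming the Cox-Snell form."""
    return 1.0 - math.exp(-2.0 / n * (ll_full - ll_reduced))

# Hypothetical Brownian-motion covariance: two clades of three species, tree height 1.
V = [
    [1.0, 0.8, 0.5, 0.0, 0.0, 0.0],
    [0.8, 1.0, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.5, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.8, 0.5],
    [0.0, 0.0, 0.0, 0.8, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.5, 1.0],
]
n = len(V)
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
x = [0.1, 0.3, 0.2, 1.1, 0.9, 1.4]  # hypothetical predictor
y = [0.5, 0.7, 0.4, 1.9, 1.6, 2.2]  # hypothetical response
ones = [1.0] * n                     # intercept column

ll_full = gls_loglik(y, [ones, x], V)             # predictor x + phylogenetic covariance
ll_no_x = gls_loglik(y, [ones], V)                # reduced model: drop x
ll_no_phylo = gls_loglik(y, [ones, x], identity)  # reduced model: independent errors

print("partial R2_lik for x:        ", round(r2_lik(ll_full, ll_no_x, n), 3))
print("partial R2_lik for phylogeny:", round(r2_lik(ll_full, ll_no_phylo, n), 3))
```

Setting V to the identity matrix removes the covariances, and the partial R²_lik for x then reduces to the ordinary least-squares R² (the squared Pearson correlation between x and y), which is a quick sanity check on the sketch. Because V is fixed here rather than estimated, the partial value for phylogeny can be negative when V fits the data worse than independent errors do.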

National Science Foundation, Award: DEB-LTREB-1052160 and DEB-1240804