Skip to main content

A Bayesian extension of phylogenetic generalized least squares (PGLS): incorporating uncertainty in the comparative study of trait relationships and evolutionary rates

Cite this dataset

Fuentes González, Jesualdo Arturo; Polly, P. David; Martins, Emília P. (2019). A Bayesian extension of phylogenetic generalized least squares (PGLS): incorporating uncertainty in the comparative study of trait relationships and evolutionary rates [Dataset]. Dryad.


Phylogenetic comparative methods use tree topology, branch lengths, and models of phenotypic change to take into account non-independence in statistical analysis. However, these methods normally assume that trees and models are known without error. Approaches relying on evolutionary regimes also assume specific distributions of character states across a tree, which often result from ancestral state reconstructions that are subject to uncertainty. Several methods have been proposed to deal with some of these sources of uncertainty, but approaches accounting for all of them are less common. Here we show how Bayesian statistics facilitates this task while relaxing the homogeneous rate assumption of the well-known phylogenetic generalized least squares (PGLS) framework.  This Bayesian formulation allows uncertainty about phylogeny, evolutionary regimes, or other statistical parameters to be taken into account for studies as simple as testing for coevolution in two traits or as complex as testing whether bursts of phenotypic change are associated with evolutionary shifts in inter-trait correlations. A mixture of validation approaches indicate that the approach has good inferential properties and predictive performance. We provide suggestions for implementation and show its usefulness by exploring the coevolution of ankle posture and forefoot proportions in Carnivora.

Usage notes

There are two sets of data, one corresponds to the worked example and the other one to toy simulated data to test the scripts implementing the method described in the paper, which are also provided. Below we provide further details on the three sets: worked example, toy data, and R scripts.

Worked example

The worked example explores limb coevolution in the order Carnivora. Both phenotypic data (CarnData.cvs) and phylogenetic trees (CarnTrees.nex) are provided.

The phenotypic data inlcudes three variables for 102 carnivoran species: lengths for the third metacarpal (MC3L) and third proximal phalanx (fph3p), and posture codification (Digitigrady). The continuous variables (lengths) were obtained from Samuels et al. (2013; J Morphol), and the categorical variable (posture) from Polly (2010; Carnivoran evolution: New views on phylogeny, form, and function). See paper for full references. Polly (2010) proposed the posture variable based on the use of the heel during normal resting stance and during walking locomotion, recognizing three categories: plantigrade (heel in contact with the ground), semidigitigrade (heel may or may not be in contact with the ground during walking locomotion), and digitigrade (heel does not come in contact with the ground). This three-state categorization offers the advantage of recognizing a continuum in posture (with an intermediate state between plantigrady and digitigrady), which partially captures the continuous nature of ankle morphology. However, the intermediate state is ambiguous regarding the use of the heel during walking locomotion, making it little informative for our purposes (which link the coevolutionary pattern of limb morphology with locomotor performance). Therefore, our categorization reduced posture to two character states, clearly differentiating those carnivorans that keep the heel elevated during walking locomotion (digitigrady: 1) from the rest (plantigrady: 0).

The phylogenies correspond to a posterior distribution of 1000 dated trees pruned from the 10kTrees project (Arnold et al. 2010; Evol Anthropol). See paper for reference and further details about the phylogeny. 

Toy data

The files of the toy dataset allow testing the scripts provided in this package (see below). The toy simulated data (Data.csv) emulates the variable system of the worked example (CarnData.csv) and was generated with the R packages phytools and geiger. It is accompanied by a sample of 10 trees with 5 stochastic character mappings reconstructed for each of them (Maps.tre), which can be generated with programs such as SIMMAP or phytools.

R scripts

The main script implements the mixed model described in the paper (MM.R), but scripts implementing specific models of phylogenetic relatedness are also provided: Lambda model (LM.R), Brownian motion (BM.R), and non-phylogenetic analysis (IID.R). Each file allows accounting for both equal and unequal rates, and currently calls the toy data files (Data.csv and Maps.tre) as inputs (thus, the toy data files can be used as templates to prepare new files to be analyzed with the scripts). 

The scripts modify codes from de Villemereuil et al. (2012: BMC Evol Biol) and Kruschke (2011; Doing Bayesian Data Analysis: A Tutorial with R and BUGS), and require prior installation of the program JAGS as well as the R packages rjags, ape, and phytools (See paper for full references). Detailed instructions on installing JAGS can be found at the website of Kruschke (2011). This resource also offers detailed explanations (with codes) on processing the MCMC outputs, which can be helpful as the scripts provided here incorporate very basic descriptions of the posterior distributions (based on posterior means, medians, and credible intervals).


National Science Foundation, Award: IOS-1257562

National Science Foundation, Award: EAR-1338298

National Science Foundation, Award: EAR-0843935

Colciencias, Award: Becas Caldas 497-2009