Supplementary tables for: Dependent variable selection in phylogenetic generalized least squares regression analysis under Pagel’s lambda model
Data files
Jun 15, 2023 version (files: 6.71 MB)
Abstract
Phylogenetic generalized least squares (PGLS) regression is widely used to detect evolutionary correlations. Unlike conventional correlation methods such as Pearson’s correlation and Spearman’s rank correlation tests, which treat the analyzed traits equally, PGLS regression requires designating one trait as the independent variable and the other as the dependent variable. However, in our PGLS regression analyses (using Pagel’s λ model) of both empirical and simulated datasets, switching the independent and dependent variables yielded many conflicting results. This points to a serious and previously unnoticed problem with PGLS regression: selecting an inappropriate trait as the dependent variable often produces an erroneous result. To assess correlations in simulated data, we established a gold standard by analyzing changes in traits along phylogenetic branches. Next, we tested seven potential criteria for dependent variable selection: log-likelihood, Akaike information criterion, R², p-value, Pagel’s λ, Blomberg et al.’s K, and the estimated λ in Pagel’s λ model. The last three criteria performed equally well in selecting the dependent variable and were superior to the other four. For practicality, we suggest using the trait with the higher λ or K value as the dependent variable in future PGLS regressions. When analyzing the evolutionary relationship between two traits, the trait with the stronger phylogenetic signal should be designated as the dependent variable, even if it would logically be the cause in the relationship.
Methods
Using different models and variances, we conducted 16,000 simulations of the evolution of two traits (X1 and X2) along a binary tree with 100 terminal nodes.
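To make the simulation setup concrete, the following is a minimal sketch of evolving two traits along a binary tree with 100 terminal nodes. It is not the procedure used to generate the supplementary tables: the Brownian-motion model, the random tree topology, the exponential branch lengths, and the names `simulate_two_traits`, `sigma1`, `sigma2`, and `rho` are all assumptions made for illustration.

```python
import random


def simulate_two_traits(n_tips=100, sigma1=1.0, sigma2=1.0, rho=0.0, seed=0):
    """Evolve two traits by (assumed) correlated Brownian motion on a random binary tree.

    Returns (tips, changes):
      tips    - list of (x1, x2) values at the terminal nodes
      changes - list of (dx1, dx2) trait changes, one pair per branch
    """
    rng = random.Random(seed)
    lineages = [(0.0, 0.0)]  # start from a single root with both traits at 0
    changes = []
    while len(lineages) < n_tips:
        # Pick a random current lineage and split it into two daughters,
        # so every internal node has exactly two children (binary tree).
        parent = lineages.pop(rng.randrange(len(lineages)))
        for _ in range(2):
            t = rng.expovariate(1.0)  # hypothetical branch length
            step = t ** 0.5           # Brownian change scales with sqrt(time)
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            dx1 = sigma1 * step * z1
            # Mix z1 into the second trait to induce correlation rho.
            dx2 = sigma2 * step * (rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2)
            changes.append((dx1, dx2))
            lineages.append((parent[0] + dx1, parent[1] + dx2))
    return lineages, changes


tips, changes = simulate_two_traits(n_tips=100, rho=0.5, seed=1)
print(len(tips), "tips,", len(changes), "branches")
```

Repeating such a run with different seeds, variances, and models would give a collection of simulated datasets; the tree produced here is not ultrametric, which a real analysis might require.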
Supplementary Table S1 includes, for each simulation, all the PGLS regression analysis results; the gold standard for the correlation, established by analyzing changes in traits along phylogenetic branches; and the values of log-likelihood, Akaike information criterion, R², p-value, Pagel’s λ, Blomberg et al.’s K, and the estimated λ in Pagel’s λ model for each simulated trait or each PGLS regression.
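One plausible way to read "analyzing changes in traits along phylogenetic branches" is to correlate the per-branch changes of the two traits, which are known exactly in a simulation. The sketch below assumes a plain Pearson correlation over branch changes; the exact test behind the gold standard is not stated in this description, and the function name `branch_change_correlation` and the toy data are hypothetical.

```python
def branch_change_correlation(changes):
    """Pearson correlation between per-branch changes (dx1, dx2) of two traits."""
    n = len(changes)
    mean1 = sum(d1 for d1, _ in changes) / n
    mean2 = sum(d2 for _, d2 in changes) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((d1 - mean1) * (d2 - mean2) for d1, d2 in changes)
    sxx = sum((d1 - mean1) ** 2 for d1, _ in changes)
    syy = sum((d2 - mean2) ** 2 for _, d2 in changes)
    return sxy / (sxx * syy) ** 0.5


# Toy branch changes in which X2 always changes twice as much as X1.
toy_changes = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(branch_change_correlation(toy_changes))
```

Because branch-wise changes are independent of one another under Brownian motion, a conventional correlation on them does not need a phylogenetic correction, which is what makes them usable as a reference point in simulated data.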
Supplementary Table S2 shows how the models selected by each of the seven criteria differed in performance across all 16,000 simulations.
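The selection rule suggested in the abstract (use the trait with the higher λ or K value as the dependent variable) can be sketched as a small helper. The function name `choose_dependent`, its arguments, and the tie-breaking behavior are illustrative assumptions; the phylogenetic signal values themselves would have to be estimated beforehand with a phylogenetics package.

```python
def choose_dependent(signal_x1, signal_x2, names=("X1", "X2")):
    """Return (dependent, independent) trait names.

    The trait with the stronger phylogenetic signal (higher Pagel's lambda
    or Blomberg et al.'s K) is placed as the dependent variable. Ties go to
    the first trait, which is an arbitrary choice made for this sketch.
    """
    if signal_x1 >= signal_x2:
        return names[0], names[1]
    return names[1], names[0]


# Hypothetical signal estimates for two traits.
dependent, independent = choose_dependent(0.35, 0.92)
print(f"Regress {dependent} on {independent}")
```

Both signal values should come from the same statistic (both λ or both K), since the two measures are on different scales.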