Data from: Optimizing phylogenetic eigenvector regression: Union eigenvectors, robust estimation, and flexible application to comparative analyses
Data files
Mar 28, 2026 version, 51.11 MB in total:
- Main_code.zip (48.29 KB)
- README.md (4.31 KB)
- Spatial_methods.zip (6.38 KB)
- Supplementary_Tables.xlsx (327.05 KB)
- Table_a1.xlsx (25.24 MB)
- Table_a2.xlsx (23.80 MB)
- Table_b_to_k.xlsx (1.69 MB)
Abstract
Phylogenetic eigenvector regression (PVR) is widely used in ecology and evolution by representing phylogenetic structure through separable eigenvectors. Despite this flexibility, its implementation faces three key challenges: (1) the selection of eigenvectors, (2) the reduced robustness of ordinary least-squares (OLS) regression under shift-like evolutionary heterogeneity, and (3) the applicability of conventional model complexity rules such as the "samples-per-variable (SPV) ≥ 10" guideline. Here, we propose an optimized PVR framework that addresses these limitations. First, we show that trait-specific selections of eigenvectors often diverge, sometimes producing inconsistent results, and that using their union offers stronger control of phylogenetic non-independence. Second, we evaluate robust regression estimators within PVR, demonstrating that PVR-MM – and in most cases PVR-L2, the standard OLS estimator – maintains high accuracy under non-stationary evolutionary shifts where other non-robust methods fail. Third, through simulation, we reassess the SPV ≥ 10 rule, showing that PVR tolerates eigenvector counts well beyond this threshold, offering greater flexibility while requiring attention to potential overfitting. Extensive simulations across diverse trees and evolutionary scenarios confirm that the optimized framework improves accuracy and robustness. By addressing key aspects of eigenvector selection, regression, and model complexity, our findings strengthen the reliability and applicability of PVR.
Dataset DOI: 10.5061/dryad.4tmpg4frg
Description of the data and file structure
This dataset contains the data used to plot the main-text and supplementary figures, together with the analysis code (Main_code.zip and Spatial_methods.zip).
Files and variables
File: Supplementary_Tables.xlsx
Description: The data used to plot Figures 2 to 5 (corresponding to Tables S1, S4, S5, and S6) and to build Tables 1 and 2 (corresponding to Tables S2 and S3), plus the performance of the spatial methods (Table S7).
Variables
- Table S1:
  - group: the eigenvector adding case
  - scenario: the trait simulation scenario
  - numeric columns (10^{-4}, 0.01, 1, 4, 16, 64, 256, 1024): the magnitude of the noise term variance and shift
- Table S2: same as Table S1
- Table S3: same as Table S1
- Table S4:
  - diff_r: absolute differences in partial correlation coefficients
  - scenario: the trait simulation scenario
  - method: the EV selection method
- Table S5:
  - group: the EV set used in OLS or residuals
  - other columns are the same as in Table S4 and Table S1
- Table S6:
  - Method: the PCM methods
  - other columns are the same as in Table S5
- Table S7:
  - Models: the spatial statistical methods
File: Table_b_to_k.xlsx
Description: The data used to plot Figures S2 to S11 (corresponding to Tables b to k)
Variables
- Table b: same as Table S4
- Tables c to k: same as Table S6
File: Table_a1.xlsx
Description: The data used to plot Figure 1
Variables
- EV_number: the number of EVs
- group: the trait simulation scenarios
- method: the EV selection method
- label: the EV set
File: Table_a2.xlsx
Description: The data used to plot Figure S1
Variables
- The same as Table_a1.xlsx
File: Main_code.zip
Description: There are four subdirectories in the Main_code directory: TreeConstruction, TraitsSimulation, RegressionModelFitting, and StatisticalAnalysis.
TreeConstruction contains five scripts:
- 128balanced_rho_1.R constructs a balanced tree with Grafen's rho = 1 (128 species).
- 128balanced_rho_01.R constructs a balanced tree with Grafen's rho = 0.1 (128 species).
- 128balanced_rho_2.R constructs a balanced tree with Grafen's rho = 2 (128 species).
- 128ladder_rho_1.R constructs a ladder-like tree with Grafen's rho = 1 (128 species).
- 16balanced_rho_1.R constructs a balanced tree with Grafen's rho = 1 (16 species).
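To illustrate what Grafen's rho changes in the balanced trees listed above, here is a minimal Python sketch. It is not taken from the repository, whose scripts are in R. It assumes the usual definition of Grafen branch lengths: each node's height is the number of tips below it minus one, scaled so that the root has height 1 and then raised to the power rho. The function names `grafen_heights` and `branch_lengths` are hypothetical.

```python
def grafen_heights(n_tips, rho=1.0):
    """Node heights per level of a fully balanced binary tree (Grafen's method).

    Level 0 holds the tips; a node at level k subtends 2**k tips.
    Raw height = (tips below the node - 1), scaled so the root equals 1,
    then raised to the power rho.
    """
    depth = n_tips.bit_length() - 1
    if n_tips < 2 or 2 ** depth != n_tips:
        raise ValueError("n_tips must be a power of two (>= 2) for a balanced tree")
    return [((2 ** k - 1) / (n_tips - 1)) ** rho for k in range(depth + 1)]


def branch_lengths(heights):
    """Length of the branch joining a level-k node to its parent at level k + 1."""
    return [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]


# Compare the three rho values used for the 128-species balanced trees.
for rho in (0.1, 1.0, 2.0):
    lengths = branch_lengths(grafen_heights(128, rho))
    print(f"rho={rho}: tip branch={lengths[0]:.4f}, root branch={lengths[-1]:.4f}")
```

Under this definition the root-to-tip distance is always 1. Values of rho below 1 lengthen the branches leading to the tips, while values above 1 shorten them and place more of the tree's length near the root.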
TraitsSimulation contains six scripts, each corresponding to one trait simulation scenario.
The scripts in the RegressionModelFitting directory are used for regression model fitting:
- PGLS_RobustEstimator.R is for PGLS, PIC, and PIC-MM
- PVR_ResidualAnalysis.R is for the correlation test on the residuals of X1 and X2 when phylogenetic autocorrelation is controlled for separately in each variable.
- PVR_RobustEstimator.R is for PVR_union with various estimators
- PVR_SingleEV.R is for the original PVR, which considers the eigenvectors selected for a single variable (X1 or X2).
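The union idea behind PVR_union can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R implementation. The toy eigenvectors, the trait values, and the selected index sets are all made up. The function names are hypothetical. The fit uses plain OLS rather than any of the robust estimators. Both traits are regressed on the union of the eigenvectors selected for each, and the residuals are then correlated.

```python
import math


def union_eigenvectors(selected_x1, selected_x2):
    """Union of the eigenvector indices selected separately for X1 and X2."""
    return sorted(set(selected_x1) | set(selected_x2))


def ols_residuals(y, columns):
    """Residuals of y regressed (plain OLS) on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Hypothetical eigenvectors for 8 species (illustrative values only).
ev = {
    1: [1, 1, 1, 1, -1, -1, -1, -1],
    2: [1, 1, -1, -1, 0, 0, 0, 0],
    3: [0, 0, 0, 0, 1, 1, -1, -1],
}
x1 = [2.1, 1.8, 0.9, 1.2, -1.0, -0.7, -1.4, -0.9]  # made-up trait values
x2 = [0.5, 0.9, 1.6, 1.1, -0.2, 0.3, -1.1, -1.6]   # made-up trait values

selected_x1 = [1, 2]  # assumed result of a selection step for X1
selected_x2 = [1, 3]  # assumed result of a selection step for X2
union = union_eigenvectors(selected_x1, selected_x2)

columns = [ev[k] for k in union]
res1 = ols_residuals(x1, columns)
res2 = ols_residuals(x2, columns)
print("union:", union, "residual correlation:", round(pearson(res1, res2), 3))
```

Using the same union set for both traits means the two sets of residuals are free of the same phylogenetic components, which is the property the trait-specific selections do not guarantee.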
The scripts in the StatisticalAnalysis directory are used for the statistical analysis of the results:
- Table01_EV_XYYXUnion.R is used to count the number of selected eigenvectors
- Table01_EVDiff.R is used to count the number of divergent eigenvector sets
- Table02_conflictingnumber_kappa is used to test the divergent correlation results
- Table03_Cor_diff.R is used to get the differences of correlation coefficients
- Table05_Residual_Pearson is used to get the results of the residual analysis
- Table06_PGLS_best.R is used to get the best PGLS results
- Table07_PGLS_Rob_Spearman is used to get the PGLS/PIC results, including those with robust estimators
- Table08_PVR_Rob_Spearman.R is used to get the PVR union results, including those with robust estimators
File: Spatial_methods.zip
Description: There are three files in the Spatial_methods directory: simulation.R, analysis.R, summary.R.
- simulation.R: used to simulate spatial data
- analysis.R: used to apply the spatial statistical methods to the simulated data
- summary.R: used to summarize the results of the spatial statistical methods
