Modelling trait heterogeneity and inferring causal links in the macroevolution of growth habit in eudicot angiosperms
Data files
Jan 02, 2026 version, files 3.29 GB
- Rawfiles_Jan1_2026.zip (3.29 GB)
- README.md (16.56 KB)
Abstract
Phylogenetic comparative methods (PCMs) help researchers understand and predict trait evolutionary relationships. While improvements to PCMs have focused on increasing model complexity, understanding processes remains difficult due to persistent challenges in grounding complex models in biological reality and synthesizing findings across multiple analyses. We examined the evolution of growth habit in eudicots (75% of all angiosperms) and tested how variables such as vessel diameter, leaf phenology, and minimum temperature influence macroevolutionary inference. We used a series of PCMs to synthesize our understanding of trait interrelationships, explored plausible causal relationships using phylogenetic path analysis, and employed phylogenetic cross-validation to assess predictive performance among taxa. We found that discrete coding of growth form was linked to other measured and unmeasured traits, and that these interrelationships can help overcome limitations arising from incomplete data and simplistic coding of complex traits. Analysis of growth form using phylogenetic path analysis helps reconcile competing views of trait interrelationships from previous studies. Furthermore, including identified covariates improves prediction of growth habit and other traits. Our study shows that incorporating causal structure improves macroevolutionary inference, identifies when analyses that omit key causal traits become unreliable, and underscores the importance of integrating phylogenetic models with natural-history knowledge.
https://doi.org/10.5061/dryad.dfn2z35cg
Data Description: Rawfiles_Jan1_2026.zip
This dataset contains phylogenetic, trait, and environmental data used to investigate the evolution of woodiness in eudicot plants. The data were collected to examine the relationships between plant growth form (woody vs. herbaceous), vessel anatomy, plant height, climate variables, and leaf phenology across the eudicot clade. The data files and analytical outputs are organized in directories corresponding to specific figures in the manuscript (Figure_1_2_S1, Figure_3, Fig_4, Figure_5_S3_S4, Figure_S5, Figure_6). Each directory contains the input data files and output files required to generate the corresponding figure, allowing for reproducible analysis of each component of the study.
Missing values (NA): Across all data files and analytical outputs, NA denotes missing data arising from unavailable measurements in the original source databases or failed matching after taxonomic standardization.
I. DATA
1. Phylogenetic Data:
A comprehensive phylogeny of eudicot plants extracted from Smith and Brown (2018)
File name: GBMB.tre
2. Morphological & Anatomical Data:
- Plant growth form data (woody vs. herbaceous) from the Global Woodiness Database (Cornwell et al., 2013)
File name: GlobalWoodinessDatabase_Zanne_2014.txt
Columns used:
• gs: species name (character; spaces converted to underscores to match phylogeny tip labels)
• woodiness: growth form category (W = woody, H = herbaceous, variable)
- Vessel diameter (µm) measurements from woody species from the Global Vessel Anatomy Database (Zanne et al., 2010)
File name: CombinedGFConduit.csv
- Vessel diameter (µm) measurements from herbaceous species from The Xylem Database (Schweingruber and Landolt, 2005)
File name: herb_conduit_mapped.csv
- Derived woody taxa identified by Zizka et al. (2022)
File name: Lens_derived_woods (dataset in RDS format) or global_derived_woody_species_list.csv (CSV format)
- Maximum plant height (m) measurements from the TRY Plant Trait Database (Kattge et al., 2011)
File name: 6799.txt
Variables used:
• SpeciesName: original species name in the TRY database
• AccSpeciesName: accepted species name
• OrigValueStr: reported plant height value
• OrigUnitStr: original unit of measurement (cm, m, mm, or feet)
Processing notes: Only records for the trait “Plant height vegetative” were retained. Non-numeric values and non-positive measurements were removed. Plant height values were converted to meters (m) from centimeters, millimeters, or feet where necessary. Species names were standardized to match phylogeny tip labels (spaces replaced by underscores; subspecies and varietal epithets removed). For each species, the maximum recorded height was retained for analyses.
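The processing notes above can be sketched in a few lines. This is a minimal Python illustration, not the authors' R code: the function names, the tuple layout of the input records, and the rule of keeping only the first two name tokens (which would mishandle hybrid names) are assumptions made for the example.

```python
# Conversion factors to metres for the units listed in OrigUnitStr.
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001, "feet": 0.3048}


def standardize_name(name):
    """Keep genus and species epithet only, joined by an underscore.

    Assumption: dropping everything after the second token is enough to
    remove subspecies and varietal epithets.
    """
    parts = name.strip().split()
    return "_".join(parts[:2])


def max_height_by_species(records):
    """Return {species: maximum height in metres}.

    records: iterable of (AccSpeciesName, OrigValueStr, OrigUnitStr) tuples
    (a hypothetical layout chosen for this sketch).
    """
    tallest = {}
    for species, value, unit in records:
        try:
            height = float(value)
        except (TypeError, ValueError):
            continue  # non-numeric value: drop the record
        factor = TO_METRES.get((unit or "").strip().lower())
        if factor is None or not height > 0:
            continue  # unknown unit or non-positive measurement
        metres = height * factor
        key = standardize_name(species)
        if metres > tallest.get(key, 0.0):
            tallest[key] = metres  # keep the maximum per species
    return tallest
```

Records with an unrecognised unit are dropped here rather than guessed at; whether the original script did the same is not stated in this README.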
- Leaf phenology data from species trait data (Zanne et al., 2018)
File name: speciesTraitDataAEZ3.csv
Columns used:
• gs: species name (character; spaces converted to underscores to match phylogeny tip labels)
• phenology: leaf phenology category (EV = evergreen, D = deciduous, D_EV = mixed)
Processing notes: Species coded as D_EV (mixed deciduous–evergreen) were excluded. The final dataset contains two columns: species and phenology.
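As a rough illustration of the filtering step, the Python sketch below keeps only species coded EV or D and returns the two-column result. The function name and input layout are hypothetical, and treating any other code (including missing values) as excluded is an assumption of this example.

```python
def binary_phenology(rows):
    """Return (species, phenology) pairs for evergreen or deciduous species.

    rows: iterable of (gs, phenology) pairs (hypothetical layout).
    """
    kept = []
    for gs, phenology in rows:
        if phenology not in ("EV", "D"):
            continue  # drops D_EV; by assumption also drops any other code
        kept.append((gs.replace(" ", "_"), phenology))
    return kept
```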
3. Environmental Data:
Climate data (minimum temperature and precipitation) from species distribution summaries (Zanne et al., 2018)
File name: species_summaries_all.csv
Columns used:
- species (column 1): species name (character; spaces converted to underscores to match phylogeny tip labels)
- tmin.025: minimum temperature (2.5th percentile) from species range summaries
- pmin.025: minimum precipitation (2.5th percentile) from species range summaries
Processing notes:
- `logLowTmp` was calculated by converting minimum temperature (`tmin.025`) to Kelvin and applying a natural log transformation.
- Precipitation values were retained from `pmin.025`; zero values were replaced with 0.001 prior to log transformation to generate `log_PP`.
- Temperature and precipitation tables were filtered to remove missing values and matched to the angiosperm phylogeny using `make.treedata()`.
- Species with missing temperature or precipitation values were excluded at this stage, resulting in NA-free derived climate variables used in downstream analyses.
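The two transformations can be written out as follows. This Python sketch is illustrative only: it assumes `tmin.025` is stored in degrees Celsius (the README does not state the unit or any scale factor), and the function names and row layout are hypothetical. Matching to the phylogeny with `make.treedata()` is not reproduced here.

```python
import math


def log_low_tmp(tmin_celsius):
    """Natural log of minimum temperature in Kelvin (input assumed to be in deg C)."""
    return math.log(tmin_celsius + 273.15)


def log_pp(pmin):
    """Natural log of minimum precipitation, with exact zeros replaced by 0.001."""
    return math.log(0.001 if pmin == 0 else pmin)


def derive_climate(rows):
    """Return {species: (logLowTmp, log_PP)}, skipping rows with missing values.

    rows: iterable of (species, tmin, pmin), with None standing in for NA.
    """
    derived = {}
    for species, tmin, pmin in rows:
        if tmin is None or pmin is None:
            continue  # excluded, so the derived variables are NA-free
        derived[species.replace(" ", "_")] = (log_low_tmp(tmin), log_pp(pmin))
    return derived
```

Working in Kelvin keeps every temperature positive, so the logarithm is defined even for species whose ranges extend below freezing.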
II. ANALYTICAL OUTPUTS
- Hidden Markov models to identify fast and slow evolutionary regimes and to estimate transition rates (using R package corHMM)
- Stochastic character mapping to reconstruct ancestral states (using R packages phytools, corHMM)
- Bayesian MCMC estimates of trait correlations under a multivariate phylogenetic threshold model, accounting for phylogeny (using R package MCMCglmmRAM)
- Phylogenetic path analysis to test hypothesized causal relationships between traits and environmental variables (using R package phylopath)
- Cross-validation predictions of growth form (using R packages castor, MCMCglmmRAM)
- Statistical analyses and data processing were primarily conducted in R with additional packages including treeplyr, ape, geiger, dplyr, taxonlookup, doParallel, ratematrix, rethinking, ggplot2, ggpubr, and data.table
Figure_1_2 (associated script: Figure_1_2_S2.R)
MCMCglmmRAM output for habit trait liabilities without covariances.
File name: predict_habit_may2020 (R object saved in RDS format)
Derived wood inferences: 100 stochastic character maps of woody/herbaceous evolution across eudicot phylogeny with ancestral vs. derived woodiness classification for each tip taxon.
File names: Sec_wood_infer_2states.RDS and Sec_wood_infer_2states_71_100.RDS or Dec3_2019_twostates_witout.RDS (SIMMAP based on MK2 - 2 states, R object saved in RDS format)
File names: Sec_wood_infer_4states_1_50recons.RDS and Sec_wood_infer_4states_recoded_2_51_100recons.RDS or all 100 runs simmaps_2019_fourstates_Dec4_b (SIMMAP based on corHMM - 4 states, R object saved in RDS format)
Derived woody taxa identified by Zizka et al. (2022).
File name: Lens_derived_woods (R object saved in RDS format)
A pruned and merged dataset containing species-level trait data (woodiness, vessel size, minimum temperature, plant height, precipitation, and leaf phenology) matched to the phylogeny, with species names as row names and a factor column for phylogenetic analysis.
File name: new_mydata (R object saved in RDS format)
MLEs (Maximum Likelihood Estimates) of transition rates inferred using R package corHMM
File name: marginal_root_0100pp_corhmm_v_1.2 or marginal_root_0100pp (corHMM - 4 states, R object saved in RDS format)
File name: marginal_root_01pp_b (MK2 - 2 states, R object saved in RDS format)
The most likely state (herbaceous or woody) at each tip of the phylogeny inferred using R package corHMM
File name: corHMM_result (R object saved in RDS format)
Figure_3 (associated script: Figure_3.R)
An MCMCglmmRAM model object containing a multivariate phylogenetic mixed-effects model with 6 traits: log vessel diameter (log_VD), log plant height (log_SL), log minimum temperature (logLowTmp), temperature-phenology interaction (phe_tmp), leaf phenology (threshold trait), and growth habit (threshold trait).
File name: VD_SL_phe_logLowTmp_habit (R object saved in RDS format)
A species-level comparative trait dataset for phylogenetic threshold modeling containing log-transformed continuous traits (vessel diameter, plant height, minimum temperature, phenology and precipitation) and recoded binary categorical traits (leaf phenology and growth habit).
File name: new_mydata_twoPPtypes.RDS (R object saved in RDS format)
Figure_4 (associated script: Figure_4.R)
A list of 100 stochastic character maps (simmap objects) showing the evolutionary history of growth habit across the phylogeny under a 4-state hidden rates model, with the root fixed as "woody".
File name: simmaps_Nov19_2019.RDS (R object saved in RDS format)
BAYOU analysis results: vessel diameter optima estimated under multi-regime OU models for slow woody, fast woody, and herbaceous (imputed from missing data) regimes across 100 stochastic character map reconstructions using woody-lineage-only data.
File name: marginal_root_0100pp (R object saved in RDS format)
Trait data from Olson et al., 2018 (evaluated for methodological comparison but excluded from final analyses)
File name: pnas.csv
Figure_5_S3_S4 (associated script: Figure_5_S3_S4.R)
Model-averaged phylogenetic path coefficients for 4 or 5 variables across 1,800 habit liability posteriors (MCMCglmmRAM), averaged over best models (ΔCICc < 2) per iteration
File name: coefs_habit_reduced_5para_tmpxphen_allsampled_Jun24_2022_average_models or coefs_habit_reduced_4para_Jun22_2022_average_model or coefs_habit_reduced_5para_phen_allsampled_Jun25_2022_average_models (R object saved in RDS format, 5 variables)
File name: coefs_habit_reduced_4para_Jun21_2022_best_model (R object saved in RDS format, 4 variables)
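To make the averaging rule concrete, the Python sketch below averages one path coefficient over the models within ΔCICc < 2 of the best model, using generic information-criterion weights proportional to exp(-ΔCICc / 2). It is an illustration under assumptions, not the phylopath implementation: the function name and input layout are hypothetical, the coefficient is assumed to be present in every retained model, and the handling of paths absent from some models is not addressed.

```python
import math


def average_coefficient(models, cutoff=2.0):
    """Weighted average of one path coefficient over the best models.

    models: list of (CICc, coefficient) pairs, one per candidate DAG
    (hypothetical layout). Models with delta CICc >= cutoff are ignored.
    """
    best = min(cicc for cicc, _ in models)
    kept = [(cicc - best, coef) for cicc, coef in models if cicc - best < cutoff]
    weights = [math.exp(-0.5 * delta) for delta, _ in kept]
    total = sum(weights)
    return sum(w * coef for w, (_, coef) in zip(weights, kept)) / total
```

In the analysis described above, an average of this kind would be computed once per posterior sample of the habit liabilities, giving a distribution of averaged coefficients rather than a single value.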
Phylogenetic path analysis input containing candidate causal models (DAGs) among variables (minimum temperature, vessel diameter, plant height, habit threshold liabilities, phenology, temperature × phenology interaction)
File name: Nov4_5model.R (606 plausible models/DAGs among minimum temperature, vessel diameter, phenology, temperature × phenology interaction)
File name: Nov_6_6.R (606 plausible models/DAGs among minimum temperature, vessel diameter, plant height, habit threshold liabilities, phenology, temperature × phenology interaction)
File name: models_SLhabitVDTmp.R (137 candidate models/DAGs among vessel diameter, habit threshold liabilities, habit, temperature)
File name: model_5by5b.R (7337 possible models/DAGs among phenology, temperature, phenology X temperature interaction, vessel diameter, plant height)
File name: model_5by5c_habit.R (7337 possible models/DAGs among phenology, temperature, habit threshold liabilities, vessel diameter, plant height)
File name: result5_3338models (R object saved in RDS format containing 3,338 possible models/DAGs among phenology, temperature, phenology X temperature interaction, vessel diameter, plant height, phylogeny, trait matrix)
Figure_6 (associated script: Figure_6.R)
corHMM marginal tip state probabilities for fast herbaceous and fast woody taxa
File name: Fast_herbaceous_corHMM (R object saved in RDS format)
File name: Fast_Woody_corHMM (R object saved in RDS format)
Growth habit prediction under a threshold model using MCMCglmmRAM. Each tip was iteratively treated as missing; posterior liabilities from multivariate threshold models (habit ~ vessel diameter + plant height + minimum temperature) were used to calculate the probability of being woody (percentage of posterior samples above the threshold).
File name: LiabilityCrossValidation_Habit-VD+SL_1_100.rds and LiabilityCrossValidation_Habit-VD+SL_101_end.rds (R object saved in RDS format)
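The probability calculation described above reduces to counting posterior samples. The Python sketch below shows the idea for a single held-out tip; the function name is hypothetical, and placing the threshold at 0 is an assumption of this example (a common convention for threshold models) rather than something stated in this README.

```python
def probability_woody(liabilities, threshold=0.0):
    """Fraction of posterior liability samples above the threshold.

    liabilities: posterior samples of the habit liability for one tip
    that was treated as missing during model fitting.
    """
    samples = list(liabilities)
    if not samples:
        raise ValueError("no posterior samples supplied")
    return sum(1 for value in samples if value > threshold) / len(samples)
```

A tip would then be predicted as woody when this fraction exceeds a chosen cut-off such as 0.5; that cut-off is likewise an illustrative choice.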
Growth habit prediction under an Mk model using castor (Louca & Doebeli, 2018). Each tip was iteratively treated as missing and reconstructed using maximum likelihood.
File name: mk_estimates (R object saved in RDS format)
Cross-validation results for 307 mispredicted taxa under threshold models with various covariate sets (R object saved in RDS format)
File name: VD_SL_logLowTmp_habit (habit ~ vessel diameter + plant height + minimum temperature, R object saved in RDS format)
File name: VD_SL_logLowTmp_PP025_habit (habit ~ vessel diameter + plant height + minimum temperature + minimum precipitation, R object saved in RDS format)
File name: VD_SL_phe_logLowTmp_habit (habit ~ vessel diameter + plant height + phenology + minimum temperature, R object saved in RDS format)
File name: predict_habit_may2020 (habit only, no covariates; R object saved in RDS format)
File name: predict_SL_habit_may2020 (habit ~ plant height, R object saved in RDS format)
File name: predict_VD_habit_may2020 (habit ~ vessel diameter, R object saved in RDS format)
File name: predict_VD_SL_habit_may2020 and predict_VD_SL_habit_may2020_2 (habit ~ vessel diameter + plant height, R object saved in RDS format)
35 miscoded taxa identified via threshold model cross-validation and verified through manual reevaluation; these taxa originally had erroneous growth habit assignments in the trait database.
File name: miscoded_thresholdpredciton_Anna_verfied_highcon.rds (high-confidence mispredictions, R object saved in RDS format)
File name: miscoded_thresholdpredciton_Anna_verfied_boundaryzone.rds (mispredictions in boundary zones, R object saved in RDS format)
File name: miscoded_thresholdpredciton_Anna_verfied.rds (all mispredictions, R object saved in RDS format)
Other intermediate files listed in the script Figure_6.R
- match1_ARD (R object saved in RDS format)
- match1_ER (R object saved in RDS format)
- mismatch1_ARD (R object saved in RDS format)
- mismatch1_ER (R object saved in RDS format)
- mk_thres_estimates_full (R object saved in RDS format)
- mk_threshold_mismatch (R object saved in RDS format)
- new_mydata (R object saved in RDS format)
- new_mydata_twoPPtypes.RDS (R object saved in RDS format)
- phylogenyplant_trait_data (R object saved in RDS format)
- result5a (R object saved in RDS format)
- threshold123457.RDS (R object saved in RDS format)
- thresholds12345.RDS (R object saved in RDS format)
Figure_S5 (associated script: Figure_S5.R)
MCMCglmmRAM cross-validation results from five independent replicates, each based on 307 randomly selected taxa, evaluated under various phylogenetic threshold models with different covariate combinations.
Associated data files (5 replicates/independent runs, R object saved in RDS format):
File name: taxa_remove*.rds (Lists of the 307 taxa randomly excluded in each cross-validation replicate, total 5 replicates, R object saved in RDS format)
File name: data_pre*.rds (Preprocessed datasets used to fit the various threshold models in each replicate, total 5 replicates, R object saved in RDS format)
File name: model_support*.rds (Posterior support (prediction probabilities) for all threshold models in each cross-validation replicate, total 5 replicates, R object saved in RDS format)
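The replicate design can be pictured as five independent random draws of 307 taxa from the full tip set. The Python sketch below shows one way to generate such draws; the function name, the seed, and sampling without replacement within a replicate are assumptions for illustration and do not reproduce the taxa stored in the taxa_remove*.rds files.

```python
import random


def draw_holdouts(taxa, n_holdout=307, n_replicates=5, seed=1):
    """Return one sorted list of held-out taxa per replicate.

    Each replicate is drawn independently, so the same taxon may appear
    in more than one replicate. The seed is an arbitrary illustrative value.
    """
    rng = random.Random(seed)
    pool = list(taxa)
    return [sorted(rng.sample(pool, n_holdout)) for _ in range(n_replicates)]
```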
References
Cornwell, W. K., R. G. FitzJohn, P. F. Stevens, A. Calaminus, D. E. Soltis, P. S. Soltis, R. Govaerts, I. J. Wright, J. Oleksyn, P. B. Reich, et al. 2013. Global woodiness database. Data from: Three keys to the radiation of angiosperms into freezing environments. doi:10.5061/dryad.63q27/2.
Hijmans, R., S. Cameron, J. Parra, P. Jones, and A. Jarvis. 2004. The worldclim interpolated global terrestrial climate surfaces. ver. 1.3. http://www.worldclim.org/, Accessed 13 February 2018.
Kattge, J., S. Diaz, S. Lavorel, I. C. Prentice, P. Leadley, G. Bönisch, E. Garnier, M. Westoby, P. B. Reich, I. J. Wright, et al. 2011. TRY–a global database of plant traits. Global Change Biology 17: 2905–2935.
Louca, S., and M. Doebeli. 2018. Efficient comparative phylogenetics on large trees. Bioinformatics 34: 1053–1055.
Schweingruber, F., and W. Landolt. 2005. The xylem database. http://www.wsl.ch/dendropro/xylemdb/, Accessed 13 Mar 2019.
Smith, S. A., and J. W. Brown. 2018. Constructing a broadly inclusive seed plant phylogeny. American Journal of Botany 105: 302–314.
Zanne, A. E., M. Westoby, D. S. Falster, D. D. Ackerly, S. R. Loarie, S. E. Arnold, and D. A. Coomes. 2010. Angiosperm wood structure: global patterns in vessel anatomy and their relation to wood density and potential conductivity. American Journal of Botany 97: 207–215.
Zanne, A. E., W. D. Pearse, W. K. Cornwell, D. J. McGlinn, I. J. Wright, and J. C. Uyeda. 2018. Functional biogeography of angiosperms: life at the extremes. New Phytologist 218: 1697–1709.
Zizka, A., R. E. Onstein, R. Rozzi, P. Weigelt, H. Kreft, M. J. Steinbauer, H. Bruelheide, and F. Lens. 2022. The evolution of insular woodiness. Proceedings of the National Academy of Sciences, USA 119(37): e2208629119.
