Estimating ancestral states of complex characters: A case study on the evolution of feathers
Data files
Oct 21, 2025 version files 699.07 KB
- ASE_Results.zip (659.89 KB)
- Data.zip (34.55 KB)
- README.md (4.62 KB)
Abstract
Feathers are a key novelty underpinning the evolutionary success of birds, yet the origin of feathers remains poorly understood. Debates about feather evolution hinge upon whether filamentous integument has evolved once or multiple times independently in the lineage leading to modern birds. The contradictory results reported so far stem from methodological differences in statistical ancestral state estimation. Here, we conduct a comprehensive comparison of ancestral state estimation (ASE) methodologies applied to stem-group birds, testing the role of outgroup inclusion, tree time scaling method, model choice, and character coding strategy. Models are compared based on their Akaike Information Criterion (AIC) scores, their mutual information, and the uncertainty of their marginal ancestral state estimates. Our results demonstrate that ancestral state estimates of stem-bird integument are strongly influenced by tree time scaling method, outgroup selection, and model choice, while character coding strategy has less effect on the ancestral estimates produced. We identify the best-fitting and most generalizable models using AIC scores and leave-one-out cross-validation (LOOCV), respectively. Our analyses broadly support the independent origin of filamentous integument in dinosaurs and pterosaurs and support a younger evolutionary origin of feathers than has been suggested previously. In terms of model selection, we observe little correlation between AIC/AICc and LOOCV error, suggesting that, for our dataset, model fit does not reliably predict generalizability. However, both approaches favor models that infer a similar pattern of feather evolution. More broadly, our study highlights that special care must be taken in selecting the outgroup, tree, and model when conducting ASE analyses.
NB: Please note that, due to the sudden and tragic passing of Pierre Cockx, the first author of this study, we have been unable to update the Dryad submission with the full level of detail that would normally be provided.
The scripts and data required to run the analyses described and produce the plots are provided here.
Data
The folder Data contains the following files:
- coding3: plumage data using coding strategy 3
- coding3_croco: same but with Crocodylus instead of Aetosaurus in the outgroups
- codingstrategies_plumage: file with plumage data coded using the different strategies described in the paper
- nodes: file with the age of the Avemetatarsalian node, used for time scaling of the tree
- ranges; ranges_croc: first and last appearance dates (FAD and LAD) of the taxa
- tree_raw; tree_raw_croco: raw trees
- treetime_croc: time-scaled tree with outgroup 4 for Experiment #1
- treetime_equal_tpp: time-scaled tree using the Equal (timePaleoPhy) method
- treetime_equal_dp: time-scaled tree using the Equal (DatePhylo) method
- treetime_mbl_tpp: time-scaled tree using the minimum branch length method
The folder ASE_Results contains the following files:
- Results of the ancestral state estimation (ASE) analyses conducted with 63 tree/model combinations (7 × 3 × 3):
  - 7 evolutionary models: Unordered, Ordered, Embedded Dependency, SMM switch, SMM independent, HRM with 2 rate categories, and HRM with 3 rate categories
  - 3 transition rate models: equal rates (ER), symmetric (SYM), and all rates different (ARD)
  - 3 time-scaled trees: Equal (timePaleoPhy), Equal (DatePhylo), and minimum branch length 'mbl'
- ASE results for each of the experiments: experiment1_outgroup; experiment2_timescaling; experiment3_codingstrat; experiment4_models
- Results arranged for plotting with subset trees to facilitate comparisons (figures of the main text): experiment1_outgroup_dataforplot; experiment2_timescaling_dataforplot; experiment3_codingstrat_dataforplot; experiment4_models_dataforplot
- Results arranged for plotting with subset trees using ER (figures of the SI): experiment1_outgroup_dataforplot_ER_SI; experiment2_timescaling_dataforplot_ER_SI; experiment3_codingstrat_dataforplot_ER_SI; experiment4_models_dataforplot_ER_SI
- Excel file with calculation of the weights for model averaging: model_averaging_weigths
- Excel file with metrics comparing the models, for plotting Figure 7: model_metrics_dataforplot
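As a rough illustration of what model-averaging weights usually look like, the sketch below computes standard Akaike weights from a set of AIC scores. We have not verified that the Excel file model_averaging_weigths follows exactly this formula, and the model names and AIC values in the example are hypothetical placeholders, not results from the dataset.

```python
import math


def akaike_weights(aic_scores):
    """Return Akaike weights for a dict mapping model name -> AIC score.

    Standard formula: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
    where delta_i is the AIC difference from the best (lowest-AIC) model.
    """
    best = min(aic_scores.values())
    # Relative likelihood of each model given its distance from the best model.
    rel = {name: math.exp(-0.5 * (aic - best)) for name, aic in aic_scores.items()}
    total = sum(rel.values())
    return {name: value / total for name, value in rel.items()}


# Hypothetical AIC scores, for illustration only (NOT taken from the dataset).
example_aic = {"ER_unordered": 210.4, "SYM_ordered": 208.1, "ARD_HRM2": 215.9}
weights = akaike_weights(example_aic)

# Print the models from highest to lowest weight.
for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {w:.3f}")
```

The weights sum to 1, and the model with the lowest AIC always receives the largest weight. Averaged ancestral state probabilities would then be obtained by weighting each model's marginal estimates accordingly.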
Code
The folder Scripts_Analyses_Plots contains the scripts required to conduct the ASE analyses with the 63 model-tree combinations:
- ER+ER_ORD+ER_ED: script used for the analyses under the ER model with the Unordered, Ordered, and Embedded Dependency models
- SYM+SYM_ORD+SYM_ED: script used for the analyses under the SYM model with the Unordered, Ordered, and Embedded Dependency models
- ARD+ARD_ORD+ARD_ED: script used for the analyses under the ARD model with the Unordered, Ordered, and Embedded Dependency models
- SMM_ER: script used for the analyses under the ER model with the switch and independent variants of the Structured Markov Model (SMM)
- SMM_SYM: script used for the analyses under the SYM model with the switch and independent variants of the Structured Markov Model (SMM)
- SMM_ARD: script used for the analyses under the ARD model with the switch and independent variants of the Structured Markov Model (SMM)
- HRM_ER: script used for the analyses under the ER model with the Hidden Rates Model (HRM) with 2 and 3 rate categories
- HRM_SYM: script used for the analyses under the SYM model with the Hidden Rates Model (HRM) with 2 and 3 rate categories
- HRM_ARD: script used for the analyses under the ARD model with the Hidden Rates Model (HRM) with 2 and 3 rate categories
- LOOCV: script used to conduct leave-one-out cross-validation
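To make the bookkeeping explicit, the following sketch enumerates how the nine analysis scripts listed above could cover the 63 model-tree combinations. It assumes that each script is run on all three time-scaled trees, which the list above does not state explicitly. The tree labels are shortened from the file names in the Data folder, and the variable names are our own.

```python
from itertools import product

RATE_MODELS = ["ER", "SYM", "ARD"]
# Labels shortened from treetime_equal_tpp, treetime_equal_dp, treetime_mbl_tpp.
TREES = ["equal_tpp", "equal_dp", "mbl_tpp"]

# Script name templates and the evolutionary models each script covers,
# following the script list above.
SCRIPT_FAMILIES = {
    "{rate}+{rate}_ORD+{rate}_ED": ["Unordered", "Ordered", "Embedded Dependency"],
    "SMM_{rate}": ["SMM switch", "SMM independent"],
    "HRM_{rate}": ["HRM 2 rate categories", "HRM 3 rate categories"],
}

combinations = []
for rate, (template, models) in product(RATE_MODELS, SCRIPT_FAMILIES.items()):
    script = template.format(rate=rate)
    # Assumption: every script is applied to each of the three trees.
    for model, tree in product(models, TREES):
        combinations.append((script, rate, model, tree))

scripts = sorted({combo[0] for combo in combinations})
print(len(scripts))       # 9 scripts
print(len(combinations))  # 3 rate models x 7 evolutionary models x 3 trees = 63
```

Each tuple in `combinations` names the script, the transition rate model, the evolutionary model, and the tree for one analysis, which can help when matching result files to the script that produced them.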
Scripts for conducting the experiments described in the main paper and for plotting the figures are also provided in the folder Scripts_Analyses_Plots.
- Experiment1; Experiment2; Experiment3: scripts used for conducting the experiments.
- Fig1; Fig3_Experiment1; Fig4_Experiment2; Fig5_Experiment3; Fig6_Experiment4; Fig7; Fig8+SI_FigS5: scripts used for producing the figures of the main text and Fig. S5 of the SI.
- SI_FigS1; SI_FigS2; SI_FigS3; SI_FigS4: scripts used for the other figures of the SI.
The folder Scripts_timescaling contains the scripts used for the a posteriori time scaling of the raw tree:
- treetime: script used for time scaling under the Equal (timePaleoPhy), Equal (DatePhylo), and minimum branch length (mbl) methods
- timetree_croco: script used for time scaling of the raw tree with Crocodylus in the outgroup
