Data from: How to validate a Bayesian evolutionary model
Data files
Nov 12, 2024 version files 1.74 MB
- coal_BHV_dist.csv (28.53 KB)
- coal_KC_dist.csv (25.96 KB)
- coal_max_branch_dist.csv (29.01 KB)
- coal_range_branch_dist.csv (29.18 KB)
- coal_RF_dist.csv (1.11 KB)
- coal_tree_length_dist.csv (28.69 KB)
- covg_match_100_to_200_tips_yule_bm.RData (11.81 KB)
- covg_match_3_to_300_tips_yule_bm.RData (11.54 KB)
- covg_mismatch_3_to_300_tips_yule_bm.RData (11.37 KB)
- README.md (4.51 KB)
- RUV_height_scenario_1.csv (246.48 KB)
- RUV_height_scenario_2.csv (245 KB)
- RUV_height_scenario_3.csv (256.32 KB)
- RUV_scenario_1.csv (220.15 KB)
- RUV_scenario_2.csv (218.89 KB)
- RUV_scenario_3.csv (220.12 KB)
- RUV_topological.csv (148.41 KB)
Abstract
Biology has become a highly mathematical discipline in which probabilistic models play a central role. As a result, research in the biological sciences is now dependent on computational tools capable of carrying out complex analyses. These tools must be validated before they can be used, but what is understood as validation varies widely among methodological contributions. This may be a consequence of the still embryonic stage of the literature on statistical software validation for computational biology. Our manuscript aims to advance this literature. Here, we describe, illustrate, and introduce new good practices for assessing the correctness of a model implementation, with an emphasis on Bayesian methods. We also introduce a suite of functionalities for automating validation protocols. It is our hope that the guidelines presented here help sharpen the focus of discussions on (as well as elevate) expected standards of statistical software for biology.
The main supplementary file (text document) is mendes_etal_validation_supp_2024.pdf. You should find this file hosted on Zenodo and pointed to by Dryad.
This Dryad repository also hosts key output files for reproducing the figures in the manuscript. Namely:
- covg_match_3_to_300_tips_yule_bm.RData: R workspace containing table with simulated and inferred (HPD) values of parameters investigated in “Scenario 1”. This data underlies coverage validation plots (Fig. 4 and 7) in the main manuscript;
- covg_mismatch_3_to_300_tips_yule_bm.RData: R workspace containing table with simulated and inferred (HPD) values of parameters investigated in “Scenario 2”. This data underlies coverage validation plots (Fig. 4 and 7) in the main manuscript;
- covg_match_100_to_200_tips_yule_bm.RData: R workspace containing table with simulated and inferred (HPD) values of parameters investigated in “Scenario 3”. This data underlies coverage validation plots (Fig. 4 and 7) in the main manuscript;
- coal_BHV_dist.csv: Table containing simulated, inferred (HPD) values, and rank values of the BHV phylogenetic-space statistic, for the coalescent model mentioned in the main text (Fig. 8), and examined in more detail in the supplement (Supplementary Figs. 4 and 5);
- coal_KC_dist.csv: Table containing simulated, inferred (HPD) values, and rank values of the KC phylogenetic-space statistic, for the coalescent model mentioned in the main text (Fig. 8), and examined in more detail in the supplement (Supplementary Figs. 4 and 5);
- coal_RF_dist.csv: Table containing simulated, inferred (HPD) values, and rank values of the Robinson-Foulds phylogenetic-space statistic, for the coalescent model mentioned in the main text (Fig. 8), and examined in more detail in the supplement (Supplementary Figs. 4 and 5);
- coal_max_branch_dist.csv: Table containing simulated, inferred (HPD) values, and rank values of the largest branch length (LB) phylogenetic-space statistic, for the coalescent model mentioned in the main text (Fig. 8), and examined in more detail in the supplement (Supplementary Figs. 4 and 5);
- coal_range_branch_dist.csv: Table containing simulated, inferred (HPD) values, and rank values of the maximum branch distance (R) phylogenetic-space statistic, for the coalescent model mentioned in the main text (Fig. 8), and examined in more detail in the supplement (Supplementary Figs. 4 and 5);
- coal_tree_length_dist.csv: Table containing simulated, inferred (HPD) values, and rank values of the tree length (LEN) phylogenetic-space statistic, for the coalescent model mentioned in the main text (Fig. 8), and examined in more detail in the supplement (Supplementary Figs. 4 and 5);
- RUV_topological.csv: This table merges all “coal_*” tables listed above;
- RUV_scenario_1.csv: Table containing simulated, inferred (HPD) values, and rank values of the parameters investigated in “Scenario 1”. This data underlies RUV plots (Fig. 6) in the main manuscript;
- RUV_scenario_2.csv: Table containing simulated, inferred (HPD) values, and rank values of the parameters investigated in “Scenario 2”. This data underlies RUV plots (Fig. 6) in the main manuscript;
- RUV_scenario_3.csv: Table containing simulated, inferred (HPD) values, and rank values of the parameters investigated in “Scenario 3”. This data underlies RUV plots (Fig. 6) in the main manuscript;
- RUV_height_scenario_1.csv: Table containing simulated, inferred (HPD) values, and rank values of the root age in “Scenario 1”. This data underlies RUV plots (Fig. 7) in the main manuscript;
- RUV_height_scenario_2.csv: Table containing simulated, inferred (HPD) values, and rank values of the root age in “Scenario 2”. This data underlies RUV plots (Fig. 7) in the main manuscript;
- RUV_height_scenario_3.csv: Table containing simulated, inferred (HPD) values, and rank values of the root age in “Scenario 3”. This data underlies RUV plots (Fig. 7) in the main manuscript.
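As a rough illustration of how tables like these can be summarized, the Python sketch below computes (i) coverage, i.e., the fraction of simulations whose true value falls inside the inferred HPD interval, and (ii) a binned histogram of rank values, which should look roughly flat when ranks are uniform. The column names (`true_value`, `hpd_lower`, `hpd_upper`, `rank`), the helper functions, and the inline toy table are all hypothetical and used for illustration only; check the headers of the actual CSV files before adapting it. It is not the pipeline used in the manuscript, which is documented in the GitHub repository.

```python
import csv
import io


def coverage(rows, true_col, lower_col, upper_col):
    """Fraction of rows whose true value lies inside [lower, upper]."""
    hits = 0
    total = 0
    for row in rows:
        truth = float(row[true_col])
        lower = float(row[lower_col])
        upper = float(row[upper_col])
        total += 1
        if lower <= truth <= upper:
            hits += 1
    if total == 0:
        raise ValueError("table has no rows")
    return hits / total


def rank_histogram(rows, rank_col, n_bins, max_rank):
    """Count ranks (integers in 0..max_rank) falling into n_bins equal-width bins."""
    counts = [0] * n_bins
    for row in rows:
        rank = int(float(row[rank_col]))
        # Map the rank to a bin index, clamping the top rank into the last bin.
        bin_index = min(rank * n_bins // (max_rank + 1), n_bins - 1)
        counts[bin_index] += 1
    return counts


# Toy table with HYPOTHETICAL column names; it is not data from this repository.
TOY_CSV = """true_value,hpd_lower,hpd_upper,rank
1.0,0.5,1.5,3
2.0,2.1,3.0,0
0.3,0.1,0.4,7
5.0,4.0,6.0,9
"""

toy_rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
toy_coverage = coverage(toy_rows, "true_value", "hpd_lower", "hpd_upper")
toy_counts = rank_histogram(toy_rows, "rank", n_bins=2, max_rank=9)
print(toy_coverage, toy_counts)
```

To apply this to one of the CSV files above, replace the `io.StringIO(TOY_CSV)` source with an opened file handle and substitute the file's real column headers. The `.RData` workspaces need to be opened in R (or converted) first.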
Finally, the interested reader can also find the GitHub repository for this paper here. In that repository you will find instructions and the necessary files to reproduce the entire pipeline of the (i) coverage, and (ii) RUV analyses, as we did.