Timescales are of fundamental importance to evolutionary biology as they facilitate hypothesis tests of historical evolutionary processes. Through the incorporation of fossil occurrence data, the fossilised birth-death (FBD) process provides a framework for estimating divergence times using more palaeontological data than traditional node calibration approaches have allowed. The inclusion of more data can refine evolutionary timescale estimates, but for many taxonomic groups it is computationally infeasible to include all fossil occurrence data. Here, we utilise both empirical data and a simulation framework to identify approaches to subsampling fossil occurrence data that result in the most accurate estimates of divergence times. To achieve this, we assess the performance of the FBD-Skyline model when implementing multiple approaches to incorporating subsampled fossil occurrences. Our results demonstrate that it is necessary to account for all available fossil occurrence data to achieve the most accurate estimates of clade age. We show that this can be achieved if an empirical Bayes approach that accounts for fossil sampling through time is applied to the FBD process. Random subsampling of occurrence data can lead to estimates of clade age that are incompatible with fossil evidence if no control over the affinities of fossil occurrences is enforced. Our results call into question the accuracy of previous divergence time studies incorporating the FBD process that have used only a subsample of all available fossil occurrence data.
Supplementary Figure 1
Median age estimates and 95% HPDs obtained to demonstrate the behaviour of the FBD process without the subsampling of fossil occurrences. These results demonstrate the suitability of the simulation framework employed for subsequent analyses.
Positive_Control.pdf
Supplementary Figure 2
Median age estimates and 95% HPDs for Hymenoptera obtained using a range of approaches to constructing a subsample of fossil occurrences and constraining their placement. These approaches consist of: a uniform subsample of occurrences, with and without topological constraints; a uniform subsample of occurrences supplemented with the oldest unequivocal members of clades, which were then constrained to their respective crown groups; and a subsample consisting of only the oldest unequivocal members of clades, with and without topological constraints applied to constrain them to their respective crown groups.
Supp_Empirical.PDF
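The subsampling schemes described in this caption can be illustrated with a minimal sketch. It is not the code used in the study: the record fields (`id`, `clade`, `age`) and the function name `subsample_occurrences` are hypothetical, ages are assumed to be in millions of years (larger values are older), and the "oldest unequivocal member" of a clade is approximated simply as the occurrence with the greatest age.

```python
import random


def subsample_occurrences(occurrences, n, seed=0):
    """Draw a uniform random subsample of n occurrences, then add the
    oldest occurrence of every clade if it was not already drawn.

    `occurrences` is a list of dicts with hypothetical keys
    'id', 'clade' and 'age' (age in Ma, larger = older).
    """
    rng = random.Random(seed)
    # Uniform subsample without replacement.
    sample = rng.sample(occurrences, min(n, len(occurrences)))

    # Find the oldest occurrence recorded for each clade.
    oldest = {}
    for occ in occurrences:
        clade = occ["clade"]
        if clade not in oldest or occ["age"] > oldest[clade]["age"]:
            oldest[clade] = occ

    # Supplement the subsample with any oldest occurrence it is missing.
    chosen_ids = {occ["id"] for occ in sample}
    for occ in oldest.values():
        if occ["id"] not in chosen_ids:
            sample.append(occ)
            chosen_ids.add(occ["id"])
    return sample
```

Skipping the supplementing step gives the plain uniform subsample, and setting `n` to 0 gives the subsample that contains only the oldest member of each clade. Topological constraints are applied separately in the phylogenetic analysis and are not represented here.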
Supplementary Figure 3
The accuracy and precision of estimated node ages obtained from subsamples of 100 replicate fossil occurrence datasets after the addition of the oldest occurrences for each clade to the subsample. These occurrences were then topologically constrained to lineages that descend from the node either 1, 2, or 4 nodes below the direct ancestor of the occurrence. Each point represents the median posterior age estimate of one clade, with grey bars representing the 95% HPD for that node age estimate. When occurrences are placed one node below their direct ancestor, estimating the rate of fossil sampling produces the greatest accuracy. When occurrences are placed less accurately, the accuracy of age estimates decreases if the sampling rate is an estimated parameter of the FBD process. Conversely, fixing the rate of fossil sampling or placing an informed prior on this parameter improves accuracy as fossil placement becomes less accurate. In all cases in which fossil occurrences are placed at a node that is lower than their true ancestral fossil, the 95% HPDs of age estimates extend to ages that violate the minimum age of the clade, as implied by the complete sample of fossil occurrences.
Drop_Results.eps
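The check described in the caption above, whether a 95% HPD extends to ages younger than the minimum age implied by the fossil record, can be sketched as follows. This is an illustrative example rather than the analysis code behind the figure: the function names are hypothetical, the HPD is computed as the shortest interval containing the requested share of posterior samples, and ages are assumed to be in millions of years (larger values are older).

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the shortest interval holding `mass` of the posterior samples."""
    xs = sorted(samples)
    n = len(xs)
    # Number of samples that the interval must contain.
    k = max(1, math.ceil(mass * n))
    # Slide a window of k samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


def violates_minimum_age(samples, oldest_fossil_age, mass=0.95):
    """True if the younger HPD bound is younger than the clade's oldest fossil."""
    younger_bound, _ = hpd_interval(samples, mass)
    return younger_bound < oldest_fossil_age
```

Here `samples` would be the posterior node ages for one clade, and `oldest_fossil_age` the age of its oldest occurrence in the complete dataset.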