Data from: Fossilisation can mislead analyses of phenotypic disparity
Data files
Oct 08, 2024 version, 184.99 KB in total
empirical_data.zip (118.57 KB)
README.md (1.75 KB)
simulations.zip (2.48 KB)
tables.zip (62.19 KB)
Abstract
Analyses of morphological disparity can incorporate living and fossil taxa to facilitate the exploration of how phenotypic variation changes through time. However, taphonomic processes introduce non-random patterns of data loss in fossil data and their impact on perceptions of disparity is unclear. To address this, we characterise how measures of disparity change when simulated and empirical data are degraded through random and structured data loss. We demonstrate that both types of data loss can distort the disparity of clades, and that the magnitude and direction of these changes vary between the most commonly employed distance metrics and disparity indices. The inclusion of extant taxa and exceptionally preserved fossils mitigates these distortions and clarifies the full extent of the data lost, most of which would otherwise go uncharacterised. This facilitates the use of ancestral state estimation and evolutionary simulations to further control for the effects of data loss. Where the addition of such reference taxa is not possible, we urge caution in the extrapolation of general patterns in disparity from datasets that characterise subsets of phenotype, which may represent no more than the traits that they sample.
https://doi.org/10.5061/dryad.x69p8czmn
Thomas J. Smith, Robert S. Sansom, Davide Pisani & Philip C. J. Donoghue (2022)
Description of the data and file structure
folder: "empirical_data"
Contains the small (rep1) and large (complete) mammal datasets, each in NEXUS and RDS formats, and the mammal tree.
folder: "simulations"
Contains the tree and model used to simulate discrete character data.
folder: "scripts"
Contains all R scripts used to conduct analyses and plot figures.
folder: "tables"
subfolder: "simulated"
Contains the summary statistics for the changes in disparity induced through the removal of data from the simulated matrix for all combinations of data loss type, proportion of data removed, distance metric, and disparity index.
subfolder: "empirical"
Contains the summary statistics for the changes in disparity induced through the removal of data from the empirical matrices (small and large) for all combinations of data loss type, proportion of data removed, distance metric, and disparity index.
Summary statistic CSV file naming key:
sim = simulated matrix
small = small mammal matrix
large = large mammal matrix
all = extant + fossil disparity
fossil = fossil disparity
GED = generalised Euclidean distance
MORD = maximum observable rescaled distance
R = random data loss
NR = non-random data loss
1-4 = proportion of data removed (1 being least, 4 being most)
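The naming key above can be applied mechanically to a file name. The Python sketch below is only an illustration: it assumes the key tokens are separated by underscores, hyphens, or spaces and are case-sensitive, which this README does not state. The function `decode_name` and the example file name `sim_all_GED_NR_3.csv` are hypothetical, so check them against the actual names in tables.zip before relying on this.

```python
import re

# Token meanings copied from the naming key above.
# The grouping into fields (matrix, taxa, ...) is an assumption for illustration.
KEY = {
    "sim": ("matrix", "simulated matrix"),
    "small": ("matrix", "small mammal matrix"),
    "large": ("matrix", "large mammal matrix"),
    "all": ("taxa", "extant + fossil disparity"),
    "fossil": ("taxa", "fossil disparity"),
    "GED": ("distance", "generalised Euclidean distance"),
    "MORD": ("distance", "maximum observable rescaled distance"),
    "R": ("data_loss", "random data loss"),
    "NR": ("data_loss", "non-random data loss"),
}


def decode_name(filename):
    """Translate a summary-statistic file name into its key meanings.

    Hypothetical helper: assumes tokens are split by "_", "-", or spaces.
    Unrecognised tokens are ignored.
    """
    # Drop a trailing ".csv" extension if present.
    stem = filename[:-4] if filename.lower().endswith(".csv") else filename
    decoded = {}
    for token in re.split(r"[_\-\s]+", stem):
        if token in KEY:
            field, meaning = KEY[token]
            decoded[field] = meaning
        elif token in {"1", "2", "3", "4"}:
            # 1 = least data removed, 4 = most data removed.
            decoded["removal_level"] = int(token)
    return decoded


# Hypothetical file name, used only to show the decoding.
example = decode_name("sim_all_GED_NR_3.csv")
for field, meaning in example.items():
    print(f"{field}: {meaning}")
```

If the real file names use a different separator or token order, only the `re.split` pattern should need adjusting, because the lookup does not depend on token position.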
Abbreviations inside summary statistic CSV files:
MPD = mean pairwise distance
MPEFD = mean pairwise extant-fossil distance
MaxPD = maximum pairwise distance
SOV = sum of variances
SOR = sum of ranges
DBEFC = distance between extant and fossil centroids
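For readers unfamiliar with these indices, the Python sketch below computes each one on a toy example using its conventional definition. This README does not describe how the indices were calculated, so this is not the authors' implementation (their R scripts are the authority). The taxa, their two-dimensional coordinates, and the use of plain Euclidean distance in place of GED or MORD are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist
from statistics import mean, variance

# Hypothetical toy data: taxon name -> (coordinates, is_fossil).
# Real analyses would use a GED or MORD distance matrix and ordination axes.
taxa = {
    "extant_a": ((0.0, 0.0), False),
    "extant_b": ((2.0, 0.0), False),
    "fossil_a": ((0.0, 3.0), True),
    "fossil_b": ((2.0, 3.0), True),
}


def centroid(points):
    """Mean position of a set of points, axis by axis."""
    return tuple(mean(axis) for axis in zip(*points))


def disparity_indices(taxa):
    """Compute the six indices on toy coordinates (illustrative only)."""
    coords = [c for c, _ in taxa.values()]
    extant = [c for c, is_fossil in taxa.values() if not is_fossil]
    fossil = [c for c, is_fossil in taxa.values() if is_fossil]

    # Distances between every pair of taxa, and between extant-fossil pairs only.
    pairwise = [dist(a, b) for a, b in combinations(coords, 2)]
    cross = [dist(e, f) for e in extant for f in fossil]

    # One tuple of values per axis, for the variance- and range-based indices.
    axes = list(zip(*coords))

    return {
        "MPD": mean(pairwise),                        # mean pairwise distance
        "MPEFD": mean(cross),                         # mean pairwise extant-fossil distance
        "MaxPD": max(pairwise),                       # maximum pairwise distance
        "SOV": sum(variance(a) for a in axes),        # sum of (sample) variances per axis
        "SOR": sum(max(a) - min(a) for a in axes),    # sum of ranges per axis
        "DBEFC": dist(centroid(extant), centroid(fossil)),  # centroid separation
    }


results = disparity_indices(taxa)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

MPD, MPEFD, and MaxPD depend only on pairwise distances, whereas SOV, SOR, and DBEFC depend on positions along axes, which is one reason the two groups can respond differently to the same data loss.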