The ability of palaeontologists to correctly diagnose and classify new fossil species from incomplete morphological data is fundamental to our understanding of evolution. Different parts of the vertebrate skeleton have different likelihoods of fossil preservation and varying amounts of taxonomic information, which could bias our interpretations of fossil material. Substantial previous research has focused on the diversity and macroevolution of non-avian theropod dinosaurs. Theropods provide a rich dataset for analysis of the interactions between taxonomic diagnosability and fossil preservation. We use specimen data and formal taxonomic diagnoses to create a new metric, the Likelihood of Diagnosis (LoD), which quantifies the diagnostic likelihood of fossil species in relation to bone preservation potential. We use this to assess whether a taxonomic identification bias impacts the non-avian theropod fossil record. We find the patterns of differential species abundance and clade diversity are not a consequence of their relative diagnosability. Although there are other factors that bias the theropod fossil record, our results suggest patterns of relative abundance and diversity for theropods might be more representative of Mesozoic ecology than often considered.

Collected data:

Theropod_Completeness_data: Raw completeness data updated from Cashmore and Butler (2019). doi.org/10.5061/dryad.37c840g. Completeness data were primarily gathered from figures and descriptive text in the literature, supplemented by additional online sources, museum catalogues, first-hand observation of specimens, and via personal communications.

Theropod_Bone_occ_specimen_data: The number of individual occurrences of each bone per theropod specimen. Because of the way the completeness data were collected, these counts had to be projected from raw completeness scores using the expected number of elements in each continuous series (teeth, vertebrae, ribs, digits).

Theropod_Bone_occ_species_data: The number of individual occurrences of each bone per theropod species, projected from raw completeness scores.

Theropod_Diagnoses_data: The number of autapomorphies and 'unique combination' characters identified for each skeletal element, for all valid theropod species, taken from taxonomic diagnoses in the published literature. A complete 'unique combination' of characters was treated as equivalent to a single autapomorphy. Therefore, for each species with such a diagnosis, each individual character was scored as a proportion of the total number of characters in the combination (i.e., for a 'unique combination' of four characters, each character represents 25% of an autapomorphy).
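
The 'unique combination' weighting described above can be illustrated with a short sketch. It is written in Python purely for illustration (the dataset's own scripts are in R), and the function name, the input layout, and the example element names are assumptions rather than anything taken from the data files.

```python
from collections import defaultdict
from fractions import Fraction


def diagnostic_scores(autapomorphies, unique_combination):
    """Return a per-element diagnostic score for one species.

    autapomorphies: list of skeletal element names, one entry per autapomorphy.
    unique_combination: list of element names, one entry per character in the
        species' 'unique combination' (empty if the diagnosis has none).
    Both input layouts are hypothetical, not the dataset's actual format.
    """
    scores = defaultdict(Fraction)
    # Each autapomorphy counts as one full diagnostic character.
    for element in autapomorphies:
        scores[element] += 1
    # The whole unique combination counts as one autapomorphy, so each of
    # its n characters contributes 1/n to the element it belongs to.
    n = len(unique_combination)
    for element in unique_combination:
        scores[element] += Fraction(1, n)
    return dict(scores)


# Invented example: two autapomorphies plus a four-character combination.
example = diagnostic_scores(
    autapomorphies=["femur", "maxilla"],
    unique_combination=["femur", "tibia", "ilium", "ilium"],
)
```

In this invented case the femur scores 1 + 1/4, the ilium 2/4, and the scores across all elements sum to 3 (two autapomorphies plus one unique combination).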

R inputs:

R_input_bone_occs_species: Bone occurrences per species taken from Theropod_Bone_occ_species_data

R_input_bone_occs_species_0-1: Bone presence/absence per species taken from Theropod_Bone_occ_species_data

R_input_bone_occs_specimens: Bone occurrences per individual specimen taken from Theropod_Bone_occ_specimen_data

R_input_bone_occs_specimens_0-1: Bone presence/absence per individual specimen taken from Theropod_Bone_occ_specimen_data

R_input_characters_autaps&UCs: Diagnostic characters identified for each skeletal element for all valid theropod species taken from Theropod_Diagnoses_data

R_input_pbdb_data_all_taxa: Non-avian theropod dinosaur occurrence data downloaded from the Paleobiology Database (PBDB) and manually cleaned and manipulated for key data extraction
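
The relationship between the occurrence-count inputs and their "0-1" counterparts can be sketched as follows. This assumes the presence/absence files were obtained by recording any count above zero as 1; the nested-dictionary layout, taxon labels, and bone names are hypothetical, and the sketch is in Python rather than the R used for the dataset's analyses.

```python
def to_presence_absence(occurrences):
    """Convert {taxon: {bone: count}} into {taxon: {bone: 0 or 1}}."""
    return {
        taxon: {bone: int(count > 0) for bone, count in bones.items()}
        for taxon, bones in occurrences.items()
    }


# Invented counts, not values from the dataset.
counts = {
    "Species A": {"femur": 2, "dorsal vertebrae": 7, "furcula": 0},
    "Species B": {"femur": 0, "dorsal vertebrae": 3, "furcula": 1},
}
presence = to_presence_absence(counts)
```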

Output data reused in analyses:

R_output_diag_df_LoD: Species-specific data including calculated LoD scores and abundance proxies. Used for species-specific statistical comparisons.

R_output_subgroup_data: Subgroup data including calculated mean LoD scores and summarised abundance proxies, derived from personal data collection.

R_output_subgroup_data2: Subgroup data including calculated mean LoD scores and summarised abundance proxies, with additional PBDB data included. Used for subgroup statistical comparisons.

R_output_temporal_data_pers: Temporal data including calculated mean LoD scores and summarised abundance proxies, derived from personal data collection. Used for temporal statistical comparisons.

R_output_temporal_data_pbdb: Temporal data including calculated mean LoD scores and summarised abundance proxies, derived from PBDB data. Used for temporal statistical comparisons.

Formations_data_tables: Raw species-specific input data (personal and PBDB data) and formation-specific outputs generated in R, used in formation-based statistical comparisons. The formation data workflow is described in the file.

R code: R code. LoD calculation; R code. Taxonomic comparisons; R code. Formation comparisons; R code. Temporal comparisons

Analytical supporting results:

Support results table. Ljung-Box_tests_pbdb_data_residuals: Results from Ljung-Box residual autocorrelation tests of GLS models comparing temporal changes in mean LoD to abundance proxies derived from PBDB data.

Support results table. Ljung-Box_tests_personal_data_residuals: Results from Ljung-Box residual autocorrelation tests of GLS models comparing temporal changes in mean LoD to abundance proxies derived from personal data collection.

Support results table. Shapiro-Wilk_tests_pbdb_temporal_data_residuals: Results from Shapiro-Wilk's normality tests of residuals from GLS models comparing temporal changes in mean LoD to abundance proxies derived from PBDB data.

Support results table. Shapiro-Wilk_tests_personal_temporal_data_residuals: Results from Shapiro-Wilk's normality tests of residuals from GLS models comparing temporal changes in mean LoD to abundance proxies derived from personal data collection.

Support results table. Shapiro-Wilk_tests_Formation_LoD_dist: Results of Shapiro-Wilk's normality tests of the distribution of LoD scores for each of the most speciose theropod-bearing formations.

Support results table. Shapiro-Wilk_tests_Stage_LoD_dist: Results of Shapiro-Wilk's normality tests of the distribution of LoD scores for each Mesozoic geological stage.

Support results table. Shapiro-Wilk_tests_Subgroup_dist: Results of Shapiro-Wilk's normality tests of the distribution of LoD scores for each taxonomic subgroup.

Support results table. Subgroup_comparisons_mean,trim,wins: Results of GLS models comparing the trimmed and Winsorized mean LoD with species richness and selected abundance proxies across taxonomic subgroups.

Support results table. Temporal_comparisons_mean,trim,wins: Results of GLS models comparing the trimmed and Winsorized mean LoD with species richness and selected abundance proxies through geological time.
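
For readers unfamiliar with the two robust averages named in these tables, the sketch below shows one common way of computing them. It is a Python illustration only, not the R code distributed with the dataset; the trimming proportion used here is an assumption (the description above does not state one), and the input values are invented rather than real LoD scores.

```python
def _tail_count(n, proportion):
    """Number of values affected in each tail for a given proportion."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    return int(n * proportion)


def trimmed_mean(values, proportion):
    """Mean after discarding the lowest and highest `proportion` of values."""
    ordered = sorted(values)
    k = _tail_count(len(ordered), proportion)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, proportion):
    """Mean after replacing each tail with the nearest retained value."""
    ordered = sorted(values)
    k = _tail_count(len(ordered), proportion)
    if k > 0:
        ordered[:k] = [ordered[k]] * k        # raise the low tail
        ordered[-k:] = [ordered[-k - 1]] * k  # lower the high tail
    return sum(ordered) / len(ordered)


# Invented values with one extreme outlier.
scores = [1, 2, 3, 5, 100]
plain = sum(scores) / len(scores)
trimmed = trimmed_mean(scores, 0.2)
winsorized = winsorized_mean(scores, 0.2)
```

With these invented values the plain mean is pulled up to 22.2 by the outlier, whereas the trimmed mean averages only 2, 3, and 5, and the Winsorized mean averages 2, 2, 3, 5, and 5.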

Data from: Taxonomic identification bias does not drive patterns of abundance and diversity in theropod dinosaurs
