New frontiers in dinosaur exploration

Maidment, Susannah; Butler, Richard

Published May 15, 2025 on Dryad. https://doi.org/10.5061/dryad.05qfttfd3

Data files

May 15, 2025 version files (11.34 MB)

Ages.txt (4.03 KB)
Dino_tree_R1.nex (7.50 KB)
Dinos.xlsx (505.22 KB)
Dinosaur_all_publications_2014-2023_summary.xlsx (17.59 KB)
Dinosaur_data.xlsx (9.25 MB)
Dinosaur_taxonomy_2014-2023_summary.xlsx (17.54 KB)
pbdb_data_occurrences.csv (1.53 MB)
README.md (4.21 KB)

Abstract

200 years after the naming of the first dinosaur, taxonomic studies remain an important component of dinosaur research. Around 50 new dinosaurs are named each year, and are discovered from across the globe. The rate of new dinosaur discovery shows no signs of slowing, but not all geographic areas and temporal windows have been equally investigated. The potential for new dinosaur discoveries in India and Africa seems particularly high, while the Carnian, when dinosaurs probably originated, and the Middle Jurassic, when the major clades diversified, offer the best opportunities to make discoveries that will fundamentally change our understanding of dinosaur evolution. A major challenge to the discovery of new dinosaurs is funding. Frontier fieldwork is sometimes viewed as too risky to fund, while basic taxonomic work is considered to lack impact. As a consequence, we risk an ‘extinction of experience’, where researchers have limited training in the basic field and specimen-based research that underpins our discipline. Going forward, new remote sensing techniques may help to find prospective areas, while 3D scanning apps on smartphones will allow us to quickly record field data. Artificial intelligence is likely to be used increasingly for CT segmentation and identification of problematic fossils.

Description of the data and file structure

These data were collected to review the state of dinosaur taxonomy and systematics today, as part of an invited review titled 'New Frontiers in Dinosaur Exploration'. The raw data tables in xlsx and csv format were downloaded from the Paleobiology Database or Scopus and then cleansed according to the methods provided here and in the publication. The .txt file was compiled from the literature, while the .nex file is a phylogenetic tree that represents a consensus dinosaur phylogeny and was hand-built in Mesquite.

Files and variables

File: Ages.txt

Description: A file listing the first and last appearance data for the dinosaur taxa in the phylogenetic tree. This file is needed for time-calibration of the phylogenetic tree (Dino_tree_R1.nex) and is used by the code "Time-calibration_palaeotree.R".

Variables

Taxon: This is a list of dinosaur taxa that appear in the phylogenetic tree
FAD: first appearance datum
LAD: last appearance datum
#Ages based on International Chronostratigraphic Chart 2024-12#: The leading hash (#) marks this column as a comment, so R does not read its contents; the column records the rationale for each FAD and LAD.
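
The effect of the hash on import can be illustrated with a minimal sketch in base R. It assumes that Ages.txt is whitespace- or tab-delimited and that the rationale text on every row begins with "#"; neither detail is stated above. The taxon names and ages below are placeholders, not values from the file.

```r
# Toy stand-in for Ages.txt (placeholder taxa and ages, assumed layout).
ages_text <- c(
  "Taxon\tFAD\tLAD\t#Ages based on International Chronostratigraphic Chart 2024-12#",
  "Taxon_A\t201.4\t192.9\t#example rationale",
  "Taxon_B\t83.6\t72.2\t#example rationale"
)

# comment.char = "#" makes read.table drop everything from the hash to the
# end of each line, so only Taxon, FAD and LAD are read.
# row.names = 1 stores the taxon names as row names.
ages <- read.table(text = ages_text, header = TRUE,
                   comment.char = "#", row.names = 1)

print(ages)
```

For the real file, `text = ages_text` would be replaced by the path to Ages.txt, provided the layout matches these assumptions.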

File: Dino_tree_R1.nex

Description: A consensus phylogeny of Dinosauria, hand-built in Mesquite. The tree is time-calibrated using the code "Time-calibration_palaeotree.R" and the file "Ages.txt".

File: Dinos.xlsx

Description: A dataset for building collector curves, downloaded from the PBDB and cleansed according to the methods. Blank cells in this spreadsheet represent missing data. It is used with the code "Collector_curve_code.R".

Variables

taxon_rank: the Linnean rank of the taxon
taxon_name: the name of the taxon
continent: the continent the taxon came from
how_discovered: empty column for this dataset
taxon_attr: the taxonomic authority for the taxon name
difference: current validity status of the taxon
accepted_rank: the Linnean or cladistic rank of the variable 'accepted_name'
accepted_name: the currently accepted name of the taxon
parent_name: the higher-level classification of the taxon
ref_author: the author who named the taxon
ref_pubyr: the year the taxon was published
n_occs: number of occurrences
class: the taxon class
family: the taxon family
genus: the taxon genus
primary_reference: the primary reference for the taxon.

File: Dinosaur_data.xlsx

Description: A dataset for building maps, downloaded from the PBDB and cleansed according to the methods. Blank cells in this spreadsheet represent missing data. It is used with the code "Map.R".

Variables:

This file has a very large number of variables, all derived from the PBDB; most of them are not used by the script. Details of all of the variables can be found here: https://paleobiodb.org/data1.2/

File: pbdb_data_occurrences.csv

Description: A dataset for building the occurrence histogram, downloaded from the PBDB and cleansed according to the methods. Blank cells in this spreadsheet represent missing data. It is used with the code "Occurrences_curve.R".

Variables:

File: Dinosaur_all_publications_2014-2023_summary.xlsx

Description: Raw data from the Scopus searches on dinosaur publications reported in the manuscript.

File: Dinosaur_taxonomy_2014-2023_summary.xlsx

Description: Raw data from the Scopus searches on dinosaur taxonomy reported in the manuscript.

Code/software

The data files can be viewed in any text editor, Excel, or Google Sheets. The code can be run in R.

Access information

Other publicly accessible locations of the data:

www.paleobiodb.org.

Methods

Collector curves–All dinosaur regular genera and species, both valid and invalid, were downloaded from the Paleobiology Database (PBDB; paleobiodb.org) on 17th December 2024. The data were cleaned to remove Avialae, ichnotaxa, and ootaxa. Taxa that were listed as invalid due to misspellings or obsolete variants, or that were renamed for grammatical or linguistic reasons, were removed. Nomina dubia, nomina nuda, objective and subjective synonyms, and recombinations were retained. Collector curves (Fig. 1) were built in R 3.4.0 [124]. Code and raw data are available in the Supplementary Material.
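
A collector curve of this kind is a cumulative count of named taxa by year of publication. The following is a minimal sketch in base R, not the contents of "Collector_curve_code.R". It uses the column names ref_pubyr and difference from Dinos.xlsx; the status labels "misspelling of" and "obsolete variant of" are assumed values of the difference column, and the rows are placeholders.

```r
# Toy stand-in for Dinos.xlsx (placeholder taxa; assumed status labels).
dinos <- data.frame(
  taxon_name = c("Taxon_A", "Taxon_B", "Taxon_C", "Taxon_D", "Taxon_E", "Taxon_F"),
  difference = c("", "nomen dubium", "misspelling of", "",
                 "obsolete variant of", "subjective synonym of"),
  ref_pubyr  = c(1824, 1825, 1900, 1825, 1905, 1877)
)

# Remove names that are invalid for purely nomenclatural reasons;
# nomina dubia and synonyms are retained.
drop_status <- c("misspelling of", "obsolete variant of")
kept <- dinos[!dinos$difference %in% drop_status, ]

# Count names per publication year, then accumulate through time.
per_year <- table(kept$ref_pubyr)
curve <- data.frame(
  year       = as.integer(names(per_year)),
  cumulative = cumsum(as.vector(per_year))
)

print(curve)
```

Plotting `curve$cumulative` against `curve$year` as a step or line plot gives the collector curve; splitting `kept` by continent first would give one curve per region.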

Time-calibrated phylogeny–A consensus dinosaur phylogeny was manually produced in Mesquite [125]. First and last appearance data were collected for all taxa in the phylogeny and are listed in the data file provided in the Supplementary Material. First and last appearances generally correspond to the earliest and latest dates of the Stage from which the taxon is known, unless more accurate information was available in the literature. Ages were derived from the International Chronostratigraphic Chart version 2024/12. The consensus phylogeny was time-calibrated using the timePaleoPhy function in the R package paleotree [126] with minimum branch lengths of 0.5 million years. Code and raw data are available in the Supplementary Material.
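
The calibration step can be sketched as follows, assuming the packages ape and paleotree are installed. This is an illustration rather than the contents of "Time-calibration_palaeotree.R": the four-taxon tree and the ages are placeholders, and the choice of `type = "mbl"` with `vartime = 0.5` is an assumed reading of "minimum branch lengths of 0.5 million years".

```r
library(ape)
library(paleotree)

# Placeholder cladogram without branch lengths; the real tree would be
# read with read.nexus() from the .nex file.
tree <- read.tree(text = "((Taxon_A,Taxon_B),(Taxon_C,Taxon_D));")

# First (FAD) and last (LAD) appearance dates in millions of years before
# present; row names must match the tip labels of the tree.
ages <- matrix(
  c(230, 225,
    230, 220,
    170, 165,
    168, 160),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("Taxon_A", "Taxon_B", "Taxon_C", "Taxon_D"),
                  c("FAD", "LAD"))
)

# "mbl" (minimum branch length) stretches any branch shorter than vartime,
# pushing older nodes back in time to keep the order of events.
timetree <- timePaleoPhy(tree, ages, type = "mbl", vartime = 0.5)

print(timetree$edge.length)
```

Taxon_A and Taxon_B share a first appearance date, so without the minimum branch length their common ancestor would sit on zero-length branches.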

Occurrences through time histogram–All dinosaur body fossil occurrences were downloaded from the PBDB on 22nd January 2025 and were manually cleansed to remove ichnotaxa and ootaxa. The midpoint between the first appearance datum and the last appearance datum was taken for each occurrence, and these midpoints were plotted in 1-million-year bins in R 3.4.0 [124]. Code and raw data are available in the Supplementary Material.
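
The midpoint-and-bin calculation can be sketched in base R as below. The column names max_ma and min_ma are assumed to be the PBDB age bounds present in pbdb_data_occurrences.csv, and the four rows are placeholder values; this is not the contents of "Occurrences_curve.R".

```r
# Toy stand-in for the occurrence table (placeholder ages in Ma).
occ <- data.frame(
  max_ma = c(72.1, 83.6, 70.0, 152.1),
  min_ma = c(66.0, 72.1, 68.5, 145.0)
)

# Midpoint of the age range of each occurrence.
occ$mid_ma <- (occ$max_ma + occ$min_ma) / 2

# Bin edges every 1 million years, spanning all midpoints.
breaks <- seq(floor(min(occ$mid_ma)), ceiling(max(occ$mid_ma)), by = 1)

# plot = FALSE returns the counts per bin without drawing.
h <- hist(occ$mid_ma, breaks = breaks, plot = FALSE)

print(h$counts[h$counts > 0])
```

Calling `plot(h)` draws the histogram; reversing the x-axis so that time runs from old to young is a presentation choice left out of the sketch.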

Occurrences map–Non-avialan dinosaur occurrences, excluding form taxa, were downloaded from the PBDB (27th January 2025). These were manipulated in R 4.4.2. The data set was pruned to limit it to species type occurrences only. Triassic, Jurassic and Cretaceous occurrences were plotted onto modern-day maps using the packages maps and ggplot2. Code and raw data are available in the Supplementary Material.
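
A plot of this kind can be sketched as follows, assuming the packages ggplot2 and maps are installed. It is an illustration, not the contents of "Map.R": the coordinates are placeholders, lng and lat are assumed to be the PBDB coordinate columns, and the period column is a hypothetical label added for colouring.

```r
library(ggplot2)

# Placeholder occurrences (modern coordinates in decimal degrees).
occ <- data.frame(
  lng    = c(-110.5, 35.2, 78.9),
  lat    = c(45.1, -8.3, 22.4),
  period = c("Cretaceous", "Jurassic", "Triassic")
)

# Modern country outlines from the maps package, as a data frame.
world <- map_data("world")

p <- ggplot() +
  geom_polygon(data = world,
               aes(x = long, y = lat, group = group),
               fill = "grey85", colour = "white") +
  geom_point(data = occ,
             aes(x = lng, y = lat, colour = period)) +
  coord_quickmap() +
  labs(x = "Longitude", y = "Latitude", colour = "Period")

print(p)
```

Adding `facet_wrap(~ period)` would give one panel per period instead of a single colour-coded map.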

124. R Core Team. 2023. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

125. Maddison WP, Maddison DR. 2023. Mesquite: a modular system for evolutionary analysis. Version 3.81. http://www.mesquiteproject.org.

126. Bapst DW. 2012. paleotree: an R package for paleontological and phylogenetic analyses of evolution. Methods in Ecology and Evolution 3: 803–807.
