New frontiers in dinosaur exploration
Data files
May 15, 2025 version files (11.34 MB)
- Ages.txt (4.03 KB)
- Dino_tree_R1.nex (7.50 KB)
- Dinos.xlsx (505.22 KB)
- Dinosaur_all_publications_2014-2023_summary.xlsx (17.59 KB)
- Dinosaur_data.xlsx (9.25 MB)
- Dinosaur_taxonomy_2014-2023_summary.xlsx (17.54 KB)
- pbdb_data_occurrences.csv (1.53 MB)
- README.md (4.21 KB)
Abstract
200 years after the naming of the first dinosaur, taxonomic studies remain an important component of dinosaur research. Around 50 new dinosaurs are named each year, and are discovered from across the globe. The rate of new dinosaur discovery shows no signs of slowing, but not all geographic areas and temporal windows have been equally investigated. The potential for new dinosaur discoveries in India and Africa seems particularly high, while the Carnian, when dinosaurs probably originated, and the Middle Jurassic, when the major clades diversified, offer the best opportunities to make discoveries that will fundamentally change our understanding of dinosaur evolution. A major challenge to the discovery of new dinosaurs is funding. Frontier fieldwork is sometimes viewed as too risky to fund, while basic taxonomic work is considered to lack impact. As a consequence, we risk an ‘extinction of experience’, where researchers have limited training in the basic field and specimen-based research that underpins our discipline. Going forward, new remote sensing techniques may help to find prospective areas, while 3D scanning apps on smartphones will allow us to quickly record field data. Artificial intelligence is likely to be used increasingly for CT segmentation and identification of problematic fossils.
https://doi.org/10.5061/dryad.05qfttfd3
Description of the data and file structure
These data were collected to review the state of dinosaur taxonomy and systematics today, as part of an invited review titled 'New Frontiers in Dinosaur Exploration'. The raw data tables in xlsx and csv format were downloaded from the Paleobiology Database or Scopus and then cleansed according to the methods provided here and in the publication. The .txt file was compiled from the literature, while the .nex file is a phylogenetic tree that represents a consensus dinosaur phylogeny and was hand-built in Mesquite.
Files and variables
File: Ages.txt
Description: A file listing the first and last appearance data for the dinosaur taxa in the phylogenetic tree. This file is needed to time-calibrate the phylogenetic tree (Dino_tree_R1.nex) and is used by the code "Time-calibration_palaeotree.R".
Variables
- Taxon: This is a list of dinosaur taxa that appear in the phylogenetic tree
- FAD: first appearance datum
- LAD: last appearance datum
- #Ages based on International Chronostratigraphic Chart 2024-12#: The hash symbol (#) prevents any data in this column from being read into R, but the column provides the rationale for the FADs and LADs.
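The effect of the hash-marked column can be illustrated with a short sketch. The dataset's own scripts are written in R; the Python below is only an illustration. It assumes a tab-delimited layout with a header row, and the taxon names, ages, and notes in `SAMPLE` are made up, so the real Ages.txt may differ.

```python
import csv
import io

# Hypothetical excerpt: delimiter, taxa, ages and notes are illustrative only.
SAMPLE = (
    "Taxon\tFAD\tLAD\t#Ages based on International Chronostratigraphic Chart 2024-12#\n"
    "Taxon_A\t100.0\t90.0\t#note on source#\n"
    "Taxon_B\t80.0\t70.0\t#note on source#\n"
)

def read_ages(handle):
    """Return {taxon: (FAD, LAD)}, discarding everything from the first '#' on each line."""
    stripped = (line.split("#", 1)[0].rstrip() for line in handle)
    reader = csv.DictReader((line for line in stripped if line), delimiter="\t")
    return {row["Taxon"]: (float(row["FAD"]), float(row["LAD"])) for row in reader}

ages = read_ages(io.StringIO(SAMPLE))
```

Only the Taxon, FAD, and LAD columns survive; the rationale column stays in the file for human readers.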
File: Dino_tree_R1.nex
Description: A consensus phylogeny of Dinosauria, hand-built in Mesquite. The tree is time-calibrated using the code "Time-calibration_palaeotree.R" and the file "Ages.txt".
File: Dinos.xlsx
Description: A dataset for building collector curves, downloaded from the PBDB and cleansed according to the methods. Blank cells in this spreadsheet represent missing data. It is used with the code "Collector_curve_code.R".
Variables
- taxon_rank: the Linnean rank of the taxon
- taxon_name: the name of the taxon
- continent: the continent the taxon came from
- how_discovered: empty column for this dataset
- taxon_attr: the taxonomic authority for the taxon name
- difference: current validity status of the taxon
- accepted_rank: the Linnean or cladistic rank of the variable 'accepted_name'
- accepted_name: the currently accepted name of the taxon
- parent_name: the higher-level classification of the taxon
- ref_author: the author who named the taxon
- ref_pubyr: the year the taxon was published
- n_occs: number of occurrences
- class: the taxon class
- family: the taxon family
- genus: the taxon genus
- primary_reference: the primary reference for the taxon
File: Dinosaur_data.xlsx
Description: A dataset for building maps, downloaded from the PBDB and cleansed according to the methods. Blank cells in this spreadsheet represent missing data. It is used with the code "Map.R".
Variables
This file has a very large number of variables, all derived from the PBDB; most of them are not used by "Map.R". Details of all of the variables can be found here: https://paleobiodb.org/data1.2/
File: pbdb_data_occurrences.csv
Description: A dataset for building the occurrence histogram, downloaded from the PBDB and cleansed according to the methods. Blank cells in this file represent missing data. It is used with the code "Occurrences_curve.R".
Variables
This file has a very large number of variables, all derived from the PBDB; most of them are not used by "Occurrences_curve.R". Details of all of the variables can be found here: https://paleobiodb.org/data1.2/
File: Dinosaur_all_publications_2014-2023_summary.xlsx
Description: Raw data from the Scopus searches on dinosaur publications reported in the manuscript.
File: Dinosaur_taxonomy_2014-2023_summary.xlsx
Description: Raw data from the Scopus searches on dinosaur taxonomy reported in the manuscript.
Code/software
The data files can be viewed in any text editor, Excel, or Google Sheets. The code can be run in R.
Access information
Other publicly accessible locations of the data:
Collector curves–All dinosaur regular genera and species, both valid and invalid, were downloaded from the Paleobiology Database (PBDB; paleobiodb.org) on 17th December 2024. The data were cleaned to remove Avialae, ichnotaxa, and ootaxa. Taxa that were listed as invalid because they were misspellings or obsolete variants, or because they were renamed for grammatical or linguistic reasons, were removed. Nomina dubia, nomina nuda, objective and subjective synonyms, and recombinations were retained. Collector curves (Fig. 1) were built in R 3.4.0 [124]. Code and raw data are available in the Supplementary Material.
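The filtering and accumulation steps behind a collector curve can be sketched as follows. This is a Python illustration, not the R script "Collector_curve_code.R". The rows are made up, and the strings in `EXCLUDED` are assumed stand-ins for the values in the `difference` column that mark removed names.

```python
from collections import Counter
from itertools import accumulate

# Assumed labels for names removed during cleaning; check the actual
# values in the 'difference' column of Dinos.xlsx before reuse.
EXCLUDED = {"misspelling of", "obsolete variant of"}

# Hypothetical rows using three Dinos.xlsx column names.
rows = [
    {"taxon_name": "Taxon_A", "ref_pubyr": 1824, "difference": ""},
    {"taxon_name": "Taxon_B", "ref_pubyr": 1825, "difference": "nomen dubium"},
    {"taxon_name": "Taxon_C", "ref_pubyr": 1825, "difference": "misspelling of"},
    {"taxon_name": "Taxon_D", "ref_pubyr": None, "difference": ""},  # blank cell = missing data
    {"taxon_name": "Taxon_E", "ref_pubyr": 1828, "difference": ""},
]

def collector_curve(records, excluded=EXCLUDED):
    """Return [(year, cumulative count of retained taxa named up to that year)]."""
    per_year = Counter(
        r["ref_pubyr"]
        for r in records
        if r["ref_pubyr"] is not None and r["difference"] not in excluded
    )
    years = sorted(per_year)
    return list(zip(years, accumulate(per_year[y] for y in years)))

curve = collector_curve(rows)
```

In this sketch the nomen dubium is retained and the misspelling is dropped, mirroring the cleaning rules described above.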
Time-calibrated phylogeny–A consensus dinosaur phylogeny was manually produced in Mesquite [125]. First and last appearance data were collected for all taxa in the phylogeny and are listed in the data file provided in the Supplementary Material. First and last appearances generally correspond to the earliest and latest dates of the Stage from which the taxon is known, unless more accurate information was available in the literature. Ages were derived from the International Chronostratigraphic Chart version 2024/12. The consensus phylogeny was time-calibrated using the timePaleoPhy function in the R package paleotree [126] with minimum branch lengths of 0.5 million years. Code and raw data are available in the Supplementary Material.
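The idea of a minimum branch length can be shown with a simplified sketch. This is not the paleotree implementation, only a Python illustration of the constraint: each node is placed at least 0.5 million years before its oldest descendant. The tree, taxon names, and FADs are hypothetical.

```python
# Hypothetical three-taxon tree ((Taxon_A, Taxon_B), Taxon_C) with made-up FADs in Ma.
TREE = (("Taxon_A", "Taxon_B"), "Taxon_C")
FAD = {"Taxon_A": 10.0, "Taxon_B": 12.0, "Taxon_C": 20.0}

def date_nodes(node, fad, min_branch=0.5, out=None):
    """Return (age of node, {clade: age}) so that every branch is >= min_branch long."""
    if out is None:
        out = {}
    if isinstance(node, str):  # tip: dated by its first appearance
        return fad[node], out
    child_ages = [date_nodes(child, fad, min_branch, out)[0] for child in node]
    age = max(child_ages) + min_branch  # the oldest child sets the node age
    out[node] = age
    return age, out

root_age, node_ages = date_nodes(TREE, FAD)
```

Here the clade (Taxon_A, Taxon_B) is dated 0.5 million years before Taxon_B, and the root 0.5 million years before Taxon_C, so no branch has zero length.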
Occurrences through time histogram–All dinosaur body fossil occurrences were downloaded from the PBDB on 22nd January 2025 and were manually cleansed to remove ichnotaxa and ootaxa. The midpoint between the first appearance datum and the last appearance datum was taken for each occurrence, and these midpoints were plotted in 1-million-year bins in R 3.4.0 [124]. Code and raw data are available in the Supplementary Material.
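The midpoint and binning step can be sketched as below. This is a Python illustration rather than the R script "Occurrences_curve.R". It assumes each occurrence carries a maximum and a minimum age in Ma (as in the PBDB `max_ma` and `min_ma` fields), and the age pairs are made up.

```python
import math
from collections import Counter

# Hypothetical occurrences as (max_ma, min_ma) pairs; values are illustrative only.
occurrences = [(230.0, 220.0), (70.0, 66.0), (70.0, 66.0), (100.5, 94.0)]

def bin_midpoints(occs, width=1.0):
    """Count occurrences per bin, keyed by the younger edge of each bin (in Ma)."""
    counts = Counter()
    for max_ma, min_ma in occs:
        midpoint = (max_ma + min_ma) / 2
        counts[math.floor(midpoint / width) * width] += 1
    return dict(sorted(counts.items()))

hist = bin_midpoints(occurrences)
```

Each key is the younger edge of a bin, so an occurrence with a midpoint of 97.25 Ma falls in the bin starting at 97 Ma.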
Occurrences map–Non-avialan dinosaur occurrences, excluding form taxa, were downloaded from the PBDB (27th January 2025). These were manipulated in R 4.4.2. The data set was pruned to species type occurrences only. Triassic, Jurassic and Cretaceous occurrences were plotted onto modern-day maps using the packages maps and ggplot2. Code and raw data are available in the Supplementary Material.
124. R Core Team. 2023. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
125. Maddison WP, Maddison DR. 2023. Mesquite: a modular system for evolutionary analysis. Version 3.81. http://www.mesquiteproject.org.
126. Bapst DW. 2012. Paleotree: an R package for palaeontological and phylogenetic analyses of evolution. Methods in Ecology and Evolution 3: 803–807.
