Data and code for: Diversity through space and time in the Upper Jurassic Morrison Formation, western USA
Data files
Mar 11, 2024 version files 1.08 MB
- Collections_with_latitude.xlsx
- Collections_with_time.xlsx
- Collector_curve_data.xlsx
- Corrected_abundance_with_latitude.xlsx
- Corrected_abundance_with_time.xlsx
- Genera_with_latitude.xlsx
- Occurrence_data_with_STs_dinos.xlsx
- Occurrence_data_with_STs.xlsx
- README.md
Abstract
Understanding how biodiversity has changed through time and space is a central aim of paleobiology. To elucidate accurate biodiversity patterns in deep time, regional case studies, where sampling biases can be minimized, are needed. The Upper Jurassic Morrison Formation of the western USA crops out over 1.2 million km² and covers 12 degrees of latitude. It was deposited over a ~9-million-year time period and was home to some of the most iconic dinosaurs. Utilizing a new, high-resolution chronostratigraphic framework for the formation, tetrapod occurrences from the Paleobiology Database were temporally and spatially mapped to examine patterns of diversity change through time and space, and the geographic ranges of taxa were examined to shed light on niche partitioning. Latitudinally, diversity was found to peak in the center of the basin, perhaps due to the availability of water resources. Diversity increased over time in the Morrison Formation, and there is no evidence to indicate a decline in diversity prior to the extinction of the fauna at the end of the Jurassic. There appears to be some degree of geographic separation of faunas in the Morrison basin, with southeastern and northwestern faunas, albeit with a number of overlapping taxa. High-resolution climate models paired with detailed sedimentological analysis could help to elucidate the drivers of the patterns observed here.
README: Data and code for: Diversity through space and time in the Upper Jurassic Morrison Formation, western USA
https://doi.org/10.5061/dryad.6m905qg77
This dataset provides raw data and code for all analyses carried out in the above paper. There are eight .xlsx files that contain raw data, and four scripts that implement the analyses carried out in R.
The R script 'Diversity_analysis_code.R' plots raw generic occurrences, tetrapod-bearing collections and abundance against latitude and systems tract, and carries out correlation tests to examine whether these are statistically correlated with each other. It uses the data files "Genera_with_latitude.xlsx", "Collections_with_time.xlsx", "Corrected_abundance_with_time.xlsx", "Collections_with_latitude.xlsx" and "Corrected_abundance_with_latitude.xlsx".
The R script 'iNext_code.R' sample-standardizes the raw generic occurrence data for each degree of latitude and for each systems tract using the iNEXT package, with quorum levels from 0.3 to 0.7 and a confidence interval of 0.95, and plots the results against time and latitude. It uses the data file "Genera_with_latitude.xlsx".
The R script 'Dino_data_code.R' calculates the proportions of different dinosaurs across the whole Morrison Formation and by systems tract and plots the results as a proportional bar graph. It uses the data file "Occurrence_data_with_STs_dinos.xlsx". The same plots could be produced for all tetrapod occurrences in the Morrison Formation, and thus the data file "Occurrence_data_with_STs.xlsx", which contains this information, is also included in the data package.
The R script 'Collector_curve_code.R' plots collector curves for different systems tracts in the Morrison Formation and uses the data file "Collector_curve_data.xlsx".
Description of the data and file structure
The data package is divided into raw data and R scripts. The data files are referred to in the instructions in the R scripts.
"Collections_with_time.xlsx" comprises two columns. 'coll' lists all tetrapod-bearing collections for the Morrison Formation in the Paleobiology Database as of December 2022. 'ST' gives the systems tract in which each collection is found. Blank cells in column 'ST' indicate that the systems tract is unknown.
"Collections_with_latitude.xlsx" comprises two columns. 'lat' is the degrees of latitude north rounded to 1 decimal place. 'coll' is the number of collections found within that degree of latitude in the Morrison Formation. These data are derived from a Paleobiology Database download in December 2022.
"Collector_curve_data.xlsx" comprises four columns. 'taxon' is an occurrence list of tetrapods from the Morrison Formation. 'pubyr' is the year that occurrence was published. 'collection' is the location where the occurrence was collected, and 'st' is the systems tract the occurrence (and collection) is found in. Blank cells indicate the systems tract is unknown. Data in the first three columns are from a Paleobiology Database download in December 2022.
"Corrected_abundance_with_latitude.xlsx" has two columns. 'lat' is degrees latitude rounded to 1 decimal place. 'abun' is the number of tetrapod specimens found at that latitude in the Morrison Formation. These data are derived from a Paleobiology Database download in December 2022 and are shown in the data sheet "Genera_with_latitude.xlsx".
"Corrected_abundance_with_time.xlsx" has two columns. 'ST' is systems tract. 'abun' is the number of tetrapod specimens found in that systems tract in the Morrison Formation. The 'abun' data are derived from a Paleobiology Database download in December 2022 and are shown in the data sheet "Genera_with_latitude.xlsx".
"Genera_with_latitude.xlsx" comprises four columns. 'accepted_name' is a list of tetrapod occurrences identified to generic level in the Morrison Formation. 'lat' is the latitude at which the occurrence was found. 'abund_va' is the number of specimens of that taxon that were found at the same site. 'ST' is systems tract. Blank cells indicate unknown values. Data in the first three columns are from a Paleobiology Database download made in December 2022.
"Occurrence_data_with_STs.xlsx" is the Paleobiology Database download made in December 2022 from which the other data sheets are derived. It shows all tetrapod occurrences in the Morrison Formation recorded in the database at that time. Added to that is the column 'Systems tract', which gives the systems tract in which each occurrence is found. Blank cells in the 'Systems tract' column indicate that the systems tract is unknown. Blank cells elsewhere in the data sheet indicate inapplicable values.
"Occurrence_data_with_STs_dinos.xlsx" is as above but with only dinosaur occurrences (rather than all tetrapods).
Sharing/Access information
Data were derived from the Paleobiology Database (paleobiodb.org).
Code/Software
This data package contains R scripts, which were written in R version 4.0.4. Their functions and the data they use are described above. The scripts require the following packages to run correctly: tidyverse, nlme, ggfortify, iNEXT, ggplot2, patchwork. This information is included in the individual scripts.
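Before running the scripts, it can help to confirm that the packages listed above are installed. The following is a minimal sketch rather than part of the deposited code; the variable names are illustrative.

```r
# Packages named in this README as required by the scripts.
required <- c("tidyverse", "nlme", "ggfortify", "iNEXT", "ggplot2", "patchwork")

# requireNamespace() returns TRUE when a package can be loaded.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs install.packages().
missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Install before running the scripts: ",
          paste(missing_pkgs, collapse = ", "))
}
```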
Methods
All vertebrate occurrences in the Morrison Formation were downloaded from the Paleobiology Database (PBDB; paleobiodb.org; accessed 23/12/2022). The data were visually inspected and occurrences related to eggshells or tracks were removed, leaving only those pertaining to body fossils. This resulted in 1397 occurrences. Taxonomy was cleansed following the recent literature. Occurrences were manually attributed to the systems tracts described in Maidment & Muxworthy (2019) based on stratigraphic logs or descriptions in the literature for each locality, supplemented with first-hand observations of a number of quarries. A full list of quarries, systems tracts, and references for the stratigraphic location is provided in the spreadsheet “Quarry data.csv” in the Online Supplementary Material available with the manuscript. As not all references provided stratigraphic logs or descriptions, it was not always possible to attribute quarries to stratigraphic locations, but 1144 occurrences (82%) could be attributed to a systems tract. The occurrences represent 300 discrete collections, of which 182 (60%) have known stratigraphic data. 957 occurrences are identified to the generic level or better, of which 799 (83%) could be assigned stratigraphic data. These data are available in this data package in the spreadsheet “Occurrence_data_with_STs.xlsx”. Diversity analyses were carried out in R ver. 4.0.4 (R Core Team, 2021) using the tidyverse package (Wickham et al., 2019) and all code is available in this data package.
Latitudinal biodiversity
Raw diversity: In order to assess how biodiversity changed with latitude in the Morrison Formation, two measures were used. The first measure was diversity, which herein equates to generic richness. Generic occurrence data (available in this data package in the spreadsheet “Genera_with_latitude.xlsx”) were binned per degree of latitude and the number of distinct genera in each latitudinal bin was counted. This was carried out for the total dataset and for data within each systems tract. The second measure was abundance. An occurrence in the PBDB is the presence of a taxon within a collection; however, some collections contain multiple specimens of the same taxon, and this is recorded in the PBDB as abundance data. Abundance was calculated for each collection based on the “abund_value” column in the PBDB data. Where no abundance was specified for an occurrence, the abundance was assumed to be equal to one. Not all abundances are equal: a single abundance datapoint might indicate a single, more-or-less complete articulated sauropod skeleton or might refer to a single isolated fish scale. Microsites and bone beds are therefore heavily over-represented in the abundance data, while sites with articulated skeletons may be under-represented. Abundance data were binned per degree of latitude. These data are available in the spreadsheet “Corrected_abundance_with_latitude.xlsx” in this data package.
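The binning described above can be sketched in base R as follows. This is an illustration only: the rows are invented, the column names follow “Genera_with_latitude.xlsx”, and flooring latitude to the whole degree is an assumed binning rule, not necessarily the one used in the deposited script.

```r
# Toy occurrence table; the values are invented for illustration.
occ <- data.frame(
  accepted_name = c("Allosaurus", "Allosaurus", "Camarasaurus",
                    "Diplodocus", "Stegosaurus", "Camarasaurus"),
  lat      = c(38.2, 38.9, 38.5, 44.1, 44.7, 40.3),
  abund_va = c(NA, 3, 1, 12, NA, 2),
  stringsAsFactors = FALSE
)

# One bin per whole degree of latitude (assumed rule).
occ$lat_bin <- floor(occ$lat)

# Occurrences with no stated abundance count as one specimen.
occ$abund <- ifelse(is.na(occ$abund_va), 1, occ$abund_va)

# Generic richness: number of distinct genera in each bin.
richness <- tapply(occ$accepted_name, occ$lat_bin,
                   function(g) length(unique(g)))

# Abundance: summed specimen counts in each bin.
abundance <- tapply(occ$abund, occ$lat_bin, sum)

print(data.frame(lat_bin   = names(richness),
                 richness  = as.vector(richness),
                 abundance = as.vector(abundance)))
```

In this toy table the two Allosaurus occurrences in the 38° bin contribute one genus to richness but four specimens to abundance, which is the distinction the two measures are meant to capture.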
In order to assess whether the raw diversity patterns observed were influenced by sampling bias, the number of collections per degree of latitude was calculated from the PBDB occurrence data. These data are available in the spreadsheet “Collections_with_latitude.xlsx” in this data package. Diversity, abundance, and collections were plotted against latitude in R, and correlations between the curves were investigated using Spearman’s Rho, Kendall’s Tau, and generalized least squares (GLS) regression using a first-order autoregressive model (corARMA). The latter was carried out because it reduces the chances of overestimating the statistical significance of regression lines due to serial correlation in the latitudinal series. Data were natural-log-transformed prior to GLS regression, which was carried out using the gls() function in the R package nlme (Pinheiro et al., 2018). Code for these analyses is available in this data package as “Diversity_analysis_code.R”.
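A minimal sketch of these three tests is given below, using invented per-degree values rather than the deposited data. The column names and the choice of diversity as the response are assumptions made for illustration; corARMA(p = 1, q = 0) is used here to express a first-order autoregressive error structure.

```r
library(nlme)  # provides gls() and corARMA()

# Invented per-degree summaries, ordered from south to north.
lat_df <- data.frame(
  lat  = 35:46,
  coll = c(3, 8, 15, 30, 42, 55, 48, 33, 20, 12, 6, 4),
  div  = c(2, 5, 8, 14, 18, 22, 20, 13, 11, 7, 4, 3)
)

# Rank-based correlations between richness and sampling effort.
rho <- cor.test(lat_df$div, lat_df$coll, method = "spearman")
tau <- cor.test(lat_df$div, lat_df$coll, method = "kendall")

# GLS on natural-log-transformed data with first-order
# autoregressive errors along the latitudinal series.
fit <- gls(log(div) ~ log(coll), data = lat_df,
           correlation = corARMA(p = 1, q = 0))

print(rho$estimate)
print(tau$estimate)
print(summary(fit)$tTable)
```

The rows must be in latitudinal order before fitting, because the autoregressive structure treats neighbouring rows as neighbouring bins.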
Subsampled diversity: In order to account for the strong degree of sampling bias observed in the data (see Results), shareholder quorum subsampling (SQS; Alroy 2010) was carried out on the whole dataset using the ‘estimateD’ command and a confidence interval of 0.95 in the R package iNEXT (Hsieh et al. 2016). The analysis was carried out in R ver. 4.0.4 (R Core Team, 2021) and the code is available in this data package as “iNext_code.R”. Cleansed generic occurrence data from the PBDB were used, and abundances of specific taxa were calculated for each degree of latitude. Investigation of the corrected abundance data (see above) indicated that they were overwhelmed by occurrences from two sites: specimens of Diplodocus from the Mother’s Day Quarry in southern Montana (1483 specimens recorded), and specimens of Allosaurus from the Dry Mesa Quarry of Utah (200 specimens recorded). These quarries are bone beds and the abundance values most likely relate to the number of individual bones found, rather than the number of individuals that were actually present. Using these abundance data when sample-standardizing is therefore problematic, and consequently they were not used. To investigate whether latitudinal bins with very low sample sizes and a limited number of generic occurrences were impacting the results of the analysis, the latitudinal data were examined, latitudinal bins with fewer than 10 occurrences were removed, and the analysis was re-run. SQS was carried out at quorum levels from 0.7 to 0.3. Subsampled diversity analyses were also attempted for each systems tract, but there were too few data to provide meaningful results.
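The filtering and subsampling steps can be sketched as below. This is an illustration under stated assumptions, not the deposited “iNext_code.R”: the occurrences are randomly generated, each occurrence is counted once per genus (one reading of the decision not to use the PBDB abundance values), and the 0.1 step between quorum levels is assumed. The iNEXT call runs only if the package is installed.

```r
# Randomly generated genus-level occurrences with their latitudinal bin.
set.seed(1)
genera <- c("Allosaurus", "Camarasaurus", "Diplodocus",
            "Stegosaurus", "Apatosaurus", "Dryosaurus")
occ <- data.frame(
  accepted_name = c(sample(genera, 40, replace = TRUE),
                    sample(genera, 25, replace = TRUE),
                    sample(genera[1:3], 4, replace = TRUE)),
  lat_bin = rep(c(38, 40, 44), times = c(40, 25, 4))
)

# Drop latitudinal bins with fewer than 10 occurrences.
n_per_bin <- table(occ$lat_bin)
keep <- names(n_per_bin)[n_per_bin >= 10]
occ_kept <- occ[as.character(occ$lat_bin) %in% keep, ]

# Occurrence counts per genus in each retained bin.
counts <- lapply(split(occ_kept$accepted_name, occ_kept$lat_bin),
                 function(g) as.vector(table(g)))

# Coverage-based estimates at each quorum level (assumed 0.1 step).
if (requireNamespace("iNEXT", quietly = TRUE)) {
  for (quorum in seq(0.7, 0.3, by = -0.1)) {
    est <- iNEXT::estimateD(counts, datatype = "abundance",
                            base = "coverage", level = quorum, conf = 0.95)
    print(est)
  }
}
```

Here the 44° bin holds only four occurrences and is removed before the estimates are made, mirroring the re-run described above.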
Temporal diversity
Raw diversity: In order to assess how diversity changed through time in the Morrison Formation, diversity (=generic richness) and abundance were again used. Cleansed occurrences and abundance data from the PBDB were binned by systems tract; occurrences for which no systems tract was known were discarded. Diversity and abundance were plotted against systems tract in R.
Subsampled diversity: To account for different levels of sampling in different systems tracts, shareholder quorum subsampling was carried out on cleansed occurrence data for each systems tract, following the method used for latitudinal diversity. SQS was carried out with quorum levels from 0.7 to 0.3.
Collector curves
In order to assess how well sampled the B4 and C6 systems tracts were relative to each other (see Discussion), collector curves were built for each, showing the cumulative number of unique collections and the cumulative number of new taxa identified per year. The curves use the year the occurrence was published, which was provided in the PBDB download. The data are contained in the spreadsheet “Collector_curve_data.xlsx” and the code is provided as “Collector_curve_code.R” in this data package.
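A collector curve of this kind can be sketched in base R as follows. The rows are invented and the helper `cumulative_new()` is a hypothetical name introduced for illustration; only the column names come from “Collector_curve_data.xlsx”.

```r
# Invented rows using the column names of "Collector_curve_data.xlsx".
cc <- data.frame(
  taxon = c("Allosaurus", "Camarasaurus", "Allosaurus",
            "Stegosaurus", "Diplodocus", "Camarasaurus"),
  pubyr = c(1877, 1877, 1901, 1901, 1925, 1925),
  collection = c("A", "A", "B", "B", "C", "A"),
  st = c("C6", "C6", "C6", "C6", "C6", "C6"),
  stringsAsFactors = FALSE
)

# Cumulative count of first appearances of the values in `x`, by year.
cumulative_new <- function(x, year) {
  first_year <- tapply(year, x, min)   # year each item first appears
  per_year <- table(first_year)        # number of new items in each year
  data.frame(year = as.integer(names(per_year)),
             cumulative = as.vector(cumsum(per_year)))
}

# Curves for one systems tract; repeat with "B4" for the comparison.
one_st <- cc[cc$st == "C6", ]
taxa_curve <- cumulative_new(one_st$taxon, one_st$pubyr)
coll_curve <- cumulative_new(one_st$collection, one_st$pubyr)

print(taxa_curve)
print(coll_curve)
```

A taxon or collection is counted only in the year it first appears, so later occurrences of Allosaurus or of collection "A" do not raise the curve.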