Data from: Marine animal diversity across latitudinal and temperature gradients during the Phanerozoic
Data files
Apr 29, 2025 version files (79.55 MB)
marine-lbg-ms-dryad-repo.zip (79.53 MB)
README.md (21.78 KB)
Abstract
The latitudinal biodiversity gradient (LBG) is a fundamental biological pattern seen across taxa and ecosystems today, but its drivers remain uncertain despite intense theoretical and empirical study. Palaeontological studies may add valuable evidence from diversity distributions during intervals when Earth system configurations were different from today, including potential analogues of future climate regimes. However, accurately characterising these distributions is challenging because the geographic scope of fossil record coverage varies through time, introducing biases that have not been quantified by most previous studies. Here, we attempt a comprehensive documentation of latitudinal biodiversity distributions for marine invertebrates through the past 540 million years, explicitly accounting for regional variation in diversity and sampling. We demonstrate large uncertainties when using current fossil data at this scale. Nevertheless, some signals are detectable. We show that marine animal biodiversity declined with increasing palaeolatitude and with decreasing temperature during at least some intervals from the Permian onwards (298.9 Ma). Additionally, we find that the LBG was shallower, on average, when Earth’s climate was hotter, although this signal is weak. We also document a strong, systematic bias due to intense sampling of the fossil record in North America and especially Europe, which may have led previous studies to incorrectly infer a mid-latitude diversity peak during hot intervals of Earth history. Our results provide a baseline for what current fossil databases might tell us about Phanerozoic LBGs of marine animals, and suggest that quantitative consideration of uncertainties will be central to advancing knowledge of geographic variation in diversity through Earth’s history.
Data description
This repository contains the occurrence data for marine animals used in this analysis, downloaded from the Paleobiology Database (www.paleobiodb.org) in CSV format.
Code/software
The analysis was run in R version 4.2.1. All analysis scripts and associated files necessary to run the analysis are archived on Zenodo: https://doi.org/10.5281/zenodo.15269136
Access information
The occurrence data for marine animals were downloaded from the Paleobiology Database (www.paleobiodb.org) on 23 May 2022 (download folder 2022-05-23-14-15-24).
Summary of file directory
marine-lbg-ms-dryad-repo
README.txt - The README file that you are currently reading
PBDB_CSV_data_downloads
2022-05-23-14-15-24 - Subfolder containing occurrence data analysed in the study in CSV files, downloaded from the Paleobiology Database (www.paleobiodb.org) on 23 May 2022. Each CSV file name indicates the taxon set it pertains to.
Arthropoda_data.csv
Bivalvia_data.csv
Brachiopoda_data.csv
Bryozoa_data.csv
Cephalopoda_data.csv
Cetacea_data.csv
Chondrichthyes_data.csv
Chordata^Tetrapoda_data.csv
Conodonta_data.csv
Crinoidea_data.csv
Crustacea_data.csv
Decapoda_data.csv
Echinodermata_data.csv
Echinoidea_data.csv
Gastropoda_data.csv
Graptolithina_data.csv
Ichthyosauromorpha_data.csv
Linguliformea_data.csv
Mollusca_data.csv
Mosasauria_data.csv
Neogastropoda_data.csv
Ostracoda_data.csv
Pinnipedimorpha_data.csv
Porifera_data.csv
Rhynchonelliformea_data.csv
Sauropterygia_data.csv
Sirenia_data.csv
Tetrapoda_data.csv
Thalattosuchia_data.csv
Trilobita_data.csv
Animalia^Chordata_data.csv
Annelida_data.csv
Anthozoa_data.csv
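Because each file name encodes its taxon set, the set can be recovered programmatically. The following is a minimal sketch, not part of the archived analysis (which is written in R); Python is used purely for illustration. It assumes the `_data.csv` suffix shown in the listing above, and it assumes that `^` follows the PBDB convention of excluding a subtaxon, so that `Animalia^Chordata` would mean Animalia excluding Chordata. The function name `taxon_set_from_filename` is hypothetical.

```python
def taxon_set_from_filename(filename: str) -> dict:
    """Split a name such as 'Chordata^Tetrapoda_data.csv' into base and excluded taxa.

    Assumption: '^' separates the base taxon from any excluded subtaxa.
    """
    suffix = "_data.csv"
    if not filename.endswith(suffix):
        raise ValueError(f"unexpected file name: {filename!r}")
    stem = filename[: -len(suffix)]
    # The first element is the base taxon; any remaining elements are exclusions.
    base, *excluded = stem.split("^")
    return {"base": base, "excluded": excluded}
```

For example, `taxon_set_from_filename("Chordata^Tetrapoda_data.csv")` would return a base taxon of `Chordata` with `Tetrapoda` excluded, while `Bivalvia_data.csv` yields an empty exclusion list.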
Summary of column heads for PBDB occurrence data CSV files listed above (descriptions taken from PBDB API documentation site at https://paleobiodb.org/data1.2/occs/list_doc.html):
occurrence_no: A positive integer that uniquely identifies the occurrence
record_type: The type of this object: occ for an occurrence.
reid_no: If this occurrence was reidentified, a unique identifier for the reidentification.
flags: This field will be empty for most records. Otherwise, it will contain one or more of the following letters: R = this identification has been superseded by a more recent one (in other words, this occurrence has been reidentified); I = this identification is an ichnotaxon; F = this identification is a form taxon.
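A user who wants to exclude flagged records could test the flags column for the letters R, I and F. The sketch below is illustrative only (Python standard library; the archived scripts are in R and may treat these records differently). The helper names `has_any_flag` and `drop_flagged` are hypothetical, and the sample rows are invented for the example rather than taken from the data files.

```python
import csv
import io

def has_any_flag(row: dict, unwanted: str = "RIF") -> bool:
    """Return True if the row's flags field contains any of the unwanted letters."""
    flags = row.get("flags") or ""  # empty for most records
    return any(letter in flags for letter in unwanted)

def drop_flagged(rows, unwanted: str = "RIF") -> list:
    """Keep only rows that carry none of the unwanted flag letters."""
    return [row for row in rows if not has_any_flag(row, unwanted)]

# Invented sample rows, for illustration only.
SAMPLE = """occurrence_no,flags,accepted_name
1,,Lingula
2,R,Mytilus
3,I,Skolithos
4,IF,Cruziana
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = drop_flagged(rows)
print([row["occurrence_no"] for row in kept])
```

Passing `unwanted="R"` would drop only superseded identifications and retain ichnotaxa and form taxa.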
collection_no: The identifier of the collection with which this occurrence is associated.
permissions: The accessibility of this record. If empty, then the record is public. Otherwise, the value of this record will be one of the following:
members: The record is accessible to database members only.
authorizer: The record is accessible to its authorizer group, and to any other authorizer groups given permission.
group(...): The record is accessible to members of the specified research group(s) only.
identified_name: The taxonomic name by which this occurrence was identified. This field will be omitted for responses in the compact vocabulary if it is identical to the value of accepted_name.
identified_rank: The taxonomic rank of the identified name, if this can be determined. This field will be omitted for responses in the compact vocabulary if it is identical to the value of accepted_rank.
identified_no: The unique identifier of the identified taxonomic name. If this is empty, then the name was never entered into the taxonomic hierarchy stored in this database, and we have no further information about the classification of this occurrence. In some cases, the genus has been entered into the taxonomic hierarchy but not the species. This field will be omitted for responses in the compact vocabulary if it is identical to the value of accepted_no.
difference: If the identified name is different from the accepted name, this field gives the reason why. This field will be present if, for example, the identified name is a junior synonym or nomen dubium, or if the species has been recombined, or if the identification is misspelled.
accepted_name: The value of this field will be the accepted taxonomic name corresponding to the identified name.
accepted_attr: The attribution (author and year) of the accepted name
accepted_rank: The taxonomic rank of the accepted name. This may be different from the identified rank if the identified name is a nomen dubium or otherwise invalid, or if the identified name has not been fully entered into the taxonomic hierarchy of this database.
accepted_no: The unique identifier of the accepted taxonomic name in this database.
early_interval: The specific geologic time range associated with this occurrence (not necessarily a standard interval), or the interval that begins the range if late_interval is also given
late_interval: The interval that ends the specific geologic time range associated with this occurrence, if different from the value of early_interval
max_ma: The early bound of the geologic time range associated with this occurrence (in Ma)
min_ma: The late bound of the geologic time range associated with this occurrence (in Ma)
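The two age bounds can be summarised as a midpoint age and a range width. This is a minimal sketch of that arithmetic in Python, for illustration only; using the midpoint is a common simplification and is not necessarily how the archived R scripts assign occurrences to time intervals. The function name `age_midpoint_and_span` is hypothetical.

```python
def age_midpoint_and_span(max_ma: float, min_ma: float) -> tuple:
    """Return (midpoint, span) in Ma for an occurrence's age range.

    max_ma is the early (older) bound and min_ma the late (younger) bound,
    so max_ma should never be smaller than min_ma.
    """
    if max_ma < min_ma:
        raise ValueError("max_ma (early bound) must be >= min_ma (late bound)")
    midpoint = (max_ma + min_ma) / 2.0
    span = max_ma - min_ma
    return midpoint, span
```

An occurrence with `max_ma = 100` and `min_ma = 90` has a midpoint of 95 Ma and a span of 10 Myr; a large span indicates a poorly constrained age.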
ref_author: The author(s) of the reference from which this data was entered.
ref_pubyr: The year of publication of the reference from which this data was entered
reference_no: The identifier of the reference from which this data was entered
phylum: The name of the phylum in which this occurrence is classified.
phylum_no: The identifier of the phylum in which this occurrence is classified. This is only included with the block classext.
class: The name of the class in which this occurrence is classified.
class_no: The identifier of the class in which this occurrence is classified. This is only included with the block classext.
order: The name of the order in which this occurrence is classified.
order_no: The identifier of the order in which this occurrence is classified. This is only included with the block classext.
family: The name of the family in which this occurrence is classified.
family_no: The identifier of the family in which this occurrence is classified. This is only included with the block classext.
genus: The name of the genus in which this occurrence is classified. If the block subgenus is specified, this will include the subgenus name if any.
genus_no: The identifier of the genus in which this occurrence is classified
subgenus_no: The identifier of the subgenus in which this occurrence is classified, if any.
primary_name: The taxonomic name (less species) by which this occurrence was identified. This is often a genus, but may be a higher taxon.
primary_reso: The resolution of the primary name, e.g. sensu lato or n. gen.
subgenus_name: The subgenus name (if any) by which this occurrence was identified
subgenus_reso: The resolution of the subgenus name, e.g. aff. or n. subgen.
species_name: The species name (if any) by which this occurrence was identified
species_reso: The resolution of the species name, e.g. cf. or n. sp.
occurrence_comments: Additional comments about this occurrence, if any.
image_no: If this value is non-zero, you can use it to construct image URLs using taxa/thumb and taxa/icon.
plant_organ: The plant organ, if any, associated with this occurrence. This field will be empty unless the occurrence is a plant fossil.
plant_organ2: An additional plant organ, if any, associated with this occurrence.
abund_value: The abundance of this occurrence within its containing collection
abund_unit: The unit in which this abundance is expressed
taxon_environment: The general environment or environments in which this life form is found. See ecotaph vocabulary.
environment_basis: Specifies the taxon from which the environment information is inherited.
motility: Whether the organism is motile, attached and/or epibiont, and its mode of locomotion if any. See ecotaph vocabulary.
motility_basis: Specifies the taxon for which the motility information was set. The taphonomy and ecospace information are inherited from parent taxa unless specific values are set.
life_habit: The general life mode and locality of this organism. See ecotaph vocabulary.
life_habit_basis: Specifies the taxon for which the life habit information was set. See motility_basis above. These fields are only included if the ecospace block is also included.
vision: The degree of vision possessed by this organism. See ecotaph vocabulary.
vision_basis: Specifies the taxon for which the vision information was set. See motility_basis above. These fields are only included if the ecospace block is also included.
diet: The general diet or feeding mode of this organism. See ecotaph vocabulary.
diet_basis: Specifies the taxon for which the diet information was set. See motility_basis above. These fields are only included if the ecospace block is also included.
reproduction: The mode of reproduction of this organism. See ecotaph vocabulary.
reproduction_basis: Specifies the taxon for which the reproduction information was set. See motility_basis above. These fields are only included if the ecospace block is also included.
ontogeny: Briefly describes the ontogeny of this organism. See ecotaph vocabulary.
ontogeny_basis: Specifies the taxon for which the ontogeny information was set. See motility_basis above. These fields are only included if the ecospace block is also included.
ecospace_comments: Additional remarks about the ecospace, if any.
composition: The composition of the skeletal parts of this organism. See taphonomy vocabulary.
architecture: An indication of the internal skeletal architecture. See taphonomy vocabulary.
thickness: An indication of the relative thickness of the skeleton. See taphonomy vocabulary.
reinforcement: An indication of the skeletal reinforcement, if any. See taphonomy vocabulary.
taphonomy_basis: Specifies the taxon for which the taphonomy information was set. See motility_basis above. These fields are only included if the otaph block is also included.
collection_name: An arbitrary name which identifies the collection, not necessarily unique
collection_subset: If the collection is a part of another one, this field specifies which part
collection_aka: An alternate name for the collection, or additional remarks about it.
lng: The longitude at which the occurrence was found (in degrees)
lat: The latitude at which the occurrence was found (in degrees)
cc: The country in which the collection is located, encoded as ISO-3166-1 alpha-2
state: The state or province in which the collection is located, if known
county: The county or municipal area in which the collection is located, if known
latlng_basis: The basis of the reported location of the collection. See the PBDB API documentation linked above for a list of basis and precision codes. This field and the next are only included in responses using the pbdb vocabulary.
latlng_precision: The precision of the collection coordinates. The same documentation lists the code values.
prc: A two-letter code indicating the basis and precision of the geographic coordinates. This field has no name in the pbdb vocabulary; it is reported instead of latlng_basis and latlng_precision in responses that use the compact vocabulary.
geogscale: The geographic scale of the collection.
geogcomments: Additional comments about the geographic location of the collection
bin_id_3: The identifier of the level-3 cluster in which the collection or cluster is located
bin_id_2: The identifier of the level-2 cluster in which the collection or cluster is located
bin_id_1: The identifier of the level-1 cluster in which the collection or cluster is located
paleomodel: The primary model specified by the parameter pgm. This field will only be included if more than one model is indicated.
geoplate: The identifier of the geological plate on which the collection lies, evaluated according to the primary model indicated by the parameter pgm. This might be either a number or a string.
paleoage: Indicates whether these paleocoordinates were computed at the early, mid, or late end of the age range for each collection
paleolng: The paleolongitude of the collection, evaluated according to the primary model indicated by the parameter pgm.
paleolat: The paleolatitude of the collection, evaluated according to the primary model indicated by the parameter pgm.
paleoage_b: Alternate age selector
paleolng_b: Paleolongitude corresponding to the alternate age selector
paleolat_b: Paleolatitude corresponding to the alternate age selector
paleoage_c: Alternate age selector
paleolng_c: Paleolongitude corresponding to the alternate age selector
paleolat_c: Paleolatitude corresponding to the alternate age selector
paleomodel2: An alternate model specified by the parameter pgm. This field will only be included if more than one model is indicated. There may also be paleomodel3, etc.
geoplate2: An alternate geological plate identifier, if the pgm parameter indicates more than one model. There may also be geoplate3, etc.
paleoage2: Indicates whether the second paleocoordinates were computed at the early, mid, or late end of the age range for each collection
paleolng2: An alternate paleolongitude for the collection, if the pgm parameter indicates more than one model. There may also be paleolng3, etc.
paleolat2: An alternate paleolatitude for the collection, if the pgm parameter indicates more than one model. There may also be paleolat3, etc.
paleoage2_b: Alternate age selector
paleolng2_b: Paleolongitude corresponding to the alternate age selector
paleolat2_b: Paleolatitude corresponding to the alternate age selector
paleoage2_c: Alternate age selector
paleolng2_c: Paleolongitude corresponding to the alternate age selector
paleolat2_c: Paleolatitude corresponding to the alternate age selector
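Palaeolatitude values such as paleolat can be grouped into latitudinal bands before diversity is compared across them. The sketch below is illustrative Python only and does not reproduce the binning used in the archived R scripts. The 30-degree band width, the pooling of both hemispheres by absolute palaeolatitude, and the function name `palaeolat_band` are all assumptions made for the example.

```python
import math

def palaeolat_band(paleolat: float, width: float = 30.0) -> tuple:
    """Return the (lower, upper) bounds of the absolute-palaeolatitude band.

    Both hemispheres are pooled, and a value of exactly 90 degrees is placed
    in the highest band rather than in a band of its own.
    """
    if not -90.0 <= paleolat <= 90.0:
        raise ValueError(f"palaeolatitude out of range: {paleolat}")
    abs_lat = abs(paleolat)
    n_bands = math.ceil(90.0 / width)
    index = min(int(abs_lat // width), n_bands - 1)
    lower = index * width
    upper = min(lower + width, 90.0)
    return lower, upper
```

With the default width, a collection at a palaeolatitude of -45.2 degrees falls in the 30 to 60 degree band.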
protected: The protected status of the land on which the collection is located, if known.
cx_int_no: The identifier of the most specific single interval from the selected timescale that covers the entire time range associated with the collection or cluster.
time_bins: A list of time intervals into which this occurrence or collection is placed according to the timerule selected for this operation. You can see which rule is selected by including the datainfo parameter. A value of - means that the time range is too large to match any bin under this timerule.
time_contain: List of time intervals into which this occurrence or collection would be placed according to the contain timerule, or - if the range is too large.
time_major: List of time intervals into which this occurrence or collection would be placed according to the major timerule, or - if the range is too large.
time_buffer: List of time intervals into which this occurrence or collection would be placed according to the buffer timerule, or - if the range is too large.
time_overlap: List of time intervals into which this occurrence or collection would be placed according to the overlap timerule.
direct_ma: The direct age (if any) determined for this collection.
direct_ma_error: The uncertainty in the direct age measurement
direct_ma_unit: The unit for the direct age and uncertainty. Values are: Ma, Ka, YBP.
direct_ma_method: The method by which the direct age was obtained.
max_ma: The maximum age (if any) determined for this collection.
max_ma_error: The uncertainty in the maximum age measurement
max_ma_unit: The unit for the maximum age and uncertainty. Values are the same as for direct_ma_unit.
max_ma_method: The method by which the maximum age was obtained.
min_ma: The minimum age (if any) determined for this collection.
min_ma_error: The uncertainty in the minimum age measurement
min_ma_unit: The unit for the minimum age and uncertainty. Values are the same as for direct_ma_unit.
min_ma_method: The method by which the minimum age was obtained.
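Because the direct, maximum and minimum ages may be reported in Ma, Ka or YBP, they need to be converted to a common unit before they are compared. The following Python sketch is illustrative only. It assumes that Ma means millions of years, Ka thousands of years and YBP years before present, and it ignores any offset in the definition of "present"; the function name `age_to_ma` is hypothetical.

```python
# Assumed number of years represented by one unit of each reported age unit.
_YEARS_PER_UNIT = {"Ma": 1_000_000.0, "Ka": 1_000.0, "YBP": 1.0}

def age_to_ma(value: float, unit: str) -> float:
    """Convert an age reported in Ma, Ka or YBP to millions of years (Ma)."""
    try:
        years = value * _YEARS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unknown age unit: {unit!r}") from None
    return years / 1_000_000.0
```

Under these assumptions, an age of 500 Ka converts to 0.5 Ma.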
formation: The stratigraphic formation in which the collection is located, if known
stratgroup: The stratigraphic group in which the collection is located, if known
member: The stratigraphic member in which the collection is located, if known
stratscale: The stratigraphic range covered by this collection
zone: The stratigraphic zone in which the collection is located, if known
localsection: The local section in which the collection is located, if known
localbed: The local bed in which the collection is located, if known
localbedunit: The unit of measurement used in the designation of the local bed
localorder: The order in which local beds were described, if known
regionalsection: The regional section in which the collection is located, if known
regionalbed: The regional bed in which the collection is located, if known
regionalbedunit: The unit of measurement used in the designation of the regional bed
regionalorder: The order in which regional beds were described, if known
stratcomments: Additional comments about the stratigraphic context of the collection, if any
lithdescript: Detailed description of the collection site in terms of lithology
lithology1: The first lithology described for the collection site; the database can represent up to two different lithologies per collection
lithadj1: Adjective(s) describing the first lithology
lithification1: Lithification state of the first lithology described for the site
minor_lithology1: Minor lithology associated with the first lithology described for the site
fossilsfrom1: Whether or not fossils were taken from the first described lithology
lithology2: The second lithology described for the collection site, if any
lithadj2: Adjective(s) describing the second lithology, if any
lithification2: Lithification state of the second lithology described for the site. See above for values.
minor_lithology2: Minor lithology associated with the second lithology described for the site, if any
fossilsfrom2: Whether or not fossils were taken from the second described lithology
collection_type: The type or purpose of the collection.
collection_methods: The method or methods employed.
museum: The museum or museums which hold the specimens.
collection_coverage: Fossils that were present but not specifically listed.
collection_size: The number of fossils actually collected.
rock_censused: The amount of rock censused.
collectors: Names of the collectors.
collection_dates: Dates on which the collection was done.
collection_comments: Comments about the collecting methods.
taxonomy_comments: Comments about the taxonomy of what was found.
environment: The paleoenvironment associated with the collection site
tectonic_setting: The tectonic setting of the collection site
geology_comments: General comments about the geology of the collection site
pres_mode: This field reports the modes of preservation, occurrence, and mineralization from the 'Preservation' tab on the PBDB collection form.
preservation_quality: Quality of the anatomical detail preserved.
spatial_resolution: Spatial resolution of the preservation information.
temporal_resolution: Temporal resolution of the preservation information.
lagerstatten: Type of lagerstätten found in this collection.
concentration: Degree of concentration of the fossils found in this collection.
orientation: Orientation of the fossil(s) found in this collection.
abund_in_sediment: Abundance in sediment
sorting: Degree of size sorting
fragmentation: Degree of fragmentation
bioerosion: Degree of bioerosion
encrustation: Degree of encrustation
preservation_comments: Preservation comments, if any.
assembl_comps: The size classes found in this collection. The value of this field will be one or more of: macrofossils, mesofossils, microfossils.
articulated_parts: The prevalence of articulated body parts in this collection.
associated_parts: The prevalence of associated body parts in this collection.
common_body_parts: A list of body parts that are common in this collection.
rare_body_parts: A list of body parts that are rare in this collection.
feed_pred_traces: A list of feeding/predation traces found in this collection, if any.
artifacts: A list of artifacts found in this collection, if any.
component_comments: Component comments, if any.
research_group: The research group(s), if any, associated with this collection.
primary_reference: The primary reference associated with this record (as formatted text)
authorizer_no: The identifier of the person who authorized the entry of this record
enterer_no: The identifier of the person who actually entered this record.
modifier_no: The identifier of the person who last modified this record, if it has been modified.
updater_no: The identifier of the person or process who last updated this record, if it has been updated.
authorizer: The name of the person who authorized the entry of this record
enterer: The name of the person who actually entered this record
modifier: The name of the person who last modified this record, if it has been modified.
updater: The name of the person or process who last updated this record, if it has been updated.
created: The date and time at which this record was created.
modified: The date and time at which this record was last modified.
updated: The date and time at which this record was last updated, if it has been updated
We analysed occurrence data for 25,855 genera of marine invertebrate animals (Animalia excluding Chordata) downloaded from the Paleobiology Database (PBDB; paleobiodb.org; a list of data enterers who contributed to this dataset is included as Dataset S1) on 2022-07-12, and made available with our analytical scripts here. Our analytical scripts implement the methods described in Benson et al. (2025; see 'related works').
