Paleogeography modulates marine extinction risk throughout the Phanerozoic
Data files
Feb 09, 2026 version files (289.35 MB total)
computational_details.txt (3.73 KB)
dat_rotated.csv (264.08 MB)
Distance_to_track_10_data.csv (2.29 MB)
Distance_to_track_15_data.csv (2.30 MB)
Distance_to_track_5_data.csv (2.23 MB)
Driving_distance_algorithm_10_degrees.R (6.37 KB)
Driving_distance_algorithm_15_degrees.R (6.37 KB)
Driving_distance_algorithm_5_degrees.R (6.37 KB)
extinction_model_code.Rmd (26.05 KB)
good_dispersing_genera.csv (50.64 KB)
isolated_coastlines_Kocsisetal._2021.zip (1.24 MB)
massext_tab.csv (11.47 KB)
models_koc_10.csv (5.70 MB)
models_koc_15.csv (5.71 MB)
models_koc_5.csv (5.65 MB)
pbdbdata_code_2025.Rmd (15.35 KB)
Points2nearestcell.Rmd (1.90 KB)
prepare_extinction_models.Rmd (9.16 KB)
README.md (9.86 KB)
Rotation_files.zip (20.21 KB)
Abstract
Understanding the factors that have influenced the intensity and selectivity of extinction throughout Earth history is critical for explaining past biodiversity loss and associated implications for the present biotic crisis. Here, we investigate the role of coastline geometry and paleogeographic boundary conditions in shaping extinction risk for shallow-marine-restricted taxa across 540 million years of Earth history. Our findings reveal that interactions between the geographic distributions of taxa and the geometric configurations of continental margins significantly predict relative extinction risk throughout the Phanerozoic: taxa with longer potential dispersal pathways, often associated with east-west oriented coastlines, islands, or inland seaways, consistently exhibit higher extinction risk than taxa with shorter potential dispersal pathways. In contrast to many other predictors of extinction risk, dispersal distance selectivity is amplified during mass extinction events and hyperthermal intervals, suggesting geographic constraints become more important during periods of rapid climate change. Our results provide another potential mechanism for the generally elevated extinction rates during the Paleozoic, an interval characterized by complex inland seas and a preponderance of east-west coastlines. These insights underscore the importance of considering paleogeographic context when interpreting extinction patterns and assessing implications for future biodiversity loss.
Overview
This repository contains all code, data, and computational details used in the analyses for:
Malanoski, C.M., Finnegan, S., Huang, E.C., Blake, L., Mac Niocaill, C., & Saupe, E.E. (2025).
Paleogeography modulates marine extinction risk throughout the Phanerozoic. Science.
This study quantifies how paleogeography influenced marine extinction risk through the Phanerozoic by combining plate-tectonic reconstructions, coastline-derived dispersal metrics, and PBDB occurrence data in stage-resolved extinction models.
Repository Structure
data (Supplement)/
dat_rotated.csv
Distance to track_5_data.csv
Distance to track_10_data.csv
Distance to track_15_data.csv
models_koc_5.csv
models_koc_10.csv
models_koc_15.csv
good_dispersing_genera.csv
massext_tab.csv
code/
Driving_distance_algorithm_5_degrees.R
Driving_distance_algorithm_10_degrees.R
Driving_distance_algorithm_15_degrees.R
pbdbdata_code_2025.Rmd
Points2nearestcell.Rmd
prepare_extinction_models.Rmd
extinction_model_code.Rmd
computational_details/
(Cluster job scripts, runtime details, and environment specifications)
Rotation_files/
(Plate rotation files for each Global Plate Model)
Raw_maps_Kocsisetal.2021/
(Raw paleogeographic map data from Kocsis et al., 2021; download from https://doi.org/10.5281/zenodo.3903163)
isolated_coastlines_Kocsisetal.2021/
(Derived/modified coastline shapefiles generated in this study from the Kocsis et al. maps)
Workflow Summary
1. Coastline Isolation
Raw paleogeographic maps from Kocsis et al. (2021) were processed to extract individual coastline polygons.
The original paleogeographic map dataset is available from Zenodo (https://doi.org/10.5281/zenodo.3903163) and should be obtained from that source.
The resulting isolated coastlines (a processed/modified derivative produced by this study), found in isolated_coastlines_Kocsisetal.2021/, form the foundation for spatial modeling of dispersal barriers.
2. Driving Distance Algorithms
Three R scripts were used to compute the number of “steps” (grid-cell steps) required to traverse 5°, 10°, and 15° across reconstructed paleogeographies:
Driving_distance_algorithm_5_degrees.R
Driving_distance_algorithm_10_degrees.R
Driving_distance_algorithm_15_degrees.R
These models estimate how easily marine organisms could disperse across past continental configurations.
All computations were executed using the Oxford Earth Sciences Goodwin Cluster.
See the computational_details/ folder for runtime specifications, hardware details, and estimated run times.
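The idea of counting grid-cell steps can be illustrated with a breadth-first search over a small raster. This is a minimal sketch in base R and not the algorithm implemented in the Driving_distance_algorithm_*.R scripts: the function name count_steps, the 4-neighbour movement rule, and the toy grid are all assumptions made for illustration.

```r
# Hypothetical sketch: fewest grid-cell steps between two cells of a logical
# matrix, moving only through passable (TRUE) cells with 4-neighbour moves.
count_steps <- function(passable, start, goal) {
  nr <- nrow(passable)
  nc <- ncol(passable)
  dist <- matrix(NA_integer_, nr, nc)  # NA marks cells not yet visited
  dist[start[1], start[2]] <- 0L
  queue <- list(start)
  moves <- list(c(-1L, 0L), c(1L, 0L), c(0L, -1L), c(0L, 1L))
  while (length(queue) > 0) {
    cell <- queue[[1]]
    queue <- queue[-1]
    if (all(cell == goal)) return(dist[cell[1], cell[2]])
    for (m in moves) {
      nxt <- cell + m
      inside <- nxt[1] >= 1 && nxt[1] <= nr && nxt[2] >= 1 && nxt[2] <= nc
      if (inside && passable[nxt[1], nxt[2]] && is.na(dist[nxt[1], nxt[2]])) {
        dist[nxt[1], nxt[2]] <- dist[cell[1], cell[2]] + 1L
        queue[[length(queue) + 1]] <- nxt
      }
    }
  }
  NA_integer_  # goal cannot be reached through passable cells
}

# Toy 3 x 3 grid (invented): a barrier in the middle column forces a detour.
open_grid <- matrix(TRUE, nrow = 3, ncol = 3)
blocked_grid <- open_grid
blocked_grid[1, 2] <- FALSE
blocked_grid[2, 2] <- FALSE

steps_open <- count_steps(open_grid, c(1L, 1L), c(1L, 3L))
steps_blocked <- count_steps(blocked_grid, c(1L, 1L), c(1L, 3L))
```

In this toy case the same two endpoints are farther apart, in steps, once the barrier is present, which is the intuition behind using step counts as a measure of dispersal difficulty.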
3. PBDB Data Processing and Rotation
The script pbdbdata_code_2025.Rmd cleans and filters fossil occurrence data from the Paleobiology Database (PBDB).
Occurrences were rotated to paleocoordinates using GPlates v2.3 and rotation files stored in the Rotation_files/ directory.
Output file:
dat_rotated.csv
4. Spatial Matching
Using Points2nearestcell.Rmd, rotated fossil occurrences were matched to the nearest coastline cells based on each paleogeographic model’s resolution.
Outputs include:
Distance to track_5_data.csv
Distance to track_10_data.csv
Distance to track_15_data.csv
These represent paleogeographic dispersal distances for each occurrence.
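Matching a point to its nearest cell can be sketched with great-circle distances in base R. This is an illustration only and may differ from what Points2nearestcell.Rmd does: the helper names haversine_km and nearest_cell, the cell table with lng/lat columns, and the coordinates are hypothetical.

```r
# Great-circle distance (km) between one point and a vector of points,
# using the haversine formula on a sphere of radius 6371 km.
haversine_km <- function(lng1, lat1, lng2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# For each occurrence (columns plng, plat), return the row index of the
# closest cell (columns lng, lat).
nearest_cell <- function(occ, cells) {
  vapply(seq_len(nrow(occ)), function(i) {
    d <- haversine_km(occ$plng[i], occ$plat[i], cells$lng, cells$lat)
    which.min(d)
  }, integer(1))
}

# Invented coordinates; the second occurrence sits across the date line
# from its nearest cell.
occ <- data.frame(plng = c(10, -170), plat = c(0, 45))
cells <- data.frame(lng = c(12, 179, -60), lat = c(1, 44, -30))
occ$cell_id <- nearest_cell(occ, cells)
```

A spherical distance is used here rather than a planar one so that points on either side of the 180° meridian are treated as neighbours.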
5. Model Preparation
The script prepare_extinction_models.Rmd cleans and merges datasets, standardizes variables, and prepares model-ready files for extinction analyses.
Outputs include:
models_koc_5.csv
models_koc_10.csv
models_koc_15.csv
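The merge-and-standardize step can be sketched with base R as follows. This is not the code in prepare_extinction_models.Rmd: the toy values are invented, the z-score standardization and the name dist_z are assumptions, and the exact format of stage_gen is hypothetical. Only the column names genus, Genus, stg, ext, median_distance_avg, and stage_gen are taken from the column descriptions in this README.

```r
# Toy genus-stage table (invented values).
occ <- data.frame(
  genus = c("A", "A", "B"),
  stg   = c(10L, 11L, 10L),
  ext   = c(0L, 1L, 1L)
)

# Toy distance table; note the capitalised Genus column, as in the
# "Distance to track" files described in this README.
dist_tab <- data.frame(
  Genus = c("A", "A", "B"),
  stg   = c(10L, 11L, 10L),
  median_distance_avg = c(4, 7, 12)
)

# Join on genus and stage, reconciling the differing column names.
model_dat <- merge(
  occ, dist_tab,
  by.x = c("genus", "stg"),
  by.y = c("Genus", "stg")
)

# Assumed key format: stage and genus joined by an underscore.
model_dat$stage_gen <- paste(model_dat$stg, model_dat$genus, sep = "_")

# Assumed standardization: centre to mean 0 and scale to SD 1.
model_dat$dist_z <- as.numeric(scale(model_dat$median_distance_avg))
```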
6. Extinction Modeling and Analysis
The script extinction_model_code.Rmd runs the final extinction models used in the main text.
These generalized linear mixed models (GLMMs) quantify extinction risk as a function of paleogeographic and ecological predictors.
Results and plots from this script correspond to the figures and tables in the main manuscript.
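For readers unfamiliar with this model family, the sketch below fits a plain logistic regression of a binary extinction outcome on a standardized distance predictor, using simulated data and base R only. It is a simplification and not the model specification in extinction_model_code.Rmd: the data are simulated, dist_z is a hypothetical predictor name, and the random effects that make the published models GLMMs are omitted.

```r
set.seed(1)

# Simulated toy data (not real results): extinction probability rises with
# a standardized dispersal-distance predictor.
n <- 500
toy <- data.frame(dist_z = rnorm(n))
toy$ext <- rbinom(n, size = 1, prob = plogis(-1 + 1 * toy$dist_z))

# Fixed-effects-only logistic regression.
fit <- glm(ext ~ dist_z, family = binomial, data = toy)

# Slope on the log-odds scale and the corresponding odds ratio per 1 SD.
slope <- unname(coef(fit)["dist_z"])
odds_ratio <- exp(slope)

# A mixed-model analogue would add a grouping term, for example with the
# lme4 package (the grouping by stg is an assumption, not the authors' formula):
#   lme4::glmer(ext ~ dist_z + (1 | stg), family = binomial, data = toy_with_stg)
```

A positive slope in such a model means that larger distance values are associated with higher odds of extinction within an interval.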
Additional Data
massext_tab.csv: Table of mass extinction intervals used for stage classification.
good_dispersing_genera.csv: List of “good dispersers” excluded in sensitivity analyses.
PBDB_citations/: Contains full PBDB contributor and dataset citations, acknowledging the community’s essential data contributions.
Column descriptions for key data files
Below are brief explanations of the column names used in the primary .csv outputs. Only one numerical variant is shown here (e.g., the “5°” files), but the column names are the same for the 10° and 15° versions of each file.
dat_rotated.csv (rotated PBDB occurrences)
collection_no: PBDB collection identifier.
collection_name: PBDB collection name/label.
accepted_name, accepted_rank: Accepted taxonomic name and rank (PBDB taxonomy resolution).
identified_name, identified_rank: Originally identified name and rank (as entered/recorded).
early_interval, late_interval: PBDB geologic interval bounds for the collection/occurrence.
max_ma, min_ma: Maximum and minimum ages (Ma) associated with the interval assignment.
reference_no: PBDB reference identifier.
phylum, class, order, family, genus: Taxonomic fields used for filtering, grouping, and modeling.
lng, lat: Present-day longitude and latitude (WGS84-style geographic coordinates).
paleolng, paleolat: Paleolongitude and paleolatitude for the occurrence under the primary rotation workflow (see code for the specific plate model used for these fields).
formation: Formation name (as available in PBDB).
lithology1, lithification1: Primary lithology and lithification fields from PBDB.
environment: Interpreted depositional environment field from PBDB.
zone: Biostratigraphic zone field (if present).
clgen: Cleaned/standardized genus label used internally in parts of the workflow.
stg: Numeric stage/bin identifier used throughout the repository (matches massext_tab.csv).
ten: Internal temporal bin label used in parts of the preprocessing workflow (see pbdbdata_code_2025.Rmd / prepare_extinction_models.Rmd).
mid: Midpoint age (Ma) for the stage/bin.
age: Stage/bin age used in modeling (typically derived from mid and/or interval bounds).
plng_pm, plat_pm; plng_mu, plat_mu; plng_go, plat_go; plng_me, plat_me; plng_tc, plat_tc: Paleolongitude/paleolatitude pairs under different Global Plate Models (GPMs). The suffixes correspond to the plate-model abbreviations used in the codebase and Rotation_files/.
Distance to track_5_data.csv
Class, Order, Family, Genus: Taxonomic identifiers for the row.
stg: Numeric stage/bin identifier.
mean_distance_avg: Mean “distance-to-track” (in grid-cell steps) across occurrences/records contributing to that taxon-stage estimate.
median_distance_avg: Median “distance-to-track” (in grid-cell steps).
Jackknifed_Median_Distance_Avg: Jackknife-estimated median distance metric (used to reduce sensitivity to single influential points).
Bootstrapped_Median_Distance_Avg: Bootstrap-estimated median distance metric (distribution-based robustness estimate).
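As a rough guide to what jackknife and bootstrap medians look like, here is a minimal base R sketch. The exact estimators used to produce the columns above are defined in the repository code; the versions below (mean of leave-one-out medians, mean of resampled medians), the function names, and the toy step counts are assumptions for illustration.

```r
# Jackknife: recompute the median with each observation left out in turn,
# then average those leave-one-out medians.
jackknife_median <- function(x) {
  loo <- vapply(seq_along(x), function(i) median(x[-i]), numeric(1))
  mean(loo)
}

# Bootstrap: resample with replacement, take the median of each resample,
# then average across resamples.
bootstrap_median <- function(x, n_boot = 1000) {
  boot <- replicate(n_boot, median(sample(x, replace = TRUE)))
  mean(boot)
}

# Invented step counts for one taxon-stage, with one extreme value.
steps <- c(3, 4, 4, 5, 40)

set.seed(42)
jk <- jackknife_median(steps)
bs <- bootstrap_median(steps)
```

Both summaries are built from medians, so the single large value in the toy vector has far less influence on them than it has on the arithmetic mean.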
models_koc_5.csv (extinction dataset)
Each row represents a genus-stage observation used in extinction modeling.
ext: Extinction outcome (binary; 1 = extinct in that interval according to the range-based coding used in this study, 0 = not extinct).
genus: Genus name (modeling unit).
stg: Numeric stage/bin identifier.
collection_no, collection_name: PBDB collection metadata carried through joins (useful for traceability).
accepted_name, accepted_rank, identified_name: PBDB taxonomy fields retained for provenance and checking.
reference_no: PBDB reference identifier.
phylum, class, order, family: Higher taxonomy for filtering/grouping/random effects.
age: Stage/bin age used in models.
plng_pm, plat_pm: Paleocoordinates used for matching occurrences to paleogeographic rasters/coastline cells in this workflow variant.
stage_gen: Concatenated stage–genus identifier (used as an internal grouping key in parts of the workflow).
mean_distance_avg, median_distance_avg, Jackknifed_Median_Distance_Avg, Bootstrapped_Median_Distance_Avg: Paleogeographic dispersal metrics merged in from the corresponding “Distance to track” outputs.
good_dispersing_genera.csv (dispersal-filter list)
genus, phylum, class, order, family: Taxonomic identifiers for genera classified as “good dispersers” and excluded in sensitivity analyses.
massext_tab.csv (regime definitions)
sys, system: System identifiers (abbreviated and full).
series: Series identifier.
stage: Stage name (text label).
short: Short label used in figures.
bottom, top: Lower and upper stage/bin boundaries (Ma).
mid: Midpoint age (Ma).
dur: Duration (Ma).
stg: Numeric stage/bin identifier used throughout the repository.
systemCol, seriesCol, col: Color/plotting helper fields used for consistent time-scale graphics.
regime: Regime classification (e.g., background vs. mass extinction; see manuscript and code).
hyperthermals: Hyperthermal indicator used in extended/sensitivity analyses.
See computational_details/ for a complete session info log and environment configuration.
Contact
For questions or collaborations, please contact:
Cooper M. Malanoski
Department of Earth Sciences, University of Oxford
📧 cooper.malanoski@earth.ox.ac.uk
Last updated: February 2026
