Plankton and benthic foraminiferal dataset for the study of the Eocene-Oligocene transition
Data files
Mar 25, 2024 version files: 1.76 MB
- EOT_forams_Readable.csv (1.76 MB)
Jul 29, 2025 version files: 1.77 MB
- EOT_forams_Readable.csv (1.77 MB)
- README.md (5.09 KB)
Abstract
The Eocene–Oligocene transition (EOT) was the crucial turning point when Earth’s climate shifted to its current icehouse state. How the marine biosphere responded during this transition is not well constrained. In low-temporal-resolution global compendia, the response appears as a simple extinction pulse. Here we designed a novel AI-inspired metaheuristic algorithm to construct a high-resolution global species richness history across the EOT for the rich foraminiferal fossil record, with an imputed ~29,000-year resolution. The revealed diversity dynamics are complex and differ among foraminiferal groups with distinct ecologies. Planktonic and shallow-water larger benthic foraminifera show steady diversity levels in the early phases of the transition in the latest Eocene, after a long-term reduction. Over the same interval, the deeper-water small benthic foraminifera radiate remarkably and then decline. In the earliest Oligocene, the planktonic and larger foraminifera suffered major species losses coincident with the formation of the first continental-scale ice sheet on Antarctica. Small benthic foraminifera diversity held steady at that time, followed by an accelerating decline as the early Oligocene proceeded. These findings reveal complicated and ecologically differentiated environment-life processes. They indicate the importance of high-resolution temporal data for dissecting ecological responses to major environmental changes.
https://doi.org/10.5061/dryad.jh9w0vtk5
We have submitted the raw data in both readable and computable formats. The readable version is the CSV file "EOT_forams_Readable.csv".

We have also compiled a package, "EOT_forams_CONOP_r1.zip". It is for analyzing foraminiferal data across the Eocene-Oligocene transition (EOT) with the CONstrained OPtimization with an evolutionary algorithm (CONOP.EA) computational approach. The package contains:
- the computable dataset ("EOT_forams_Computable.csv");
- data transfer software ("SinoCor2CONOP.exe");
- software for CONOP configuration and initialization ("CONMAN.exe" and "CONOPDataInitialization-V2.exe");
- the CONOP.EA packages ("CONOP.EA_executable" and "CONOP.EA_source code");
- the benchmark results file ("EOT_forams_CONOP_r1\CONOP.EA_executable\data\CONOP benchmark result.xlsx");
- documentation (README.md and README.txt).

Together, these files enable researchers to obtain high-resolution insights into foraminiferal richness changes across this significant geologic interval. A demonstration is included in the "CONOP.EA_executable" folder of "EOT_forams_CONOP_r1.zip". It provides examples of computed analyses ("CONOP benchmark result.xlsx" and CONOP.EA files) to help users replicate the study's results.
Description of the data and file structure
EOT_forams_Computable.csv and EOT_forams_Readable.csv (dataset)
The files "EOT_forams_Readable.csv" and "EOT_forams_Computable.csv" are two versions of the same raw dataset, each formatted for a different purpose.
- "EOT_forams_Readable.csv" follows Dryad's guidelines and contains no empty cells. Cells without data are filled with the placeholders 'N/A' or 'null'.
- "EOT_forams_Computable.csv" is formatted for CONOP computation and leaves these cells empty.

Apart from this handling of empty cells, both files contain identical foraminiferal range data. In both files, each row corresponds to a species. Each pair of columns gives the first and last appearance positions (base and top) of that species in one section.
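The paired-column layout described above can be read with a short script. The sketch below is a minimal, hypothetical example that makes three assumptions about the layout:
- the first column holds the species name;
- the remaining columns come in (base, top) pairs, in section order;
- 'N/A', 'null', or an empty cell means the species is not recorded in that section.

Check these assumptions against the actual header of the file before relying on the output. The function name `parse_ranges` is illustrative and is not part of the distributed software.

```python
import csv
import io

# Placeholders used in the readable file instead of empty cells.
MISSING = {"N/A", "null", ""}


def parse_ranges(text):
    """Parse a species-by-section table into {species: {section_index: (base, top)}}.

    ASSUMED layout (hypothetical; check against the real header): column 0 is
    the species name, followed by one (base, top) column pair per section.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    n_sections = (len(header) - 1) // 2
    ranges = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        species = row[0].strip()
        # Pad short rows so every section has a (base, top) pair to inspect.
        cells = row[1:] + [""] * (2 * n_sections - len(row[1:]))
        per_section = {}
        for i in range(n_sections):
            base, top = cells[2 * i].strip(), cells[2 * i + 1].strip()
            if base in MISSING or top in MISSING:
                continue  # no complete record of this species in section i
            per_section[i] = (float(base), float(top))
        ranges[species] = per_section
    return ranges
```

To apply it to the real file, read the file as text first, for example with `open("EOT_forams_Readable.csv", encoding="utf-8", newline="").read()`. The encoding is an assumption.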
CONOP benchmark result.xlsx
- Sp_num: code referring to a species
- Event_type: first or last appearance, coded as 0 or 1
- Level_placed: position (ordering) at which a bioevent is placed in the CONOP-derived composite section
- Sample_point: assembled sample level (ordering) in the CONOP-derived composite sequence
- Age_Ma: calibrated age (Ma) of each sample point
- Species_Richness: foraminiferal species richness calculated with the unbinned method
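To illustrate how a richness curve can be derived from these columns, the following sketch counts, for each composite level, the species whose placed range spans that level. A species' range runs from its first-appearance level to its last-appearance level.

Some caveats apply:
- This is a simple range-through count, assumed for illustration. It is not necessarily identical to the unbinned method behind the Species_Richness column, which is reported per Sample_point rather than per Level_placed.
- The 0/1 coding of Event_type is not stated here, so the first-appearance code is passed as a parameter.
- Each range is sorted, so the count works whichever direction Level_placed increases.

The function name is hypothetical.

```python
def range_through_richness(events, fad_code):
    """Species richness per composite level from placed first/last appearances.

    events: (Sp_num, Event_type, Level_placed) tuples; levels are integers.
    fad_code: the Event_type value that marks a first appearance
    (ASSUMPTION: supplied by the user, since the 0/1 mapping is not stated here).
    """
    first, last = {}, {}
    for sp, event_type, level in events:
        if event_type == fad_code:
            first[sp] = level
        else:
            last[sp] = level
    # Sorting each pair makes the count independent of whether levels
    # increase upward or downward in the composite.
    spans = [sorted((first[sp], last[sp])) for sp in first if sp in last]
    lowest = min(lo for lo, _ in spans)
    highest = max(hi for _, hi in spans)
    return {
        level: sum(1 for lo, hi in spans if lo <= level <= hi)
        for level in range(lowest, highest + 1)
    }
```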
Sharing/Access information
Data were derived from the following source:
- OneStratigraphy Database (http://onestratigraphy.ddeworld.org/)
Code/Software
SinoCor2CONOP.exe
This utility converts a SinoCor-formatted file into a set of CONOP-ready files. Place the SinoCor file in the same directory as the utility before running it. When processing is complete, a new folder named with the time of the run is created. It contains the files required for further analysis, such as the ".sec" and ".dic" files.
CONMAN and CONOPDataInitialization-V2
These programs configure the CONOP files before the main CONOP computation is started:
- In "CONMAN", follow the "EXPORT" menu instructions sequentially.
- In "CONOPDataInitialization-V2", select the appropriate options to filter data before clicking the "process" button.
CONOP.EA_source code and CONOP.EA_executable
These packages include source code and demo files of "CONOP.EA". Please see "ReadMe.txt"/"ReadMe.md" inside the packages for instructions.
Version changes
26-July-2025: Taxonomic information was further validated by three foraminiferal experts:
- Prof. Bridget Wade (planktonic foraminifera, PF)
- Prof. Laia Alegret (small benthic foraminifera, SBF)
- Prof. Qinghai Zhang (larger benthic foraminifera, LBF, and SBF)

The data in both EOT_forams_Computable.csv and EOT_forams_Readable.csv were updated. They now total 1,269 species and 18 magnetochrons across 161 published sections. To improve compatibility and usability, we updated the software package, now named "EOT_forams_CONOP_r1.zip". This was done in collaboration with Prof. Chao Qian and his students Ke Xue, Yuchang Wu, and Rongxi Tan. The updated package contains two main folders:
- "CONOP.EA_executable": Includes executable programs ready for immediate use.
- "CONOP.EA_source code": Contains the source code for transparency and customization.
Two execution modes are provided:
- Optimization from initial input data.
- Optimization from pre-existing high-quality solutions (benchmark data included).
The default parameter settings are chosen for quick initialization and data loading. For more intensive analyses, adjust these settings in the "configs\config.yaml" file.
Data used in this study were manually collected from peer-reviewed publications. Each data record includes metadata linking it to the original source in the OneStratigraphy database. Errors were corrected (e.g., misspelled species names), and missing information was filled in (e.g., latitude and longitude). We selected sections/sites containing foraminiferal occurrences from the Eocene to the Oligocene.

The raw dataset contained:
- 13,138 local bioevent records (i.e., first and last appearance records);
- ~60,000 occurrences of 2,988 taxonomic units;
- 163 published stratigraphic sections, encompassing both calcareous and agglutinated foraminifera.

These sections include drill cores and outcrops. They are widely distributed across present-day oceans and continents such as Europe, Africa, and Asia.

The dataset was first cleaned by excluding open nomenclature, such as sp./spp. (622), aff. (63), and species names with question marks (6). However, names qualified with cf. (175) and ex gr. (25) were retained and assigned to the referenced species. Taxonomic assignments below the species level (i.e., subspecies and varieties) were mostly merged into their parent species. All non-foraminiferal fossils were removed.

The cleaned dataset was then checked against independent data sources, including taxonomic atlases, foraminiferal databases (Mikrotax and WoRMS), and related taxonomic references. A group of foraminiferal taxonomic experts further verified it for correctness and consistency and resolved remaining issues:
- Bridget Wade (PF)
- Laia Alegret (SBF)
- Qinghai Zhang (LBF and SBF)
- Peiyue Fang (PF, LBF, and SBF)
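The name-based cleaning rules above can be expressed as a small function. This is a hypothetical sketch, not the procedure actually used. It assumes names are plain strings without author citations and works as follows:
- it excludes any name containing sp., spp., aff., or a question mark;
- it strips cf. and ex gr., so the record is assigned to the referenced species;
- it truncates trinomials and varieties to the binomial.

Subspecies were only mostly merged in the study, so expert review would still be needed for exceptions.

```python
import re

# Open nomenclature that leads to exclusion: sp., spp., aff., or any "?".
EXCLUDE = re.compile(r"\b(?:sp|spp|aff)\.|\?")
# Qualifiers removed so the record counts toward the referenced species.
DROP = re.compile(r"\b(?:cf|ex gr)\.\s*")


def clean_name(raw):
    """Return a binomial name, or None if the record should be excluded.

    ASSUMES plain names without author citations.
    """
    name = " ".join(raw.split())  # collapse repeated whitespace
    if EXCLUDE.search(name):
        return None
    name = DROP.sub("", name)
    parts = name.split()
    if len(parts) < 2:
        return None  # genus-only names cannot be assigned to a species
    # Keep genus + species; this drops subspecies epithets and "var. x" suffixes.
    return " ".join(parts[:2])
```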
In the present dataset, drill core Hole 647A was studied in three separate reports, focusing on SBF and PF for different purposes. These purposes included testing the biotic response to the EOT, studying a high-latitude deep-water sedimentary sequence, and stratigraphic correlation. The three reports were integrated by depth into a single section.
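One way to integrate several reports from the same core is to place them on a common depth scale. For a species reported more than once, the widest observed range is then kept. The sketch below assumes depths in metres below seafloor, where larger values are deeper, so the base is the maximum depth and the top the minimum. The exact procedure used for Hole 647A is not described here, and `merge_reports` is a hypothetical helper.

```python
def merge_reports(reports):
    """Merge {species: (base_depth, top_depth)} tables from one core.

    Depths are ASSUMED to share one scale in metres below seafloor, so the
    base of a range is its deepest (largest) value and the top its shallowest.
    """
    merged = {}
    for report in reports:
        for species, (base, top) in report.items():
            if species in merged:
                old_base, old_top = merged[species]
                # Keep the widest observed range across reports.
                merged[species] = (max(old_base, base), min(old_top, top))
            else:
                merged[species] = (base, top)
    return merged
```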
The final dataset after data cleaning and verification included 9,032 local first and last occurrence records of 1,269 species in 161 published sections.
The CONstrained OPtimization with an Evolutionary Algorithm (CONOP.EA) compositing method integrates the local biostratigraphic data from all 161 sections/sites. It also corrects for regional diachronism caused by migration, fossil preservation, and sampling biases.
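The general idea of constrained optimization with an evolutionary algorithm can be illustrated with a toy example. The sketch below is not CONOP.EA. It uses:
- a (1+1) evolutionary search with a single move-one-event mutation;
- a simplified misfit that counts event pairs whose order in a local section contradicts the composite sequence;
- one hard constraint: each species' first appearance ("F") must precede its last ("L").

Local sections are assumed to list events from bottom to top. All function names are hypothetical. CONOP.EA's actual penalty functions, constraints, and evolutionary operators differ and should be taken from its source code.

```python
import random
from itertools import combinations


def misfit(composite, sections):
    """Count event pairs whose local order contradicts the composite order."""
    pos = {event: i for i, event in enumerate(composite)}
    bad = 0
    for section in sections:
        # combinations() preserves section order, so a is observed below b.
        for a, b in combinations(section, 2):
            if pos[a] > pos[b]:
                bad += 1
    return bad


def feasible(composite):
    """Hard constraint: every species' first appearance precedes its last."""
    pos = {event: i for i, event in enumerate(composite)}
    return all(pos[(sp, "F")] < pos[(sp, "L")]
               for sp, kind in composite if kind == "F")


def evolve(events, sections, generations=2000, seed=0):
    """Toy (1+1) evolutionary search over composite event orderings."""
    rng = random.Random(seed)
    # Feasible starting point: all first appearances, then all last appearances.
    best = sorted(events, key=lambda event: event[1] == "L")
    best_cost = misfit(best, sections)
    for _ in range(generations):
        child = best[:]
        # Mutation: move one randomly chosen event to a random new position.
        moved = child.pop(rng.randrange(len(child)))
        child.insert(rng.randrange(len(child) + 1), moved)
        if not feasible(child):
            continue  # reject orderings that violate the hard constraint
        cost = misfit(child, sections)
        if cost <= best_cost:  # accept ties to drift across plateaus
            best, best_cost = child, cost
    return best, best_cost
```

The search accepts children of equal cost. This lets it move across plateaus, which matters because many different event orderings can fit the local sections equally well.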