Quantifying the global biodiversity of Proterozoic eukaryotes

Data files (Nov 04, 2024 version, 3.74 MB total)

Data_S1.xlsx (515.99 KB)
Data_S2_20241102.xlsx (3.22 MB)
README.md (10.38 KB)

Abstract

The global diversity of Proterozoic eukaryote fossils is poorly quantified despite its fundamental importance to the understanding of macroevolutionary patterns and dynamics on the early Earth. Here we report a new construction of fossil eukaryote diversity from the Paleoproterozoic to early Cambrian based on a comprehensive data compilation and quantitative analyses. The resulting taxonomic richness curve verifies Cryogenian glaciations as a major divide that separates the “Boring Billion” and Ediacaran Period, with the former characterized by a remarkable stasis and the latter by greater diversity, more rapid turnover, and multiple radiations and extinctions. These contrasting evolutionary patterns and dynamics provide a framework to test competing hypotheses on biosphere and geosphere co-evolution in the Proterozoic Eon.

https://doi.org/10.5061/dryad.8w9ghx3w6

Data S1: Raw data for all sections and the associated bibliography.

Data S2: Data and analytical results plotted in the figures of the main text and supplementary materials.

Data S3: All R code and a package of ready-to-run CONOP files. CONMAN files prepared for the CONOP runs are in the folder named ready_to_run_CONOP. The CONOP.Para file is ready to run on Windows as long as the program and all CONMAN files are in the same folder. All R code associated with the analyses of this study is also included in this folder as a separate file.

Description of the data and file structure

Data S1: This is a Microsoft Excel file. The first sheet is a Table of Contents that lists all 263 sections in geochronological order. The second sheet contains the raw data used in the CONOP analysis. The third sheet contains the bibliography.

Section name: Each section name contains several elements related to the section, including era or period, country or region, abbreviation of the era or period, full name or abbreviation of the rock unit, and section location.
Taxa as published or non-fossil events: Taxa names or non-fossil events identified and reported in the original literature.
Taxa accepted: Taxa names accepted in our dataset after scrutiny by the authors.
Depth (m): Stratigraphic distance of each taxon or non-fossil event in the section from the measurement starting point (i.e., 0 meters). Positive and negative values indicate the stratigraphic levels are above and below the measurement starting point, respectively.
Eukaryotic groups: For convenience, all eukaryotic taxa in our dataset are categorized into four groups: unicellular eukaryotes, multicellular/coenocytic eukaryotes, animals, and trace fossils.
possible or likely eukaryote: In order to test the influence of taxa with controversial eukaryotic interpretations, all taxa in our dataset are tagged as either likely eukaryotes (i.e., uncontested eukaryotes) or possible eukaryotes (i.e., eukaryotic interpretation debated or controversial).
References: Publications in which the relevant events were reported. Full details of these publications are given in the bibliography sheet.
n/a: not applicable (for example, the variable "Taxa accepted" does not apply to non-fossil events).
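
As an illustration of how these columns fit together, the sketch below derives the lowest and highest stratigraphic level of each accepted taxon within a section, skipping non-fossil events (where "Taxa accepted" is n/a). It is written in Python for illustration only (the study's own code in Data S3 is R). The row values, section and taxon names, and the function name `taxon_ranges` are hypothetical and are not taken from Data S1.

```python
from collections import defaultdict

# Hypothetical rows mimicking three Data S1 columns:
# (section name, taxa accepted, depth in m). All values are invented.
rows = [
    ("Section 1", "Taxon A", -2.5),   # below the measurement starting point
    ("Section 1", "Taxon A", 10.0),   # above the measurement starting point
    ("Section 1", "Taxon B", 4.0),
    ("Section 1", "n/a", 7.0),        # non-fossil event: no accepted taxon
    ("Section 2", "Taxon A", 0.0),
]

def taxon_ranges(rows):
    """Return {(section, taxon): (lowest depth, highest depth)} for fossil events."""
    levels = defaultdict(list)
    for section, taxon, depth in rows:
        if taxon == "n/a":            # "Taxa accepted" does not apply here
            continue
        levels[(section, taxon)].append(depth)
    return {key: (min(depths), max(depths)) for key, depths in levels.items()}

ranges = taxon_ranges(rows)
for (section, taxon), (lowest, highest) in sorted(ranges.items()):
    print(f"{section}  {taxon}: {lowest} m to {highest} m")
```

A taxon recorded at a single level gets identical lowest and highest values, and the same taxon is tracked separately in each section.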

Data S2: This is a Microsoft Excel file that contains all data plotted in the figures.

Shading: In all data sheets, gray shading is used to differentiate data plotted in different curves or different subfigures.
genus diversity: Genus-level richness at each temporal level. If a temporal level is listed in multiple rows, richness data in the last row are used in the plot.
species diversity: Species-level richness at each temporal level. If a temporal level is listed in multiple rows, richness data in the last row are used in the plot.
median_age (Ma): The 50th percentile of the posterior distribution in the Bchron Bayesian age-depth model.
CI_plus (Ma): Upper margin of error of the 95% confidence interval (i.e., the 97.5th percentile minus the 50th percentile).
CI_minus (Ma): Lower margin of error of the 95% confidence interval (i.e., the 50th percentile minus the 2.5th percentile).
Species diversity used in cumulative diversity plot: Species diversity was independently calculated for each of the four eukaryotic groups and then stacked on top of each other to show the cumulative diversity on the same timeline.
95% confidence interval from bootstrapping: A 95% confidence interval which was calculated from bootstrap analyses. Only non-singleton fossiliferous temporal levels are tabulated in the data table.
origination rate: The origination rate was calculated from the number of new species in each time bin.
extinction rate: The extinction rate was calculated from the number of extinct species in each time bin.
proportional origination rate: The proportional origination rate was calculated by dividing the number of new species by the total number of species in the corresponding time bin, and reported as a fraction per bin.
proportional extinction rate: The proportional extinction rate was calculated by dividing the number of extinct species by the total number of species in the corresponding time bin, and reported as a fraction per bin.
proportional diversification rates: The proportional diversification rate is the proportional origination rate minus the proportional extinction rate at each temporal level.
proportional turnover rates: The proportional turnover rate is the proportional origination rate plus the proportional extinction rate at each temporal level.
mean longevity (Myr): The mean longevity of the whole cohort of species present at each temporal level.
Mean longevities of new species (Myr): The mean longevity of only the species that originated at the respective temporal level.
Mean longevity of extinct species (Myr): The mean longevity of only the species that went extinct at the respective temporal level.
species/genus ratio: The species/genus ratio was generated by dividing the species richness by the genus richness at each temporal level.
section no.: Short for section number, which represents the section ID in CONOP analysis.
longitude: The longitude of each section.
latitude: The latitude of each section.
nodes: Nodes represent sections in this study.
edges: Edges represent the relationship or interaction between two nodes in a network analysis. In this study, edges represent shared events between two sections.
number of events: the number of fossil and non-fossil events reported in each section.
nominal age (geological interval): Estimated age of the bulk of the stratigraphic succession in each section.
X1 and X2: Both represent sections of this study that share at least one taxon with another section.
number of shared events: number of shared events between two studied sections.
Mean longevity of total species with possible eukaryotes (Myr): The mean longevity of all species in our dataset.
Mean longevity of new/extinct species with possible eukaryotes (Myr): The mean longevity of only the species that originated/became extinct at the respective temporal level.
Mean longevity of total species without possible eukaryotes (Myr): The mean longevity of species with uncontested eukaryotic affinities in our dataset.
Mean longevity of new/extinct species without possible eukaryotes (Myr): The mean longevity of only the species with uncontested eukaryotic affinities that originated/became extinct at the respective temporal level.
origination/extinction rate per 0.5/1/2/4/10 Myr: The number of originations/extinctions (i.e., the number of new/extinct species) in each 0.5/1/2/4/10-million-year time bin.
test_#_trial number: 1M/1.2M/1.5 M: independent CONOP runs with the number of trials set at 1 million, 1.2 million, or 1.5 million, respectively.
sample size: Total number of species FAD, LAD, and range-through occurrences in rarefaction analyses.
rarefied richness: Species richness resulting from rarefaction analysis at a given resampling size.
standardized richness: Species diversity estimated as the per-section diversity multiplied by the ratio of average total diversity to average per-section diversity in a 10-Myr sliding window.
total species richness: Species diversity of the full dataset at the respective temporal level.
number of supporting sections: The number of sections ranging through the respective temporal level.
culled dataset without sp.: The dataset excluding taxa with ambiguous species assignments (i.e., open nomenclatures such as Genus sp.).
independent CONOP run with culled dataset: An independent CONOP analysis using the culled dataset with ambiguous species assignments removed.
total species diversity of possible eukaryotes: The number of species tagged as possible eukaryotes at each temporal level.
total species diversity of likely eukaryotes: The number of species tagged as likely eukaryotes at each temporal level.
diversity excluding taxa with stratigraphic range <0.1 Myr: Species diversity after removing taxa with stratigraphic ranges less than 0.1 million years in order to alleviate the Lagerstätte effect.
normalized diversity: Species richness values that are linearly normalized to the maximum (set to be 1.0).
coverage area (km^2): Area coverage of siliciclastic rocks from Macrostrat (macrostrat.org).
normalized coverage area: Area coverage values that are linearly normalized to the maximum (set to be 1.0).
diversity_integer: Species diversity at each integer million years (e.g., 600 Ma and 599 Ma).
normalized diversity_integer: Normalized species diversity at each integer million years.
coverage_integer (km^2): Area coverage of siliciclastic rocks at each integer million years.
normalized coverage_integer: Normalized coverage area at each integer million years.
id: The event id in CONOP analysis.
age (Ma): Reported radiometric age.
uncertainty (Ma): The uncertainty of the reported radiometric age, in million years.
n/a: not applicable (e.g., proportional origination or extinction rates are not applicable when standing diversity is zero).
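
The margin, rate, and normalization definitions above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation (their R code is in Data S3). The function names and all input numbers are hypothetical, and `None` stands in for n/a where the total number of species is zero.

```python
def ci_margins(p2_5, p50, p97_5):
    """Return (CI_plus, CI_minus) from posterior age percentiles, in Ma."""
    return p97_5 - p50, p50 - p2_5

def proportional_rates(n_new, n_extinct, n_total):
    """Return proportional origination, extinction, diversification, turnover.

    n_total is the total number of species in the bin; every rate is None
    (n/a) when n_total is zero.
    """
    if n_total == 0:
        return None, None, None, None
    origination = n_new / n_total
    extinction = n_extinct / n_total
    diversification = origination - extinction
    turnover = origination + extinction
    return origination, extinction, diversification, turnover

def normalize_to_max(values):
    """Linearly rescale values so the maximum is 1.0 (assumes a positive maximum)."""
    peak = max(values)
    return [value / peak for value in values]

# Invented inputs, chosen only to make the arithmetic easy to follow.
margins = ci_margins(598.0, 600.0, 603.0)      # (3.0, 2.0)
rates = proportional_rates(2, 1, 8)            # (0.25, 0.125, 0.125, 0.375)
empty_bin = proportional_rates(0, 0, 0)        # (None, None, None, None)
scaled = normalize_to_max([5, 10, 20])         # [0.25, 0.5, 1.0]
print(margins, rates, empty_bin, scaled)
```

The same `normalize_to_max` step applies to both the diversity and the coverage-area columns, which is what allows the two curves to be compared on a common 0 to 1 scale.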

Data S3: This is a zip file that contains all R code and a package of ready-to-run CONOP files. All data files in this folder can be opened with ordinary text editors, such as Notepad on Windows and TextEdit on macOS. These data files have been formatted by the program "CONMAN" and are ready to run with CONOP.Para on Windows (CONOP.Para does not run on macOS). To run CONOP.Para, make sure all the files and the program are stored in the same folder, then double-click the program CONOP.Para V2.1.10-Exclude-Serial-ZeroMean-0601. Remember to adjust the parameter settings in conop9.cfg, which sets up the environment for CONOP computations.

Sharing/Access information

N/A

Code/Software

Data S3 also contains the R code used in this study.