Diversity of organic-walled microfossils in the phosphates of the ca. 1 Ga Diabaig Formation, Torridon Group, NW Scotland
Data files
Jul 01, 2025 version files (417.63 KB total)
- README.md (12.64 KB)
- SupData_1_SHERPA_datasets_(v.2).xls (288.26 KB)
- SupData_2_Fossil_assemblages_and_Eohalothece_(v.2).xls (116.74 KB)
Abstract
Precambrian organic-walled microfossils preserved in fine-grained sedimentary rocks constitute the earliest fossil record of eukaryotic life. The Mesoproterozoic-Neoproterozoic transition coincided with major innovations in the evolution of early eukaryotes, including the radiation of crown-group lineages, represented in these rocks by candidate red algae, green algae, and fungi. However, the diversity of these early eukaryotes is yet to be fully explored. Here, we present a systematic description of the microfossil assemblage preserved in exceptional detail within sedimentary phosphatic nodules and bands in the Diabaig Formation of the ca. 1 Ga Torridon Group of northwest Scotland. Recent work has highlighted the lacustrine or estuarine nature of its depositional environment and confirmed that these fossils may include the oldest known non-marine eukaryotes. We identify 11 morphotaxa from newly collected material, including the new species Minimarmilla multicatenaria gen. et sp. nov., two undoubted eukaryotes, and two probable eukaryotes. The latter include Pterospermopsimorpha sp. and a new, unnamed network-forming taxon. These microfossils present an important window on eukaryotic diversification in non-marine aquatic environments during the Mesoproterozoic–Neoproterozoic.
Dataset DOI: 10.5061/dryad.9ghx3ffvp
Description of the data and file structure
"SupData_1_SHERPA_datasets_(v.2).xls" has 14 sheets with the data used for SHERPA software analysis of the cell shape of multiple Eosynechococcus spp. and Eohalothece lacustrina specimens from different studies, including ours. The first sheet contains the final comparison of counted and analysed fossil cells, which was used to create Supplementary Information Figure 2. The second sheet describes each dataset analysed with SHERPA software. The third sheet lists the template shapes used by SHERPA for classification. The remaining sheets hold the individual results of each SHERPA analysis performed on a given dataset. Each dataset is labelled by study name as shown in the second sheet.
"SupData_2_Fossil_assemblages_and_Eohalothece_(v.2).xls" contains 3 sheets with fossil taxa data from different assemblages and length vs width comparisons of Eosynechococcus spp. and Eohalothece lacustrina cells. The first sheet lists organic-walled microfossils from 10 Meso-Neoproterozoic assemblages with their biological affinity classification (prokaryotes, eukaryotes, or unknown); the classes were tallied with the Excel formula "=COUNTIF(range, criteria)". These fossils and references are in Supplementary Information Table 2. The second sheet contains a bar chart of the classification data obtained in sheet 1 and used for Figure 2. The third sheet has length vs width comparisons and preliminary plots used to create Figure 6.
Files and variables
1) File: SupData_1_SHERPA_datasets_(v.2).xls
Sheet 1: "Final comparison_SI Fig 2"
Description: Summary sheet comparing cell shape classifications across all studies (31 columns)
Variables and descriptions:
- Cell shape/Study: Study identifier for row data
- Cylindrical: Count of cells classified as cylindrical shape
- Ellipsoidal: Count of cells classified as ellipsoidal shape
- Cylindrical + tapered: Count of cells with cylindrical base and tapered ends
- Ellipsoidal + tapered: Count of cells with ellipsoidal base and tapered ends
- Tapered: Count of cells classified as purely tapered shape
- Total cells counted: Total number of cells analysed per study
- Cell shape %/Study: Percentage breakdown section header
- Cylindrical (repeated): Percentage of cylindrical cells
- Ellipsoidal (repeated): Percentage of ellipsoidal cells
- Cylindrical + tapered (repeated): Percentage of cylindrical + tapered cells
- Ellipsoidal + tapered (repeated): Percentage of ellipsoidal + tapered cells
- Tapered (repeated): Percentage of tapered cells
Note: The repeated columns correspond to multiple analysis groups in the comparison sheet. These groups are:
- Section 1 (Columns 1-7): Raw counts ("Cell shape/Study", the 5 shape categories, and the total)
- Section 2 (Columns 10-15): Percentages ("Cell shape %/Study" and the 5 shape categories; see corresponding bar chart)
- Section 3 (Columns 22-27): Counts for Eohalothece lacustrina (see corresponding pie chart)
- Section 4 (Columns 31-36): Counts for Eosynechococcus moorei (see corresponding pie chart)
- Section 5 (Columns 40-45): Counts for Eosynechococcus spp. (see corresponding pie chart)
Each section repeats the same 5 cell shape categories (Cylindrical, Ellipsoidal, Cylindrical + tapered, Ellipsoidal + tapered, Tapered) but for different subsets of the 11 studies.
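The relationship between the raw counts (Section 1) and the percentages (Section 2) can be sketched as follows. This is a minimal illustration, assuming each percentage is the shape count divided by the total cells counted for that study; the function name and the example counts are hypothetical and are not taken from the dataset.

```python
# The five shape categories named in the comparison sheet.
SHAPES = [
    "Cylindrical",
    "Ellipsoidal",
    "Cylindrical + tapered",
    "Ellipsoidal + tapered",
    "Tapered",
]


def shape_percentages(counts):
    """Convert raw per-shape cell counts for one study into percentages.

    Assumption: the percentage is count / total cells counted * 100.
    """
    total = sum(counts.get(shape, 0) for shape in SHAPES)
    if total == 0:
        raise ValueError("no cells counted for this study")
    return {shape: 100.0 * counts.get(shape, 0) / total for shape in SHAPES}


# Hypothetical counts for a single study (illustrative values only).
example_counts = {
    "Cylindrical": 10,
    "Ellipsoidal": 20,
    "Cylindrical + tapered": 5,
    "Ellipsoidal + tapered": 10,
    "Tapered": 5,
}
percentages = shape_percentages(example_counts)
```

Recomputing the percentages this way is a quick consistency check on the summary sheet: the five values for any study should sum to 100.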
Sheet 2: "Datasets information"
Description: Metadata about each study dataset (7 columns)
Variables and descriptions:
- Study: Reference to the published study/author
- Name given: Internal dataset name or identifier
- Assemblage: Fossil assemblage or formation name
- Paleoenvironment: Ancient environmental setting (marine, lacustrine, etc.)
- Age: Geological age of the samples
- Material: Type of material analysed (e.g., rock, sediment)
- Cell number: Total number of cells/specimens in dataset
Sheet 3: "Templates for SHERPA"
Description: Template shape data used by SHERPA software for classification (12 columns)
Variables and descriptions:
- File: Template image filename
- Folder: Directory location of template
- Area / µm²: Template area in square micrometers
- Perimeter / µm: Template perimeter in micrometers
- Width / µm: Template width in micrometers
- Height / µm: Template height in micrometers
- Width/Height Ratio: Aspect ratio of template shape
- Smoothness: Contour smoothness measure (estimates outline roughness)
- CDF: Convexity Defection Factor (percentage difference between shape and convex hull)
- PCAF: Percent Concave Area Fraction (compares areas of contour and convex hull)
- CHMDF: Convex Hull Maximum Distance Factor (measures convexity defects)
- Template is convex: Boolean indicating if template shape is convex
Sheets 4-14: Individual Study Results
Description: Detailed SHERPA analysis results for individual studies (40-41 columns each)
Variables and descriptions:
1. Basic Object Properties:
- File: Source image filename
- Folder: Directory path of source image
- Area: Object area in square micrometers
- Perimeter: Object perimeter in micrometers
- Width: Object width along major axis in micrometers
- Height: Object height along minor axis in micrometers
- Costae: Distance between costae (experimental feature, not recommended for routine use)
2. SHERPA Processing Parameters:
- Segmentation Method: Algorithm used to separate object from background (Otsu, RATS, Canny, etc.)
- Optimization: Morphological operators applied to repair segmentation faults (Opening, Closing, etc.)
- Template: Best matching template filename
- Best: Best template match identifier
- Hu Match: Hu invariant matching score between object and template
- Standard Deviation: Standard deviation of intensity within inner 50% of object area
- Width/Height Ratio: Aspect ratio of detected object
3. Shape Quality Indicators:
- Contour: Contour quality measure
- Form Factor: Heuristic descriptor (4π × Area / Perimeter²)
- Quality: Overall quality assessment
- Template Match: Template matching quality score
- Rectangularity: How rectangular the shape is
- Compactness: Shape compactness measure (Perimeter² / (4π × Area))
- Ellipticity: How elliptical the shape is
- Triangularity: How triangular the shape is
- Roundness: How round/circular the shape is
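The two descriptors with explicit formulas above, Form Factor and Compactness, can be recomputed from the Area and Perimeter columns. The sketch below simply transcribes the formulas as written in this README; the function names are hypothetical, and it is an assumption that SHERPA applies no additional scaling or rounding.

```python
import math


def form_factor(area, perimeter):
    """Form Factor as defined in this README: 4*pi*Area / Perimeter^2."""
    return 4 * math.pi * area / perimeter ** 2


def compactness(area, perimeter):
    """Compactness as defined in this README: Perimeter^2 / (4*pi*Area)."""
    return perimeter ** 2 / (4 * math.pi * area)


# Sanity check on shapes with known geometry (not dataset values).
circle_ff = form_factor(math.pi, 2 * math.pi)  # unit circle
square_ff = form_factor(4.0, 8.0)              # square of side 2
square_cp = compactness(4.0, 8.0)
```

With these definitions the two descriptors are reciprocals of each other: a perfect circle gives 1 for both, and any other outline gives a Form Factor below 1 and a Compactness above 1.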
4. Convexity Measures:
- Convexity (two columns): Different convexity measurements
- CDF: Convexity Defection Factor (absolute measure)
- PCAF: Percent Concave Area Fraction (absolute measure)
- CHMDF: Convex Hull Maximum Distance Factor (absolute measure)
- CDF (second instance): CDF comparison with template
- PCAF (second instance): PCAF comparison with template
- Compactness (second instance): Compactness comparison with template
- Convexity (second instance): Convexity comparison with template
5. Analysis Results:
- Ranking: Overall ranking index (lower = better quality)
- Export: Boolean flag for data export selection
6. Cell Shape Classification Results:
- Cylindrical (Cyl final.jpg): Classification as cylindrical cell type
- Cylindrical + tapered (Cyltap final.jpg): Classification as cylindrical with tapered ends
- Ellipsoidal (El final.jpg): Classification as ellipsoidal cell type
- Ellipsoidal + tapered (Eltap final.jpg): Classification as ellipsoidal with tapered ends
- Tapered (Tap final.jpg): Classification as purely tapered cell type
- Total cells: Total count for this analysis
7. Study-Specific Sheet Names:
- Strotther&Wellman_Torridon
- Strotther&Wellman_Nonesuch
- Hofmann
- Golubic&Campbell
- Tang et al
- Miao et al
- Shukla et al
- Rodriguez et al_Torridon
- Loron et al
- Knoll et al
- Rodriguez et al_Draken
Notes for DRYAD Data Dictionary:
- µm = micrometers
- CDF, PCAF, CHMDF are morphological analysis parameters from SHERPA software
- Cell shape categories are the main classification outputs
- Some column headers contain line breaks (shown as \n) in the original data
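Because some headers contain line breaks and several names occur more than once (for example CDF, PCAF, Compactness, and Convexity), it can help to normalise the header row before working with the result sheets programmatically. The following is a minimal sketch; the helper names and the "(2)" suffix convention are hypothetical choices, not part of the dataset or of SHERPA.

```python
import re


def clean_header(name):
    """Collapse line breaks and runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", str(name)).strip()


def unique_headers(names):
    """Clean each header and number repeated names: 'CDF', 'CDF (2)', ..."""
    seen = {}
    result = []
    for raw in names:
        name = clean_header(raw)
        seen[name] = seen.get(name, 0) + 1
        result.append(name if seen[name] == 1 else f"{name} ({seen[name]})")
    return result


# Hypothetical header row resembling the result sheets.
headers = unique_headers(["File", "Width/Height\nRatio", "CDF", "PCAF", "CDF"])
```

The first occurrence keeps its original name, so the absolute measures stay as "CDF" and "PCAF" while the template-comparison columns receive the numbered suffix.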
2) File: SupData_2_Fossil_assemblages_and_Eohalothece_(v.2).xls
Sheet 1: "Fossil count for Fig 2"
Description: Compendium of organic-walled microfossils across 10 Meso-Neoproterozoic assemblages with biological affinity classification (12 columns). Legend: Prokaryote (X), Eukaryote (Y), Unknown (Z)
Variables and descriptions:
- Assemblages: Row identifier/fossil taxon names
- Torridon Group Scotland: Fossil counts from Torridon Group assemblage, Scotland
- Lower Shaler Supergroup Canada: Fossil counts from Lower Shaler Supergroup, Canada
- Mbuji-Mayi Supergroup Democratic Republic of Congo: Fossil counts from Mbuji-Mayi Supergroup, DRC
- Atar/El Mreïti Group Mauritania: Fossil counts from Atar/El Mreïti Group, Mauritania
- Lower Madhubani Group India: Fossil counts from Lower Madhubani Group, India
- Bylot Supergroup Canada: Fossil counts from Bylot Supergroup, Canada
- Lakhanda Group Russia: Fossil counts from Lakhanda Group, Russia
- Liulaobei Formation China: Fossil counts from Liulaobei Formation, China
- Gouhou Formation China: Fossil counts from Gouhou Formation, China
- Mirojedikha Formation Russia: Fossil counts from Mirojedikha Formation, Russia
- Organic-walled microfossils: Summary category or total counts
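The per-assemblage tally that the spreadsheet obtains with COUNTIF can be reproduced outside Excel. The sketch below assumes that each assemblage column holds one legend code (X, Y, or Z) per taxon present and is blank otherwise; the function name and the example column are hypothetical.

```python
from collections import Counter

# Legend from the sheet description.
LEGEND = {"X": "Prokaryote", "Y": "Eukaryote", "Z": "Unknown"}


def count_affinities(column):
    """Tally legend codes in one assemblage column.

    Equivalent to one COUNTIF(range, code) per legend code. Codes are
    upper-cased because COUNTIF matches text case-insensitively.
    """
    codes = Counter(
        str(cell).strip().upper()
        for cell in column
        if cell is not None and str(cell).strip()
    )
    return {label: codes.get(code, 0) for code, label in LEGEND.items()}


# Hypothetical column for one assemblage (not dataset values).
example_column = ["X", "Y", None, "x", "Z", "", "Y", "X"]
tally = count_affinities(example_column)
```

Blank cells are skipped rather than counted as unknown, which mirrors how COUNTIF ignores cells that do not match the criterion.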
Sheet 2: "Final Bar Chart_Fig 2"
Description: Bar chart data comparing biological affinity classifications across assemblages (11 columns)
Variables and descriptions:
- Assemblage: Geological formation/group identifier
- Lower Shaler Sg: Classification data for Lower Shaler Supergroup
- Liulaobei Fm: Classification data for Liulaobei Formation
- Atar/El Mreïti Gp: Classification data for Atar/El Mreïti Group
- Lower Madhubani Gp: Classification data for Lower Madhubani Group
- Mbuji-Mayi Sg: Classification data for Mbuji-Mayi Supergroup
- Lower Bylot Sg: Classification data for Lower Bylot Supergroup
- Lakhanda Gp: Classification data for Lakhanda Group
- Mirojedikha Fm: Classification data for Mirojedikha Formation
- Torridon Gp: Classification data for Torridon Group
- Gouhou Fm: Classification data for Gouhou Formation
Sheet 3: "Eosyne comparison_Fig 6"
Description: Cell length vs width comparison data for Eosynechococcus and Eohalothece taxa (25 columns)
Variables and descriptions:
Section 1: General species data (Columns 1-9)
- Species name: Taxonomic name of the species
- Length: Cell length measurements in micrometers
- Width: Cell width measurements in micrometers
- Average (μm): Average dimensions in micrometers
- Average l/w ratio: Average length to width ratio
- No. of measured specimens: Number of specimens measured for each species
- Reference number: Numerical reference identifier
- Reference: Literature citation/source reference
Section 2: Comparative analysis data (Columns 11-15)
- Species name (repeated): Species identifier for comparison section
- Length (repeated): Length data for comparative analysis
- Width (repeated): Width data for comparative analysis
- Number: Specimen count or identifier number
- Aspect ratio l/w: Length to width aspect ratio
Section 3: Eohalothece lacustrina comparative data (Columns 23-25)
- E. lacustrina (Nonesuch): Data for E. lacustrina from Nonesuch Formation
- E. lacustrina (Torridon): Data for E. lacustrina from Torridon Group
- E. lacustrina (This study): Data for E. lacustrina from current study
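The length-to-width (l/w) values in this sheet can be recomputed from the Length and Width columns. The sketch below is a minimal illustration with hypothetical measurements. This README does not state whether "Average l/w ratio" is the mean of the per-cell ratios or the ratio of the mean length to the mean width, so both are computed.

```python
from statistics import mean


def aspect_ratios(lengths, widths):
    """Per-specimen length/width ratios for paired measurements."""
    if len(lengths) != len(widths):
        raise ValueError("lengths and widths must be paired")
    return [length / width for length, width in zip(lengths, widths)]


# Hypothetical measurements in micrometers (not dataset values).
lengths_um = [6.0, 8.0, 9.0]
widths_um = [3.0, 4.0, 3.0]

ratios = aspect_ratios(lengths_um, widths_um)
mean_of_ratios = mean(ratios)
ratio_of_means = mean(lengths_um) / mean(widths_um)
```

The two averages generally differ, so it is worth checking which one reproduces the values in the sheet before comparing them with measurements from other studies.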
Notes for DRYAD Data Dictionary:
- Sg = Supergroup, Gp = Group, Fm = Formation (geological unit abbreviations)
- μm = micrometers
- l/w = length to width ratio
- Biological affinity classifications are prokaryote (X), eukaryote (Y), and unknown (Z)
- Taxa in each class were counted with the Excel COUNTIF function
- References correspond to Supplementary Information Table 2
Code and software
No custom code was used. Cell shape analysis was performed with SHERPA software, and counts were made in Excel.
Access information
Other publicly accessible locations of the data: Contact the authors at any time.
