Data from: Which arthropods have feet and why? Addressing an argument for aquatic fossil scorpions
Data files
Version: Aug 07, 2025 (files total 161.74 KB)
Abstract
Scorpions are the first arguably terrestrial animals identifiable in the fossil record, with important implications for arachnid evolution and terrestrial ecosystem development. However, debate persists over whether some fossil scorpions from the Silurian and Devonian were, in fact, aquatic. This study assesses the claim that a digitigrade ('footless') posture implies an aquatic habitat for early scorpions by evaluating the distribution of feet among ambulatory arthropods and how their presence correlates with factors including aquatic versus terrestrial habitat, body size, number of legs, cuticle mineralization, and time since terrestrialization. The results demonstrate that these variables in isolation are poor predictors of leg posture, but become highly statistically significant in certain combinations. However, with as many lineages diverging from the usual pattern of plantigrady as adhering to it, predictive power is weak. Therefore, all factors influencing plantigrady must be accounted for when discussing its implications in fossil arthropods, and digitigrady alone does not provide compelling support for an aquatic habitat in fossil scorpions. We further argue that other lines of evidence for an aquatic habitat in Palaeozoic scorpions are equivocal and that the weight of phylogenetic and anatomical evidence supports a terrestrial origin for the total group Scorpiones.
Data_1_-_Body_Size_Measurements.csv
Table containing measured body lengths and body widths (excluding appendages) of all taxa included in this study. Log-transformed body length, body volume, leg count, and body volume per leg were averaged over all specimens included in a given phylogenetic tree tip (tips correspond to the tree in files S4 and S5) for use in the phylogenetic logistic regression analyses.
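The per-tip layout of the mean log columns described in the column list below (the mean on the first member of each tree tip, "-" for the remaining members, "N/A" for fossils) can be reproduced with a short sketch. This is not the authors' code; it assumes rows are supplied in table order as (tree_tip, log_value) pairs, with tree_tip set to None for fossils, and the tip labels in the usage example are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple, Union


def tip_mean_column(
    rows: List[Tuple[Optional[str], Optional[float]]]
) -> List[Union[float, str]]:
    """Build a Data 1-style mean column from (tree_tip, log_value) pairs.

    Rows are given in table order; tree_tip is None for fossils, which are
    excluded from the averages and reported as "N/A".
    """
    # Collect log values per tree tip (fossils are skipped).
    by_tip: Dict[str, List[float]] = {}
    for tip, value in rows:
        if tip is not None:
            if value is None:
                raise ValueError(f"missing log value for tip {tip!r}")
            by_tip.setdefault(tip, []).append(value)

    # Emit the mean on the first member of each tip and "-" on the rest.
    column: List[Union[float, str]] = []
    seen = set()
    for tip, _ in rows:
        if tip is None:
            column.append("N/A")
        elif tip in seen:
            column.append("-")
        else:
            seen.add(tip)
            column.append(mean(by_tip[tip]))
    return column
```

For example, `tip_mean_column([("TipA", 1.0), ("TipA", 2.0), (None, None), ("TipB", 0.5)])` returns `[1.5, "-", "N/A", 0.5]`.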
Column descriptions:
Class - Taxonomic class of specimen.
Order - Taxonomic order of specimen.
Family - Taxonomic family of specimen; "?" if unidentified to family level.
Genus - Taxonomic genus of specimen; "?" if unidentified to genus level.
Species - Taxonomic species of specimen; "?" if unidentified to species level.
Specimen number - An internal index used to order datapoints during figure generation.
Adult - Indicates whether specimen was mature; "Y" for yes (mature), "N" for no (immature), "?" for undetermined.
Sex - Sex of specimen, if known; "M" for male, "F" for female, "?" for undetermined.
Feet - Leg posture of the specimen (digitigrade or plantigrade) if extant; for fossil specimens, indicates that the specimen is a fossil.
Habitat - Habitat of the specimen (aquatic or terrestrial) if extant; for fossil specimens, indicates that the specimen is a fossil.
Cuticle - Whether the specimen's cuticle is mineralized or non-mineralized, if extant; for fossil specimens, indicates that the specimen is a fossil.
Body length [mm] - The measured body length in millimeters of the specimen (single segment for Arthropleura).
Body width [mm] - The measured body width in millimeters of the specimen (single segment for Arthropleura).
Leg count - The number of walking legs of the specimen (single segment for Arthropleura, considering both diplopodous and non-diplopodous conditions).
Estimated body volume - Rough estimate of body volume in cubic millimeters, obtained by multiplying body length by the square of body width.
Estimated body volume per leg - Ratio of estimated body volume to leg count in cubic millimeters per leg.
Log body length - Base ten logarithm of body length, not applicable for Arthropleura due to single-segment measurements.
Log body volume - Base ten logarithm of body volume, not applicable for Arthropleura due to single-segment measurements.
Log leg count - Base ten logarithm of leg count calculated for mathematical consistency, not applicable for Arthropleura due to single-segment measurements.
Log body volume per leg - Base ten logarithm of body volume per leg.
Mean log body length - Log body length averaged over all specimens included in the given tip of the tree used for phylogenetic logistic regression, listed only for the first member of the tree tip (other members listed with "-"). Fossils were not included in phylogenetic logistic regression models and are therefore listed as "N/A."
Mean log body volume - Log body volume averaged over all specimens included in the given tip of the tree used for phylogenetic logistic regression, listed only for the first member of the tree tip (other members listed with "-"). Fossils were not included in phylogenetic logistic regression models and are therefore listed as "N/A."
Mean log leg count - Log leg count averaged over all specimens included in the given tip of the tree used for phylogenetic logistic regression, listed only for the first member of the tree tip (other members listed with "-"). Fossils were not included in phylogenetic logistic regression models and are therefore listed as "N/A."
Mean log volume per leg - Log volume per leg averaged over all specimens included in the given tip of the tree used for phylogenetic logistic regression, listed only for the first member of the tree tip (other members listed with "-"). Fossils were not included in phylogenetic logistic regression models and are therefore listed as "N/A."
Tree tip - Tip of the tree used for phylogenetic logistic regression in which the specimen is included. Fossils were not included in phylogenetic logistic regression models and are therefore listed as "N/A."
Notes - General comments on specimens and measurements.
Reference (See Data 2)/Provenance - Source of specimen. Data were collected from live animals observed in the wild (vicinity of Traverse City, MI, or Stanford, CA) or in captivity (GT Butterfly House & Bug Zoo in Williamsburg, MI), and from specimens illustrated in the literature. Long-form versions of references are provided separately in file Data 2.
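The derived size columns above follow directly from the raw measurements: volume is length times width squared, volume per leg divides that by the leg count, and the log columns are base-ten logarithms. The sketch below restates those definitions; it is an illustration rather than the procedure actually used to build Data 1, and the class and field names are hypothetical stand-ins for the CSV headers. The single_segment flag marks Arthropleura-style records, for which only the per-leg log is computed.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Specimen:
    # Raw measurements in millimetres; field names are hypothetical.
    body_length_mm: float
    body_width_mm: float
    leg_count: int
    single_segment: bool = False  # True for single-segment records (Arthropleura)


@dataclass
class Derived:
    volume_mm3: float
    volume_per_leg_mm3: float
    log_length: Optional[float]
    log_volume: Optional[float]
    log_leg_count: Optional[float]
    log_volume_per_leg: float


def derive(s: Specimen) -> Derived:
    # Rough volume estimate: length x width^2, i.e. a box whose
    # cross-section is a square with side equal to the body width.
    volume = s.body_length_mm * s.body_width_mm ** 2
    per_leg = volume / s.leg_count
    # Whole-body logs are not applicable to single-segment records,
    # so they are returned as None; the per-leg log is still defined.
    whole = not s.single_segment
    return Derived(
        volume_mm3=volume,
        volume_per_leg_mm3=per_leg,
        log_length=math.log10(s.body_length_mm) if whole else None,
        log_volume=math.log10(volume) if whole else None,
        log_leg_count=math.log10(s.leg_count) if whole else None,
        log_volume_per_leg=math.log10(per_leg),
    )
```

For a hypothetical specimen 10 mm long and 2 mm wide with 8 walking legs, `derive(Specimen(10.0, 2.0, 8))` gives a volume of 40 mm³, 5 mm³ per leg, and a log body length of 1.0.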
Data_2_-_References_for_Body_Size_Measurements.docx
Complete long-form reference list for body size measurements taken from the literature.
Data_3_-_Tree_File_for_Phylogenetic_Regression.nex
NEXUS file for the phylogenetic tree used in the phylogenetic logistic regression models.
Data_4_-Data_for_Phylogenetic_Regression(Dependent_Habitat).csv
Table of habitats, leg postures, and other predictor variable states for all included taxa, for use in the phylogenetic logistic regression models with habitat as the dependent variable.
Column descriptions:
Taxon - The taxon represented by a tree tip in the phylogenetic regression analyses.
Feet - The leg posture (digitigrade or plantigrade) exhibited by members of the taxon.
Min Legs - The minimum number of walking legs exhibited by any member of the taxon.
Max Legs - The maximum number of walking legs exhibited by any member of the taxon.
Mean log leg count - Base ten logarithm of leg count averaged over all measured specimens included in the taxon. Corresponds to data values in file Data 1.
Habitat - Indicates whether the habitat of members of the taxon is aquatic (0) or terrestrial (1). This is the dependent variable used in the phylogenetic logistic regression that this file corresponds to; hence, it must be formatted numerically.
Cuticle - Indicates whether the cuticle of members of the taxon is mineralized or non-mineralized.
Mean log Body Length - Base ten logarithm of measured body length in millimeters, averaged over all measured specimens included in the taxon. Corresponds to data values in file Data 1.
Mean log Volume - Base ten logarithm of estimated body volume in cubic millimeters, averaged over all measured specimens included in the taxon. Corresponds to data values in file Data 1.
Mean log Volume Per Leg - Base ten logarithm of the ratio of estimated body volume to leg count, averaged over all measured specimens included in the taxon. Corresponds to data values in file Data 1.
Min Time Since Terrestrialization - The minimum time in millions of years since the taxon became terrestrialized; "0" for aquatic taxa.
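Several of the conventions above can be checked mechanically before fitting models: Habitat must be coded 0 or 1, Min Legs cannot exceed Max Legs, and aquatic taxa carry a minimum time since terrestrialization of 0. The sketch below is a hypothetical validation helper, not part of the published R workflow; it assumes the CSV headers match the column names listed above and that Feet is stored as the text "digitigrade" or "plantigrade".

```python
import csv
import io
from typing import List


def check_habitat_table(text: str) -> List[str]:
    """Return human-readable problems found in Data 4-style CSV text."""
    problems: List[str] = []
    reader = csv.DictReader(io.StringIO(text))
    # start=2 because line 1 of the file is the header row.
    for line_no, row in enumerate(reader, start=2):
        where = f"line {line_no} ({row['Taxon']})"
        habitat = row["Habitat"].strip()
        if habitat not in {"0", "1"}:
            problems.append(f"{where}: Habitat must be 0 (aquatic) or 1 (terrestrial)")
        if row["Feet"].strip().lower() not in {"digitigrade", "plantigrade"}:
            problems.append(f"{where}: unexpected Feet value {row['Feet']!r}")
        if int(row["Min Legs"]) > int(row["Max Legs"]):
            problems.append(f"{where}: Min Legs exceeds Max Legs")
        if habitat == "0" and float(row["Min Time Since Terrestrialization"]) != 0:
            problems.append(f"{where}: aquatic taxa should have a minimum time of 0")
    return problems
```

An empty list means none of these checks failed; it does not confirm that the values themselves are biologically correct.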
Data_5_-Data_for_Phylogenetic_Regression(Dependent_Posture).csv
The same as file Data 4, but reformatted for analyses that treat leg posture as the dependent variable. Formatting differences include a numerical value for the "Feet" variable (0 for digitigrade, 1 for plantigrade) and a categorical value for the "Habitat" variable (aquatic/terrestrial).
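The recoding that turns a Data 4 row into a Data 5 row is a swap of which column is numeric. A minimal sketch of that swap, assuming rows are read as string dictionaries keyed by the column names listed for Data 4 (the function name is hypothetical, and this is not the script used to produce Data 5):

```python
from typing import Dict

# Codes as stated in the descriptions of Data 4 and Data 5.
FEET_CODE = {"digitigrade": 0, "plantigrade": 1}
HABITAT_LABEL = {"0": "aquatic", "1": "terrestrial"}


def to_posture_format(row: Dict[str, str]) -> Dict[str, str]:
    """Recode a Data 4-style row so that Feet is numeric and Habitat is categorical."""
    out = dict(row)  # copy, so the input row is left unchanged
    out["Feet"] = str(FEET_CODE[row["Feet"].strip().lower()])
    out["Habitat"] = HABITAT_LABEL[row["Habitat"].strip()]
    return out
```

All other columns pass through unchanged, so `to_posture_format({"Taxon": "X", "Feet": "plantigrade", "Habitat": "1"})` yields `{"Taxon": "X", "Feet": "1", "Habitat": "terrestrial"}`.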
Data_6_-_R_Code_for_Phylogenetic_Regression.R
R code that performs the phylogenetic logistic regressions using the phylolm package (Ho et al. 2014). Requires files Data 3, Data 4, and Data 5. Scorpions are removed from the data to avoid circular argumentation; doing so does not meaningfully change the statistical results.
Data_7_-_Statistical_Results_Comparisons.xlsx
Tables of the statistical results generated by the code in file Data 6. The table on the first sheet is the same as Table S2 of the Supporting Information for the article. Additional tables contain the same data rearranged and colour-coded for easy comparison between models.
