Data and code from: Alternative pathways into the deep sea: Patterns in Bivalvia
Data files
Nov 04, 2025 version files 3.45 MB
-
all_bivalve_depths.csv
573.41 KB
-
allBivalves_phyloD_FritzAndPurvisD_PresenceDeep.R
6.50 KB
-
allBivalvesPagelsLambdaProportionDeep.R
4.69 KB
-
bivalve-family-tree-budding.tre
5.11 KB
-
Fig1_FigS5_phylobathyMINandMAXtrees.R
34.66 KB
-
Fig2_FigS3_MytLucDepthTransitions.R
19.71 KB
-
Fig3_allBivalvesDeepShallowBarPlotsSimulation.R
18.55 KB
-
FigS4_3depthCat_latCatBars.R
6.02 KB
-
Fisher_ChiPermutationTests_LatCat_LucinidCloseRelatives.R
12.49 KB
-
FisherTests_LatCat_FossilLucinidCloseRelatives.R
4.60 KB
-
fossil_data.csv
25.64 KB
-
KS_Test_MaxDepthLucinids_CloseRelatives.R
13.90 KB
-
lucinid_GenBank_Accession_Table.txt
9.55 KB
-
lucinid_iqtree_run.slurm
351 B
-
lucinidae_5gene_100boot.best_scheme.nex
260 B
-
lucinidae_5gene_partitions.nex
146 B
-
lucinidae_5geneConcat.phy
583 KB
-
lucinidae_depth2025.csv
100.45 KB
-
lucinidae_noVoucher_5gene_Jan2025_100boot.contree
8.56 KB
-
mytilidae_10gene_100boot.best_scheme.nex
412 B
-
mytilidae_10gene_partitions.nex
271 B
-
mytilidae_10geneConcat.phy
1.43 MB
-
mytilidae_depth2025.csv
80.80 KB
-
mytilidae_iqtree_run.slurm
354 B
-
mytilidae_noVoucher_10gene_Jan2025_100boot.contree
10.48 KB
-
mytlidae_GenBank_Accession_Table.txt
16.06 KB
-
README.md
7.78 KB
-
SupplementaryDataset_S1.xlsx
483.83 KB
Dec 05, 2025 version files 3.45 MB
-
all_bivalve_depths.csv
573.40 KB
-
allBivalves_phyloD_FritzAndPurvisD_PresenceDeep.R
6.50 KB
-
allBivalvesPagelsLambdaProportionDeep.R
4.69 KB
-
bivalve-family-tree-budding.tre
5.11 KB
-
Fig1_FigS5_phylobathyMINandMAXtrees.R
34.66 KB
-
Fig2_FigS3_MytLucDepthTransitions.R
19.71 KB
-
Fig3_allBivalvesDeepShallowBarPlotsSimulation.R
18.55 KB
-
FigS4_3depthCat_latCatBars.R
6.02 KB
-
Fisher_ChiPermutationTests_LatCat_LucinidCloseRelatives.R
12.49 KB
-
FisherTests_LatCat_FossilLucinidCloseRelatives.R
4.60 KB
-
fossil_data.csv
25.64 KB
-
KS_Test_MaxDepthLucinids_CloseRelatives.R
13.90 KB
-
lucinid_GenBank_Accession_Table.txt
9.55 KB
-
lucinid_iqtree_run.slurm
351 B
-
lucinidae_5gene_100boot.best_scheme.nex
260 B
-
lucinidae_5gene_partitions.nex
146 B
-
lucinidae_5geneConcat.phy
583 KB
-
lucinidae_depth2025.csv
100.45 KB
-
lucinidae_noVoucher_5gene_Jan2025_100boot.contree
8.56 KB
-
mytilidae_10gene_100boot.best_scheme.nex
412 B
-
mytilidae_10gene_partitions.nex
271 B
-
mytilidae_10geneConcat.phy
1.43 MB
-
mytilidae_depth2025.csv
80.80 KB
-
mytilidae_iqtree_run.slurm
354 B
-
mytilidae_noVoucher_10gene_Jan2025_100boot.contree
10.48 KB
-
mytlidae_GenBank_Accession_Table.txt
16.06 KB
-
README.md
7.78 KB
-
SupplementaryDataset_S1.xlsx
482.66 KB
Abstract
Dataset DOI: 10.5061/dryad.8931zcs4m
SupplementaryDataset_S1.xlsx
All data used in this study, to the exclusion of the molecular data, can be found in this file. The first sheet contains a field key that explains the contents of all other sheets. The second and third sheets present the taxonomic information, bathymetric depths, depth categories, and depth references for all species in Mytilidae and Lucinidae, respectively. The third sheet also includes the biogeographic provinces (according to Huber 2015) that lucinid species occur in, the corresponding climatic categories (tropical vs extratropical vs both), and whether each species was classified as a close relative of a deep-sea taxon under our strict and broad schemes (see Methods). The fourth sheet provides binary depth categories for all bivalve species. The fifth sheet includes the climatic categories for the first fossil occurrences of lucinid genera and the sixth sheet lists the relevant references.
mytilidae_depth2025.csv
CSV file containing the data from the second sheet (“Mytilidae”) of Supplementary Dataset S1, used in relevant R scripts.
lucinidae_depth2025.csv
CSV file containing the data from the third sheet (“Lucinidae”) of Supplementary Dataset S1, used in relevant R scripts.
all_bivalve_depths.csv
CSV file containing the data from the fourth sheet (“bivalve_depths”) of Supplementary Dataset S1, used in relevant R scripts.
fossil_data.csv
CSV file containing the data from the fifth sheet (“LucinidFirstOccurrencesTropical”), used in FisherTests_LatCat_FossilLucinidCloseRelatives.R.
Fig1_FigS5_phylobathyMINandMAXtrees.R
Script for creating Figs. 1 and S5, which respectively depict the minimum and maximum depths for Mytilidae and Lucinidae.
Fig2_FigS3_MytLucDepthTransitions.R
Script for the simulation and permutation tests depicted in Figs. 2 and S3. This script includes the calculation of the number and relative frequency of shallow to deep transitions that occurred in Mytilidae and Lucinidae, based on stochastic character mapping analyses (Fig. 2C). It also includes the generation of 1000 discrete character histories simulated under the state transition rate matrices observed for Mytilidae (Fig. 2A) and Lucinidae (Fig. 2B). Finally, it includes a permutation analysis wherein the number of deep and shallow tips were fixed to the observed values but the transition rate matrices were allowed to vary (Fig. S3).
Fig3_allBivalvesDeepShallowBarPlotsSimulation.R
Script for plotting the phylogenetic distribution of shallow- and deep-sea endemic species across bivalve families, and for comparing the observed number of deep-sea endemics per family to a stochastic expectation. It includes a second simulation for comparing the observed number of genera within families that have at least one deep-sea endemic species to a stochastic expectation. All code for producing Fig. 3 is found in this script.
FigS4_3depthCat_latCatBars.R
Script necessary for producing Fig. S4, depicting the phylogenetic distributions of minimum bathymetry and climatic categories for Mytilidae and Lucinidae.
KS_Test_MaxDepthLucinids_CloseRelatives.R
Script for calculating median minimum depths for two groups of shallow-water lucinids, defined in two ways (see Methods): 1.) species that are closely related to deep-sea endemics and 2.) all other shallow-water lucinids. The script also includes one-sided Kolmogorov-Smirnov tests to assess whether lucinid deep-sea endemic species are more likely to have close relatives reaching greater depths than other shallow-water species.
Fisher_ChiPermutationTests_LatCat_LucinidCloseRelatives.R
Script to conduct Fisher’s Exact Tests using count data to examine if close relatives of deep-sea endemics in Lucinidae are more likely to occur in extratropical waters than other lucinid species. It also checks if the distribution of species across climatic zones differs between the two groups using Pearson’s Chi squared tests based on expected frequencies. Finally, it includes one- and two-sided permutation tests to determine if the patterns observed differ from random expectations; the one-sided test checks if close relatives of deep-sea endemics have a higher proportion of extratropical species than expected if there were no association between climate category and phylogenetic relatedness.
FisherTests_LatCat_FossilLucinidCloseRelatives.R
Script using fossil_data.csv to conduct Fisher’s Exact Tests examining if lucinid genera with extant deep-sea endemics were more likely to have first appeared as fossils in extratropical waters (counts found in Table S3).
bivalve-family-tree-budding.tre
Family-level bivalve phylogeny from Crouch et al. 2021 (https://doi.org/10.1098/rspb.2021.2178). Used in all scripts with “allBivalves” in the title.
allBivalvesPagelsLambdaProportionDeep.R
Script checking if there is phylogenetic signal (Pagel's lambda) in the all-bivalve tree for the proportion of deep sea species in a family.
allBivalves_phyloD_FritzAndPurvisD_PresenceDeep.R
Script checking if there is phylogenetic signal (Fritz & Purvis' D) in the all-bivalve tree for presence/absence of species that are exclusively deep-sea.
mytlidae_GenBank_Accession_Table.txt
NCBI GenBank Accession numbers used for all ten genes included in the alignment file mytilidae_10geneConcat.phy.
mytilidae_10geneConcat.phy
Concatenated alignment of sequences for ten genes (12S, 16S, cytochrome B, cytochrome oxidase 1, ATP synthase subunit 6, NADH dehydrogenase subunits 2 and 4, histone 3, 18S and 28S) for Mytilidae.
mytilidae_10gene_partitions.nex
Data blocks defining start and end positions of each gene within the concatenated alignment for Mytilidae. Used by IQ-TREE2 to determine partitioning schemes and best-fit substitution models for phylogenetic inference.
mytilidae_10gene_100boot.best_scheme.nex
Partitioning scheme and associated substitution models selected by PartitionFinder2 as the best fit for the concatenated mytilid alignment.
mytilidae_iqtree_run.slurm
IQ-TREE2 code for inferring the maximum likelihood tree depicted in Fig. S1.
mytilidae_noVoucher_10gene_Jan2025_100boot.contree
Consensus tree inferred with IQ-TREE2 using the concatenated mytilid alignment.
lucinid_GenBank_Accession_Table.txt
NCBI GenBank Accession numbers used for all five genes included in the alignment file lucinidae_5geneConcat.phy.
lucinidae_5geneConcat.phy
Concatenated alignment of sequences for five genes (16S, cytochrome B, cytochrome oxidase 1, 18S and 28S) for Lucinidae.
lucinidae_5gene_partitions.nex
Data blocks defining start and end positions of each gene within the concatenated alignment for Lucinidae. Used by IQ-TREE2 to determine partitioning schemes and best-fit substitution models for phylogenetic inference.
lucinidae_5gene_100boot.best_scheme.nex
Partitioning scheme and associated substitution models selected by PartitionFinder2 as the best fit for the concatenated lucinid alignment.
lucinid_iqtree_run.slurm
IQ-TREE2 code for inferring the maximum likelihood tree depicted in Fig. S2.
lucinidae_noVoucher_5gene_Jan2025_100boot.contree
Consensus tree inferred with IQ-TREE2 using the concatenated lucinid alignment.
The following text is derived from the Methods found in the associated paper's main text and supplementary materials. References (some of which pertain to justifying methods) have been removed, but may be found in their respective texts.
Database of marine bivalve taxa and bathymetriesWe used a comprehensive dataset of 8402 extant marine bivalve species compiled from Huber 2015 and the subsequent literature (Supplementary Dataset S1). For this dataset, subgenera were elevated to "operational genera" to increase sampling of hypothesized monophyletic groups that can be analyzed for their differences in bathymetric occurrences. Owing to local topological uncertainties, we classified bivalve taxa into 86 operational families for the 100 taxonomic families represented in the initial Crouch et al. (2019) dataset, as follows: Neilonellidae, Tindariidae, Sareptidae and Yoldiidae were treated as Malletiidae sensu lato; Bathyspinulidae, Lametilidae, Phaseolidae, and Siliculidae were treated as Nuculanidae sensu lato; Nucinellidae was treated as Solemyidae; Basterotiidae, Galeommatidae, and Lasaeidae were treated as Galeommatoidea; Cyamiidae, Gaimardiidae, Galatheavalvidae, Sportellidae was treated as Cyamioidea; Xylophagaidae was treated as Teredinidae, and Condylocardiidae was treated as Carditidae. Two entirely shallow-water families, Cardiliidae (4 species) and the monospecific Clistoconchidae, were dropped from the dataset due to uncertainty in their phylogenetic placements, yielding a final dataset of 8327 species classified in 84 operational families.
We used the time-calibrated, family-level phylogeny of Crouch et al. in analyses that spanned Bivalvia. Species with minimum depths < 200 m were considered “shallow” and species with minimum depths > 200 m were “deep” (i.e., “deep-sea endemics”; Supplementary Dataset S1); the attenuation of primary productivity below this depth makes it a biologically meaningful boundary between bivalve faunas. Maximum depths are more likely to be biased by variations in sampling intensity or stray specimens.
Phylogenetic inference for Mytilidae and LucinidaeWe used the taxise function in the PhyLoTaR pipeline to produce a list of all daughter NCBI taxon IDs from a local copy of the NCBI GenBank nucleotide database (accessed January 2024) for Lucinidae and Mytilidae, using a custom perl script to extract species IDs. We cross-referenced this taxon list with our inventory of described mytilid and lucinid genera (see main text for citations), adding sequences from GenBank not discovered via the PhyLoTaR pipeline, and pruning invalid taxa. For each taxon, we combined sequences into a single fasta file, removed duplicate accessions, and parsed sequences by loci using SuperCrunch and a custom locus search term file.
Using SuperCrunch, we assigned reference sequences for the two taxonomic families to perform similarity filtering of the parsed, locus-specific files, aiming to maximize phylogenetic coverage. For each mytilid and lucinid subfamily, one or two reference sequences were selected for their length, prioritizing sequences with museum vouchers. Reference sequences were included for outgroup taxa Pinna and Atrina (Pinnidae) for Mytilidae, and Thyasira (Thyasiridae) for Lucinidae. Reference sequences for 12S, 16S, cytochrome B, cytochrome oxidase 1, ATP synthase subunit 6, NADH dehydrogenase subunits 2 and 4, histone 3, 18S and 28S were selected and used for similarity filtering. Sequences not identified to the specified locus were omitted (e.g. “complete mitochondrial genomes” or “small subunit rRNA”). Due to this limitation, reference sequences for the outgroup taxa were sometimes unavailable, so we instead included sequences from other closely related families: Pinctada fucata (Pteriidae), Crassostrea madrasensis (Ostreidae), Margaritifera margaritifera (Margaritiferidae), and Chlamys islandica (Pectinidae) for ATP synthase subunit 6, NADH dehydrogenase subunit 2, and NADH dehydrogenase subunit 4, respectively. For Lucinidae, a sequence of Parathyasira equalis (Thyasiridae) was included in the reference file for cytochrome B. To increase confidence in the reference sequences for Lucinidae, we prioritized the longest sequences provided by research groups specializing in lucinid systematics, rather than the longest sequences.
Using our reference files, we first performed similarity searches for the sequence data within SuperCrunch using a discontiguous megablast algorithm. Sequences were allowed to match up to fifteen loci. We selected one sequence per species from these locus-specific files (minimum length of 200 base pairs). Genes with fewer than ten sequenced taxa were removed. We also removed the following mytilid species that resolved outside their expected genera and subfamilies in preliminary phylogenetic analyses, had abnormally long branch lengths, or had uncertain identifications: Bathymodiolus elongatus (NCBI TXID 385530), Brachidontes setiger (NCBI TXID 2694962), Brachidontes variabilis (NCBI TXID 102295), Leiosolenus nasutus (NCBI TXID 857499), Modiolatus hanleyi (NCBI TXID 499949) and Modiolus rectus (NCBI TXID 410357). After tree inference, we removed five freshwater mytilid species and three tips resolving outside their respective genera and subfamilies: Lithophaga curtus (NCBI TXID 2590090), Modiolarca impactus (NCBI TXID 2570230), and Mytella brasiliensis (NCBI TXID 356385).
We continued with the SuperCrunch pipeline to construct molecular alignments from the sorted, filtered and trimmed GenBank accessions, resulting in a five-gene matrix for Lucinidae (16S, cytochrome B, cytochrome oxidase 1, 18S and 28S), and a ten-gene matrix for Mytilidae (12S, ATP synthase subunit 6, NADH dehydrogenase subunits 2 and 4, histone 3, and the genes used for Lucinidae). We partitioned these concatenated alignments and determined the best nucleotide substitution models using PartitionFinder2, and used the results to infer 100 non-parametric bootstrap trees for each family. The summarized consensus trees—after minor pruning—were the basis for all figures and downstream analyses.
For both Mytilidae and Lucinidae, we used the resulting relative-rate trees with branch lengths scaled to nucleotide substitution rates in comparative analyses, as the methods used here for the two focal clades do not require time-scaled branch lengths to estimate discrete character transitions. We consider the number of substitutions per site to be the most balanced basis for estimating evolutionary distance across both families, because the subclade affinities of their early-branching members are areas of active research; successfully calibrating these nodes in the future will help to reduce any effects of unevenly constrained branch lengths that can result from less dense calibrations (e.g. skewed likelihood calculations of trait changes and transition rate estimation).
Phylogenetic simulation and permutation tests of piecemeal vs. in-situ diversification
We used the phylogenies of Mytilidae and Lucinidae to estimate both the number of transitions from shallow to deep habitats and the richness of the resulting deep-sea lineages against patterns resulting from stochastic character evolution. We take a two-pronged approach to comparing the evolutionary patterns of deep-sea taxa to a null model for each clade. In one, the "simulation" test, stochastic character histories are modeled on the transition matrix between deep and shallow states, estimated using the phylogenies we inferred with empirical data. In the second, the "permutation" test, we shuffle shallow and deep tip states while holding constant the total number of deep-sea endemics in each clade, thereby allowing the transition rates between the "shallow" and "deep" states to vary by permutation. We compare the outcomes of both procedures to the number of transitions estimated using the observed phylogeny, and the number of deep-sea endemic species directly descending from each transition. Together, these two measures form a solution space for assessing the continuum from piecemeal to in-situ evolution by evaluating whether: 1) “observed” transitions to the deep sea occurred more or less frequently than the null expectations, and 2) these transitions are associated with elevated numbers of deep-sea endemic species.
For the observed values, we estimated the number of transitions and the number of deep-sea tips descending from each one by simulating discrete trait evolution (“shallow” or “deep”) 1000 times along each branch of the mytilid and lucinid phylogenies (Figs. S1-S2; phytools “make.simmap”). We fixed the root state as "shallow" for both families given paleontological evidence (see Results), and fitted transition matrices using the model of character evolution best supported by both phylogenies, the "All Rates Different" model. For each stochastic character map, we identified transitions as branches where the initial state was “shallow” and the terminal state was “deep.” We then tallied the number of tips representing deep-sea endemic species directly subtended by these transitions, excluding tips in nested lineages showing more recent reversals (i.e. “deep” to “shallow” to “deep”). We estimated the median and 95 % confidence interval number of deep-sea endemics per transition across the 1000 simulated character histories described above, ranked by their richness.
For each family, we compared this rank-order profile of transitions to a stochastic pattern of tip states generated from 1000 simulations of discrete character histories under the observed state transition rate matrix. Transition matrices were fitted to each clade using an extended Markov k-state (Mk) model for discrete character evolution assuming the "All Rates Different" different model (see phytools “fitMk”). The ancestral state for each simulation was assumed to be shallow as in the estimates of the observed patterns described above. We then calculated, for each simulation, the locations of transitions and the number of deep-sea tips directly subtending them as described above. Finally, we ranked transitions by deep-sea richness, and calculated the median and 95 % confidence interval for each rank.
We also conducted a permutation-based (tip-shuffling) analysis wherein we shuffled 1000 times the "deep" and "shallow" tip states on each fixed tree. The number of deep and shallow tips was fixed to the observed value but the transition rate matrix was allowed to vary in these permutation tests. Observed diversification histories with ranked transitions having higher deep-sea endemic richness than expected by chance, along with more steeply declining rank-order richness profiles (indicating the concentration of deep-sea endemics among fewer transitions), align more closely with the in-situ model of diversification; piecemeal patterns align with stochastic expectations. Both approaches are necessarily limited by taxonomic sampling, so we also performed a related, hierarchical analysis, described below, to test whether all known deep-sea endemics are concentrated within particular genera in their respective families.
Testing potential drivers of deep-sea entries in LucinidaeWe tested whether the phylogenetic positions of lucinid deep-sea endemics are predicted by the bathymetric and climatic occurrences of their “close relatives.” We defined close relatives in two ways to account for phylogenetic resolution and taxon coverage. In the "strict scheme", taxa were considered close relatives if they form moderate- or high-confidence clades (bootstrap support > 50 %) with deep-sea endemic taxa in the phylogeny represented in Fig. S2 (N = 21 species; marked by asterisks). The "broad scheme" used the close relatives in the first scheme plus the species absent from the molecular phylogeny but belonging to genera that include or are sister to deep-sea endemics (N = 69 species for climate analyses, N = 67 species for depth; Fig. S2). For example, close relatives in Leucosphaerinae include all shallow-water species in Alucinoma, Anodontia, Gonimyrtea, and Myrtina in the broad scheme. Further, as there is at least one deep-sea endemic species in the lucinine genera Cardiolucina, Lamellolucina, and Megaxinus, any shallow-water species in those genera were considered close relatives in this broad scheme, despite the fact that the latter two genera are absent from our molecular phylogeny. Lucinoma includes numerous deep-sea endemic species and is potentially non-monophyletic depending on the placement of Divalucina, so this scheme also considers all shallow-water Lucinoma and Divalucina to be close relatives. Most taxa in Myrteinae are deep-sea endemics and the clade has ~40 operational species, so we consider any shallow-water myrteine (including those in genera lacking molecular data) to be a close relative under the broad scheme. Due to their polyphyly and low resolution, five operational genera (Ferrocina, Gonimyrtea, Parvilucina, Pegophysema, and Myrtea sensu lato (i.e., species excluded from Myrtea sensu stricto) were not considered close relatives despite having endemic deep-sea congenerics (Fig. S2). An exception was made for Gonimyrtea concinna as it falls sister to the endemic deep-sea Ustalucina ferruginea (BSS = 91 %; Fig. S2). Dataset S1 contains designations of close relatives under both schemes.
For both sets of close relatives, we used one-sided Kolmogorov-Smirnov tests to assess whether lucinid deep-sea endemic species are more likely to have close relatives reaching greater depths than other shallow-water species. We used permutation and Fisher's exact tests to assess whether extant close relatives of deep-sea endemics are more likely to be found in extratropical (i.e. colder) waters. We associated each shallow-water species with biogeographic provinces as defined by Huber 2015, coding them as tropical, extratropical, or “both” (one-sided Fisher’s exact tests omitted the "both" category).
We also assessed the climatic context (tropical, extratropical, or both/uncertain) of the first occurrences of lucinid genera with a known fossil record, using a compendium of first and and last occurrences of marine bivalves and the primary literature. Fossil data allowed us to assign climates for 11 of the 22 lucinid genera with at least one extant deep-sea endemic species, and for 56 of the 75 lucinid genera today lacking deep-sea endemics (Supplementary Dataset S1). We excluded genera in "both/uncertain" categories and used a one-sided Fisher’s exact test to evaluate whether lucinid genera with at least one extant deep-sea endemic were more likely to have originated in extratropical settings.
Testing the taxonomic concentration of deep-sea endemics across BivalviaUsing the full database of bivalve bathymetries, we tested whether families containing at least one deep-sea endemic species are phylogenetically clustered on the family-level tree (via the “Fritz & Purvis's D” statistic in caper). We then tested whether the proportion of deep-sea endemics in a family is phylogenetically clustered using Pagel’s λ. To test whether the relative richness of deep-sea endemic species within families differs from the stochastic expectation, we randomly shuffled depth assignments 1000 times, keeping constant the size of every family. We next tested whether deep-sea endemic species are concentrated within certain genera across families by randomly shuffling depth categories for species within families 1000 times, and counting the number of genera with at least one deep-sea endemic.
Changes after Nov 4, 2025: One mytilid taxon (D. pelseneeri) reassigned from shallow to deep. All relevant tables were updated accordingly.
