Global seafloor connectivity over evolutionary time
Data files
Jun 11, 2025 version files (710.34 MB in total):
- biome-biome_statistics.csv (45.13 KB)
- BiomeID.csv (4.57 KB)
- bl_x.csv (54.85 KB)
- DepthLatID.csv (1.10 KB)
- GIS_transects_ED5.zip (9.24 KB)
- OSM9e_LTT_ED8b.zip (170.75 KB)
- OSM9f_exon_boundaries.txt (41.16 KB)
- OSM9f_p4.txt (91 B)
- OSM9f_RAxML.nwk (193.76 KB)
- Osm9f.csv (730.14 KB)
- OSM9f.nwk (195.72 KB)
- osm9f.phy (706.58 MB)
- osm9fv3.r (50.54 KB)
- README.md (9.68 KB)
- World_osm9_v3.jpeg (2.25 MB)
Abstract
Our knowledge of biogeographic patterns and processes in the deep sea has been limited by the lack of integrated datasets that cover its vast extent. Here we analyse a new global dataset of genomic DNA sequences, spanning an entire taxonomic class of benthic invertebrates (Ophiuroidea), to obtain a broad understanding of phylogenetic divergence and biotic movement across all oceans, from coastal margins down to the abyssal plains. We show that regional faunas on the continental shelf are phylogenetically divergent, particularly at temperate and tropical latitudes. In contrast, assemblages in the deep sea are much more connected. Many temperate deep-sea lineages have achieved distribution ranges across the planet, including over the Quaternary period. A close relationship exists between deep-sea faunas of the North Atlantic and Southern Australia, on the opposite sides of the globe. Bathymetric interchange is not only reliant on vertical migration through isothermal polar waters, but also occurs across the thermal depth gradients of tropical regions. The connected nature of deep-sea life should be an important consideration in marine conservation assessments.
This Dryad submission contains the aligned nuclear and mitochondrial DNA sequences, the phylogenetic trees, descriptive tables, spatial datasets and R code from our biogeographic study of marine benthic connectivity across the global oceans.
Manuscript citation: O'Hara et al. (submitted). Global marine connectivity over evolutionary time. Nature.
Description of the data and file structure
Figure numbers refer to the submitted paper.
osm9f.phy: PHYLIP file with the aligned nuclear (1487 exons, 260319 bp) and mitochondrial COI (1431 bp) sequence data from the 2699 samples used to build the OSM9f.nwk phylogeny for this study.
OSM9f_exon_boundaries.txt: text file listing each exon (and COI) with its start and end positions in osm9f.phy.
OSM9f_p4.txt: partition file used in the maximum-likelihood (RAxML v8.1.20) analysis of osm9f.phy and in the creation of the OSM9f.nwk phylogenetic tree.
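RAxML v8 partition files conventionally contain one line per partition of the form `DNA, name = start-end`. The 91-byte OSM9f_p4.txt has not been inspected here, so the Python sketch below simply assumes that standard syntax (comma-separated ranges, optional `\3`-style strides) and reports how many alignment columns each partition covers. `parse_partitions` is a hypothetical helper, not part of osm9fv3.r.

```python
import re

# Assumed RAxML v8 syntax: "MODEL, name = range[, range...]", where a range is
# "a-b" or a single position "a", optionally followed by a "\k" stride.
RANGE_RE = re.compile(r"^(\d+)(?:-(\d+))?(?:\\(\d+))?$")


def parse_partitions(text):
    """Return {partition_name: number_of_columns} for a RAxML-style partition file."""
    partitions = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        left, ranges = line.split("=", 1)
        _model, name = (part.strip() for part in left.split(",", 1))
        n_cols = 0
        for rng in ranges.split(","):
            match = RANGE_RE.match(rng.strip())
            if match is None:
                raise ValueError(f"unrecognised range {rng!r} in line {raw!r}")
            start = int(match.group(1))
            end = int(match.group(2) or start)
            step = int(match.group(3) or 1)
            # Positions are 1-based and inclusive, as in RAxML partition files.
            n_cols += len(range(start, end + 1, step))
        partitions[name] = n_cols
    return partitions
```

Usage: `parse_partitions(open("OSM9f_p4.txt").read())`. Summing the values and comparing the total with the alignment length is a quick way to confirm that every column is assigned to a partition.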
OSM9f.nwk: the chronogram, in NEWICK format, used to create Fig. 1 and Extended Data Figs 7 and 10, and used as input to the biogeographic analyses behind the ordinations (Figs 2-3, Extended Data Fig. 9) and chord diagrams (Fig. 4, Extended Data Fig. 4). The tree was inferred with RAxML v8.1.20 and converted to a chronogram with the rate-smoothing software treePL v1.0 (see Methods).
Osm9f.csv: a table of sample, taxonomic, locality and environmental data for each DNA record in osm9f.phy and each tip of the phylogenetic tree (OSM9f.nwk). The ID column gives the order in which samples appear in the tree.
Sample columns: Tip (full name of the sample), Sample code (unique sample code), Plate (sequencing plate run), Target (percentage of the exon target captured by sequencing), Heterozygosity (of the sites), COI_bp (length of the COI fragment), Genbank COI code (GenBank accession code).
Taxonomic columns: Species (genus and species name, including informal names for undescribed species), Family (taxonomic family), Order (taxonomic order), FamilyID (family coded as an integer), FamilyFocalTaxa (label for display on fan phylogenies).
Locality columns: GlobBiome (the biome the sample belongs to; the main descriptor used in the biogeographic analyses), BiomeID (biome coded as an integer), Region (one of 15 geographic regions), RegionID (region coded as an integer), Strata (depth stratum of the sample: S = 0-200 m, B = 200-3500 m, A = >3500 m), StrataID (depth stratum coded as an integer), Locality (geographic description of the collection site), DepthLat (descriptor used in the 9-state chord diagrams, based on 3 latitude and 3 depth categories), DepthLatID (DepthLat coded as an integer), Collection event code (survey and station code), Museum registration (museum acronym and registration code of the original specimen), Lat (decimal latitude of the collection site), Lon (decimal longitude of the collection site), Depth (depth of the collection site in meters).
Environmental columns: Temp (seawater temperature in °C), Salinity (in PSU) and oxygen concentration (µmol/kg), derived by trilinear interpolation of World Ocean Atlas 2018 annual data at each sample's latitude, longitude and depth.
'NA' in a cell indicates that no data are available for that field.
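The Strata column encodes the depth bands listed above (S = 0-200 m, B = 200-3500 m, A = >3500 m). A small Python sketch that re-derives the band from the Depth column can serve as a consistency check on the table. How samples lying exactly at 200 m or 3500 m were assigned is not stated, so treating those boundaries as belonging to the shallower band is an assumption, as is the hypothetical helper name `strata_from_depth`.

```python
import math


def strata_from_depth(depth_m):
    """Map a collection depth in metres to S (shelf), B (bathyal) or A (abyssal).

    Assumption: exactly 200 m and 3500 m go to the shallower stratum.
    Missing depths (None or NaN) return None.
    """
    if depth_m is None or math.isnan(depth_m):
        return None
    if depth_m <= 200:
        return "S"
    if depth_m <= 3500:
        return "B"
    return "A"


def mismatched_strata(rows):
    """Yield (Tip, Strata, derived) for rows whose Strata code disagrees with Depth.

    `rows` are dicts such as those produced by csv.DictReader on Osm9f.csv;
    'NA' or blank Depth cells are skipped.
    """
    for row in rows:
        depth = row.get("Depth")
        value = None if depth in (None, "", "NA") else float(depth)
        derived = strata_from_depth(value)
        if derived is not None and row.get("Strata") != derived:
            yield row.get("Tip"), row.get("Strata"), derived
```

Any rows reported here are worth inspecting by hand, since boundary cases or rounded depths could legitimately differ from this simple rule.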
BiomeID.csv: a table of the 37 biome codes and their descriptors: Biome (code), Biome ID (order of biomes by depth stratum and then alphabetically by region), Order (order of biomes resulting from the corHMM analysis), LatOrder (order of biomes by depth stratum and latitude), CountOfTips (number of samples in each biome), Col (colour pattern 1), ColM (colour pattern 2), Str (biome label for figures), Lat_avg, Lon_avg, Depth_avg, Temp_avg, Salinity_avg and O2_avg (mean latitude, longitude, depth, seawater temperature, salinity and oxygen content of all samples in the biome, from the data in Osm9f.csv), Region_str (descriptor of the geographic region, for figure legends), Depth_str (descriptor of the depth stratum, for figure legends), DepthLatGroup (alphanumeric code for the depth-latitude categories, used to create Extended Data Fig. 4c-d; the first letter indicates depth [A = Abyssal, B = Bathyal, S = Shelf] and the second letter the latitudinal zone [P = Polar, T = Temperate, E = Equatorial]), DepthLatID (integer code for the depth-latitude categories), LatGroupOrder (order of latitudinal groups: 3 = Polar, 2 = Temperate, 1 = Equatorial) and DepthOrder (order of depth groups: 3 = Shelf, 2 = Bathyal, 1 = Abyssal).
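The two-letter DepthLatGroup code combines a depth letter and a latitude letter, and the DepthOrder and LatGroupOrder integers follow directly from those letters. The Python sketch below decodes a code using only the mappings documented above. It assumes the codes are exactly two letters; the integer values of DepthLatID are not documented here and so are not reproduced. `decode_depth_lat_group` is a hypothetical helper.

```python
# Letter codes as documented for the DepthLatGroup column of BiomeID.csv.
DEPTH_CODES = {"S": "Shelf", "B": "Bathyal", "A": "Abyssal"}
LAT_CODES = {"P": "Polar", "T": "Temperate", "E": "Equatorial"}

# Ordering integers as documented for the DepthOrder and LatGroupOrder columns.
DEPTH_ORDER = {"Shelf": 3, "Bathyal": 2, "Abyssal": 1}
LAT_ORDER = {"Polar": 3, "Temperate": 2, "Equatorial": 1}


def decode_depth_lat_group(code):
    """Split a two-letter DepthLatGroup code such as 'BT' into depth and latitude parts."""
    code = code.strip().upper()
    if len(code) != 2 or code[0] not in DEPTH_CODES or code[1] not in LAT_CODES:
        raise ValueError(f"unexpected DepthLatGroup code: {code!r}")
    depth = DEPTH_CODES[code[0]]
    lat = LAT_CODES[code[1]]
    return {
        "depth": depth,
        "latitude": lat,
        "DepthOrder": DEPTH_ORDER[depth],
        "LatGroupOrder": LAT_ORDER[lat],
    }


# All 3 x 3 combinations, corresponding to the nine groups in DepthLatID.csv.
ALL_GROUPS = [d + lat for d in DEPTH_CODES for lat in LAT_CODES]
```

For example, `decode_depth_lat_group("BT")` yields Bathyal and Temperate, with DepthOrder 2 and LatGroupOrder 2.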
DepthLatID.csv: a table of the 9 depth-latitude biome groups and their descriptors, using a subset of the column headings in BiomeID.csv.
bl_x.csv: a comma-delimited text file of known biome-species records that are not part of the genetic dataset. These records are artificially added as tips at an appropriate taxonomic position in the phylogeny to test whether missing data influence the ordination pattern (see Extended Data Fig. 9g). Columns: Var1 = tip label, Code = unique code for each record, Taxonomy ID = unique species-level code, cat = whether to include the tip (X = include, NA = do not include; modified in the R code), Biome1 = biome code of the record, BiomeID = integer factor of the Biome code, Depth1 = average depth of this species in this biome, Family = taxonomic family of the record.
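The `cat` column controls which of these extra records are grafted onto the tree. Because osm9fv3.r modifies that column before use, the Python sketch below reproduces only the documented default rule (keep rows whose `cat` is `X`, skip `NA`) and counts the selected records per biome; grafting the tips onto the phylogeny is not shown. The helper names and the UTF-8 encoding are assumptions.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read a CSV file into a list of dicts (UTF-8 encoding is an assumption)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def tips_to_add(rows):
    """Keep records whose 'cat' flag is 'X' (include); 'NA' or blank means exclude."""
    return [row for row in rows if (row.get("cat") or "").strip() == "X"]


def added_tips_per_biome(rows):
    """Count the included extra records per biome code (Biome1 column)."""
    return Counter(row["Biome1"] for row in tips_to_add(rows))
```

Usage: `added_tips_per_biome(load_rows("bl_x.csv"))` shows which biomes gain the most unsampled records in the sensitivity test.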
OSM9f_RAxML.nwk: a phylogenetic tree in NEWICK format from a RAxML v8 maximum-likelihood analysis, with branch lengths reflecting genetic distance rather than time. It is used to test whether the ordination pattern is biased by the rate-smoothing algorithm that produces the chronogram (see Extended Data Fig. 9h).
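A chronogram such as OSM9f.nwk should be ultrametric (every tip equidistant from the root), whereas a RAxML tree such as OSM9f_RAxML.nwk generally is not. In R, ape's `is.ultrametric()` performs this check. The Python sketch below does the same with a minimal, hypothetical Newick reader written only for illustration; it is iterative, so deep trees do not hit Python's recursion limit. The relative tolerance of 1e-6 is an assumption.

```python
def root_to_tip_distances(newick):
    """Return {tip_label: summed branch length from the root} for a Newick string.

    Minimal reader: handles nesting, unquoted labels, whitespace and ':length'
    suffixes, but not quoted labels or [comments]. Missing lengths count as 0.
    """
    s = newick.strip().rstrip(";")
    parents, lengths, labels, tips = [], [], [], []
    stack = []   # clades whose closing ')' has not been reached yet
    last = None  # node that a following label or ':length' applies to
    i = 0
    while i < len(s):
        ch = s[i]
        if ch.isspace():
            i += 1
        elif ch == "(":
            parents.append(stack[-1] if stack else -1)
            lengths.append(0.0)
            labels.append("")
            stack.append(len(parents) - 1)
            last = None
            i += 1
        elif ch == ",":
            last = None
            i += 1
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced parentheses")
            last = stack.pop()
            i += 1
        elif ch == ":":
            if last is None:
                raise ValueError(f"branch length without a node at position {i}")
            j = i + 1
            while j < len(s) and s[j] not in ",()":
                j += 1
            lengths[last] = float(s[i + 1:j])
            i = j
        else:
            j = i
            while j < len(s) and s[j] not in ",():":
                j += 1
            text = s[i:j].strip()
            if last is None:  # a new tip
                parents.append(stack[-1] if stack else -1)
                lengths.append(0.0)
                labels.append(text)
                tips.append(len(parents) - 1)
                last = len(parents) - 1
            else:             # label of the clade that was just closed
                labels[last] = text
            i = j
    if stack:
        raise ValueError("unbalanced parentheses")
    depth = [0.0] * len(parents)
    for node, par in enumerate(parents):  # a parent is always created before its children
        depth[node] = lengths[node] + (depth[par] if par >= 0 else 0.0)
    return {labels[t]: depth[t] for t in tips}


def is_ultrametric(newick, rel_tol=1e-6):
    """True if all root-to-tip distances agree within rel_tol of the largest one."""
    dists = list(root_to_tip_distances(newick).values())
    return max(dists) - min(dists) <= rel_tol * max(dists)
```

Applied to the text of each file, one would expect `is_ultrametric` to return True for OSM9f.nwk and False for OSM9f_RAxML.nwk. Duplicate tip labels would collapse into a single dictionary key, so the sketch assumes labels are unique.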
biome-biome_statistics.csv: a table of calculated biome-to-biome GUNN distance measures, including the number of unique nodes in the comparison; the geometric mean and standard deviation; the median; the mean and standard deviation; the harmonic mean; the geometric mean of nodes < 65 MY; the geometric mean of nodes between 3 and 65 MY; values for a jackknifed dataset and for a dataset with added tips representing unsampled (but known) biome occurrences; and the geometric mean of node heights calculated from a non-ultrametric (RAxML) tree.
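How the set of nodes for each biome pair is selected (the GUNN measure itself) is defined in the paper's methods and is not reproduced here. Given a list of node heights in millions of years for one comparison, the Python sketch below shows how the listed summary statistics could be computed with the standard library. The geometric standard deviation as exp(SD of log heights), the use of sample (n - 1) standard deviations, and inclusive 3-65 MY bounds are all assumptions.

```python
import math
import statistics


def node_height_summary(heights_my):
    """Summaries of node heights (millions of years) for one biome-to-biome comparison.

    Requires at least two positive heights (stdev and logarithms need them).
    Age-restricted means return NaN when no heights fall in the window.
    """
    logs = [math.log(h) for h in heights_my]

    def geo_mean(values):
        return statistics.geometric_mean(values) if values else float("nan")

    return {
        "n_nodes": len(heights_my),
        "geometric_mean": statistics.geometric_mean(heights_my),
        "geometric_sd": math.exp(statistics.stdev(logs)),  # multiplicative SD
        "median": statistics.median(heights_my),
        "mean": statistics.mean(heights_my),
        "sd": statistics.stdev(heights_my),
        "harmonic_mean": statistics.harmonic_mean(heights_my),
        # Age-restricted variants; the exact bounds used in the paper are assumed.
        "geometric_mean_lt65": geo_mean([h for h in heights_my if h < 65]),
        "geometric_mean_3to65": geo_mean([h for h in heights_my if 3 <= h <= 65]),
    }
```

The geometric and harmonic means give less weight to a few very old shared nodes than the arithmetic mean does, which is one reason to report several averages side by side.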
GIS_transects_ED5.zip: archive containing Esri shapefiles and associated files for the four south-to-north transects used to create the latitude-depth plots in Extended Data Fig. 5 (EA_buffer.* = East Atlantic, EP_buffer.* = East Pacific, WA_buffer.* = West Atlantic, WP_buffer.* = West Pacific). Created with QGIS v3.28.3.
OSM9e_LTT_ED8b.zip: archive with the 4 files used to create the lineage-through-time (LTT) plots:
1) OSM9e_DBv1.csv: comma-delimited text file with 5 columns: label = the sample's label on the full phylogenetic tree; exon = binary flag for whether the sample contains nuclear exon data; order = integer factor indicating which of the 6 ophiuroid taxonomic orders the sample belongs to; family = integer factor indicating which of the 37 ophiuroid taxonomic families the sample belongs to; subord = integer factor indicating which of the 10 ophiuroid taxonomic suborders the sample belongs to.
2) OSM9e_RaxNGbbngg_pinct35_rr_tplo10b.nwk: the full phylogenetic tree in NEWICK format.
3) OSM9e_tplo10b_sp3e.nwk: a species-level tree (with duplicate within-species samples removed) in NEWICK format.
4) OSM9e_tplo10b_sp3e_dlist.txt: the labels of the duplicate samples removed when deriving the species-level phylogeny (OSM9e_tplo10b_sp3e.nwk) from the full phylogeny (OSM9e_RaxNGbbngg_pinct35_rr_tplo10b.nwk).
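A lineage-through-time curve counts how many lineages existed at each moment in the past. For a rooted, fully bifurcating, ultrametric tree this reduces to counting branching times: at time t before the present there are 1 + (number of nodes older than t) lineages. The Python sketch below takes a plain list of node ages (for example, as returned by ape's `branching.times()` in R) rather than parsing the NEWICK files; the function names are hypothetical.

```python
def lineages_at(t, branching_times):
    """Number of lineages alive at time t before present (same units as node ages).

    Assumes a rooted, fully bifurcating, ultrametric tree and 0 <= t < root age.
    """
    return 1 + sum(1 for b in branching_times if b > t)


def ltt_steps(branching_times):
    """Return (node_age, lineages_just_after_that_node) pairs, oldest node first.

    The root split creates 2 lineages and each later split adds one more.
    """
    ordered = sorted(branching_times, reverse=True)
    return [(age, k) for k, age in enumerate(ordered, start=2)]
```

For node ages [10, 5, 2] (a four-tip tree), `ltt_steps` gives [(10, 2), (5, 3), (2, 4)]. LTT plots usually show the lineage count on a log scale, so that a constant diversification rate appears as a straight line.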
World_osm9_v3.jpeg: enlarged image of Fig. 1, a map of sample sites distinguishing genomic (exon capture and transcriptome) from barcode (COI) samples.
Usage notes
Comma-separated (*.csv) files can be read with a text editor, a spreadsheet program such as Microsoft Excel, or the read.csv() function in the R programming language.
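The CSV files can also be read outside R. As a consistency check between two of the tables, the Python sketch below counts rows per GlobBiome in Osm9f.csv and compares the counts with the CountOfTips column of BiomeID.csv. It assumes that GlobBiome values use the same codes as the Biome column, that the files are UTF-8 (use "utf-8-sig" if a spreadsheet added a byte-order mark), and that 'NA' biome cells should be skipped. All helper names are hypothetical.

```python
import csv
from collections import Counter


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def tips_per_biome(osm_rows):
    """Count Osm9f.csv records per GlobBiome, ignoring 'NA' or blank biome cells."""
    return Counter(row["GlobBiome"] for row in osm_rows
                   if row.get("GlobBiome") not in (None, "", "NA"))


def compare_with_biome_table(osm_rows, biome_rows):
    """Return {biome: (counted, CountOfTips)} for biomes where the two tables disagree."""
    counted = tips_per_biome(osm_rows)
    mismatches = {}
    for row in biome_rows:
        biome = row["Biome"]
        expected = int(row["CountOfTips"])
        if counted.get(biome, 0) != expected:
            mismatches[biome] = (counted.get(biome, 0), expected)
    return mismatches
```

Usage: `compare_with_biome_table(read_rows("Osm9f.csv"), read_rows("BiomeID.csv"))`. An empty result means the two tables agree under these assumptions.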
Text (*.txt) files can be read with a text editor.
Shapefiles (*.shp) are a common format for vector-based geographic information system (GIS) data developed by the company Esri. They can be opened in most GIS software and in R or Python. A shapefile consists of several files beyond the .shp: the .shx index and .dbf attribute table are required, and others such as .prj, .cpg, .sbn and .sbx are optional. The user interacts directly only with the .shp file, but the other files must be in the same directory.
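Because a .shp file is unusable without its companions, it can be worth checking an archive before unpacking it. The Python sketch below uses the standard zipfile module to report any required companion (.shx, .dbf) that is missing for each .shp member. Treating only those two as required follows the Esri shapefile specification; the helper names are hypothetical.

```python
import posixpath
import zipfile

REQUIRED = (".shx", ".dbf")  # mandatory companions of every .shp file


def missing_companions(names):
    """Map each .shp in a list of archive member names to its missing required files."""
    lowered = {name.lower() for name in names}
    report = {}
    for name in names:
        stem, ext = posixpath.splitext(name)  # zip member names use '/' separators
        if ext.lower() != ".shp":
            continue
        report[name] = [stem + e for e in REQUIRED if (stem + e).lower() not in lowered]
    return report


def check_zip(path):
    """Run missing_companions() on the members of a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return missing_companions(archive.namelist())
```

Usage: `check_zip("GIS_transects_ED5.zip")` should list the four transect shapefiles (EA_buffer, EP_buffer, WA_buffer, WP_buffer), each with an empty list if the archive is complete.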
NEWICK (*.nwk) files are text-based representations of phylogenetic trees. They can be viewed with programs such as FigTree v1.4.4 (2006-2018), available for UNIX/Linux/Mac OS X/Windows, and read into the R computing environment with the function read.newick() in the phytools package.
PHYLIP (*.phy) files are text-based representations of DNA sequence alignments. They can be viewed with many DNA sequence alignment editors and read into the R computing environment with the function read.phylip() in the phylotools package.
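For a 706 MB alignment, loading everything just to check its dimensions is wasteful. In PHYLIP files the first line holds the number of sequences and the number of alignment columns, so reading that single line is enough. From the description of osm9f.phy, one would expect 2699 sequences and, if the exon and COI blocks are simply concatenated, 260319 + 1431 = 261750 columns; this is an inference to verify, not a documented value. `read_phylip_header` is a hypothetical helper.

```python
def read_phylip_header(path):
    """Return (n_sequences, n_sites) from the first line of a PHYLIP file.

    Only the first line is read, so this is fast even for very large alignments.
    """
    with open(path, encoding="ascii", errors="replace") as handle:
        first = handle.readline().split()
    if len(first) < 2:
        raise ValueError(f"{path}: first line is not a PHYLIP header")
    return int(first[0]), int(first[1])
```

Usage: `read_phylip_header("osm9f.phy")`.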
JPEG (*.jpeg) is a common raster image format that can be viewed with most image viewers, including many free programs.
Sharing/Access information
Previous Dryad submission(s) using the exon-capture methodology include https://doi.org/10.5061/dryad.9jk90f6.
All data are included in this Dryad submission.
Code/Software
osm9fv3.r: The R code used to compute all the analyses and create all the figures in the manuscript.
Exon capture next-generation DNA sequence data, including mitochondrial genes, plus some COI-only samples
Phylogenetic analysis, trees and sample descriptions
Evolutionary biology and graphic scripts (R) and data tables
