Biodiversity, biogeography, and connectivity of polychaetes in the world's largest marine minerals exploration frontier
Data files
Apr 12, 2023 version files 3.21 MB
- CCZ_Polychaeta_16S_alignment_diversity_analyses.fas
- CCZ_Polychaeta_16S_sequences_connectivity_analyses.fas
- CCZ_Polychaeta_COI_alignment_diversity_analyses.fas
- CCZ_Polychaeta_COI_sequences_connectivity_analyses.fas
- CCZ_Polychaeta_PopGen.html
- CCZ_Polychaeta_PopGen.Rmd
- CCZ_Polychaeta_Table_S2.xlsx
- CCZ_Polychaeta_Table_S3.xlsx
- README.md
Abstract
The abyssal Clarion-Clipperton Zone (CCZ), Pacific Ocean, is an area of commercial importance owing to the growing interest in mining high-grade polymetallic nodules at the seafloor for battery metals. Research into the spatial patterns of faunal diversity, composition, and population connectivity is needed to better understand the ecological impacts of potential resource extraction. Here, a DNA taxonomy approach is used to investigate regional-scale patterns of taxonomic and phylogenetic alpha and beta diversity, and genetic connectivity, of the dominant macrofaunal group (annelids) across a 6 million km² region of the abyssal seafloor. We used a combination of new and published barcode data to study 1866 polychaete specimens using molecular species delimitation. Both phylogenetic and taxonomic alpha and beta diversity metrics were used to analyse spatial patterns of biodiversity. Connectivity analyses were based on haplotype distributions for a subset of the studied taxa. DNA taxonomy identified 291–314 polychaete species from the COI and 16S datasets respectively. Taxonomic and phylogenetic beta diversity between sites were relatively high and mostly explained by lineage turnover. Over half of pairwise comparisons were more phylogenetically distinct than expected based on their taxonomic diversity. Connectivity analyses in abundant, broadly distributed taxa suggest an absence of genetic structuring driven by geographical location. Species diversity in abyssal Pacific polychaetes is high relative to other deep-sea regions. Results suggest that environmental filtering, where the environment selects against certain species, may play a significant role in regulating spatial patterns of biodiversity in the CCZ. A core group of widespread species have diverse haplotypes but are well connected over broad distances.
Our data suggest that the high environmental and faunal heterogeneity of the CCZ should be considered in policy decisions such as designating protected areas.
Methods
A complete description of the DNA taxonomy pipeline used for the samples new to this study is provided in Glover et al. (2016). Abyssal benthic specimens were collected from UK-1, OMS, and APEI-6 using a variety of oceanographic sampling gear, including box cores, epibenthic sledges (EBS), a remotely operated vehicle (ROV), and multi-cores. Live-sorted specimens were stored in individual microtube vials containing an aqueous solution of 80% non-denatured ethanol, numbered, barcoded into a database, and kept chilled until return to the Natural History Museum, London, UK.
DNA was extracted with the DNeasy Blood and Tissue Kit (Qiagen) on a Hamilton Microlab STAR Robotic Workstation. Approximately 450 bp of 16S and 650 bp of cytochrome c oxidase subunit I (COI) were amplified using the primers listed in Table S1. PCR mixtures contained 1 μl of each primer (10 μM), 2 μl of template DNA, and 21 μl of Red Taq DNA Polymerase 1.1X MasterMix (VWR), for a total reaction volume of 25 μl. The PCR amplification profile consisted of initial denaturation at 95 °C for 5 min, followed by 35 cycles of denaturation at 94 °C for 45 s, annealing at 55 °C for 45 s, and extension at 72 °C for 2 min, with a final extension at 72 °C for 10 min. PCR products were purified using a Millipore Multiscreen 96-well PCR Purification System, and sequencing was performed on an ABI 3730XL DNA Analyser (Applied Biosystems) at The Natural History Museum Sequencing Facility, using the same primers as in the PCR reactions. Overlapping sequence fragments were merged into consensus sequences using Geneious (Kearse et al., 2012).
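As a quick consistency check on the PCR recipe and cycling profile, the arithmetic can be sketched in Python. This is only an illustrative calculation: it assumes that "each primer" means one forward and one reverse primer, and the dictionary keys and function names are hypothetical labels, not part of the original protocol.

```python
# Component volumes in microlitres, as stated in the PCR recipe.
# Assumption: "each primer" means one forward and one reverse primer.
reaction_ul = {
    "forward_primer_10uM": 1.0,
    "reverse_primer_10uM": 1.0,
    "template_dna": 2.0,
    "red_taq_mastermix": 21.0,
}

# Thermocycler steps as (label, seconds per step, number of repeats).
profile = [
    ("initial denaturation, 95 C", 5 * 60, 1),
    ("denaturation, 94 C", 45, 35),
    ("annealing, 55 C", 45, 35),
    ("extension, 72 C", 2 * 60, 35),
    ("final extension, 72 C", 10 * 60, 1),
]


def total_volume_ul(components):
    """Sum the component volumes of one reaction."""
    return sum(components.values())


def final_concentration_uM(stock_uM, added_ul, total_ul):
    """Concentration of a stock solution after dilution into the reaction."""
    return stock_uM * added_ul / total_ul


def hold_time_minutes(steps):
    """Total programmed hold time, ignoring temperature ramping."""
    return sum(seconds * repeats for _, seconds, repeats in steps) / 60


total = total_volume_ul(reaction_ul)
primer_uM = final_concentration_uM(10.0, 1.0, total)
minutes = hold_time_minutes(profile)
print(f"{total} ul per reaction, {primer_uM} uM per primer, {minutes} min hold time")
```

Under these assumptions the components sum to the stated 25 μl, each primer ends up at 0.4 μM, and the programmed hold time is 137.5 min before any ramping time is added.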
Sequences of the mitochondrial 16S rRNA (16S) and cytochrome c oxidase subunit I (COI) genetic markers were supplemented with sequences from Wiklund et al. (2019), Drennan et al. (2021), Glover et al. (in press), Neal et al. (2022a), Neal et al. (2022b), Janssen et al. (2015), Janssen et al. (2019), Bonifácio et al. (2020), and Bonifácio et al. (2021a,b) available through NCBI GenBank (Altschul et al., 1990) (https://www.ncbi.nlm.nih.gov/genbank/) and BOLD (https://www.boldsystems.org). Full 16S and COI sequence alignments used in analyses can be found in Appendices S2 and S3. Genetic data from a total of 1866 specimens were included in diversity analyses, with 1177 specimens having COI sequences and 1101 specimens having 16S sequences.
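The specimen counts imply how many specimens carry both markers. The sketch below applies inclusion-exclusion, assuming that every one of the 1866 specimens has at least one of the two markers; that assumption is not stated explicitly in the text, and the function name is hypothetical.

```python
def specimens_with_both(total, n_coi, n_16s):
    """Inclusion-exclusion: |COI and 16S| = |COI| + |16S| - |COI or 16S|.

    Assumes every specimen in `total` has at least one of the two markers.
    """
    both = n_coi + n_16s - total
    if both < 0 or both > min(n_coi, n_16s):
        raise ValueError("counts are inconsistent with the stated assumption")
    return both


both = specimens_with_both(total=1866, n_coi=1177, n_16s=1101)
coi_only = 1177 - both
only_16s = 1101 - both
print(both, coi_only, only_16s)
```

Under that assumption, 412 specimens would have both COI and 16S sequences, 765 would have COI only, and 689 would have 16S only.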
All sequence identities, including published ones, were confirmed using NCBI BLASTn (Altschul et al., 1990) to check for contamination before any analyses. COI sequences were translated into amino acids in MEGA X (Stecher et al., 2020) and checked for stop codons to screen out pseudogenes. The 16S and COI sequences were aligned separately, using MAFFT v.7 (Katoh & Standley, 2013) and MUSCLE (Edgar, 2004) respectively, with default settings. Alignments were then manually edited and clipped in MEGA X. Minimum length coverage was 508 base pairs for COI and 298 base pairs for 16S.
Best-fit partitions for each alignment and amino acid substitution models for COI were inferred using PartitionFinder 2.0 (Lanfear et al., 2017), utilising a greedy clustering algorithm (Lanfear et al., 2012), and PhyML v.3 (Guindon et al., 2010). The optimal evolutionary model for both 16S and all codon positions of COI was identified as the General Time Reversible model with corrections for invariant characters and gamma-distributed rate heterogeneity (GTR + I + G). Four outgroups were chosen based on those utilised in the annelid phylogenies of Rousset et al. (2007).
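The stop-codon screen and the length-coverage figures can be illustrated with a minimal Python sketch. It is not the MEGA X procedure used in the study. It assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), in which only TAA and TAG are stop codons; the text does not say which code was applied. All function names and the toy sequence are hypothetical.

```python
# Assumption: invertebrate mitochondrial code (NCBI table 5), where
# TAA and TAG terminate translation and TGA codes for tryptophan.
STOP_CODONS = {"TAA", "TAG"}


def stop_codon_positions(seq, frame=0):
    """Return 0-based start positions of stop codons in one reading frame.

    Expects an unaligned (gap-free) nucleotide sequence.
    """
    seq = seq.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def clean_frames(seq):
    """List the forward reading frames (0, 1, 2) that contain no stop codon."""
    return [f for f in range(3) if not stop_codon_positions(seq, f)]


def length_coverage(aligned_seq):
    """Count resolved bases in an aligned sequence, skipping gaps and N."""
    return sum(1 for base in aligned_seq.upper() if base in "ACGT")


toy = "ATGTGAGGATAAGCT"  # made-up sequence, for illustration only
print(stop_codon_positions(toy, frame=0))
print(clean_frames(toy))
print(length_coverage("AC-GT--NNACGT"))
```

A sequence with no clean reading frame would be a pseudogene candidate. The `length_coverage` helper could be compared against the reported minima of 508 bp (COI) and 298 bp (16S), although treating those values as filter thresholds rather than as observed minima is an assumption.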
BEAST v2.6.2 (Bouckaert et al., 2019) was used to infer individual Bayesian ultrametric phylogenies for the 16S and COI datasets. Each Bayesian analysis was run for 100,000,000 generations, sampling every 100,000 generations using the optimal evolutionary model, Yule speciation model, and an uncorrelated relaxed clock. Trace plots of Markov Chain Monte Carlo (MCMC) runs were visually inspected in Tracer v.1.7.1 to assess stationarity and appropriate burn-in. After the likelihood of the trees of each chain converged, the first 50,000,000 states were discarded as burn-in.
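The chain settings above fix how many posterior samples remain after burn-in. The following Python sketch works through that arithmetic. It assumes that state 0 and the final state are both logged and that burn-in removes every sample taken before state 50,000,000; these conventions differ between tools and are not specified in the text, and the function names are hypothetical.

```python
def sampled_states(chain_length, sample_every):
    """States at which a sample is logged, assuming state 0 and the
    final state are both included."""
    return list(range(0, chain_length + 1, sample_every))


def after_burnin(states, burnin_states):
    """Keep samples logged at or after the burn-in boundary."""
    return [s for s in states if s >= burnin_states]


CHAIN_LENGTH = 100_000_000
SAMPLE_EVERY = 100_000
BURNIN = 50_000_000

states = sampled_states(CHAIN_LENGTH, SAMPLE_EVERY)
kept = after_burnin(states, BURNIN)
burnin_fraction = BURNIN / CHAIN_LENGTH
print(len(states), len(kept), burnin_fraction)
```

Under these conventions each run logs 1001 samples, of which 501 are retained, corresponding to a 50% burn-in.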