Infrequent oceanic long-range dispersal and evolution of a top terrestrial arthropod predator in the sub-Antarctic
Data files
Apr 18, 2024 version files 549.40 KB
- 1_consensus_sequences.zip
- 2_alignment_concatenated.zip
- README.md
Abstract
The UNESCO World Heritage sub-Antarctic terrestrial ecosystems are unique. They have been isolated for over 30 million years by constant circum-polar currents and winds, and shaped by climatic cycles that surpass the tolerance limits of many species. Despite this recognition, surprisingly little is known about how these ecosystems acquired their native terrestrial fauna and how it changed over deep time scales. Here the patterns and timing of colonization and speciation in the largest and dominant arthropod predators in the Eastern sub-Antarctic, spiders of the genus Myro, are demonstrated for the first time. Our results indicate that this lineage originated from Australia before the Plio-Pleistocene glacial cycles and underwent an adaptive radiation on the Crozet archipelago. We discuss the gain and loss of pre-adaptations acting as a filter that enabled only one of four Myro species native to the Crozet islands to repeatedly disperse via the Antarctic circum-polar current, resulting in an outstanding distribution range of over 9000 kilometres. The results highlight the outstanding role of the volcanic Crozet archipelago for the evolution of arthropod life in the sub-Antarctic, and the potential of terrestrial macro-invertebrates to achieve rare but ecologically influential trans-oceanic dispersal events over thousands of kilometres under hostile conditions.
README: Infrequent oceanic long-range dispersal and evolution of a top terrestrial arthropod predator in the sub-Antarctic
Sequence data Myroinae
Description of the data and file structure
1_consensus_sequences.zip: Raw consensus sequences generated from Nanopore reads after demultiplexing by specimen index and marker.
2_alignment_concatenated.zip: Final concatenated alignment (ITS reduced) used for phylogenetic analyses. The zip archive contains the concatenated alignment (fasta file) and the partition (nexus file).
Methods
Collection and field observation. Fieldwork in Tasmania was done under the research Authority No. FA18285, for Maatsuyker Island FA19197, and for Macquarie Island FA15234 of the DPIPWE (Department of Primary Industries, Parks, Water and Environment, Tasmania). Fieldwork on Crozet and Kerguelen islands (French Southern Territories) was done under the project 136-SUBANTECO (French Polar Institute) with the collection permits A2020-89 (26/08/2020) and A-2021-115 (12/10/2021) from Terres Australes et Antarctiques Françaises. The collection sites are indicated on the maps in Fig. S3. Specimens from Marion Island were obtained under the permit MARION-E/MA2022-6 by DFFE (Department of Forestry, Fisheries and the Environment). Fieldwork in New Zealand was performed under the Research and Collection Authority 71225-RES by the DOC (Department of Conservation, New Zealand).
Specimens of Toxopidae were hand collected from webs and retreats or by turning stones and sifting litter. Myro spp. from Macquarie Island and Maatsuyker Island were retrieved from pitfall trapping with 100% propylene glycol as fixative. As the Heard Island archipelago is rarely visited, we examined and subsampled from a historical specimen of M. kerguelensis collected during an expedition in 1985 and deposited with the Tasmanian Agricultural Insect Collection.
Species were identified using taxonomic literature. We did not use sub-species names (e.g., M. kerguelensis crozetensis), as sub-species taxonomy is highly inconsistent in arachnology.
DNA sequencing approach. For molecular phylogenetics, long fragments of two mitochondrial genes (Cytochrome Oxidase I (COI) and Cytochrome B (CytB)) and the nuclear ribosomal cluster (spanning from the highly conserved 18S to the 28S gene, including the fast-evolving Internal Transcribed Spacer (ITS) regions) were sequenced, providing informativeness at both shallow and deeper phylogenetic levels [1]. These genes are widely used in barcoding, metabarcoding and molecular phylogenetics of spiders, and thus have the benefit of an abundance of legacy data that can be included in the analysis [2-4]. Our approach differed from previous phylogenetic studies on spiders using these amplicons by using new primers that produce long amplicons (Tab. S1) and a Nanopore sequencing approach to capture longer sections of these genes (~1,050 bp of COI, ~750 bp of CytB and ~3,700 bp of the nuclear ribosomal cluster instead of ~300 bp fractions) [1, 5].
DNA extraction. Genomic DNA was extracted from separated legs of specimens that were either stored in absolute ethanol at -70 °C (45 out of 52 specimens, collected between one and three years before extraction) or in 70% ethanol at room temperature (collected between two and ten years before DNA extraction) (S4). For the extraction, the Qiagen Puregene Cell kit was used following the manufacturer’s protocol, with the addition of Proteinase K (ThermoFisher, Waltham, MA) and GlycoBlue (ThermoFisher) to enhance DNA yield. DNA pellets were eluted in 25 µL of Qiagen DNA hydration solution.
Marker amplification. For the amplification of the markers, the primers listed in Table S2 with attached index sequences were used, such that each sample received a unique combination of indexes. The unique indexes were generated using the Barcode Generator (http://comailab.genomecenter.ucdavis.edu/index.php/Barcode_generator, accessed August 2020) with a minimum distance of 10 bases between each index and a length of 20 bp. PCR was performed using the Qiagen Multiplex kit for the two mitochondrial markers and the Qiagen UltraRun LongRange kit for the nuclear ribosomal marker (due to its length exceeding 3000 bp). Each PCR reaction contained 5 µL PCR master mix from the corresponding kit, 3 µL RNase-free water, 0.5 µL each of 10 µM forward and reverse primer and 1 µL of template DNA. Thermocycler protocols were chosen following the manufacturer’s recommendations for each kit, and were further adjusted following a preliminary round of PCR optimization. For the amplification of mitochondrial markers the following PCR settings were used: initial denaturing step at 95 °C for 15 min, 30 amplification cycles (denaturing at 94 °C for 30 s, annealing at 48 °C for 90 s, and extension at 72 °C for 90 s), final extension at 68 °C for 15 min. For the nuclear ribosomal marker the settings used were: initial denaturing at 93 °C for 3 min, 40 amplification cycles (denaturing at 93 °C for 30 s and annealing/extension at 68 °C for 2 min), final extension at 72 °C for 10 min. Indexed PCR products were checked on 1% agarose gels, then pooled in roughly equimolar amounts based on gel band intensity.
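The reaction composition above can be checked with a little arithmetic. The following is a minimal sketch that uses only the per-reaction volumes stated in the text. The function names and the 10% pipetting overage are illustrative assumptions and not part of the published protocol. Because each sample receives its own indexed primer pair, only the master mix and the water are treated as poolable here.

```python
# Per-reaction volumes in microlitres, taken from the text above.
PER_REACTION_UL = {
    "PCR master mix": 5.0,
    "RNase-free water": 3.0,
    "forward primer (10 uM)": 0.5,
    "reverse primer (10 uM)": 0.5,
    "template DNA": 1.0,
}

# Components that are identical across samples. Primers carry
# sample-specific indexes and template differs per sample, so they
# are added individually.
SHARED_COMPONENTS = ("PCR master mix", "RNase-free water")


def reaction_volume_ul(per_reaction=PER_REACTION_UL):
    """Total volume of one reaction in microlitres."""
    return sum(per_reaction.values())


def primer_final_conc_um(stock_um=10.0, primer_ul=0.5):
    """Final concentration (uM) of one primer after dilution in the reaction."""
    return stock_um * primer_ul / reaction_volume_ul()


def shared_mix_ul(n_reactions, overage=0.10):
    """Volumes of the shared components for n reactions.

    The overage (extra fraction to cover pipetting loss) is an assumed
    convenience value, not something stated in the protocol.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: PER_REACTION_UL[name] * factor for name in SHARED_COMPONENTS}


if __name__ == "__main__":
    print("reaction volume (uL):", reaction_volume_ul())
    print("final primer concentration (uM):", primer_final_conc_um())
    print("shared mix for 52 reactions:", shared_mix_ul(52))
```

With these volumes each reaction totals 10 µL, and each primer ends up at 0.5 µM.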
Library preparation and sequencing. Two pools were made – one for the mitochondrial and one for the nuclear marker – because, based on previous work, amplicons differing greatly in length were expected to perform differently in library prep. Pools were then cleaned of residual primers using 0.7X magnetic beads following the AMPure XP protocol (Beckman Coulter, Pasadena, CA). Nanopore library prep was performed separately for the mitochondrial and nuclear pools, using the SQK-LSK109 kit (Oxford Nanopore Technologies, Oxford, UK) following Pomerantz et al. [6], and using the Short Fragment Buffer (SFB) for the mitochondrial and the Long Fragment Buffer (LFB) for the nuclear library. Libraries were quantified using a Qubit Fluorometer with the high-sensitivity dsDNA assay (ThermoFisher) and then combined into a final pool containing roughly 20 fmol of the mitochondrial and 30 fmol of the nuclear library. The relatively larger amount of nuclear library was chosen because the shorter mitochondrial fragments were expected to outperform the longer nuclear fragments during sequencing. This final pool was sequenced on a MinION using an R9.4.1 flow cell (Oxford Nanopore Technologies) according to the manufacturer’s protocol.
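To relate the molar amounts in the final pool to the mass read off a fluorometer, one can convert between femtomoles and nanograms of double-stranded DNA. The sketch below assumes an average mass of 660 g/mol per base pair (a common approximation) and uses the approximate amplicon lengths given earlier, ignoring indexes and adapters. The 900 bp mean length for the mitochondrial pool is an illustrative assumption (the midpoint of ~1,050 bp and ~750 bp), not a value from the study.

```python
# Assumed average molar mass of one base pair of dsDNA, in g/mol.
AVG_BP_MASS = 660.0


def fmol_to_ng(fmol, length_bp, bp_mass=AVG_BP_MASS):
    """Mass in ng of `fmol` femtomoles of dsDNA fragments of `length_bp`."""
    # 1 fmol = 1e-15 mol and 1 g = 1e9 ng, hence the factor 1e-6.
    return fmol * length_bp * bp_mass / 1e6


def ng_to_fmol(ng, length_bp, bp_mass=AVG_BP_MASS):
    """Amount in fmol of `ng` nanograms of dsDNA fragments of `length_bp`."""
    return ng * 1e6 / (length_bp * bp_mass)


if __name__ == "__main__":
    # Nuclear library: about 30 fmol of ~3,700 bp amplicons.
    print("nuclear (ng):", fmol_to_ng(30, 3700))
    # Mitochondrial library: about 20 fmol, assumed mean length 900 bp.
    print("mitochondrial (ng):", fmol_to_ng(20, 900))
```

Under these assumptions, 30 fmol of the nuclear library corresponds to roughly 73 ng and 20 fmol of the mitochondrial library to roughly 12 ng, which illustrates why equal molar input requires a much larger mass of the longer fragments.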
Approach for historical specimen. Due to the age of the historical specimen of M. kerguelensis from Heard Island (collected 37 years before DNA extraction), with only small amounts of tissue (2 legs) available, we used a DNA extraction method more suitable for degraded DNA following the protocol outlined in Derkarabetian et al. [7], a method originally described in Tin et al. [8]. As the DNA was too fragmented for the long-read approach, short (200-300 bp) amplicons of COI and 18S gene fragments were amplified and sequenced on an Illumina platform following the protocol by Krehenwinkel et al. [3].
Sequence assembly. Sequences were processed following Pomerantz et al. [6]. In brief, samples were first demultiplexed by index combination using minibar [1]. They were then demultiplexed by marker (COI, CytB or ribosomal cluster) based on inner primer sequences, and consensus sequences were generated using NGSpeciesID [9]. In many cases, multiple consensus sequences were generated for a given sample due to coamplification of paralogs, NUMTs [10] or contaminants such as fungi. To identify the correct sequence for each sample, all sequences were checked by BLAST search [11] against the NCBI nucleotide database (accessed 10/2022), and erroneous sequences were removed from the data set.
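The idea behind demultiplexing by index combination can be illustrated with a toy example. This is a simplified sketch and not the algorithm implemented in minibar: it assumes that the forward index sits at the very start of the read, that the reverse complement of the reverse index sits at the very end, and that errors are substitutions only (Hamming distance). The function names, the mismatch threshold and the example indexes are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read, samples, max_mismatch=2):
    """Return the sample whose index pair matches the read ends, else None.

    `samples` maps a sample name to a (forward_index, reverse_index) pair.
    A read is assigned only if exactly one sample matches, so ambiguous
    reads are left unassigned.
    """
    hits = []
    for name, (fwd, rev) in samples.items():
        if len(read) < len(fwd) + len(rev):
            continue  # read too short to carry both indexes
        head = read[:len(fwd)]
        tail = read[-len(rev):]
        if (hamming(head, fwd) <= max_mismatch
                and hamming(tail, revcomp(rev)) <= max_mismatch):
            hits.append(name)
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    # Short made-up indexes for illustration; the study used 20 bp indexes.
    samples = {"A": ("AAAACCCC", "GGGGTTTA"), "B": ("TTTTGGGG", "ACACACAC")}
    read = "AAAACCCC" + "ACGTACGTAC" + revcomp("GGGGTTTA")
    print(assign_sample(read, samples))
```

Real Nanopore reads also contain insertions and deletions and can occur in either orientation, which is why dedicated tools use edit-distance searches rather than fixed-position comparisons.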
Legacy sequence data were added from Wheeler et al. [2] and from the Barcode of Life project [12] (fractions of COI and 18S genes for 17 specimens; Tab. S1).
Sequence alignment. For each marker, the sequences of all samples were aligned with ClustalW (gap opening penalty 15.00, gap extension penalty 6.66) in MEGA11 [13]. Alignments were visually inspected for obvious alignment errors, which were corrected by hand. The ITS region contains many homopolymers, which are often subject to sequencing errors, causing misalignments particularly in this region. In such cases, the excess sites were removed. Some hyper-variable sites in the ITS could only be poorly aligned, even between samples of the same species, and were removed. The final concatenated alignment of all markers contains 4487 bp with 997 parsimony-informative and 306 singleton sites.
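For readers who want to recompute site statistics from the alignment, the sketch below classifies alignment columns. It assumes the usual definitions: a site is parsimony-informative if at least two nucleotide states each occur in at least two sequences, and a singleton if at least two states are present but at most one of them occurs more than once. Gaps and ambiguity codes are ignored here, which is an assumption; counts from MEGA depend on its gap-handling settings and may differ.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def classify_site(column):
    """Classify one alignment column as constant, singleton or informative."""
    counts = Counter(c for c in column.upper() if c in VALID_BASES)
    if len(counts) < 2:
        return "constant"
    # Number of states that are shared by at least two sequences.
    repeated_states = sum(1 for n in counts.values() if n >= 2)
    if repeated_states >= 2:
        return "parsimony-informative"
    return "singleton"


def site_summary(sequences):
    """Tally site classes over a list of aligned, equal-length sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return Counter(classify_site("".join(col)) for col in zip(*sequences))


if __name__ == "__main__":
    toy_alignment = ["ACGTA", "ACGAA", "ATGAC", "ATG-A"]
    print(site_summary(toy_alignment))
```

In the toy alignment, the second column (C, C, T, T) is parsimony-informative, the fourth and fifth columns are singletons, and the remaining two are constant.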
References
[1] Krehenwinkel, H., Pomerantz, A., Henderson, J.B., Kennedy, S.R., Lim, J.Y., Swamy, V., Shoobridge, J.D., Graham, N., Patel, N.H. & Gillespie, R.G. 2019 Nanopore sequencing of long ribosomal DNA amplicons enables portable and simple biodiversity assessments with high phylogenetic resolution across broad taxonomic scale. GigaScience 8, giz006.
[2] Wheeler, W.C., Coddington, J.A., Crowley, L.M., Dimitrov, D., Goloboff, P.A., Griswold, C.E., Hormiga, G., Prendini, L., Ramírez, M.J. & Sierwald, P. 2017 The spider tree of life: phylogeny of Araneae based on target‐gene analyses from an extensive taxon sampling. Cladistics 33, 574-616.
[3] Krehenwinkel, H., Kennedy, S.R., Rueda, A., Lam, A. & Gillespie, R.G. 2018 Scaling up DNA barcoding–Primer sets for simple and cost efficient arthropod systematics by multiplex PCR and Illumina amplicon sequencing. Methods in Ecology and Evolution 9, 2181-2193.
[4] Azevedo, G.H., Bougie, T., Carboni, M., Hedin, M. & Ramírez, M.J. 2022 Combining genomic, phenotypic and Sanger sequencing data to elucidate the phylogeny of the two-clawed spiders (Dionycha). Mol Phylogenet Evol 166, 107327.
[5] Gajski, D., Wolff, J.O., Melcher, A., Weber, S., Pekár, S., Prost, S., Krehenwinkel, H. & Kennedy, S.R. 2023 A simple, informative and cost-effective PCR-based multiplex protocol for spider taxonomy and phylogeny using third-generation sequencing. Preprint.
[6] Pomerantz, A., Sahlin, K., Vasiljevic, N., Seah, A., Lim, M., Humble, E., Kennedy, S., Krehenwinkel, H., Winter, S. & Ogden, R. 2022 Rapid in situ identification of biological specimens via DNA amplicon sequencing using miniaturized laboratory equipment. Nature Protocols 17, 1415-1443.
[7] Derkarabetian, S., Benavides, L.R. & Giribet, G. 2019 Sequence capture phylogenomics of historical ethanol‐preserved museum specimens: Unlocking the rest of the vault. Molecular ecology resources 19, 1531-1544.
[8] Tin, M.M.-Y., Economo, E.P. & Mikheyev, A.S. 2014 Sequencing degraded DNA from non-destructively sampled museum specimens for RAD-tagging and low-coverage shotgun phylogenetics. Plos One 9, e96793.
[9] Sahlin, K., Lim, M.C. & Prost, S. 2021 NGSpeciesID: DNA barcode and amplicon consensus generation from long‐read sequencing data. Ecology and evolution 11, 1392-1398.
[10] Lopez, J.V., Yuhki, N., Masuda, R., Modi, W. & O'Brien, S.J. 1994 Numt, a recent transfer and tandem amplification of mitochondrial DNA to the nuclear genome of the domestic cat. J Mol Evol 39, 174-190.
[11] Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. 1990 Basic local alignment search tool. J Mol Biol 215, 403-410.
[12] Ratnasingham, S. & Hebert, P.D. 2007 BOLD: The Barcode of Life Data System (http://www.barcodinglife.org). Molecular ecology notes 7, 355-364.
[13] Tamura, K., Stecher, G. & Kumar, S. 2021 MEGA11: molecular evolutionary genetics analysis version 11. Mol Biol Evol 38, 3022-3027.
Usage notes
All files included in this submission can be viewed with text editors. For viewing and analysing the files, however, programmes that can handle sequence alignments, such as MEGA, are recommended.
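As an alternative to a graphical programme, the FASTA files can be read directly from the zip archives with a few lines of Python. This is a minimal sketch using only the standard library. The file extensions searched for inside the archive are an assumption, since the member file names are not listed here, and the function names are illustrative.

```python
import io
import zipfile


def read_fasta(handle):
    """Parse FASTA text from an open text handle into {header: sequence}."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Join wrapped sequence lines; duplicate headers would overwrite each other.
    return {key: "".join(parts) for key, parts in records.items()}


def read_fasta_from_zip(zip_path, suffixes=(".fasta", ".fas", ".fa")):
    """Read every FASTA-like member of a zip archive.

    Returns {member_name: {header: sequence}}. The suffixes are guesses
    and may need adjusting to the actual file names in the archive.
    """
    result = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(suffixes):
                with archive.open(member) as raw:
                    text = io.TextIOWrapper(raw, encoding="utf-8")
                    result[member] = read_fasta(text)
    return result


if __name__ == "__main__":
    alignments = read_fasta_from_zip("2_alignment_concatenated.zip")
    for member, records in alignments.items():
        print(member, len(records), "sequences")
```

The partition file in the same archive is plain text in NEXUS format and can be opened the same way, or passed unchanged to phylogenetic software that accepts NEXUS partition definitions.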