Skip to main content

Data and code for: The evolution of siphonophore tentilla for specialized prey capture in the open ocean

Cite this dataset

Damian-Serrano, Alejandro; Haddock, Steven; Dunn, Casey (2021). Data and code for: The evolution of siphonophore tentilla for specialized prey capture in the open ocean [Dataset]. Dryad.


Predator specialization has often been considered an evolutionary ‘dead-end’ due to the constraints associated with the evolution of morphological and functional optimizations throughout the organism. However, in some predators, these changes are localized in separate structures dedicated to prey capture. One of the most extreme cases of this modularity can be observed in siphonophores, a clade of pelagic colonial cnidarians that use tentilla (tentacle side branches armed with nematocysts) exclusively for prey capture. Here we study how siphonophore specialists and generalists evolve, and what morphological changes are associated with these transitions. To answer these questions, we: (1) measured 29 morphological characters of tentacles from 45 siphonophore species, (2) mapped these data to a phylogenetic tree, and (3) analyzed the evolutionary associations between morphological characters and prey type data from the literature. Instead of a dead-end, we found that siphonophore specialists can evolve into generalists, and that specialists on one prey type have directly evolved into specialists on other prey types. Our results show that siphonophore tentillum morphology has strong evolutionary associations with prey type, and suggest that shifts between prey types are linked to shifts in the morphology, mode of evolution, and genetic correlations of tentilla and their nematocysts. The evolutionary history of siphonophore specialization helps build a broader perspective on predatory niche diversification via morphological innovation and evolution. These findings contribute to understanding how specialization and morphological evolution have shaped present-day food webs.


*Tentillum morphology* – The morphological work was carried out on siphonophore specimens fixed in 4% formalin from the Yale Peabody Museum Invertebrate Zoology (YPM-IZ) collection (accession numbers in Dryad repository). These specimens were collected intact across many years of fieldwork expeditions, using blue-water diving [@haddock2005scientific], remotely operated vehicles (ROVs), plankton net trawls, and human-operated submersibles. Tentacles were dissected from non-larval gastrozooids, sequentially dehydrated into 100% ethanol, cleared in methyl salicylate, and mounted onto slides with Canada Balsam or Permount mounting media. The slides were imaged as tiled z-stacks using differential interference contrast (DIC) on an automated stage at YPM-IZ (with the assistance of Daniel Drew and Eric Lazo-Wasem) and with laser point confocal microscopy using a 488 nm Argon laser that excited autofluorescence in the tissues. Thirty characters (defined in S1) were measured using Fiji [@collins2007imagej;@schindelin2012fiji]. We did not measure the lengths of contractile structures (terminal filaments, pedicles, gastrozooids, and tentacles) since they are too variable to quantify. We measured at least one specimen for 96 different species (raw data available in Dryad). Of these, we selected 38 focal species across clades based on specimen availability and phylogenetic representation. Three to five tentacle specimens from each one of these selected species were measured to capture intraspecific variation. 

*Siphonophore phylogeny* – While the main goal of this work is not to elucidate a novel phylogeny for Siphonophora, we did expand on the most recent transcriptome based phylogeny [@munro2018improved] to accommodate a larger taxon sampling. In order to do this, we ran a constrained analysis on an extensive 18S+16S dataset. The phylogenetic analysis included 55 siphonophore species and 6 outgroup cnidarian species (*Clytia hemisphaerica*, *Hydra circumcincta*, *Ectopleura dumortieri*, *Porpita porpita*, *Velella velella*, *Staurocladia wellingtoni*). The gene sequences we used in this study are available online (accession numbers in Dryad repository). Some of the sequences we used were accessioned in [@dunn2005molecular], and others we extracted from the transcriptomes in [@munro2018improved]. Two new 16S sequences for *Frillagalma vityazi* (MK958598) and *Thermopalia* sp. (MK958599) sequenced by Lynne Christianson using the primers from [@cunningham1993molecular] (read 3' to 5' F: TCGACTGTTTACCAAAAACATAGC , R:  ACGGAATGAACTCAAATCATGTAAG) were included and accessioned to NCBI. Additional details on the phylogenetic inference methods can be found in the Supplementary Methods. 

Unconstrained ML and Bayesian phylogenies were congruent (S2,S5). Given the broader sequence sampling of the transcriptome phylogeny, we ran constrained inferences (using both ML and Bayesian approaches, which produced fully congruent topologies (S4,S6)) after clamping the 5 nodes (S3, blue circles in Fig. \@ref(figure4)) that were incongruent with the topology of the consensus tree in [@munro2018improved]. This topology was then used to inform a Bayesian relaxed molecular clock time-tree in RevBayes, using a birth-death process (sampling probability calculated from the known number of described siphonophore species) to generate ultrametric branch lengths (S7-8). Scripts and tree files available in the Dryad repository.

*Feeding ecology* – We extracted categorical diet data for different siphonophore species from published sources, including seminal papers [@biggs1977field;@purcell1981dietary;@purcell1984functions;@Mackie:1987uy;@pugh1988two;@purcell1984predation;@andersen1981redescription], and ROV observation data [@choy2017deep;@hissmann2005situ] with the assistance of Elizabeth Hetherington and C. Anela Choy (data available in Dryad repository). In order to detect coarse-level patterns in feeding habits, the data were merged into feeding guilds. For more details on how the diet data was curated and summarized into guilds, please see Supplementary Methods. 

We also extracted copepod prey length data from [@purcell1984functions]. To calculate specific prey selectivities, we extracted quantitative diet and zooplankton composition data from [@purcell1981dietary], matched each diet assessment to each prey field quantification by site, calculated Ivlev’s electivity indices [@jacobs1974quantitative], and averaged those by species (data available in Dryad repository).

*Statistical analyses* – We used a series of phylogenetic comparative methods to test the evolutionary hypotheses presented in this study. We reconstructed ancestral states using ML (R phytools::anc.ML [@revell2012phytools]), and stochastic character mapping (R phytools::make.simmap) for categorical characters. For more details on the data wrangling prior to these analyses, please see the Supplementary Methods. R scripts available in the Dryad repository.

In order to study the evolution of predatory specialization, we reconstructed components of the diet and prey selectivity on the phylogeny using ML (R phytools::anc.ML). To identify evolutionary associations of diet with tentillum and nematocyst characters, we compared the performance of a neutral evolution model to that of a diet-driven directional selection model. First, we collapsed the diet data into the five feeding guilds mentioned above (fish specialist, small crustacean specialist, large crustacean specialist, gelatinous specialist, generalist), based on which prey types they were observed consuming most frequently. Then, we reconstructed the feeding guild ancestral states using the ML function ace (package ape [@paradis2019package]), removing tips with no feeding data. The ML reconstruction was congruent with the consensus stochastic character mapping (S15). Then, using the package *OUwie* [@beaulieu2012ouwie], we fitted an OU model with multiple optima and rates of evolution (OUm) matched to the reconstructed ancestral diet regimes, a single optimum OU model, and a BM null model, inspired by the analyses in [@cressler2015detecting]. We then ranked the models in order of increasing parametric complexity (BM, OU, OUm), and compared the corrected Akaike Information Criterion (AICc) support scores [@sugiura1978further] to the lowest (best) score, using a cutoff of 2 units to determine significantly better support. When the best fitting model was not significantly better than a less complex alternative, we selected the least complex model (S9). In addition, we calculated and reported the model adequacy scores using the R package *arbutus* [@pennell2015model].

In order to study correlations between the rates of evolution between different characters, we fitted a set of evolutionary variance-covariance matrices [@revell2009phylogenetic] (R phytools::evol.vcv). For more details on the data wrangling preceding these analyses, please see Supplementary Methods. To test whether phenotypic integration changed across selective regimes determined by the reconstructed feeding guilds, we carried out character-pairwise variance-covariance analysis comparing alternative models (R phytools::evolvcv.lite), including those where correlations are the same across the whole tree and models where correlations differ between selective regimes (S19). Number of taxa used in each pairwise comparison is reported in S20. Finally, we compared regime-specific variance-covariance matrices to the general matrix and to their preceding regime matrix to identify the changes in character dependences unique to each regime (S21-22). 

We carried out a linear discriminant analysis of principal components (DAPC) using the dapc function (R adegenet::dapc) [@jombart2010discriminant]. This function allowed us to incorporate more predictors than individuals. We generated discriminant functions for feeding guild, and for the presence of copepods, fish, and shrimp (large crustaceans) in the diet (S10-13). From these DAPCs we obtained the highest contributing morphological characters to the discrimination (characters in the top quartile of the weighted sum of the linear discriminant loadings controlling for the eigenvalue of each discriminant). In order to identify the sign of the relationship between the predictor characters and prey type presence in the diet, we then generated generalized logistic regression models (as a type of generalized linear model, or GLM using R stats::glm) and phylogenetic generalized linear models (R phylolm::phyloglm) with the top contributing characters (from the corresponding DAPC) as predictors (S14). We also carried out these GLMs on the Ivlev’s selectivity indices for each prey type calculated from [@purcell1981dietary]. In addition, we ran a series of comparative analyses to address hypotheses of diet-tentillum relationships posed in the literature. Additional details on the DAPC optimization are available in the Supplementary Methods.

Supplementary Methods {-}

Phylogenetic inference: We aligned the sequences using MAFFT [@katoh2002mafft] (alignments available in Dryad). We inferred a Maximum Likelihood (ML) phylogeny (S2) from 16S and 18S ribosomal rRNA genes using IQTree [@nguyen2014iq] with 1000 bootstrap replicates (iqtree -s alignment.fa -nt AUTO -bb 1000). We used ModelFinder [@kalyaanamoorthy2017modelfinder] implemented in IQTree v1.5.5. to assess the relative model fit. ModelFinder selected GTR+R4 for having the lowest Bayesian Information Criterion score. Additionally, we inferred a Bayesian tree with each gene as an independent partition in RevBayes [@hohna2016revbayes] (S5), which was topologically congruent with the unconstrained ML tree. The *alpha* priors were selected to minimize prior load in site variation.

Diet data curation: We removed the gelatinous prey observations for *Praya dubia* eating a ctenophore and a hydromedusa, and for *Nanomia* sp. eating *Aegina* since we believe these are rare events that have a much larger probability of being detected by ROV methods than their usual prey, and it is not clear whether the medusae were attempting to prey upon the siphonophores. Personal observations on feeding (from SHDH, Anela Choy, and Philip Pugh) were also included for *Resomia ornicephala*, *Lychnagalma utricularia*, *Bargmannia amoena*, *Erenna richardi*, *Erenna laciniata*, *Erenna sirena*, and *Apolemia rubriversa*. The feeding guilds declared in this study are: small-crustacean specialist (feeding mainly on copepods and ostracods), large crustacean specialist (feeding on large decapods, mysids, or krill), fish specialist (feeding mainly on actinopterygian larvae, juveniles, or adults), gelatinous specialist (feeding mainly on other siphonophores, medusae, ctenophores, salps, and/or doliolids), and generalist (feeding on a combination of the aforementioned taxa, without favoring any one prey group). These were selected to minimize the number of categories while keeping the most different types of prey separate. The gut content observations on *Forskalia* sp. were synonymized to an arbitrary *Forskalia* species on the tree (*F. tholoides*) for comparative analyses.

Data wrangling for comparative analyses: For comparative analyses, we removed species present in the tree but not represented in the morphology data, and *vice versa*. Although we measured specimens labeled as *Nanomia bijuga* and *Nanomia cara*, we are not confident in some of the species-level identifications, and some specimens were missing diagnostic zooids. Thus, we decided to collapse these into a single taxonomic concept (*Nanomia* sp.). All *Nanomia* sp. observations were matched to the phylogenetic position of *Nanomia bijuga* in the tree. We carried out all phylogenetic comparative statistical analyses in the programming environment R [@team2017r], using the Bayesian ultrametric species tree (S8), and incorporating intraspecific variation estimated from the specimen data as standard error whenever the analysis tool allowed it. R scripts and summarized species-collapsed data available in the Dryad repository. For each character (or character pair) analyzed, we removed species with missing data and reported the number of taxa included. We tested each character for normality using the Shapiro-Wilk test [@shapiro1965analysis], and log-transformed those that were non-normal.

Data wrangling for the variance-covariance analyses: When fitting all variance-covariance terms simultaneously (S16-18), we selected the largest set of characters that would allow the analysis to run without computational singularities. This excluded many of the morphometric characters which are linearly dependent on other characters. Since the functions do not tolerate missing data, we ran the analyses in two ways: One including all taxa but transforming absent states to zeroes, and another removing the taxa with absent states. These analyses could only be carried out on the subset of taxa for which diet data is available, and only among character pairs that are not computationally singular for that taxonomic subset. Gelatinous specialist correlations could only be estimated for a small subset of characters present in *Apolemia* (S21F, S22E, S23D) and should be interpreted with care.

Comparative tools used to test character associations: To test for correlated evolution among binary characters, we used Pagel’s test [@pagel1994detecting]. To characterize and evaluate the relationship between continuous characters, we used phylogenetic generalized least squares regressions (PGLS) [@grafen1989phylogenetic]. To compare the evolution of continuous characters with categorical aspects of the diet, we carried out a phylogenetic logistic regression (R nlme::gls using the 'corBrownian' function for the argument ‘correlation’).

DAPC optimization: Some taxa have inapplicable states for certain absent characters (such as the length of a nematocyst subtype that is not present in a species), which are problematic for DAPC analyses. We tackled this by transforming the absent states to zeroes. This approach allows us to incorporate all the data, but creates an attraction bias between small character states (*e.g.* small tentilla) and absent states (*e.g.* no tentilla). Absent characters are likely to be very biologically relevant to prey capture and we believe they should be accounted for in a predictive approach. We limited the number of linear discriminant functions retained to the number of groupings in each case. We selected the number of principal components retained using the a-score optimization function (R adegenet::optim.a.score) [@jombart2010discriminant] with 100 iterations, which yielded more stable results than the cross validation function (R adegenet::xval). This optimization aims to find the compromise value with highest discrimination power with the least overfitting. The discriminant analysis for feeding guild (7 principal components, 4 discriminants) produced 100% discrimination, and the highest loading contributions were found for the characters (ordered from highest to lowest): Involucrum length, heteroneme volume, heteroneme number, total heteroneme volume, tentacle width, heteroneme length, total nematocyst volume, and heteroneme width (S10).

Usage notes -- Alignment used in concatenated ML phylogenetic inference -- Alignment used in Bayesian partitioned phylogenetic inference
Purcell1981_selectivity.csv -- Selectivity values calculated from the raw diet and co-located plankton tows in Purcell J.E. (1981) Dietary composition and diel feeding patterns of epipelagic siphonophores. Marine Biology.
raw_categorical_data.csv  -- Categorical character presence/absence data. -- Alignment used in Bayesian partitioned phylogenetic inference
literature_diet_data.tsv -- Dietary data derived from the literature including bibliographical source, prey type, sampling method, location, and sample size.
R_Code.R -- Code used to produce all results reported in this article.
PGLS_Pval_preylength.csv -- P-values from the pairwise PGLS analyses between copepod prey length data (from Purcell1984) and continuous characters.

PGLS_SIGN_preylength.csv -- Sign of the relationship of the PGLS analyses between copepod prey length data (from Purcell1984) and continuous characters.
Purcell1984_preylength.csv -- Copepod prey length data from Purcell, J. E. (1984). The functions of nematocysts in prey capture by epipelagic siphonophores (Coelenterata, Hydrozoa). The Biological Bulletin166(2), 310-327. Used to test correlations with heteroneme nematocyst size.
raw_ROV_data.csv -- Dietary data derived from ROV observations published in Choy, C. A., Haddock, S. H., & Robison, B. H. (2017). Deep pelagic food web structure as revealed by in situ feeding observations. Proceedings of the Royal Society B: Biological Sciences284(1868), 20172116

IQtrees.pdf  -- ML contained and unconstrained trees produced by IQtree, plus the constrain applied
RevBayestimetrees.pdf -- Ultrametric time trees produced by RevBayes
RevBayestrees.pdf -- Topology-estimating constrained and unconstrained trees produced by RevBayes
RevBayes_topology_scripts.pdf -- Scripts used in RevBayes for topology inferences (contained & unconstrained) and for time-tree branch-length inference (constrained and unconstrained)
species_mean_data.pdf  -- Continuous character data organized by species means and standard deviations.
character_definitions.pdf  -- Formal definitions of the characters used.
accession_numbers.pdf  -- Accession numbers for the 18S and 16S gene data used in phylogenetic inference.


National Science Foundation, Award: NSF-OCE 1829835

Fulbright España

Fulbright España