Skip to main content

Sexual signals persist over deep time: ancient co-option of bioluminescence for courtship displays in cypridinid ostracods

Cite this dataset

Ellis, Emily et al. (2022). Sexual signals persist over deep time: ancient co-option of bioluminescence for courtship displays in cypridinid ostracods [Dataset]. Dryad.


Although the diversity, beauty, and intricacy of sexually selected courtship displays command the attention of evolutionists, the longevity of these traits in deep time is poorly understood. Population-based theory suggests sexual selection could either lower or raise extinction risk, resulting in high or low persistence of lineages with sexually selected traits. Furthermore, empirical studies that directly estimate longevity of sexually selected traits are uncommon. Sexually selected signals - including bioluminescent courtship - originated multiple times during evolution, allowing empirical study of their longevity after careful phylogenetic and divergence time analyses. Here, we estimate the first transcriptome-based molecular phylogeny and divergence times of Cypridinidae. We report extreme longevity of bioluminescent courtship, a trait important in mate choice and probably under sexual selection. Our relaxed-clock estimates of divergence times coupled with stochastic character mapping show luminous courtship evolved only once in Cypridinidae in a subtribe we name Luxorina at least 151 million years ago (mya) from cypridinid ancestors that used bioluminescence only in anti-predator displays, defining a tribe we name Luminini. This time-calibrated molecular phylogeny of cypridinids will serve as a foundation for integrative and comparative studies on the biochemistry, molecular evolution, courtship, diversification, and ecology of cypridinid bioluminescence. The persistence of luminous courtship for hundreds of millions of years indicates that rates of speciation within the group exceeded extinction risk, allowing for the persistence of a diverse clade of signalling species and that sexual selection did not cause rapid loss of associated traits.


Sampling Strategy and Collection

Based on previous morphological phylogenies and known distributions of species, we aimed to balance cost-efficiency and diversity of sampling by choosing eight primary sampling locations. We targeted species with luminous courtship mainly from five localities: Jamaica, Honduras (Roatan), Belize, Panama, and Puerto Rico. We targeted species outside the courtship signaling clade from three localities: Australia, Japan, and the United States. We collected ostracods via net sampling during courtship displays, sediment sampling, or carrion traps (see Cohen and Oakley (2017) for sampling methods and Table S1 for taxa sampled). We preserved samples in 95% ethanol for vouchers and RNAlater for sequencing. To collect taxa with luminous courtship (where many species are undescribed), we targeted unique courtship displays to isolate individual species because ecologically overlapping courtship signals are distinct enough to distinguish species (Gerrish and Morin 2016). We also measured length and height of carapaces to assign species to genera in the field, prior to molecular analysis, and to affirm single-species identity of pooled individuals (Table S2). The ratio of carapace length to height is usually diagnostic for genera of signaling cypridinids and distinguishes sympatric species in combination with signals (Gerrish and Morin 2016; Hensley et al. 2019; Reda et al. 2019).

Taxon Sampling and Sequencing

We generated new RNA-Seq data from 45 species of cypridinid ostracods and four species of non-cypridinid ostracods. We included outgroups from all four non-cypridinid families of Myodocopida, sequencing one species each of Rutidermatidae and Sarsiellidae, and two of Philomedidae. We also used published sequence data from a cylindroleberid (SRX2085850) (Schwentner et al. 2018), and predicted proteins from the genome of Darwinula stevensoni (ENA PRJEB38362) (Schön et al. 2021). For transcriptome sequencing, we combined RNA from whole bodies of up to 10 pooled individuals (collected by signal but with identity corroborated by morphological features). Pooling is necessary due to the small size of the animals (≤ 2 mm) and low RNA yield. For some samples, we extracted RNA using Trizol, and for the rest, we used Qiagen RNEasy Kits. We either used the Illumina or NEBNext Ultra RNA Library Prep Kits to prepare transcriptome libraries (Table S1).

Quality Control and Dataset Assembly

We trimmed RNA-seq adapters and low-quality bases (<20) using TrimGalore v0.4.1 (Krueger 2012), then assembled trimmed reads using Trinity v2.2.0 (Grabherr et al. 2011). In cases of multiple transcriptomes from the same species, we combined raw, trimmed reads to create single assemblies. We used CroCo v0.1 (Simion et al. 2018) to remove contigs in a given assembly both found and expressed at a higher level in a transcriptome from another species (that was sequenced on the same lane), which implies cross-contamination. We assessed completeness using BUSCO v3.0.1 (Simão et al. 2015) using the arthropod_odb9 single-copy ortholog database. We generally required transcriptomes to have at least 100 complete BUSCO genes for further analyses. However, we made exceptions for three species, (Photeros jamescasei, Jimmorinia gunnari, and Melavargula japonica), to maintain these described and taxonomically important species within the analyses. We produced protein predictions using TransDecoder v3.0.1 (Haas et al. 2013), with default parameters.

Orthology Determination

For analyses that relied on comparing orthologs (excluding paralogs), we used Orthofinder (Emms and Kelly 2019) plus phylopypruner. We grouped genes with Orthofinder 2.5.2 (Emms and Kelly 2019), using an MCL inflation parameter set to 2.1 and used Diamond as the similarity search program. This allowed us to cluster sequences based on similarity into “orthogroups” that contain a mix of orthologs and paralogs. Next, we created gene trees for the separate orthogroups by aligning each using the ‘auto’ method of MAFFT v7.305 (Katoh and Standley 2013) and using maximum likelihood implemented in FastTree v2.1.9 (Price et al. 2010), chosen for its speed and displayed in final results because are congruent with concatenation methods using IQ-TREE2 (Minh et al. 2020), and assuming a JTT+CAT. We used phylopypruner 1.2.3 (Thalén 2018) to remove paralogs, requiring a minimum of 20 species for each gene family, minimum support values of 0.7, and used the LS pruning method. We refer to this resulting set of orthologs as the “ppp orthologs”.

Previously Published Datasets

We also added single-gene mitochondrial data (12S, 16S, CO1), summarized in Table S5 reported previously in conference proceedings (Torres and Gonzalez 2007) or publications (Ogoh and Ohmiya 2004; Wakayama and Abe 2006; Wang et al. 2019; Goodheart et al. 2020; Pham et al. 2020). We then identified these three mitochondrial genes from transcriptomes by finding the longest contigs with high similarity to published complete mitochondrial genomes of V. hilgendorfii and V. tsujii (Ogoh and Ohmiya 2004; Goodheart et al. 2020). We annotated mitochondrial data from transcriptomes using MITOS (Bernt et al. 2013), then aligned each mitochondrial gene using ‘auto’ in MAFFT v7.305 (Katoh and Standley 2013).

Phylogenetic Reconstruction

We compared coalescence and concatenation-based analyses using Maximum Likelihood and Bayesian approaches. For coalescence-based approaches, we used ASTRAL-Pro (Zhang et al. 2020), which assumes the multi-species coalescent to estimate a species tree from multiple gene trees, including paralogs. We used the same gene trees as input for ASTRAL-Pro as phylopypruner (described above). Next, we determined the best-fit partitioning scheme for each concatenated dataset using PartitionFinder v.2.1.1 (Lanfear et al. 2016) with RAxML (Stamatakis 2014) and rcluster (Lanfear et al. 2014). Using optimal partitioning schemes and models, we produced species trees in IQ-TREE v2.0.3, with additional options for -bnni and -alrt (set to 1000). We calculated gene Concordance Factors (gCF) using IQ-TREE (Minh et al. 2020).

Divergence time estimation

We estimated divergence times of nodes in the cypridinid phylogeny using a Bayesian relaxed molecular clock approach in MrBayes (Ronquist and Huelsenbeck 2003), which offers a diversity of models, estimates tree topology and branch lengths in a Bayesian framework, and has a scriptable command line interface, leading to more repeatable science. Because Bayesian methods are computationally demanding (Smith et al. 2018), we chose 15 orthologs to use for divergence time estimation, as follows. We used sortadate (Smith et al. 2018) to rank all gene trees based on their consistency with the species tree inferred from all genes in ASTRAL-Pro and based on the most consistent root-to-tip branch lengths. To test the sensitivity of divergence time estimates to the number of genes selected, we compared results from 15 gene trees to age estimates from 20, 25, and 30 gene trees (Fig. S10). Although ostracods have a dense fossil record, most fossil ostracods are podocopids, and assigning fossils confidently to taxonomic groups is often challenging (Tinn and Oakley 2008), restricting us to two fossil constraints. The first was Myodocopida, for which we used an offset exponential prior with a minimum age of 448.8 Ma and a maximum of 509 Ma (Oakley et al. 2013; Wolfe et al. 2016). We also used a uniform prior between 443.8 and 509 Ma on the root of the tree, which is the common ancestor of Ostracoda. Along with node constraints, we used the Independent Gamma Rate (IGR) model (Lepage et al. 2007; Zhang 2016). We used a prior of exp(10) for variation in the IGR model. For the prior on the overall rate of the molecular clock (substitution rate/site/Myr), we assumed a lognormal distribution with a mean of 0.001 and standard deviation of 0.0007. For the model of amino acid substitution, we assumed fixed-rate models and used the command aamodelpr=mixed, which explores multiple fixed-rate models in proportion to their posterior probabilities. With these models and priors, we ran 500K steps of Markov Chain Monte Carlo (MCMC) in two chains, using a starting phylogeny based on parsimony.

Ancestral state reconstructions

We reconstructed ancestral states for bioluminescence (presence/absence) and bioluminescent courtship (presence/absence) using our tree generated with ASTRAL-Pro. In order to create an ultrametric tree for ancestral state reconstruction analysis, we used mean ages for nodes matched to those from our Bayesian relaxed clock analysis. We set matching branch lengths using the ph_bladj function in the R package phylocomr (Webb et al. 2008; Ooms and Chamberlain 2019). We assessed fit for two models (for each character) using corrected Akaike Information Criterion (AICc), where: (i) transition rates were equal (ER), and (ii) forward and reverse transitions were different between states (all rates different, ARD). The ER model (AICc = 24.02231) was slightly better than the ARD model (AICc = 25.72293) for bioluminescence and for bioluminescent courtship (ER AICc = 23.799; ARD AICc = 24.70163), so we used ER for final ancestral state reconstruction analyses in R using the rayDISC function in corHMM (Beaulieu et al. 2013). To test for significance of the reconstruction at each node, we used proportional likelihood significance tests assuming a log-likelihood difference of 2 or greater represents a significant difference (Pagel 1999).

To estimate timing of cypridinid bioluminescence and luminous courtship, we conducted time-tree stochastic character mapping (ttscm) (Alexandrou et al. 2013), which simulates character evolution with stochastic character mapping (Huelsenbeck et al. 2003) along time-calibrated phylogenies from the MCMC search to represent the posterior distribution of trees. While ancestral state reconstructions depict character state changes only at nodes, stochastic character maps allow state changes along branches. When branches are time-calibrated, this approach provides absolute estimates of the timing of character state changes. We used stochastic character mapping in phytools (Revell 2012) using the same character states and models as described for ancestral state reconstruction. We used 100 time-calibrated trees from the MCMC search to represent the posterior probability distribution of tree topologies and branch lengths and summarized the timing character state changes in histograms.


  • Alexandrou M.A., Swartz B.A., Matzke N.J., Oakley T.H. 2013. Genome duplication and multiple evolutionary origins of complex migratory behavior in Salmonidae. Mol. Phylogenet. Evol. 69:514–523.
  • Beaulieu J.M., O’Meara B.C., Donoghue M.J. 2013. Identifying hidden rate changes in the evolution of a binary morphological character: the evolution of plant habit in campanulid angiosperms. Syst. Biol. 62:725–737.
  • Bernt M., Donath A., Jühling F., Externbrink F., Florentz C., Fritzsch G., Pütz J., Middendorf M., Stadler P.F. 2013. MITOS: improved de novo metazoan mitochondrial genome annotation. Mol. Phylogenet. Evol. 69:313–319.
  • Cohen A.C., Oakley T.H. 2017. Collecting and processing marine ostracods. J. Crustacean Biol.
  • Emms D.M., Kelly S. 2019. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20:238.
  • Gerrish G.A., Morin J.G. 2016. Living in sympatry via differentiation in time, space and display characters of courtship behaviors of bioluminescent marine ostracods. Mar. Biol. 163:190.
  • Goodheart J.A., Minsky G., Brynjegard-Bialik M.N., Drummond M.S., Munoz J.D., Fallon T.R., Schultz D.T., Weng J.-K., Torres E., Oakley T.H. 2020. Laboratory culture of the California Sea Firefly Vargula tsujii (Ostracoda: Cypridinidae): Developing a model system for the evolution of marine bioluminescence. Sci. Rep. 10:10443.
  • Grabherr M.G., Haas B.J., Yassour M., Levin J.Z., Thompson D.A., Amit I., Adiconis X., Fan L., Raychowdhury R., Zeng Q., Chen Z., Mauceli E., Hacohen N., Gnirke A., Rhind N., di Palma F., Birren B.W., Nusbaum C., Lindblad-Toh K., Friedman N., Regev A. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29:644–652.
  • Haas B.J., Papanicolaou A., Yassour M., Grabherr M., Blood P.D., Bowden J., Couger M.B., Eccles D., Li B., Lieber M., MacManes M.D., Ott M., Orvis J., Pochet N., Strozzi F., Weeks N., Westerman R., William T., Dewey C.N., Henschel R., LeDuc R.D., Friedman N., Regev A. 2013. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8:1494–1512.
  • Hensley N.M., Ellis E.A., Gerrish G.A., Torres E., Frawley J.P., Oakley T.H., Rivers T.J. 2019. Phenotypic evolution shaped by current enzyme function in the bioluminescent courtship signals of sea fireflies. Proc. Biol. Sci. 286:20182621.
  • Huelsenbeck J.P., Nielsen R., Bollback J.P. 2003. Stochastic mapping of morphological characters. Syst. Biol. 52:131–158.
  • Katoh K., Standley D.M. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30:772–780.
  • Krueger F. 2012. Trim Galore: a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisufite-Seq) libraries. URL http://www. bioinformatics. babraham. ac. uk/projects/trim_galore/. (Date of access: 28/04/2016).
  • Lanfear R., Calcott B., Kainer D., Mayer C., Stamatakis A. 2014. Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evol. Biol. 14:82.
  • Lanfear R., Frandsen P.B., Wright A.M., Senfeld T., Calcott B. 2016. PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses. Molecular Biology and Evolution. 34:772–773.
  • Lepage T., Bryant D., Philippe H., Lartillot N. 2007. A general comparison of relaxed molecular clock models. Mol. Biol. Evol. 24:2669–2680.
  • Minh B.Q., Hahn M.W., Lanfear R. 2020. New Methods to Calculate Concordance Factors for Phylogenomic Datasets. Mol. Biol. Evol. 37:2727–2733.
  • Oakley T.H., Wolfe J.M., Lindgren A.R., Zaharoff A.K. 2013. Phylotranscriptomics to bring the understudied into the fold: monophyletic ostracoda, fossil placement, and pancrustacean phylogeny. Mol. Biol. Evol. 30:215–233.
  • Ogoh K., Ohmiya Y. 2004. Complete mitochondrial DNA sequence of the sea-firefly, Vargula hilgendorfii (Crustacea, Ostracoda) with duplicate control regions. Gene. 327:131–139.
  • Ooms J., Chamberlain S. 2019. phylocomr: Interface to “Phylocom.” Comprehensive R Archive Network (CRAN).
  • Pagel M. 1999. The Maximum Likelihood Approach to Reconstructing Ancestral Character States of Discrete Characters on Phylogenies. Systematic Biology. 48:612–622.
  • Pham H.T.M., Tanaka H., Karanovic I. 2020. Molecular and Morphological Diversity of Heterodesmus Brady and Its Phylogenetic Position within Cypridinidae (Ostracoda). Zoological Science. 37:240.
  • Price M.N., Dehal P.S., Arkin A.P. 2010. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 5:e9490.
  • Reda N.J., Morin J.G., Torres E., Cohen A.C., Schawaroch V., Gerrish G.A. 2019. Maristella, a new bioluminescent ostracod genus in the Myodocopida (Cypridinidae). Zool. J. Linn. Soc.
  • Revell L.J. 2012. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3:217–223.
  • Ronquist F., Huelsenbeck J.P. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19:1572–1574.
  • Schön I., Rodriguez F., Dunn M., Martens K., Shribak M., Arkhipova I.R. 2021. A Survey of Transposon Landscapes in the Putative Ancient Asexual Ostracod Darwinula stevensoni. Genes. 12.
  • Schwentner M., Richter S., Rogers D.C., Giribet G. 2018. Tetraconatan phylogeny with special focus on Malacostraca and Branchiopoda: highlighting the strength of taxon-specific matrices in phylogenomics. Proc. Biol. Sci. 285.
  • Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31:3210–3212.
  • Simion P., Belkhir K., François C., Veyssier J., Rink J.C., Manuel M., Philippe H., Telford M.J. 2018. A software tool “CroCo” detects pervasive cross-species contamination in next generation sequencing data. BMC Biol. 16:28.
  • Smith S.A., Brown J.W., Walker J.F. 2018. So many genes, so little time: A practical approach to divergence-time estimation in the genomic era. PLoS One. 13:e0197433.
  • Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30:1312–1313.
  • Thalén F. 2018. PhyloPyPruner: Tree-based Orthology Inference for Phylogenomics with New Methods for Identifying and Excluding Contamination. .
  • Tinn O., Oakley T.H. 2008. Erratic rates of molecular evolution and incongruence of fossil and molecular divergence time estimates in Ostracoda (Crustacea). Mol. Phylogenet. Evol. 48:157–167.
  • Torres E., Gonzalez V.L. 2007. Molecular Phylogeny of Cypridinid Ostracodes and the Evolution of Bioluminescence. Proceedings of the 14th International Symposium on Bioluminescence and Chemiluminescence: Chemistry, Biology, and Applications.:269–272.
  • Wakayama N., Abe K. 2006. The evolutionary pathway of light emission in myodocopid Ostracoda. Biol. J. Linn. Soc. Lond. 87:449–455.
  • Wang X., Xu Q., Xiao J., Miao X., Liu P., Wang Z. 2019. First record of the complete mitochondrial genome of Cypridina dentata (Myodocopida: Cypridinidae). Mitochondrial DNA Part B. 4:1607–1608.
  • Webb C.O., Ackerly D.D., Kembel S.W. 2008. Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics. 24:2098–2100.
  • Wolfe J.M., Daley A.C., Legg D.A., Edgecombe G.D. 2016. Fossil calibrations for the arthropod Tree of Life. Earth-Sci. Rev.
  • Zhang C. 2016. Molecular Clock Dating using MrBayes. arXiv [q-bio.PE].
  • Zhang C., Scornavacca C., Molloy E.K., Mirarab S. 2020. ASTRAL-Pro: Quartet-Based Species-Tree Inference despite Paralogy. Mol. Biol. Evol. 37:3292–3307.

Usage notes

List of analysis directories included *.zip files with reference to which figures these analyses were used to generate in the manuscript and in the supplement (Supplement - Revision1 - Aug_25.pdf).

Transcriptome analyses:

  • Transcriptome assemblies (CDS and AA) -
  • ASTRAL-Pro with all orthofinder orthologs (Fig 1, S1, S2) -
  • IQ-TREE concatenated orthofinder/phylopypruner (Fig S3) and concordance analysis (Fig S4) -

Time Tree analyses:

  • Sortadate and Bayesian relaxed molecular clock analysis (Fig S5, S6, S7, S8) -
  • Time tree stochastic character mapping (Fig 2b,c) -

Combined transcriptome and mitochondrial phylogeny:

  • Combined mt and transcriptome (Figs 3, S9, S10) and mt only (S11) -
  • Ancestral State reconstructions (Figs S12, S13) -


National Science Foundation, Award: DEB-1457754

National Science Foundation, Award: DEB-1146337

National Science Foundation, Award: DEB-1457462

National Science Foundation, Award: DEB-1457439

National Science Foundation, Award: BIO PRFB-1711201

National Science Foundation, Award: BIO PRFB-1515576

National Science Foundation, Award: BIO PRFB-1702011