Pervasive convergent evolution of sperm conjugation across the Arthropoda tree of life
Data files
May 12, 2026 version files 886.50 MB
- README.md (10.42 KB)
- Supplementary_Data_1.xlsx (135.60 KB)
- Supplementary_Data_10.trees (609.97 MB)
- Supplementary_Data_2.xlsx (48.37 KB)
- Supplementary_Data_3.tre (680.87 KB)
- Supplementary_Data_4.txt (8.89 KB)
- Supplementary_Data_5.txt (15.29 KB)
- Supplementary_Data_6.xlsx (1.03 MB)
- Supplementary_Data_7.nex (264.99 MB)
- Supplementary_Data_8.xml (855.83 KB)
- Supplementary_Data_9.log (8.76 MB)
Abstract
Theory suggests that ejaculates should evolve heightened functionality through integration of their component parts: spermatozoa, seminal fluid, and ejaculate structures. Here, we exhaustively review the vast literature on sperm ultrastructure in arthropods to examine sperm conjugation - a form of social cooperation among sperm - in addition to its relationship with an ostensible antecedent: membrane-bound, extracellular sperm-associated material (SAM). Our reconstructions suggest that sperm conjugation first arose on the branch leading to insects and remipedes during the Cambrian or Ordovician Periods (i.e., 452.6 – 508.5 million years ago). Since then, arthropods and related ecdysozoans have spent an estimated two-thirds of their time with sperm conjugation, which has been evolutionarily lost and gained approximately 45 times each. We show that most evolutionary derivations of conjugation occurred following the origin of SAM, with SAM in "proto-conjugates" facilitating subsequent diversification of multiple conjugate types. Finally, comparative analyses of proteomes of the convergently-derived spermatostyles (specialized rods of SAM to which sperm attach) of true bugs and beetles indicate parallel utilization of a common genetic toolkit that may draw upon deep homology for the evolution of complex ejaculate adaptations. Our analyses reveal an ancient, pervasive and dynamic history of evolutionary experimentation with ejaculate form and function.
Dataset DOI: 10.5061/dryad.79cnp5jbm
Description of the data and file structure
Data on sperm conjugation presence/absence, sperm-associated material (SAM) presence/absence, and multistate conjugate type were mined from the literature for 621 species of arthropods and related outgroups. The phylogeny was obtained using a custom supertree approach, and divergence times were estimated using BEAST version 2.6.3 with several fossil calibration priors. The resulting ultrametric tree was paired with the phenotypic data to reconstruct sperm conjugate diversification using phylogenetic software in R and the stand-alone software BayesTraits. Ancestral state reconstructions revealed that spermatostyles, a unique ejaculate structure, have evolved independently in true bugs and beetles, which motivated our pursuit of comparative proteomics. Briefly, we obtained de novo proteomes for four species of true bugs using bottom-up tandem mass spectrometry, with species-specific de novo protein annotations generated using Trinity and Trinotate.
Files and variables
File: Supplementary_Data_1.xlsx
Description: Sperm phenotypic data for 621 tip species across three Excel sheets. Sheet 1 contains the sperm phenotypic data variables described in this README. Sheet 2 contains brief column descriptions for the data in Sheet 1. Sheet 3 contains the full citations for all research studies underpinning the phenotypes listed in Sheet 1. Note that entries in Sheet 1 that include a question mark ('?') refer to character states that are partially ambiguous; brief comments are provided in the 'Comments' column of Sheet 1.
Variables
- Tip_label: The tip label, typically the species name of the species sampled for sperm, but it may be the name of a congener or near relative
- Phylum: Name of phylum. We sampled 615 arthropods and 6 Ecdysozoa outgroup species.
- Subphylum: Subphylum name. Most of our sampling is within the four subphyla of arthropods (see Brusca et al. 2023 and WoRMS (Ahyong et al. 2026) for more details).
- Class: Class name (see Brusca et al. 2023 and WoRMS (Ahyong et al. 2026) for more details).
- Order: Order name (see Brusca et al. 2023 and WoRMS (Ahyong et al. 2026) for more details).
- Family: Family name (see Brusca et al. 2023 and WoRMS (Ahyong et al. 2026) for more details).
- Genus: Genus name (see Brusca et al. 2023 and WoRMS (Ahyong et al. 2026) for more details).
- Species: Species name (see Brusca et al. 2023 and WoRMS (Ahyong et al. 2026) for more details).
- Conjugation: Conjugation present/absent (0 absent, 1 present)
- SAM: SAM present/absent (0 absent, 1 present)
- ConjugationANDSAM: Combination of the binary Conjugation and SAM variables, collapsed into three states because conjugation without SAM was not observed (0 unconjugated sperm without SAM, 1 unconjugated sperm with SAM, 2 conjugated sperm [with SAM])
- ConjugationMulti: Conjugation presence and type. Unconjugated species are coded as state 1. When present, conjugation occurs in five discrete types (states 2-6: 2 paired, 3 ensheathed, 4 aggregate, 5 rouleaux, 6 spermatostyle)
- SpermDataComments: Additional aspects of sperm morphology
- TractRegion: Region(s) of male reproductive tract isolated for sperm morphology in original article
- SpermRef: Original reference
- Comments: Notes from the literature survey
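The state codings above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function and constant names are hypothetical, and the assumption that ambiguous entries are recognizable by a literal '?' anywhere in the cell (for example `1?`) is ours.

```python
# State labels for ConjugationMulti, taken from the variable description above.
CONJUGATE_TYPES = {
    1: "unconjugated",
    2: "paired",
    3: "ensheathed",
    4: "aggregate",
    5: "rouleaux",
    6: "spermatostyle",
}


def combine_conjugation_and_sam(conjugation: int, sam: int) -> int:
    """Collapse binary Conjugation and SAM codes into the 3-state code.

    0 = unconjugated without SAM, 1 = unconjugated with SAM,
    2 = conjugated (always with SAM in this dataset).
    """
    if conjugation not in (0, 1) or sam not in (0, 1):
        raise ValueError("Conjugation and SAM must each be 0 or 1")
    if conjugation == 1:
        if sam == 0:
            # Conjugation without SAM was not observed, so it has no state.
            raise ValueError("conjugation without SAM has no state in this coding")
        return 2
    return 1 if sam == 1 else 0


def is_ambiguous(entry) -> bool:
    """Return True for entries flagged with '?' (assumed cell format)."""
    return "?" in str(entry)
```

Rows for which `is_ambiguous` returns True would need to be checked against the 'Comments' column before being converted to integer states.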
File: Supplementary_Data_2.xlsx
Description: Phylogenomic and phylogenetic references supporting the placement of all species in the phylogeny across two Excel sheets. Sheet 1 contains the phylogenetic data variables described in this README. Sheet 2 contains the full citations for all research studies cited in Sheet 1.
Variables
- Tip: The tip label, typically the species name of the species sampled for sperm, but it may be the name of a congener or near relative
- Higher-level clade: The more inclusive clade (i.e., a more ancestral node and its descendants)
- Higher-level clade placement: Phylogenetic position of the higher-level clade.
- Reference for placement within higher-level clade: Phylogenetic research study or studies that justify the placement of the tip within the clade (i.e., its lower-level position).
- Comments: Phylogenetic details to provide further context for the placement of the tip in the tree.
File: Supplementary_Data_3.tre
Description: Time-calibrated ultrametric phylogeny of 621 arthropod and outgroup species used for all final analyses. The phylogeny is in Nexus format and can be read by many programs, including the tree-viewing software FigTree.
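As a quick sanity check on what "ultrametric" means here (every tip is the same distance from the root), the following sketch computes root-to-tip path lengths from a Newick tree string. It is a minimal illustration under stated assumptions: the tree string has already been extracted from the Nexus file, tip labels are unquoted and contain no spaces, and the function names are hypothetical. It is not part of the authors' workflow.

```python
import re


def tip_depths(newick: str) -> dict:
    """Return root-to-tip path lengths for a Newick string with branch lengths."""
    # Drop bracketed annotations such as [&height=...], whitespace, and the ';'.
    s = re.sub(r"\[[^\]]*\]", "", newick)
    s = re.sub(r"\s+", "", s).rstrip(";")
    pos = 0

    def read_label_and_length():
        nonlocal pos
        m = re.match(r"([^:,()]*)(?::([0-9.eE+-]+))?", s[pos:])
        pos += m.end()
        return m.group(1), float(m.group(2)) if m.group(2) else 0.0

    def parse_clade():
        """Parse one clade; return {tip: distance from the clade's parent}."""
        nonlocal pos
        if s[pos] == "(":
            below = {}
            pos += 1  # consume "("
            while True:
                below.update(parse_clade())
                if s[pos] == ",":
                    pos += 1  # another child follows
                    continue
                pos += 1  # consume ")"
                break
            _, length = read_label_and_length()
        else:
            label, length = read_label_and_length()
            below = {label: 0.0}
        # Add this clade's own branch length to every tip beneath it.
        return {tip: d + length for tip, d in below.items()}

    return parse_clade()


def is_ultrametric(newick: str, rel_tol: float = 1e-6) -> bool:
    """True if all root-to-tip distances agree within a relative tolerance."""
    depths = list(tip_depths(newick).values())
    return max(depths) - min(depths) <= rel_tol * max(1.0, max(depths))
```

For example, `is_ultrametric("((A:1,B:1):2,C:3);")` is True because every tip sits 3 units from the root. The parser is recursive, so a very deeply nested tree could require raising Python's recursion limit.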
File: Supplementary_Data_4.txt
Description: Ancestral state character estimates of binary conjugation (presence/absence) for the phylogeny based on N = 1000 stochastic mappings of conjugation under an 'all rates different' evolutionary model.
Variables
- Node: The node number in the final ultrametric phylogeny.
- Unconjugated: The posterior probability of unconjugated sperm.
- Conjugated: The posterior probability of conjugated sperm.
File: Supplementary_Data_5.txt
Description: Ancestral state character estimates of multistate conjugate type for the phylogeny based on N = 1000 stochastic mappings of conjugation presence and type under an 'all rates different' evolutionary model. Note that polymorphic conjugates were addressed using Bayesian priors giving each co-occurring state equal prior probability.
Variables
- Node: The node number in the final ultrametric phylogeny.
- Unconjugated: The posterior probability of unconjugated sperm.
- Paired: The posterior probability of paired conjugates.
- Ensheathed: The posterior probability of ensheathed conjugates.
- Aggregate: The posterior probability of aggregate conjugates.
- Rouleaux: The posterior probability of rouleaux conjugates.
- Spermatostyle: The posterior probability of spermatostyle conjugates.
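A common use of per-node posterior probabilities like these is to report the best-supported state at each node. The sketch below shows one way to do that. It assumes the table is delimited text with a header row matching the variable names above (the actual delimiter of the .txt files is not stated here, so it is a parameter), and the example rows are invented purely for illustration.

```python
import csv
import io


def most_probable_states(text: str, delimiter: str = "\t") -> dict:
    """Map each node to (best-supported state, its posterior probability)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    best = {}
    for row in reader:
        node = row.pop("Node")
        probs = {state: float(p) for state, p in row.items()}
        state = max(probs, key=probs.get)
        best[node] = (state, probs[state])
    return best


# Invented example rows (NOT values from Supplementary_Data_5.txt).
EXAMPLE = (
    "Node\tUnconjugated\tPaired\tEnsheathed\tAggregate\tRouleaux\tSpermatostyle\n"
    "622\t0.90\t0.02\t0.02\t0.02\t0.02\t0.02\n"
    "623\t0.10\t0.05\t0.05\t0.05\t0.05\t0.70\n"
)
```

Calling `most_probable_states(EXAMPLE)` returns the top state for each of the two example nodes. The same function applies to the binary table in Supplementary_Data_4.txt, since it only relies on the `Node` column and treats every other column as a state.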
File: Supplementary_Data_6.xlsx
Description: Final list of retained spermatostyle proteins for four true bug species. Each spermatostyle proteome is listed on a separate Excel sheet labeled by species name. Each sheet includes the variables described below. Note that columns reporting protein abundance values (i.e., spectral intensities per sample) include positive numbers and zeros as well as blanks; blanks refer to proteins that were identified but lack abundance data.
Variables
- Orthogroup_Hemiptera: Orthology relationships based on analysis of N = 4 Hemiptera references.
- Orthogroup_HemipteraAndGyrinidae: Orthology relationships based on analysis of N = 4 Hemiptera references and N = 6 Gyrinidae references.
- SingleCopy: Whether the orthogroup is single-copy. 'Y' indicates yes; 'N' indicates no.
- Protein GroupProtein ID: PEAKS protein group identifier.
- Accession: Accession identifier derived from the reference FASTA file provided to PEAKS.
- -10lgP: Protein score under a false discovery rate of 1%.
- Coverage (%): The percentage of the protein model covered by the spectral data.
- Coverage (%) Replicate 1: The percent coverage in replicate one of two.
- Coverage (%) Replicate 2: The percent coverage in replicate two of two.
- Intensity Replicate 1: The protein intensity in replicate one of two.
- Intensity Replicate 2: The protein intensity in replicate two of two.
- Avg_Intensity: The average protein intensity across replicates.
- Normalized_Intensity: The normalized intensity using Wisconsin double standardization.
- CumulAbundance: The total cumulative abundance of the protein in a given proteome.
- #Peptides: The number of peptides that matched the protein model.
- #Unique: The number of unique peptides that matched the protein model.
- #Spec Replicate 1: The number of spectra found in replicate one of two.
- #Spec Replicate 2: The number of spectra found in replicate two of two.
- PTM: The post-translational modification(s) found by PEAKS.
- Avg. Mass: The average mass of the protein in kiloDaltons.
- ConservedwGyrinidae: Whether or not the protein family (=orthogroup) is shared with all six Gyrinidae, allowing for one missing species. 'Y' indicates yes; 'N' indicates no.
- SpeciesSpecificOG: Whether or not the protein family (=orthogroup) is specific to a given species of Hemiptera.
- Description: The gene metadata from de novo annotation of protein models using Trinotate.
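For readers unfamiliar with the Wisconsin double standardization named under Normalized_Intensity, the sketch below shows the usual two-step definition: each column is first divided by its maximum, then each row is divided by its row total. This is a generic illustration with an invented matrix. Which axis corresponds to proteins versus samples in the authors' workflow is not stated here, so the row/column orientation is an assumption.

```python
def wisconsin_double_standardize(matrix):
    """Wisconsin double standardization of a non-negative numeric matrix.

    Step 1: divide every column by its column maximum.
    Step 2: divide every row of the result by its row total.
    All-zero columns or rows are left as zeros to avoid division by zero.
    """
    n_cols = len(matrix[0])
    col_max = [max(row[j] for row in matrix) for j in range(n_cols)]
    scaled = [
        [value / col_max[j] if col_max[j] > 0 else 0.0 for j, value in enumerate(row)]
        for row in matrix
    ]
    result = []
    for row in scaled:
        total = sum(row)
        result.append([value / total if total > 0 else 0.0 for value in row])
    return result


# Invented intensities: two rows (e.g. samples) by three columns (e.g. proteins).
EXAMPLE_MATRIX = [
    [10.0, 0.0, 5.0],
    [5.0, 4.0, 5.0],
]
```

After standardization each non-empty row sums to 1, so values are comparable across rows that differ in total signal. Blank abundance cells (identified proteins without abundance data) would need to be handled before this step, for example by excluding them rather than treating them as zeros.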
File: Supplementary_Data_7.nex
Description: Nexus file for Mesquite with the supertree topology of 621 arthropod and outgroup species with branches color-coded by taxonomic order. Made using Mesquite version 3.6.1.
File: Supplementary_Data_8.xml
Description: BEAST version 2.6.3 xml file with all priors and settings used to infer node ages for 621 species of Arthropoda and related outgroup Ecdysozoa.
File: Supplementary_Data_9.log
Description: BEAST version 2.6.3 log file.
File: Supplementary_Data_10.trees
Description: 10,000 post-burn-in posterior trees from BEAST version 2.6.3, in Nexus format.
Code/software
The free evolutionary analysis software Mesquite was used to generate a supertree topology for all sampled species, and the Bayesian phylogenetic software package BEAST was used to infer node ages (i.e., branch lengths in millions of years). Phylogenetic analyses of sperm phenotypic data were conducted either in R, using stochastic character mapping with phytools, or in BayesTraits. The RNAseq data were analyzed using Trinity and Trinotate with their associated dependencies. Spermatostyle proteins were identified using custom references in the proteomic software PEAKS. Comparisons among spermatostyle proteomes were based on de novo orthology inference using OrthoFinder, and Microsoft Excel was used to compare the final lists of retained proteins.
Access information
Other publicly accessible locations of the data:
- NCBI Sequence Read Archive (PRJNA1270706 and PRJNA1063663)
- ProteomeXchange (PXD064531 and PXD048928)
Data were derived from the following sources:
- The sperm ultrastructure literature, including SpermTree (https://spermtree.org/) for relevant articles and primary data for 611/621 tip species
- Novel slide preparations of sperm isolated from male seminal vesicles for 10/621 tip species
