Skip to main content

Around the world in 10 million years: rapid dispersal of a kleptoparasitoid spider wasp (Pompilidae: Ceropales)

Cite this dataset

Rodriguez, Juanita et al. (2021). Around the world in 10 million years: rapid dispersal of a kleptoparasitoid spider wasp (Pompilidae: Ceropales) [Dataset]. Dryad.


Aim: Our aim was to estimate the historical biogeography of the kleptoparasitoid genus Ceropales and to determine the processes leading to its current worldwide distribution. We tested hypotheses of dispersal and vicariance scenarios underlying its widespread distribution.

Location: Worldwide.

Methods: Data from two nuclear markers (the D2–D3 regions of the 28S ribosomal RNA and long-wavelength rhodopsin) and one mitochondrial marker (cytochrome c oxidase I) for 52 specimens of Ceropales were used to reconstruct a dated phylogeny based on Bayesian inference. Two calibration points were used from previous analyses including all pompilids under a lognormal relaxed molecular clock to estimate lineage divergence times. We compared the fit of 12 biogeographical models, modifying the base BioGeoBEARS models to include a dispersal adjacency matrix. Base BioGeoBEARS models allow different cladogenetic processes: DEC (subset sympatry, narrow vicariance), DIVALIKE (narrow and wide vicariance), BAYAREALIKE (widespread sympatry), and +J versions of these that allow jump dispersal. Using the model with the best AIC score, we performed Biogeographic Stochastic Mapping (BSM) in order to infer biogeographic processes. We simulated 200 BSM using the DEC+J model and the consensus tree for the BEAST analysis.

Results: The origin of crown-group Ceropales was in the early Miocene, ca. 10.6 Ma (15.7–6.5 95% HPD), and eight dispersal events explain its widespread distribution. A constrained model, where only adjacent areas were allowed for dispersal had the highest likelihood under DEC+J model.

Main Conclusions: The widespread distribution of Ceropales can be explained by eleven jump-dispersal events that took place in a period of ca. 10 million years. Two separate dispersals at different times happened from the Eurasia to the Nearctic. These probably occurred across the Bering land bridge in the late Miocene and Pliocene. Dispersal from North and Mesoamerica to South America took place four independent times from the late Miocene to close to present time. Dispersal to the Ethiopian region from Eurasia occurred in the late Miocene and Pliocene. Dispersal back to Eurasia from the Ethiopian region took place three times independently in the Pliocene to close to present time. Dispersal to the Australian region took place from the late Miocene to the Pleistocene.


DNA was extracted from pinned specimens using the Roche DNA Isolation Kit for Cells and Tissues (Roche Molecular Systems, Inc., Pleasanton, USA). The specimen pin was removed and specimens placed in a proteinase K and lysis buffer solution from 24–48 hours. Further steps of the extraction followed the instruction manual. Amplification and sequencing of long-wavelength rhodopsin (LWRh), and the D2–D3 regions of the 28S ribosomal RNA (28S) followed the methods outlined in Pilgrim & Pitts (2006). Amplification of cytochrome c oxidase I followed the methods in Rodriguez et al. (2014) (for primers used see Table S1 in Appendix S1). Raw sequencing reads were processed and assembled with Sequencher 4.1 (Gene Codes Corp., Ann Arbor, MI, USA).

Phylogenetic and Dating Analyses

Alignment of each marker was performed with ClustalW (Thompson et al., 1994) implemented in Geneious 6.1 (created by Biomatters, available at and manually refined. Introns were removed from the LWRh alignment. All markers were concatenated into a supermatrix and the model of molecular evolution for each marker and codon position (the latter for LWRh and COI) was determined using PartitionFinder 1.01 (Lanfear et al., 2012; The combined-data phylogeny was inferred through a Bayesian analysis in MrBayes 3.2 (Huelsenbeck & Ronquist, 2001) by running two independent MCMC (Markov Chain Monte Carlo) for 100,000,000 generations. Chain convergence and ESS (Estimated Sample Size) were assessed in Tracer 1.5 (Rambaut et al., 2003). The first 10% generations were removed as burn-in. Single-gene topologies were also estimated under the best-fit model and visually inspected to detect possible topological incongruencies. An uncorrelated lognormal relaxed-clock model (Drummond et al., 2006; Drummond & Rambaut, 2007) was used for divergence time estimation in beast 1.7.5 (Drummond et al., 2012) using the partitioned supermatrix. Substitution models were unlinked among partitions (see Table S2 in Appendix S2), with the underlying clock and trees linked. Because of the lack of fossils in Ceropalinae, we performed a secondary calibration using previously published ages for Ceropalinae and Ceropales (Waichert et al. 2015). Hard minimum ages were placed as priors for the age of crown-group Ceropalinae (lognormal distribution, mean in real space=8.5, stdev=1, offset=13.3) and Ceropales (lognormal distribution, mean in real space=8.3, SD=1, offset=5.3) by setting the offset of the lognormal distribution to the lower bound of the 95% HPD (Highest Posterior Density) reported by Waichert et al. (2015). This approach allowed for the highest density of the prior probability to be placed close to the minimum age encountered by Waichert et al. (2015), allowing the MCMC to search for older dates at a lower density. The MCMC search was run for 100,000,000 generations. Chain convergence and ESS were assessed with Tracer 1.5. Independent runs were assembled with LogCombiner 1.7.5 (Drummond et al., 2012). The first 10% of the generations were discarded as burn-in.

Ancestral Area Estimation

Areas were established as North and Mesoamerica, South America, Eurasia, Africa and Australia (Figure 2). Because of the young age of Ceropales, were established as such to allow for modelling of dispersal across the Isthmus of Panama, North Atlantic Land Bridge, Bering Land Bridge, Mediterranean Sea, and through southeast Asia.  We performed an event-based likelihood ancestral-range estimation implementing three models of range evolution in BioGeoBEARS (Matzke, 2014). The models evaluated were: 1) DEC, an implementation of Lagrange's model of dispersal–extinction–cladogenesis (Dispersal Extinction Cladogenesis; Ree and Smith, 2008); 2) DEC+J, includes jump dispersal or allopatric founder-event speciation (Matzke, 2014); 3) a likelihood-based implementation of the process assumptions of dispersal–vicariance analysis (DIVA), originally parsimony-based (Ronquist, 1996); 4) DIVALIKE+J, including jump-dispersal (Matzke, 2014); 5) BAYAREAlike, a likelihood implementation of the process assumptions of BayArea, originally Bayesian (Landis et al., 2013); and 4) BAYAREAlike+J, allowing jump-dispersal (Matzke, 2014). All six models were evaluated under a constrained analysis, where dispersal to non-adjacent areas was not allowed and an unconstrained analysis allowing all possible dispersal events; we therefore evaluated 12 possible scenarios. These constraints were placed mainly because 1) there is a very low probability of interoceanic dispersal, 2) the small ranges of spider wasps are dictated by their limited dispersal ability, and 3) the age of Pompilidae, which suggests that younger areas do not include disjunct distributions because they have been isolated since the origin of the group (see Rodriguez et al., 2015). The log-likelihood of each pair of nested models (i.e., DEC and DEC+j, DIVAlike and DIVAlike+j, BAYAREAlike and BAYAREAlike+j) was compared using a likelihood-ratio test. In addition, the AIC (Akaike Information Criterion, Burnham & Anderson, 1998) was calculated for all models. Using the model with the best AIC score, we performed Biogeographic Stochastic Mapping (BSM) in BioGeoBEARS (Matzke, 2014) to simulate possible biogeographic processes. We simulated 200 BSM using the DEC+J model and the consensus tree for the BEAST analysis. Simulated processes were extracted from the simulation to determine the process with the highest support.

Usage notes

BAYAREALIKE+J_200BSM_stochastic_maps.pdf: The 200 simulations of possible biogeographic processes performed through Biogeographic Stochastic Mapping (BSM) in BioGeoBEARS. 

Cero28SCOIOpsNOintrPOFeb28names.nex: A concatenated supermatrix of all molecular markers used in this study (28S, COI and Opsin). A MrBayes block defines character blocks and models of molecular evolution used. 

CeroLessInformative_2.xml: BEAST .xml file generated using BEAUti. Specifies the data matrix, partitions, models, priors and mcmc chain settings.

Cero-FINAL.tre: Ceropales chronogram produced from the BEAST analysis (see methods). This is also input for BioGeoBEARS script.

CeropalesAreasConstrained_byContinent_Apr2020.R: BioGeoBEARS R script modified from Nicholas J. Matzke, License: GPL-3, used for model estimation and generating AIC statistics for each model. This should run using the .tre file provided and the files"multipliers_new_continent.txt" which includes dispersal multipliers used for the analysis and "distributionCeropales_LessAreas_continents.txt" which includes taxa distributions.

Ceropales-BSM-JBI-revision-continents.R: BioGeoBEARS R script used for Biogeographic Stochastic Mapping and summarising events per node.

Ceropales-node-numbers.pdf: Ceropales phylogeny plotted with node numbers (these are referred to in the file "Event-simulation-results.xlsx")

Event-simulation-results.xlsx: Results from BSM summarised per node and statistics for each of them.