Species delimitation is a major research focus in evolutionary biology because accurate species boundaries are a prerequisite for the study of speciation. New species delimitation methods (SDMs) can accommodate non-monophyletic species and gene tree discordance as a result of incomplete lineage sorting via the coalescent model, but do not explicitly accommodate gene flow after divergence. Approximate Bayesian Computation (ABC) can incorporate gene flow and estimate other relevant parameters of the speciation process while testing alternative species delimitation hypotheses. We evaluated the accuracy of BPP, SpeDeSTEM, and ABC for delimiting species using simulated data and applied these methods to empirical data from lizards of the Liolaemus darwinii complex. Overall, BPP was the most accurate, ABC showed an intermediate accuracy, and SpeDeSTEM was the least accurate under most simulated conditions. All three SDMs showed lower accuracy when speciation occurred despite gene flow, as found in previous studies, but ABC was the method with the smallest decrease in accuracy. All three SDMs consistently supported the distinctness of southern and northern lineages within L. darwinii. These SDMs based on genetic data should be complemented with novel SDMs based on morphological and ecological data to achieve truly integrative and statistically robust approaches to species discovery.
Alignment of cyt b
Text file in nexus format containing the alignment of cytochrome b mtDNA data of the Liolaemus darwinii complex (77 taxa, 713 bp)
cytb.nex
Alignment of A1D
Text file in nexus format containing the alignment of the anonymous nuclear locus A1D of the Liolaemus darwinii complex (80 taxa, 700 bp)
lda1d.nex
Alignment of A9C
Text file in nexus format containing the alignment of the anonymous nuclear locus A9C of the Liolaemus darwinii complex (80 taxa, 481 bp)
lda9c.nex
Alignment of B6B
Text file in nexus format containing the alignment of the anonymous nuclear locus B6B of the Liolaemus darwinii complex (80 taxa, 415 bp)
ldb6b.nex
Simulation code for evaluation of species delimitation methods
Compressed file containing a Perl script (speciation.pl), associated files, and executables for simulating sequence data and summary statistics used as input of species delimitation methods
Scripts.zip
Input and output of SpeDeSTEM analyses
Compressed file containing the input and output files of SpeDeSTEM analyses based on the empirical data from the Liolaemus darwinii complex
Spedestem_files.tgz
Input and output of BPP analyses
Compressed file containing input and output files of BPP analyses based on empirical data from the Liolaemus darwinii complex
BPP_files.tgz
Input and output of ABC analyses
Compressed file containing simulation code for generating summary statistics used in ABC analyses, observed summary statistics for species delimitation (obs12sust_SD) and parameter estimation (obs44sust_PE) based on empirical data from the Liolaemus darwinii complex, and output of ABC analyses (R_output_SD and R_output_PE) in R
ABC_files.tgz
Table S1
Summary statistics (SuSt) of the priors, the observed data, and the posterior of ABC analyses for species delimitation. Observed SuSt were obtained from the sequence data of four loci for the L. darwinii complex. Mean and standard deviation (StDv) of SuSt were calculated across simulated or observed loci. Abbreviations: π = mean of pairwise differences, S = number of segregating sites, MFS = mutation frequency spectrum. Details about the computation of these SuSt are given in user guide of the program popABC.
TableS1.doc
Table S2
Summary statistics (SuSt) of the prior, the observed data, and the posterior of ABC analyses for parameter estimation. Observed SuSt were obtained from the sequence data of four loci for the L. darwinii complex. Mean and standard deviation (StDv) of SuSt were calculated across simulated or observed loci. Abbreviations: π = mean of pairwise differences, MFS = mutation frequency spectrum, NmS = Nm-like based on the number of segregating sites, privS = number of private segregating sites, 25%Q = 25% quartile, 75%Q = 75% quartile.
TableS2.doc
Table S3
Model selection with AIC criteria of alternative demographic models estimated with IMa2. Models are nested within the full migration model and are ranked based on AIC values. Best model is bolded. AIC = Akaike information criterion, k = number of parameters, Δi = AIC - AICmin, ωi = model probability. Subscripts of M (migration) parameters represent the directionality of gene flow between populations going backwards in time: 0 = L. darwinii-N, 1= L. darwinii-S, 2 = L. laurenti, and 3 = L. grosseorum. Parameters values constrained in nested models are enclosed in squared brackets.
TableS3.doc
Figure S1
Species tree of the L. darwinii complex based on a Bayesian analysis in *BEAST. Numbers above branches are posterior probabilities of clades. The scale bar represents substitutions per site.
FigureS1.pdf
Appendix 1
List of specimens sequenced for this study. Code represents the numbers used to identify sampled localities in Fig. 1. GenBank accession numbers are given for each locus; italicized numbers indicate sequences used in species-tree analysis. Acronyms: BYU = Bean Life Science Museum, Brigham Young University (USA); FML = Fundación Miguel Lillo (Argentina); IMCN = Instituto y Museo de Ciencias Naturales, Universidad Nacional de San Juan (Argentina); LJAMM-CNP = herpetological collection of the Centro Nacional Patagónico (CENPAT-CONICET, Argentina); and MLP S. = herpetological collection of the Museo de La Plata (Argentina).
Appendix1.docx
Appendix S1
Figure 1. Number of references published between 1981 and 2010 retrieved from the ISI Web of Science that contained the keyword "species delimitation". Figure 2. Number of references published between 1992 and 2010 retrieved from the ISI Web of Science that contained the keyword "approximate bayesian computation".
AppendixS1.pdf