Dryad

Data from: Rarity and incomplete sampling in DNA-based species delimitation

Abstract

DNA-based species delimitation may be compromised by limited sampling effort and species rarity, including “singleton” representatives of species, which hampers estimates of intra- versus interspecies evolutionary processes. In a case study of southern African chafers (beetles in the family Scarabaeidae), many species and subclades were poorly represented and 48.5% of species were singletons. Using cox1 sequences from >500 specimens and ∼100 species, the Generalized Mixed Yule Coalescent (GMYC) analysis as well as various other approaches for DNA-based species delimitation (Automatic Barcode Gap Discovery (ABGD), Poisson tree processes (PTP), Species Identifier, Statistical Parsimony), frequently produced poor results if analyzing a narrow target group only, but the performance improved when several subclades were combined. Hence, low sampling may be compensated for by “clade addition” of lineages outside of the focal group. Similar findings were obtained in reanalysis of published data sets of taxonomically poorly known species assemblages of insects from Madagascar. The low performance of undersampled trees is not due to high proportions of singletons per se, as shown in simulations (with 13%, 40% and 52% singletons). However, the GMYC method was highly sensitive to variable effective population size (Ne), which was exacerbated by variable species abundances in the simulations. Hence, low sampling success and rarity of species affect the power of the GMYC method only if they reflect great differences in Ne among species. Potential negative effects of skewed species abundances and prevalence of singletons are ultimately an issue about the variation in Ne and the degree to which this is correlated with the census population size and sampling success. Clade addition beyond a limited study group can overcome poor sampling for the GMYC method in particular under variable Ne. This effect was less pronounced for methods of species delimitation not based on coalescent models.