The assumption of strictly neutral evolution is fundamental to the multispecies coalescent model and permits the derivation of gene tree distributions and coalescent times conditioned on a given species tree. In this study, we use computer simulations to explore how violating this assumption, in the form of species-specific positive selection, affects the estimation of species trees, species delimitations, and coalescent parameters under the model. We simulated datasets under an array of evolutionary scenarios that differ in both speciation parameters (e.g., divergence times, strength of selection) and experimental design (e.g., number of loci sampled), with species-specific positive selection occurring within branches of the species tree. Our results highlight particular evolutionary scenarios and parameter combinations in which inferences may be more, or less, susceptible to the effects of positive selection. In some extreme cases, selection can decrease error in species delimitation and increase error in species tree estimation, yet these inferences appear to be largely robust to positive selection under many conditions likely to be encountered in empirical datasets.
FIGURE S1. The effects of species-specific positive selection on effective population size and divergence time estimates for the shallow (top) and moderate-depth (bottom) species tree models. Results are shown for (a-c), (d-f), (g-i), and (j-l) for simulated datasets consisting of 1 locus (a, d, g, j), 2 loci (b, e, h, k), and 10 loci (c, f, i, l). The mean (points) and standard deviation (error bars) of parameter estimates based on 200 replicates are shown for three different Species-AB divergence times: recent (blue), medium (black), and ancient (red). Each panel is split into two subpanels representing 5 (left of dotted line) or 20 (right of dotted line) haplotypes sampled per species. A color gradient ranging from white to dark gray indicates the percentage of loci under selection: 0% (neutral, white), 10%, 20%, 50%, and 100% (dark gray). For simulations with selection, we varied the selection coefficient: weak (“W”, s = 0.01), strong (“S”, s = 0.10), and very strong (“VS”, s = 0.5).
FIGURE S2. The effects of species-specific positive selection on effective population size and divergence time estimates for the deep species tree model. Results are shown for (a-c), (d-f), (g-i), and (j-l) for simulated datasets consisting of 1 locus (a, d, g, j), 2 loci (b, e, h, k), and 10 loci (c, f, i, l). The mean (points) and standard deviation (error bars) of parameter estimates based on 200 replicates are shown for three different Species-AB divergence times: 0.001 (blue), 0.005 (black), and 0.009 (red). Each panel is split into two subpanels representing 5 (left of dotted line) or 20 (right of dotted line) haplotypes sampled per species. A color gradient ranging from white to dark gray indicates the percentage of loci under selection: 0% (neutral, white), 10%, 20%, 50%, and 100% (dark gray). For simulations with selection, we varied the selection coefficient: weak (“W”, s = 0.01), strong (“S”, s = 0.10), and very strong (“VS”, s = 0.5).
FIGURE S3. The effects of species-specific positive selection on posterior probabilities of species hypotheses for the shallow and moderate-depth species tree models. Results are shown for simulated datasets consisting of 1 locus (a, d, g, j), 2 loci (b, e, h, k), and 10 loci (c, f, i, l). The mean (points) and standard deviation (error bars) of parameter estimates based on 200 replicates are shown for three different Species-AB divergence times: recent (blue), medium (black), and ancient (red). Each panel is split into two subpanels representing 5 (left of dotted line) or 20 (right of dotted line) haplotypes sampled per species. A color gradient ranging from white to dark gray indicates the percentage of loci under selection: 0% (neutral, white), 10%, 20%, 50%, and 100% (dark gray). For simulations with selection, we varied the selection coefficient: weak (“W”, s = 0.01), strong (“S”, s = 0.10), and very strong (“VS”, s = 0.5).
FIGURE S4. The effects of species-specific positive selection on posterior probabilities of species hypotheses for the deep species tree model. Results are shown for simulated datasets consisting of 1 locus (a, d, g, j), 2 loci (b, e, h, k), and 10 loci (c, f, i, l). The mean (points) and standard deviation (error bars) of parameter estimates based on 200 replicates are shown for three different Species-AB divergence times: 0.001 (blue), 0.005 (black), and 0.009 (red). Each panel is split into two subpanels representing 5 (left of dotted line) or 20 (right of dotted line) haplotypes sampled per species. A color gradient ranging from white to dark gray indicates the percentage of loci under selection: 0% (neutral, white), 10%, 20%, 50%, and 100% (dark gray). For simulations with selection, we varied the selection coefficient: weak (“W”, s = 0.01), strong (“S”, s = 0.10), and very strong (“VS”, s = 0.5).
FIGURE S5. The effects of selection on species tree estimates for the moderate-depth species tree simulations. Violin plots show the distribution of posterior probabilities of the correct rooted species topology (PABC, blue) and the incorrect topology (PBCA, yellow) across 200 replicates (mean shown in black) for datasets consisting of 1 locus (bottom), 2 loci (middle), and 10 loci (top) that were simulated with either 5 (left) or 20 (right) samples per species under three different Species-AB divergence times (from left to right): 0.0001 (blue), 0.0005 (black), and 0.0009 (red). A color gradient ranging from white to dark gray indicates the percentage of loci under selection: 0% (neutral, white), 10%, 20%, 50%, and 100% (dark gray). For simulations with selection, we varied the selection coefficient: weak (“W”, s = 0.01, light red), strong (“S”, s = 0.10, medium red), and very strong (“VS”, s = 0.5, dark red).
FIGURE S6. The effects of selection on species tree estimates for the deep species tree simulations. Violin plots show the distribution of posterior probabilities of the correct rooted species topology (PABC, blue) and the incorrect topology (PBCA, yellow) across 200 replicates (mean shown in black) for datasets consisting of 1 locus (bottom), 2 loci (middle), and 10 loci (top) that were simulated with either 5 (left) or 20 (right) samples per species under three different Species-AB divergence times (from left to right): 0.001 (blue), 0.005 (black), and 0.009 (red). A color gradient ranging from white to dark gray indicates the percentage of loci under selection: 0% (neutral, white), 10%, 20%, 50%, and 100% (dark gray). For simulations with selection, we varied the selection coefficient: weak (“W”, s = 0.01, light red), strong (“S”, s = 0.10, medium red), and very strong (“VS”, s = 0.5, dark red).
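
As a rough illustration of the simulation design summarized in the captions above, the following Python sketch enumerates the factor combinations for the deep species tree model. The factor levels (numbers of loci, haplotypes per species, Species-AB divergence times, percentages of loci under selection, selection coefficients, and 200 replicates) are taken from the captions. The assumption that every factor is fully crossed, and all variable and function names, are ours and purely illustrative. The captions do not state how a percentage of selected loci maps onto 1-locus or 2-locus datasets, so the sketch does not attempt that step.

```python
from itertools import product

# Factor levels as stated in the figure captions (deep species tree model).
N_LOCI = (1, 2, 10)
HAPLOTYPES_PER_SPECIES = (5, 20)
DIVERGENCE_TIMES_AB = (0.001, 0.005, 0.009)
PCT_LOCI_SELECTED = (10, 20, 50, 100)            # 0% is the neutral case
SELECTION_COEFFS = {"W": 0.01, "S": 0.10, "VS": 0.5}
N_REPLICATES = 200


def selection_conditions():
    """Yield (percent_selected, strength_label, s) for each selection regime."""
    # The neutral case has no selection strength, so it appears only once.
    yield (0, "neutral", 0.0)
    for pct, (label, s) in product(PCT_LOCI_SELECTED, SELECTION_COEFFS.items()):
        yield (pct, label, s)


def design_grid():
    """Yield one dict per scenario.

    Assumption (not stated in the captions): all factors are fully crossed.
    """
    for loci, haps, tau in product(
        N_LOCI, HAPLOTYPES_PER_SPECIES, DIVERGENCE_TIMES_AB
    ):
        for pct, label, s in selection_conditions():
            yield {
                "loci": loci,
                "haplotypes": haps,
                "tau_AB": tau,
                "pct_selected": pct,
                "strength": label,
                "s": s,
            }


grid = list(design_grid())
print(len(grid))                 # 234 scenarios under the full-crossing assumption
print(len(grid) * N_REPLICATES)  # 46800 simulated datasets
```

Under these assumptions, each combination of loci, haplotypes, and divergence time carries 13 selection regimes (one neutral plus four percentages crossed with three strengths), which matches the white-to-dark-gray gradient and the W/S/VS labels described in the captions.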