
Data from: Plotting for change: an analytic framework to aid decisions on which lineages are candidate species in phylogenomic species discovery


Georges, Arthur et al. (2021), Data from: Plotting for change: an analytic framework to aid decisions on which lineages are candidate species in phylogenomic species discovery, Dryad, Dataset,


A recent study argued that coalescent-based models of species delimitation mostly delineate population structure, not species, and called for candidate species to be validated using biological information in addition to genetic information, such as phenotypic or ecological data. Here we introduce a framework to interrogate genomic datasets and coalescent-based species trees for the presence of candidate species in situations where additional biological data are unavailable, unobtainable, or uninformative. For de novo genomic studies of species boundaries, we propose six steps: (a) visualize genetic affinities among individuals to identify both discrete and admixed genetic groups from first principles, and hold aside individuals involved in contemporary admixture for independent consideration; (b) apply phylogenetic techniques to identify lineages; (c) assess the diagnosability of those lineages as potential candidate species; (d) interpret the diagnosable lineages in a geographic context (sympatry, parapatry, allopatry); (e) assess the significance of differences or trends in the context of sampling intensity; and (f) adopt a holistic approach to the available evidence to inform decisions on species status in the difficult cases of allopatry. We use this approach to distinguish candidate species from within-species lineages in a widespread species complex of Australian freshwater fishes (Retropinna spp.). Our framework addresses two cornerstone issues in systematics that are often not explicitly discussed in genomic species discovery: diagnosability and how to determine it, and the criteria that should be used to decide whether diagnosable lineages are conspecific or represent different species.
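Step (c) asks whether lineages are diagnosable. One common way to operationalize this with SNP data is to count fixed allelic differences: loci at which the alleles observed in one group do not overlap with those observed in the other. The sketch below is illustrative only and is not presented as the procedure used in the paper. It assumes biallelic SNPs coded as the number of alternate alleles (0, 1, 2, with None for a missing call); the function names and the toy genotypes are hypothetical.

```python
from typing import Iterable, List, Optional, Set

# Assumed genotype coding: 0 = homozygous reference, 1 = heterozygous,
# 2 = homozygous alternate, None = missing call.
Genotype = Optional[int]


def alleles_present(genotypes: Iterable[Genotype]) -> Set[str]:
    """Return the set of alleles ('ref', 'alt') observed in a group at one locus."""
    present: Set[str] = set()
    for g in genotypes:
        if g is None:
            continue  # missing calls contribute no allele information
        if g in (0, 1):
            present.add("ref")
        if g in (1, 2):
            present.add("alt")
    return present


def count_fixed_differences(group_a: List[List[Genotype]],
                            group_b: List[List[Genotype]]) -> int:
    """Count loci where the two groups share no allele.

    Each group is a list of individuals; each individual is a list of
    genotypes, one per locus, in the same locus order for everyone.
    """
    n_loci = len(group_a[0])
    fixed = 0
    for locus in range(n_loci):
        a = alleles_present(ind[locus] for ind in group_a)
        b = alleles_present(ind[locus] for ind in group_b)
        # A locus with no usable calls in either group cannot be scored.
        if a and b and a.isdisjoint(b):
            fixed += 1
    return fixed


# Hypothetical toy data: two individuals per group, four loci.
group_a = [[0, 0, 1, None], [0, 0, 0, 0]]
group_b = [[2, 0, 1, 2], [2, None, 2, 2]]
print(count_fixed_differences(group_a, group_b))  # 2 (loci 0 and 3)
```

With small samples, a locus can appear fixed simply because the shared allele was not sampled, which is why the framework's step (e) weighs observed differences against sampling intensity.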


Sequencing for the SNP dataset was performed using DArTseq™ (Diversity Arrays Technology Pty Ltd, Canberra, Australia). For Sanger sequencing, we extracted genomic DNA from the muscle tissue of each specimen using the DNeasy Tissue Kit (QIAGEN Inc., Chatsworth, CA) or by phenol-chloroform extraction. The allozyme dataset from the original study of the composite taxon MTV (TAS, ADM, and MDB) was extended to include a further 149 smelt from 16 additional sites in the relevant region of observed admixture (southern Victoria and the Murray River). Refer to the Materials and Methods section of the main paper and to the Supplementary Materials for details.

Usage Notes

Refer to the README file.


Australian Research Council, Award: LP140100521