Data from: HExT, a software supporting tree-based screens for hybrid taxa in multilocus datasets, and an evaluation of the homoplasy excess test
Schneider, Kevin, University of Graz
Koblmüller, Stephan, University of Graz
Sefc, Kristina Maria, University of Graz
Published Sep 15, 2016 on Dryad.
Cite this dataset
Schneider, Kevin; Koblmüller, Stephan; Sefc, Kristina Maria (2016). Data from: HExT, a software supporting tree-based screens for hybrid taxa in multilocus datasets, and an evaluation of the homoplasy excess test [Dataset]. Dryad. https://doi.org/10.5061/dryad.20t14
The homoplasy excess test (HET) is a tree-based screen for hybrid taxa in multilocus nuclear phylogenies. Homoplasy between a hybrid taxon and the clades containing the parental taxa reduces bootstrap support in the tree. The HET is based on the expectation that excluding the hybrid taxon from the data set increases the bootstrap support for the parental clades, whereas excluding non-hybrid taxa has little effect on statistical node support. To carry out a HET, bootstrap trees are calculated with taxon-jackknife data sets, that is, data sets that exclude one taxon (species, population) at a time. An excess increase in bootstrap support for certain nodes upon exclusion of a particular taxon indicates the hybrid (the excluded taxon) and its parents (the clades with increased support).
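The taxon-jackknife step described above can be illustrated with a minimal Python sketch. This is not code from HExT (which is written in R); the function name `taxon_jackknife` and the toy matrix are hypothetical, and the bootstrap tree calculation that would follow each replicate is left out.

```python
def taxon_jackknife(matrix):
    """Yield (excluded_taxon, reduced_matrix) pairs, leaving out one taxon at a time.

    `matrix` maps a taxon name to its list of character states (e.g. 0/1 AFLP scores).
    """
    for excluded in matrix:
        reduced = {taxon: chars for taxon, chars in matrix.items() if taxon != excluded}
        yield excluded, reduced


# Hypothetical toy data: four taxa scored at four binary loci.
toy = {
    "A": [0, 1, 1, 0],
    "B": [0, 1, 0, 0],
    "C": [1, 0, 1, 1],
    "H": [0, 1, 1, 1],
}

# One reduced data set per excluded taxon; each would then go to a bootstrap tree analysis.
replicates = dict(taxon_jackknife(toy))
```

Each entry of `replicates` holds the data set with the key taxon removed, so a matrix of n taxa gives n jackknife data sets.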
We introduce a new software program, HExT, which generates the taxon-jackknife data sets, runs the bootstrap tree calculations, and identifies excess bootstrap increases as outlier values in boxplot graphs. HExT is written in the R language and accepts binary data (0/1; e.g. AFLP) as well as co-dominant SNP and genotype data.
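As a rough illustration of how an outlier in a boxplot can flag an excess bootstrap increase, the Python sketch below applies the conventional upper fence (third quartile plus 1.5 times the interquartile range). This is an assumption about the outlier rule, not a description of HExT's implementation; quartile definitions also differ between tools, so the threshold used by a given boxplot function may differ slightly. The function name `upper_outliers` and the support values are hypothetical.

```python
from statistics import quantiles


def upper_outliers(increases, k=1.5):
    """Return taxa whose bootstrap increase exceeds Q3 + k * IQR.

    `increases` maps each excluded taxon to the change in bootstrap support
    observed for one node when that taxon is left out.
    """
    q1, _, q3 = quantiles(list(increases.values()), n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return [taxon for taxon, value in increases.items() if value > fence]


# Hypothetical support changes (percentage points) for a single node.
increases = {"A": 1, "B": 2, "C": 0, "D": 3, "H": 25}
flagged = upper_outliers(increases)
```

With these toy values the quartiles are 1 and 3, the fence is 6, and only the exclusion of taxon "H" is flagged.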
We demonstrate the usefulness of HExT in large SNP data sets containing putative hybrids and their parents. For instance, using published data of the genus Vitis (~6,000 SNP loci), HExT output supports V. × champinii as a hybrid between V. rupestris and V. mustangensis.
With simulated SNP and AFLP data sets, excess increases in bootstrap support were not always connected with the hybrid taxon (false positives), whereas the expected bootstrap signal failed to appear on several occasions (false negatives). Potential causes for both types of spurious results are discussed.
With both empirical and simulated data sets, the taxon-jackknife output generated by HExT provided additional signatures of hybrid taxa, including changes in tree topology across trees, consistent effects of exclusions of the hybrid and the parent taxa, and moderate (rather than excessive) increases in bootstrap support. HExT significantly facilitates the taxon-jackknife approach to hybrid taxon detection, even though the simple test for excess bootstrap increase may not reliably identify hybrid taxa in all applications.
Tropheus AFLP data
A NEXUS-format file with the AFLP data of Tropheus spp. sampled from Lake Tanganyika. Sampling information is given in Egger et al. (2007; BMC Evolutionary Biology, 7, 137).
This file contains SNP genotype data of North American canids used in a homoplasy excess test for hybrid signal. The data were collected by vonHoldt et al. (2011; Genome Research 21, 1294-1305). The first line contains sample names. Row names indicate SNP loci. SNP genotypes are scored as 0 (homozygous for one allele), 1 (heterozygous), 2 (homozygous for the other allele).
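The layout described above (sample names on the first line, one row per SNP locus, genotypes coded 0, 1 or 2) could be read with a short Python sketch such as the following. It assumes whitespace-separated fields and a header line that lists only the sample names; the delimiter and any missing-data code are not stated in the description, so both are assumptions, and a non-numeric genotype would raise an error here. The function name `read_snp_table` and the toy sample and locus names are hypothetical.

```python
import io


def read_snp_table(handle):
    """Parse a SNP table: the first line holds sample names, each later row a locus.

    Returns (samples, genotypes), where genotypes maps a locus name to a list of
    integer calls (0, 1 or 2) in sample order.
    """
    samples = handle.readline().split()
    genotypes = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        locus, calls = fields[0], fields[1:]
        if len(calls) != len(samples):
            raise ValueError(f"locus {locus}: expected {len(samples)} genotypes")
        genotypes[locus] = [int(call) for call in calls]
    return samples, genotypes


# Hypothetical toy input in the assumed layout.
toy_file = io.StringIO("wolf1 coyote1 dog1\nsnp1 0 1 2\nsnp2 2 1 0\n")
samples, genotypes = read_snp_table(toy_file)
```

For a real file, an open file object can be passed in place of the `io.StringIO` toy input.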