Data from: Gaps, an elusive source of phylogenetic information


Saurabh, Kumar; Holland, Barbara R.; Gibb, Gillian C.; Penny, David (2012), Data from: Gaps, an elusive source of phylogenetic information, Dryad, Dataset,


Morrison (2009) raises a fundamental question: “Why would phylogeneticists ignore computerized sequence alignment?” While well aware of the difficulties, he considers the whole issue a ‘gaping hole that needs to be filled’. Particularly with the expansion of genomic-scale data, there are many advantages to using automated alignment for phylogenetic analyses, the most obvious being that it is much more efficient and potentially less prone to experimenter bias. It is clearly desirable to automate data preparation as far as possible, but the question remains whether automated sequence alignment can yet recover the full and correct phylogenetic information in the data. In this paper we use an example shorebird dataset to explore three related questions regarding the interplay between alignment and phylogeny estimation: 1) Are gap-rich alignments reliable for phylogenetic inference? 2) How much phylogenetic information is contained in gaps as compared to sequences? 3) Are models of the insertion/deletion process essential, and if so, at what phylogenetic depths? We report that the indel (insertion/deletion) process creates considerable information that is potentially available for phylogenetic inference. Ideally, we should be able to obtain the same tree independently from sequences and from gaps; however, there is still considerable variability in the alignments produced by different programs. We predict that better and more computationally tractable models of the indel process will be required before the information in gaps can be fully exploited for phylogenetic inference.

