Data from: Gaps, an elusive source of phylogenetic information

Saurabh, Kumar, Massey University

Holland, Barbara R., Massey University

Gibb, Gillian C., Massey University

Penny, David, Massey University

Published Mar 23, 2012 on Dryad. https://doi.org/10.5061/dryad.mg76th0n

Cite this dataset

Saurabh, Kumar; Holland, Barbara R.; Gibb, Gillian C.; Penny, David (2012). Data from: Gaps, an elusive source of phylogenetic information [Dataset]. Dryad. https://doi.org/10.5061/dryad.mg76th0n

Abstract

Morrison (2009) raises a very fundamental question, “Why would phylogeneticists ignore computerized sequence alignment?” While well aware of the difficulties, he considers the whole issue a ‘gaping hole that needs to be filled’. Particularly with the expansion of genomic-scale data, there are many advantages to using automated alignment for phylogenetic analyses, the most obvious being that it is much more efficient and potentially less prone to experimenter bias. So yes, it is obviously desirable to automate data preparation as far as possible, but the question remains whether we are yet at the stage that automated sequence alignment can obtain the full and correct phylogenetic information in the data. In this paper we use an example shorebird dataset to explore three related questions regarding the interplay between alignment and phylogeny estimation: 1) Are gap-rich alignments reliable for phylogenetic inference? 2) How much phylogenetic information is contained in gaps as compared to sequences? 3) Are models of the insertion/deletion process essential, and if so, at what phylogenetic depths? We report that there is considerable information created by the indel (insertion/deletion) process that is potentially available for phylogenetic inference. Ideally, we should be able to independently obtain the same tree from both sequences and gaps; however, there is still considerable variability in the alignments produced by different programs. We predict that better and more computationally tractable models of the indel process will be required before the information in gaps can be fully exploited for phylogenetic inference.

Usage notes

Alignment files

This .rar file contains all the alignments used to create Table 1, 51 alignments in total. The files are organized in sub-folders, first by taxonomic group: suborder Charadrii, suborder Scolopaci, or shorebirds (which includes Charadrii, Scolopaci and Lari). Within each taxonomic group, files are organized by locus: Mt exon, RAG1 exon, beta-fib intron 7, and beta-fib intron 7 gap data. Note that the first three are sequence alignments, but the last is a coding of the gaps in beta-fib intron 7. Within each of these folders are alignments for each of the methods: clustal, mafft, muscle, sate and t-coffee.
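The folder layout described above (taxonomic group, then locus, then alignment method) can be indexed with a short script once the archive has been extracted. The following is a minimal sketch, not part of the dataset: the function name `index_alignments` is hypothetical, and it assumes that each alignment's file or folder name contains the method name as spelled in these notes. The actual folder and file names in the archive may differ, so the matching rule may need adjusting.

```python
from collections import defaultdict
from pathlib import Path

# Method names as listed in the usage notes. Matching them against file
# names is an assumption about how the files in the archive are named.
METHODS = ("clustal", "mafft", "muscle", "sate", "t-coffee")


def index_alignments(root):
    """Group alignment files under `root` by (taxonomic group, locus).

    Assumes the archive was extracted so that `root` contains one folder
    per taxonomic group, each holding one folder per locus.
    """
    root = Path(root)
    index = defaultdict(dict)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if len(parts) < 3:
            continue  # file is not inside a group/locus folder pair
        group, locus = parts[0], parts[1]
        # Look for the method name anywhere below the locus folder, so
        # both per-method files and per-method sub-folders are handled.
        name = "/".join(parts[2:]).lower()
        method = next((m for m in METHODS if m in name), None)
        if method is not None:
            index[(group, locus)][method] = path
    return dict(index)
```

Calling `index_alignments` on the extracted folder returns a dictionary keyed by (group, locus) pairs, each mapping method names to file paths. Comparing the number of indexed files against the 51 alignments stated above is a quick check that the naming assumption holds.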

Supplementary data

Includes extra details on the datasets used and Supplementary Figure 1.