Data from: In silico phylogenomics using complete genomes: a case study on the evolution of hominoids
Costa, Igor Rodrigues; Prosdocimi, Francisco; Jennings, W. Bryan (2017), Data from: In silico phylogenomics using complete genomes: a case study on the evolution of hominoids, Dryad, Dataset, https://doi.org/10.5061/dryad.jn8nt
The increasing availability of complete genome data is facilitating the acquisition of phylogenomic datasets, but the process of obtaining orthologous sequences from other genomes and assembling multiple sequence alignments remains piecemeal and arduous. We designed software that performs these tasks and outputs anonymous loci (AL) or anchor loci (AE/UCE) datasets in ready-to-analyze formats. We demonstrate our program by applying it to the hominoids. Starting with human, chimpanzee, gorilla, and orangutan genomes, our software generated an exhaustive dataset of 292 ALs (~1 kb each) in ~3 hours. Analyses of our AL dataset validated the program by yielding a portrait of hominoid evolution in agreement with previous studies, and the accuracy and precision of our estimated ancestral effective population sizes and speciation times represent improvements. We also used our program with a published set of 512 vertebrate-wide AE 'probe' sequences to generate datasets consisting of 171 and 242 independent loci (~1 kb each) in 11 and 13 minutes, respectively. The former dataset consisted of flanking sequences 500 bp from adjacent AEs, while the latter contained sequences bordering AEs. Although our AE datasets produced the expected hominoid species tree, coalescent-based estimates of ancestral population sizes and speciation times from these data were considerably lower than the estimates from our AL dataset and from previous studies. Accordingly, we suggest that loci subjected to direct or indirect selection may not be appropriate for coalescent-based methods. Complete in silico approaches, combined with the burgeoning genome databases, will accelerate the pace of phylogenomics.