Data for: Rapid alignment updating with Extensiphy

Citation

Field, Jasper T. (2021), Data for: Rapid alignment updating with Extensiphy, Dryad, Dataset, https://doi.org/10.6071/M38T0T

Abstract

1. High-throughput sequencing has become commonplace in evolutionary studies. Large, rapidly collected genomic datasets are used to capture biodiversity and to monitor disease transmission patterns at global and national scales, among many other applications. Updating homologous sequence datasets with new samples is cumbersome, requiring excessive program runtimes and data processing. We describe Extensiphy, a bioinformatics tool to efficiently update multiple sequence alignments with whole-genome short-read data. Extensiphy performs reference-based sequence assembly and alignment in one process while maintaining the length of the original alignment. The input data types for Extensiphy are any multiple sequence alignment in FASTA format and whole-genome, short-read FASTQ sequences.

2. To validate Extensiphy, we compared its results to those produced by two other methods that construct whole-genome scale multiple sequence alignments. We compared the methods on program runtime, base-call accuracy, dataset retention in the presence of missing data, and phylogenetic accuracy.

3. We found that Extensiphy rapidly produces high-quality updated sequence alignments while preventing alignment shrinkage due to missing data. Phylogenies estimated from alignments produced by Extensiphy show similar accuracy to other commonly used alignment construction methods. 

4. Extensiphy is suitable for updating large sequence alignments and is ideal for studies of biodiversity, ecology and epidemiological monitoring efforts.

Methods

All data collection and data processing are described in the associated article, "Rapid alignment updating with Extensiphy". This dataset contains the files for all start and end points, for both empirical and simulated data.

Usage Notes

The README.txt file included with these data describes the folder structure and what each file or folder contains.
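
The abstract states that Extensiphy maintains the alignment length of the original alignment. When working with the start-point and end-point alignment files, that property can be checked directly. The following is a minimal sketch, not part of the dataset or of Extensiphy itself: the function names `read_fasta` and `alignment_length` and the toy sequences are hypothetical, and real files would be opened with `open(path)` in place of the in-memory strings.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA records from a text handle into a {name: sequence} dict."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def alignment_length(records):
    """Return the shared sequence length; raise if the sequences differ in length."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences have differing lengths: {sorted(lengths)}")
    return lengths.pop()


# Hypothetical toy data standing in for an original and an updated alignment.
original = read_fasta(StringIO(">taxon_a\nACGT-A\n>taxon_b\nACGTTA\n"))
updated = read_fasta(
    StringIO(">taxon_a\nACGT-A\n>taxon_b\nACGTTA\n>new_sample\nACNTTA\n")
)

# The updated alignment should keep the original length and only gain taxa.
same_length = alignment_length(original) == alignment_length(updated)
new_taxa = sorted(set(updated) - set(original))
```

Here `same_length` reports whether the two alignments share one length, and `new_taxa` lists the sequence names that appear only in the updated alignment.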

Funding

National Science Foundation, Award: 1759846