Dryad

ASTRAL-MP: scaling ASTRAL to very large datasets using randomization and parallelization

Data files

Jul 05, 2023 version files 7.97 GB

Abstract

Motivation

Evolutionary histories can change from one part of the genome to another. The potential for discordance between gene trees has motivated the development of summary methods that reconstruct a species tree from an input collection of gene trees. ASTRAL is a widely used summary method and has been able to scale to relatively large datasets. However, genomic datasets are growing quickly. Despite its relative efficiency, the current single-threaded implementation of ASTRAL has not kept pace with this growth and cannot analyze the largest available datasets in a reasonable time.

Results

ASTRAL uses dynamic programming and is not trivially parallel. In this paper, we introduce ASTRAL-MP, the first version of ASTRAL that can exploit parallelism; it also uses randomization techniques to speed up some of its steps. Importantly, ASTRAL-MP can take advantage of not just multiple CPU cores but also one or several graphics processing units (GPUs). The ASTRAL-MP code scales very well with increasing numbers of CPU cores, and its GPU version, implemented in OpenCL, achieves speedups of up to 158× compared to ASTRAL-III. Using GPUs and multiple cores, ASTRAL-MP is able to analyze datasets with 10,000 species or datasets with more than 100,000 genes in <2 days.

Availability and implementation

ASTRAL-MP is available at https://github.com/smirarab/ASTRAL/tree/MP