Dryad

Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES

Data files

Jul 20, 2024 version files: 48.39 GB
Jan 09, 2025 version files: 51.29 GB

Abstract

Current genome sequencing initiatives across a wide range of life forms offer significant potential to enhance our understanding of evolutionary relationships and to support transformative biological and medical applications. Species trees play a central role in many of these applications; however, despite the widespread availability of genome assemblies, accurate inference of species trees remains challenging for many scientists because conventional methods offer limited automation and require significant domain expertise and substantial computational resources. To address this limitation, we present ROADIES, a fully automated pipeline that infers species trees starting from raw genome assemblies (those lacking prior annotations). In contrast to the prevailing approach, ROADIES randomly selects segments of the input genomes to generate gene trees. This eliminates the need to choose any single reference species or to perform the cumbersome steps of gene annotation and whole-genome alignment. ROADIES also leverages existing discordance-aware methods that allow multi-copy genes, eliminating the need to infer orthology. Using genomic datasets from large-scale sequencing efforts across four diverse life forms (placental mammals, pomace flies, birds, and budding yeasts), we show that ROADIES infers species trees that are comparable in quality to those from state-of-the-art studies that involved domain experts, but in a fraction of the time and with a fraction of the effort. With its speed, accuracy, and automation, ROADIES has the potential to vastly simplify species tree inference, making it accessible to a broader range of scientists and applications.
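
The random-segment idea mentioned in the abstract can be illustrated with a toy sketch. This is not ROADIES code and makes no claim about how the pipeline actually samples: the function name `sample_segments`, its parameters, the uniform choice of species and start position, and the toy sequences are all assumptions made for illustration only.

```python
import random


def sample_segments(genomes, num_segments, segment_length, seed=0):
    """Draw random fixed-length segments from a collection of genomes.

    Hypothetical helper for illustration; not part of ROADIES.
    genomes: dict mapping a species name to its assembled sequence (str).
    Returns a list of (species, start, subsequence) tuples.
    """
    rng = random.Random(seed)
    # Only genomes long enough to hold one full segment are eligible.
    eligible = sorted(
        name for name, seq in genomes.items() if len(seq) >= segment_length
    )
    if not eligible:
        raise ValueError("no genome is long enough for the requested length")

    segments = []
    for _ in range(num_segments):
        species = rng.choice(eligible)  # assumption: uniform over species
        seq = genomes[species]
        # randint is inclusive on both ends, so the last valid start is allowed.
        start = rng.randint(0, len(seq) - segment_length)
        segments.append((species, start, seq[start:start + segment_length]))
    return segments


# Toy input: made-up sequences, far shorter than any real assembly.
toy_genomes = {
    "species_a": "ACGTACGTTAGCCGATAGCTAGCTAGGATCC",
    "species_b": "TTGACCGATAGGCTAACGTAGCTAGCAT",
    "species_c": "ACGT",  # too short for an 8-base segment, so never sampled
}

segments = sample_segments(toy_genomes, num_segments=5, segment_length=8, seed=42)
for species, start, subseq in segments:
    print(species, start, subseq)
```

Because no species is singled out as a reference here, every eligible genome can contribute segments; in a real pipeline each sampled segment would then have to be located in the other genomes before a gene tree could be built, a step this sketch does not attempt.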