Dryad

A comparison of phylogenomic inference pipelines for low-coverage whole-genome sequencing in Formica ants

Abstract

The rapid proliferation of whole-genome sequences (WGS), often with relatively low read depth, offers an unprecedented opportunity for phylogenomic advances using publicly available data, but applying these data poses several key challenges. Using low-coverage WGS data for Formica ants, we conducted detailed comparisons of two analytical pipelines (reference-based vs. de novo genome assembly), four types of datasets (5kbp-window, ultra-conserved element [UCE], single-copy ortholog [BUSCO], and mitogenome), and a series of analytical procedures (e.g., concatenation vs. coalescent analyses) to identify which are robust to typical WGS data. The results show that, at the shallow phylogenetic scale of closely related species, 5kbp-windows from the reference-based pipeline and UCEs from the de novo assemblies are more advantageous than BUSCOs in recovering informative markers for phylogenetic inference. Compared to concatenation analyses, coalescent analyses often yielded different relationships at deeper nodes of the phylogeny. This study uncovers clear mito-nuclear discordance and demonstrates genome-wide conflicts in phylogenetic signal among genes, both pointing to possible incomplete lineage sorting and/or hybridization during the early, rapid radiation of Formica ants. Divergence dating analyses show that different types of data often produced inconsistent time estimates, with older ages estimated for deep nodes using the mitogenomic and 5kbp-window datasets. Taxon sampling that covers the diversity of a lineage is essential for accurately estimating its divergence time. The strengths and weaknesses of the different analytical pipelines and strategies are discussed. Findings from this study provide valuable insights for large-scale phylogenomic projects using WGS data.