Dryad

Lpnet: Reconstructing phylogenetic networks from distances using integer linear programming

Cite this dataset

Guo, Mengzhen; Grünewald, Stefan (2022). Lpnet: Reconstructing phylogenetic networks from distances using integer linear programming [Dataset]. Dryad. https://doi.org/10.5061/dryad.51c59zwc3

Abstract

We present Lpnet, a variant of the widely used Neighbor-net method that approximates pairwise distances between taxa by a circular phylogenetic network. We first apply standard methods to construct a binary phylogenetic tree and then use integer linear programming to compute optimal circular orderings that agree with all tree splits. This approach achieves an improved approximation of the input distance for the clear majority of experiments that we have run for simulated and real data. We release an implementation in R that can handle up to 94 taxa and usually needs about one minute on a standard computer for 80 taxa. For larger taxa sets, we include a top-down heuristic which also tends to perform better than Neighbor-net.

Methods

Two data sets were used in our paper on the Lpnet algorithm.

Random matrix:

10,000 distance matrices for 30 taxa, with pairwise distances drawn as random numbers between 0 and 1 from the uniform distribution. For each matrix, we then add to all distances the smallest constant that guarantees the triangle inequality. We provide all 10,000 distance matrices.
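The construction above can be sketched as follows. This is a minimal illustration in Python, not the code used to produce the deposited files. It assumes that the constant is added to off-diagonal entries only (so the diagonal stays 0) and that the constant is never negative; the function names are hypothetical. For distances d, the triangle inequality d(i,k) <= d(i,j) + d(j,k) still holds after adding c to every distance exactly when c >= d(i,k) - d(i,j) - d(j,k) for all triples, so the smallest valid c is the maximum of that expression.

```python
import itertools
import random


def random_metric_matrix(n, rng):
    """Return (matrix, constant) for n taxa.

    Assumption: the shift is applied to off-diagonal entries only and is
    clamped at 0; the source does not spell out these details.
    """
    # Symmetric matrix with zero diagonal and uniform random distances.
    d = [[0.0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        d[i][j] = d[j][i] = rng.random()

    # Smallest c with d[i][k] + c <= (d[i][j] + c) + (d[j][k] + c)
    # for all distinct i, j, k, i.e. c >= d[i][k] - d[i][j] - d[j][k].
    c = 0.0
    for i, j, k in itertools.permutations(range(n), 3):
        c = max(c, d[i][k] - d[i][j] - d[j][k])

    # Shift every pairwise distance by c.
    for i, j in itertools.combinations(range(n), 2):
        d[i][j] += c
        d[j][i] += c
    return d, c


def satisfies_triangle_inequality(d, tol=1e-12):
    """Check d[i][k] <= d[i][j] + d[j][k] for all distinct triples."""
    n = len(d)
    return all(
        d[i][k] <= d[i][j] + d[j][k] + tol
        for i, j, k in itertools.permutations(range(n), 3)
    )


if __name__ == "__main__":
    matrix, shift = random_metric_matrix(30, random.Random(1))
    print("constant added:", shift)
    print("metric:", satisfies_triangle_inequality(matrix))
```

For 30 taxa the triple loop visits 30 * 29 * 28 = 24,360 ordered triples, so a brute-force search is cheap at this size.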

Simulation sequences:

We randomly generate 10,000 trees for 30 taxa using the function sim.taxa from the R package TreeSimGM. We let the waiting time until speciation (a parameter of sim.taxa) be exponentially distributed and normalize the maximum pairwise distance to 1 for every simulated tree. We then use the software Dawg to simulate DNA sequences of length 10,000 bp from the simulated trees under the Jukes-Cantor model. We provide the original simulated trees, the distance matrices, and the simulated sequences.
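Two of the steps above lend themselves to a short sketch: rescaling a distance matrix so that its maximum pairwise distance is 1, and estimating a distance between two aligned sequences under the Jukes-Cantor model. The Python below is illustrative only. The function names are hypothetical, and the use of the standard Jukes-Cantor correction d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites, is an assumption; the description does not state how the deposited distance matrices were computed.

```python
import math


def normalize_max_distance(d):
    """Scale a distance matrix so its largest entry equals 1."""
    scale = max(max(row) for row in d)
    return [[x / scale for x in row] for row in d]


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned, equal-length sequences.

    Assumption: gaps and ambiguity codes are absent, so every site is
    compared directly.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    p = mismatches / len(seq_a)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined at saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(normalize_max_distance([[0.0, 2.0], [2.0, 0.0]]))
    print(jc69_distance("ACGTACGT", "ACGTACGA"))
```

With sequences of 10,000 bp the observed proportion p is usually well below the saturation point of 0.75, so the infinite-distance branch is a guard rather than an expected case.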

Usage notes

All data are provided as .txt files, which can be opened with any text editor.