PhyloFusion - Fast and easy fusion of rooted phylogenetic trees into rooted phylogenetic networks

Zhang, Louxin; Cetinkaya, Banu; Huson, Daniel

Published Jul 10, 2025 on Dryad. https://doi.org/10.5061/dryad.k3j9kd5h5

Data files

Jul 10, 2025 version files (3.54 MB in total):

Background.zip - 4.78 KB
Figure_10.zip - 858.07 KB
Figure_11.zip - 496.14 KB
Figure_2.zip - 1.32 MB
README.md - 3.82 KB
Table_1.zip - 850.63 KB

Abstract

Unrooted phylogenetic networks are commonly used to represent evolutionary data in the presence of incompatibilities. While rooted phylogenetic networks provide a more explicit framework for depicting evolutionary histories involving reticulate events, they remain rarely reported in biological literature, likely due to the lack of widely adopted computational tools.

Here, we introduce PhyloFusion, a fast and user-friendly method for constructing rooted phylogenetic networks from sets of rooted phylogenetic trees. The algorithm accommodates trees with unresolved nodes—often resulting from the contraction of low-support edges—as well as some degree of missing taxa. We demonstrate its application to the analysis of functionally related gene groups and show that it can efficiently handle datasets comprising tens of trees and hundreds of taxa.


Description of the data and file structure

Background.zip: contains a single file "Background.tre" that represents the large background tree from which the problem instances used in Figure 10 were generated, in Newick format.

Figure_2.zip: For each of the six named gene classes (atp, ndh, pet, psa, psb, photosynthesis), we provide three files:

name.stree6 - SplitsTree 6 file containing PhyloFusion analysis of gene "name"
name.tre - the input trees for the named gene set
name.pdf - the PDF showing the network

The directory also contains a SplitsTree 6 file called "all-iqtree.stree6" that contains all 78 gene trees computed using IQ-TREE.

Figure_10.zip: Contains two subfolders, "phylofusion" and "fhynch", each with 12 output files that include the input trees and the output networks. File names follow the pattern p_tX_cY_mZ.txt, where p denotes PhyloFusion (replaced accordingly for FHyNCH), tX gives the number of input trees, cY the percentage of contracted internal edges, and mZ the percentage of missing taxa. Each parameter setting was run with 10 replicates, and the plot in the manuscript shows the average hybridization number over these runs.
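The naming pattern described above can be decoded programmatically. The following is a minimal sketch, not part of the dataset: the function name parse_run_name and the RunSettings type are hypothetical, and the pattern accepts any alphabetic tool prefix because the README states only that "p" stands for PhyloFusion.

```python
import re
from typing import NamedTuple, Optional

# Matches names of the form <tool>_tX_cY_mZ.txt, e.g. "p_t10_c20_m5.txt".
# The example values are illustrative; only the prefix "p" (PhyloFusion)
# is documented, so any alphabetic prefix is accepted here (assumption).
_RUN_NAME = re.compile(
    r"^(?P<tool>[A-Za-z]+)_t(?P<trees>\d+)_c(?P<contracted>\d+)_m(?P<missing>\d+)\.txt$"
)


class RunSettings(NamedTuple):
    tool: str            # tool prefix, "p" for PhyloFusion
    trees: int           # number of input trees (tX)
    contracted_pct: int  # percentage of contracted internal edges (cY)
    missing_pct: int     # percentage of missing taxa (mZ)


def parse_run_name(filename: str) -> Optional[RunSettings]:
    """Return the settings encoded in a file name, or None if it does not match."""
    m = _RUN_NAME.match(filename)
    if m is None:
        return None
    return RunSettings(
        m["tool"], int(m["trees"]), int(m["contracted"]), int(m["missing"])
    )
```

Returning None for non-matching names makes it easy to skip unrelated files when iterating over a folder.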

Figure_11.zip: Contains three files with input trees and output networks for PhyloFusion, FHyNCH, and TreeKnit. The taxa range from 10 to 400, with each tool run once for each number of taxa, using 2 input trees per run.

Table_1.zip: Contains nine CSV files, named mX_cY.cvs (with X=0,10,20 and Y=0,10,20), each with 54 rows and the following nine columns:

taxa - number of taxa
reticulation - number of reticulations
trees - number of input trees
missing - percent missing taxa - this equals the value of X in the file name mX_cY.cvs
contracted - percent contracted edges - this equals the value of Y in the file name mX_cY.cvs
LGT_number - replicate id used by Bernardini et al., 2024.
H_PhyloFusion - hybridization number achieved by PhyloFusion
time_PhyloFusion - number of seconds wall-clock time used by PhyloFusion
H_Fhynch - hybridization number achieved by FHyNCH (reported in Bernardini et al., 2024)
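
The columns listed above are enough to compare the two tools row by row. The sketch below is an illustration only: it assumes that the files are comma-separated with a header row carrying exactly these column names, the function names are hypothetical, and the two example rows are made up rather than taken from the dataset.

```python
import csv
import io


def compare_hybridization(rows):
    """Count rows where PhyloFusion's hybridization number is lower than,
    equal to, or higher than the one reported for FHyNCH."""
    lower = equal = higher = 0
    for row in rows:
        h_pf = float(row["H_PhyloFusion"])
        h_fh = float(row["H_Fhynch"])
        if h_pf < h_fh:
            lower += 1
        elif h_pf == h_fh:
            equal += 1
        else:
            higher += 1
    return {"phylofusion_lower": lower, "equal": equal, "phylofusion_higher": higher}


def summarise_csv(handle):
    """Read an open text handle and summarise it (header row assumed)."""
    return compare_hybridization(csv.DictReader(handle))


# Made-up rows for illustration only; they are not values from the dataset.
example = io.StringIO(
    "taxa,reticulation,trees,missing,contracted,LGT_number,"
    "H_PhyloFusion,time_PhyloFusion,H_Fhynch\n"
    "20,10,20,0,20,1,9,0.5,10\n"
    "20,10,20,0,20,2,11,0.6,11\n"
)
print(summarise_csv(example))
```

To apply this to one of the nine files, open it in text mode with newline="" and pass the handle to summarise_csv.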

The input data is a subset of the dataset simulated in FHyNCH (Bernardini et al., 2024).

In addition, there are two subdirectories.

The subdirectory "PhyloFusion Networks" contains one file per input datapoint indicated in the CSV files. The naming convention follows that used in Bernardini et al. (2024). For example, tree_set_newick_L20_R10_T20_MisL0_ConE20_LGT_1 contains the network computed for the dataset that has 20 taxa (L20), 10 reticulations (R10), 20 trees (T20), 0% missing taxa (MisL0), 20% contracted edges (ConE20) and replicate id 1 (LGT_1).
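
The fields encoded in such a network file name can be extracted as follows. This is a minimal sketch under the assumption that every name contains the six fields in the order shown in the example; the function names and dictionary keys are hypothetical. The second helper derives the mX_cY stem of the matching CSV file from the missing and contracted percentages.

```python
import re

# Field order taken from the README example:
# tree_set_newick_L20_R10_T20_MisL0_ConE20_LGT_1
_NETWORK_NAME = re.compile(
    r"L(?P<taxa>\d+)_R(?P<reticulations>\d+)_T(?P<trees>\d+)"
    r"_MisL(?P<missing_pct>\d+)_ConE(?P<contracted_pct>\d+)_LGT_(?P<replicate>\d+)"
)


def parse_network_name(name):
    """Return the six integer fields encoded in a network file name."""
    m = _NETWORK_NAME.search(name)
    if m is None:
        raise ValueError(f"unrecognised network file name: {name!r}")
    return {key: int(value) for key, value in m.groupdict().items()}


def csv_stem(fields):
    """Stem (without extension) of the CSV file that lists this datapoint."""
    return f"m{fields['missing_pct']}_c{fields['contracted_pct']}"


fields = parse_network_name("tree_set_newick_L20_R10_T20_MisL0_ConE20_LGT_1")
print(fields, csv_stem(fields))
```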

The subdirectory "plots" contains nine plots (PNG format) in which the hybridization numbers achieved by PhyloFusion and FHyNCH are compared. The naming convention is that file m10_c20.png contains a scatterplot comparison on all datasets that have 10% missing taxa (m10) and 20% contracted edges (c20).

Data was derived from the following sources:

Trees in SplitsTree input files: Computed from alignments obtained from: https://zenodo.org/record/2613673
Trees used in Table 1: FHyNCH (Bernardini et al., 2024)

Code/Software

The program SplitsTree 6 was used to run the PhyloFusion algorithm (Huson and Bryant, 2024).

The SplitsTree program tools/sample-trees was used to sample trees from the source tree given in Background.tre.

The program IQ-TREE (version 2.3.6, Minh et al., 2020) was used to compute the trees in Figure_2.zip.
