PhyloFusion - Fast and easy fusion of rooted phylogenetic trees into rooted phylogenetic networks
Data files
Jul 10, 2025 version (files: 3.54 MB)
- Background.zip (4.78 KB)
- Figure_10.zip (858.07 KB)
- Figure_11.zip (496.14 KB)
- Figure_2.zip (1.32 MB)
- README.md (3.82 KB)
- Table_1.zip (850.63 KB)
Abstract
Unrooted phylogenetic networks are commonly used to represent evolutionary data in the presence of incompatibilities. While rooted phylogenetic networks provide a more explicit framework for depicting evolutionary histories involving reticulate events, they remain rarely reported in biological literature, likely due to the lack of widely adopted computational tools.
Here, we introduce PhyloFusion, a fast and user-friendly method for constructing rooted phylogenetic networks from sets of rooted phylogenetic trees. The algorithm accommodates trees with unresolved nodes—often resulting from the contraction of low-support edges—as well as some degree of missing taxa. We demonstrate its application to the analysis of functionally related gene groups and show that it can efficiently handle datasets comprising tens of trees and hundreds of taxa.
https://doi.org/10.5061/dryad.k3j9kd5h5
Description of the data and file structure
Background.zip: Contains a single file, "Background.tre", which holds (in Newick format) the large background tree from which the problem instances used in Figure 10 were generated.
Figure_2.zip: For each of the six named gene classes (atp, ndh, pet, psa, psb, photosynthesis), we provide three files:
- name.stree6 - SplitsTree 6 file containing the PhyloFusion analysis of gene class "name"
- name.tre - the input trees for the named gene set
- name.pdf - the PDF showing the network
The directory also contains a SplitsTree 6 file called "all-iqtree.stree6" that contains all 78 gene trees computed using IQ-TREE.
Figure_10.zip: Contains two subfolders, "phylofusion" and "fhynch", each with 12 output files that include input trees and output networks. File names follow the pattern p_tX_cY_mZ.txt, where p denotes PhyloFusion (replaced accordingly for FHyNCH), tX gives the number of input trees, cY the percentage of contracted internal edges, and mZ the percentage of missing taxa. Each parameter setting was run with 10 replicates, and the final plot in the manuscript shows the average hybridization number over these runs.
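The naming pattern described above can be decoded mechanically. The following is a minimal sketch, not part of the deposited data: the function name `parse_figure10_name` is hypothetical, the tool prefix is assumed to be a short run of letters (the prefix used for FHyNCH is not stated here), and the file name in the usage note is an invented example rather than a file known to be in the archive.

```python
import re
from typing import NamedTuple


class Figure10Name(NamedTuple):
    tool_prefix: str      # "p" for PhyloFusion; other prefixes are an assumption
    num_trees: int        # X in tX
    pct_contracted: int   # Y in cY
    pct_missing: int      # Z in mZ


# Assumed shape: <letters>_t<digits>_c<digits>_m<digits>.txt
_PATTERN = re.compile(
    r"^(?P<prefix>[A-Za-z]+)_t(?P<t>\d+)_c(?P<c>\d+)_m(?P<m>\d+)\.txt$"
)


def parse_figure10_name(filename: str) -> Figure10Name:
    """Split a name such as p_tX_cY_mZ.txt into its four components."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return Figure10Name(
        match["prefix"], int(match["t"]), int(match["c"]), int(match["m"])
    )
```

For instance, `parse_figure10_name("p_t10_c20_m5.txt")` would yield prefix "p", 10 input trees, 20% contracted edges, and 5% missing taxa; names that do not fit the assumed pattern raise a `ValueError` rather than being guessed at.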
Figure_11.zip: Contains three files with input trees and output networks for PhyloFusion, FHyNCH, and TreeKnit. The number of taxa ranges from 10 to 400, with each tool run once for each number of taxa, using 2 input trees per run.
Table_1.zip: Contains nine CSV files, named mX_cY.cvs (with X = 0, 10, 20 and Y = 0, 10, 20), each with 54 rows and the following nine columns:
- taxa - number of taxa
- reticulation - number of reticulations
- trees - number of input trees
- missing - percent missing taxa; this equals the value of X in the file name mX_cY.cvs
- contracted - percent contracted edges; this equals the value of Y in the file name mX_cY.cvs
- LGT_number - replicate id used by Bernardini et al., 2024.
- H_PhyloFusion - hybridization number achieved by PhyloFusion
- time_PhyloFusion - number of seconds wall-clock time used by PhyloFusion
- H_Fhynch - hybridization number achieved by FHyNCH (reported in Bernardini et al., 2024)
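As an illustration of how the nine columns listed above might be used, here is a minimal sketch that compares the two hybridization-number columns. It assumes the files are comma-separated with a header row whose names match the column list exactly; the function name `summarize` is hypothetical, and the two data rows in `SAMPLE` are placeholders for demonstration only, not values taken from the dataset.

```python
import csv
import io
from statistics import mean

# Placeholder rows for demonstration only; NOT values from the dataset.
SAMPLE = """\
taxa,reticulation,trees,missing,contracted,LGT_number,H_PhyloFusion,time_PhyloFusion,H_Fhynch
20,10,20,0,20,1,12,0.5,13
20,10,20,0,20,2,11,0.4,11
"""


def summarize(handle) -> dict:
    """Compare H_PhyloFusion with H_Fhynch over all rows of one table."""
    rows = list(csv.DictReader(handle))
    # Negative difference: PhyloFusion reported the smaller hybridization number.
    diffs = [float(r["H_PhyloFusion"]) - float(r["H_Fhynch"]) for r in rows]
    return {
        "rows": len(rows),
        "mean_difference": mean(diffs),
        "phylofusion_le_fhynch": sum(d <= 0 for d in diffs),
        "total_seconds": sum(float(r["time_PhyloFusion"]) for r in rows),
    }


summary = summarize(io.StringIO(SAMPLE))
```

To apply the same function to one of the deposited tables, open the file in text mode (for example `open(path, newline="")`) and pass the handle to `summarize`; if the files use a different delimiter, the `csv.DictReader` call would need a matching `delimiter` argument.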
The input data is a subset of the dataset simulated in FHyNCH (Bernardini et al., 2024).
In addition, there are two subdirectories.
The subdirectory "PhyloFusion Networks" contains one file per input datapoint indicated in the CSV files. The naming convention follows that used in Bernardini et al. (2024). For example, tree_set_newick_L20_R10_T20_MisL0_ConE20_LGT_1 contains the network computed for the dataset that has 20 taxa (L20), 10 reticulations (R10), 20 trees (T20), 0% missing taxa (MisL0), 20% contracted edges (ConE20) and replicate id 1 (LGT_1).
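The network file names can be mapped back to the CSV columns with a small parser. This is a sketch under the assumption that every name follows the pattern of the example above; the function name `parse_dataset_name` is hypothetical, and any file extension or suffix after the replicate id is ignored because it is not specified here.

```python
import re

# Field names mirror the CSV column names of Table_1.zip.
_NAME = re.compile(
    r"^tree_set_newick"
    r"_L(?P<taxa>\d+)"
    r"_R(?P<reticulation>\d+)"
    r"_T(?P<trees>\d+)"
    r"_MisL(?P<missing>\d+)"
    r"_ConE(?P<contracted>\d+)"
    r"_LGT_(?P<LGT_number>\d+)"
)


def parse_dataset_name(name: str) -> dict[str, int]:
    """Decode the parameters encoded in a network file name."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected dataset name: {name!r}")
    return {key: int(value) for key, value in match.groupdict().items()}


params = parse_dataset_name("tree_set_newick_L20_R10_T20_MisL0_ConE20_LGT_1")
# params == {"taxa": 20, "reticulation": 10, "trees": 20,
#            "missing": 0, "contracted": 20, "LGT_number": 1}
```

Because the keys match the CSV column names, the resulting dictionary can be used to look up the corresponding row in the table for the same missing and contracted percentages.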
The subdirectory "plots" contains nine plots (PNG format) in which the hybridization numbers achieved by PhyloFusion and FHyNCH are compared. The naming convention is that file m10_c20.png contains a scatterplot comparison on all datasets that have 10% missing taxa (m10) and 20% contracted edges (c20).
Data was derived from the following sources:
- Trees in SplitsTree input files: Computed from alignments obtained from: https://zenodo.org/record/2613673
- Trees used in Table 1: FHyNCH (Bernardini et al., 2024)
Code/Software
The program SplitsTree 6 was used to run the PhyloFusion algorithm (Huson and Bryant, 2024).
The SplitsTree program tools/sample-trees was used to sample trees from the source tree given in Background.tre.
The program IQ-TREE (version 2.3.6, Minh et al., 2020) was used to compute the trees in Figure_2.zip.
The sequence data for Figure 2 were downloaded from Gruenstaeudl, Plant Systematics and Evolution (2019), and trees were then computed using IQ-TREE (Minh et al., 2020). Phylogenetic networks were computed using our implementation of the PhyloFusion method.
Synthetic trees used in Figures 10 and 11 were generated using the SplitsTree app tool 'sample-trees' (Huson and Bryant, 2024).
Data used in Figure 12 is from Bernardini et al. (2024).