Detection of ghost introgression requires exploiting topological and branch length information
Data files
Jan 09, 2024 version files (824.16 MB total):
- data.tar.gz (818.15 MB)
- README.md (1.26 KB)
- Supplementary_materials_to_Pang_and_Zhang.pdf (6 MB)
Abstract
In recent years, the study of hybridization and introgression has made significant progress, with ghost introgression (the transfer of genetic material from extinct or unsampled lineages to extant species) emerging as a key area for research. Accurately identifying ghost introgression, however, presents a challenge. To address this issue, we focused on simple cases involving three species with a known phylogenetic tree. Using mathematical analyses and simulations, we evaluated the performance of popular phylogenetic methods, including HyDe and PhyloNet/MPL, and the full-likelihood method, Bayesian Phylogenetics and Phylogeography (BPP), in detecting ghost introgression. Our findings suggest that heuristic approaches relying on site-pattern counts or gene-tree topologies struggle to differentiate ghost introgression from introgression between sampled non-sister species, frequently leading to incorrect identification of donor and recipient species. By contrast, the full-likelihood method BPP, which uses multilocus sequence alignments directly and hence takes into account both gene-tree topologies and branch lengths, is capable of detecting ghost introgression in phylogenomic datasets. We analyzed a real-world phylogenomic dataset of 14 species of Jaltomata (Solanaceae) to showcase the potential of full-likelihood methods for accurate inference of introgression.
README: Detection of Ghost Introgression Requires Exploiting Topological and Branch Length Information
The dataset includes a supplementary materials file and data files for the simulated and empirical results, organized into three categories:
- script: this folder contains scripts for simulations and HyDe_jackknife.
- simulated results: this folder contains the output of simulations using HyDe, PhyloNet/MPL, and BPP.
- empirical results: this folder contains the results for the real-world datasets of Thuja and Jaltomata.
  - Thuja:
    - sequence: multiple sequence alignments and constructed gene trees.
    - HyDe: the analysis files for HyDe.
    - PhyloNet: the analysis files for PhyloNet/MPL.
    - BPP: model (inflow/outflow/ghost) comparison results using BPP.
  - Jaltomata:
    - sequence: multiple sequence alignments for HyDe and BPP, as well as the corresponding constructed gene trees for PhyloNet/MPL.
    - net-event1/net-event2/net-event3: the analysis results for the three gene flow events labelled in Figure 6.
    - HyDe: the analysis files for HyDe.
    - PhyloNet: the analysis files for PhyloNet/MPL.
    - BPP: model (inflow/outflow/ghost) comparison results using BPP.
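
Because data.tar.gz is large, it can help to check the folder layout described above before unpacking anything. The following is a minimal sketch using only the Python standard library. The function name `summarize_archive` and the `depth` parameter are illustrative and not part of the dataset. The sketch also assumes nothing about whether the archive wraps its contents in an extra top-level directory, so increase `depth` if it does.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(archive_path, depth=1):
    """Count regular files per leading folder in a tar archive.

    `depth` is the number of leading path components used as the
    grouping key (hypothetical helper, not shipped with the dataset).
    """
    counts = Counter()
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other entries
            parts = PurePosixPath(member.name).parts
            key = "/".join(parts[:depth])
            counts[key] += 1
    return dict(counts)
```

Calling `summarize_archive("data.tar.gz")` returns a dictionary mapping each leading folder name to the number of files under it, without extracting any files to disk.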