Detection of ghost introgression requires exploiting topological and branch length information
Cite this dataset
Pang, Xiaoxu; Zhang, Da-Yong (2024). Detection of ghost introgression requires exploiting topological and branch length information [Dataset]. Dryad. https://doi.org/10.5061/dryad.zs7h44jfz
Abstract
In recent years, the study of hybridization and introgression has made significant progress, and ghost introgression (the transfer of genetic material from extinct or unsampled lineages to extant species) has emerged as a key area of research. Accurately identifying ghost introgression, however, remains challenging. To address this issue, we focused on simple cases involving three species with a known phylogenetic tree. Using mathematical analyses and simulations, we evaluated how well popular phylogenetic methods, including HyDe and PhyloNet/MPL, and the full-likelihood method Bayesian Phylogenetics and Phylogeography (BPP) detect ghost introgression. Our findings suggest that heuristic approaches relying on site-pattern counts or gene-tree topologies struggle to distinguish ghost introgression from introgression between sampled non-sister species, and they frequently misidentify the donor and recipient species. By contrast, the full-likelihood method BPP analyzes multilocus sequence alignments directly and therefore takes into account both gene-tree topologies and branch lengths. It is capable of detecting ghost introgression in phylogenomic datasets. We analyzed a real-world phylogenomic dataset of 14 species of Jaltomata (Solanaceae) to showcase the potential of full-likelihood methods for accurate inference of introgression.
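To make "site-pattern counts" concrete, here is a minimal sketch of how such counts can be tallied for a four-taxon alignment (an outgroup plus three ingroup taxa). It assumes the alignment is a Python dict of equal-length strings. The function name `count_quartet_patterns`, the taxon labels O/P1/H/P2 and the A/B relabelling convention are illustrative assumptions. This is not HyDe's API or the authors' code. Of the biallelic patterns, the three 2+2 splits (AABB, ABAB, ABBA) are the parsimony-informative ones whose relative frequencies site-pattern methods compare.

```python
from collections import Counter
from typing import Dict, Sequence

INFORMATIVE = ("AABB", "ABAB", "ABBA")  # the three 2+2 splits of four taxa


def count_quartet_patterns(seqs: Dict[str, str], order: Sequence[str]) -> Counter:
    """Count biallelic site patterns for four taxa listed in `order`.

    Each column is relabelled so the first taxon in `order` carries 'A' and
    the other allele is 'B'. Columns with gaps, ambiguity codes, or not
    exactly two distinct bases are skipped. (Illustrative helper only.)
    """
    if len(order) != 4:
        raise ValueError("expected exactly four taxa")
    if len({len(seqs[name]) for name in order}) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    counts: Counter = Counter()
    for column in zip(*(seqs[name].upper() for name in order)):
        if any(base not in "ACGT" for base in column):
            continue  # gap or ambiguity code
        if len(set(column)) != 2:
            continue  # monomorphic or multiallelic site
        first = column[0]
        pattern = "".join("A" if base == first else "B" for base in column)
        counts[pattern] += 1
    return counts


if __name__ == "__main__":
    # Toy alignment (made up): hypothetical outgroup O, then three ingroup taxa.
    toy = {"O": "AAATA", "P1": "ACCT-", "H": "GACTA", "P2": "GCATA"}
    counts = count_quartet_patterns(toy, ["O", "P1", "H", "P2"])
    print({p: counts[p] for p in INFORMATIVE})
```

Counts like these summarize the alignment site by site. The abstract's point is that such summaries, unlike the full likelihood used by BPP, can fail to separate ghost introgression from introgression between sampled non-sister species.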
README: Detection of Ghost Introgression Requires Exploiting Topological and Branch Length Information
The dataset includes a supplemental material text file as well as data files for the simulated and empirical results, organized into three categories:
- script: this folder contains scripts for simulations and HyDe_jackknife.
- simulated results: this folder contains the output of simulations using HyDe, PhyloNet/MPL, and BPP.
- empirical results: this folder contains the results for the real-world datasets of Thuja and Jaltomata.
  - Thuja:
    - sequence: multiple sequence alignments and the constructed gene trees.
    - HyDe: the analysis files for HyDe.
    - PhyloNet: the analysis files for PhyloNet/MPL.
    - BPP: model (inflow/outflow/ghost) comparison results using BPP.
  - Jaltomata:
    - sequence: multiple sequence alignments for HyDe and BPP, as well as the corresponding constructed gene trees for PhyloNet/MPL.
    - net-event1/net-event2/net-event3: the analysis results for the three gene flow events labelled in Figure 6.
    - HyDe: the analysis files for HyDe.
    - PhyloNet: the analysis files for PhyloNet/MPL.
    - BPP: model (inflow/outflow/ghost) comparison results using BPP.
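The script folder includes a HyDe_jackknife script whose contents are not described in this README. Judging only from its name, it may resample over loci. The sketch below assumes that and shows the generic leave-one-locus-out (delete-one) jackknife: recompute a statistic with each locus removed, then combine the replicates into a standard error. `jackknife_se` and the example statistic `pooled_abba_fraction` are hypothetical helpers, not the deposited script.

```python
import math
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def jackknife_se(
    loci: Sequence[T], statistic: Callable[[Sequence[T]], float]
) -> Tuple[float, float]:
    """Delete-one jackknife over loci (hypothetical helper, not the deposited script).

    Returns the statistic computed on all loci and its jackknife standard error.
    """
    n = len(loci)
    if n < 2:
        raise ValueError("need at least two loci for a jackknife")
    full = statistic(loci)
    # Recompute the statistic n times, each time leaving one locus out.
    replicates = [statistic(list(loci[:i]) + list(loci[i + 1:])) for i in range(n)]
    mean_rep = sum(replicates) / n
    # Delete-one jackknife variance: (n - 1)/n * sum of squared deviations.
    variance = (n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)
    return full, math.sqrt(variance)


def pooled_abba_fraction(loci: Sequence[Tuple[int, int]]) -> float:
    """Illustrative statistic: pooled ABBA / (ABBA + ABAB) from per-locus (abba, abab) counts."""
    abba = sum(a for a, _ in loci)
    abab = sum(b for _, b in loci)
    return abba / (abba + abab)


if __name__ == "__main__":
    per_locus_counts = [(12, 10), (8, 9), (15, 11), (10, 10)]  # made-up toy counts
    estimate, se = jackknife_se(per_locus_counts, pooled_abba_fraction)
    print(f"estimate={estimate:.3f}, jackknife SE={se:.3f}")
```

Resampling whole loci rather than individual sites keeps linked sites together, which is the usual reason to jackknife at the locus level.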
Funding
National Natural Science Foundation of China