Dryad

Detection of ghost introgression requires exploiting topological and branch length information

Cite this dataset

Pang, Xiaoxu; Zhang, Da-Yong (2024). Detection of ghost introgression requires exploiting topological and branch length information [Dataset]. Dryad. https://doi.org/10.5061/dryad.zs7h44jfz

Abstract

In recent years, the study of hybridization and introgression has made significant progress, and ghost introgression (the transfer of genetic material from extinct or unsampled lineages to extant species) has emerged as a key area of research. Accurately identifying ghost introgression, however, is challenging. To address this issue, we focused on simple cases involving three species with a known phylogenetic tree. Using mathematical analyses and simulations, we evaluated how well popular phylogenetic methods, including HyDe and PhyloNet/MPL, and the full-likelihood method Bayesian Phylogenetics and Phylogeography (BPP) detect ghost introgression. Our findings suggest that heuristic approaches relying on site-pattern counts or gene-tree topologies struggle to distinguish ghost introgression from introgression between sampled non-sister species, and they frequently misidentify the donor and recipient species. By contrast, the full-likelihood method BPP, which uses multilocus sequence alignments directly and hence takes into account both gene-tree topologies and branch lengths, is capable of detecting ghost introgression in phylogenomic datasets. We analyzed a real-world phylogenomic dataset of 14 Jaltomata (Solanaceae) species to showcase the potential of full-likelihood methods for accurate inference of introgression.
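As a rough illustration of the kind of site-pattern statistic the abstract refers to, the sketch below computes Patterson's D (the ABBA-BABA statistic) for taxa P1, P2, P3 and an outgroup, with a delete-one-locus jackknife standard error. This is a swapped-in, simpler statistic, not HyDe's test. The jackknife scheme is also an assumption: the dataset's HyDe_jackknife script may resample differently. The input format (one aligned haplotype string per taxon per locus) and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def count_abba_baba(p1: str, p2: str, p3: str, out: str) -> Tuple[int, int]:
    """Count ABBA and BABA sites in one aligned locus (one haplotype per taxon).

    The outgroup allele is treated as the ancestral state; only sites that are
    biallelic across the four taxa and free of gaps/ambiguity codes are counted.
    """
    if not len(p1) == len(p2) == len(p3) == len(out):
        raise ValueError("all four sequences must have the same aligned length")
    abba = baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), out.upper()):
        site = (a, b, c, o)
        if any(x not in "ACGT" for x in site) or len(set(site)) != 2:
            continue
        if a == o and b != o and c != o:      # P1 ancestral, P2 and P3 derived
            abba += 1
        elif a != o and b == o and c != o:    # P1 and P3 derived, P2 ancestral
            baba += 1
    return abba, baba


def d_statistic(abba: int, baba: int) -> float:
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA); 0.0 if no informative sites."""
    total = abba + baba
    return (abba - baba) / total if total else 0.0


def jackknife_d(loci: Sequence[Tuple[str, str, str, str]]) -> Tuple[float, float, float]:
    """Return (D, jackknife SE, Z) using a delete-one-locus jackknife."""
    n = len(loci)
    if n < 2:
        raise ValueError("need at least two loci for a jackknife")
    counts = [count_abba_baba(*locus) for locus in loci]
    total_abba = sum(a for a, _ in counts)
    total_baba = sum(b for _, b in counts)
    d_all = d_statistic(total_abba, total_baba)
    # Recompute D with each locus left out in turn.
    loo = [d_statistic(total_abba - a, total_baba - b) for a, b in counts]
    mean_loo = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((d - mean_loo) ** 2 for d in loo))
    z = d_all / se if se > 0 else float("nan")
    return d_all, se, z
```

A nonzero D indicates that P3 shares derived alleles more often with one of P1 and P2 than with the other. The count alone does not say whether that excess came from P3 itself or from an unsampled lineage, which is the ambiguity the abstract describes for site-pattern methods.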

README: Detection of Ghost Introgression Requires Exploiting Topological and Branch Length Information

The dataset includes a supplemental material text file as well as data files for the simulated and empirical results, organized into three categories:

  1. script: this folder contains the scripts for the simulations and for HyDe_jackknife.
  2. simulated results: this folder contains the output of simulations using HyDe, PhyloNet/MPL, and BPP.
  3. empirical results: this folder contains the results for the real-world datasets of Thuja and Jaltomata:

    1. Thuja:

      sequence: multiple sequence alignments and constructed gene trees.

      HyDe: the analysis files for HyDe

      PhyloNet: the analysis files for PhyloNet/MPL

      BPP: model (inflow/outflow/ghost) comparison results using BPP

    2. Jaltomata:

      sequence: multiple sequence alignments for HyDe and BPP, as well as the corresponding constructed gene trees for PhyloNet/MPL

      net-event1/net-event2/net-event3: the analysis results for the three gene flow events labelled in Figure 6.

      HyDe: the analysis files for HyDe

      PhyloNet: the analysis files for PhyloNet/MPL

      BPP: model (inflow/outflow/ghost) comparison results using BPP
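PhyloNet/MPL works from estimated gene trees, and as we understand it its pseudo-likelihood is built from rooted triplets. A minimal sketch of that kind of topological summary, assuming rooted three-taxon Newick gene trees (branch lengths allowed, unresolved trees skipped), counts how often each of the three rooted topologies appears. Under the multispecies coalescent without gene flow, the two minor topologies are expected to be equally frequent, so an excess of one is the usual topological signal of introgression. The regex shortcut and all function names are hypothetical and are not part of PhyloNet.

```python
import re
from collections import Counter
from typing import FrozenSet, Iterable, Optional, Tuple

# Innermost "(x,y)" clade; tolerates an optional ":branch_length" after each name.
_CHERRY = re.compile(
    r"\(\s*([^(),:;]+?)\s*(?::[^(),;]*)?,\s*([^(),:;]+?)\s*(?::[^(),;]*)?\)"
)


def sister_pair(newick: str) -> Optional[FrozenSet[str]]:
    """Return the sister pair of a rooted three-taxon Newick tree, or None if unresolved."""
    match = _CHERRY.search(newick)
    if match is None:
        return None
    return frozenset((match.group(1).strip(), match.group(2).strip()))


def topology_counts(trees: Iterable[str], taxa: Tuple[str, str, str]) -> Counter:
    """Count how many gene trees support each of the three rooted topologies."""
    a, b, c = taxa
    counts = Counter({frozenset((a, b)): 0, frozenset((a, c)): 0, frozenset((b, c)): 0})
    for tree in trees:
        pair = sister_pair(tree)
        if pair in counts:  # skips unresolved trees and unexpected labels
            counts[pair] += 1
    return counts


def minor_topology_asymmetry(counts: Counter, taxa: Tuple[str, str, str]) -> float:
    """Assuming species tree ((A,B),C): (n_BC - n_AC) / (n_BC + n_AC); 0.0 if both are zero."""
    a, b, c = taxa
    n_ac = counts[frozenset((a, c))]
    n_bc = counts[frozenset((b, c))]
    total = n_ac + n_bc
    return (n_bc - n_ac) / total if total else 0.0
```

This summary discards branch lengths entirely, which is the information the abstract argues is needed to tell ghost introgression apart from introgression between sampled non-sister species.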
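The inflow/outflow/ghost comparison in the BPP folders is a model-selection step. The sketch below assumes the log marginal likelihood of each model has already been estimated, for example by thermodynamic integration; how the dataset's values were obtained is not stated here. From those values, posterior model probabilities under equal prior model probabilities and log Bayes factors follow directly. The dictionary input and function names are hypothetical, and the example numbers are placeholders, not results from this dataset.

```python
import math
from typing import Dict


def posterior_model_probs(log_ml: Dict[str, float]) -> Dict[str, float]:
    """Posterior model probabilities from log marginal likelihoods, equal model priors.

    Subtracts the largest value before exponentiating (log-sum-exp trick) so very
    negative log marginal likelihoods do not underflow to zero.
    """
    if not log_ml:
        raise ValueError("need at least one model")
    top = max(log_ml.values())
    weights = {model: math.exp(value - top) for model, value in log_ml.items()}
    norm = sum(weights.values())
    return {model: w / norm for model, w in weights.items()}


def log_bayes_factor(log_ml: Dict[str, float], model_1: str, model_0: str) -> float:
    """log BF_10 = log p(data | model_1) - log p(data | model_0)."""
    return log_ml[model_1] - log_ml[model_0]


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not results from this dataset.
    example = {"inflow": -10250.4, "outflow": -10253.1, "ghost": -10248.9}
    for model, prob in posterior_model_probs(example).items():
        print(f"{model}: {prob:.3f}")
    print("log BF ghost vs inflow:", log_bayes_factor(example, "ghost", "inflow"))
```

Working on the log scale matters here because marginal likelihoods of multilocus alignments are far too small to represent directly as floating-point numbers.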

Funding

National Natural Science Foundation of China