Data from: Efficient summary statistics for detecting lineage fusion from phylogeographic datasets

Cite this dataset

Garrick, Ryan; Hyseni, Chaz; Arantes, Ísis (2021). Data from: Efficient summary statistics for detecting lineage fusion from phylogeographic datasets [Dataset]. Dryad.


Aim: Lineage fusion (merging of two or more populations of a species resulting in a single panmictic group) is a special case of secondary contact. It has the potential to counteract diversification and speciation, or to facilitate it through creation of novel genotypes. Understanding the prevalence of lineage fusion in nature requires detecting it reliably, and this in turn requires efficient summary statistics. Here we report on simulations that characterized the initial intensity and subsequent decay of signatures of past fusion for 17 summary statistics applicable to DNA sequence haplotype data.

Location: Global.

Taxon: Diploid out-crossing species.

Methods: We considered a range of scenarios that could reveal the impacts of different combinations of read length versus number of loci (arrangement of DNA sequence data), and whether or not pre-fusion populations experienced bottlenecks coinciding with their divergence (historical context of fusion). Post-fusion gene pools were sampled along 10 successive time points representing increasing lag times following merging of sister populations, and summary statistic values were recalculated at each.

Results: Many summary statistics were able to detect signatures of complete merging of populations after a sampling lag time of 1.5 Ne generations, but the most informative ones included two neutrality tests and four diversity metrics, with ZnS being particularly powerful. Correlation was relatively low among the two neutrality tests and two of the diversity metrics. There were clear benefits of many short (200-bp × 200) loci over a handful of long (4-kb × 10) loci. Also, only the latter genetic dataset type showed impacts of bottlenecks during divergence upon the number of informative summary statistics.

Main conclusions: This work contributes to identifying cases of lineage fusion, and advances phylogeography by enabling more nuanced reconstructions of how individual species, or multiple members of an ecological community, responded to past environmental change.
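
For readers unfamiliar with ZnS, the statistic singled out in the Results, the sketch below shows one way it could be computed for a single locus. ZnS (Kelly 1997) is the mean squared correlation (r^2) of allele states over all pairs of segregating sites. This is an illustrative sketch only, not the authors' pipeline: the function name `zns` and the input encoding (one string of '0'/'1' characters per sampled haplotype, biallelic sites only) are assumptions made for this example.

```python
from itertools import combinations


def zns(haplotypes):
    """Mean r^2 over all pairs of segregating sites (Kelly's ZnS).

    haplotypes: list of equal-length strings of '0'/'1', one per sampled
    chromosome (hypothetical encoding, assumed for this sketch).
    Returns NaN when fewer than two sites are segregating.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    # Transpose to one column of 0/1 allele states per site.
    columns = [[int(h[i]) for h in haplotypes] for i in range(length)]
    # Keep only segregating sites (both alleles present in the sample).
    seg = [c for c in columns if 0 < sum(c) < n]
    if len(seg) < 2:
        return float("nan")

    total = 0.0
    for a, b in combinations(seg, 2):
        p_a = sum(a) / n                                  # frequency of '1' at site a
        p_b = sum(b) / n                                  # frequency of '1' at site b
        p_ab = sum(x & y for x, y in zip(a, b)) / n       # frequency of the '11' haplotype
        d = p_ab - p_a * p_b                              # linkage disequilibrium coefficient D
        total += d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

    n_pairs = len(seg) * (len(seg) - 1) / 2
    return total / n_pairs
```

With this encoding, a sample in which two sites are always inherited together gives a value of 1, and a sample in which all four two-site haplotypes are equally frequent gives 0. For a multi-locus dataset, the per-locus values would then need to be combined (for example, averaged), which is a further assumption not specified here.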


Simulated DNA sequence data, generated using the software DIY-ABC v2.1.0 (Cornuet et al. 2014, Bioinformatics, 30:1187–1189).

Usage notes

See "read me" files (README_for_Garrick_etal_2020_Simulated_Data.txt, and README_for_Garrick_etal_2020_Supplementary_Tables.txt).



National Science Foundation, Award: 1738817