Hierarchical heuristic species delimitation under the multispecies coalescent model with migration
Data files
Sep 13, 2023 version files 191.44 KB
- hhsd-testdata.zip
- README.md
Sep 10, 2024 version files 2.63 MB
- 2024Kornai-HHSD-Dryad.zip
- README.md
Abstract
The multispecies coalescent (MSC) model accommodates genealogical fluctuations across the genome and provides a natural framework for comparative analysis of genomic sequence data from closely related species to infer the history of species divergence and gene flow. Given a set of populations, hypotheses of species delimitation (and species phylogeny) may be formulated as instances of MSC models (e.g., MSC for one species versus MSC for two species) and compared using Bayesian model selection. This approach, implemented in the program bpp, has been found to be prone to over-splitting. Alternatively, heuristic criteria based on population parameters (such as population split times, population sizes, and migration rates) estimated from genomic data may be used to delimit species. Here we develop hierarchical merge and split algorithms for heuristic species delimitation based on the genealogical divergence index (gdi) and implement them in a Python pipeline called hhsd. We characterize the behavior of the gdi under a few simple scenarios of gene flow. We apply the new approaches to a dataset simulated under a model of isolation by distance as well as three empirical datasets. Our tests suggest that the new approaches produced sensible results and were less prone to over-splitting. We discuss possible strategies for accommodating paraphyletic species in the hierarchical algorithm, as well as the challenges of species delimitation based on heuristic criteria.
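For orientation, the genealogical divergence index is commonly written, for the MSC without gene flow, as gdi = 1 - exp(-2τ/θ), where τ is the population split time and θ is the population size parameter of the population in question. The sketch below evaluates only that closed form; the function name and the example values are illustrative assumptions, and it is not a description of how hhsd computes the index under migration.

```python
import math


def gdi_no_migration(tau, theta):
    """Genealogical divergence index under the MSC without gene flow.

    tau:   population split time (expected mutations per site)
    theta: population size parameter of the focal population
    Hypothetical helper for illustration; not part of hhsd.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    if tau < 0:
        raise ValueError("tau must be non-negative")
    # Probability that two sequences from the population coalesce
    # before reaching the split time tau (going backwards in time).
    return 1.0 - math.exp(-2.0 * tau / theta)


# Illustrative values only: a split that is old relative to theta
# gives an index near 1, a very recent split gives an index near 0.
old_split = gdi_no_migration(tau=0.01, theta=0.002)
recent_split = gdi_no_migration(tau=0.0001, theta=0.002)
```

The index lies between 0 and 1 and grows with the ratio τ/θ, which is why heuristic criteria built on it depend on both the split time and the population size.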
README: Contents
BPP_files_SI
BPP Files needed to reproduce the analyses in the theory and methods section
Empirical_Giraffe
Control files and empirical data for conducting hierarchical heuristic species delimitation on a complex of five Giraffe populations.
Original publication:
Alice Petzold, Alexandre Hassanin,
A Comparative Approach for Species Delimitation Based on Multiple Methods of Multi-Locus DNA Sequence Analysis: A Case Study of the Genus Giraffa (Mammalia, Cetartiodactyla),
PLOS ONE, Volume 15, Issue 2, February 2020, Pages e0217956.
https://doi.org/10.1371/journal.pone.0217956.
Data published at: https://osf.io/9wv86/
Folder contents:
Imap_Giraffe.txt
Imap file
MSA_Giraffe
MSA
cf_giraffe_merge.txt
hhsd control file for merge analysis
cf_giraffe_split.txt
hhsd control file for split analysis
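An Imap file maps each sequenced individual to a population. The sketch below groups individuals by population, assuming the usual bpp-style layout of two whitespace-separated columns per line (individual ID, then population label). The function name and the example labels are hypothetical and are not taken from the files in this archive.

```python
from collections import defaultdict


def read_imap(lines):
    """Group individual IDs by population label.

    Assumes each non-empty line holds an individual ID followed by
    a population label, separated by whitespace (an assumption about
    the file layout; check it against the actual Imap file).
    """
    populations = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        individual, population = fields[0], fields[1]
        populations[population].append(individual)
    return dict(populations)


# Hypothetical example content, not taken from Imap_Giraffe.txt.
example_lines = [
    "ind1 popA",
    "ind2 popA",
    "",
    "ind3 popB",
]
grouped = read_imap(example_lines)
for population, individuals in grouped.items():
    print(population, len(individuals))
```

To read a real file, pass an open file handle, for example `read_imap(open("Imap_Giraffe.txt"))`, since iterating over a file yields its lines.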
Empirical_Milksnake
Control files and empirical data for conducting hierarchical heuristic species delimitation on a complex of seven Milksnake populations, and exploring five potential alternative delimitations of two populations.
Original publication:
E Anne Chambers, David M Hillis,
The Multispecies Coalescent Over-Splits Species in the Case of Geographically Widespread Taxa,
Systematic Biology, Volume 69, Issue 1, January 2020, Pages 184–193,
https://doi.org/10.1093/sysbio/syz042
Data published at: https://doi.org/10.5061/dryad.7hs34mj
Folder contents:
cf_milksnake_merge.txt
hhsd control file for merge analysis
cf_milksnake_split.txt
hhsd control file for split analysis
Imap_Lampropeltis.txt
Imap file
MSA_Lampropeltis
MSA
Empirical_Milksnake_Eastwest
Control files and empirical data for conducting species delimitation on the five alternative East-West splits of milksnake populations. The original publication is the same as for Empirical_Milksnake.
Folder contents:
EW.sh
bash script for running hhsd analyses for the 5 alternative splits.
trigentalt.txt
sequence alignment
trigent{n}alt.Imap.txt
for n in the range 1-5, each represents an alternative mapping of individuals to populations
cf_milksnake_EW.txt
hhsd control file
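The five alternative Imap files follow the naming pattern trigent{n}alt.Imap.txt with n from 1 to 5. As a minimal sketch, the file names can be enumerated as below. The helper name is hypothetical, and the snippet only builds paths: it does not run hhsd and does not reproduce what EW.sh does.

```python
from pathlib import Path


def alternative_imap_paths(folder, n_alternatives=5):
    """Build the paths trigent1alt.Imap.txt ... trigent5alt.Imap.txt.

    Hypothetical helper; only the file-name pattern comes from the
    README, everything else is an illustrative assumption.
    """
    folder = Path(folder)
    return [
        folder / f"trigent{n}alt.Imap.txt"
        for n in range(1, n_alternatives + 1)
    ]


paths = alternative_imap_paths("Empirical_Milksnake_Eastwest")
for path in paths:
    # Report which of the expected files are present locally.
    status = "found" if path.exists() else "missing"
    print(path.name, status)
```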
Empirical_Sunfish
Control files and empirical data for conducting hierarchical heuristic species delimitation on a complex of six Sunfish populations.
Original publication:
Daemin Kim, Bruce H Bauer, Thomas J Near,
Introgression and Species Delimitation in the Longear Sunfish Lepomis megalotis (Teleostei: Percomorpha: Centrarchidae),
Systematic Biology, Volume 71, Issue 2, March 2022, Pages 273–285,
https://doi.org/10.1093/sysbio/syab029
Data published at: http://dx.doi.org/10.5061/dryad.dbrv15f05
Folder contents:
Imap_sunfish.txt
Imap file
MSA_sunfish
MSA
cf_sunfish_merge.txt
hhsd control file for merge analysis
cf_sunfish_split.txt
hhsd control file for split analysis
Mathematica_hhsd
Mathematica files needed to reproduce the calculations and plots in the theory and methods section
Simulated_ABCD
Control file and simulated data for demonstrating the basic behaviour and control of the program.
Folder contents:
MyImap.txt
Imap file
MySeq
MSA
cf_sim_merge.txt
hhsd control file for merge analysis
cf_sim_split.txt
hhsd control file for split analysis
Simulated_XABCD
Control file and simulated data for demonstrating behaviour with paraphyletic species
Original publication:
Adam D Leaché, Tianqi Zhu, Bruce Rannala, Ziheng Yang,
The Spectre of Too Many Species,
Systematic Biology, Volume 68, Issue 1, January 2019, Pages 168–181,
https://doi.org/10.1093/sysbio/syy051
Data published at: https://doi.org/10.5061/dryad.t66gq81
Folder contents:
starting_imap.text
Imap file
sequences.txt
MSA
simulated_merge_analysis.txt
hhsd control file for merge analysis
simulated_split_analysis.txt
hhsd control file for split analysis
Version History:
- 08-09-2024: Revision 1:
- Added new folders corresponding to the theoretical bpp and Mathematica analyses in the accepted version of the paper.
- Updated all hhsd control files to correspond with updates to the syntax of the gdi_threshold parameter.
- 13-09-2023:
- Original Upload.
Methods
The data are analyzed using our new Python pipeline called HHSD.