Data from: Phylogeny inference under time-decaying migration and varying information content
Data files
Nov 28, 2023 version files 236.44 GB
- 8_Locus-length-number-combs-not-included_filepart1.zip (18.36 GB)
- 8_Locus-length-number-combs-not-included_filepart2.zip (39.87 GB)
- 8_Locus-length-number-combs-not-included_filepart3.zip (35.02 GB)
- Empirical-data.zip (4 GB)
- README.md (1.28 KB)
- Simdat_1_Simulation-trees.zip (5.38 KB)
- Simdat_2_Control-files.zip (238.05 KB)
- Simdat_3_Simulation-sequences.zip (199.54 MB)
- Simdat_4_Denim-XMLs.zip (178.04 MB)
- Simdat_5_Starbeast3-XMLs.zip (189.86 MB)
- Simdat_6_Results_DENIM_filepart1.zip (46.85 GB)
- Simdat_6_Results_DENIM_filepart2.zip (45.89 GB)
- Simdat_6_Results_DENIM_filepart3.zip (2.92 GB)
- Simdat_6_Results_StarBeast3.zip (42.97 GB)
- Simdat_7_R-scripts.zip (41.17 KB)
- Simdat_9_Readme-and-SimulatedMigs.zip (2.71 KB)
Sep 20, 2024 version files 273.79 GB
Abstract
Postspeciation gene flow is widespread across the Tree of Life but is ignored as a cause of gene tree discordance under the standard multispecies coalescent. Failure to account for migration can lead to the misestimation of effective population sizes, divergence times, and topology. Isolation-with-migration and multispecies coalescent-with-introgression models accommodate migration but involve additional parameters that limit their computational viability with even moderately sized molecular datasets. Problematically, very large datasets may be required for reliable parameter estimation under these models. This study evaluates IM-based phylogeny inference using simulated and empirical datasets with a focus on incorporating gradually time-decaying migration, reflecting realistic expectations of reduced gene flow among more divergent lineages. We compare the performance of DENIM (an IM model) and StarBeast3 (an MSC model) using sequences simulated with continuous low-level migration and no migration on a ten-taxon ultrametric tree. Our results reveal that DENIM significantly improves phylogenetic accuracy in the presence of incomplete lineage sorting and low-level, time-decaying migration, providing robust estimates with fewer loci and without requiring a predefined topology. Both DENIM and StarBeast3 benefit from rapidly decaying migration, but DENIM is more accurate and demonstrates faster convergence per generation across all migration scenarios despite its higher parameter complexity. Varying locus length and number had limited detectable effects on phylogenetic accuracy, although increasing sequence length improves convergence considerably. Empirical validation with autosomal sequences from the Anopheles gambiae group of African mosquitoes confirms DENIM’s efficiency in inferring migration and delivering node height and topology estimates consistent with previous studies.
This study affirms DENIM as a robust method for mitigating migration-induced distortions in the species tree while offering enhanced computational efficiency.