Data from: A 4-lineage statistical suite to evaluate the support of large-scale retrotransposon insertion data to reconstruct evolutionary trees
Data files (version of Nov 10, 2025; 75.06 KB in total)
- Appendix_2.csv (68.23 KB)
- Appendix_3.csv (4.95 KB)
- README.md (1.89 KB)
Abstract
Retrophylogenomics uses genome-wide retrotransposon presence/absence insertion patterns to resolve questions in phylogeny and population genetics. In the genomics era, evaluating high-throughput data requires the development of appropriately powerful statistical tools. The currently used KKSC 3-lineage statistical test for estimating the significance of retrophylogenomic data is limited by the number of possible tree topologies it can assess in one step. To improve on this, we have extended the analysis to compare four lineages simultaneously, enabling us to evaluate ten distinct presence/absence insertion patterns for 26 possible tree topologies plus 129 trees with different incidences of hybridization or introgression. The new tool provides statistics for cases involving multiple ancestral hybridizations/introgressions, ancestral incomplete lineage sorting, bifurcation, and polytomy. The test is embedded in a user-friendly R web application (http://retrogenomics.uni-muenster.de:3838/hammlet/) and is available for use by the scientific community.
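The counts quoted in the abstract can be reproduced with a short combinatorial sketch. It rests on two assumptions that are not stated in this record: that the ten patterns are the insertions shared by two or three of the four lineages, and that the 26 topologies are all rooted trees on four labelled lineages with polytomies allowed. The function names are hypothetical and are not part of the published tool.

```python
from functools import lru_cache
from itertools import product
from math import comb


def informative_patterns(n_lineages=4):
    """Presence (1) / absence (0) patterns shared by at least two but not all lineages.

    Assumption: these are the patterns the test evaluates.
    """
    return [bits for bits in product((0, 1), repeat=n_lineages)
            if 2 <= sum(bits) < n_lineages]


@lru_cache(maxsize=None)
def rooted_trees(n):
    """Count rooted trees on n labelled leaves, multifurcations allowed."""
    if n == 1:
        return 1
    # Split the leaves below the root into at least two groups.
    # k is the size of the group that contains leaf 1, so k < n.
    return sum(comb(n - 1, k - 1) * rooted_trees(k) * forests(n - k)
               for k in range(1, n))


@lru_cache(maxsize=None)
def forests(n):
    """Count ways to split n labelled leaves into one or more groups,
    each group carrying its own rooted tree."""
    if n == 0:
        return 1
    return sum(comb(n - 1, k - 1) * rooted_trees(k) * forests(n - k)
               for k in range(1, n + 1))


if __name__ == "__main__":
    for bits in informative_patterns():
        print("".join("+" if b else "-" for b in bits))
    print(len(informative_patterns()))  # 10
    print(rooted_trees(4))              # 26
```

Under the same assumptions, `informative_patterns(3)` returns three patterns, which illustrates how much smaller the pattern space of a 3-lineage comparison is. The 129 hybridization/introgression scenarios are not covered by this sketch.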
Appendix_1.pdf: Detailed description of mathematical models
Appendix_2.csv: Supplementary Tables S1-S7
S1: Comparison of the stepwise method with the chi-square distribution criterion against a mixture criterion (50% chi-square distribution, 50% point mass at 0) for frequency of correct recognition of models in the Poisson randomized sets.
S2: Comparison of the reverse method and chi-square distribution against binary chi-square distribution according to degrees of freedom plus point mass 0 mixture and binary chi-square distribution according to degrees of freedom plus point mass 1 mixture criteria for frequency of correct recognition of models in the Poisson randomized set.
S3: Testing the reverse method and chi-square distribution criterion for frequency of correct recognition of models in the Poisson randomized set.
S4: Testing the stepwise method and chi-square distribution criterion for frequency of correct recognition of models in the Poisson randomized set.
S5: Testing the reverse method and empirical distribution criterion (eCDF) for frequency of correct recognition of models in the Poisson randomized set.
S6: Testing the stepwise method and empirical distribution criterion (eCDF) for frequency of correct recognition of models in the Poisson randomized set.
S7: Testing the dependence of the number of user markers on the frequency of correct recognition of models in the Poisson randomized set.
Appendix_3.csv: Supplementary Tables S8-S9
S8: Simple conflict-free patterns.
S9: Formulas for generating data for KKSC comparisons of resolved tree topologies (without hybridization).
Appendix_4.pdf: Sources of genomes used and RepeatMasker reports
