Supplementary material for: PhyloCoalSimulations: A simulator for network multispecies coalescent models, including a new extension for the inheritance of gene flow
Data files
Version of May 18, 2023; files total 163.54 MB
- `fig_level2_network.jl` (3.61 KB)
- `figures.jl` (1.78 KB)
- `gtrees_4tax-changing_PhyloNetworks.tre` (1.49 MB)
- `Manifest.toml` (22.84 KB)
- `ntwk_level_2.tre` (191 B)
- `Project.toml` (442 B)
- `README.md` (3.27 KB)
- `validation_distances_level2net_figure.R` (5.39 KB)
- `validation_distances_level2net_rho0.zip` (130.91 MB)
- `validation_distances_level2net_rho1.zip` (31.07 MB)
- `validation_distances.Rmd` (7.14 KB)
- `validation_qCF.jl` (8.76 KB)
- `validation_qCF.Rmd` (4.03 KB)
Abstract
We consider the evolution of phylogenetic gene trees along phylogenetic species networks, according to the network multispecies coalescent process, and introduce a new network coalescent model with correlated inheritance of gene flow. This model generalizes two traditional versions of the network coalescent: with independent or common inheritance. At each reticulation, multiple lineages of a given locus are inherited from parental populations chosen at random, either independently across lineages, or with positive correlation according to a Dirichlet process. This process may account for locus-specific probabilities of inheritance, for example.
We implemented the simulation of gene trees under these network coalescent models in the Julia package PhyloCoalSimulations, which depends on PhyloNetworks and its powerful network manipulation tools. Input species phylogenies can be read in extended Newick format, either in numbers of generations or in coalescent units. Simulated gene trees can be written in Newick format, and in a way that preserves information about their embedding within the species network. This embedding can be used for downstream purposes, such as to simulate species-specific processes like rate variation across species, or for other scenarios as illustrated in this note. This package should be useful for simulation studies and simulation-based inference methods. The software is available open source with documentation and a tutorial at https://github.com/cecileane/PhyloCoalSimulations.jl.
This repository contains code to reproduce the figures and validation studies in: John Fogg, Elizabeth S. Allman, and Cécile Ané (2023). "PhyloCoalSimulations: A simulator for network multispecies coalescent models, including a new extension for the inheritance of gene flow". Systematic Biology. https://doi.org/10.1093/sysbio/syad030
All gene tree simulations used PhyloCoalSimulations, a free and open-source Julia package.
Supplementary Material
`supplementarymaterial.pdf` contains an appendix and supplementary figures S1-S6.
Code to reproduce analyses
The code uses Julia and R. The files `Project.toml` and `Manifest.toml` record the Julia packages used and their specific versions. To reproduce the environment, activate this folder and run `instantiate` in package mode within Julia.
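The same setup can be scripted with the `Pkg` API instead of package mode. This is a minimal sketch, assuming Julia is started from the folder that contains `Project.toml` and `Manifest.toml`:

```julia
using Pkg

# Activate the project defined by Project.toml in the current folder.
Pkg.activate(".")

# Install the exact package versions recorded in Manifest.toml.
Pkg.instantiate()

# Optional: list the packages now available in this environment.
Pkg.status()
```

Because `Manifest.toml` pins exact versions, running this with a Julia release much newer than the one used to create the manifest may need adjustments.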
Fig. 1: node mapping
- `figures.jl`: Julia code to simulate a gene tree with degree-2 nodes for mapping of the gene tree into the species network, and to create the first 2 panels of Fig. 1. Output: `fig_nodemapping*.pdf`.
Fig. 2: validation of quartet concordance factors
- `validation_qCF.jl`: Julia code to reproduce the simulations in Fig. 2. Running this code creates 3 output files: `qCF_4taxa.csv` for the left network (a), and `qCF_case_{1,2}.csv` for the right network (b) on 6 taxa. It also creates `net4.pdf` and `net3.pdf`, showing the 4-taxon and 6-taxon networks respectively.
- `validation_qCF.Rmd`: R code to create Fig. 2, taking as input the CSV files above.
Fig. 3: level-2 network
- `fig_level2_network.jl`: Julia code to create Fig. 3, showing the level-2 network that was used to validate the distribution of pairwise distances, using either rho = 0 or rho = 1 (independent or common inheritance, respectively). Output: `fig_level2net.pdf`.
- `ntwk_level_2.tre`: file containing the Newick description of that network, which can be visualized with the Julia package PhyloPlots.
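As a rough sketch of that visualization (not the code in `fig_level2_network.jl`), the network could be read and plotted as below. Assumptions: the package versions pinned in `Manifest.toml` are in use, where PhyloNetworks provides `readTopology` and `tipLabels` (more recent releases may use different names); PhyloPlots is part of the environment; the file holds a single network in extended Newick format; and a working R installation is available, since PhyloPlots draws through R.

```julia
using PhyloNetworks, PhyloPlots

# Read the level-2 species network from its extended Newick description.
# Assumes the file contains exactly one network.
net = readTopology("ntwk_level_2.tre")

# Quick sanity checks before plotting.
println("taxa: ", tipLabels(net))
println("number of reticulations: ", net.numHybrids)

# Draw the network with default options.
plot(net)
```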
Fig. 4 and supplementary figures: validation of pairwise distances
Figure S2, on a 4-taxon species tree:
- `gtrees_4tax-changing_PhyloNetworks.tre` contains the 10,000 simulated gene trees.
- `validation_distances.Rmd`: R code to create Fig. S2, taking as input the gene trees in `gtrees_4tax-changing_PhyloNetworks.tre`.
Figure 4 and supplementary figures S3-S5, on a 6-taxon network with 2 reticulations:
- folders `validation_distances_level2net_rho0` and `validation_distances_level2net_rho1` (provided as `.zip` archives): input files for the figures, as compressed `.RData` files:
  - the `samp_big*_d_xy.RData` files contain the pairwise distances between taxa x and y from the 100,000 simulated gene trees (used for histograms).
  - the `d_*.RData` files contain the pairwise distances drawn from their theoretical distributions, summarized by their frequency in 100,000 small bins (used for the theoretical density curve).
  - the `sampleMeans*_dxy.Rdata` files contain the mean (over 100 replicates) of the 1000 ordered distances between taxa x and y (dbar_i in the paper) from 1000 simulated gene trees in each replicate (used for QQ plots).
  - `Dmatrix.Rdata` contains the 6×6 matrix of *minimum* pairwise distances between all pairs of taxa on the network.
- `validation_distances_level2net_figure.R`: R code to create Fig. 4 and supplementary figures, for the distances from the 6-taxon level-2 network in Fig. 3. Takes as input the files in the folders above. Output: `fig_pairwisedist_level2net_rho*.pdf`.
Software archive
- `PhyloCoalSimulations-code-1d266fd.zip`: archive of the PhyloCoalSimulations package from GitHub, main branch (for the code), at commit 1d266fd, which is 1 commit ahead of version v0.1.2.
- `PhyloCoalSimulations-documentation-1d266fd.zip`: archive of the PhyloCoalSimulations package's documentation, from the gh-pages branch, at commit 1d266fd.
References
- Fogg, John; Allman, Elizabeth S.; Ané, Cécile (2023). Supplementary material for: PhyloCoalSimulations: A simulator for network multispecies coalescent models, including a new extension for the inheritance of gene flow. Zenodo. https://doi.org/10.5281/zenodo.7837962
- Fogg, John; Allman, Elizabeth S.; Ané, Cécile (2023). PhyloCoalSimulations: A simulator for network multispecies coalescent models, including a new extension for the inheritance of gene flow. Systematic Biology. https://doi.org/10.1093/sysbio/syad030
- Fogg, John; Allman, Elizabeth S.; Ané, Cécile (2023). PhyloCoalSimulations: A simulator for network multispecies coalescent models, including a new extension for the inheritance of gene flow [Preprint]. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2023.01.11.523690
