Data from: A non-parametric analytic framework for within-host viral phylogenies and a test for HIV-1 founder multiplicity

Cite this dataset

Lewitus, Eric; Rolland, Morgane (2019). Data from: A non-parametric analytic framework for within-host viral phylogenies and a test for HIV-1 founder multiplicity [Dataset]. Dryad.


Phylogenetics is a powerful tool for understanding the diversification dynamics of viral pathogens. Here we present an extension of the spectral density profile of the modified graph Laplacian, which facilitates the characterization of within-host molecular evolution of viruses and the direct comparison of diversification dynamics between hosts. This approach is non-parametric and therefore fast and model-free. We used simulations of within-host evolutionary scenarios to evaluate the efficiency of our approach and to demonstrate the significance of interpreting a viral phylogeny by its spectral density profile in terms of diversification dynamics. The key features that are captured by the profile are positive selection on the viral gene (or genome), temporal changes in substitution rates, mutational fitness, and time between sampling. Using sequences from individuals infected with HIV-1, we showed the utility of this approach for characterizing within-host diversification dynamics, for comparing dynamics between hosts, and for charting disease progression in infected individuals sampled over multiple years. We furthermore propose a heuristic test for assessing founder heterogeneity, which allows us to classify infections with single and multiple HIV-1 founder viruses. This non-parametric approach can be a valuable complement to existing parametric approaches.
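As a rough illustration of what a spectral density profile is, the sketch below builds a graph Laplacian from a pairwise distance matrix, takes its eigenvalues, and smooths their logarithms with a Gaussian kernel. It is a minimal sketch under stated assumptions, not the authors' implementation: the Laplacian is taken to be the degree matrix (row sums of distances) minus the distance matrix, and the toy matrix, the function names, and the bandwidth are all hypothetical choices made for this example.

```python
import numpy as np


def mgl_spectrum(dist):
    """Eigenvalues (largest first) of a Laplacian built from pairwise distances.

    Assumption for this sketch: Laplacian = diag(row sums of dist) - dist.
    """
    dist = np.asarray(dist, dtype=float)
    degree = np.diag(dist.sum(axis=1))
    laplacian = degree - dist
    # The matrix is symmetric, so eigvalsh applies; it returns ascending order.
    return np.linalg.eigvalsh(laplacian)[::-1]


def spectral_density(eigenvalues, bandwidth=0.3, grid_size=200):
    """Gaussian kernel density of the log of the non-zero eigenvalues.

    The bandwidth is an arbitrary illustrative value.
    """
    # Drop the (numerically) zero eigenvalue before taking logs.
    positive = eigenvalues[eigenvalues > 1e-9 * eigenvalues.max()]
    log_ev = np.log(positive)
    grid = np.linspace(log_ev.min() - 3 * bandwidth,
                       log_ev.max() + 3 * bandwidth,
                       grid_size)
    z = (grid[:, None] - log_ev[None, :]) / bandwidth
    kernel_sum = np.exp(-0.5 * z ** 2).sum(axis=1)
    density = kernel_sum / (len(log_ev) * bandwidth * np.sqrt(2 * np.pi))
    return grid, density


# Hypothetical toy distances between four sequences forming two pairs.
toy_dist = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]

eigenvalues = mgl_spectrum(toy_dist)
grid, density = spectral_density(eigenvalues)
print(eigenvalues)
```

Because the profile is computed directly from distances, no evolutionary model has to be fitted, which is the sense in which the abstract calls the approach non-parametric.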

Usage notes

Figure S1

Boxplots of maximum pairwise genetic dissimilarity for alignments simulated under various (left) dN/dS, (middle) γ distributions, and (right) ti/tv. Dashed lines connect pairs whose mean values differ significantly (P < 0.05).



A heuristic test for founder heterogeneity on time-scaled trees. (A) Boxplot of ln λ* for acutely infected participants with founder homogeneity (green) and heterogeneity (brown) (Keele et al. 2008). (B) Density plot of ln λ* for all participants, nominally colored to show which tails of the distribution are predominantly occupied by participants with founder homogeneity and heterogeneity. (C) Barplot of ln λ*, adjusted so that the minimum value is zero. Fill colors represent the inferred classification based on the λ* test of founder heterogeneity; border colors represent the classification given in Keele et al. (2008). Dashed lines in (B, C) mark ±σ²/2 of the median.