Evaluation of recombination detection methods for viral sequencing

Jaya, Frederick; Brito, Barbara; Darling, Aaron

Published Nov 27, 2023; Updated Jan 28, 2024 on Dryad. https://doi.org/10.5061/dryad.d7wm37q6f

Data files

Nov 27, 2023 version files (6.96 MB total)

README.md: 610 B
sim.tar.gz: 6.96 MB

Jan 28, 2024 version files (41.21 MB total)

performance.tar.gz: 6.96 MB
README.md: 727 B
scale.tar.gz: 34.25 MB

Abstract

Recombination is a key evolutionary driver in shaping novel viral populations and lineages. When unaccounted for, recombination can bias evolutionary estimates or complicate their interpretation. Identifying signals of recombination in sequencing data is therefore a key prerequisite to further analyses. A repertoire of recombination detection methods has been developed over the past two decades; however, the prevalence of pandemic-scale viral sequencing data poses a computational challenge for existing methods. Here, we assessed five recombination detection methods (PhiPack (Profile), 3SEQ, GENECONV, VSEARCH (UCHIME), and gmos) to determine whether any are suitable for the analysis of bulk sequencing data. To test the performance and scalability of these methods, we analysed simulated viral sequencing data across a range of sequence diversities, recombination frequencies, and sample sizes. Further, we provide a practical example of the analysis and validation of empirical data. We find that recombination detection methods need to be scalable, to use an analytical approach and resolution suited to the intended research application, and to be accurate for the properties of a given dataset (e.g. sequence diversity and estimated recombination frequency). Analysis of simulated and empirical data revealed that the assessed methods exhibited considerable trade-offs between these criteria. Overall, we provide general guidelines for validating recombination detection results, describe the benefits and shortcomings of each assessed method, and outline future considerations for recombination detection methods applied to large-scale viral sequencing data.

README

Files