Dryad

Supporting data and code for: Phylogenetic identification of influenza virus candidates for seasonal vaccines

Data files

Jun 16, 2023 version files 116.05 MB
Aug 16, 2023 version files 112.93 MB
Dec 18, 2023 version files 112.93 MB

Abstract

The seasonal influenza (flu) vaccine is designed to protect against those influenza viruses predicted to circulate during the upcoming flu season, but identifying which viruses are likely to circulate is challenging. We use features from phylogenetic trees reconstructed from hemagglutinin (HA) and neuraminidase (NA) sequences, together with a support vector machine, to predict future circulation. We obtain accuracies of 0.75–0.89 (area under the curve, AUC, 0.83–0.91) over 2016–2020. We explore ways to select potential candidates for a seasonal vaccine and find that the machine learning model has a moderate ability to select strains that are close to future populations. However, consensus sequences among the most recent three years also do well at this task. We identify candidate strains similar to those proposed by the World Health Organization, suggesting that this approach can help inform vaccine strain selection.
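
To make the consensus-sequence baseline mentioned in the abstract concrete, here is a minimal Python sketch of one way such a candidate could be built and scored. It is an illustration under stated assumptions, not the code in this dataset: the short sequences are invented toy strings, the function names are hypothetical, and Hamming distance on pre-aligned sequences stands in for whatever measure of closeness to future populations the authors actually use.

```python
from collections import Counter


def consensus(seqs):
    """Return the per-site majority residue of aligned, equal-length sequences."""
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*seqs) iterates over alignment columns; ties go to the residue seen first.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def hamming(a, b):
    """Count the sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_distance(candidate, population):
    """Average Hamming distance from a candidate to every sequence in a population."""
    return sum(hamming(candidate, s) for s in population) / len(population)


# Toy data (assumption): invented fragments, not real HA or NA sequences.
recent = ["MKTIIA", "MKTIIA", "MKAIIA", "MKTILA"]  # stands in for recent seasons
future = ["MKTIIA", "MKTILA", "MKTILA"]            # stands in for the next season

cons = consensus(recent)
print(cons, mean_distance(cons, future))
print("MKAIIA", mean_distance("MKAIIA", future))
```

In this toy case the consensus sits closer on average to the "future" set than the minority variant does, which is the kind of comparison the abstract describes when it weighs consensus sequences against model-selected strains.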