Viral reference genomes to disentangle the recombinant phylogenetic history of the potyviruses

Cite this dataset

Rasmussen, David (2023). Viral reference genomes to disentangle the recombinant phylogenetic history of the potyviruses [Dataset]. Dryad.


Potyviruses are a large genus of plant-infecting RNA viruses in the family Potyviridae. Due to their rapid diversification and frequent recombination, reconstructing the phylogenetic history of the potyviruses has proven difficult. Phylogenies reconstructed from different protein-coding regions of the viral genome often reveal conflicting or discordant relationships. However, the extent to which this discordance is due to interspecific recombination versus phylogenetic noise or errors in reconstruction is unclear.

To explore the recombinant history of the potyviruses, we assembled a dataset containing reference genomes for 131 species of potyviruses. High-quality, full-length reference genomes for all species were obtained from NCBI GenBank. Viral genomes were carefully aligned at the codon level and screened for recombination. The full alignment was then partitioned at each detected recombination breakpoint into several sub-alignments, such that each sub-alignment corresponds to a non-recombinant block (NRB) free of detected recombination events. Local phylogenetic trees for each NRB were then reconstructed to explore how phylogenetic relationships varied across different regions of the potyvirus genome.
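
The partitioning step described above can be illustrated with a minimal Python sketch. It is not the code used to build this dataset: the function name `partition_alignment`, the dictionary representation of the alignment, the toy sequences, and the breakpoint positions are all assumptions made for illustration. Breakpoints are taken to be 0-based alignment columns at which a new block begins.

```python
def partition_alignment(alignment, breakpoints):
    """Split an alignment into blocks between breakpoint columns.

    alignment   -- dict mapping sequence name to aligned sequence (hypothetical format)
    breakpoints -- 0-based column indices at which a new block starts (assumed convention)
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()

    # Keep only interior breakpoints, sorted and without duplicates.
    cuts = sorted({b for b in breakpoints if 0 < b < length})
    edges = [0] + cuts + [length]

    # Each consecutive pair of edges delimits one candidate non-recombinant block.
    blocks = []
    for start, end in zip(edges[:-1], edges[1:]):
        blocks.append({name: seq[start:end] for name, seq in alignment.items()})
    return blocks


# Toy alignment with made-up sequences and breakpoints, for illustration only.
toy = {
    "virusA": "ATGGCTAAATTT",
    "virusB": "ATGGCAAAGTTC",
    "virusC": "ATGACTAAATTC",
}
nrbs = partition_alignment(toy, breakpoints=[6, 9])
```

Each element of `nrbs` is a sub-alignment from which a local tree could then be reconstructed. In this toy case the breakpoints fall on codon boundaries; a real pipeline might need to decide how to handle breakpoints that do not.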

We then used our program Espalier to disentangle the phylogenetic history of the potyviruses. Espalier reconciles and removes discordances between phylogenetic trees that are likely attributable to phylogenetic error while retaining recombination events that are strongly supported by the sequence data. Applying Espalier to the potyviruses revealed that most phylogenetic discordance between local trees is likely attributable to phylogenetic noise. Removing the discordance attributable to phylogenetic error allows us to visualize the phylogenetic history of the potyviruses much more clearly.


Viral genomes were obtained for 131 Potyvirus species for which high-quality, full-length genomes were publicly available. Viral genomes were retrieved from NCBI GenBank using the accession numbers provided by Gadhave et al. (2020). Full genome nucleotide sequences were aligned using MAFFT version 7 (Katoh and Standley, 2013).
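
As a rough sketch of the alignment step, MAFFT can be called from Python as shown below. The options actually used for this dataset are not stated here, so the `--auto` flag (which lets MAFFT choose a strategy), the helper names `mafft_command` and `run_mafft`, and the file names are assumptions for illustration.

```python
import shutil
import subprocess


def mafft_command(fasta_path):
    """Build a MAFFT command line; --auto is an assumed choice, not the documented one."""
    return ["mafft", "--auto", fasta_path]


def run_mafft(fasta_path, out_path):
    """Align the sequences in fasta_path and write the alignment to out_path."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    # MAFFT writes the alignment to standard output, so redirect it to a file.
    with open(out_path, "w") as out:
        subprocess.run(mafft_command(fasta_path), stdout=out, check=True)


# Hypothetical file names:
# run_mafft("potyvirus_genomes.fasta", "potyvirus_aligned.fasta")
```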

K. R. Gadhave, S. Gautam, D. A. Rasmussen, and R. Srinivasan. Aphid transmission of potyvirus: The largest plant-infecting RNA virus genus. Viruses, 12(7):773, 2020. 

K. Katoh and D. M. Standley. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution, 30(4):772–780, 2013.


National Institute of Food and Agriculture, Award: 2019-67021-29932