Dryad

Data from: Hepatitis C virus genotype 1 and 2 recombinant genomes and the phylogeographic history of the 2k/1b lineage

Data files

Sep 09, 2019 version files 27.70 MB

Abstract

Recombination is an important driver of genetic diversity, though it is relatively uncommon in hepatitis C virus (HCV). Recent investigation of sequence data acquired from HCV clinical trials yielded 21 full-genome recombinant viruses belonging to three putative inter-subtype forms: 2b/1a, 2b/1b, and 2k/1b. The 2k/1b chimera is the only known HCV circulating recombinant form (CRF), provoking interest in its genetic structure and origin. Discovered in Russia in 1999, 2k/1b cases have since been detected throughout the former Soviet Union, Western Europe, and North America. Although 2k/1b prevalence is highest in the Caucasus mountain region (i.e., Armenia, Azerbaijan, and Georgia), the origin and migration patterns of CRF 2k/1b have remained obscure due to a paucity of available sequences. We assembled an alignment that spans the entire coding region of the HCV genome and contains all available 2k/1b sequences (>500 nucleotides; n=109) sampled in 19 countries: sequences from public databases (102 individuals), additional newly sequenced genomic regions (from 48 of these 102 individuals), unpublished isolates with newly sequenced regions (5 additional individuals), and novel complete genomes (2 additional individuals) generated in this study. Analysis of this expanded dataset reconfirmed the monophyletic origin of 2k/1b with a recombination breakpoint at position 3,187 (95% confidence interval: 3,172–3,202; HCV GT1a reference strain H77). Phylogeography is a valuable tool used to reveal viral migration dynamics. Inference of the timed history of spread in a Bayesian framework identified Russia as the ancestral source of the CRF 2k/1b clade. Further, we found evidence for migration routes leading out of Russia to other former Soviet Republics or countries under the Soviet sphere of influence. These findings suggest an interplay between geopolitics and the historical spread of CRF 2k/1b.