Zoonotic infectious diseases such as influenza continue to pose a grave threat to human health. However, the factors that mediate the emergence of RNA viruses such as influenza A virus (IAV) are still incompletely understood. Phylogenetic inference is crucial to reconstructing the origins and tracing the flow of IAV within and between hosts. Here we show that explicitly allowing IAV host lineages to have independent rates of molecular evolution is necessary for reliable phylogenetic inference of IAV and that methods that do not do so, including ‘relaxed’ molecular clock models, can be positively misleading. A phylogenomic analysis using a host-specific local clock model recovers extremely consistent evolutionary histories across all genomic segments and demonstrates that the equine H7N7 lineage is a sister clade to strains from birds—as well as those from humans, swine and the equine H3N8 lineage—sharing an ancestor with them in the mid to late 1800s. Moreover, major western and eastern hemisphere avian influenza lineages inferred for each gene coalesce in the late 1800s. On the basis of these phylogenies and the synchrony of these key nodes, we infer that the internal genes of avian influenza virus (AIV) underwent a global selective sweep beginning in the late 1800s, a process that continued throughout the twentieth century and up to the present. The resulting western hemispheric AIV lineage subsequently contributed most of the genomic segments to the 1918 pandemic virus and, independently, the 1963 equine H3N8 panzootic lineage. This approach provides a clear resolution of evolutionary patterns and processes in IAV, including the flow of viral genes and genomes within and between host lineages.
Figure 1 GTR.zip
Contains (1) the nexus format sequence alignments, (2) the BEAST XML input files for the HSLC analyses (GTR+gamma substitution model), (3) the maximum clade credibility (MCC) tree files, and (4) a PDF of the full tree figures corresponding to the 8 panels in Figure 1.
Figure 1 3rd sites.zip
Contains (1) the nexus format sequence alignments, (2) the BEAST XML input files for the HSLC analyses, and (3) the maximum clade credibility (MCC) tree files for the analysis of 3rd codon position sites of the Figure 1 data sets.
Figure 1, SRD.zip
Contains (1) the BEAST XML input files for the HSLC analyses of the Figure 1 data under the SRD06 substitution model and (2) the resulting maximum clade credibility (MCC) tree files. The alignments used are the same as those in Figure 1 GTR.zip.
Figure 1, subsampled.zip
Contains (1) the nexus format sequence alignments, (2) the BEAST XML input files for the HSLC analyses, and (3) the maximum clade credibility (MCC) tree files for H3 and N8 data sets. The “stemfalse” files assume that the stem branch of the equine H3N8 clade evolved at the avian substitution rate, while the other files assume this branch evolved at the equine rate.
UCLD relaxed clock.zip
Contains (1) the BEAST XML input files for the UCLD relaxed clock analyses (SRD06 substitution model), and (2) the resulting maximum clade credibility (MCC) tree files. The alignments used are the same as those in Figure 1 GTR.zip.
Randomly sampled sequences.zip
Contains (1) the nexus format sequence alignments, (2) the BEAST XML input files for the HSLC analyses (SRD06 substitution model), and (3) the maximum clade credibility (MCC) tree files for alignments of randomly sampled sequences (created independently of the Figure 1 data sets).
Figure 1 data plus new sequences.zip
These files are based on the Figure 1 data sets with the addition of the three newly sequenced complete genomes from this study (A/chicken/Japan/1925, A/duck/Manitoba/1953, and A/equine/Detroit/3/1964). For PB1, several sequences from South America were also added. Contains (1) the nexus format sequence alignments, (2) the BEAST XML input files for the HSLC analyses (SRD06 substitution model), (3) the maximum clade credibility (MCC) tree files, and (4) a PDF of the full tree figures corresponding to the 8 panels in Figure 1 (except with the additional sequences and using the SRD06 substitution model).
H3 and N8.zip
Contains (1) the nexus format sequence alignments, (2) the BEAST XML input files for the HSLC analyses, and (3) the maximum clade credibility (MCC) tree files for H3 and N8 data sets. The “stemfalse” files assume that the stem branch of the equine H3N8 clade evolved at the avian substitution rate, while the other files assume this branch evolved at the equine rate.
Figure 3.zip
Contains (1) the nexus format sequence alignments, (2) the BEAST XML input files and (3) the resulting maximum clade credibility (MCC) tree files for the HA and NA diversity analyses reported in Figure 3.
Giant Alignment.zip
Contains the FASTA-format sequence alignments used in preparing the IAV sequence data.
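For readers who want to check a FASTA alignment from this archive before further use, a minimal sketch follows. It is not part of the original data package. The function names `read_fasta` and `is_aligned` and the toy records are hypothetical, and the only assumption about the files is that they follow the standard FASTA layout (a `>` header line followed by one or more sequence lines).

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines.

    Note: if a header occurs twice, the later record replaces the earlier one.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def is_aligned(records: Dict[str, str]) -> bool:
    """An alignment requires every sequence (gaps included) to have equal length."""
    return len({len(seq) for seq in records.values()}) <= 1


# Toy input for illustration only; these are not sequences from the data set.
example = [
    ">toy_seq_1",
    "ATG-CC",
    "GGA",
    ">toy_seq_2",
    "ATGACCGGA",
]
records = read_fasta(example)
print(len(records), is_aligned(records))
```

To apply this to a real file, pass an open file handle, for example `read_fasta(open(path))`, where `path` points to a FASTA file extracted from the archive.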