Data from: Identification of combinatorial host-specific signatures with a potential to affect host adaptation in influenza A H1N1 and H3N2 subtypes
Khaliq, Zeeshan, Science for Life Laboratory
Leijon, Mikael, National Veterinary Institute
Belák, Sándor, OIE Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine, Uppsala, Sweden
Komorowski, Jan, Science for Life Laboratory
Published Jun 24, 2017 on Dryad.
Cite this dataset
Khaliq, Zeeshan; Leijon, Mikael; Belák, Sándor; Komorowski, Jan (2017). Data from: Identification of combinatorial host-specific signatures with a potential to affect host adaptation in influenza A H1N1 and H3N2 subtypes [Dataset]. Dryad. https://doi.org/10.5061/dryad.kc097
Background: The strategies that influenza A viruses (IAVs) use to adapt to new hosts while crossing the species barrier are complex and not yet completely understood. Several published studies have identified singular genomic signatures that indicate such a host switch. The complexity of the problem suggested that, in addition to these singular signatures, combinations of such genomic features might also define adaptation to hosts in nature.

Results: We used computational rule-based modeling to identify combinatorial sets of interacting amino acid (aa) residues in 12 proteins of IAVs of the H1N1 and H3N2 subtypes. For each protein we built highly accurate rule-based models that could differentiate between viral aa sequences from avian and human hosts. We found 68 host-specific combinations of aa residues, potentially associated with host adaptation, in the HA, M1, M2, NP, NS1, NEP, PA, PA-X, PB1 and PB2 proteins of the H1N1 subtype, and 24 in the M1, M2, NEP, PB1 and PB2 proteins of the H3N2 subtype. In addition to these combinations, we found 132 novel singular aa signatures distributed among all proteins of both subtypes, including the newly discovered PA-X protein. We showed that the HA, NA, NP, NS1, NEP, PA-X and PA proteins of the H1N1 subtype carry H1N1-specific signatures, and that the HA, NA, PA-X, PA, PB1-F2 and PB1 proteins of the H3N2 subtype carry H3N2-specific signatures. The M1, M2, PB1-F2, PB1 and PB2 proteins of the H1N1 subtype carry H3N2 signatures in addition to H1N1 signatures. Similarly, the M1, M2, NP, NS1, NEP and PB2 proteins of the H3N2 subtype were shown to carry both H3N2 and H1N1 host-specific signatures (HSSs).

Conclusions: In summary, we computationally constructed simple IF-THEN rule-based models that could distinguish between aa sequences of avian and human IAVs. From the rules we identified HSSs with the potential to affect adaptation to specific hosts. The identification of combinatorial HSSs indicates that the process by which IAVs adapt to a new host is more complex than previously suggested. The present study provides a basis for further detailed studies aimed at elucidating the molecular mechanisms underlying the adaptation process.
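The models described above consist of rules of the form IF (residue conditions) THEN (host), where a combinatorial signature is a conjunction of several residue conditions. The following minimal Python sketch shows how such a rule set could be applied to an aligned sequence. The rule format, the 1-based alignment positions, the residues, the helper names and the unweighted vote are all illustrative assumptions: the positions are placeholders, not signatures reported in the study, and the authors' models may combine rules differently.

```python
from typing import Dict, List, Tuple

# Hypothetical rule format: a list of (1-based alignment position, residue)
# conditions that must ALL hold, plus the predicted host.
Rule = Tuple[List[Tuple[int, str]], str]

# PLACEHOLDER rules for illustration only; not signatures from the study.
HYPOTHETICAL_RULES: List[Rule] = [
    ([(10, "K"), (25, "E")], "human"),  # IF pos10=K AND pos25=E THEN human
    ([(10, "E")], "avian"),             # IF pos10=E THEN avian
]


def matches(seq: str, conditions: List[Tuple[int, str]]) -> bool:
    """True if every (position, residue) condition holds in the aligned seq."""
    return all(0 < pos <= len(seq) and seq[pos - 1] == aa for pos, aa in conditions)


def classify(seq: str, rules: List[Rule]) -> Dict[str, int]:
    """Count how many rules fire for each host (simple unweighted vote)."""
    votes: Dict[str, int] = {}
    for conditions, host in rules:
        if matches(seq, conditions):
            votes[host] = votes.get(host, 0) + 1
    return votes


if __name__ == "__main__":
    toy = "A" * 9 + "K" + "A" * 14 + "E" + "A" * 5  # K at 10, E at 25
    print(classify(toy, HYPOTHETICAL_RULES))  # {'human': 1}
```

An unweighted vote keeps the sketch short; rule-based classifiers commonly weight each rule by its support or accuracy instead.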
Separate alignments for each protein of subtypes H1N1 and H3N2 and for human and avian hosts
Multiple alignments of each protein of both the H1N1 and H3N2 subtypes, with separate files for sequences from human and avian hosts. All sequences were obtained from the NCBI database. These alignments were used to create phylogenetic trees (Figure 3 and additional file 5 in the paper).
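Because human and avian sequences are stored in separate files, one simple comparison is the residue frequency at a given alignment column in each host. The sketch below assumes the alignments are in FASTA format (the file format is not stated here) and uses toy in-memory sequences rather than the dataset files; `read_fasta` and `column_frequencies` are hypothetical helpers, not part of the dataset.

```python
from collections import Counter
from typing import Dict, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {first header token: sequence} (FASTA is ASSUMED)."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


def column_frequencies(alignment: Dict[str, str], column: int) -> Dict[str, float]:
    """Relative frequency of each residue at a 1-based column, ignoring gaps."""
    residues = [seq[column - 1] for seq in alignment.values()
                if len(seq) >= column and seq[column - 1] != "-"]
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Toy alignments; real files would be read with open(path).read().
    human = read_fasta(">h1\nMKE\n>h2\nMKD\n")
    avian = read_fasta(">a1\nMEE\n>a2\nME-\n")
    print(column_frequencies(human, 2), column_frequencies(avian, 2))
```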
Combined alignment for each protein with data from all hosts and subtypes
Multiple alignments for all proteins. Each file contains sequences from both human and avian hosts and from both the H1N1 and H3N2 subtypes. All sequences were obtained from the NCBI database. They were used to infer phylogenies (Figure 4 and additional file 6 in the paper).
Separate trees for each host, protein and subtype
Phylogenetic trees in Newick format. For each protein there is a separate tree containing sequences from a single host and a single subtype. The nodes are marked with the top 5 rules for the respective protein: each node label is the accession ID of a sequence used to build the phylogeny, followed by the numbers of the rules that the sequence supports. The sequences were obtained from the NCBI database. The trees were created using FastTree 2.1.8.
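One way to recover accession IDs and rule numbers from these tree labels using only the Python standard library is sketched below. It assumes unquoted Newick labels and an underscore between the accession ID and each rule number; the separator actually used in the files is not stated here and should be checked. The tree and accession IDs in the example are toy values, and the helper names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Leaf labels follow "(" or "," in Newick; internal labels (e.g. FastTree
# support values) follow ")" and are therefore not captured.
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_labels(newick: str) -> List[str]:
    """Return unquoted leaf labels in order of appearance."""
    return LEAF.findall(newick)


def split_label(label: str, sep: str = "_") -> Tuple[str, List[int]]:
    """Split 'ACCESSION<sep>rule<sep>rule...' (ASSUMED layout)."""
    accession, *rest = label.split(sep)
    return accession, [int(tok) for tok in rest if tok.isdigit()]


def accessions_per_rule(newick: str, sep: str = "_") -> Dict[int, List[str]]:
    """Map each rule number to the accessions whose label lists it."""
    table: Dict[int, List[str]] = defaultdict(list)
    for label in leaf_labels(newick):
        accession, rules = split_label(label, sep)
        for rule in rules:
            table[rule].append(accession)
    return dict(table)


if __name__ == "__main__":
    toy_tree = "((ACC1_1_3:0.1,ACC2_3:0.2)0.9:0.05,ACC3:0.3);"
    print(accessions_per_rule(toy_tree))
```

The regular expression breaks on quoted labels or accession IDs that themselves contain the separator; a dedicated Newick parser such as Biopython's `Bio.Phylo` is more robust for real files.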
Combined trees for each protein with data from all hosts and subtypes
Phylogenetic trees in Newick format. For each protein there is a tree containing sequences from both avian and human hosts and from both the H1N1 and H3N2 subtypes. The nodes are labelled with the accession IDs of the sequences, followed by host and subtype information. The original sequences were obtained from the NCBI database. The trees were created using FastTree 2.1.8.
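For the combined trees, a quick sanity check is to count leaves per host and subtype. The minimal sketch below assumes unquoted labels of the hypothetical form `ACCESSION_HOST_SUBTYPE` with an underscore separator; the real label layout and host strings should be checked against the files, and the toy tree is not taken from the dataset.

```python
import re
from collections import Counter
from typing import Tuple

# Leaf labels follow "(" or "," in an unquoted Newick string.
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def host_subtype(label: str, sep: str = "_") -> Tuple[str, str, str]:
    """Split 'ACCESSION<sep>HOST<sep>SUBTYPE' from the right (ASSUMED layout).

    Splitting from the right tolerates separators inside the accession ID;
    labels with fewer than two separators raise ValueError.
    """
    accession, host, subtype = label.rsplit(sep, 2)
    return accession, host, subtype


def tally(newick: str, sep: str = "_") -> Counter:
    """Count leaves per (host, subtype) pair."""
    return Counter(host_subtype(lbl, sep)[1:] for lbl in LEAF.findall(newick))


if __name__ == "__main__":
    toy_tree = "((ACC1_human_H1N1:0.1,ACC2_avian_H3N2:0.2):0.05,ACC3_human_H1N1:0.3);"
    print(tally(toy_tree)[("human", "H1N1")])  # 2
```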