Phylogenetic tree of S-protein genes of selected betacoronaviruses
Data files
Apr 17, 2020 version files 275.08 KB
S-protein-genes-tree.nex
Abstract
The emergence of SARS-CoV-2 has resulted in more than 200,000 infections and nearly 9,000 deaths globally so far. This novel virus is thought to have originated from an animal reservoir and to have acquired the ability to infect human cells using the SARS-CoV cell receptor hACE2. In the wake of a global pandemic, it is essential to improve our understanding of the evolutionary dynamics surrounding the origin and spread of a novel infectious disease. Theory predicts that one way selection pressures shape viral evolution is by enhancing binding to host cells. We first assessed evolutionary dynamics in select betacoronavirus spike protein genes to predict where these genomic regions are under directional or purifying selection between divergent viral lineages at various scales of relatedness. With this analysis, we identified a region inside the receptor-binding domain with putative sites under positive selection interspersed among highly conserved sites, which are implicated in the structural stability of the viral spike protein and its binding to the human receptor hACE2. Next, to gain further insight into factors associated with coronavirus recognition of the human host receptor, we performed modeling studies of five different coronaviruses and their potential binding to hACE2. Modeling results indicate that interfering with the salt bridges at hot spot 353 could be an effective strategy for inhibiting binding, and hence for the prevention of coronavirus infections. We also propose that a glycine residue in the receptor-binding domain of the spike glycoprotein may play a critical role in permitting bat variants of the coronaviruses to infect human cells.
Methods
The genomes most similar to SARS-CoV-2 (MN908947) were retrieved using BLASTp against the NR database of GenBank. Genomes were then aligned using MAUVE, and the alignment was trimmed to the S-protein gene. The extracted genomic sections were aligned using the translation align option of Geneious with the MAFFT plugin. The phylogenetic reconstruction of S-protein genes was performed with PhyML under a GTR+I+G model with 100 non-parametric bootstrap replicates.
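
To illustrate what a non-parametric bootstrap replicate is, the following minimal Python sketch resamples alignment columns with replacement. It is not the PhyML implementation: PhyML performs this resampling internally and then re-estimates a tree for each replicate. The function names and the toy sequences are hypothetical and are not taken from the dataset.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Columns are sampled with replacement, and the same sampled columns are
    applied to every sequence so that each resampled site stays homologous.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # Draw n_sites column indices with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate n_replicates pseudo-replicate alignments (100 by default)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment: hypothetical sequences, for illustration only.
toy = {
    "virus_A": "ATGTTTGTT",
    "virus_B": "ATGTTCGTT",
    "virus_C": "ATGCTTGTA",
}
replicates = bootstrap_replicates(toy, n_replicates=100, seed=42)
```

In a full analysis, a tree would be inferred from each pseudo-replicate, and the support value for a branch would be the fraction of replicate trees that contain it.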