Why Do Genes and Mutations Matter in SARS-CoV-2?

Why should we concern ourselves with genetics and mutations in SARS-CoV-2? There are many reasons, including the rate at which mutations occur. One reason is that mutations can result in immune evasion. Changes can occur in the recognition sites (epitopes) for antibodies and cytotoxic T cells, and these could, in theory, make vaccines less effective. In fact, our immune response tends to select for exactly these kinds of mutational changes in the virus. Another reason is that mutational signatures allow us to identify chains of transmission and the origins of clusters of related mutants. Comparing genome sequences from current and previously circulating strains may allow us to trace the current pandemic back to a common ancestral virus and, perhaps, identify its natural source, making it easier to anticipate and deal with the next “spillover” event. Mutations may also make a virus more or less virulent or transmissible. Knowing the sequences of viral genes can lead to predictions of the 3D structures of viral proteins, such as the RNA-dependent RNA polymerase and viral proteases, which in turn can inform the development of effective antiviral drugs. Sequence analysis will also allow us to characterize drug-resistant mutants and develop alternative drugs. Let’s consider some of these points in a little more detail.

First, though, RNA viruses generally have a high mutation rate, because their RNA-dependent RNA polymerases have a high intrinsic error rate. Some mutations have no effect on the eventual protein sequence; these are called silent or synonymous. Others, called non-synonymous, alter the amino acid sequence of the protein and can be neutral or can increase or decrease protein function, although most have no effect on function. SARS-CoV-2 seems to have a much lower mutation rate than many other RNA viruses, because it has a proofreading nuclease that removes misincorporated bases. This is both good and bad news. It’s good because it makes the virus less likely to generate mutants that escape the immune response or antiviral drugs. It’s potentially bad news because proofreading could undermine drugs, such as remdesivir, that work when the viral polymerase incorporates a modified nucleotide. Fortunately, in an in vitro SARS-CoV-2 RNA replication system, remdesivir still causes chain termination of RNA synthesis three bases downstream of its site of incorporation (1), so perhaps proofreading won’t be a problem.
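
To make the synonymous/non-synonymous distinction concrete, here is a minimal Python sketch that classifies a single-base substitution within one codon. It assumes the standard genetic code (NCBI translation table 1) and sequences written with T, although RNA-style U is accepted. The function names are hypothetical illustrations, not part of any library, and real variant annotation must also handle reading frames, insertions, and deletions.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# T, C, A, G order at each position; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate_codon(codon: str) -> str:
    """Return the one-letter amino acid for a codon (RNA 'U' is read as 'T')."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change at 0-based `position` within `codon`.

    Hypothetical helper: a change that leaves the amino acid unchanged is
    synonymous; any change to the amino acid (including creating a stop
    codon) is counted as non-synonymous.
    """
    if len(codon) != 3 or position not in (0, 1, 2):
        raise ValueError("expected a 3-base codon and a position of 0, 1, or 2")
    mutant = codon[:position] + new_base + codon[position + 1:]
    if translate_codon(codon) == translate_codon(mutant):
        return "synonymous"
    return "non-synonymous"


# GGU and GGC both encode glycine, so this third-position change is silent.
print(classify_substitution("GGU", 2, "C"))  # synonymous
# GAU (aspartate) -> GGU (glycine) changes the amino acid.
print(classify_substitution("GAU", 1, "G"))  # non-synonymous
```

Third codon positions are where synonymous changes cluster, because the genetic code is redundant there.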

Is the virus mutating to become more infectious or pathogenic? Early in the epidemic, it was claimed that there were two genetically distinguishable forms of SARS-CoV-2, called the S and L forms, that the S form was ancestral to the L form, and that the L form was spreading more aggressively, based on relative frequencies of 30% for the S form and 70% for the L form. This claim has been debunked: the frequencies reflected limited sampling, which skewed the results, as well as founder effects, in which a single infected traveler seeds a new community with a single viral genotype. Later analyses showed the S form becoming more prevalent again. More recently, similar observations have been made using a larger number of samples and focusing on the gene encoding the spike protein. The authors suggested that a variant carrying a particular spike mutation is becoming dominant, but the same criticisms would apply here. Interestingly, they also present evidence for recombinant forms of SARS-CoV-2, suggesting that some people become infected with more than one genotype. None of these data show differences in pathogenicity among the genotypes.
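
The sampling argument can be illustrated with simple arithmetic. The Python sketch below uses entirely hypothetical genome counts, not the data from the S/L study, for two regions: a source region where both forms circulate equally, and a region founded by a single traveler carrying the L form. Neither form is fitter in this toy model, yet the pooled L frequency depends on how heavily each region happens to be sampled. The function name `pooled_frequency` is illustrative only.

```python
def pooled_frequency(samples: dict[str, dict[str, int]], genotype: str) -> float:
    """Fraction of all sequenced genomes, pooled across regions, that carry `genotype`."""
    total = sum(sum(counts.values()) for counts in samples.values())
    carrying = sum(counts.get(genotype, 0) for counts in samples.values())
    return carrying / total


# Hypothetical counts. The founder region was seeded by one traveler carrying
# the L form, so every genome sequenced there is L regardless of fitness.
heavily_sampled_founder = {
    "source_region": {"S": 20, "L": 20},
    "founder_region": {"S": 0, "L": 60},
}
lightly_sampled_founder = {
    "source_region": {"S": 20, "L": 20},
    "founder_region": {"S": 0, "L": 10},
}

print(pooled_frequency(heavily_sampled_founder, "L"))  # 0.8
print(pooled_frequency(lightly_sampled_founder, "L"))  # 0.6
```

The circulating viruses are the same in both scenarios; only the sampling differs. That is why a shift in apparent frequency is not, by itself, evidence of greater transmissibility.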

What about the origins of the virus? This is unclear and may remain so. Not only does the large number of coronavirus species make finding the exact ancestor problematic, but the proclivity of coronaviruses to recombine with one another adds considerable difficulty. The possibility that the virus was genetically engineered seems unlikely (2), among other reasons because it is not optimally engineered. Two possibilities remain: that it infected humans directly, or that it first infected an intermediate species, adapted there, and then infected humans, perhaps in the Wuhan wet market. The closest known related bat virus is more than 96% identical to SARS-CoV-2 by nucleotide sequence. While this sounds quite close, the spike protein of that bat virus diverges in the receptor binding domain, suggesting that it may not bind efficiently to the human receptor, ACE2. Scientists at the Wuhan Institute of Virology had an ongoing study in which they collected bats, along with their blood and feces, from caves in Yunnan Province to identify and catalog coronaviruses, both in response to SARS-1 and in anticipation of future pandemics. Could one of them have become infected by inadvertently inhaling bat guano? It is a short high-speed train ride to Wuhan. Or could local villagers have acquired the virus? Bat guano is used in traditional Chinese medicine and is sometimes put in the eyes. You can buy it on Amazon. The virus may have been rare in humans until recently. Usually, however, transmission to humans occurs through an intermediate host: civet cats for SARS-1 and camels for MERS. A fascinating book on the subject is Spillover, by David Quammen. Although pangolins have been suggested as an intermediate host, there is no convincing evidence so far. Perhaps in this case, humans are serving as the intermediate hosts. A definitive answer will require further sequencing, and the question may never be answered conclusively.
Genetic studies of the virus have revealed the geographic routes by which it has spread around the world; see https://nextstrain.org/ncov/global.
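
To see how a high overall identity can coexist with a divergent region such as a receptor binding domain, here is a minimal Python sketch that computes percent identity for two sequences assumed to be already aligned (equal length, with '-' for gaps), plus a sliding-window version. The sequences are toy strings, not real viral genomes, and skipping gap columns is just one of several conventions for defining identity.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns with identical bases, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def windowed_identity(a: str, b: str, window: int, step: int) -> list[tuple[int, float]]:
    """Percent identity in successive windows; returns (start, identity) pairs."""
    return [
        (start, percent_identity(a[start:start + window], b[start:start + window]))
        for start in range(0, len(a) - window + 1, step)
    ]


# Toy aligned sequences: identical except for one fully divergent 8-base block.
conserved = "ACGT" * 10
seq_a = conserved + "GGCCAATT" + conserved
seq_b = conserved + "TTAAGGCC" + conserved

print(round(percent_identity(seq_a, seq_b), 1))  # 90.9
for start, identity in windowed_identity(seq_a, seq_b, window=8, step=8):
    if identity < 50:
        print(f"divergent window at position {start}: {identity:.0f}% identical")
```

The single overall number hides the fact that one window is completely different, which is the same reason a bat virus that is more than 96% identical overall can still differ in a functionally important domain.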

Genome sequences that carry specific mutations can be thought of as having a convenient barcode that allows chains of transmission to be traced. This can be highly valuable to epidemiologists. For example, a man who had traveled from China and arrived in Washington state in January tested positive for SARS-CoV-2. His virus had three distinct mutations that identified Wuhan as its origin. More than a month later, a high school student tested positive. The student’s virus had the same three mutations plus some new ones. This led to the conclusion that the virus had been circulating in the general population during the intervening period. Infections on the Grand Princess cruise ship had the same mutational signature. Similar studies showed that the epidemic in New York originated primarily with travelers from Europe, while the epidemic in Europe was mostly seeded by viruses from China. California virus samples also showed similarity to viruses from China but were distinct from the Washington state cluster, indicating a separate introduction, probably a bit earlier than those in Washington.
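
The barcode logic can be sketched with set operations. In this minimal Python sketch, each genome is represented by the set of mutations it carries relative to a reference sequence; the mutation labels are hypothetical placeholders, not the actual Washington state mutations, and the helper names are illustrative. A genome that carries all of an earlier genome's mutations plus new ones is consistent with descent from that lineage. This is a simplification: real analyses, such as those behind Nextstrain, build phylogenetic trees from many genomes, and shared mutations alone cannot prove direct person-to-person transmission.

```python
def carries_signature(signature: set[str], genome_mutations: set[str]) -> bool:
    """True if a genome carries every mutation in a lineage's signature."""
    return signature <= genome_mutations


def acquired_since(signature: set[str], genome_mutations: set[str]) -> set[str]:
    """Mutations present in the genome beyond the lineage signature."""
    return genome_mutations - signature


# Hypothetical mutation labels standing in for real nucleotide changes.
traveler = {"mut1", "mut2", "mut3"}
student = {"mut1", "mut2", "mut3", "mut4", "mut5"}
separate_introduction = {"mut1", "mut6"}

print(carries_signature(traveler, student))                # True
print(sorted(acquired_since(traveler, student)))           # ['mut4', 'mut5']
print(carries_signature(traveler, separate_introduction))  # False
```

The extra mutations in the later genome are what point to a period of undetected circulation: they needed time to accumulate as the virus passed from person to person.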

Another useful aspect of genetic studies is that they help identify potential targets for the immune system and thereby facilitate vaccine development. For example, one study took B and T cell epitopes identified in SARS-CoV-1 infections and looked for genetically similar regions in SARS-CoV-2 to identify possible immune targets (3). The authors further noted that these parts of the genome, which encode structural proteins of the virus, show very little variation among a large number of isolates, suggesting that they may be regions where the virus has little mutational latitude. Another method uses the nucleotide sequence to deduce the amino acid sequence of a viral protein or candidate region, then synthesizes a set of overlapping peptides spanning that sequence. These synthetic peptides are then tested against various antisera, including natural immune sera, to identify the regions of the protein that are immune targets. For vaccine development, the viral genes can be cloned and expressed as protective antigens, or directly synthesized proteins can be used. Nucleotide sequence analysis, along with recombinant DNA techniques, has become extremely efficient, powerful, and inexpensive, and it is one of the main reasons our understanding of emerging viruses advances as rapidly as it now does.
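
The overlapping-peptide approach amounts to tiling a protein sequence with short windows. Here is a minimal Python sketch; the peptide length and overlap are parameters you would choose for a given experiment (the values below are arbitrary), the function name is hypothetical, and the 20-residue test sequence is a toy string made of the 20 standard amino acids, not a real viral protein.

```python
TOY_PROTEIN = "ACDEFGHIKLMNPQRSTVWY"  # toy sequence: the 20 standard amino acids


def overlapping_peptides(protein: str, length: int, overlap: int) -> list[tuple[int, str]]:
    """Tile `protein` with peptides of `length` residues that overlap by `overlap`.

    Returns (0-based start, peptide) pairs. If the regular step would leave the
    C-terminal residues uncovered, a final peptide is anchored at the end.
    """
    if length <= 0 or not 0 <= overlap < length:
        raise ValueError("need length > 0 and 0 <= overlap < length")
    if len(protein) <= length:
        return [(0, protein)]
    step = length - overlap
    starts = list(range(0, len(protein) - length + 1, step))
    if starts[-1] + length < len(protein):
        starts.append(len(protein) - length)
    return [(start, protein[start:start + length]) for start in starts]


for start, peptide in overlapping_peptides(TOY_PROTEIN, length=8, overlap=4):
    print(start, peptide)
# 0 ACDEFGHI
# 4 FGHIKLMN
# 8 KLMNPQRS
# 12 PQRSTVWY
```

Every residue appears in at least one peptide, and the overlap ensures that any linear stretch of up to overlap + 1 residues appears intact in some peptide, so a short epitope is not lost by falling across a peptide boundary.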

  1. C. J. Gordon et al., Remdesivir is a direct-acting antiviral that inhibits RNA-dependent RNA polymerase from severe acute respiratory syndrome coronavirus 2 with high potency. J Biol Chem (2020).
  2. K. G. Andersen, A. Rambaut, W. I. Lipkin, E. C. Holmes, R. F. Garry, The proximal origin of SARS-CoV-2. Nat Med 26, 450-452 (2020).
  3. S. F. Ahmed, A. A. Quadeer, M. R. McKay, Preliminary Identification of Potential Vaccine Targets for the COVID-19 Coronavirus (SARS-CoV-2) Based on SARS-CoV Immunological Studies. Viruses 12 (2020).

View All GVN SARS-CoV-2 Perspectives