So Where Did the Virus Come From, Anyway?

The question of where the viruses involved in spillover zoonoses originate is always both important and fascinating. It is important because identifying their origin is critical to understanding how zoonotic epidemics begin and to preparing preventive strategies for future epidemics. It is fascinating because identification, like solving a difficult whodunit, draws on several lines of forensic evidence, including epidemiology, molecular genetics, phylogenetics, and the study of reservoir and intermediate hosts (i.e., wild and domestic animals). Finding the immediate animal precursor virus is normally difficult. Coronaviruses compound the difficulty because they have a predilection for recombination (1), which constantly creates novel viruses and makes it necessary to identify more than one ancestral virus, unless one has the rare good fortune of finding the exact parental recombinant. Of course, once the ancestral virus or viruses have been identified, there remains the question of where and when the virus first entered humans and began to spread from person to person.

In this time of inexpensive and rapid genomic sequencing, the entire genome of the virus causing COVID-19 was quickly obtained and published. The virus was identified as a betacoronavirus (2) and designated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It was only distantly related to SARS-CoV-1, which essentially refuted the idea that it had simply mutated from SARS-CoV-1. A much more closely related virus (>96% identical over its entire genome), called CoV RaTG13, was sequenced from horseshoe bats living in a cave in Yunnan province (3). Large stretches of the RaTG13 genome were virtually identical to SARS-CoV-2, although other stretches appeared sufficiently distinct to rule out a direct ancestral relationship. Some attention has also been focused on pangolins, endangered animals that also carry coronaviruses and that are imported illegally into China in great numbers for food and traditional medicine. One coronavirus isolated from a Malayan pangolin confiscated in Guangdong province was >90% identical to SARS-CoV-2 over its whole genome, but was clearly not a direct precursor (4).
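Figures such as ">96% identical" are typically computed from an alignment of the two genomes. The following is a minimal sketch of one common convention (matches divided by ungapped alignment columns); the function name and the toy sequences are hypothetical, and published values may use a different denominator or alignment method.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes the two sequences are already aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns under this convention
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy sequences for illustration only, not real viral data.
print(percent_identity("ACGTACGT", "ACGTACGA"))
```

A single whole-genome number can hide regions that differ sharply, which is why a virus can be highly similar overall and still not be a direct ancestor.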

Recently, a phylogenetic analysis has provided a reasonably convincing explanation for how SARS-CoV-2 originally formed. The study shows that the virus is likely a triple recombinant, containing genetic elements of two bat coronaviruses and a pangolin coronavirus. The likely recombination breakpoints lie before and after the angiotensin-converting enzyme 2 (ACE2) receptor-binding motif in the spike protein of SARS-CoV-2. The receptor-binding motif of SARS-CoV-2 is virtually identical to that of an isolate from pangolins. The upstream portion of the ORF1a region appears to have been acquired from a second bat coronavirus. Interestingly, the authors present data indicating that SARS-CoV-1 is also the product of multiple recombination events involving different bat coronaviruses. Another study has analyzed in depth the evolutionary history of bat coronaviruses and their ability to jump species, to better understand how and where zoonoses are most likely to occur. Importantly, the SARS-CoV-2 spike protein displays a unique feature, namely a furin cleavage site insertion (PRRA) at the junction of the two subunits of the S protein. Neither the bat betacoronaviruses nor the pangolin betacoronaviruses sampled thus far have polybasic cleavage sites. No animal coronavirus has been identified that is sufficiently similar to have served as the direct progenitor of SARS-CoV-2, although the diversity of coronaviruses in bats and other species is massively undersampled (6).
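One simple way to see why a mosaic genome points to recombination is to ask, window by window along an alignment, which reference virus is most similar to the query. The sketch below illustrates that idea only; it is not the method of the study described above, and the function names and toy sequences are hypothetical. Real analyses use phylogenetic methods and statistical tests rather than raw identity.

```python
def identity(a: str, b: str) -> float:
    """Percent identity over ungapped columns of two aligned segments."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def closest_reference_by_window(query, references, window, step):
    """For each window, report (start, best_reference_name, identity).

    Assumes the query and all references come from one multiple alignment,
    so the same column indices refer to the same genome positions.
    """
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        scores = {
            name: identity(query[start:end], seq[start:end])
            for name, seq in references.items()
        }
        best = max(scores, key=scores.get)  # ties go to the first reference
        results.append((start, best, scores[best]))
    return results


# Toy alignment for illustration only, not real viral data.
toy_query = "AAAAAAAA" + "CCCCCCCC" + "GGGGGGGG"
toy_refs = {
    "toy_bat": "AAAAAAAA" + "TTTTTTTT" + "GGGGGGGG",
    "toy_pangolin": "TTTTTTTT" + "CCCCCCCC" + "TTTTTTTT",
}
for row in closest_reference_by_window(toy_query, toy_refs, window=8, step=8):
    print(row)
```

In this toy case the closest reference switches from one virus to the other and back again, and the positions where it switches mark candidate breakpoints on either side of the middle segment.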

All of the above bears on the evolution of the virus. No clear conclusion has yet been reached about the origin of the virus prior to human infection. At first, it seemed most likely to have emerged from the Huanan wet market in Wuhan, but follow-up analyses of the market and the animals present have not yielded proof. This quickly gave rise to conspiracy theories that the virus was generated in a virology lab in Wuhan. The possibility that the virus could be man-made was fairly convincingly refuted by Andersen et al. (6) and is not supported by scientific data. What, then, was the direct source of the virus, and when did the virus first enter humans?

Given the prevalence of recombination among bat coronaviruses and the myriad possibilities for the introduction of SARS-CoV-2, sample collection from diverse hosts and genomic sequencing analysis will probably need to continue in order to identify the origin of SARS-CoV-2. However, the global spread of these viruses and their potential for cross-species transmission make it clear that further waves of COVID-19 outbreaks are possible. The important thing is how well we are prepared for the next one.

  1. B. Hu et al., Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. PLoS Pathog 13, e1006698 (2017).
  2. F. Wu et al., A new coronavirus associated with human respiratory disease in China. Nature 579, 265-269 (2020).
  3. P. Zhou et al., A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270-273 (2020).
  4. T. Zhang, Q. Wu, Z. Zhang, Probable Pangolin Origin of SARS-CoV-2 Associated with the COVID-19 Outbreak. Curr Biol 30, 1346-1351 e1342 (2020).
  5. V. D. Menachery et al., A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Nat Med 21, 1508-1513 (2015).
  6. K. G. Andersen, A. Rambaut, W. I. Lipkin, E. C. Holmes, R. F. Garry, The proximal origin of SARS-CoV-2. Nat Med 26, 450-452 (2020).
  7. N. Wang et al., Serological Evidence of Bat SARS-Related Coronavirus Infection in Humans, China. Virol Sin 33, 104-107 (2018).

