Evolution of SARS-CoV-2

From MicrobeWiki, the student-edited microbiology resource
Revision as of 05:35, 17 April 2023 by Unknown user



By Christian Harris

Background

Virus Mutations and Recombination


Virus particles consist of genetic material (DNA or RNA) together with the proteins and enzymes needed to enter a host cell, replicate the genome, and produce more virus particles. The genome of both DNA and RNA viruses can be single- or double-stranded, with a segmented, circular, or linear structure [17]. Viruses can replicate only within a host because they cannot carry out metabolism on their own and lack the necessary machinery [15, 17]. Viral genomes are vastly smaller than those of living organisms, especially the genomes of RNA viruses [16].

DNA viruses have mutation rates similar to those of eukaryotic organisms because they possess enzymes with proofreading functions. Mutation rates are also affected by the presence of mutagens (such as UV light, X-rays, and some chemicals), the natural behavior of bases, and mistakes in nucleic acid replication [15]. RNA viruses typically lack proofreading capabilities; however, some, such as coronaviruses, do have them [15, 1].

Another process by which viruses change their genomes is recombination. Recombination is a major evolutionary force in viral and microbial adaptation, driving the spread of antibiotic resistance, drug resistance, and immune and vaccine escape [36]. Viral recombination occurs when two parent strains coinfect the same host cell, resulting in a new strain with genes from both parents. Successful recombination mostly occurs when the parent strains are the same type of virus and have similar genomes [15]. The two mechanisms of recombination are independent assortment and incomplete linkage [15]. Independent assortment occurs when a segment of one virus's genome is incorporated into the genome of another virus. When replication takes place within the host cell, the genome segments are unlinked and assort randomly [15], so each progeny virus has a chance of containing segments from both parent viruses [17]. Incomplete linkage arises when the genome of one virus is broken within the host cell and part of it is added to the genome of another virus. Successful recombination can result in the emergence of novel progeny viruses [15], which may have characteristics different from those of either parent virus [17].
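The two mechanisms described above can be illustrated with a toy model. The following Python sketch is only an illustration under stated assumptions: the segment labels, the genome strings, and the function names (reassort, all_reassortants, splice) are hypothetical and do not come from any of the cited sources, and real recombination is far more constrained than a random draw or a single clean break.

```python
import random
from itertools import product


def reassort(parent_a, parent_b, rng):
    """Independent assortment: each segment is drawn at random from either parent."""
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]


def all_reassortants(parent_a, parent_b):
    """Every segment combination that independent assortment could produce."""
    return {tuple(combo) for combo in product(*zip(parent_a, parent_b))}


def splice(genome_a, genome_b, breakpoint):
    """Incomplete linkage (simplified): genome A up to the break, then genome B."""
    return genome_a[:breakpoint] + genome_b[breakpoint:]


# Hypothetical three-segment parent genomes; the labels are illustrative only.
parent_a = ["A1", "A2", "A3"]
parent_b = ["B1", "B2", "B3"]

rng = random.Random(0)  # fixed seed so that the run is repeatable
progeny = reassort(parent_a, parent_b, rng)
possible = all_reassortants(parent_a, parent_b)

# Hypothetical unsegmented genomes written as strings of equal length.
chimera = splice("AAAAAAAA", "BBBBBBBB", 3)

print(progeny, len(possible), chimera)
```

With n segments and two parents, independent assortment allows 2^n segment combinations (two of which are the unchanged parents), which is why coinfection by similar strains can produce many distinct progeny.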

General Coronavirus Information


Coronaviruses belong to the order Nidovirales, the family Coronaviridae, and the subfamily Orthocoronavirinae [1]. The subfamily consists of four genera: alphacoronavirus, betacoronavirus, gammacoronavirus, and deltacoronavirus [7]. Although mutations may allow adaptation to new hosts, alphacoronaviruses and betacoronaviruses primarily infect mammals, while gammacoronaviruses and deltacoronaviruses primarily infect birds, with some also infecting mammals [2]. The name coronavirus stems from the corona-like appearance that the spike proteins produce when the virus is viewed under a transmission electron microscope [2]. Coronaviruses are about 80-120 nm in diameter [3] and store their genetic information as positive-sense single-stranded RNA [1]. Because this RNA acts as mRNA, it can be translated immediately upon entering a host cell, which allows replication to begin [3]. Their genome ranges between 27 and 32 kilobases, which is more than twice the size of most other RNA viruses [4]. Coronaviruses possess a proofreading enzyme, the nonstructural protein (NSP) 14 exoribonuclease, which reduces their mutation rate to a degree similar to that of DNA viruses and also mediates recombination [1]. The low mutation rate is thought to allow the genome to grow larger, since existing genes are unlikely to change and may still serve a future purpose. The genome consists of 5’ and 3’ untranslated regions (UTRs) and at least six core open reading frames (ORFs). ORF1a and ORF1b comprise approximately two-thirds of the genome and are responsible for regulating transcription and replication [1]. The other four core ORFs encode the structural proteins: a spike protein, an envelope protein, a membrane protein, and a nucleocapsid protein [1]. Accessory ORFs code for proteins that are not necessary for all coronaviruses to flourish but may improve success under certain environmental conditions, and they differ between variants and subvariants [1].
Coronavirus infections largely affect the host’s respiratory or gastrointestinal tract; however, some variants are known to infect other organ systems [2].

Coronavirus history


It is estimated that the common ancestor of coronaviruses appeared roughly 10,000 years ago and gave rise to the four genera 4,000 to 5,000 years ago [2]. Bats and birds, both warm-blooded flying vertebrates, were ideal gene sources for coronaviruses. Bats gave rise to the genera Alphacoronavirus and Betacoronavirus, and birds gave rise to the genera Gammacoronavirus and Deltacoronavirus [7], which explains why Alphacoronavirus and Betacoronavirus tend to infect mammals and take precedence in research [8]. The first human coronavirus (B814) was discovered in 1965 as a cause of the common cold; however, the coronavirus classification was not created until a few years later. Within the following decades, two more human coronaviruses (HCoV-229E and HCoV-OC43) were discovered and caused recurring epidemics, although symptoms were generally mild [9, 10]. The list of animals known to be infected by coronaviruses also grew rapidly to include rats, mice, chickens, turkeys, calves, dogs, cats, rabbits, and pigs [9]. The symptoms experienced by infected animals were more severe than those experienced by humans at the time and included (but were not limited to) gastroenteritis, hepatitis, encephalitis, and pneumonitis [9]. Coronaviruses were not considered highly pathogenic to humans until 2002, when severe acute respiratory syndrome coronavirus (SARS-CoV-1), transferred from infected bats to people, broke out in southern China [7, 12]. SARS-CoV-1 spread throughout Asia, infecting 8422 people and causing 916 deaths [13]. Since then, many new strains of coronavirus have been reported, some of them threatening. The most threatening new animal coronavirus strains are swine acute diarrhea syndrome coronavirus (SADS-CoV), swine transmissible gastroenteritis virus (TGEV), porcine epidemic diarrhea virus (PEDV), and avian infectious bronchitis virus (IBV) [10].
Human coronaviruses continued to emerge with the introduction of HCoV-NL63, HCoV-HKU1, and Middle East respiratory syndrome coronavirus (MERS-CoV), the last of which was transferred from infected camels [10, 14]. HCoV-NL63 and HCoV-HKU1 proved to be similar to HCoV-229E and HCoV-OC43 in that hosts experienced only mild symptoms. MERS-CoV has caused 858 deaths since 2012, when it spread throughout parts of the Middle East and Africa [10]. In late 2019, another severe acute respiratory syndrome coronavirus (SARS-CoV-2), a betacoronavirus, surfaced and spread globally.

SARS-CoV-2


It is believed that severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) originated from a bat coronavirus because of genomic similarities; however, the cause of the first human infection is still unknown [18]. Some have hypothesized that the virus originated within a lab, but there is little to no evidence for this. A more likely hypothesis is that the virus was transferred directly from an animal, most likely an intermediate host, to a person [18, 19]. SARS-CoV-2 is far more likely to infect humans, cats, and cattle than bats (whose ACE2 receptors are similar but not identical [23]), which suggests that the mutations in the spike protein arose in an intermediate host [19]. SARS-CoV-2 enters a host cell by binding its spike protein to angiotensin-converting enzyme 2 (ACE2) receptors, which mammals need to maintain a stable blood pressure [20]. For this reason, it is able to infect a wide variety of animals and is said to have a ‘generalist’ nature [20]. Spillover from humans to animals occurs, as evidenced by the minimal adaptation required for SARS-CoV-2 to infect mink and deer [20]. For the same reason, it can spill back into human populations, as was seen at Dutch mink farms [21].


Comparison of SARS-CoV-2 to other severe HCoVs


SARS-CoV-2 is the best-known human respiratory coronavirus because it causes coronavirus disease 2019 (COVID-19); however, other severe types, such as SARS-CoV-1 and MERS-CoV, came before it [25]. The clinical symptoms of SARS-CoV-1 are similar to those of SARS-CoV-2, while those of MERS-CoV are more severe [25]. The exact case fatality rate (CFR), the proportion of recorded cases that end in death, is difficult to measure because people with mild symptoms are less likely to seek professional help and thus be recorded. The estimated CFR of SARS-CoV-2 was 8.7% in 2020 [26], which is similar to that of SARS-CoV-1 (10%) but lower than that of MERS-CoV (34%) [25]. As of April 16, 2023, the estimated CFR of SARS-CoV-2 is less than 1% worldwide [27]. All three viruses have similar incubation periods (the time between infection and symptoms) of about three to four days [25, 30], and a virus can still be spread during this period [31]. What makes SARS-CoV-2 unique is its ability to spread through both aerosols (from normal breathing) and respiratory droplets (from coughs and sneezes) [28] and to remain viable as an aerosol for upwards of two hours [25]. SARS-CoV-1 can also spread through respiratory droplets; however, there is little evidence that it spreads through aerosols [29], and the majority of transmission occurred through direct contact [25-57]. MERS-CoV is thought to spread between people through respiratory droplets, but there is too little documentation to be certain. In fact, most cases of MERS-CoV can be traced back to direct contact with infected camels [30]. The high transmission rate of SARS-CoV-2 can be explained by the unique structure of its S protein. The S proteins of SARS-CoV-2 and SARS-CoV-1 bind to the ACE2 receptor, while the MERS-CoV spike protein binds to dipeptidyl-peptidase 4 (DPP4) [25].
The S protein of SARS-CoV-2 has a 12-nucleotide insertion that creates a furin-like cleavage site at the S1/S2 boundary and facilitates the priming of the S protein [25, 34]. In SARS-CoVs, the receptor-binding motif (RBM) is the main functional motif that binds to the human ACE2 receptor [44]. When the SARS-CoV-2 genome was compared with those of bat coronaviruses, it was most similar to bat-SL-CoVZC45 and bat-SL-CoVZXC21, with similarities of 88.0% and 87.2% respectively [25-89]. The greatest sequence variation is found in the S gene, which is only about 75% similar, supporting the view that changes in the S protein are the primary cause of the virus's success [25-89].
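The case fatality rate discussed in this section is simply recorded deaths divided by recorded cases. As a rough check, the Python sketch below applies that ratio to the SARS-CoV-1 totals quoted earlier on this page (8422 infections and 916 deaths) and then shows how unrecorded mild cases would lower the figure. The function names and the 50% unrecorded fraction are assumptions made purely for illustration, not estimates from the cited sources.

```python
def case_fatality_rate(deaths, recorded_cases):
    """CFR as a fraction: recorded deaths divided by recorded cases."""
    if recorded_cases <= 0:
        raise ValueError("recorded_cases must be positive")
    return deaths / recorded_cases


def rate_including_unrecorded(deaths, recorded_cases, unrecorded_fraction):
    """Fatality rate if a given fraction of all true cases was never recorded."""
    if not 0 <= unrecorded_fraction < 1:
        raise ValueError("unrecorded_fraction must be in [0, 1)")
    true_cases = recorded_cases / (1 - unrecorded_fraction)
    return deaths / true_cases


# SARS-CoV-1 totals as quoted on this page [13].
sars1_cfr = case_fatality_rate(916, 8422)

# Hypothetical: suppose half of all infections were too mild to be recorded.
sars1_adjusted = rate_including_unrecorded(916, 8422, 0.5)

print(f"CFR from recorded cases: {sars1_cfr:.1%}")
print(f"Rate if 50% of cases were unrecorded: {sars1_adjusted:.1%}")
```

The first figure comes out close to the roughly 10% quoted for SARS-CoV-1, and the second shows why a CFR based only on recorded cases tends to overstate the risk when mild cases go uncounted.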

Production of SARS-CoV-2 Variants


As mentioned previously, coronaviruses possess the NSP 14 exoribonuclease, a proofreading enzyme that reduces their mutation rate to a degree similar to that of DNA viruses and also mediates recombination [1]. The mutation rate has been estimated at 3 × 10^-6 per site per replication cycle [32]; however, the mutation rate of the spike gene is four to five times higher than that of the rest of the genome [33]. To understand why SARS-CoV-2 has many variants despite a low mutation rate, we must recognize that the more a virus circulates in a population, the more opportunities there are for a fitness-increasing mutation to occur [48]. An infected individual produces an estimated 3 × 10^5 to 3 × 10^8 “infectious units” [33], and over 680 million COVID-19 cases have been reported [35].
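To make the scale of these numbers concrete, the Python sketch below converts the per-site rate quoted above (3 × 10^-6 per replication cycle) into an expected number of mutations per genome copy. The genome length of 30,000 nucleotides is an assumption chosen from within the 27 to 32 kilobase range given earlier, and the calculation treats every site as mutating independently at the same rate, which ignores the higher rate reported for the spike gene.

```python
MUTATION_RATE = 3e-6     # per site per replication cycle, as quoted above [32]
GENOME_LENGTH = 30_000   # nucleotides; assumed value within the 27-32 kb range

# Expected number of new mutations each time one genome is copied.
expected_mutations = MUTATION_RATE * GENOME_LENGTH

# Probability that a single copy carries at least one new mutation,
# assuming sites mutate independently and at the same rate.
p_at_least_one = 1 - (1 - MUTATION_RATE) ** GENOME_LENGTH

# Average number of genome copies made per new mutation.
copies_per_mutation = 1 / expected_mutations

print(f"Expected mutations per copy: {expected_mutations:.2f}")
print(f"Chance of at least one mutation per copy: {p_at_least_one:.1%}")
print(f"Copies per new mutation (average): {copies_per_mutation:.0f}")
```

Under these assumptions fewer than one copy in ten carries a new mutation, so the large number of variants reflects the enormous number of genome copies made across hundreds of millions of infections rather than a high error rate.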

Coronaviruses display a particularly high rate of recombination (by incomplete linkage) with the help of the NSP 14 exoribonuclease, and SARS-CoV-2 is no exception [33]. Early in the COVID-19 pandemic, it was difficult to detect recombination events in SARS-CoV-2 because there was little genetic variation. Since then, variation among genomes has increased, making recombinants easier to spot [38]. A study conducted late in 2022 identified 589 recombination events among 1.6 million samples and found that 2.7% of sequenced SARS-CoV-2 genomes have detectable recombinant ancestry. As with mutations, recombination is more likely to occur near the spike protein gene. Up to this point, the recombinant variants identified have not been particularly successful [36]; however, this could change with the emergence of XBB.1.5 [58].


SARS-CoV-2 Variants


When SARS-CoV-2 first emerged in 2019, there was one prevalent strain. Since then, a multitude of variants have been found in patients and sequenced [24]. The majority of variants are not rigorously studied and quickly go extinct [36]; however, some are more pathogenic or transmissible than older variants and require further attention. The following variants were primary causes of COVID-19 during the pandemic and were at some point listed as variants of concern (VOC) by the CDC [46].

Alpha (B.1.1.7), Beta (B.1.351), and Gamma (P.1)


The Alpha, Beta, and Gamma variants (not to be confused with the genera alphacoronavirus, betacoronavirus, and gammacoronavirus) were the first detected SARS-CoV-2 variants that had the potential to prolong the pandemic [37]. The Alpha variant (B.1.1.7 lineage) was discovered in Great Britain in late 2020 [39]. This strain quickly spread around the world and became the dominant one in North America [40, 37]. B.1.1.7 originally exhibited seventeen important mutations, including amino acid replacements and deletions in the spike protein, ORF1ab, ORF8, and the nucleoprotein [41]. Among these were the spike mutations N501Y, located in the receptor-binding domain (RBD), and P681H, located near the furin cleavage site, as well as several non-spike mutations [43]. The N501Y mutation greatly increases affinity for the ACE2 receptor [51], and the P681H mutation increases resistance to beta interferon [52]. These mutations also decreased the number of virions required to establish an infection [42], boosting the transmission rate by 50% and severity by 48% [37, 43]. Sub-lineages of B.1.1.7 with two additional mutations in the spike gene were quickly discovered in Germany and the Czech Republic [41]. By September of 2021, the widespread transmission of B.1.1.7 had ended [39].

The Beta variant (B.1.351 lineage) was first documented in South Africa in May 2020 [37]. It was particularly prominent in Africa, accounting for upwards of 80% of cases at its peak, but it also spread rapidly throughout Europe and Asia [39]. B.1.351 exhibited nine mutations in the spike protein, including N501Y, E484K, and K417N [43]. Together, these mutations increased affinity for the ACE2 receptor ten-fold compared with the original variant [51, 53]. Beta was also far more severe than the Alpha variant, with a 59% increase in mortality rate [43], and it reduced the effectiveness of antibody treatments [47]. By November of 2021, the widespread transmission of B.1.351 had ended [39].

The last of the three original variants to emerge was Gamma (B.1.1.28.1 lineage, under the alias P.1), which was discovered in Brazil in November of 2020 and faded in late 2021 [39]. P.1 exhibits eight lineage-defining mutations, including L18F and three mutations in the receptor-binding domain: N501Y, E484K, and K417T (different from K417N) [43]. The K417T mutation causes only a minor difference in the spike protein structure compared with the K417N mutation, so the combination of N501Y, E484K, and K417T also increases affinity ten-fold [53]. The L18F mutation affects the binding of neutralizing antibodies [54]. These mutations also led to a higher rate of hospitalization and morbidity [37].
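Mutation names such as N501Y follow a simple pattern: the original amino acid (one-letter code), its position in the protein, and the replacement amino acid. The Python sketch below parses that notation and compares the spike mutations named in this section for each variant. It is a minimal illustration only: the dictionary holds just the mutations mentioned in the text above, not the full mutation profile of each lineage, and the function and variable names are hypothetical.

```python
import re

# Only the spike mutations named in the text above, not complete profiles.
named_mutations = {
    "Alpha": {"N501Y", "P681H"},
    "Beta": {"N501Y", "E484K", "K417N"},
    "Gamma": {"L18F", "N501Y", "E484K", "K417T"},
}

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(name):
    """Split a name like 'N501Y' into (original, position, replacement)."""
    match = SUBSTITUTION.match(name)
    if match is None:
        raise ValueError(f"not a substitution name: {name!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def mutated_positions(mutations):
    """Positions affected by a set of substitutions, whatever the replacement."""
    return {parse_substitution(name)[1] for name in mutations}


# Mutations carried by all three variants.
shared_by_all = set.intersection(*named_mutations.values())

# Positions changed in both Beta and Gamma, even if the new residue differs.
beta_gamma_positions = (
    mutated_positions(named_mutations["Beta"])
    & mutated_positions(named_mutations["Gamma"])
)

print(shared_by_all, sorted(beta_gamma_positions))
```

Comparing positions rather than full names shows that Beta and Gamma both changed residue 417 even though the replacements (N and T) differ, which matches the similar ten-fold affinity increase described for the two variants.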

Delta (B.1.617.2)


In late 2020, the Delta SARS-CoV-2 variant (B.1.617.2) was discovered in India, and it went on to become the predominant variant globally [40]. The Delta variant has 23 mutations, the most important being the E484Q and L452R RBD mutations, a P681R cleavage site mutation, and mutations in ORF3 and ORF7 [43]. The L452R mutation increases the affinity of the spike protein for the ACE2 receptor [49] while also decreasing the immune system's ability to recognize the virus [50]. These mutations drastically improved transmissibility and reduced the effectiveness of antibody treatments [47]. For almost a year, B.1.617.2 was by far the most common variant worldwide. While previous variants had accounted for up to 80% of SARS-CoV-2 cases within a continent at a given time, the Delta variant accounted for nearly 100% of cases on every continent except Africa [39]. It is no surprise that the Delta lineage gave rise to many sublineages, most notably AY.4.2 [55]. This sublineage is believed to be 10% more contagious than other variants because of a mutation in the N-terminal domain [56]. Delta's period of dominance lasted from July to December of 2021; however, cases continued to be reported in 2023 [39].

Omicron (B.1.1.529)


In November of 2021, the Omicron variant was documented, and it has since been the primary variant of concern. As of April 16, 2023, Omicron is the only SARS-CoV-2 variant labeled as a “Variant of Concern” by the CDC [46]. The original strain of Omicron was far more transmissible than previous strains because of upwards of 30 mutations in the spike protein [40]. It spread rapidly and produced infectious subvariants [40]. The predominant subvariant had been BQ.1.1 since its emergence in July of 2022 [57], until XBB.1.5, a new subvariant created by the recombination of Omicron sublineages, was discovered in late 2022. The XBB.1.5 subvariant is the most transmissible so far [58], accounting for approximately 83% of COVID-19 cases in the United States in April of 2023 [59]. The new subvariants XBB.1.16, XBB.1.9.1, and XBB.1.9.2 have also begun to spread throughout the United States and are projected to account for a higher percentage of COVID-19 cases in the future [59].

The future of SARS-CoV-2


Vaccines have proven to be an effective way of limiting the transmission and symptoms of SARS-CoV-2 [40]. Methods of creating these vaccines differ; however, all stimulate the production of T-lymphocytes and B-lymphocytes that can effectively fight the virus [65]. Over 13 billion vaccine doses have been administered worldwide [60], which exceeds the human population of 8 billion [61]. This figure can be misleading because each person can receive two primary doses and a booster [63]. In fact, only 70% of the US population has completed the primary vaccination series, and only 17% have received the booster vaccine (which has been available since September of 2022) [63]. When COVID-19 vaccines were first developed, they were intended to protect against a variety of variants [62], which explains why the Alpha, Beta, and Gamma variants are no longer variants of concern while newer variants are. Vaccines are not perfect: ‘breakthrough infections’ by the Delta and Omicron variants have occurred in vaccinated individuals; however, the symptoms are less severe and transmission is less likely [40]. Until vaccination rates rise, variants of SARS-CoV-2 will continue to spread [64]. In addition, a substantial amount of effort and resources has been dedicated to the surveillance of new mutations and recombination events [66]. At the start of the COVID-19 pandemic, the generation of genomic data exceeded the capacity of existing analysis platforms, making it difficult to provide real-time analysis of viral evolution [66, 67]. Since then, new methods of tracking recombination events in pandemic-scale phylogenies have been developed [67], allowing for the necessary surveillance [66].
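The reason the worldwide dose count can mislead is easy to see with a little arithmetic. The Python sketch below uses the rounded totals quoted in this section (13 billion doses, 8 billion people) and assumes, purely for illustration, that a complete course means three doses per person (two primary doses and one booster). The variable names are hypothetical, and the result is an upper bound, not an estimate of actual coverage.

```python
DOSES_ADMINISTERED = 13_000_000_000  # rounded total quoted above [60]
WORLD_POPULATION = 8_000_000_000     # rounded total quoted above [61]
DOSES_PER_FULL_COURSE = 3            # assumption: two primary doses plus one booster

# Average number of doses per person if doses were spread evenly.
doses_per_person = DOSES_ADMINISTERED / WORLD_POPULATION

# Largest share of the population that could have received a full course.
max_full_course_share = (
    DOSES_ADMINISTERED / DOSES_PER_FULL_COURSE / WORLD_POPULATION
)

print(f"Average doses per person: {doses_per_person:.3f}")
print(f"Upper bound on fully boosted share: {max_full_course_share:.0%}")
```

Even though there are more doses than people, under this assumption the doses could fully cover only a little over half of the world's population, and the real share is lower because doses are not distributed evenly.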

Reputable Sources Updated Regularly


Over the course of three and a half years, many variants have been discovered and have wreaked havoc on the human population. This page serves as a review of SARS-CoV-2 up to April 16, 2023; however, SARS-CoV-2 will continue to evolve.

Important SARS-CoV-2 Variants with Classification: https://www.cdc.gov/coronavirus/2019-ncov/variants/variant-classifications.html

Information about Variants of Concern and a list of all Variants seen within a year: https://cov-lineages.org

SARS-CoV-2 Variant Proportions within the United States: https://covid.cdc.gov/covid-data-tracker/#variant-proportions

SARS-CoV-2 and SARS-CoV-2 Vaccination Statistics: https://ourworldindata.org/covid-vaccinations