In early 2020, shortly after the covid-19 outbreak began, scientists sequenced the genome of the virus, SARS-CoV-2, but many of its protein-coding genes were still unknown. Now, a comparative genomics study has produced the most accurate and complete genetic map of the virus to date.
Conducted by researchers from the Massachusetts Institute of Technology (MIT) and published Tuesday in the journal Nature Communications, the study confirmed several protein-coding genes and found that others, which had been proposed as genes, do not code for any protein.
“We were able to use this powerful comparative genomics approach for evolutionary signatures to discover the true functional protein-coding content of this extremely important genome,” says Manolis Kellis, senior author of the study, professor of computer science at MIT and a member of the Broad Institute of MIT and Harvard.
In a second part of the study, the research team also looked at about 2,000 mutations that have arisen in SARS-CoV-2 since the beginning of the pandemic, which allowed them to assess how important these mutations are and whether they help the virus evade the immune system or become more infectious.
It was already known that the SARS-CoV-2 genome, with almost 30,000 RNA bases, has several regions that contain protein-coding genes and others that were suspected of encoding proteins but had not been definitively classified.
To determine which parts of the SARS-CoV-2 genome actually contain genes, the researchers turned to comparative genomics. They compared SARS-CoV-2, which belongs to a subgenus of viruses called Sarbecovirus, most of which infect bats, with SARS-CoV (which caused the 2003 SARS outbreak) and 42 strains of bat sarbecovirus.
Thus, they confirmed six protein-coding genes in the SARS-CoV-2 genome, in addition to the five that are well established in all coronaviruses.
They also determined that the region that encodes a gene called ORF3a also encodes an additional gene, ORF3c. Its RNA bases overlap with those of ORF3a but are read in a different reading frame, something that is rare in large genomes but common in many viruses. The function of ORF3c in SARS-CoV-2 is not yet known.
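To make the idea of overlapping reading frames concrete, here is a minimal Python sketch. The sequence `TOY_RNA` and the helper `codons` are made up for illustration and are not taken from SARS-CoV-2 or from the study. The sketch only shows that the same stretch of RNA yields entirely different three-base codons when reading starts one base later.

```python
def codons(seq: str, frame: int) -> list[str]:
    """Split seq into complete three-base codons, starting at offset frame."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    # Drop any trailing bases that do not fill a whole codon.
    end = len(seq) - (len(seq) - frame) % 3
    return [seq[i:i + 3] for i in range(frame, end, 3)]


# Hypothetical toy sequence, NOT real viral RNA.
TOY_RNA = "AUGGCUUACGGAUAA"

frame0 = codons(TOY_RNA, 0)  # reading starts at the first base
frame1 = codons(TOY_RNA, 1)  # same bases, reading shifted by one

print(frame0)
print(frame1)
```

Because each frame groups the bases differently, one region of RNA can in principle carry two distinct protein-coding messages, which is the situation the article describes for ORF3a and ORF3c.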
The researchers also showed that five other regions that had been proposed as potential genes do not encode functional proteins, and ruled out the possibility that others remain to be discovered.
Furthermore, the authors found that many previous works used not only incorrect gene sets but also, at times, contradictory names. Therefore, in a parallel article recently published in the journal Virology, they presented recommendations for naming the SARS-CoV-2 genes.
In the study, the researchers also looked at more than 1,800 mutations that have emerged in SARS-CoV-2 and found that, in most cases, genes that evolved rapidly before the pandemic have continued to do so, and those that tended to evolve slowly have maintained that trend.
They also analyzed mutations that have arisen in worrisome variants, such as the British strain, the Brazilian strain and the South African strain. They found that many of the mutations that make these variants more dangerous are in the spike protein and help the virus spread faster and evade the immune system.
However, each of these variants has “more than 20 other mutations, and it is important to know which of them can do something and which cannot,” cautions Irwin Jungreis, lead author of the study and a researcher at MIT.
For the authors, these data could help other scientists focus their attention on the mutations that appear to have the most significant effects on the infectivity of the virus.
Eddie is an Australian news reporter with more than nine years in the industry whose work has been published in Forbes and TechCrunch.