Human beings live completely surrounded by tiny, unknown neighbors. The first attempt to classify all viruses, in 1971, found only three hundred species. The latest report published by the International Committee on Taxonomy of Viruses already lists more than 9,000, but those are just the well-studied and named species, like the ones to blame for covid, AIDS, Ebola, and the flu. The actual number is unimaginable. A scientific team has just discovered almost 132,000 more species in one fell swoop, including nine coronaviruses, thanks to a new computer tool capable of combing through gigantic genetic databases.
The researchers have reanalyzed almost six million biological samples, some from hospitals but others from bat caves, penguin populations and all kinds of places where massive gene sequencing experiments have been carried out in the last decade, according to the Spanish virologist Marcos de la Peña, co-author of the work. The new tool, called Serratus, has examined 10 million gigabytes of genetic information. The novelty is that the program focuses on specific fragments of the virus sequence, something like determining whether a book is new from three essential phrases.
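The "three essential phrases" idea can be illustrated with a toy sketch. This is not the Serratus code or its actual search method: the marker fragments, sample names and sequences below are invented placeholders, and the function names are hypothetical. It only shows the general principle of flagging a sample when a handful of short signature fragments all appear in order, without reading the whole sequence as a book.

```python
def contains_all_markers(sequence, markers):
    """Return True if every marker fragment occurs in the sequence, in the given order."""
    position = 0
    for marker in markers:
        found = sequence.find(marker, position)
        if found == -1:
            return False  # one "essential phrase" is missing
        position = found + len(marker)  # keep searching after this match
    return True


def screen_samples(samples, markers):
    """Return the names of the samples whose sequence carries all marker fragments."""
    return [name for name, seq in samples.items()
            if contains_all_markers(seq, markers)]


# Hypothetical marker fragments and toy sequences, for illustration only.
MARKERS = ["DYS", "SGT", "GDD"]
SAMPLES = {
    "bat_cave": "MKDYSLLPASGTQQRVGDDNK",
    "penguin_swab": "MKLLPAQQRVNKAAT",
    "cornfield_soil": "GDDAAASGTLL",
}

if __name__ == "__main__":
    print(screen_samples(SAMPLES, MARKERS))
```

In this toy data only the first sample contains all three fragments in order, so it is the only one flagged for closer study. A real tool would have to tolerate mutations and sequencing errors rather than demand exact matches.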
The research began in May 2020, in the midst of the covid pandemic, with the aim of developing a free and open source platform to urgently discover new viruses. “You cannot fight what you do not know,” sums up De la Peña, a CSIC scientist at the Institute of Plant Molecular and Cellular Biology, in Valencia. “The human population does not stop growing and invading new ecosystems. There are more and more strange interactions with animals and all kinds of living things, which have their own viruses. The chances of new viruses jumping to humans are increasing,” warns the Spanish researcher. His work is published this Wednesday in the journal Nature.
Characterizing a virus takes time and money, as has been seen with the new coronavirus. The Chinese authorities detected the first unexplained pneumonias in December 2019, the genome of the virus was published on January 10, 2020, and the International Committee on Taxonomy of Viruses proposed the name SARS-CoV-2 on February 11 of that year. Those timescales are incompatible with the magnitude of the challenge. De la Peña recalls that the 132,000 newly discovered virus species probably represent barely 0.01% of the real total.
The virologist acknowledges that it is very difficult to establish which species the new viruses infect. “We have seen pig coronaviruses in samples taken in cornfields. What is a coronavirus doing there? The explanation is, most likely, that the samples were contaminated with animal manure. We have a lot of information, but determining the host is complicated,” admits De la Peña, born in Valencia 49 years ago. Two German researchers, for example, have already used the program to discover two new snake viruses.
The Serratus platform is the work of fifteen scientists, headed by the geneticist Artem Babaian, from the University of Cambridge (UK). The team has characterized only a few hundred of the 132,000 new viruses, including all nine coronaviruses. De la Peña has focused on 380 viruses related to the agent that causes human hepatitis D. “It is a liver virus that causes a significant number of deaths. We thought it was unique, but it turns out it is not. We have found similar viruses in natural ecosystems, such as soils and lakes. We believe that there are similar viruses in birds, deer and bats,” he explains. “Perhaps they cannot cause a pandemic in humans, but perhaps they can in amphibians, which could generate a new viral variety that ends up reaching humans in the future,” argues De la Peña. Serratus can also help find the evolutionary origin of emerging pathogens.
The discovery of new viruses has accelerated in recent years. The international Tara Oceans expedition announced in 2019 the identification of nearly 200,000 new species of marine viruses, after a trip around the world in which Spanish scientists participated, such as the microbiologist Silvia G. Acinas. In February 2021, researchers at the European Molecular Biology Laboratory found 140,000 virus species living in the human digestive system, half of them unknown until then.
De la Peña clarifies that those earlier major announcements were dominated by viruses that exclusively infect bacteria, the so-called bacteriophages. The 132,000 new viruses discovered by the Serratus platform are of the type that most affects animals, plants and fungi: RNA viruses, the group to which the covid coronavirus belongs. The Spanish virologist recalls that the Russian biologist Dmitri Ivanovski described the first virus in 1892, responsible for a disease of tobacco plants. It was an RNA virus. “In more than a century we had only identified 15,000 RNA viruses,” says De la Peña. “Virology is still in its infancy. We have almost no idea what’s out there. With a single piece of work, we have multiplied by 10 the number of RNA virus species that we knew of. And this is just the beginning,” he adds.
The virologist Rafael Sanjuán, one of Spain’s leading experts on the evolution of viruses, applauds the new study, in which he did not take part. The researcher recalls that earlier work had coined the concept of “viral dark matter” to refer to genetic information that was suspected to belong to a virus but could not be classified as such because it had no recognizable similarities with other viruses. “Using high-performance computing, in this work the authors manage to bring to light large amounts of new viral sequences,” celebrates Sanjuán, from the Institute of Integrative Systems Biology, in Valencia.
Sanjuán, who has just received almost 2.5 million euros from the EU to investigate threatening viruses hidden in wild animals, stresses that many of the sequences identified by the Serratus platform are partial: they cover only part of the viral genome. “Furthermore, in many cases it is not possible to know which hosts these viruses infect. We also don’t know much about how these viruses work. To address these issues, other tools will be needed, such as co-sequencing of host and viral genes, which allows them to be paired, as well as synthetic biology, which allows us to reconstruct some aspects of the infective cycle of these viruses in the laboratory,” says Sanjuán. “Serratus will serve as a starting point for a more precise identification of new viruses.”