Science magazine has named the revolution that Artificial Intelligence has brought to the life sciences the top scientific advance of 2021.
The prestigious journal highlights the feat carried out by the DeepMind research laboratory, in association with the EMBL European Bioinformatics Institute (EMBL-EBI), which has created the most complete map of human proteins using Artificial Intelligence (AI).
This development solves one of nature’s most perplexing challenges: predicting the complex three-dimensional shape in which a chain of amino acids will fold when it becomes a functional protein.
Even before the recognition from Science, some scientists had compared the potential impact of this development to that of the Human Genome Project (1990-2003), which sequenced 90 percent of the three billion base pairs of the human genome.
What the new development has achieved is even more difficult: determining the structure of more than 20,000 human proteins, more than double the number available until now, as well as of almost all the proteins produced by 20 reference organisms.
First computational method
It is the first computational method that can regularly predict protein structures with atomic precision, even when no similar structure is known, its developers noted last July, when they published their results in Nature.
There are currently around 180,000 protein structures available in the public domain, each produced by experimental methods and accessible through the Protein Data Bank.
DeepMind and EMBL-EBI have added predictions for the structure of some 350,000 more proteins, spread across 20 different organisms, including animals like mice and fruit flies, and bacteria like E. coli.
The feat includes predictions for 98 percent of all human proteins, about 20,000 different structures, known collectively as the human proteome. It is not the first public data set for human proteins, but it is the most complete and accurate.
AI work
To achieve this spectacular result, DeepMind relied on AlphaFold, its own tool based on neural networks and machine learning (a branch of Artificial Intelligence).
AlphaFold was trained on protein sequences and on the already known shapes of tens of thousands of proteins, contained in a public access protein database hosted by the EMBL-EBI researchers.
The latest version of AlphaFold is underpinned by a new machine learning approach that incorporates physical and biological insights into protein structure, taking advantage of multiple sequence alignments.
Proteins are complex molecules that perform many tasks in the body, from building tissues to defending against disease. A protein's function is dictated by its structure: the chain folds, like a sheet of origami, into complex and irregular shapes.
More accurate models
Understanding how a protein folds helps explain its function, which in turn enables scientists to perform a variety of tasks, from fundamental research on how the body works, to designing new drugs and treatments.
The journal Science considers the new approach so important that it compares it to the CRISPR genetic scissors, which revolutionized genetic engineering.
Structural biologists and other researchers are already using AlphaFold to obtain more precise models of proteins that have been difficult or impossible to characterize with previous experimental methods.
According to Science, deciphering a single protein structure in a conventional way can take years and cost hundreds of thousands of dollars.
Computation issue
With the new programs, AlphaFold and RoseTTAFold, far less effort is needed: supported by AI, these programs are trained on databases that store protein structures that have already been determined.
Both programs can determine the structure into which a protein folds based only on the amino acids it contains. A historic milestone in biotechnology.
Proteins consist of long chains of amino acids that fold into a defined shape, a kind of spiral, and then perform certain tasks. Humans have about 20,000 genes in total, which determine the amino acid sequences of the various proteins.
A researcher who knows a specific gene can deduce the amino acid chain of the protein it encodes, but working out how that chain folds has occupied scientists for decades.
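The step from a gene to an amino acid chain follows the standard genetic code, in which each group of three DNA bases (a codon) specifies one amino acid. The sketch below illustrates that step only; it is not part of AlphaFold or RoseTTAFold. The function name `translate` and the example sequence are hypothetical, and the code assumes a clean coding sequence that starts at the first codon and contains no introns.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G base order.
# Each letter is a one-letter amino acid code; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons (TTT, TTC, TTA, ...) to its amino acid.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence into an amino acid chain.

    Hypothetical helper: reads codons from position 0 and stops at the
    first stop codon. Trailing bases that do not fill a codon are ignored.
    """
    dna = dna.upper()
    chain = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        chain.append(amino_acid)
    return "".join(chain)


# Made-up example: ATG (M), GCC (A), then the stop codon TAA.
print(translate("ATGGCCTAA"))
```

The resulting string of one-letter codes is the kind of input that structure prediction programs take: the sequence is easy to derive, while the folded shape is the hard part.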
Where researchers once worked for years to determine the structure of a protein, it is now enough to enter an amino acid sequence into a computer. It is a milestone that Science now recognizes as the most important scientific achievement of the year that is ending.
www.informacion.es
Eddie is an Australian news reporter with over 9 years in the industry and has been published in Forbes and TechCrunch.