New approaches to artificial intelligence (AI) are intensifying the race to solve one of biology's greatest challenges: predicting the 3D structures of proteins from their amino-acid sequences.
At the end of last year, Google's AI firm DeepMind debuted an algorithm called AlphaFold, which combined two techniques that were emerging in the field and beat established contenders in a protein-structure-prediction competition by a surprising margin. And in April this year, a US researcher revealed an algorithm that takes a totally different approach. He claims his AI is up to one million times faster at predicting structures than DeepMind's, although it is probably not as accurate in all situations.
More broadly, biologists are wondering how else deep learning, the AI technique used by both approaches, might be applied to predict protein folds, which ultimately dictate a protein's function. These approaches are cheaper and faster than existing lab techniques such as X-ray crystallography, and the knowledge could help researchers to better understand diseases and to design drugs. "There's a lot of excitement about where things might go now," says John Moult, a biologist at the University of Maryland in College Park and founder of the Critical Assessment of protein Structure Prediction (CASP), a biennial competition that challenges teams to design computer programs that predict protein structures from sequences.
An innovative approach
The creator of the latest algorithm, Mohammed AlQuraishi, a biologist at Harvard Medical School in Boston, Massachusetts, has not yet directly compared his method's accuracy with that of AlphaFold, and he suspects that AlphaFold will beat his technique in accuracy when proteins with sequences similar to the one being analysed are available as a reference. But he says that because his algorithm uses a mathematical function to calculate protein structures in a single step, rather than in two steps like AlphaFold, which uses similar structures as groundwork in its first step, it can predict structures in milliseconds rather than hours or days.
"AlQuraishi's approach is very promising. It builds on advances in deep learning as well as some new tricks that AlQuraishi has invented," says Ian Holmes, a computational biologist at the University of California, Berkeley. "It is possible that in the future his idea could be combined with others to advance the field," says Jinbo Xu, a computer scientist at the Toyota Technological Institute at Chicago, Illinois, who competed at CASP13.
At the heart of AlQuraishi's system is a neural network, a type of algorithm inspired by the brain's wiring that learns from examples. It is fed known data on how amino-acid sequences map to protein structures and then learns to produce new structures from unfamiliar sequences. The novel part of his network lies in its ability to create such mappings end to end; other systems use one neural network to predict certain features of a structure, then another type of algorithm to laboriously search for a plausible structure that incorporates those features. AlQuraishi's network takes months to train, but once trained, it can turn a sequence into a structure almost instantly.
His approach, which he calls a recurrent geometric network, predicts the structure of one segment of a protein partly on the basis of what comes before and after it. This is similar to how people can interpret a word in a sentence from the surrounding words; those interpretations are, in turn, influenced by the central word.
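The idea of letting each position draw on context from both directions can be sketched in a few lines of Python. The toy function below is purely illustrative (it is not AlQuraishi's model, and the decay-based blending is an invented stand-in for a learned recurrent unit); it combines each element of a sequence with exponentially decayed context flowing from the left and from the right, the way a bidirectional recurrent network propagates information:

```python
def bidirectional_context(values, decay=0.5):
    """Blend each element with exponentially decayed left and right context."""
    n = len(values)
    left = [0.0] * n   # context accumulated left -> right
    right = [0.0] * n  # context accumulated right -> left
    acc = 0.0
    for i in range(n):
        acc = values[i] + decay * acc
        left[i] = acc
    acc = 0.0
    for i in reversed(range(n)):
        acc = values[i] + decay * acc
        right[i] = acc
    # combine both directions; subtract values[i] once so it isn't double-counted
    return [left[i] + right[i] - values[i] for i in range(n)]

print(bidirectional_context([1.0, 0.0, 0.0, 1.0]))  # [1.125, 0.75, 0.75, 1.125]
```

Note how the middle positions, which carry no signal of their own, still receive information from both ends of the sequence, mirroring how a residue's predicted geometry can depend on residues far before and after it.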
Technical difficulties meant that AlQuraishi's algorithm did not perform well at CASP13. He published details of the AI in Cell Systems in April and posted his code publicly on GitHub, hoping that others will build on the work. (The structures of most of the proteins tested at CASP13 have not yet been made public, so he still could not directly compare his method with AlphaFold.)
AlphaFold competed successfully at CASP13 and created a stir when it outperformed all other algorithms on hard targets by nearly 15%, according to one measure.
AlphaFold works in two steps. Like other approaches used in the competition, it starts with something called multiple sequence alignment: it compares a protein's sequence with similar sequences in a database to reveal pairs of amino acids that do not lie side by side in the chain but tend to appear in tandem. This suggests that the two amino acids are located near each other in the folded protein. DeepMind trained a neural network to take such pairings and predict the distances between pairs of amino acids in the folded protein.
By comparing its predictions with precisely measured distances in known proteins, the network learned to make better guesses about how a protein would fold. A parallel neural network predicted the angles of the joints between consecutive amino acids in the folded protein chain.
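The co-variation signal that this first step exploits can be illustrated with a toy calculation. The sketch below is a simplified stand-in, not DeepMind's pipeline: the four aligned sequences are invented, and a basic mutual-information score replaces the learned network. It flags the pair of alignment columns whose amino acids vary together, the statistical hint that the corresponding residues sit close in the folded protein:

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy alignment: four sequences, six columns.
msa = ["ACDEAC", "AGDEAG", "TCDEAC", "TGDEAG"]

def mutual_information(alignment, i, j):
    """Score how strongly columns i and j co-vary across the alignment."""
    n = len(alignment)
    pair_counts = Counter((row[i], row[j]) for row in alignment)
    i_counts = Counter(row[i] for row in alignment)
    j_counts = Counter(row[j] for row in alignment)
    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((i_counts[a] / n) * (j_counts[b] / n)))
    return mi

ncols = len(msa[0])
scores = {(i, j): mutual_information(msa, i, j)
          for i, j in combinations(range(ncols), 2)}
best = max(scores, key=scores.get)
print(best)  # (1, 5): columns 1 and 5 co-vary perfectly in this toy alignment
```

In this example, columns 1 and 5 always switch between C and G together, so they score highest, while independently varying or constant columns score zero.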
But these steps alone cannot predict a structure, because the exact set of distances and angles predicted might not be physically possible. So in the second step, AlphaFold generated a physically possible, but nearly random, folding arrangement for the sequence. Instead of another neural network, it used an optimization method called gradient descent to iteratively refine the structure so that it approached the (not-quite-possible) predictions from the first step.
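The idea behind this second step, nudging an arbitrary arrangement toward a set of predicted distances by gradient descent, can be sketched for points in a plane. This is a toy analogue, not AlphaFold's implementation: the starting layout and the "predicted" distances are invented, and real structures live in 3D with many more constraints.

```python
import math

def refine(points, targets, lr=0.05, steps=2000):
    """Move 2D points toward target pairwise distances by gradient descent."""
    pts = [list(p) for p in points]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in pts]
        for (i, j), d in targets.items():
            dx = pts[i][0] - pts[j][0]
            dy = pts[i][1] - pts[j][1]
            dist = math.hypot(dx, dy) or 1e-9
            coeff = 2.0 * (dist - d) / dist  # gradient of (dist - d)**2
            grads[i][0] += coeff * dx
            grads[i][1] += coeff * dy
            grads[j][0] -= coeff * dx
            grads[j][1] -= coeff * dy
        for p, g in zip(pts, grads):
            p[0] -= lr * g[0]
            p[1] -= lr * g[1]
    return pts

# A deliberately wrong starting arrangement and invented "predicted" distances.
start = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0)]
targets = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.6}
final = refine(start, targets)
for (i, j), d in targets.items():
    got = math.hypot(final[i][0] - final[j][0], final[i][1] - final[j][1])
    print(f"pair {i}-{j}: predicted {d:.2f}, refined structure gives {got:.2f}")
```

Each step moves every point a little way down the gradient of the squared error between its current and predicted distances, so the arrangement converges toward one that satisfies the predictions as closely as geometry allows.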
A few other teams used one of these approaches, but none used both. In the first step, most teams predicted only whether pairs of amino acids were in contact, not the distance between them. In the second step, most used laborious optimization rules rather than gradient descent, which runs almost automatically.
"They did a great job. They're about one year ahead of the other groups," says Xu.
DeepMind has not yet released all the details of AlphaFold, but other groups have since begun adopting tactics demonstrated by DeepMind and other leading teams at CASP13. Jianlin Cheng, a computer scientist at the University of Missouri in Columbia, says he will modify his deep neural networks to include some features of AlphaFold's, for example by adding more layers to the network in the distance-prediction stage. Having more layers, a deeper network, often allows a network to process information more thoroughly, hence the name deep learning.
"We look forward to seeing similar systems put to use," says Andrew Senior, the computer scientist at DeepMind who led the AlphaFold team.
Moult said there was much discussion at CASP13 about how else deep learning could be applied to protein folding. It might help to refine approximate structure predictions, to report how confident an algorithm is in a fold prediction, or to model how proteins interact.
Although computational predictions are not yet accurate enough to be widely used in drug design, the growing accuracy enables other applications, such as understanding how a mutated protein contributes to disease, or working out which part of a protein to turn into a vaccine for immunotherapy. "These models are starting to be useful," says Moult.