Scientists at The University of Texas at Dallas are using machine learning to study proteins, the molecules that perform vital functions in life, in ways that could impact protein engineering, human health and the evolutionary tracking of proteins associated with infectious diseases.
In the growing field of protein design, researchers examine the evolutionary history of proteins (how their structure and function have changed over time due to genetic mutations) and can use that information to design new proteins, including ones not found in nature, for purposes such as fighting diseases or enabling biotechnology applications.
A team led by Dr. Faruck Morcos, associate professor of biological sciences in the School of Natural Sciences and Mathematics, uses advanced computer techniques to create a 3D “landscape” that allows scientists to visualize what new proteins can do.
“This latent generative landscape represents an advance in the modeling of proteins and, together with the software we have published, an accessible tool for those seeking to create, engineer or study proteins and their functions,” said computational biology doctoral student Cheyenne Ziegler MS, one of the lead authors of a paper published online April 19 in Nature Communications that describes the work. Morcos is the corresponding author of the study.
Proteins are made up of sequences of molecular building blocks called amino acids. Protein sequences give researchers clues to their functions in the body.
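Before a machine learning model can work with a protein sequence, the letters that stand for amino acids have to be turned into numbers. The article does not say which encoding the team uses, so the following is only a minimal sketch of one common choice, one-hot encoding over the 20 standard amino acids. The names `AMINO_ACIDS`, `AA_INDEX` and `one_hot_encode`, and the three-residue fragment, are hypothetical and chosen for illustration.

```python
# Assumption: the 20 standard amino acids in one-letter code. The article
# does not specify an encoding, so this is illustrative only.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one row per residue, with a single 1 marking the amino acid."""
    rows = []
    for residue in sequence.upper():
        if residue not in AA_INDEX:
            raise ValueError(f"unknown amino acid: {residue!r}")
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[residue]] = 1
        rows.append(row)
    return rows


# Hypothetical three-residue fragment, not taken from the study.
encoded = one_hot_encode("MKT")
```

Each residue becomes a row of 20 numbers, so a sequence of any length maps to a grid that a neural network can take as input.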
“Our new framework is like a road map,” Morcos said. “Instead of just analyzing existing protein sequences, we look at the evolution of proteins and create maps that look at existing proteins as well as generate and project potential ones.”
Using variational autoencoders (VAEs), a type of unsupervised learning model built on neural networks, together with coevolutionary modeling, an inference technique developed by the research team, scientists can classify protein sequences by their evolutionary changes and their specific functions, Morcos said, then create new sequences that are similar in composition, each with a score rating its compatibility with the real-world function.
“Recent focus in the field has shifted toward using machine learning methods to predict protein structures and understand protein sequence attributes.”
Morcos and his team plotted the protein sequence data based on similar characteristics.
“The closer the proteins are to each other in this virtual landscape, the more they behave similarly,” he said. “The map indicates where we have a higher chance for a new protein to be functional—there are many possible mutations as proteins evolve, but very few are fit to exist.”
UTD researchers used mathematical methods to create peaks and valleys in the virtual landscape. The peaks act as barriers: they represent sets of unlikely sequences that help separate groups of proteins in terms of their function or evolutionary trajectory, similar to how geographical boundaries can separate groups of animals that then evolve differently from those in other areas.
Color-coding provides the third dimension, describing each coordinate. Proteins that already exist are also plotted on the map, and they are concentrated in the dark areas.
“Is this protein fit to perform its function or not? How much does it look like a real protein? The dark blue regions are valleys of high stability, where most proteins appear like things that might exist. These sequences can be real proteins,” Morcos said. “Lighter colored regions are less explored and probably less suitable.”
Morcos said their system can also catalog proteins of unknown function in a process called annotation.
“The majority of protein sequences that exist do not yet have an annotation, a label that identifies a function or location,” he said. “We just don’t know what they do. That’s why scientists invest a lot of effort in accurately predicting the function of a protein. Our map is an effective way to determine the function of an unannotated protein by knowing what its neighbors are doing.”
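The idea of inferring a protein's function from its neighbors on the map can be illustrated with a simple nearest-neighbor vote. This is a minimal sketch under stated assumptions, not the team's published method: the 2D coordinates, the labels "kinase" and "transporter", the function name `predict_annotation` and the choice of three neighbors are all hypothetical.

```python
import math
from collections import Counter


def predict_annotation(query, annotated, k=3):
    """Return the most common label among the k annotated points nearest to query."""
    # Sort annotated proteins by Euclidean distance to the query coordinate.
    nearest = sorted(annotated, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical landscape coordinates and annotations, for illustration only.
annotated = [
    ((0.1, 0.2), "kinase"),
    ((0.2, 0.1), "kinase"),
    ((0.0, 0.3), "kinase"),
    ((2.0, 2.1), "transporter"),
    ((2.2, 1.9), "transporter"),
]

# An unannotated protein that lands near the first cluster.
prediction = predict_annotation((0.15, 0.15), annotated)
```

In this toy example the unlabeled point sits among the proteins labeled "kinase", so it inherits that label. A real pipeline would need to decide how many neighbors to consult and how far away a neighbor can be before its label stops being informative.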
Cheyenne Ziegler et al., Latent generative landscapes as maps of functional diversity in protein sequence space, Nature Communications (2023). DOI: 10.1038/s41467-023-37958-z
Provided by The University of Texas at Dallas
Citation: Biologists mapping method describes pathways to new proteins (2023, July 10) retrieved on July 11, 2023 from https://phys.org/news/2023-07-biologists- method-paths-proteins.html