Introduction
Bioinformatics and computational biology play an increasingly important role in biological research as the volume of experimental data grows. Solving problems in the biosciences draws on many fields of computer science, including algorithms, graph theory, computer modeling, parallel computing, pattern recognition, and visualization. Visualization helps a researcher understand the spatial arrangement and geometry of the object being studied. The dynamics of biological structures complicates the task: protein folding, for example, is a complex dynamic process, and efficient computational models are required to study it. Simulation is an effective tool for studying complex biosystems. In the field of protein folding, models should take into account force fields, energy, and entropy, and the dynamics is treated as a problem of minimizing entropy. Such models provide accurate and efficient methodologies for studying shapes and dynamics in the folding process. A computing paradigm is also needed for digital interfaces to biological phenomena, such as the time steps of protein folding, in which form and dynamics are the focus. The relationship between experimental data and computational models of biosystems is important because biological systems involve complex phenomena, for example, structures representing a polypeptide chain or a generalized protein chain, along with their many degrees of freedom.
Datasets comprise experimental data and computational models. The first step is to fit mathematical and computational models to the experimental data. The next step is to use the model to forecast the behavior of the system, for example, by creating accurate computational models and folding principles using pattern matching and force field calculations. By manipulating models interactively, giving them the desired shapes, and passing this information on to the computation, the computer becomes a powerful tool for initial conformational tuning. It can also be used to study step-by-step changes in conformation. This topic has implications for medicine and drug development in addition to the understanding of the fundamental rules of protein folding.
The purpose of the work is fourfold: (1) to develop a new and effective method that increases the speed and accuracy of modeling protein mobility and predicting conformational transitions; (2) to create an integrated scientific and technical solution for developing software tools that predict functional mobility and dynamic docking of proteins; (3) to describe a mathematical model of the allowable transition between conformations and the energy function required for modeling the conformational mobility of a protein; and (4) to build a prototype of a protein simulation software module that implements both the developed algorithms and the necessary auxiliary functions (data input/output, data storage, etc.).
Literature review
The first protein whose amino acid sequence was determined was insulin. The first proteins whose three-dimensional structures were determined were myoglobin and hemoglobin. After that, protein folding became one of the main questions of concern to scientists. How exactly does a protein determine the specific structure it will fold into? It was believed that since this structure is the functional one, it is, accordingly, the most stable. However, how does a protein find it? It was assumed that the protein folds not into the most stable of all structures but into a rapidly accessible conformation selected during evolution (1).
Later, it was shown that fast folding paths exist and that parallel paths accelerate the folding process (2). Thus, the Levinthal paradox was solved in 1997, and it became conceptually clear how a protein folds, but this did not help predict its structure. The three-dimensional structure of proteins, the spatial arrangement of their atoms, and the computational design of proteins were presented in (3). It has been shown experimentally that the three-dimensional structure of a protein is determined solely by its amino acid sequence. If the amino acid sequence is known, the functional three-dimensional structure of a protein should therefore, in principle, be predictable quickly and efficiently. Since then, several different approaches to such prediction have been presented. However, none of them, unfortunately, gives 100% accuracy. Moreover, although the amino acid sequence was believed to determine the three-dimensional structure of a protein molecule uniquely, in many cases folding does not occur unaided. More specifically, it occurs with the support of “chaperones and a translocation complex for membrane and secreted proteins”.
The mechanism of folding is still not fully understood (4), although theoretical calculations (5) and experiments (6–8) continue to add new information about this complex process. It was described in (9) that lymphotactin, a small protein of the chemokine family, stimulates the chemotaxis of T-lymphocytes. Under physiological conditions, it adopts two native conformations at once, which rapidly interconvert. One of the conformations is arranged like most other chemokines: it is a three-stranded β-sheet with an α-helix at the C-terminus of the molecule. The other folds into a four-stranded β-sheet (a previously unknown type of fold) and, unlike the first, functions as a dimer. Each form of the molecule performs its own biochemical function necessary for chemotaxis.
Proteins often have several functional forms that are structurally close to one native structure (10). However, the physiological form of lymphotactin was found to consist of two approximately equally probable but structurally completely different conformations that dynamically interconvert. Chemokines (or chemotactic cytokines), to which lymphotactin belongs, are secreted proteins of the vascular wall and immune cells. They activate neutrophils and monocytes and attract them to the site of inflammation. Chemokines act by binding to glycosaminoglycans of vascular endothelial cells and by activating G protein-coupled receptors on the surface of the leukocyte membrane. Together, these mechanisms ensure the immobilization of leukocytes and their migration through the capillary walls to the site of inflammation. Lymphotactin is the only member of the C-chemokine subfamily, while the total number of chemokines reaches 50.
In the 1980s and 1990s, scientists began working on the computational design of proteins with amino acid sequences not yet observed in nature. Significant successes began to be achieved in the late 1990s and early 2000s (11–15). This was followed by attempts to simulate the behavior of a protein on a computer. The aim was to observe how it folds, but computers cannot reproduce such processes at the speed at which they occur in nature, so waiting for a simulated protein to fold took a very long time. Nevertheless, the first papers in which very small proteins were folded in simulation appeared in 2010, so some progress has been made in this area. External conditions determine only whether the protein folds into a three-dimensional structure or remains unfolded. There are exceptions, but they only confirm the rule. For example, some proteins have more than one final configuration (functional structure): some of the molecules fold into one configuration and some into another, and switching between them is usually caused by external factors. Quite a few such proteins are known. In 2008, there were about a dozen of them, and now there are probably about a hundred.
In the book “Chance and Necessity,” it is written that the emergence of life is based on the ability of proteins to recognize other molecules, including other proteins, by their shape (16). There is a huge number of proteins in nature, and they are all different. Consider proteins of the enzyme class, for example, alcohol dehydrogenase (a protein that breaks down ethyl alcohol): under favorable external conditions, it folds by itself. The external conditions determine only whether it folds or remains unfolded. That is, no other molecules that the protein must recognize are required for its structure to form. However, other proteins belong to the class of natively unfolded proteins. They remain unfolded until a partner is found that can stabilize their three-dimensional structure. When such a partner is present, folding on interaction with it becomes favorable, and they fold. This class includes, for example, almost all ribosomal proteins, which fold on the ribosome in its presence because the interaction with the ribosome stabilizes their functional structure. In the absence of these interactions, they generally remain unfolded; when partners appear, they recognize them and adopt a folded form in a complex with these partners. These are not isolated cases, although much depends on what counts as a partner. If we include, for example, water molecules, then all proteins need such a partner in order to find their three-dimensional structure. However, water can be considered not as a partner but simply as a solvent, and the focus can be placed on more specific interactions, such as those of proteins with metal ions. The structures of some proteins are stabilized by interaction with metal ions, for example, zinc or iron, which then act as partners.
There are also proteins whose three-dimensional structure is stabilized by more complex molecular entities. Such proteins include, for example, hemoglobin and myoglobin, whose structure is stabilized by heme. The average protein size is about 200–400 amino acids per chain (depending on the organism). Proteins do almost all the work in living organisms. This work consists in interacting with other atomic and molecular entities, often with nucleic acids. There are about 20,000 genes encoding different proteins in the human body; thus, there are about 20,000 types of proteins in our body. If we draw an analogy with a factory, this corresponds to 20,000 professions. Each protein has a unique amino acid sequence and folds into a three-dimensional structure with a rather rigid shape, which allows the protein to do its intended job. Any living organism is, in effect, a huge conveyor belt with thousands of workers, the proteins, that carry out all chemical reactions and other vital processes in the body. The amino acid sequence determines the structure of a protein, and the structure determines its function. New data on variants of different proteins have been obtained, and their stability has been determined (17). The AlphaFold program also exists, but it does not solve all the problems related to protein folding (18, 19). In fact, protein folding involves many complex issues, and the prediction of a three-dimensional protein structure is an extremely important task but far from the only one in this area.
In this paper, a model of conformational motion is proposed that is based on an approximate representation of a protein molecule and of its movement between two spatial configurations. Such movements occur over relatively long time intervals (approximately milliseconds). An interactive modeling system has been developed based on the functional specification of the models.
Conformational movement of protein molecules
Functional models are described using quadrics and functions of deviation from quadrics (20):
$$q'(x,y,z) = q(x,y,z) + \sum_{i=1}^{N} d_i(x,y,z), \tag{1}$$
where q′(x,y,z) is the functional surface; q(x,y,z) is the quadric; N is the number of deviation functions; and d_i(x,y,z), i = 1…N, are the deviation functions, each defined through q_i(x,y,z), a perturbing function of the second order.
Set-theoretic operations are used to create complex objects. Another method of defining surfaces using free-form patches is also used (21).
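As a rough illustration of a functional surface built from a quadric plus deviation functions and then combined by set-theoretic operations, the following Python sketch may help. It is not the authors' C++ implementation. Several points are assumptions rather than statements from the text: the sign convention (non-negative values inside the object), the form of the deviation function (the cube of the perturbing quadric where it is non-negative and zero elsewhere), the use of max and min for union and intersection, and all function names.

```python
from typing import Callable, Sequence

# A scalar field f(x, y, z); by assumption f >= 0 means "inside the object".
Field = Callable[[float, float, float], float]

def sphere(cx: float, cy: float, cz: float, r: float) -> Field:
    """Quadric that is positive inside a sphere and negative outside."""
    return lambda x, y, z: r * r - ((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)

def deviation(q_i: Field) -> Field:
    """Assumed deviation function: q_i cubed where q_i >= 0, zero elsewhere."""
    def d(x: float, y: float, z: float) -> float:
        v = q_i(x, y, z)
        return v ** 3 if v >= 0.0 else 0.0
    return d

def perturbed(q: Field, perturbing: Sequence[Field]) -> Field:
    """Base quadric plus the sum of deviation functions."""
    ds = [deviation(q_i) for q_i in perturbing]
    return lambda x, y, z: q(x, y, z) + sum(d(x, y, z) for d in ds)

def union(a: Field, b: Field) -> Field:
    """Set-theoretic union under the 'f >= 0 is inside' convention."""
    return lambda x, y, z: max(a(x, y, z), b(x, y, z))

def intersection(a: Field, b: Field) -> Field:
    """Set-theoretic intersection under the same convention."""
    return lambda x, y, z: min(a(x, y, z), b(x, y, z))

def inside(f: Field, x: float, y: float, z: float) -> bool:
    return f(x, y, z) >= 0.0

# A unit sphere with one local bump centered at (1, 0, 0).
base = sphere(0.0, 0.0, 0.0, 1.0)
surface = perturbed(base, [sphere(1.0, 0.0, 0.0, 1.0)])

# Two overlapping unit spheres for the set-theoretic operations.
a = sphere(0.0, 0.0, 0.0, 1.0)
b = sphere(1.5, 0.0, 0.0, 1.0)
```

With these definitions, the point (1.2, 0, 0) lies outside `base` but inside `surface`, because the deviation term is non-zero only in the neighborhood of the bump; points far from the bump are classified exactly as for the unperturbed quadric.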
A protein molecule is a polymer consisting of amino acids. Since the conformational mobility of a protein is provided by changes in the structure of its main chain, only the main-chain atoms are considered. This conformation model, which ignores the side-chain atoms, is an approximate model. The conformation x = (x1,…,xn) is defined by the Cartesian coordinates of the n atoms of the main chain of the protein. Equivalently, the conformation x can be expressed as a set of three vectors (3): the vector of bond lengths, the vector of planar angles, and the vector of torsion angles, where ∥⋅∥, the Euclidean norm of a vector, is used to compute them from the Cartesian coordinates.
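The conversion from Cartesian coordinates of the main chain to bond lengths, planar angles, and torsion angles can be sketched as follows. This is a minimal illustration using standard geometric formulas, not the authors' code; the function names and the sign convention of the torsion angle (computed with `atan2`) are assumptions.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    """Euclidean norm of a 3-vector."""
    return math.sqrt(dot(a, a))

def bond_lengths(xs):
    """Distances between consecutive atoms (n - 1 values)."""
    return [norm(sub(xs[i + 1], xs[i])) for i in range(len(xs) - 1)]

def planar_angles(xs):
    """Angles at each interior atom between its two bonds (n - 2 values, radians)."""
    angles = []
    for i in range(1, len(xs) - 1):
        u = sub(xs[i - 1], xs[i])
        v = sub(xs[i + 1], xs[i])
        c = dot(u, v) / (norm(u) * norm(v))
        angles.append(math.acos(max(-1.0, min(1.0, c))))  # clamp against rounding
    return angles

def torsion_angles(xs):
    """Dihedral angles for each run of four consecutive atoms (n - 3 values, radians)."""
    angles = []
    for i in range(len(xs) - 3):
        b1 = sub(xs[i + 1], xs[i])
        b2 = sub(xs[i + 2], xs[i + 1])
        b3 = sub(xs[i + 3], xs[i + 2])
        n2 = cross(b2, b3)
        y = norm(b2) * dot(b1, n2)
        x = dot(cross(b1, b2), n2)
        angles.append(math.atan2(y, x))
    return angles

# Four atoms with unit bonds, right planar angles, and a quarter-turn torsion.
chain = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
lengths = bond_lengths(chain)
planar = planar_angles(chain)
torsion = torsion_angles(chain)
```

For a chain of n atoms this gives n − 1 bond lengths, n − 2 planar angles, and n − 3 torsion angles; holding the first two vectors fixed and varying only the third corresponds to the torsion-only mobility of the main chain.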
As a result, the torsion angles can be manipulated explicitly while the bond lengths and planar angles are left unchanged. This agrees well with the biochemical principle that the mobility of the main chain of a protein is determined by changes in torsion angles. Two conformations of the same protein are considered: x0 (9) and x1 (10).
A conformation (3) is admissible relative to x0 if its bond lengths and planar angles coincide with those of x0. If conformations (9) and (10) are admissible relative to each other, then an admissible movement of the protein molecule between conformations (9) and (10) is a Lipschitz function γ: [0,1] → ℝ³ⁿ such that γ(t) is admissible relative to x0 for any t ∈ [0,1]. The function γ(t) gives the positions of all n main-chain atoms of the protein at time t. Admissibility of the motion means that bond lengths and planar angles remain constant during the movement. The problem of obtaining the conformational movement of a protein between the specified conformations (9) and (10) is solved by minimizing the functional (11).
Here li is the length of the path traveled by the i-th atom of the main chain when moving from x0 to x1, mi is the mass of the atom, and p ≥ 1 is the model parameter. Minimization is performed over all movements that are admissible relative to x0 and consistent with the function-based model (FBM) (1). Problem (11) is solved numerically by discretizing the motion model. The desired admissible motion is represented as a sequence of M conformations, of which the first and last are the given conformations x0 and x1. The resulting sequence of conformations is the solution of the problem.
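To make the discretized problem more concrete, the sketch below evaluates a mass-weighted path-length cost for a motion given as a sequence of M conformations. The text names only the ingredients of the functional (path lengths li, masses mi, and the parameter p ≥ 1), so the specific form used here, the sum of mi·li^p over all atoms, is an assumption, as are the function names. The sketch only evaluates the cost; it neither minimizes it nor enforces admissibility.

```python
import math

def path_lengths(frames):
    """Polyline length traveled by each atom over a sequence of conformations.

    frames: list of M conformations, each a list of n (x, y, z) tuples.
    """
    n = len(frames[0])
    lengths = [0.0] * n
    for a, b in zip(frames, frames[1:]):
        for i in range(n):
            lengths[i] += math.dist(a[i], b[i])
    return lengths

def motion_cost(frames, masses, p=1.0):
    """Assumed form of the functional: sum over atoms of m_i * l_i ** p."""
    if p < 1.0:
        raise ValueError("the model parameter p must satisfy p >= 1")
    return sum(m * l ** p for m, l in zip(masses, path_lengths(frames)))

masses = [12.0, 14.0]  # illustrative atomic masses

# Atom 0 takes a detour through a corner; atom 1 does not move.
detour = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(1.0, 1.0, 0.0), (5.0, 0.0, 0.0)],
]
# Same end points, but atom 0 moves along the straight segment.
direct = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.5, 0.5, 0.0), (5.0, 0.0, 0.0)],
    [(1.0, 1.0, 0.0), (5.0, 0.0, 0.0)],
]
```

Comparing `motion_cost(detour, masses, p=2.0)` with `motion_cost(direct, masses, p=2.0)` shows the intended behavior: among motions with the same end conformations, the one in which atoms travel shorter paths has the lower cost.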
Figure 1 shows the general scheme of the computing pipeline.
The protein is represented at various levels of detail depending on the requirements of the application. The most detailed level is the all-atom model, which fully describes the structure of the protein by specifying the coordinates of all atoms in the molecule. To analyze conformational flexibility, two protein structures are compared in order to determine their structural similarity, using the root-mean-square deviation (RMSD) as the measure. Protein structures are determined in various conformations, and analysis of a protein in two different conformations reveals its conformational flexibility and the genuine structural changes. For each position at which a fragment of one structure is compared with the other structure, a list of RMSD values is calculated for each substructure (Figure 2).
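A fragment-wise comparison of two structures can be sketched as below. This is a simplified illustration under stated assumptions: the two structures are taken to be already superimposed and to have the same atom ordering, each fragment is compared only with the fragment at the same position in the other structure, and the function names and fragment length are hypothetical. A practical implementation would usually superimpose each fragment pair before measuring the deviation.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(a) != len(b) or not a:
        raise ValueError("fragments must be non-empty and of equal length")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def fragment_rmsd_profile(s1, s2, frag_len):
    """RMSD of every window of frag_len consecutive atoms, position by position."""
    n = min(len(s1), len(s2))
    return [rmsd(s1[i:i + frag_len], s2[i:i + frag_len])
            for i in range(n - frag_len + 1)]

# Two four-atom chains that differ only in the position of the last atom.
s1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
s2 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 2.0, 0.0)]
profile = fragment_rmsd_profile(s1, s2, frag_len=2)
```

In the resulting profile, windows that cover only unchanged atoms have zero deviation, while the window containing the displaced atom stands out, which is how flexible regions can be located along the chain.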
Next, several threads are started, each of which locks the calculation queue, removes one input parameter from it, releases the queue, computes the fragment comparison, and then adds the results.
The result of each calculation is a list of matching fragments. The order of the fragment pairs does not matter to the algorithm. Therefore, each calculation can be performed not only independently but also in any order.
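The worker scheme described above (lock the queue, take one input, release the queue, compute, add the result) can be sketched with Python's standard `threading` module. This is only an illustration of the scheduling pattern, not the authors' implementation; the names `run_workers` and `compute` are hypothetical, and `compute` stands in for the fragment comparison.

```python
import threading
from collections import deque

def run_workers(tasks, compute, n_threads=4):
    """Process independent tasks with a shared queue guarded by a lock.

    The order of the returned results is not defined, which is acceptable
    because the calculations are independent of one another.
    """
    queue = deque(tasks)
    results = []
    queue_lock = threading.Lock()
    results_lock = threading.Lock()

    def worker():
        while True:
            with queue_lock:           # lock the calculation queue
                if not queue:
                    return             # nothing left to do
                task = queue.popleft() # remove one input parameter
            # the queue lock is released here, before the computation starts
            out = compute(task)
            with results_lock:         # add the result
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

squares = run_workers(range(10), lambda k: k * k, n_threads=3)
```

Because the lock is held only while an item is taken from the queue or a result is appended, the time spent under the lock is small compared with the computation itself. Note that in CPython with the global interpreter lock, threads of this kind do not speed up pure-Python computation; the sketch shows the pattern, while a compiled or GPU implementation is what would deliver the speedup.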
Results and findings
Testing was performed on a computer with an Intel Core i5-2500K processor and a GeForce GTX 970 graphics processor. Figure 3 shows the program window. Figure 4 shows the protein conformation process. Table 1 shows the transformation data for four proteins.
The interactive tool environment greatly simplifies the creation and editing of functionally specified objects. For visualization, a ray-tracing algorithm is used that efficiently searches for the surface points involved in image formation. Algorithms and C++ classes for interactive geometric modeling were developed and implemented: C++ classes of functionally specified objects; C++ classes for rendering functionally specified objects; and C++ classes for the interface of the interactive geometric modeling system, with a simple mechanism for adding new algorithms and characteristics. These hierarchical classes form the core of the package, which can be extended to add functionality or change characteristics.
Discussion
The described method was compared with the following methods for modeling the conformational mobility of proteins: the elastic network model (22), linear interpolation of Cartesian coordinates (23), and combined interpolation of Cartesian coordinates and torsion angles with energy minimization of the intermediate conformations (24). The initial and final conformations are specified by Protein Data Bank accession codes and model numbers. The authors of (22) investigated the structure and function of three families of motor proteins: kinesins, myosins, and F1-ATPases. To do this, they used a version of a simple model of protein movements based on an elastic network. Another paper describes MovieMaker, a web server that generates short (about 10 seconds) downloadable movies of protein movements (23). The authors of (24) described improvements to the MolMovDB database of molecular movements. The innovations include (a) further development of the Morph server, which became better at interpolating between two submitted structures; (b) multi-chain support, which made it possible to analyze the movements of subunits; and (c) the option of FRODA interpolation, which allows more complex pathways to be created, potentially overcoming steric barriers. To compare the methods, transformations between pairs of conformations of four proteins were selected (see Table 1). The RMSD values in the table are given for the alpha-carbon atoms of the main chain of the protein. The transformations obtained for these proteins by methods (22) and (23) are characterized by a distorted geometry of the molecule (violations of bond lengths and planar angles). The results obtained by method (24) and by the proposed method are free from this disadvantage and are generally close to each other.
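The geometric distortion attributed to linear interpolation of Cartesian coordinates can be seen in a toy example. The sketch below is illustrative only and uses hypothetical names: a two-atom "chain" has a bond of length 1 in both end conformations, which differ by a quarter-turn rotation, yet the linearly interpolated midpoint has a noticeably shorter bond.

```python
import math

def lerp_conformation(x0, x1, t):
    """Linear interpolation of Cartesian coordinates, atom by atom."""
    return [tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(x0, x1)]

def bond_lengths(xs):
    """Distances between consecutive atoms."""
    return [math.dist(xs[i], xs[i + 1]) for i in range(len(xs) - 1)]

# The second atom rotates by 90 degrees about the first; the bond length is 1 at both ends.
x0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
x1 = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

mid = lerp_conformation(x0, x1, 0.5)
mid_bond = bond_lengths(mid)[0]  # sqrt(0.5), about 0.707 instead of 1
```

The bond shrinks by roughly 29% at the midpoint because the interpolated atom cuts straight across the arc. Methods that move atoms by changing torsion angles keep the atom on the arc and therefore preserve bond lengths and planar angles along the whole path.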
Results
RMSD as an indicator of similarity is widely used in structural bioinformatics. The presented algorithm automates the analysis needed to determine the correspondence between two protein structures. All calculations are independent, and the time needed to remove parameters from the queue and add results is insignificant compared with the time spent on the fragment calculations.
Figure 5 shows the speedup obtained on the GPU.
The abscissa shows the number of threads, and the ordinate shows the time in seconds.
Conclusion
Information technologies are becoming increasingly important in all fields of science, including the study of biochemical processes occurring in a living cell and the pharmaceutical industry. In the development of new medicinal compounds, algorithms are used for computer modeling of intermolecular interactions in biologically significant complexes of receptor proteins with their ligands, low-molecular-weight compounds. Such interactions underlie most biochemical processes responsible for signal transmission, intercellular recognition, reception, and many others.
A program has been developed for modeling the conformational movements of protein molecules based on the principle of mass transfer and an approximate representation of the protein. Unlike molecular dynamics methods, the method considered here is designed to simulate movements that occur over relatively long time intervals (approximately milliseconds). The tasks of selecting and modifying one or several objects in a scene were also solved. Affine MRS transformations (move, rotate, scale), geometric operations, and deformation were implemented. In addition, the user can work with a list of tools, save a scene tree to a file and load it, and create new tags and recognize them. The geometric model allows objects and their compositions of unlimited complexity to be designed, primarily through Boolean union and intersection operations. Deformation consists in adding a perturbation at any point on the surface with parameters set by the tool, which defines the scope and type of the perturbation. To support this, information is presented to the user graphically (for example, by highlighting an object with a color or a bounding box, or by drawing axes).
Alternative research in this area explores how changes in protein stability caused by the substitution of a single amino acid in the sequence can be predicted. Surprisingly, there is still no method for predicting the outcome of such substitutions. If substitutions are made randomly, then for a protein of about 200 amino acid residues on average, five to six substitutions are enough to give a fifty percent chance that it stops working. By contrast, if residues are replaced only with amino acids found in the same protein in other organisms, up to 70% of all residues can be changed and the protein will continue to perform its function.
In the future, it is planned to study the stability of proteins as well as their structure and evolution. It would be valuable to have a computer program into which information about the required functions of a protein could be entered and which would then return, at the press of a button, the amino acid sequence of such a protein.
Author contributions
Vyatkin S. I.: made a significant contribution to the writing of the article, collected and analyzed the information, interpreted the results, conducted the experiment, compiled the literature review, wrote the text of the article, and compiled the table and the digital images. Dolgovesov B. S.: final approval of the version for publication.
Funding
This work was supported by the Ministry of Science and Education of the Russian Federation, project no. 124041700102-4.
Conflict of interest
The authors declare that they have no conflicts of interest.
References
1. Martínez L. Introducing the Levinthal’s protein folding paradox and its solution. J Chem Educ. (2014) 91(11):1918–23.
2. Finkelstein AV, Badretdinov AYa. Rate of protein folding near the point of thermodynamic equilibrium between the coil and the most stable chain fold. Fold Design. (1997) 2:115–21.
4. Dill KA, Ozkan SB, Weikl TW, Chodera JD, Voelz VA. The protein folding problem: when will it be solved? Curr Opin Struct Biol. (2007) 17:342–6.
5. Rose GD, Fleming PJ, Banavar JR, Maritan A. A backbone-based theory of protein folding. Proc Nat Acad Sci. (2006) 103:16623–33.
6. Serganov A, Yuan Y-R, Pikovskaya O, Polonskaia A, Malinina L, Phan AT, et al. Structural basis for discriminative regulation of gene expression by adenine- and guanine-sensing mRNAs. Chem Biol. (2004) 11:1729–41.
7. Greenleaf WJ, Frieda KL, Foster DA, Woodside MT, Block SM. Direct observation of hierarchical folding in single riboswitch aptamers. Science. (2008) 319:630–3.
8. Hu X, Jincheng Z, Zhao Y, Zhang H, Wang Q, Ge B, et al. Direct observation and real-time tracking of an extraordinarily stable folding intermediate in mitotic arrest deficient protein 2 folding by single-molecule fluorescence resonance energy transfer. J Phys Chem Lett. (2023) 14(3):763–9.
9. Tuinstra RL, Peterson FC, Kutlesa S, Elgin ES, Kron MA, Volkman BF. Interconversion between two unrelated protein folds in the lymphotactin native state. PNAS. (2008) 105:5057–62.
10. Fersht A. From the first protein structures to our current knowledge of protein folding: delights and scepticisms. Nat Rev Mol Cell Biol. (2008) 9(8):650–4.
11. Watson JL, Juergens D, Bennett NR, Trippe BL, Yim J, Eisenach HE, et al. De novo design of protein structure and function with RFdiffusion. Nature. (2023) 620:1089–100.
12. Finkelstein AV, Badretdin AJ, Galzitskaya OV, Ivankov DN, Bogatyreva NS, Garbuzynskiy SO. There and back again: two views on the protein folding puzzle. Phys Life Rev. (2017) 21:56–71.
13. Garbuzynskiy SO, Ivankov DN, Bogatyreva NS, Finkelstein AV. Golden triangle for protein folding rates. Proc Nat Acad Sci. (2013) 110:147–50.
14. Krieger E, Darden T, Nabuurs SB, Finkelstein A, Vriend G. Making optimal use of empirical energy functions: force-field parameterization in crystal space. Proteins. (2004) 57:678–83.
15. Ivankov DN, Garbuzynskiy SO, Alm E, Plaxco KW, Baker D, Finkelstein AV. Contact order revisited: influence of protein size on the folding rate. Prot Sci. (2003) 12:2057–62.
16. Monod J. Le Hasard et la Nécessité: Essai sur la philosophie naturelle de la biologie moderne. Paris: Éditions du Seuil, coll. «Points Essais» (1970). 256 p.
17. Lam AYW, Tsuboyama K, Tadakuma H, Tomari Y. DNAJA2 and Hero11 mediate similar conformational extension and aggregation suppression of TDP-43. RNA. (2024) 30(11):1422–1436.
18. AlphaFold 3. Google DeepMind and Isomorphic Labs Introduce AlphaFold 3 AI Model. (2024). Archive May 9 2024.
19. Finkelstein AV, Ivankov DN. Structure identification by AlphaFold: a physics-based prediction or recognition using huge databases? J Mol Biol. (2024) 6(1):1–10.
20. Vyatkin S, Dolgovesov B. Compression of geometric data with the use of perturbation functions. Optoelectron Instrum Data Process. (2018) 54(4):334–9.
21. Vyatkin SI, Dolgovesov BS. Method of anisotropic deformation of elastic materials based on free-form patches. BOHR Int J Biocomput Nano Technol. (2023) 2(1):8–13.
22. Zheng W, Doniach S. A comparative study of motorprotein motions by using a simple elastic-network model. Proc Nat Acad Sci. (2003) 100(23):13253–8.
23. Maiti R, Van Domselaar GH, Wishart DS. MovieMaker: a web server for rapid rendering of protein motions and interactions. Nucleic Acids Res. (2005) 33(2):W358–62.
24. Flores S, Echols N, Milburn D, Hespenheide B, Keating K, Lu J, et al. The database of macromolecular motions: new features added at the decade mark. Nucleic Acids Res. (2006) 34(1):D296–301.
© The Author(s). 2025 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.





