Chaperone-based methods for global minimisation of protein energy functions


The protein folding problem

One very notable problem in global minimisation is that of predicting the 3-dimensional structure of a protein molecule, and thereby hopefully also its function, from its amino acid sequence alone. A DNA sequence arising from one of the many genome sequencing projects can be translated fairly easily into an amino acid sequence, but the subsequent translation of this into a 3-dimensional protein structure (which has been called by some 'the second half of the genetic code') is a much more intractable problem. The correct mapping between 1-D protein sequence and 3-D molecule is very difficult to find, as it requires the traversal of an energy landscape of enormous complexity, with very many traps (local minima) that prevent a straightforward descent procedure from finding the optimal solution.
The most successful methods at present for determining the structure of an unknown protein rely on finding similarities between its amino acid sequence and those of other proteins whose structures are known (similarities perhaps arising from a common ancestor at some point in the evolutionary past). This type of modelling can of course succeed only where sequence similarities can be found to proteins whose structures have already been determined, and structure determination by experimental means remains a slow process. Moreover, even within these modelling-by-analogy procedures it is still necessary to minimise a function describing the energy of the structure (to 'relax' a model that has been assembled from structural building blocks derived from related proteins), and it is believed that here too the structure may become trapped in a local energy minimum, causing the process to fail.
So for the foreseeable future it does appear that the classical protein folding problem, that of minimising a very complex function describing the protein's internal energy as a function of the amino acid chain geometry, will remain highly relevant.
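To make the trapping problem concrete, here is a minimal sketch on a hypothetical one-dimensional toy landscape (not a protein energy function), E(x) = x²/10 - cos(2πx). Its global minimum is E = -1 at x = 0, but it has a local minimum near every integer. Plain gradient descent started at x = 3.2 simply rolls into the nearest trap, close to x = 3. The functions `energy`, `gradient` and `descend`, the step size and the starting point are all illustrative assumptions.

```python
import math


def energy(x):
    """Rugged 1-D toy landscape: a broad bowl covered in cosine 'traps'."""
    return x * x / 10.0 - math.cos(2.0 * math.pi * x)


def gradient(x):
    """Analytic derivative of energy(x)."""
    return x / 5.0 + 2.0 * math.pi * math.sin(2.0 * math.pi * x)


def descend(x, lr=0.01, steps=2000):
    """Plain gradient descent: always moves downhill, so it stops in the nearest minimum."""
    for _ in range(steps):
        x -= lr * gradient(x)
    return x


x_local = descend(3.2)
print(f"descent from 3.2 stops at x = {x_local:.3f}, E = {energy(x_local):.3f}")

# In one dimension an exhaustive grid scan still finds the global minimum at x = 0 ...
grid = [i / 100.0 for i in range(-500, 501)]
x_global = min(grid, key=energy)
print(f"grid scan finds x = {x_global:.3f}, E = {energy(x_global):.3f}")
# ... but with many chain-geometry parameters such a scan becomes hopeless.
```

The exhaustive scan only works here because the space is one-dimensional. The number of grid points grows exponentially with the number of geometric parameters, which is why better global strategies are needed for protein chains.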

Chaperone-assisted folding in nature

It has sometimes been said that 'Nature doesn't have a protein folding problem, only we do'. This is not wholly true, however: a substantial number of proteins (estimated at 10-30% for the bacterium E. coli) appear unable to fold correctly without the presence of other molecules, referred to as chaperones, which in some way modify the environment of these proteins to make it easier for them to find their correct 3-dimensional structure. The most studied and best understood of the molecular chaperone systems is the GroEL/GroES complex of E. coli. This system provides an enclosed environment for folding to take place, which is helpful in that it prevents partly-folded proteins from incorrectly sticking to each other. However, the system is now believed also to play an active role: it targets regions of protein surface that can be recognised as incorrectly formed and pulls them apart, and then provides a modified physico-chemical environment that promotes correct folding. Could the principles used by molecular chaperones be extracted and used as the basis for more effective computational techniques for predicting protein structures?

Computational models of chaperone-assisted folding

These models, which used a range of simplified protein models as their testbed, provided both a protective environment for protein folding and one which, like the GroEL/GroES system, could play an active role in unravelling incorrectly formed structure and directing folding along more fruitful pathways. Unlike the work of most previous modellers, these were 'off-lattice' models, representing the range of movements of the protein chain by smoothly varying parameters (as in nature) rather than by positions on a discrete grid. Emphasis was placed on the active role of the chaperone in temporarily modifying the energy surface to make the global minimum easier to reach; however, unlike the author's earlier ERA method for neural network training, the chaperone-based folding methods did not rely on general assumptions about the smoothness of the energy surface. Instead, the type of surface transformation was tailored specifically to the protein folding scenario, which is believed to be driven largely by the preference to bury certain types of amino acids (referred to as hydrophobic) within the interior of the structure and to expose others (referred to conversely as hydrophilic) to the surrounding aqueous environment.
The fundamental effect is a pull/push mechanism acting differentially over multiple unfolding/refolding cycles, resulting in low-energy structures in which hydrophobic monomers tend to end up in the interior of the structure and hydrophilic ones at the exterior. In other words, solutions are selected for both low energy and structurally preferred properties, rather than (as in generic, smoothing-type global minimisation procedures) for low energy values alone.
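The pull/push idea can be sketched in a few lines of Python. This is a hypothetical illustration under stated assumptions, not the author's method. The chain is a 2-D off-lattice model with unit bonds whose geometry is given entirely by continuous bend angles. The energy is a toy bending term plus a Lennard-Jones-like pair term, with H (hydrophobic) and P (hydrophilic) couplings loosely modelled on the well-known 2-D off-lattice 'AB' model. The chaperone's surface modification is assumed to be an extra term, weighted by a hypothetical `strength` parameter, that pulls H monomers towards the chain centroid and pushes P monomers away from it. All function names are illustrative only.

```python
import math
import random

# Hypothetical H/P couplings, loosely following the 2-D off-lattice "AB" model:
# H-H pairs attract strongly, P-P pairs weakly, and mixed H-P pairs repel weakly.
COUPLING = {("H", "H"): 1.0, ("P", "P"): 0.5, ("H", "P"): -0.5, ("P", "H"): -0.5}


def coords_of(angles):
    """2-D monomer positions of a unit-bond chain defined by its bend angles."""
    pts, heading = [(0.0, 0.0), (1.0, 0.0)], 0.0
    for t in angles:
        heading += t
        x, y = pts[-1]
        pts.append((x + math.cos(heading), y + math.sin(heading)))
    return pts


def native_energy(angles, seq):
    """Toy off-lattice energy: backbone bending plus a Lennard-Jones-like pair term."""
    pts = coords_of(angles)
    e = sum((1.0 - math.cos(t)) / 4.0 for t in angles)
    for i in range(len(pts) - 2):
        for j in range(i + 2, len(pts)):  # skip directly bonded neighbours
            r = max(math.dist(pts[i], pts[j]), 1e-6)  # numerical guard against r == 0
            e += 4.0 * (r ** -12 - COUPLING[(seq[i], seq[j])] * r ** -6)
    return e


def chaperone_energy(angles, seq, strength):
    """Native energy plus a temporary pull/push term (an assumed form):
    H monomers are pulled towards the centroid, P monomers pushed away from it."""
    pts = coords_of(angles)
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    bias = 0.0
    for (x, y), s in zip(pts, seq):
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        bias += d2 if s == "H" else -d2
    return native_energy(angles, seq) + strength * bias


def local_search(angles, f, steps, rng, step_size=0.3):
    """Greedy descent: perturb one random angle, keep the move only if f decreases."""
    cur, fcur = list(angles), f(angles)
    for _ in range(steps):
        trial = list(cur)
        k = rng.randrange(len(trial))
        trial[k] += rng.uniform(-step_size, step_size)
        ftrial = f(trial)
        if ftrial < fcur:
            cur, fcur = trial, ftrial
    return cur, fcur


def chaperone_cycles(seq, cycles=10, steps=300, strength=0.05, seed=0):
    """Alternate chaperone-modified and native minimisation; keep the best native fold.
    Assumes len(seq) >= 3 so that there is at least one bend angle."""
    rng = random.Random(seed)

    def native(a):
        return native_energy(a, seq)

    def modified(a):
        return chaperone_energy(a, seq, strength)

    # Initial fold: relax a straight chain on the native surface.
    best, best_e = local_search([0.0] * (len(seq) - 2), native, steps, rng)
    for _ in range(cycles):
        # Chaperone phase: reshape the current best fold on the modified surface.
        shaped, _ = local_search(best, modified, steps, rng)
        # Refolding phase: release the chain and relax it on the native surface.
        cand, cand_e = local_search(shaped, native, steps, rng)
        if cand_e < best_e:
            best, best_e = cand, cand_e
    return best, best_e


if __name__ == "__main__":
    angles, e = chaperone_cycles("HPHPPHHPHH")  # hypothetical sequence
    print(f"best native energy found: {e:.4f}")
```

The best native fold is replaced only when a cycle finds a lower native energy, so the temporary term can reshape the landscape freely without ever making the reported result worse. The simple greedy `local_search` stands in for whatever minimiser a real implementation would use.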