Chaperone-based methods for global minimisation of protein energy functions

  • The protein folding problem
  • Chaperone-assisted folding in nature
  • Computational models of chaperone-assisted folding


Publications


The protein folding problem

The most notable (or perhaps notorious) global minimisation problem we face is that of predicting the 3-dimensional structure of a protein molecule from its amino acid sequence alone. The structure of a protein, the twists and turns of the amino acid chain as it folds up into a compact, often roughly globular shape, is what determines its biological function. A DNA sequence arising from one of the many genome sequencing projects can be translated fairly easily into an amino acid sequence (modulo some problems in automatically finding where a gene starts and stops and, in some cases, the necessary excision of non-coding regions), but the subsequent translation of this into a 3-dimensional protein structure, which some have called 'the second half of the genetic code', is a far less tractable problem. Although it is generally believed that the amino acid sequence determines the structure, the correct mapping from 1-D protein sequence to 3-D molecule is very difficult to find, because it requires traversing an energy landscape of enormous complexity, with very many traps (local minima) that prevent a straightforward descent procedure from finding the optimal solution. The most successful current methods for determining the structure of an unknown protein rely on finding similarities between its amino acid sequence and those of other proteins whose structures are known (similarities perhaps arising from a common ancestor at some point in the evolutionary past); this type of modelling can of course succeed only where sequence similarities can be found to proteins whose structures have already been determined. And although it is now being predicted that we will have examples of all the typical protein folds within the next 10 to 15 years, experimental structure determination is still a slow process.
Moreover, at a crucial point in these modelling-by-analogy procedures it is still necessary to minimise a function describing the energy of the structure (to 'relax' a model assembled from structural building blocks derived from related proteins), and it is believed that even here the structure may become trapped in a local energy minimum, causing the process to fail. So for the foreseeable future the classical protein folding problem, that of minimising a very complex function describing the protein's internal energy in terms of the amino acid chain geometry, does appear likely to remain highly relevant.
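To make the idea of traps concrete, here is a deliberately tiny one-dimensional stand-in. The function `energy` is a made-up toy, not a protein energy function: a parabola with ripples, whose global minimum is at x = 0 and whose local minima lie just below x = 1, x = 2 and so on. Plain fixed-step steepest descent (the hypothetical helper `steepest_descent`) only ever moves downhill, so it settles in whichever basin it starts in.

```python
import math


def energy(x: float) -> float:
    """Toy rugged 'energy': a parabola with sinusoidal ripples.
    Global minimum E = 0 at x = 0; higher local minima lie near the other integers."""
    return x * x + 2.0 * (1.0 - math.cos(2.0 * math.pi * x))


def gradient(x: float) -> float:
    """Analytic derivative dE/dx of energy()."""
    return 2.0 * x + 4.0 * math.pi * math.sin(2.0 * math.pi * x)


def steepest_descent(x: float, step: float = 0.005, iters: int = 2000) -> float:
    """Fixed-step gradient descent: it only moves downhill, so it stops
    at the bottom of whichever basin contains the starting point."""
    for _ in range(iters):
        x -= step * gradient(x)
    return x


if __name__ == "__main__":
    for start in (0.3, 1.2, 2.3):
        x_final = steepest_descent(start)
        print(f"start {start:4.1f} -> x = {x_final:7.4f}, E = {energy(x_final):.4f}")
```

Started at 0.3 the descent reaches the global minimum; started at 1.2 or 2.3 it stops in local minima just below x = 1 and x = 2, with higher energy. A protein energy surface has an enormous number of such basins in very many dimensions, which is why a descent procedure on its own is not enough.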

Chaperone-assisted folding in nature

It has sometimes been said that 'Nature doesn't have a protein folding problem, only we do'. However, this is not wholly true: a substantial number of proteins (estimated at 10-30% for the bacterium E. coli) appear unable to fold correctly without the presence of other molecules, referred to as chaperones, which modify the environment of these proteins in ways that make it easier for them to find their correct 3-dimensional structure. The most studied and best understood molecular chaperone system is the GroEL/GroES complex of E. coli. GroEL is a large molecule with a double-ring structure; each end can enclose a folding protein and, when in use, is normally 'capped' by another, smaller protein, GroES. The combined system thus provides an enclosed environment for folding, which is helpful in itself because it prevents partly-folded proteins from incorrectly sticking to each other. However, the GroEL/GroES system is now also believed to actively target and pull apart those regions of protein surface that can be recognised as incorrectly formed, and then to provide a modified physico-chemical environment that promotes correct folding. Given that we do currently have a (computational) protein folding problem, what can we learn from the solution that nature has found? In particular, could the underlying principles used by molecular chaperones be extracted and used as the basis of more effective computational techniques?

Computational models of chaperone-assisted folding

The chaperone models developed here at UCL, using a range of simplified protein models as their testbed, provide both a protective environment for protein folding and one which, like the GroEL/GroES system, plays an active role in unravelling incorrectly formed structure and directing folding along more fruitful pathways. Unlike most previous models, these are 'off-lattice' models (representing the range of movements of the protein chain by smoothly varying parameters, as in nature). They emphasise the chaperone's active role in temporarily modifying the energy surface to ease access to the global minimum, and in this respect they belong to the family of hypersurface deformation methods for global minimisation, which also includes smoothing methods such as the author's earlier ERA method for neural network training. However, the chaperone-based folding methods do not rely on general assumptions about the smoothness of the energy surface; instead, the surface transformation is tailored specifically to the protein folding scenario, which is believed to be driven largely by the preference to bury certain types of amino acid (referred to as hydrophobic) in the interior of the structure and to expose others (referred to conversely as hydrophilic) to the surrounding aqueous environment. The fundamental effect is a pull/push mechanism acting differentially over multiple unfolding/refolding cycles, yielding low-energy structures in which hydrophobic monomers tend to end up in the interior of the structure and hydrophilic ones at the exterior. In other words, solutions are selected for both low energy and structurally preferred properties, rather than, as in generic (smoothing-type) global minimisation procedures, for low energy alone.
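As a rough illustration only, the sketch below alternates a deformed ("chaperone bound") minimisation phase with an undeformed ("released") relaxation phase on an off-lattice hydrophobic/hydrophilic chain. Every specific here is an assumption and not the UCL method: the testbed is the 2-D off-lattice "AB" toy model (A = hydrophobic, B = hydrophilic, bend angles as the smoothly varying parameters), and the deformation term `burial_bias`, its weight `lam`, the random-perturbation minimiser `descend` and the cycle driver `chaperone_cycles` are all hypothetical names chosen for this example. The only idea taken from the text is that the energy surface is temporarily deformed in a way that favours burying hydrophobic monomers, over repeated cycles.

```python
import math
import random

# Hypothetical stand-in testbed: 2-D off-lattice "AB" model with unit bonds.
# A = hydrophobic, B = hydrophilic; coupling sets the attraction strength.
COUPLING = {("A", "A"): 1.0, ("B", "B"): 0.5, ("A", "B"): -0.5, ("B", "A"): -0.5}


def coordinates(angles):
    """2-D monomer positions from bend angles (one per interior monomer)."""
    pts = [(0.0, 0.0), (1.0, 0.0)]
    heading = 0.0
    for theta in angles:
        heading += theta
        x, y = pts[-1]
        pts.append((x + math.cos(heading), y + math.sin(heading)))
    return pts


def ab_energy(seq, angles):
    """Undeformed energy: bending term plus a Lennard-Jones-like term whose
    attraction depends on monomer types (AA strongest, AB purely repulsive)."""
    pts = coordinates(angles)
    e = sum(0.25 * (1.0 - math.cos(t)) for t in angles)
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            r2 = (pts[i][0] - pts[j][0]) ** 2 + (pts[i][1] - pts[j][1]) ** 2
            inv6 = 1.0 / max(r2, 1e-12) ** 3  # guard against exact overlap
            e += 4.0 * (inv6 * inv6 - COUPLING[(seq[i], seq[j])] * inv6)
    return e


def burial_bias(seq, angles):
    """Hypothetical chaperone-style deformation: lower for A monomers near
    the chain centroid and for B monomers far from it."""
    pts = coordinates(angles)
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    bias = 0.0
    for kind, (x, y) in zip(seq, pts):
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        bias += d2 if kind == "A" else -d2
    return bias


def descend(f, angles, rng, steps, width=0.3):
    """Stochastic descent: perturb one angle, keep the move only if f drops."""
    angles = list(angles)
    current = f(angles)
    for _ in range(steps):
        trial = list(angles)
        trial[rng.randrange(len(trial))] += rng.gauss(0.0, width)
        value = f(trial)
        if value < current:
            angles, current = trial, value
    return angles


def chaperone_cycles(seq, cycles=5, lam=0.5, steps=1000, seed=0):
    """Alternate deformed and undeformed phases; keep the lowest undeformed
    energy seen. Requires len(seq) >= 3 (at least one bend angle)."""
    rng = random.Random(seed)

    def plain(a):
        return ab_energy(seq, a)

    def deformed(a):
        return ab_energy(seq, a) + lam * burial_bias(seq, a)

    start = [rng.uniform(-math.pi, math.pi) for _ in range(len(seq) - 2)]
    best = descend(plain, start, rng, steps)
    best_e = plain(best)
    for _ in range(cycles):
        pulled = descend(deformed, best, rng, steps)  # deformed surface
        relaxed = descend(plain, pulled, rng, steps)  # original surface
        e = plain(relaxed)
        if e < best_e:
            best, best_e = relaxed, e
    return best, best_e


if __name__ == "__main__":
    seq = "ABBABBABABBAB"  # example 13-monomer sequence
    angles, e = chaperone_cycles(seq)
    print(f"lowest undeformed AB energy found: {e:.4f}")
```

In this sketch only the undeformed energy decides which structure is kept, so the deformation changes the route the search takes rather than the definition of a good answer. The chaperone-based methods described above additionally select for structurally preferred properties; this toy does not attempt that.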


Publications