Hardware-realisable neural networks
What are neural networks?
Neural networks are pattern recognition and control systems based on what we know about how human and animal brains acquire and manipulate knowledge about their environments. They are built up from interconnected units which are simplified representations of biological neurons. Each unit receives inputs from other neuron units, and the effect of each input is modified by a learnable 'weight'. Neural networks differ from conventional computer solutions in how knowledge is acquired (by a process of gradual learning rather than the one-step loading of a pre-crafted program) and in how it is represented (as a pattern of numerical weight values spread over many of the neuron-like units, rather than as an item stored in a very localised area of computer memory). Neural networks can provide solutions to problems which conventional programmed systems find very hard to deal with. These are typically problems for which a human, asked how they actually solved them, might reply that they 'just know' how it is done, without being able to articulate any set of rules (face and handwriting recognition are examples). Such problems proved largely intractable for the rule-based AI systems of the 1970s, and this was one of the reasons for the fast-growing interest in neurally-inspired alternatives in the following decade.
Why build neural network hardware?
In the early days of neural computing, most work was in fact done with simple forms of 'neurocomputer': specialised electronic hardware built to act like a network of biological neurons. This was for the pragmatic reason that general-purpose, programmable computers, which might have been used to simulate these biologically inspired systems, were not widely available.
However, the explosion of interest in neural network technologies in the 1980s, which has led to today's applications in visual pattern recognition, speech/speaker recognition, financial prediction and many other areas, coincided with the growing availability of general-purpose computers with ever-faster CPUs and ever-cheaper memory. Consequently, most work since then has been done in the form of simulations running on PCs and workstations. Nevertheless, implementing neural computing techniques in dedicated neurocomputer hardware still has a number of important advantages: it provides self-contained, physically robust solutions for application areas where it might not be feasible to install a PC or workstation running neural network software (toys, for example, or autonomous robots for industrial and exploration uses).
The pRAM model
Most conventional neural network training procedures (used to develop the required behaviour in a learning system) have assumed that the 'weight' parameters in which the system's knowledge is stored can be positive or negative, and unboundedly large in size. These analog weights, and the algorithms by which they are adapted, are not well suited to hardware implementation. However, there has been another tradition of neural network implementation which has used binary weights, with the 0/1 values stored in RAM blocks which themselves play the role of the 'neurons' in the system. This approach, sometimes called 'weightless neural computing', was pioneered by Aleksander and Stonham in the 1960s, and has since been further developed by Aleksander, Austin, Allinson, Bisset and other workers (mainly based in the UK). In its original form, this type of RAM-based neural network system used 'one-shot' learning procedures very different from the iterative ones of conventional neural networks.
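The weightless RAM-node idea can be illustrated with a minimal sketch. It assumes that the binary input vector is used directly as a memory address and that one-shot training simply writes a 1 into the addressed cell; the class and method names are hypothetical, and practical weightless systems combine many such nodes rather than using one in isolation.

```python
class RAMNode:
    """Hypothetical weightless RAM node: n binary inputs address 2**n one-bit cells."""

    def __init__(self, n_inputs):
        self.n_inputs = n_inputs
        # Every cell starts at 0: the node has not yet seen any pattern.
        self.memory = [0] * (2 ** n_inputs)

    def address(self, bits):
        # The binary input vector is read directly as a memory address (first input = MSB).
        if len(bits) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        addr = 0
        for b in bits:
            addr = (addr << 1) | (1 if b else 0)
        return addr

    def train(self, bits):
        # One-shot learning: a single presentation writes a 1 into the addressed cell.
        self.memory[self.address(bits)] = 1

    def recall(self, bits):
        # Output is simply the stored bit at the addressed location.
        return self.memory[self.address(bits)]


node = RAMNode(3)
node.train([1, 0, 1])
print(node.recall([1, 0, 1]))  # 1: this pattern was seen during training
print(node.recall([1, 1, 1]))  # 0: this pattern was never seen
```

Note that there are no adjustable analog weights at all: what the node has learned is entirely the set of addresses holding a 1.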
There have, though, been alternatives suggested which are more akin to conventional neural network training, such as those used to train the Probabilistic Logic Node (PLN) system of Aleksander and Myers, and the 'cubic nodes' of Gurney. The PLN uses not binary weights but three levels (0, 0.5, 1), where the value 0.5 means that, if that particular RAM location is addressed, an output of 0 or 1 is equally likely. Cubic nodes are RAM-based systems with more finely quantised contents and bipolar outputs (signals of +1 or -1), which have been trained, for example, using the well-known error backpropagation procedure. The Probabilistic RAM (pRAM) model is a generalisation of the PLN which also uses probabilities as weights, but quantises them so finely (by storing each value in a 16-bit register) that they can effectively be treated as continuous in the range [0,1]. This allows a wide range of familiar neural network learning methods to be used (reinforcement, Hebbian learning, a Kohonen-style self-organising map, nonlinear PCA) in a hardware-implementable context. It also makes it possible to analyse the learning rules using continuum mathematics, so that their expected results can be derived without the need for extensive (and possibly unreliable) ad hoc testing. The way that stochastic behaviour is introduced into the pRAM model is akin to the 'synaptic noise' in real neural systems, which is caused by the tendency of neural junctions to leak small quantities of neurotransmitter even when no signal is being passed (so that neurons may sometimes fire spontaneously). Such synaptically noisy neurons were first described by Prof John G Taylor (King's College Department of Mathematics) in 1972, and this neural model was one of the underpinning ideas that led to the proposal of the pRAM model by Gorse and Taylor in 1988.
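A pRAM node can be sketched in a few lines: each memory location holds a firing probability stored as a 16-bit integer, and the node outputs 1 with that probability when the location is addressed. The training step below uses a reward-penalty rule in the style of the reinforcement rules reported for pRAMs; its exact form and the values rho = 0.1 and lambda = 0.05 are assumptions for illustration, and all names are hypothetical.

```python
import random

SCALE = 0xFFFF  # 16-bit register: a stored integer m represents alpha = m / SCALE


class PRAMNode:
    """Hypothetical pRAM sketch: each of the 2**n addresses holds a firing
    probability alpha_u quantised to 16 bits."""

    def __init__(self, n_inputs, rng=None):
        self.n_inputs = n_inputs
        self.rng = rng or random.Random()
        # Every location starts near alpha = 0.5, like the PLN's 'undecided' level.
        self.memory = [SCALE // 2] * (2 ** n_inputs)

    def address(self, bits):
        # The binary input vector selects one memory location (first input = MSB).
        addr = 0
        for b in bits:
            addr = (addr << 1) | (1 if b else 0)
        return addr

    def alpha(self, bits):
        return self.memory[self.address(bits)] / SCALE

    def fire(self, bits):
        # Stochastic output: 1 with probability alpha_u, otherwise 0.
        return 1 if self.rng.random() < self.alpha(bits) else 0

    def reinforce(self, bits, a, r, p, rho=0.1, lam=0.05):
        # Assumed reward-penalty rule, applied to the addressed location only:
        #   d_alpha = rho * ((a - alpha) * r + lam * ((1 - a) - alpha) * p)
        # Reward (r=1) pulls alpha towards the output just produced;
        # penalty (p=1) pulls it, more gently, towards the opposite output.
        u = self.address(bits)
        alpha = self.memory[u] / SCALE
        alpha += rho * ((a - alpha) * r + lam * ((1 - a) - alpha) * p)
        # Clamp, then re-quantise to the 16-bit register.
        self.memory[u] = round(min(1.0, max(0.0, alpha)) * SCALE)


# Usage: teach a 2-input pRAM the XOR function by reward and penalty.
rng = random.Random(0)
node = PRAMNode(2, rng)
xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
for _ in range(2000):
    x = rng.choice(list(xor))
    a = node.fire(x)
    correct = a == xor[x]
    node.reinforce(x, a, r=int(correct), p=int(not correct))

for x, target in xor.items():
    print(x, target, round(node.alpha(x), 3))
```

Because only the addressed location is updated, each input pattern's probability moves independently towards the rewarded output; the 16-bit quantisation is fine enough that the updates behave almost like continuous ones.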
In the early days of the pRAM, the intention was to use the model mainly for neurobiological modelling, because of its greater biological realism compared with other stochastic neural models, with electronic hardware possibly being introduced to speed up large-scale neural simulations. However, with the involvement in the early 1990s of Prof Trevor G Clarkson, Dr Terry C K Ng (now with City University of Hong Kong) and other then-members of the King's College Department of Electronic and Electrical Engineering, notably Dr Chris C Christodoulou (Birkbeck College Department of Computer Science) and Dr Yelin Guan (ONI Systems Inc., San Jose), it quickly became apparent that pRAM-based technology could also be used for a large number of industrial applications. Theoretical work in the later 1990s also expanded the range of abilities of the pRAM: models were developed which were able to learn real-valued functions by manipulating the lengths of the spike trains used to represent them. The experimental work using this new model was largely carried out at UCL by Dr David A Romano-Critchley (now with ThinkingCAP Technology Ltd., London).
Hardware implementation - the pRAM-256 chip
An implementation of the pRAM model with true 'on-chip' learning was first proposed by Clarkson, Ng, Gorse and Taylor in 1992. This has subsequently been developed into the pRAM-256 system, in which 256 six-input pRAMs, with user-reconfigurable interconnections, are contained on a single chip. These chips can be connected together to produce systems with thousands of trainable 'silicon neurons'.
pRAM applications
pRAM systems have been shown to be effective in a wide range of tasks, for example speech/speaker recognition, automatic target recognition, ATM connection admission control, time series prediction, and neurocontrol problems such as the classic 'inverted pendulum' scenario.
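The pRAM-256 figures (256 pRAMs per chip, six inputs each) translate into a concrete storage budget: a six-input pRAM has 2^6 = 64 addressable locations. The back-of-envelope sketch below assumes each location holds the full 16-bit value described for the model (the chip's actual word length is not stated here) and ignores any limits on inter-chip routing; the function names are hypothetical.

```python
import math


def pram_chip_budget(n_prams=256, n_inputs=6, bits_per_location=16):
    """Memory locations and bits on one pRAM-256-style chip.
    bits_per_location=16 is an assumption taken from the model description."""
    locations = n_prams * 2 ** n_inputs
    return locations, locations * bits_per_location


def chips_needed(n_neurons, prams_per_chip=256):
    """Chips required for a network of n_neurons pRAMs (ignores routing limits)."""
    return math.ceil(n_neurons / prams_per_chip)


locations, bits = pram_chip_budget()
print(locations, bits, bits // 8)  # 16384 262144 32768
print(chips_needed(4000))          # 16
```

Under these assumptions a single chip carries 16,384 trainable probabilities (32 KB of storage), and a network of a few thousand 'silicon neurons' needs on the order of sixteen chips.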
The bit-stream (spike-train) communication between pRAM neurons, rather than being a hindrance to the system when learning, is actively beneficial in promoting generalisation; other systems have to introduce such a 'blurring' of the input (so that in effect a wider range of patterns is seen during training) in a much more artificial way. When running the system in classifier/predictor mode, after training has taken place, one simply chooses longer spike trains to represent the variables, so that the noise in the system is reduced. Many pRAM applications have exploited this positive use of noise; it is possible that the brain also uses it in those regions where there is a high level of synaptic leakage, and where spontaneous firing is therefore common.
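The noise-versus-length trade-off can be illustrated with a minimal simulation. It assumes a real value x in [0,1] is carried as a spike train whose bits are independently 1 with probability x, and that the input bits are independent at each time step. Under that assumption a pRAM's mean output is the sum over addresses u of alpha_u times the probability of that address occurring, and the decoded value from a train of length N has standard deviation sqrt(p(1 - p)/N), so a 16-fold longer train cuts the noise roughly 4-fold. All names and parameter values below are hypothetical.

```python
import random


def bitstream(x, length, rng):
    """Encode a value x in [0, 1] as a spike train: each bit is 1 with probability x."""
    return [1 if rng.random() < x else 0 for _ in range(length)]


def decode(train):
    """Read a value back as the fraction of 1s in the train."""
    return sum(train) / len(train)


def expected_output(alphas, xs):
    """Mean firing probability of a pRAM whose input j fires with probability xs[j]:
    sum over addresses u of alpha_u * prod_j x_j**u_j * (1 - x_j)**(1 - u_j)."""
    n = len(xs)
    total = 0.0
    for u, alpha in enumerate(alphas):
        weight = 1.0
        for j, x in enumerate(xs):
            bit = (u >> (n - 1 - j)) & 1  # input j is bit j of the address (first = MSB)
            weight *= x if bit else (1 - x)
        total += alpha * weight
    return total


def simulate_output(alphas, xs, length, rng):
    """Drive a pRAM with independent input spike trains and decode its output train."""
    inputs = [bitstream(x, length, rng) for x in xs]
    out = []
    for t in range(length):
        u = 0
        for train in inputs:
            u = (u << 1) | train[t]
        out.append(1 if rng.random() < alphas[u] else 0)
    return decode(out)


rng = random.Random(1)
alphas = [0.1, 0.6, 0.3, 0.9]  # hypothetical contents of a trained 2-input pRAM
xs = [0.7, 0.2]                # hypothetical input firing probabilities
print(round(expected_output(alphas, xs), 3))  # 0.354


def rms_error(length, trials=200):
    """RMS deviation of the decoded output from its expectation, over repeated runs."""
    target = expected_output(alphas, xs)
    sq = [(simulate_output(alphas, xs, length, rng) - target) ** 2 for _ in range(trials)]
    return (sum(sq) / trials) ** 0.5


for length in (16, 256, 1024):
    print(length, round(rms_error(length), 4))  # error shrinks roughly as 1/sqrt(length)
```

Short trains during training therefore present each pattern with a spread of noisy variants, while long trains at recall time give outputs close to the expected value.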