Early neural networks consisted of a single layer of perceptrons. Like their McCulloch-Pitts predecessors (1943), they had binary outputs, but now they had a new way to learn (Rosenblatt 1958). Their
weights could be altered by a simple
learning rule. These networks worked very well for basic classification and mapping problems, especially where the solution was not evident in the data set available at design time. You could train them to map a set of example training input patterns to your desired output patterns, and voilà! You had a system that performed your pattern mapping. This may not seem that exciting in and of itself, but once trained, your network would also perform the mapping for similar patterns that were not included in the original training set. In other words, your network would generalize from the patterns it was trained on to new, never-before-experienced patterns, and generate appropriate responses.
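As a concrete illustration, here is a minimal sketch of such a simple learning rule, the classic perceptron update. The code, names, and training set (logical AND, which is linearly separable) are mine, for illustration only:

```python
import random

def perceptron_train(patterns, targets, epochs=100, lr=0.1):
    """Classic Rosenblatt perceptron rule: nudge weights by the error."""
    weights = [random.uniform(-0.5, 0.5) for _ in patterns[0]]
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(patterns, targets):
            # Binary output: fire only if the weighted sum exceeds threshold.
            output = 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
            error = target - output  # desired minus actual (-1, 0, or +1)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Hypothetical training set: logical AND.
w, b = perceptron_train([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1])
```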
Further advancements came in Widrow and Hoff's Adaline (and later, Madaline
[1]) networks. Eventually, the binary output limitations of earlier models also fell away. Those binary designs seemed to be merely an early misinterpretation of how
action potentials are used within biological neurons to represent signals. Action potentials produced by neurons are all-or-nothing pulses, but the pulses themselves turned out to be shape-, phase- and frequency-modulated representations of analog levels.
These new developments, along with the intrinsic ability of neural networks to generalize, produced a lot of excitement about the new neural network field. As stated, the excitement came mainly from the fact that you could train a system to respond in a desired way, and it would then respond appropriately to similar, previously untrained problems.
These were great innovations, but the single layer networks of the time were shown, in a book by Minsky and Papert
[2], to have some serious problems. They could not perform any mapping that included an exclusive-OR component (this,
OR that, but
NOT both). Since many pattern-responses in nature require such a mapping, this limitation was clear evidence that something new remained to be discovered.
It probably should be noted here (with cautions against drawing conclusions about the motives of others) that some histories claim that Minsky and Papert's primary goal was to drive research funding away from neural networks and into the field of computational artificial intelligence. If this was their motivation, they may have succeeded, but the connectionists weren't giving up.
To solve the linear-separability issue exposed by the exclusive-OR problem, the connectionists realized that a network would need to be constructed with multiple layers of neurons. There would need to be output neurons that provided the desired results, as well as hidden neurons feeding them. The hidden layers would be taught to perform the intermediate functions needed to solve exclusive-OR.
Output neurons could be taught in the same way they had always been taught, by determining the difference between the desired output and their actual output, then using that difference-value to make weight adjustments. The neurons in the "hidden" layer, however, presented a problem. If a neuron is "hidden" behind another neuron, how can you know what its desired output value should be?
[Figure: Hidden Desire — A neural network that can be trained to perform the exclusive-OR function (A or B, but not both). With the input values shown for A and B, we know that the desired value on the "Output" neuron should be 100%. But what should the desired value be on the output of neuron Z?]
You know what value you want to see on your output neuron, and can determine the difference between that desired value, and the value that it is currently producing. The difference between these two values is the error value that is used to make changes to the neuron's weights.
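A minimal sketch of that output-neuron update, assuming a conventional sigmoid neuron and a gradient-style weight change (the code and names are mine, not any particular historical implementation):

```python
import math

def train_output_neuron(weights, inputs, desired, lr=0.5):
    """For an output neuron, the error is simply desired minus actual."""
    net = sum(w * x for w, x in zip(weights, inputs))
    actual = 1.0 / (1.0 + math.exp(-net))        # sigmoid activation
    error = desired - actual                      # we KNOW the desired value
    delta = error * actual * (1.0 - actual)       # error scaled by output slope
    new_weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
    return new_weights, delta
```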
But what do you do if your neuron is not an output neuron? How can you determine what its error value should be if you don't know what its desired output value should be?
Though Minsky's exclusive-OR problem was what drove people to the realization that multiple layers would be needed, the problem of how to train those layers that were "hidden" simply drove people crazy. Then, along came a guy named Paul Werbos, who was able to assimilate and apply a new idea from control theory, one he had learned directly from the man who developed it, his professor, Dr. Yu Chi Ho. Werbos applied Ho's theories directly to the new discoveries and criticisms coming out of the AI field. His idea would later come to be known as "The Back-Propagation Learning Algorithm."
Using Werbos's new algorithm, you could train the output neurons by calculating the difference between the desired and actual output-values, just as you always did. The new dimension added by backpropagation was that each output neuron would then propagate an error value back for use by the hidden neurons that were providing its inputs. In turn, the hidden neurons would use those back-propagated error values as a basis for their own error values, which they would use to make their own weight adjustments.
If there was yet another layer of neurons behind the first hidden layer, then the first hidden layer would use back-propagation again, in order to provide an error value for them. In this way, each layer would perform its calculations, and, based on those calculations, it would send an error-value back for use by the neurons that were contacting it.
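The backward pass might be sketched like this for a fully connected feed-forward network (illustrative code of mine; the activation-derivative factor is left out so the layer-by-layer dependency stays visible):

```python
def backward_pass(layer_weights, output_deltas):
    """Propagate error values (deltas) backward, layer by layer.

    layer_weights[k][j][i] is the weight from neuron i in layer k to
    neuron j in layer k+1.  output_deltas are the error values already
    computed for the output layer.
    """
    deltas = [output_deltas]
    # Skip layer_weights[0]: it feeds forward from the inputs,
    # and the input layer needs no deltas.
    for W in reversed(layer_weights[1:]):
        post = deltas[0]                    # deltas of the layer in front of us
        pre = [sum(W[j][i] * post[j] for j in range(len(post)))
               for i in range(len(W[0]))]   # weighted sum of post-synaptic deltas
        deltas.insert(0, pre)
    return deltas  # one delta vector per non-input layer, front to back
```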
I know what you're thinking. Werbos didn't invent back-propagation! Well, yes, in spite of some impressive, and sometimes shameless, efforts to marginalize his contribution, it seems he really did.
Just as McCulloch and Pitts had produced a first renaissance around neural networks, Dr. Werbos's backpropagation algorithm created a second one. The journals were once again abuzz with explanations and applications of neural networks trained using the backpropagation algorithm. In these journals, you would have seen the technique promoted under other names and attributed to other discoverers, but the original idea was Werbos's.
Early on, many new questions were raised about the many cool and interesting things you could do with back-propagation neural networks. There was way too much to discover to worry much about the shortcomings.
Besides, unlike with the exclusive-OR problem, backpropagation's shortcomings were obvious. You didn't need a rigorous analysis to see that backpropagation networks were only good for feed-forward designs. Why? Because of the nature of the backpropagation algorithm itself. In order to produce a backpropagation value to send back to pre-synaptic neurons, a neuron must first produce its own weight-adjustment value. To produce its own value, it has to use the backpropagation values that have already been calculated by the post-synaptic neurons with which it makes contact.
[Figure: Round and round we go — To calculate the backpropagation value for a neuron, you need the backpropagation values of all the neurons it connects to. To calculate this value for A (δA) in the example, you need the backpropagation value from B (δB). But δB can't be calculated until you calculate δA, and δA can't be calculated until you calculate δB.]
Early on, backpropagation's inability to support feedback didn't seem to be a serious limitation. After all, you didn't need feedback to solve exclusive-OR. And even if a need were found, the dream was in place: if hidden layers of neurons could be trained, then some way to improve upon backpropagation to make it capable of handling feedback should be a snap. Most (including me) thought it would be just a matter of adding some computational extensions to the existing back-propagation algorithm so it could handle feedback smoothly and intuitively.
So the problem was easily understood. In order to calculate the neuron-level error value for the current hidden-layer neuron, it is necessary to have first calculated the same error value for each post-synaptic neuron to which the current neuron is connected. This is what restricts back propagation to feed-forward-only networks.
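One way to make the restriction concrete: the delta calculations must happen in an order that places every neuron after all of its post-synaptic neurons, and such an order exists only when the connection graph has no loops. A small illustrative sketch (the function and names are mine):

```python
def delta_order(connections):
    """Find an order in which neuron deltas can be computed.

    connections maps each neuron to its post-synaptic neurons.  A valid
    order must place every neuron AFTER all neurons it feeds.  Such an
    order exists only when the graph has no feedback loops.
    """
    done, order = set(), []

    def visit(n, path):
        if n in path:
            raise ValueError(f"feedback loop through {n}: no valid delta order")
        if n in done:
            return
        for post in connections.get(n, ()):
            visit(post, path | {n})  # post-synaptic deltas must come first
        done.add(n)
        order.append(n)

    for n in connections:
        visit(n, set())
    return order

# Feed-forward case works: hidden neurons A and B both feed Out.
print(delta_order({"A": ["Out"], "B": ["Out"], "Out": []}))
# The A <-> B loop from the diagram above raises ValueError:
# delta_order({"A": ["B"], "B": ["A"]})
```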
Studies of biological neural networks have demonstrated that numerous and diverse signal feedback paths and mechanisms exist in biological organisms
[3]. These mechanisms include direct signaling carried on
afferent axons back through networks whose outputs, in turn, affect the
efferent signal flows. Signal feedback in biological organisms also occurs through a variety of chemical signaling mechanisms carried directly through glial cells (those cells that support neurons in biological brains), and through the bloodstream from a variety of intra-organism sources. Finally, signal feedback mechanisms occur tacitly in biological organisms through the senses that carry afferent information about causal effects that efferent signals have had on outside world environments.
In biological organisms
[4], the concepts of "internal" and "external" may not represent absolute locations, but may instead describe points along a continuum. Simplistically, afferent signaling may begin with senses of external-world events caused by motor controls such as muscles. On the other hand, it may be caused by a chain of events that starts within the brain and ends up as stimulation to the adrenal gland, which in turn produces an afferent chemical signal that causes the brain to retain more experiences. The latter case shows that external signals can be generated by a feedback loop that is outside of the brain, but never leaves the organism.
Such loops may occur entirely inside the brain, or may get out only as far as generating scent signals via the organism's own sweat glands, which are then sensed and have an afferent effect on the network. Much further out, a very complex chain of external events may be affected by the brain and produce effects that are then sensed by the brain via the organism's many senses. In this way, the brain can produce effects on, and correct for, external-world events.
To summarize, feedback loops of signals originating in the brain and returning can remain inside the brain, go outside the brain but remain inside the organism, or include causal activities completely outside of the organism.
Backpropagation is an abstraction that encapsulates the detailed mechanisms used in (inferred from) biological networks. It aims to mimic behavior observed in biological neurons in order to better respond to the external environment. Because backpropagation's abstraction of these underlying mechanisms encapsulates the details into its reverse error propagation calculation, it doesn't normally allow them to be broken out and used in combination. In other words, the details of how biological neural networks actually produce changes in their connection strengths are completely incorporated into a single, conceptual, black-box that is the backpropagation algorithm.
The advantage of this level of abstraction is that backpropagation fully encompasses and mimics the concepts of positive and negative reinforcement learning within its calculations. This frees the network designer from having to consider such details when designing a neural network. The disadvantage can, in some sense, be expressed by simply parroting the advantage. The designer must accept back propagation's abstract interpretation of the underlying concepts, and the observed behavior that it encapsulates. There is little (if any) flexibility in the details of how such mechanisms can be employed by the neural network to effect changes in its connection weights.
A variety of methods to work around back-propagation's feedback limitation have been tried with varying levels of success. The following provide examples.
Back Propagation Through Time (BPTT) — BPTT is a simple means of applying standard backpropagation to a network that would normally employ feedback (a recurrent neural network), and it has been used to partially mitigate backpropagation's limitation on feedback. The essence of BPTT is that each time a sequence is processed, it unfolds the discrete-time recurrent neural network into a multilayer feed-forward neural network (FFNN). In effect, the FFNN has a separate hidden layer for each "time step" in the sequence. It should be noted that the feed-forward-only restriction of backpropagation has not been overcome in BPTT. Instead, a clever means of removing the feedback from a recurrent network has been implemented so that the backpropagation learning algorithm can be used along with its inherent restriction.
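As a rough sketch of the unfolding idea, assuming a single recurrent neuron (the names and simplifications are mine):

```python
import math

def unroll_forward(x_seq, w_in, w_rec):
    """Unfold a one-neuron recurrent network over a sequence.

    Each time step becomes one 'layer' of an equivalent feed-forward
    network: the recurrent edge turns into an ordinary layer-to-layer
    connection between copies.  Standard backpropagation can then be
    run over the unrolled copies, summing the gradients that land on
    the shared weights w_in and w_rec.
    """
    h, states = 0.0, []
    for x in x_seq:
        h = math.tanh(w_in * x + w_rec * h)  # state feeds the next "layer"
        states.append(h)
    return states

states = unroll_forward([1.0, 0.0, 1.0], w_in=0.8, w_rec=0.5)
```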
Real Time Back Propagation (RTBP) Permits Feedback Among Output Neurons Only — Real time back propagation, or RTBP, permits feedback from output-layer neurons to output-layer neurons only. As described above, each output neuron's error value is calculated directly, by subtracting its actual output from an expected (desired) output. Since these error values do not depend on error values calculated by post-synaptic neurons, this tightly restricted form of feedback within this one layer can be permitted in RTBP networks. There is still no way to produce the feedback of complex afferent signal flows seen in biological neural networks. Any feedback from the outputs of hidden neurons to themselves, or to neurons even further back, would break the learning algorithm, because for all but the output neurons, the backpropagation learning algorithm requires that error values be calculated for all post-synaptic neurons before the error value for the current neuron can be calculated.
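Here is a minimal sketch of that idea as I read it; the function and details are my own illustration, not a reference RTBP implementation:

```python
def rtbp_output_step(targets, outputs, prev_outputs, W_rec, lr=0.1):
    """Train output-to-output recurrent weights.

    Each output neuron's delta comes straight from (desired - actual),
    so no post-synaptic deltas are needed and this one feedback loop
    does not break the ordering requirement.  W_rec[j][i] is the weight
    from output i (previous time step) to output j (current step).
    """
    deltas = [t - o for t, o in zip(targets, outputs)]
    for j, d in enumerate(deltas):
        for i, prev in enumerate(prev_outputs):
            W_rec[j][i] += lr * d * prev  # pre-synaptic activity from last step
    return W_rec
```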
Alternatives to Back Propagation That Allow Feedback Will Usually Strictly Dictate Specific Network and Feedback Structures — Alternatives to back-propagation neural networks have been conceived in an effort to produce networks with some rudimentary forms of feedback (Douglas Eck, "Recurrent Neural Networks: A Brief Overview," University of Montreal, 1 October 2007). These generally rely on very strictly prescribed feedback topologies and complex feedback functions to achieve networks with recurrent characteristics. While the recurrent character of the brain is abstractly emulated in these schemes, the ability to design neural networks with the complex and varied feedback structures observed in connected networks of biological neurons is poorly addressed, or not addressed at all. Other forms of networks that permit strictly defined feedback limit the network to a single layer, or to no more than two layers.
Influence based learning, or just
influence learning, permits neurons to learn from forward-connected (
post-synaptic) neurons, without any reverse, sequentially-dependent calculation. One advantage of this (now patented) learning method over backpropagation and derivatives is that it permits neural networks to be constructed with arbitrarily complex signal feedback paths. Further advantages are discussed in the blog entries here that are tagged with
Influence-Learning.
Neural networks are parallel processing networks composed of artificial neurons connected together in a variety of topologies. Neural networks seek to mimic the way biological neural networks are constructed in order to perform computational functions that are similar to cognitive brain functions. One of the goals of this effort has been to provide a generalized adaptive (learning) method, which is capable of supporting artificial neural networks that include complex signal feedback.
Another goal (though less prevalent historically) has been to provide a learning mechanism capable of continuously
[5] adapting to its environment. Both of these goals have been realized in a new learning method called
influence based learning, which is described in Netlab. In the bargain, the methods provided in Netlab will also work fine for more conventional artificial neural networks that do not employ signal feedback. The resultant capability allows designers to produce artificial neural networks that more closely mimic the complex feedback (as well as traditional feed-forward) mechanisms observed in studies of biological neural networks.
=======
Notes:
- Madaline networks were an early two-layer network architecture with one output unit combining the outputs of multiple parallel Adalines behind it. The combining function was typically (but not always) simply choosing whichever binary state was being output by the majority of Adalines. They also used a majority function to choose which single hidden Adaline to train in a given training cycle. They could be trained to solve the exclusive-OR problem.
- Marvin Minsky and Seymour Papert, Perceptrons, MIT Press, 1969.
- Levitan, I. B., and Kaczmarek, L. K., The Neuron: Cell and Molecular Biology, Oxford University Press, 2002, ISBN 0-19-514523-2.
- Paraphrased from my book (Netlab loligo).
- Not merely continual, but continuous. You will sometimes see this referred to in my writing as “constant” learning.