A programmer who is obsessed with giving experimenters
a better environment for developing biologically-guided
neural network designs. Author of
an introductory book on the subject titled:
"Netlab Loligo: New Approaches to Neural Network
Simulation". BOOK REVIEWERS ARE NEEDED!
Can you help?
The Netlab development effort has led to a new method and device that produces learning factors for pre-synaptic neurons. The need to provide learning factors for pre-synaptic neurons was first addressed by backpropagation (Werbos, 1974). The new method differs from backpropagation in that its use is not restricted to feed-forward-only networks. This new learning algorithm and method, called Influence Learning, is described here and in other entries in this blog (see Resources section below).
Influence Learning is based on a simple conjecture. It assumes that those forward neurons that are exercising the most influence over responses to the immediate situation will be more attractive to pre-synaptic neurons. That is, for the purpose of forming or strengthening connections, active pre-synaptic neurons will be most attracted to forward neurons that are exercising the most influence.
Perhaps the most relevant thing to understand about this process is that these determinations are based entirely on activities taking place while signals (stimuli) are propagating through the network. Unlike backpropagation, there is no need for an externally generated error signal to be pushed through the network, in backwards order, and in ever diminishing magnitudes.
Support In Biological Observations
While influence learning in artificial neural network simulations is new, it is based on biological observations and underpinnings from discoveries made over twenty years ago. One of the biological observations that led to the above speculation about attraction to the exercise of influence was discussed briefly in the book The Neuron: Cell and Molecular Biology.
An experiment described in that book shows what happens when you cut (or pharmacologically block) the axon of a target neuron. In that experiment the pre-synaptic connections to the target neuron began to retract after its axon was cut. That is, the axons making presynaptic connections to the modified neuron went away when it no longer made synaptic connections to its own post-synaptic neurons.
The book also described how, when the target neuron’s axon was unblocked (or grew back), the axons from presynaptic neurons immediately began to reform and re-establish connections with the target. Based on these observations, the following possibility was asserted.
"...Maintenance of presynaptic inputs may depend on a post-synaptic factor that is transported from the terminal back toward the soma."
The following diagram depicts these observations schematically.
A set of constructs and methods introduced and described in the book Netlab Loligo will improve the ability of systems constructed with them to adapt to current short-term situations, and to learn from those short-term experiences over the long term.
A New Learning Theory That Predicts A “Present Moment”
How do we, as biological organisms, manage to keep so much finely detailed information in our brains about how to respond to any given situation? That is, how do we manage to keep countless tiny intricacies stored away in our “subconscious” ready to be called upon at just the right time, right when we need them in the present moment?
According to this theory of learning, the answer to that question is: We don't.
Instead, our long term connections—those that immediately drive our responses at all times—are only concerned with getting us started in any given “present.” Responses stored in long-term connections start us along a trajectory that makes it easier for us to learn whatever short-term, detailed responses are needed for any given detailed situation.
Connections that drive short-term responses, on the other hand, form spontaneously in-the-moment, and quickly adapt to whatever present situation we currently find ourselves in. Just as significantly, connections driving short-term responses tend to dissipate as quickly as they form. This theory essentially says that each connection in the brain that drives responses (physical or internal) includes multiple distinct connection strengths, which each increase and decrease at different rates of speed.
How It's Done
Multi-temporality is achieved in Netlab's simulation environment by providing multiple weights per connection point (i.e., synapse). Such connection points are referred to as Multitemporal[Note 1] synapses. Each of the multiple weights associated with a given synapse represents a connection strength, and can be set to acquire and retain its strength at a different rate from the others. The methods also specify Weight-To-Weight Learning, which is a means of teaching a given weight in the set using the values of other weights from the same connection. Together these constructs provide all the functionality required to model the theory of learning discussed above.
Following is a graphic excerpted from the book: Netlab Loligo, which shows a neuron containing three different weights for each connection point. Each weight is given its own learning algorithms, with its own learning-rate, and forget-rate.
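To make the idea concrete, here is a minimal Python sketch of a synapse with three weights, each with its own learning-rate and forget-rate. It is not Netlab code. The class names, the update rule, the rate values, and the choice to have each slower weight learn from the next-faster one are all assumptions made for illustration; the book's actual Weight-To-Weight Learning rules may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TemporalWeight:
    """One of several weights on a single connection point (hypothetical structure)."""
    learn_rate: float    # fraction of the gap to the teaching value closed per step
    forget_rate: float   # fraction of the weight's value lost per step
    value: float = 0.0

    def step(self, teacher: float) -> None:
        self.value += self.learn_rate * (teacher - self.value)  # acquire
        self.value *= 1.0 - self.forget_rate                    # dissipate


@dataclass
class MultitemporalSynapse:
    """Weights ordered from fastest to slowest (an assumption of this sketch)."""
    weights: List[TemporalWeight]

    def step(self, signal: float) -> None:
        # Weight-to-weight learning, as assumed here: the fastest weight is taught
        # by the incoming signal, and each slower weight is taught by the previous
        # value of the next-faster weight on the same connection.
        teachers = [signal] + [w.value for w in self.weights[:-1]]
        for weight, teacher in zip(self.weights, teachers):
            weight.step(teacher)

    def values(self) -> List[float]:
        return [w.value for w in self.weights]


def run_episode(train_steps: int = 20, rest_steps: int = 20):
    """Drive the synapse for a while, then let it rest; return both snapshots."""
    synapse = MultitemporalSynapse([
        TemporalWeight(learn_rate=0.5, forget_rate=0.2),     # short-term
        TemporalWeight(learn_rate=0.05, forget_rate=0.01),   # medium-term
        TemporalWeight(learn_rate=0.005, forget_rate=0.0),   # long-term
    ])
    for _ in range(train_steps):
        synapse.step(1.0)
    after_training = synapse.values()
    for _ in range(rest_steps):
        synapse.step(0.0)
    after_rest = synapse.values()
    return after_training, after_rest
```

In this sketch the short-term weight rises quickly and has all but vanished after the rest period, while the long-term weight keeps slowly absorbing what the faster weights held.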
Influence Based Learning, one of two new learning methods described in the book Netlab Loligo, has just been awarded a United States Patent. The official title of the patent is:
“Feedback-Tolerant Method And Device Producing Weight-Adjustment Factors For Pre-Synaptic Neurons In Artificial Neural Networks”
The title is a mouthful, primarily designed to help future patent searchers determine if their great idea has already been discovered and patented. It is fully described and discussed in the book, where it is simply referred to as Influence Learning.
As the patent title expresses, one of the benefits it imparts over existing learning algorithms is that it is feedback-tolerant. It will work fine with current-day feed-forward networks configured as "slabs", but it also allows neurons to connect back to their pre-synaptic neurons. That is, it allows feedback, which means you don't have to configure your network with "hidden layers" anymore if you don't want to. You are free to use any connectome you'd like.
Influence learning is one of two new learning algorithms that have emerged (so far) from the Netlab development effort. This blog entry contains a brief overview describing how it works, and some of the advantages it brings to the task of neural network weight-adjustment.
How It Works
This learning method is based on the notion that—like their collective counterparts—neurons may be attracted to, and occasionally repulsed by, the exercise of influence by others. In the case of neurons, the "others" would be other neurons. As simple as that notion sounds, it produces a learning method with a number of interesting benefits and advantages over the current crop of learning algorithms.
A neuron using influence learning is not nosy, and does not concern itself with how its post-synaptic (forward) neurons are learning. It simply trusts that their job is to learn, and that they are doing their job. In other words, a given neuron fully expects, and assumes that other neurons within the system are learning. Each one treats post-synaptic neurons that are exercising the most influence as role models for adjusting connection-strengths. The norm is for neurons to see influential forward neurons as positive role models, but neurons may also see influential forward neurons as negative role models.
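The formulas of the patented method are not given in this post, so the following Python sketch is only a toy illustration of the attraction conjecture, not the Influence Learning algorithm itself. It assumes that a forward neuron's influence can be approximated by its output magnitude times the total strength of its own outgoing connections (so a neuron with no outgoing connections, like the cut axon described earlier, exerts none). The function names, the tanh activation, and the `role` parameter (+1 for a positive role model, -1 for a negative one) are likewise hypothetical.

```python
import math


def propagate(weights, outputs, external):
    """One signal-propagation step on an arbitrary, possibly looping, connectome.

    weights[i][j] is the strength of the connection from neuron i to neuron j.
    """
    n = len(outputs)
    return [
        math.tanh(external[j] + sum(outputs[i] * weights[i][j] for i in range(n)))
        for j in range(n)
    ]


def influence(weights, outputs, j):
    # ASSUMPTION: influence = output magnitude x total outgoing connection strength.
    return abs(outputs[j]) * sum(abs(w) for w in weights[j])


def influence_step(weights, outputs, rate=0.1, role=1.0):
    """Strengthen each existing connection in proportion to the pre-synaptic
    neuron's activity and the influence of the forward neuron it connects to."""
    n = len(outputs)
    infl = [influence(weights, outputs, j) for j in range(n)]
    updated = [row[:] for row in weights]
    for i in range(n):
        for j in range(n):
            w = weights[i][j]
            if w != 0.0:  # only connections that already exist are adjusted
                change = role * rate * abs(outputs[i]) * infl[j]
                updated[i][j] = w + math.copysign(change, w) * (1 if change >= 0 else -1)
    return updated


def demo(role=1.0):
    # Neuron 0 feeds neurons 1 and 2; neuron 1 feeds back to neuron 0 (a loop).
    # Neuron 2 has no outgoing connections at all.
    weights = [[0.0, 0.5, 0.5],
               [0.3, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
    external = [1.0, 0.5, 0.0]
    outputs = [0.0, 0.0, 0.0]
    for _ in range(3):
        outputs = propagate(weights, outputs, external)
    return influence_step(weights, outputs, role=role)
```

Everything the update uses is available while signals propagate forward, so the loop from neuron 1 back to neuron 0 needs no special handling. In the demo, the connection from neuron 0 to neuron 1 is strengthened, while the connection to neuron 2, which influences nothing, is left unchanged.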
It Is Simple
As you might guess, the first benefit is simplicity. The method does not try to hide a lack of new ideas behind a wall of new computational complexity. It is a simple, new method based on a simple, almost axiomatic observation, and it can be implemented with relatively little computational power.
It Imposes No Restrictions On Feedback
Influence Learning is completely free of feedback restrictions. That is, network connection-structures may be designed with any type, or amount of feedback looping. The learning mechanism will continue to be able to properly adapt connection-strengths regardless of how complex the feedback scheme is. The types of feedback designers are free to employ include servo feedback, which places the outside world (or some network structure that is closer to the outside world) directly in the signaling feedback path.
This type of "servo-feedback" is shown graphically in figure 6-5 of the book, which has been reproduced here.
As a programmer, I find it very satisfying when a false choice is taken down. Chris Chatham, who maintains the Developing Intelligence blog, looks like he's hot on the trail of one.
Here's a cool visualization from the article used to clarify the local-to-distributed data:**
He provides a very good explanation for the apparent disagreement in the experimental data. His conclusion? The two aren't mutually exclusive. (thank you Mr. Chatham)
So, how does this work? Is the brain just big enough to accommodate two different mechanisms? Possibly, but Chatham also explores a distinct possibility that the same underlying mechanisms may be responsible for both types of development. It turns out there is a bit of good reason to think it is the latter.
= = = = = = = = = = =
Notes:
** Okay, I don't know how much clarity it brings, but it is a slick visualization; therefore, it makes it into the post.
The following excerpts from Chapter 4 of the book ("Our Metaphysical Tool Shed") may help to clarify the point of this post.
Anti-Razor [1]
It would probably be sufficient to simply express the weighted result as a ratio, but for now that's just one option. As our understanding grows, we may find that our maintaining two separate sums and many of the other values, is the metaphorical equivalent of gluing feathers to the wings of a flying machine. . .
. . .
1. This sub-heading is a reference to Ockham's razor, which is almost always mis-characterized in common usage. William of Ockham's original advice is based on sound, logical, reasoning, while the common mis-characterization is essentially a fashion statement. Here, however, I argue that programmers must give experimenters the ability to define their own (real) razors, and so should not mandate them in the modeling tools we provide. That is, we should give experimenters more, and let them decide for themselves how to divide and conquer those “more” into “fewer”.
Many of the ANN modeling tools available today are merely paint-by-number kits, which allow experimenters to try out solutions that other people have worked out and documented in formulaic recipes. Netlab is decidedly NOT among these offerings.
The known neural network formulas, for example, each represent somebody else's abstraction, and reduction of the observation data. The function produced is essentially a workable, defined, recipe for creating an effect that is similar to observed behaviors. In short, the person who came up with the formula in the first place was the one who did all the heavy lifting.
Experimenters should be given tools that let them find and test their own theories, and their own ways to pare down and represent what they think is most essential about what's going on in the problem space.
Netlab
To state it simply, Netlab is built on the proposition that experimenters should be able to try out their own ideas, and not merely find new ways to use other people's ideas.
One very good way for a programmer to achieve such a goal in his design, is to reverse engineer the existing formulas. That is, for each existing formula, produce a simulation environment that would allow somebody to create the formula for the first time, were it not already known. This is one of the design philosophies underlying Netlab's software specification.
Some Practical Examples
To this point it has been a rather abstract post. What follows is mostly a list of links into the glossary (I think). Now that the I.P. protection is starting to come through, I'll be able to describe this stuff more openly, and more deeply. As new documentation becomes available I'll either try to update this section with links, or copy this section into a more complete discussion of the practical mechanisms provided for achieving the simple goal described above.
Chemicals
One of the abstractions provided by Netlab is the notion of chemicals. Neurons, besides producing output values on their axons based on a variety of conditions (direct and indirect), are also capable of producing chemicals. The chemicals, much like the axon-level, can be specified to be produced in various concentration-levels based on a variety of environmental factors. The factors that lead to the production of a chemical are specified by the designer and can include stimulus on the neuron's synapses by other axons, other chemical influences in the vicinity of the neuron, or globally present chemicals (among many other factors). To specify a new chemical the designer simply comes up with a name for it.
No characteristics are explicitly specified for a given designer-named chemical. The properties and characteristics of any given named chemical are purely a byproduct of how other objects in the environment (usually other neurons) have been specified to respond/react to them. Responses to any given chemical can be different for different individual instances of a neuron, or for different classes of a neuron (this is simplified; "neuron" is really "object" and can include other super-types besides neurons).
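A small Python sketch may help show what "a chemical is only a name" means in practice. This is not Netlab's description language. The `Environment` and `Neuron` classes, the chemical name "alpha", and the rule functions are all hypothetical, chosen only to illustrate that a chemical's apparent properties come entirely from how each object is told to respond to it.

```python
from collections import defaultdict


class Environment:
    """Holds concentrations keyed by designer-chosen names; nothing else is stored."""

    def __init__(self):
        self.concentration = defaultdict(float)


class Neuron:
    def __init__(self, name):
        self.name = name
        self.stimulus = 0.0
        self.production_rules = {}   # chemical name -> function(neuron, env) -> amount
        self.responses = {}          # chemical name -> function(concentration) -> effect

    def produce(self, env):
        # Release each chemical in an amount decided by its designer-specified rule.
        for chemical, rule in self.production_rules.items():
            env.concentration[chemical] += rule(self, env)

    def output(self, env):
        # The neuron's own responses are the only place a chemical gets "properties".
        effect = sum(respond(env.concentration[chemical])
                     for chemical, respond in self.responses.items())
        return self.stimulus + effect


def demo():
    env = Environment()

    source = Neuron("source")
    source.stimulus = 2.0
    source.production_rules["alpha"] = lambda neuron, _env: 0.5 * neuron.stimulus

    # Two neurons respond to the same named chemical in opposite ways.
    excited = Neuron("excited")
    excited.responses["alpha"] = lambda c: +1.0 * c
    inhibited = Neuron("inhibited")
    inhibited.responses["alpha"] = lambda c: -1.0 * c

    source.produce(env)
    return env.concentration["alpha"], excited.output(env), inhibited.output(env)
```

Here "alpha" is excitatory for one neuron and inhibitory for another, even though nothing about the chemical itself was ever declared.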
Spatial and Temporal Distances
Netlab's description language provides a way to modularize the design and construction of neural networks. It allows experimenters to produce components, called "units", which contain other components, such as neurons and previously designed units.
The modular construct used to overcome complexity at design time is preserved in the Netlab run-time, giving Netlab's networks an abstraction of volumetric space, which can be used as a framework when representing both spatial and temporal phenomena.
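As a rough illustration of how a preserved containment hierarchy can stand in for space, the Python sketch below nests neurons inside units and measures how far apart two components are by counting the containment steps between them. The `Component` class and the `containment_distance` measure are assumptions made for this example; the post does not say how Netlab actually derives distances from its units.

```python
class Component:
    """A neuron or a unit; units may contain neurons and other units."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the chain of containers from the outermost unit down to self."""
        node, chain = self, []
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain[::-1]


def containment_distance(a, b):
    """Steps up from a to the innermost shared container, plus steps down to b."""
    path_a, path_b = a.path(), b.path()
    shared = 0
    for x, y in zip(path_a, path_b):
        if x is not y:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)


def build_example():
    brain = Component("brain")
    column_a = Component("column_a", brain)
    column_b = Component("column_b", brain)
    n1 = Component("n1", column_a)
    n2 = Component("n2", column_a)
    n3 = Component("n3", column_b)
    return n1, n2, n3
```

With this measure, two neurons in the same unit are closer together than two neurons in sibling units, which is the kind of relationship a chemical gradient or signal delay could then be made to depend on.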
Pathfinding
Once you have a viable abstraction for representing temporal and volumetric chemical gradients, you can then begin to define all kinds of useful mechanisms based on the influence (e.g., repulsive or attractive) your chemical concentrations have on them. For example, like all neural network packages, Netlab includes the traditional Latent Connections. These are the static connections you determine at design time. Whether or not they actually develop is based on changes to their strengths, but the connection will always be between the same two neurons, which were specified at design time.
Netlab is able to go further, allowing designers to specify Receptor Pads, which are areas on a neuron's synapse-space that can make connections with any other neuron in the network while it is running. Other neurons put out something called a growth cone. Among other things, dynamic connections are facilitated at run time based on affinities or aversions to named chemicals specified for both the growth cones and the receptor pads. This allows, for example, a given class of growth cone to be defined to "desire", "seek-out", and "find" its perfect receptor pad, dynamically, as the network runs. It is good to mention again here that the chemicals influencing these movement decisions by growth cones are also being produced dynamically, and their concentrations will be based on factors that are products of the running network within its dynamic environment.
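The matching of growth cones to receptor pads can be sketched in a few lines of Python. This is a guess at the general shape of such a mechanism, not Netlab's implementation: the class fields, the scoring rule (a simple weighted sum of concentrations, with negative affinities acting as aversions), and the chemical names "lure", "ward", and "tag" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReceptorPad:
    name: str
    local_chemicals: Dict[str, float]                            # concentrations near the pad
    affinities: Dict[str, float] = field(default_factory=dict)   # pad's view of cone chemicals


@dataclass
class GrowthCone:
    affinities: Dict[str, float]                                 # positive attracts, negative repels
    emitted: Dict[str, float] = field(default_factory=dict)      # chemicals the cone gives off


def attraction(affinities: Dict[str, float], concentrations: Dict[str, float]) -> float:
    """Weighted sum of the concentrations that this object cares about."""
    return sum(a * concentrations.get(chem, 0.0) for chem, a in affinities.items())


def find_pad(cone: GrowthCone, pads: List[ReceptorPad]) -> Optional[ReceptorPad]:
    """Pick the pad with the highest mutual attraction; connect only if it is positive."""
    best, best_score = None, 0.0
    for pad in pads:
        score = (attraction(cone.affinities, pad.local_chemicals)
                 + attraction(pad.affinities, cone.emitted))
        if score > best_score:
            best, best_score = pad, score
    return best


def example_pads() -> List[ReceptorPad]:
    return [
        ReceptorPad("A", {"lure": 0.8}, {"tag": 0.1}),
        ReceptorPad("B", {"lure": 1.0, "ward": 0.5}),
        ReceptorPad("C", {"lure": 0.3}, {"tag": -1.0}),
    ]


def example_cone() -> GrowthCone:
    return GrowthCone(affinities={"lure": 1.0, "ward": -2.0}, emitted={"tag": 1.0})
```

In this example the cone settles on pad A. Pad B has the most "lure" but also enough "ward" to cancel it out, and pad C is itself averse to what the cone emits. In a running network the concentrations would keep changing, so the same cone could make a different choice later.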
No Limits On Feedback
A new patented learning method has been developed that completely eliminates past restrictions placed on the types or amounts of feedback employed in the structure of the network. Besides being great for general feedback that may want to span multiple local loops at times, this is also very nice for servo-structures, which put the outside world directly in the feedback path. This allows the outside world to be represented as the correction factors that must be adapted in order to correct for outside forces. In essence, the world's complex and chaotic stimuli become a transfer function that sits directly in the network's feedback loop. The network then performs the function of correcting for inconsistencies in the feedback function, which it can then learn.
A recent USC study applies a new technique that allows researchers to more closely map the brain's wiring. One goal of the study is to better clarify our current understanding of the connection-structure of brains. Another is to try to settle the raging "It's an internetwork" / "It's a hierarchical pyramid" debate.
The Netlab abstraction is designed to facilitate a similar, but slightly different concept of brain wiring-structure, which is visually depicted in the cover art of the book:
The above diagram should be seen as a cross-section through a sphere, so the word "donut" in this entry-title takes some license. The interior/exterior connection-model, as depicted, does seem to find at least passing observational support in the USC study, e.g.:
"The circuits showed up as patterns of circular loops, suggesting that at least in this part of the rat brain, the wiring diagram looks like a distributed network."