
Unless otherwise indicated, all glossary content is:
(C) Copyright 2008-2022
Dominic John Repici
~ ALL RIGHTS RESERVED ~
No part of this Content may be copied without the express written permission of Dominic John Repici.

Multitemporal Synapse

 
Netlab™ introduces a new form of synapse, based on multitemporal connection-points, called a multitemporal synapse (sometimes hyphenated as multi-temporal synapse).

It fully solves the stability-plasticity problem and eliminates catastrophic forgetting in networks using any training algorithm. It achieves this by providing multiple connection-weights per connection, where each weight within the connection can be set to learn and forget at a rate different from the others. That is, a separate learning method and forget process can be specified for each of the multiple weights that make up a single connection between two objects (see diagram below). During signal propagation, each input signal is modulated by the combined values of all the connection-weights in its synapse.
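The following is a minimal Python sketch of that arrangement (the names SubWeight and MultiTemporalConnection, and the specific update rule, are illustrative assumptions, not Netlab's actual API): one connection holds several weights, each with its own learning rate and forget rate, and the propagated signal is the input modulated by their combined value.

from dataclasses import dataclass

@dataclass
class SubWeight:
    value: float = 0.0
    learn_rate: float = 0.5   # how quickly this weight moves toward its target
    forget_rate: float = 0.0  # per-step decay toward zero (0.0 = permanent)

class MultiTemporalConnection:
    def __init__(self, sub_weights):
        self.sub_weights = sub_weights

    def propagate(self, input_signal):
        # The input signal is modulated by the combined value of all weights.
        return input_signal * sum(w.value for w in self.sub_weights)

    def adapt(self, targets):
        # Each weight learns toward its own target and decays at its own rate.
        for w, target in zip(self.sub_weights, targets):
            w.value += w.learn_rate * (target - w.value)
            w.value *= (1.0 - w.forget_rate)

# One fast-learning/fast-forgetting weight and one slow, permanent weight
# sharing a single connection between two objects:
conn = MultiTemporalConnection([
    SubWeight(learn_rate=0.8, forget_rate=0.2),
    SubWeight(learn_rate=0.01, forget_rate=0.0),
])
print(conn.propagate(1.0))  # 0.0 until something has been learned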

Each connection has the following three novel characteristics:
  • Each connection between two entities consists of multiple, distinct, connection-weights,

  • Each of the weights associated with a single connection can learn and forget at a different rate, and

  • Connection-strengths (represented by weights) can learn from other weights at the same connection.

The following three sections provide a little more detail about these characteristics:


Each connection between two entities consists of multiple, distinct, connection-weights

By employing multiple connection-strengths (represented by weights), each having distinct acquisition and retention times, individual synapses are able to strengthen, and continuously forget, at multiple different rates. This, in turn, allows the system to continuously learn and adapt to each new present-moment that comes along. The idea works by embracing forgetting as simply an inevitable, and even necessary, part of continuously adapting to present-moment details.
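Stated a bit more formally (this notation is illustrative, not taken from Netlab or the book): if weight k in a connection has learning rate \eta_k and forget rate \phi_k, a generic update and the propagated signal can be written as

\[
  w_k \leftarrow (1-\phi_k)\,\bigl(w_k + \eta_k\,\Delta_k\bigr),
  \qquad
  y = x \sum_k w_k
\]

where \Delta_k is whatever error or influence signal is used to train that particular weight, x is the input signal, and y is the signal passed on through the synapse. A large \eta_k with a large \phi_k gives a fast, quickly forgetting weight; a small \eta_k with \phi_k = 0 gives a slow, permanent one.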


Each of the weights associated with a single connection can learn and forget at a different rate

Slower-learning (or permanent) weights learn from their faster counterparts at the same connection-points (see below). Because of this, they can be trained gradually, over time, by many different present-moment experiences.

The fast-learning weights, on the other hand, quickly learn to respond to each present moment as it is encountered and, just as quickly, forget those lessons when they are no longer needed.


Connection-weights can learn from other weights at the same connection

Any weight value can be adjusted by any means, such as influence learning or classic back-propagation. That said, a learning algorithm called weight-to-weight learning has been developed specifically for multitemporal synapses. This learning method uses one or more of the multiple weights associated with a connection to produce a learning factor, which is then used to train another weight at the same connection-point (i.e., in the same synapse).
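A minimal sketch of one plausible form this could take (the specific update rule below is an assumption for illustration; the book defines weight-to-weight learning in its own terms): the fast weight's current value is used to produce a learning factor that nudges the slow weight at the same connection-point, while the fast weight itself is left untouched.

def weight_to_weight_step(fast_value, slow_value, w2w_rate=0.01):
    """Nudge the slow weight toward the fast weight's current value.
    The fast weight is not modified, so the learning is one-way
    (an assumed form of weight-to-weight learning, for illustration)."""
    learning_factor = fast_value - slow_value   # derived from the fast weight
    return slow_value + w2w_rate * learning_factor

With a small w2w_rate, each convergence of the fast weight acts like one flash-card presentation: the slow weight moves only slightly, so many different present-moments can be absorbed over time without overwriting one another.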


This allows, for example, a connection to be specified with a long-term (slow-learning) connection-strength that learns directly from a fast-forming connection-strength at the same synapse. The fast-forming connection quickly converges on transient responses, and just as quickly forgets them. The long-term weights gradually absorb training from the fast-forming weights each time the fast weights converge on a new (to them) response. To the long-term, slower-learning weights, these quickly learned weight-sets can be thought of as flash-cards holding the proper response to the current situation. Just as traditional ANNs are taught gradually on a set of exemplars, so too are the slower weights in multi-temporal connection-points.


In this fashion, slower weights are able to absorb the various encounters that come and go on the fast weights in an interleaved fashion, and at a slow rate. They are, therefore, continuously and gradually trained on an interleaved repertoire of multiple present-moments as those present-moments pass. This is very similar to how the experiences would be trained into a traditional ANN. In the general case, however, the repertoire-sequence seen, and slowly trained, by the slow weights will be the most recently relevant subset of all present-moments experienced.

[Image: audio playback bars, with slower-falling peak bars on top]




. . . . . . .
Discussion


Multi-temporal connections intrinsically provide two capabilities to a running neural network that together overcome the stability-plasticity problem, as well as the problem of catastrophic interference, allowing the network to learn continuously while it interacts with its milieu.
  1. On the response side, short-term weights allow the network to quickly form precise behavioral responses to situations that are novel in their details but have general similarities to situations experienced in the past.

  2. On the learning side, short-term weights also intrinsically process ongoing experiences in such a way as to provide an interleaved set of exemplars, in a flash-card-like fashion, to the slow weights, continuously training them in real time.

More detail on each of these follows:

1. The additional short-term weights allow the network to quickly form fine-grained responses to situations that are novel in their details but broadly similar to situations experienced in the past. — At first glance this may sound no different from traditional ANNs but, in fact, it allows the long-term weights to be free of the clutter normally required to store fine-grained response details. The kind of information that makes it (slowly) into the slowly adapting (long-term) weights can best be thought of as a set of vague beginnings of responses to stimuli, learned from the stable values that formed on the fast weights many times over the life of the network. Because of their incomplete nature, many of these responses can be maintained in long-term connection-strengths with little interference. During signal propagation, the long-term weights act as a prompting agent, immediately starting a network response that is a similar (but residual) representation of responses to the many similar situations experienced in the past. The short-term weights start from zero at each new encounter (simplified). They will be prompted, by the responses stored in long-term weights, in a direction that has a good chance of being nearly the correct way to start responding.

2. The additional short-term weights provide a continual presentation of training exemplars to the long-term weights, in an interleaved, flash-card-like fashion. — This allows the slow weights to be trained continuously in the course of the network's normal interactions with its milieu. Because fast-adapting weights quickly learn a detailed response to the current situation, and just as quickly decay back toward zero, they have the effect of presenting multiple training exemplars, in real time, to be slightly learned by the long-term weights. This allows new experiences to be added to the slow weights, while existing response-vectors are reinforced by situations that are similar to previously learned experiences.
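The toy loop below (entirely illustrative; the rates and structure are assumptions, not values from Netlab) shows both effects on a single connection: the fast weight converges on each present-moment's target and then decays away, while the slow weight is trained only from the fast weight, absorbing a small, interleaved portion of every moment.

# One fast and one slow weight on a single connection (toy example).
fast, slow = 0.0, 0.0
FAST_LEARN, FAST_FORGET, W2W_RATE = 0.8, 0.3, 0.02

# Targets for a series of present-moments; similar moments recur with variation.
moments = [1.0, -0.5, 1.0, 0.8, -0.4, 1.1]
for target in moments:
    for _ in range(10):                      # fast weight converges within the moment
        fast += FAST_LEARN * (target - fast)
        slow += W2W_RATE * (fast - slow)     # flash-card-like transfer to slow weight
    for _ in range(10):                      # the moment passes; fast weight decays
        fast *= (1.0 - FAST_FORGET)
    print(f"after moment {target:+.1f}:  fast={fast:+.3f}  slow={slow:+.3f}")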






. . . . . . .
Fast learning/forgetting weights will re-learn very quickly


In the absence of any new experience, the residual connection-strengths maintained in the fast weights quickly decay to zero. Typically, after only a minute or less with no input, the fast weights will be totally blank. No residual connections—left over from past experiences—will remain.
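As a purely illustrative calculation (the rate is an assumption, not a Netlab parameter), a fast weight with a per-step forget rate \phi = 0.2 that receives no further stimulation decays geometrically:

\[
  w_{\text{fast}}(t) = (1-\phi)^{t}\, w_{\text{fast}}(0),
  \qquad 0.8^{50} \approx 1.4\times10^{-5}
\]

so after a few dozen quiet time-steps essentially nothing of the old response remains.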

Also, keep in mind that the weight-to-weight learning in this scenario is one-way, from fast to slow weights only. That is, slow weights learn from the values on fast weights when they exist, but fast weights are not directly affected by values stored in slow weights. Nonetheless, despite starting with a totally blank slate for each experience, the fast weights will learn and adapt much faster when the present-moment is similar to past experiences. This occurs tacitly, even though the slow-learning weights have no direct effect on the values of the fast weights.

Two underlying mechanisms are primarily responsible for this behavior.
  1. The Hand-Over-Hand Effect
    . . . . .
    Consider, first, that each time the fast weights converge on an experience, a small portion of the converged state of the fast weights is transferred into the slow-learning weights at the same connection-points via weight-to-weight learning.

    To the fast weights, which start out blank and quickly ramp up, every situation is always new, and never before seen. Their job is to quickly adapt to it, and they are able to do so with prompting from past residual responses maintained in the slower-learning weights. The responses caused by the slower weights, on the other hand, serve only to guide the fast weights with gentle prompting in the form of starting-responses. This is what facilitates quick adaptation of the fast weights to the current experience.

    Note that the slow-learning connections are, paradoxically, fast-responding. That is, they will immediately produce the blunt, anoetic beginnings of a correct response the instant similar stimuli first arrive (spread over many connection-points). When stimulated, they immediately convert a given set of familiar stimuli into a starting response signal.

    To the fast-learning connection-weights, the residual responses driven by the slow weights are like a teacher using hand-over-hand prompting to guide the student along a correct path. This, in turn, facilitates much faster re-learning. Though the prompting from slow-weight-facilitated responses is tentative and incomplete, it is complete enough to allow the blank slate of the fast weights to quickly adapt to the new situation.

    Another way to think of the information in the long-term weights is that it provides response mnemonics, which help the short-term weights quickly adapt to immediate details.


  2. Freedom from Catastrophic Interference Effects
    . . . . .
    Another consideration that contributes to fast acquisition times on the fast weights is that they do not have to be concerned with overwriting or interfering with previously learned lessons. They are wiped clean very quickly once the need to respond to a given experience ends. They only need to quickly learn whatever response is currently before them, without concern for overwriting past experiences, so the learning-rate can be set to be very fast.

    Interference is mitigated in slow weights in a fashion that resembles how it is mitigated in traditional ANNs during their training stage. In traditional ANNs this is accomplished by employing a slow learning rate, which moves weight values only slightly on each interleaved presentation of an example stimulus/response pair from the training set. With multitemporal synapses, however, this process can occur continuously in the slow weights, while the organism interacts with its environment via its fast weights.





. . . . . . .
The Benefits Are Significant


Multitemporal synapses impart a variety of benefits. The problem of catastrophic forgetting, for example, is eliminated because, to the slower (or permanent) weights, each "present-moment" is seen as one of many training exemplars in a randomly repeating, interleaved set. That is, the fast-learning weights respond to each new present-moment as a blank neural network, which is quickly and completely trained to produce a single, isolated response mapping. Because present-moments are quickly forgotten in the fast weights, many different present-moments are learned and presented to the slower connection-weights over time, allowing all of the present-moments to be gradually trained as a complete, interleaved set.

Structurally, this is indistinguishable from how conventional networks must be trained on a complete set of desired response-mappings. The conventional practice is for each desired response in the set to be trained gradually, over many successive iterations of the entire set.

There are some important differences, however.

In conventionally trained neural networks, while the set of all training exemplars may be shuffled between training-cycles, the stimulus/response mappings themselves are always identical every time they are presented. For example, conventionally, whenever stimulus/response-#1 comes up in the random training rotation, it looks exactly as it did the last time it was presented.

In multitemporal synapses, however, when stimulus/response-#1 comes up a second time in the rotation, it will be slightly different than it was the first time it was presented. This is because, while similar, no two present-moments experienced in the wild will be exactly the same. Nor will the appropriate response.




~~~~~~~
Diagram


The following diagram is taken from Figure 5.1 in the book. It shows the structure of multitemporal synapses schematically.



In the above diagram, a neuron is depicted with three weights, representing three distinct connection-strengths, per synapse. Each synapse-weight can have its own rate of learning and its own (optional) forget rate. Each synapse-weight thus becomes a class of weight that is replicated over multiple input synapses. Conceptually, this can be visualized as forming synapse-weight layers across the multiple inputs.
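A small sketch of that structure (the array layout and names are mine, for illustration only): the three per-synapse weights can be stored as three layers, each layer holding one weight per input synapse, with one learning rate and one forget rate per weight class.

import numpy as np

class MultiTemporalNeuron:
    def __init__(self, n_inputs):
        # weights[layer, synapse]: three weight classes replicated across all inputs
        self.weights = np.zeros((3, n_inputs))
        self.learn_rates = np.array([0.8, 0.05, 0.001])   # one rate per weight class
        self.forget_rates = np.array([0.3, 0.01, 0.0])    # 0.0 = permanent layer

    def propagate(self, inputs):
        # Each input is modulated by the combined (summed) weights of its synapse.
        per_synapse_strength = self.weights.sum(axis=0)
        return float(per_synapse_strength @ inputs)

neuron = MultiTemporalNeuron(n_inputs=4)
print(neuron.propagate(np.ones(4)))   # 0.0 until the weight layers have learned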




~~~~~~~
Underlying Biology


When considering the underlying biology that led to these algorithms, the name "multitemporal synapse" may be a bit misleading. It is probably a much better fit to refer to the underlying concept as "multitemporal connection points."

In biological brains, there are a variety of known mechanisms that facilitate timing differences in the formation and decay of connections. Many of these have been observed at the level of the individual synapse. Many others can be based on connection formation and retention characteristics between different neurons and different neuron types. Relatively longer-term connections may be formed more slowly, for example, based on pathfinding, or on the formation of new synapses from an existing axon.

To summarize, in biological brains there are many known mechanisms responsible for differences in how long it takes to form, and to retain, a connection between any two entities. Even within the limits of what we currently know, only a small sample has been presented here.





~~~~~~~~~~~
An Important Distinction: Propagation vs Adaptation (reactive vs adaptive signaling)


As someone who enjoys learning about Neural Networks, for the longest time I naturally conflated signal propagation (the signals produced by neurons as they respond to a given situation) with adaptive signaling (the changes to connection strengths that alter how neurons react to those propagated reactive input signals).

It took me a while to realize the distinction between these two things, and the need to consider them separately. I've noticed this tendency not to separate these two types of Neural-Network state-changes (and changers) in others who are passionate about ANNs as well. There are times when the two agents are very closely related. In Influence Learning, for example, adaptive influences are created as reactive signals are propagated. Even in this case, and perhaps especially in this case, a firm grasp of the separate and distinct natures of the two mechanisms is needed to fully analyze how they relate to each other as the Neural Network interacts with, and adapts to, its milieu.

In multitemporal connections, the temporality is on the adaptive side (not the reactive side). That is, the temporality is in how quickly or slowly the connections are formed in response to the needs of the current situation (the reactive signaling) and how quickly or slowly those connections are lost once the instigating situation has subsided.




~~~~~~~~~~~
Spit-balling


I'm not sure about this, but sometimes it makes sense to use an existing well-understood concept as a way to metaphorically relate, and better understand, a new one. Of course, that requires that the metaphor be apt, and tightly track the new concept. So . . . queue theory seems to be a good metaphorical match to temporal adaptation. It remains to be seen if that assumption holds up.
  • FILO: Quickly adapts/learns, slowly forgets. (fast-learning, slow-forgetting connection-strengths)
  • FIFO: Quickly adapts/learns, quickly forgets. (fast-learning, fast-forgetting connection-strengths AKA fast weights)
  • LIFO: Slowly adapts/learns, quickly forgets. (slow-learning, fast-forgetting connection-strengths. Useful?)
  • LILO: Slowly adapts/learns, slowly forgets. (slow-learning, slow-forgetting connection-strengths AKA slower weights)





. . . . . . .
Some History


As a non-academic, I tend to have a great deal of trouble establishing priority. These patents have been the latest attempt to address that concern. Sadly, they have not worked as well as hoped.
  • 22-March-2007 — Patent application #11/689,676 filed (as a CIP to an earlier application).
  • 08-March-2011 — Patent # 7,904,398 awarded





Also: Weight-to-Weight Learning     Influence Learning     catastrophic forgetting

 
 

































