Catastrophic Forgetting

Catastrophic forgetting (also called catastrophic interference) is a term, common in connectionist literature, for a problem that affects many traditional artificial neural network models. It refers to the abrupt, near-total loss of previously learned responses that occurs when the network is trained on even a single new (additional) response. Affected networks include, for example, networks trained with backpropagation.

Catastrophic forgetting is a primary reason artificial neural networks are not able to learn continuously from their surroundings. Traditionally, such networks must be fully trained on a complete set of expected responses before being put into service, at which point learning must be disabled.

Catastrophic forgetting is a specific case of a more general phenomenon called interference. The normal form of interference, which is observed in natural learning systems such as humans, tends to cause gradual losses. In artificial systems, however, the losses caused by interference are, well, catastrophic.

. . . . . . .
Training A Set vs Training A Single New Response

Training on an entire set differs from simply adding a single new response because, during set training, each response in the set moves the weights only slightly on each training iteration. Training of the mappings proceeds in an interleaved fashion. That is, the entire set is cycled through multiple times, and each response is only slightly trained in each cycle. If a new response is simply added on to an existing set, it is necessarily trained completely by itself, without the other responses having a chance to be reinforced. To put it another way:

This doesn't work.
  • Fully train response 1, then
  • Fully train response 2, then
  • . . .
  • When all have been fully trained,
  • Done (?)

This works!
  • Slightly train response 1, then
  • Slightly train response 2, then
  • . . .
  • When all slightly trained,
  • repeat until all are fully trained
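
The two schedules above can be compared in a few lines. What follows is only a minimal sketch, assuming a single-layer linear network trained with the delta rule and three made-up, overlapping input patterns; the names train_step, error, and pairs are illustrative and do not come from any particular library.

```python
import numpy as np

def train_step(W, x, t, lr=0.1):
    """One small delta-rule update that moves W @ x slightly toward target t."""
    return W + lr * np.outer(t - W @ x, x)

def error(W, pairs):
    """Largest absolute output error over all (input, target) pairs."""
    return max(np.abs(t - W @ x).max() for x, t in pairs)

# Three overlapping input patterns (hypothetical data), each with its own target.
pairs = [
    (np.array([1.0, 1.0, 0.0, 0.0]), np.array([1.0, 0.0])),
    (np.array([0.0, 1.0, 1.0, 0.0]), np.array([0.0, 1.0])),
    (np.array([0.0, 0.0, 1.0, 1.0]), np.array([1.0, 1.0])),
]

# "This doesn't work": fully train each response by itself, one after another.
W_seq = np.zeros((2, 4))
for x, t in pairs:
    for _ in range(300):
        W_seq = train_step(W_seq, x, t)

# "This works": slightly train each response, cycling through the whole set.
W_int = np.zeros((2, 4))
for _ in range(300):
    for x, t in pairs:
        W_int = train_step(W_int, x, t)

print("sequential error: ", error(W_seq, pairs))
print("interleaved error:", error(W_int, pairs))
```

In this toy setting the sequentially trained weights end up with a large error on the earlier responses, because each later response overwrites the weights it shares with them, while the interleaved weights fit all three responses.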

. . . . . . .
Making Do

Historically, the normal way of dealing with catastrophic forgetting in artificial neural networks has been to simply train on a broad enough set of exemplar responses to begin with, and, once trained, use the network in non-learning (response-only) mode.

It is often the case, though, that an existing, fully trained network needs to have some new response-mapping added to its repertoire. These cases have generally been handled by starting over with a blank (completely untrained) network. The blank network is then trained on a set of exemplars that includes the entire original set, plus the new response mapping. This procedure is called "rehearsal."

Another possible way to resolve this problem might be to start with the existing weight-set, rather than to begin with a blank network. To train the new response, one could continue to cycle through the entire original set, plus the new response.
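
As a rough illustration of the option just described, the sketch below starts from an existing weight-set and adds one new response in two ways: by itself, and while cycling through the original set as well. It assumes the same kind of toy linear delta-rule network as a stand-in for a real one; the names train, max_error, original, and new are hypothetical.

```python
import numpy as np

def train(W, pairs, cycles, lr=0.1):
    """Cycle through pairs, applying one small delta-rule update per pair."""
    W = W.copy()
    for _ in range(cycles):
        for x, t in pairs:
            W += lr * np.outer(t - W @ x, x)
    return W

def max_error(W, pairs):
    """Largest absolute output error over all (input, target) pairs."""
    return max(np.abs(t - W @ x).max() for x, t in pairs)

# Hypothetical original training set and one new response to be added.
original = [
    (np.array([1.0, 1.0, 0.0, 0.0]), np.array([1.0, 0.0])),
    (np.array([0.0, 1.0, 1.0, 0.0]), np.array([0.0, 1.0])),
]
new = [(np.array([0.0, 0.0, 1.0, 1.0]), np.array([1.0, 1.0]))]

# The existing, fully trained network.
W_existing = train(np.zeros((2, 4)), original, cycles=300)

# Adding the new response by itself disturbs the old responses.
W_new_only = train(W_existing, new, cycles=300)

# Starting from the existing weights and cycling through old + new keeps both.
W_rehearsed = train(W_existing, original + new, cycles=300)

print("old-set error, new response trained alone:", max_error(W_new_only, original))
print("full-set error, old + new cycled together:", max_error(W_rehearsed, original + new))
```

Note that the second approach still needs the original training set to be available.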

Another re-training procedure, known as "pseudorehearsal," has also been shown to provide very good results, without the need to store the original training set (see references below). Many variations on these re-training options have been studied.

. . . . . . .
Multitemporal Synapses Provide a More General Solution

Relatively recently, a new learning structure and method called multitemporal synapses has been developed. Among other things, multitemporal synapses eliminate the problems associated with catastrophic forgetting in artificial neural networks. The idea works by embracing forgetting as simply an inevitable, and even necessary, part of continuously adapting to present-moment details. Multitemporal synapses are able to learn, and continuously forget, at different rates. This, in turn, allows the system to continuously learn and adapt to each new present moment that happens along.

Slower (or permanent) weights in multitemporal synapses learn from their faster counterparts at the same connection-points. Because of this, they are able to be trained gradually, by many different present-moment experiences, over time.

The function of the fast-learning weights is to quickly learn to respond to each present moment as it is encountered, and to just as quickly forget those lessons when they are no longer needed. The slower weights absorb the various encounters in an interleaved fashion, and at a slow rate. They are, therefore, continuously and gradually trained on a repertoire of multiple present moments as they pass. In the general case, the repertoire will be the most recently relevant subset of all present moments experienced.
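
The division of labor between fast and slow weights can be pictured with a toy two-rate model. This is a loose sketch only, not the actual multitemporal synapse method: the update rules, the constants LR_FAST, LR_SLOW, and DECAY, and the two example "present moments" are all assumptions made for illustration. Each connection holds a fast weight and a slow weight; the response uses their sum, the fast weight learns from the current error and continuously decays, and the slow weight absorbs a small fraction of its fast counterpart on every step.

```python
import numpy as np

# Illustrative values only (assumptions, not taken from any published method).
LR_FAST, LR_SLOW, DECAY = 0.25, 0.02, 0.05

def step(slow, fast, x, t):
    """One present-moment update of a two-rate (fast + slow) weight pair."""
    err = t - (slow + fast) @ x                # the response uses both weights
    fast = fast + LR_FAST * np.outer(err, x)   # fast weights learn quickly
    slow = slow + LR_SLOW * fast               # slow weights absorb from fast
    fast = (1.0 - DECAY) * fast                # fast weights keep forgetting
    return slow, fast

# Two hypothetical "present moments" that share an input line.
moments = [
    (np.array([1.0, 1.0, 0.0, 0.0]), np.array([1.0, 0.0])),
    (np.array([0.0, 1.0, 1.0, 0.0]), np.array([0.0, 1.0])),
]

slow = np.zeros((2, 4))
fast = np.zeros((2, 4))
for _ in range(100):              # the moments keep coming around
    for x, t in moments:
        for _ in range(20):       # each moment lasts for a while
            slow, fast = step(slow, fast, x, t)

slow_error = max(np.abs(t - slow @ x).max() for x, t in moments)
print("slow-weight error over both moments:", slow_error)
print("largest remaining fast weight:", np.abs(fast).max())
```

In this sketch each moment is presented in a long uninterrupted block, the kind of schedule that would cause interference in a single-rate network, yet the slow weights only ever take small steps, so they end up trained on both moments as if the moments had been interleaved.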


. . . . . . .

Also: Multitemporal Synapses, Memory, Stability-Plasticity Problem


Web-based glossary software: (c) Creativyst, 2001-2014