What is the best way to demonstrate catastrophic forgetting in 1D? Is 1D an oversimplification of a phenomenon seen in billion-parameter neural networks? I am not sure. But here's what I came up with:

Two disjoint domains. We train on domain A first, then train on domain B with the same learning rate. Think of these domains as distinct skills we would want neural networks to learn without forgetting past skills.

[Figure: Sequential learning (two training runs)]

Note how quickly the model degrades on the first domain when we start training on the second. It takes just a single iteration. This is why it's called catastrophic.
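The post doesn't include code, but here is a minimal sketch of the sequential setup. The specific targets (sin on [-2, -1] for domain A, cos on [1, 2] for domain B), the tiny tanh MLP, and the training hyperparameters are all my assumptions, not the author's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two disjoint 1D domains (assumed targets, chosen for illustration).
xa = np.linspace(-2.0, -1.0, 64)[:, None]; ya = np.sin(3 * xa)  # domain A
xb = np.linspace(1.0, 2.0, 64)[:, None];   yb = np.cos(3 * xb)  # domain B

# Tiny MLP: 1 -> 32 tanh -> 1, trained with full-batch gradient descent.
W1 = rng.normal(0, 0.5, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, (32, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

def mse(x, y):
    return float(np.mean((forward(x)[1] - y) ** 2))

def train(x, y, steps=2000, lr=0.05):
    global W1, b1, W2, b2
    n = len(x)
    for _ in range(steps):
        h, pred = forward(x)
        d = 2 * (pred - y) / n              # dL/dpred for mean squared error
        gW2 = h.T @ d; gb2 = d.sum(0)
        dh = (d @ W2.T) * (1 - h ** 2)      # backprop through tanh
        gW1 = x.T @ dh; gb1 = dh.sum(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

train(xa, ya)                               # first run: domain A only
loss_a_before = mse(xa, ya)
train(xb, yb)                               # second run: domain B, same lr
loss_a_after = mse(xa, ya)
print(f"loss on A after training on A: {loss_a_before:.4f}")
print(f"loss on A after training on B: {loss_a_after:.4f}")
```

Nothing in the second run penalizes drift on domain A, so the shared weights are free to move wherever domain B's loss pulls them; the loss on A climbs accordingly.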

What happens when we train the same neural network on both domains in a single training run?

[Figure: Joint training (one training run)]
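For comparison, the joint version of the same sketch trains one model on the union of both domains. As before, the targets, architecture, and hyperparameters are illustrative assumptions rather than the post's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same two assumed domains as above, now merged into one dataset.
xa = np.linspace(-2.0, -1.0, 64)[:, None]; ya = np.sin(3 * xa)
xb = np.linspace(1.0, 2.0, 64)[:, None];   yb = np.cos(3 * xb)
x = np.concatenate([xa, xb]); y = np.concatenate([ya, yb])

# Tiny MLP: 1 -> 32 tanh -> 1, full-batch gradient descent.
W1 = rng.normal(0, 0.5, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, (32, 1)); b2 = np.zeros(1)

def forward(inp):
    h = np.tanh(inp @ W1 + b1)
    return h, h @ W2 + b2

def mse(inp, tgt):
    return float(np.mean((forward(inp)[1] - tgt) ** 2))

def train(inp, tgt, steps=5000, lr=0.05):
    global W1, b1, W2, b2
    n = len(inp)
    for _ in range(steps):
        h, pred = forward(inp)
        d = 2 * (pred - tgt) / n
        gW2 = h.T @ d; gb2 = d.sum(0)
        dh = (d @ W2.T) * (1 - h ** 2)
        gW1 = inp.T @ dh; gb1 = dh.sum(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

train(x, y)                  # single run over both domains
loss_a = mse(xa, ya)
loss_b = mse(xb, yb)
print(f"loss on A: {loss_a:.4f}")
print(f"loss on B: {loss_b:.4f}")
```

Because every update sees both domains, no gradient step can improve one at the unchecked expense of the other, and both losses should stay low in this setup.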