Catastrophic Forgetting in 1D
What is the best way to demonstrate catastrophic forgetting in 1D? Is 1D an oversimplification of a phenomenon seen in billion-dimensional neural networks? I'm not sure. But here's what I came up with:
The setup: two disjoint input domains. We train on domain A first, then on domain B with the same learning rate. Think of the domains as distinct skills we want a neural network to learn without forgetting the earlier ones.
Sequential Learning
*Figure: Two training runs.*
Note how quickly the model degrades on the first domain when we start training on the second. It takes just a single iteration. This is why it's called catastrophic.
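To make this concrete, here is a minimal NumPy sketch of the sequential experiment. The architecture (a tiny tanh MLP) and the targets (sin/cos on two disjoint intervals) are assumptions for illustration, not the post's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer MLP: y = W2 @ tanh(W1 @ x + b1) + b2
H = 32
W1 = rng.normal(0.0, 1.0, (H, 1)); b1 = np.zeros((H, 1))
W2 = rng.normal(0.0, 1.0 / np.sqrt(H), (1, H)); b2 = np.zeros((1, 1))

def forward(x):
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2, h

def mse(x, y):
    pred, _ = forward(x)
    return float(np.mean((pred - y) ** 2))

def train(x, y, steps=3000, lr=0.05):
    """Full-batch gradient descent on mean squared error."""
    global W1, b1, W2, b2
    n = x.shape[1]
    for _ in range(steps):
        pred, h = forward(x)
        d = 2.0 * (pred - y) / n                  # dL/dpred
        dh = (W2.T @ d) * (1.0 - h ** 2)          # backprop through tanh
        W2 -= lr * (d @ h.T); b2 -= lr * d.sum(axis=1, keepdims=True)
        W1 -= lr * (dh @ x.T); b1 -= lr * dh.sum(axis=1, keepdims=True)

# Two disjoint 1D domains with different targets (illustrative choices)
xA = np.linspace(-2.0, -1.0, 64)[None, :]; yA = np.sin(3.0 * xA)
xB = np.linspace(1.0, 2.0, 64)[None, :];  yB = np.cos(3.0 * xB)

train(xA, yA)                  # phase 1: learn domain A
loss_A_before = mse(xA, yA)
train(xB, yB)                  # phase 2: same learning rate, domain B only
loss_A_after = mse(xA, yA)
print(f"loss on A after phase 1: {loss_A_before:.4f}")
print(f"loss on A after phase 2: {loss_A_after:.4f}")  # typically much higher: forgetting
```

Nothing in phase 2 references domain A, so the gradient updates are free to repurpose every weight for domain B, which is exactly why performance on A collapses.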
What happens when we train the same neural network on both domains in a single training run?
Joint Training
*Figure: One training run.*
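A minimal sketch of the joint run, under the same assumptions as before (tiny tanh MLP, illustrative sin/cos targets): the only change is that every gradient step sees samples from the union of both domains.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same tiny tanh MLP as in the sequential experiment (assumed architecture)
H = 32
W1 = rng.normal(0.0, 1.0, (H, 1)); b1 = np.zeros((H, 1))
W2 = rng.normal(0.0, 1.0 / np.sqrt(H), (1, H)); b2 = np.zeros((1, 1))

def forward(x):
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2, h

def mse(x, y):
    pred, _ = forward(x)
    return float(np.mean((pred - y) ** 2))

def train(x, y, steps=5000, lr=0.05):
    """Full-batch gradient descent on mean squared error."""
    global W1, b1, W2, b2
    n = x.shape[1]
    for _ in range(steps):
        pred, h = forward(x)
        d = 2.0 * (pred - y) / n
        dh = (W2.T @ d) * (1.0 - h ** 2)
        W2 -= lr * (d @ h.T); b2 -= lr * d.sum(axis=1, keepdims=True)
        W1 -= lr * (dh @ x.T); b1 -= lr * dh.sum(axis=1, keepdims=True)

xA = np.linspace(-2.0, -1.0, 64)[None, :]; yA = np.sin(3.0 * xA)
xB = np.linspace(1.0, 2.0, 64)[None, :];  yB = np.cos(3.0 * xB)

# Joint training: one run over the union of both domains
x = np.concatenate([xA, xB], axis=1)
y = np.concatenate([yA, yB], axis=1)
train(x, y)

loss_A, loss_B = mse(xA, yA), mse(xB, yB)
print(f"loss on A: {loss_A:.4f}, loss on B: {loss_B:.4f}")
```

Because every update averages gradients from both domains, no step can improve B at A's expense without that cost showing up in the loss, so both skills are retained.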