Enhancing Plasticity and Mitigating Forgetting in Continual Learning

Recent work in continual learning has made notable progress in mitigating catastrophic forgetting and enhancing plasticity. Researchers are increasingly developing adaptive algorithms that reset neuron weights when needed, preserving the network's ability to learn new tasks. Techniques such as Self-Normalized Resets and Direction-Aware SHrinking (DASH) show promise in retaining learned features while selectively forgetting noise. Integrating deep Fourier features into neural networks is also proving effective for balancing linear and nonlinear behavior, improving trainability over time. Theoretical frameworks are being established to understand and overcome task confusion in class-incremental learning, with generative modeling emerging as a key remedy. Novel permutation-invariant learning frameworks based on high-dimensional particle filters address the dependence of sequential learning on the order in which data arrive. Patch-level data augmentations such as Cutout and CutMix have received theoretical support for encouraging networks to learn a broader range of features, improving overall performance. Finally, the need for learning rate warmup in neural network training is being reduced by analyzing and modifying optimizer behavior to counteract large early updates.
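
As a rough illustration of the reset idea, the sketch below reinitializes hidden units whose firing rate has collapsed. The running-average rule, the threshold, and the reset criterion are illustrative assumptions and may differ from the exact criterion used in Self-Normalized Resets.

```python
# Illustrative sketch only: reset "dormant" hidden units to restore plasticity.
import torch
import torch.nn as nn

class ResettableMLP(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, fire_threshold=1e-3):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        self.fire_threshold = fire_threshold
        # Running estimate of how often each hidden unit is active.
        self.register_buffer("fire_rate", torch.ones(hidden_dim))

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        if self.training:
            with torch.no_grad():
                batch_rate = (h > 0).float().mean(dim=0)
                self.fire_rate.mul_(0.99).add_(0.01 * batch_rate)
        return self.fc2(h)

    @torch.no_grad()
    def reset_dormant_units(self):
        """Reinitialize incoming weights and zero outgoing weights of units that stopped firing."""
        dormant = self.fire_rate < self.fire_threshold
        if dormant.any():
            new_in = torch.empty_like(self.fc1.weight)
            nn.init.kaiming_uniform_(new_in)
            self.fc1.weight[dormant] = new_in[dormant]
            self.fc1.bias[dormant] = 0.0
            self.fc2.weight[:, dormant] = 0.0
            self.fire_rate[dormant] = 1.0  # give reset units a fresh start
```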
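
To make the deep Fourier feature idea concrete, here is a minimal sketch of a layer that emits both the sine and cosine of a shared linear pre-activation, pairing a locally near-linear response with a bounded nonlinear one. The layer widths and the composition into a network are assumptions, not the paper's exact architecture.

```python
# Minimal "deep Fourier feature" layer sketch (assumed formulation).
import torch
import torch.nn as nn

class FourierLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        assert out_dim % 2 == 0
        # Half the outputs come from sin, half from cos of the same pre-activation.
        self.linear = nn.Linear(in_dim, out_dim // 2)

    def forward(self, x):
        z = self.linear(x)
        # Concatenating sin and cos keeps the embedding norm stable and mixes
        # near-linear and saturating responses in every unit.
        return torch.cat([torch.sin(z), torch.cos(z)], dim=-1)

# Example: stack Fourier layers in place of ReLU layers.
net = nn.Sequential(FourierLayer(32, 64), FourierLayer(64, 64), nn.Linear(64, 10))
```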
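
The patch-level augmentation itself is standard; the sketch below shows a typical CutMix step, which pastes a random rectangle from one image onto another and mixes the labels in proportion to the visible area. The function name and parameters are illustrative, and the cited work analyzes why such mixing broadens the features a network learns.

```python
# Standard CutMix-style augmentation (illustrative helper, not from the paper).
import torch

def cutmix(images, labels, num_classes, alpha=1.0):
    """Paste a random patch between paired images and mix labels by area."""
    b, _, h, w = images.shape
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(b)

    # Sample a patch whose area fraction is roughly (1 - lam).
    cut_h = int(h * (1 - lam) ** 0.5)
    cut_w = int(w * (1 - lam) ** 0.5)
    cy = torch.randint(0, h - cut_h + 1, (1,)).item()
    cx = torch.randint(0, w - cut_w + 1, (1,)).item()

    mixed = images.clone()
    mixed[:, :, cy:cy + cut_h, cx:cx + cut_w] = images[perm, :, cy:cy + cut_h, cx:cx + cut_w]

    # Labels are mixed in proportion to how much of each source image remains visible.
    area_frac = (cut_h * cut_w) / (h * w)
    one_hot = torch.nn.functional.one_hot(labels, num_classes).float()
    mixed_labels = (1 - area_frac) * one_hot + area_frac * one_hot[perm]
    return mixed, mixed_labels
```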

Sources

Self-Normalized Resets for Plasticity in Continual Learning

Plastic Learning with Deep Fourier Features

Task Confusion and Catastrophic Forgetting in Class-Incremental Learning: A Mathematical Framework for Discriminative and Generative Modelings

Permutation Invariant Learning with High-Dimensional Particle Filters

DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity

Provable Benefit of Cutout and CutMix for Feature Learning

EXACFS -- A CIL Method to mitigate Catastrophic Forgetting

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training