Advances in Neural Network Generalization and Optimization

The field of neural networks is moving towards a deeper understanding of generalization and optimization. Researchers are exploring new methods to improve the efficiency and effectiveness of neural network training, including techniques that accelerate generalization and mitigate delayed-generalization ("grokking") phenomena. Embedding transfer, gradient transformation, and optimizer choice are being investigated as ways to improve training dynamics and model performance. Notably, the connection between parameter magnitudes and Hessian eigenspaces is being studied, providing insight into the structure of the loss landscape. In addition, novel dropout methods and a combinatorial theory of dropout are being proposed to improve model generalization and robustness.
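To make the embedding-transfer idea concrete, here is a minimal sketch in which a fresh model's embedding table is initialized from a smaller, previously trained model before training begins. The architecture, dimensions, and class names are illustrative assumptions, not the exact setup used in the papers below.

```python
# Hedged sketch: warm-starting a target model's embeddings from a weaker,
# previously trained model. Model class and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TinyTransformer(nn.Module):
    def __init__(self, vocab_size=97, d_model=128, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        return self.head(self.encoder(self.embed(x)))

weak_model = TinyTransformer(n_layers=1)    # assume this was already trained
target_model = TinyTransformer(n_layers=2)  # fresh model about to be trained

# Copy only the embedding table; every other parameter trains from scratch.
with torch.no_grad():
    target_model.embed.weight.copy_(weak_model.embed.weight)
```

Only the embedding weights are copied here; the rest of the target model trains from scratch, so any useful structure already present in the weaker model's representations is carried forward at initialization.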

Some noteworthy papers in this area include: Let Me Grok for You, which accelerates grokking by transferring embeddings from a weaker model; A Combinatorial Theory of Dropout, which provides a unified foundation for understanding dropout and suggests new directions for mask-guided regularization and subnetwork optimization; NeuralGrok, which proposes a gradient-based approach to accelerating generalization in transformers; How Effective Can Dropout Be in Multiple Instance Learning?, which examines dropout in multiple instance learning and proposes an MIL-specific dropout method; and Muon Optimizer Accelerates Grokking, which compares optimizers and shows that the Muon optimizer significantly accelerates the onset of grokking.
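Two of the levers above, transforming gradients before the update and changing the optimizer, are easy to picture in code. The following is a minimal sketch of a generic gradient transformation (per-tensor gradient normalization) applied between backward() and the optimizer step; it is a hypothetical stand-in, not the learned transformation from NeuralGrok and not the Muon update rule.

```python
# Hedged sketch: apply a simple gradient transformation before the optimizer
# step. This is a generic illustration, not NeuralGrok's learned transform
# and not the Muon optimizer's update.
import torch

def normalize_gradients(model, eps=1e-8):
    # Rescale each parameter's gradient to unit norm before the update.
    for p in model.parameters():
        if p.grad is not None:
            p.grad.div_(p.grad.norm() + eps)

model = torch.nn.Linear(16, 4)
# Strong weight decay is common in grokking experiments; the value is illustrative.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
normalize_gradients(model)  # transform gradients before they are applied
optimizer.step()
optimizer.zero_grad()
```

Swapping the optimizer itself (for example, replacing AdamW with a Muon-style optimizer) would change only the construction of `optimizer` in a loop like this one.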

Sources

Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model

Connecting Parameter Magnitudes and Hessian Eigenspaces at Scale using Sketched Methods

A Combinatorial Theory of Dropout: Subnetworks, Graph Geometry, and Generalization

How Effective Can Dropout Be in Multiple Instance Learning?

Muon Optimizer Accelerates Grokking

An Effective Gram Matrix Characterizes Generalization in Deep Networks

NeuralGrok: Accelerate Grokking by Neural Gradient Transformation

The effects of Hessian eigenvalue spectral density type on the applicability of Hessian analysis to generalization capability assessment of neural networks