Efficiency and Performance Advances in NAS and Pruning

Recent advances in neural architecture search (NAS) and neural network pruning are pushing the boundaries of efficiency and performance in deep learning. On the NAS side, methods increasingly leverage prior knowledge and lower-dimensional projections of the search space instead of computationally expensive exhaustive search: encoding the difference between similar architectures, as in Delta-NAS, lets a predictor operate in a much smaller effective space while preserving fine-grained control over architecture design (see the first sketch below). Knowledge-aware evolutionary graph NAS follows the same theme, using prior knowledge to guide the evolutionary search.

On the pruning side, iterative and block-wise approaches tackle the scalability problems of large models, offering a favorable trade-off between performance and computational cost. The iterative Combinatorial Brain Surgeon, which casts mask selection as block coordinate descent over small blocks of weights, reports superior performance to existing pruning techniques on both large language and vision models (a simplified version is sketched after the first example). Finally, architecture synthesis is emerging as a direction of its own: STAR generates tailored architectures that optimize several quality and efficiency metrics at once, advancing the state of the art in model hybridization.
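
As a concrete illustration of the difference-encoding idea, here is a minimal sketch assuming binary architecture encodings and a linear ridge predictor of accuracy gaps. The function names (`fit_delta_predictor`, `predicted_gap`) and the regression setup are illustrative assumptions, not Delta-NAS's actual predictor.

```python
import numpy as np

def fit_delta_predictor(encodings, accuracies, pairs, lam=1e-3):
    """Fit a linear predictor of accuracy *gaps* from architecture
    encoding *differences*. Differences between similar architectures
    are sparse, so the predictor effectively works in a much smaller
    space than the full encoding."""
    D = np.stack([encodings[i] - encodings[j] for i, j in pairs])
    y = np.array([accuracies[i] - accuracies[j] for i, j in pairs])
    # Ridge-regularized least squares on the difference features.
    theta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)
    return theta

def predicted_gap(theta, enc_a, enc_b):
    """Predicted accuracy(a) - accuracy(b) for a candidate pair."""
    return (enc_a - enc_b) @ theta

# Toy usage: 200 random 'architectures' over 30 binary design choices.
rng = np.random.default_rng(0)
encodings = rng.integers(0, 2, size=(200, 30)).astype(float)
accuracies = encodings @ rng.normal(size=30) + 0.01 * rng.normal(size=200)
pairs = [(i, i + 1) for i in range(199)]
theta = fit_delta_predictor(encodings, accuracies, pairs)
print(predicted_gap(theta, encodings[0], encodings[1]))
```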

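And a minimal sketch of block-wise pruning by block coordinate descent, assuming a single linear neuron and a least-squares reconstruction objective on calibration activations `X`. The within-block scoring rule and refit step are simplifications for illustration, not the paper's iCBS algorithm, which solves each within-block combinatorial subproblem with a dedicated solver and scales to full models.

```python
import numpy as np

def bcd_prune_neuron(X, w_dense, keep_ratio=0.25, block=8, sweeps=4):
    """Prune one linear neuron by block coordinate descent on its mask:
    sweep over blocks of weight indices, and within each block re-decide
    which weights survive and refit the survivors by least squares, so
    the sparse neuron best reconstructs the dense output X @ w_dense."""
    d = w_dense.size
    k = max(1, int(keep_ratio * d))
    target = X @ w_dense

    # Start from magnitude pruning: keep the k largest weights.
    mask = np.zeros(d, dtype=bool)
    mask[np.argsort(np.abs(w_dense))[-k:]] = True
    w = np.where(mask, w_dense, 0.0)

    blocks = [np.arange(s, min(s + block, d)) for s in range(0, d, block)]
    for _ in range(sweeps):
        for B in blocks:
            # Per-block budget (fixed here; budget reallocation across
            # blocks is omitted for brevity).
            k_B = int(mask[B].sum())
            if k_B == 0:
                continue
            # Residual left after the contribution of all weights
            # outside the current block.
            outside = mask.copy()
            outside[B] = False
            r = target - X[:, outside] @ w[outside]
            # Within-block subproblem: score each candidate by the
            # least-squares error its column alone removes from r, and
            # keep the top k_B. (iCBS solves this step combinatorially.)
            scores = (X[:, B].T @ r) ** 2 / (np.sum(X[:, B] ** 2, axis=0) + 1e-12)
            keep = B[np.argsort(scores)[-k_B:]]
            mask[B] = False
            mask[keep] = True
            # Refit the surviving block weights against the residual.
            w[B] = 0.0
            sol, *_ = np.linalg.lstsq(X[:, keep], r, rcond=None)
            w[keep] = sol
    return w, mask

# Toy usage on random calibration activations.
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 64))
w_dense = rng.normal(size=64)
w_sparse, mask = bcd_prune_neuron(X, w_dense)
rel_err = np.linalg.norm(X @ (w_dense - w_sparse)) / np.linalg.norm(X @ w_dense)
print(f"kept {mask.sum()}/{mask.size} weights, relative error {rel_err:.3f}")
```

Each block update can only lower the reconstruction error for the current mask, which is why repeated sweeps tend to recover accuracy that one-shot magnitude pruning leaves on the table.
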
Sources

Delta-NAS: Difference of Architecture Encoding for Predictor-based Evolutionary Neural Architecture Search

Knowledge-aware Evolutionary Graph Neural Architecture Search

Scalable iterative pruning of large language and vision models using block coordinate descent

STAR: Synthesis of Tailored Architectures

Training Noise Token Pruning

Preserving Deep Representations In One-Shot Pruning: A Hessian-Free Second-Order Optimization Framework
