Efficient Neural Network Designs for Edge Computing

Recent advancements in edge AI and neuromorphic computing are significantly reshaping the landscape of low-latency, low-power applications, particularly in AR/VR and IoT devices. A notable trend is toward hybrid models that combine convolutional neural networks (CNNs) with vision transformers (ViTs), optimized for heterogeneous computing environments to achieve superior performance and energy efficiency. These hybrid models are being tailored to specific hardware architectures, such as Neural Processing Units (NPUs) and Compute-In-Memory (CIM) systems, through innovative Neural Architecture Search (NAS) frameworks.

There is also a growing focus on mixed-signal neuromorphic accelerators that leverage analog computing to improve inference efficiency in event-based neural networks, addressing the sparsity and power constraints inherent in edge applications. Another significant development is the emergence of model-aware compilation frameworks for heterogeneous edge devices, which streamline the deployment of deep neural networks (DNNs) by optimizing code generation for both general-purpose processors and specialized accelerators. These frameworks are proving highly effective at reducing inference latency and improving energy efficiency across a range of edge platforms.

Finally, the exploration of weightless neural networks and ultra-low-bit quantization is pushing the boundaries of model size, computational efficiency, and accuracy in resource-constrained environments such as FPGAs and microcontrollers. Collectively, these innovations underscore a shift toward more efficient, hardware-aware neural network designs better suited to the demands of modern edge computing.
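To make the ultra-low-bit idea concrete, here is a minimal NumPy sketch of a layer with binary (sign) weights and 2-bit activations. The per-channel scaling and the [0, 1] activation range are illustrative assumptions in the spirit of XNOR-Net-style binarization, not the specific scheme used by any of the papers listed below:

```python
import numpy as np

def binarize_weights(w):
    """Binarize weights to {-1, +1}, keeping a per-output-channel scale.

    The scale (mean absolute value per row) is an assumed choice that
    preserves the magnitude of each output channel after binarization.
    """
    scale = np.mean(np.abs(w), axis=1, keepdims=True)
    return np.sign(np.where(w == 0, 1.0, w)) * scale

def quantize_activations_2bit(x):
    """Uniform 2-bit quantization of post-ReLU activations to 4 levels.

    Assumes activations are normalized to [0, 1]; the 4 representable
    values are then 0, 1/3, 2/3, and 1.
    """
    x = np.clip(x, 0.0, 1.0)
    levels = 3  # 2 bits -> 2**2 = 4 levels
    return np.round(x * levels) / levels

# Toy forward pass through one fully connected layer.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))   # 4 output channels, 8 inputs
x = rng.random(8)                 # activations in [0, 1]
y = binarize_weights(w) @ quantize_activations_2bit(x)
```

In hardware, the multiply in the final line reduces to sign flips and additions, which is the source of the efficiency gains these binary-weight models target.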

Noteworthy papers include one that introduces a NAS framework for hybrid CNN/ViT models, achieving significant accuracy improvements and latency reductions on heterogeneous edge systems. Another presents a mixed-signal neuromorphic accelerator that achieves high energy efficiency when accelerating event-based neural networks. A third demonstrates substantial latency reductions with a novel compilation framework that optimizes DNN deployment across diverse heterogeneous edge hardware targets.

Sources

Neural Architecture Search of Hybrid Models for NPU-CIM Heterogeneous AR/VR Devices

MENAGE: Mixed-Signal Event-Driven Neuromorphic Accelerator for Edge Applications

MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices

ActNAS: Generating Efficient YOLO Models using Activation NAS

Differentiable Weightless Neural Networks

Efficiera Residual Networks: Hardware-Friendly Fully Binary Weight with 2-bit Activation Model Achieves Practical ImageNet Accuracy

DPD-NeuralEngine: A 22-nm 6.6-TOPS/W/mm$^2$ Recurrent Neural Network Accelerator for Wideband Power Amplifier Digital Pre-Distortion

An O(m+n)-Space Spatiotemporal Denoising Filter with Cache-Like Memories for Dynamic Vision Sensors
