Recent advances in edge AI and neuromorphic computing are reshaping low-latency, low-power applications, particularly in AR/VR and IoT devices. A notable trend is hybrid models that combine convolutional neural networks (CNNs) with vision transformers (ViTs), optimized for heterogeneous computing environments to balance accuracy and energy efficiency. These hybrid models are being tailored to specific hardware, such as Neural Processing Units (NPUs) and Compute-In-Memory (CIM) systems, through hardware-aware Neural Architecture Search (NAS) frameworks.

There is also growing interest in mixed-signal neuromorphic accelerators that use analog computing to make inference in event-based neural networks more efficient, exploiting the sparsity and tight power budgets inherent in edge workloads. Another significant development is model-aware compilation frameworks for heterogeneous edge devices, which streamline the deployment of deep neural networks (DNNs) by optimizing code generation for both general-purpose processors and specialized accelerators; these frameworks reduce inference latency and improve energy efficiency across a range of edge platforms.

Finally, work on weightless neural networks and ultra-low-bit quantization is pushing the limits of model size, computational efficiency, and accuracy in resource-constrained environments such as FPGAs and microcontrollers. Together, these innovations mark a shift toward more efficient, hardware-aware neural network designs better suited to the demands of modern edge computing.
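To make the ultra-low-bit quantization idea concrete, here is a minimal NumPy sketch of ternary weight quantization, which maps floating-point weights to {-1, 0, +1} plus a single per-tensor scale. The 0.7 threshold factor follows the common ternary-weight-network heuristic; the function name and the specific values are illustrative, not from any paper summarized above.

```python
import numpy as np

def ternary_quantize(w, threshold_factor=0.7):
    """Quantize a float weight tensor to {-1, 0, +1} with a per-tensor scale.

    Illustrative sketch: threshold_factor=0.7 is the usual ternary-weight
    heuristic (threshold = 0.7 * mean|w|), not a tuned value.
    """
    delta = threshold_factor * np.mean(np.abs(w))   # magnitude threshold
    mask = np.abs(w) > delta                        # weights kept as +/-1
    # Scale that best reconstructs the surviving weights (mean magnitude)
    alpha = float(np.mean(np.abs(w[mask]))) if mask.any() else 0.0
    q = (np.sign(w) * mask).astype(np.int8)         # ternary codes
    return alpha, q

# Example: quantize a small weight matrix and check reconstruction error
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
alpha, q = ternary_quantize(w)
err = np.linalg.norm(w - alpha * q) / np.linalg.norm(w)
print(f"scale={alpha:.3f}, relative error={err:.2f}")
```

At 2 bits per weight (plus one shared scale), this kind of scheme trades a bounded reconstruction error for large memory and multiply-free compute savings, which is what makes it attractive on FPGAs and microcontrollers.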
Noteworthy papers include one introducing a NAS framework for hybrid CNN/ViT models that delivers accuracy gains and latency reductions on heterogeneous edge systems; another presenting a mixed-signal neuromorphic accelerator that achieves high energy efficiency on event-based neural network models; and a compilation framework for heterogeneous edge devices that demonstrates substantial latency reductions by optimizing DNN deployment across diverse hardware targets.