Efficient Neural Network Architectures for Computer Vision

Computer vision research is moving toward neural network architectures that balance accuracy against computational cost. Recent work focuses on lightweight models that capture a wide range of perceptual information while aggregating features precisely enough for dynamic, complex visual representations. One notable direction is frequency decomposition and integration, which has been reported to enhance cross-task generalization and preserve class-specific details. Another is dynamic kernel sharing and spectral-adaptive modulation, which aim to increase the representational power of neural networks while keeping computation efficient.
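To make the idea of frequency decomposition and integration concrete, here is a minimal sketch in plain Python. It is not the method of any paper listed here. It assumes a simple box blur as the low-pass filter, takes the residual as the high-frequency band, and recombines the two with a weighted sum. The function names (box_blur, decompose, integrate) and the weights are hypothetical, chosen only for illustration.

```python
def box_blur(image, radius=1):
    """Low-pass filter: replace each pixel with the mean of its neighborhood."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp the window at the borders, so edge pixels average fewer neighbors.
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [image[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def decompose(image, radius=1):
    """Split an image into a smooth low-frequency band and a high-frequency residual."""
    low = box_blur(image, radius)
    high = [[p - l for p, l in zip(row, low_row)]
            for row, low_row in zip(image, low)]
    return low, high


def integrate(low, high, low_weight=1.0, high_weight=1.0):
    """Recombine the two bands; weights of 1.0 reproduce the original image."""
    return [[low_weight * l + high_weight * h for l, h in zip(low_row, high_row)]
            for low_row, high_row in zip(low, high)]


# A small grayscale "image": a bright square on a dark background.
image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
low, high = decompose(image)
reconstructed = integrate(low, high)
```

With both weights at 1.0 the two bands sum back to the input, up to floating-point rounding. Raising high_weight emphasizes edges and fine detail, while raising low_weight emphasizes coarse structure. A learned model would typically replace these fixed weights with trained parameters.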

Noteworthy papers in this area include:

Efficient Continual Learning through Frequency Decomposition and Integration, which proposes a framework that decomposes and integrates information across frequencies to enhance cross-task generalization and preserve class-specific details.

LSNet: See Large, Focus Small, which introduces a family of lightweight models that combine large-kernel perception and small-kernel aggregation to efficiently capture a wide range of perceptual information.

KernelDNA: Dynamic Kernel Sharing via Decoupled Naive Adapters, which proposes a lightweight convolution kernel plug-in that enables dynamic kernel specialization without altering the standard convolution structure.

Sources

Efficient Continual Learning through Frequency Decomposition and Integration

GmNet: Revisiting Gating Mechanisms From A Frequency View

LSNet: See Large, Focus Small

KernelDNA: Dynamic Kernel Sharing via Decoupled Naive Adapters

Expanding-and-Shrinking Binary Neural Networks

Spectral-Adaptive Modulation Networks for Visual Perception

Optimization of Layer Skipping and Frequency Scaling for Convolutional Neural Networks under Latency Constraint

PolygoNet: Leveraging Simplified Polygonal Representation for Effective Image Classification

A Sensorimotor Vision Transformer
