Advances in Efficient Deep Learning for Visual Recognition

Deep learning for visual recognition is shifting toward more efficient, adaptive models. Research focuses on cutting computational cost while preserving accuracy in resource-constrained environments, using methods such as dynamic convolution, frequency-domain learning, and test-time adaptation. Notable papers in this area include:

  • FMDConv, which integrates input attention, temperature-degraded kernel attention, and output attention to optimize the speed-accuracy trade-off.
  • FDConv, a novel approach that learns a fixed parameter budget in the Fourier domain, enabling the construction of frequency-diverse weights without increasing the parameter cost.
  • SURGEON, a method that substantially reduces memory cost during fully test-time adaptation while preserving comparable accuracy gains, without relying on specific network architectures or modifications to the original training procedure.
  • FACETS, a unified iterative NAS method that cyclically refines the architecture of all modules, shrinking the overall search space while preserving inter-module dependencies and respecting the target device's computational budget.
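
The core idea behind dynamic convolution methods like FMDConv can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the cheap global descriptor used for input attention and the function names are illustrative assumptions. The key point is that a bank of K candidate kernels is mixed into one effective kernel by input-dependent, temperature-scaled attention, so only a single convolution runs per input.

```python
import numpy as np

def softmax(x, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature flattens the
    # attention distribution (cf. temperature-degraded kernel attention).
    z = x / temperature
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def dynamic_conv1d(x, kernels, temperature=1.0):
    """Mix K candidate kernels with input-dependent attention, then convolve.

    x       : (n,) input signal
    kernels : (K, k) bank of candidate kernels
    """
    # Input attention: a cheap global descriptor of x drives the kernel mix.
    # (The descriptor here is a toy choice, not the one from the paper.)
    descriptor = np.array([x.mean() * k.mean() for k in kernels])
    attn = softmax(descriptor, temperature)
    # Aggregate one effective kernel instead of running K convolutions --
    # this aggregation-before-convolution is what buys the speed side
    # of the speed-accuracy trade-off.
    kernel = (attn[:, None] * kernels).sum(axis=0)
    return np.convolve(x, kernel, mode="valid")
```

In a real network the attention branches would be small learned layers and the convolution 2D, but the mixing structure is the same.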

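FDConv's notion of frequency-diverse weights from a fixed parameter budget can likewise be sketched: a single learned spectrum is split into disjoint frequency bands, and each band is inverse-transformed into its own spatial kernel, yielding several distinct kernels without adding parameters. This is a hedged toy illustration, not the paper's construction; the band-partition scheme here is an assumption.

```python
import numpy as np

def fourier_kernels(spectrum, num_groups):
    """Derive frequency-diverse spatial kernels from one shared spectrum.

    spectrum : (k,) complex Fourier coefficients -- the fixed parameter budget
    Each group keeps a disjoint frequency band, so num_groups distinct
    kernels come from the same k parameters.
    """
    k = len(spectrum)
    kernels = []
    for g in range(num_groups):
        mask = np.zeros(k)
        mask[g * k // num_groups:(g + 1) * k // num_groups] = 1.0
        # Inverse FFT of the masked band gives this group's spatial kernel.
        kernels.append(np.fft.ifft(spectrum * mask).real)
    return np.stack(kernels)
```

Because the bands partition the spectrum, the kernels sum back to the inverse transform of the full spectrum, making the "no extra parameter cost" property easy to verify.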
Sources

FMDConv: Fast Multi-Attention Dynamic Convolution via Speed-Accuracy Trade-off

Frequency Dynamic Convolution for Dense Image Prediction

ESSR: An 8K@30FPS Super-Resolution Accelerator With Edge Selective Network

SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity

FACETS: Efficient Once-for-all Object Detection via Constrained Iterative Search
