The field of deep learning for visual recognition is moving towards more efficient and adaptive models. Research focuses on reducing computational cost and improving accuracy in resource-constrained settings, through methods such as dynamic convolution, frequency-domain learning, and test-time adaptation. Notable papers in this area include:
- FMDConv, which integrates input attention, temperature-degraded kernel attention, and output attention to optimize the speed-accuracy trade-off.
- FDConv, a novel approach that learns a fixed parameter budget in the Fourier domain, enabling the construction of frequency-diverse weights without increasing the parameter cost.
- SURGEON, a method that substantially reduces memory cost during fully test-time adaptation while preserving comparable accuracy, without relying on specific network architectures or on modifications to the original training procedure.
- FACETS, a unified iterative NAS method that refines the architectures of all modules in a cyclical manner. It reduces the overall search space while preserving interdependencies among modules, and incorporates constraints based on the target device's computational budget.
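To make the dynamic-convolution idea behind papers like FMDConv concrete, here is a minimal numpy sketch of the generic mechanism: a set of parallel kernels is mixed by input-conditioned, temperature-scaled softmax attention, so each input effectively gets its own kernel at the cost of a single convolution. All shapes and the attention projection `W_att` are hypothetical illustrations, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical sizes: K parallel kernels, C_in input channels, C_out output channels.
rng = np.random.default_rng(0)
K, C_out, C_in, ksize = 4, 8, 3, 3
kernels = rng.normal(size=(K, C_out, C_in, ksize, ksize))

x = rng.normal(size=(C_in, 16, 16))          # one input feature map
pooled = x.mean(axis=(1, 2))                 # global average pooling -> (C_in,)

W_att = rng.normal(size=(K, C_in))           # hypothetical attention projection
tau = 4.0                                    # temperature: larger tau flattens the attention
logits = W_att @ pooled
a = np.exp((logits - logits.max()) / tau)
a /= a.sum()                                 # softmax attention over the K kernels

# Aggregate into one input-conditioned kernel; the convolution itself then
# costs no more than a static convolution with a single kernel.
mixed = np.tensordot(a, kernels, axes=1)     # (C_out, C_in, ksize, ksize)
```

The temperature `tau` controls how sharply the model commits to one kernel versus blending several, which is the kind of speed-accuracy knob the attention designs above tune.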
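The Fourier-domain idea behind FDConv can also be sketched generically: parameterize one spectrum per filter and carve frequency-diverse kernels out of it with disjoint band masks, so several distinct kernels share a single parameter budget. This is a toy illustration under assumed shapes, not FDConv's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)
C_out, C_in, ksize = 8, 3, 5
# One shared parameter budget: a complex spectrum per (out, in) filter pair.
spec = rng.normal(size=(C_out, C_in, ksize, ksize // 2 + 1)) \
     + 1j * rng.normal(size=(C_out, C_in, ksize, ksize // 2 + 1))

# Disjoint frequency-band masks select different parts of the same spectrum.
freq = np.arange(ksize // 2 + 1)
low_mask = (freq <= 1).astype(float)         # keep only low frequencies
high_mask = 1.0 - low_mask                   # keep only high frequencies

# Inverse real FFT turns each masked spectrum into a spatial kernel,
# giving two frequency-diverse weight sets from one set of parameters.
w_low = np.fft.irfft2(spec * low_mask, s=(ksize, ksize))
w_high = np.fft.irfft2(spec * high_mask, s=(ksize, ksize))
```

Because the masks partition the spectrum, the two kernels sum back to the unmasked filter, which makes the "diversity at no extra parameter cost" claim easy to see.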