Advances in Audio Compression and Speech Enhancement

The field of audio compression and speech enhancement is moving toward more efficient neural network-based methods. Recent work shows that spectral approaches built on the Short-Time Fourier Transform (STFT) can deliver high perceptual quality while allowing flexible adjustment of the compression ratio. In parallel, speech enhancement models and hardware accelerators are being optimized for low-power edge devices, enabling real-time enhancement and denoising on compact, battery-constrained hardware. Noteworthy papers include STFTCodec, which achieves high-fidelity audio compression through a time-frequency domain representation; HiFi-Stream, a lightweight, streaming-optimized version of the HiFi++ model for speech enhancement; NeuralAids, a fully on-device speech AI system for wireless hearables; and QINCODEC, a neural audio compression codec that uses implicit neural codebooks.
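To make the STFT-based framing concrete, here is a minimal sketch (not taken from any of the cited papers) of the analysis/synthesis round-trip that such codecs build on, using `scipy.signal`. A real codec like STFTCodec would quantize and entropy-code the time-frequency representation between the two steps; here that stage is left lossless, so reconstruction is near-perfect.

```python
import numpy as np
from scipy import signal

fs = 16000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440 * t)  # 1 s of a 440 Hz tone

# Analysis: complex time-frequency representation (frequency bins x frames)
f, frames, Z = signal.stft(x, fs=fs, nperseg=512)

# A neural codec would compress Z here (quantization + entropy coding);
# we skip that stage, so the inverse transform recovers the input
_, x_rec = signal.istft(Z, fs=fs, nperseg=512)
x_rec = x_rec[: len(x)]

err = np.max(np.abs(x - x_rec))
print(f"max reconstruction error: {err:.2e}")
```

Because the Hann window with 50% overlap satisfies the COLA constraint, the untouched round-trip is exact up to floating-point error; all of the rate/quality trade-off in a spectral codec comes from how aggressively the representation `Z` is compressed.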

Sources

STFTCodec: High-Fidelity Audio Compression through Time-Frequency Domain Representation

HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks

Wireless Hearables With Programmable Speech AI Accelerators

QINCODEC: Neural Audio Compression with Implicit Neural Codebooks

A Low-Power Streaming Speech Enhancement Accelerator For Edge Devices

A 71.2-µW Speech Recognition Accelerator with Recurrent Spiking Neural Network

Magnitude-Phase Dual-Path Speech Enhancement Network based on Self-Supervised Embedding and Perceptual Contrast Stretch Boosting
