Advances in Audio Compression and Speech Enhancement

The field of audio compression and speech enhancement is moving toward more efficient neural network-based methods. Recent work shows that spectral approaches built on the Short-Time Fourier Transform (STFT) can deliver high perceptual quality while allowing flexible adjustment of the compression ratio. In parallel, speech enhancement models and hardware accelerators are being optimized for low-power edge devices, enabling real-time enhancement and denoising on compact, battery-constrained hardware. Noteworthy papers include STFTCodec, which achieves high-fidelity audio compression through a time-frequency domain representation; HiFi-Stream, a lightweight, streaming-optimized version of the HiFi++ model for speech enhancement; NeuralAids, a fully on-device speech AI system for wireless hearables; and QINCODEC, a neural audio compression codec that uses implicit neural codebooks.
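To make the STFT-based framing concrete, here is a minimal sketch (not taken from any of the cited papers) of the analysis/synthesis round-trip that such codecs build on, using `scipy.signal`. A real codec like STFTCodec would quantize and entropy-code the time-frequency representation between the two steps; here that stage is left lossless, so reconstruction is near-perfect.

```python
import numpy as np
from scipy import signal

fs = 16000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440 * t)  # 1 s of a 440 Hz tone

# Analysis: complex time-frequency representation (frequency bins x frames)
f, frames, Z = signal.stft(x, fs=fs, nperseg=512)

# A neural codec would compress Z here (quantization + entropy coding);
# we skip that stage, so the inverse transform recovers the input
_, x_rec = signal.istft(Z, fs=fs, nperseg=512)
x_rec = x_rec[: len(x)]

err = np.max(np.abs(x - x_rec))
print(f"max reconstruction error: {err:.2e}")
```

Because the Hann window with 50% overlap satisfies the COLA constraint, the untouched round-trip is exact up to floating-point error; all of the rate/quality trade-off in a spectral codec comes from how aggressively the representation `Z` is compressed.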

Sources

STFTCodec: High-Fidelity Audio Compression through Time-Frequency Domain Representation

HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks

Wireless Hearables With Programmable Speech AI Accelerators

QINCODEC: Neural Audio Compression with Implicit Neural Codebooks

A Low-Power Streaming Speech Enhancement Accelerator For Edge Devices

A 71.2-µW Speech Recognition Accelerator with Recurrent Spiking Neural Network

Magnitude-Phase Dual-Path Speech Enhancement Network based on Self-Supervised Embedding and Perceptual Contrast Stretch Boosting
