Speech Recognition and Dysfluency Detection

Report on Current Developments in Speech Recognition and Dysfluency Detection

General Direction of the Field

Recent advances in speech recognition and dysfluency detection are marked by a shift toward more inclusive, accurate, and context-aware models. Researchers are increasingly addressing the challenges posed by disordered speech, spontaneous speech, and low-resource languages. A significant trend is the development of datasets that reflect real-world speech conditions, such as informal conversations and speech disorders; these datasets are crucial for training models that generalize to diverse and challenging environments.

Another notable direction is the integration of multi-task learning and self-supervised models. These approaches leverage the strengths of pre-trained representations to improve performance on specific tasks such as dysfluency detection and punctuation restoration. End-to-end frameworks inspired by object-detection algorithms are also gaining traction, particularly for tasks that require transcribing both the precise timing and the type of each dysfluency.
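To make the idea concrete, here is a minimal sketch (not taken from any of the cited papers) of how a self-supervised encoder can be combined with detection-style heads and a multi-task loss: per-frame dysfluency-type classification plus regression of span boundaries. The frozen GRU stands in for a real pre-trained encoder such as wav2vec 2.0, and the label set, feature dimension, and loss weighting are all illustrative assumptions.

```python
# Hedged sketch: frozen "self-supervised" encoder + two task heads,
# trained with a multi-task (classification + localization) loss.
import torch
import torch.nn as nn

NUM_DYSFLUENCY_TYPES = 6   # assumed label set, e.g. none, block, prolongation, repetitions
FRAME_DIM = 80             # assumed acoustic feature size (e.g. log-mel frames)
HIDDEN = 256

class DysfluencyDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in for a pre-trained self-supervised encoder; kept frozen.
        self.encoder = nn.GRU(FRAME_DIM, HIDDEN, batch_first=True, bidirectional=True)
        for p in self.encoder.parameters():
            p.requires_grad = False
        # Detection-style heads: per-frame dysfluency type, and per-frame
        # offsets to the start/end of the span the frame belongs to.
        self.type_head = nn.Linear(2 * HIDDEN, NUM_DYSFLUENCY_TYPES)
        self.boundary_head = nn.Linear(2 * HIDDEN, 2)

    def forward(self, feats):                    # feats: (batch, frames, FRAME_DIM)
        h, _ = self.encoder(feats)               # (batch, frames, 2 * HIDDEN)
        return self.type_head(h), self.boundary_head(h)

def multitask_loss(type_logits, boundaries, type_targets, boundary_targets, alpha=0.5):
    """Weighted sum of the classification and localization losses."""
    cls = nn.functional.cross_entropy(
        type_logits.reshape(-1, NUM_DYSFLUENCY_TYPES), type_targets.reshape(-1))
    reg = nn.functional.smooth_l1_loss(boundaries, boundary_targets)
    return cls + alpha * reg

# Toy forward/backward pass with random tensors standing in for real data.
model = DysfluencyDetector()
feats = torch.randn(2, 100, FRAME_DIM)
type_targets = torch.randint(0, NUM_DYSFLUENCY_TYPES, (2, 100))
boundary_targets = torch.rand(2, 100, 2)
type_logits, boundaries = model(feats)
loss = multitask_loss(type_logits, boundaries, type_targets, boundary_targets)
loss.backward()
print(float(loss))
```

The key design point this illustrates is sharing one encoder across tasks while only the lightweight heads are trained, which is the usual way pre-trained self-supervised representations are reused for dysfluency-related tasks.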

Moreover, there is a growing emphasis on the ethical considerations of dataset curation, particularly in the context of disordered speech. Ensuring that datasets are well-annotated, diverse, and representative of the target populations is becoming a central concern. This includes the use of human-reviewed annotations and the collection of comprehensive metadata to enhance the reliability and usefulness of the datasets.

Noteworthy Developments

  1. Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
    This framework introduces a scalable and multi-lingual approach to dysfluency detection, achieving state-of-the-art performance across multiple corpora.

  2. Self-supervised Speech Models for Word-Level Stuttered Speech Detection
    The study presents a novel word-level detection model that outperforms previous approaches, making significant strides in automated stuttering diagnosis.

  3. Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge
    The PD-DWS system achieves top performance in the challenge, demonstrating the effectiveness of a dual-filter strategy and multi-task learning in dysarthria speech recognition.

  4. Augmenting Automatic Speech Recognition Models with Disfluency Detection
    This work introduces an inference-only approach to enhance ASR models with disfluency detection capabilities, significantly improving the transcription of spontaneous speech.

  5. Spontaneous Informal Speech Dataset for Punctuation Restoration
    The SponSpeech dataset and its associated filtering pipeline represent a significant contribution to the field, enabling more realistic evaluations of punctuation restoration models.

These developments highlight the ongoing innovation and progress in the field, pushing the boundaries of what is possible in speech recognition and dysfluency detection.

Sources

Learnings from curating a trustworthy, well-annotated, and useful dataset of disordered English speech

Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection

Self-supervised Speech Models for Word-Level Stuttered Speech Detection

Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge

Augmenting Automatic Speech Recognition Models with Disfluency Detection

Predicting Punctuation in Ancient Chinese Texts: A Multi-Layered LSTM and Attention-Based Approach

Spontaneous Informal Speech Dataset for Punctuation Restoration

WER We Stand: Benchmarking Urdu ASR Models

ASR Benchmarking: Need for a More Representative Conversational Dataset
