The field of speech recognition is moving toward improved performance in low-resource languages and specialized domains such as elderly speech and technical terminology. Researchers are exploring fine-tuning strategies for existing models and building new datasets to address the scarcity of speech data in these areas. The integration of custom language models and multilingual architectures is also gaining attention, with the goal of achieving higher transcription accuracy across diverse audio formats and acoustic environments. Noteworthy papers include:
- SeniorTalk, which introduces a carefully annotated Chinese spoken-dialogue dataset for super-aged seniors that supports a wide range of speech tasks and offers crucial insights for developing speech technologies targeting this age group.
- Dolphin, a large-scale multilingual automatic speech recognition model that significantly outperforms current state-of-the-art open-source models across a wide range of languages.