Advances in Multimodal Emotion Recognition

The field of emotion recognition is advancing rapidly, with a growing focus on multimodal approaches that integrate multiple sources of information, such as text, speech, and facial expressions. This trend is driven by the need to better understand human emotions and to build more effective affective computing systems. Recent work applies deep learning techniques, such as convolutional and recurrent neural networks, to improve recognition accuracy, and there is rising interest in using large language models and contrastive learning to refine speech emotion recognition and to enable zero-shot emotion recognition across languages. Noteworthy papers include GatedxLSTM, a speech-text multimodal model for emotion recognition in conversations that achieves state-of-the-art performance on the IEMOCAP dataset, and OmniVox, a systematic evaluation of omni-LLMs for zero-shot emotion recognition that demonstrates performance competitive with fine-tuned audio models.
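To make the multimodal fusion idea concrete, the following is a minimal sketch of gated speech-text fusion for emotion classification. It is not the GatedxLSTM architecture from the cited paper; the module names, feature dimensions, and four-class label set are illustrative assumptions.

```python
# Illustrative sketch only: a generic gated fusion of speech and text
# features for emotion classification. NOT the GatedxLSTM architecture
# from the cited paper; names, dimensions, and labels are assumptions.
import torch
import torch.nn as nn

class GatedFusionClassifier(nn.Module):
    def __init__(self, speech_dim=128, text_dim=256, hidden_dim=128, num_classes=4):
        super().__init__()
        # Project both modalities into a shared hidden space.
        self.speech_proj = nn.Linear(speech_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # The gate learns, per utterance, how much to trust each modality.
        self.gate = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Sigmoid(),
        )
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, speech_feat, text_feat):
        s = torch.tanh(self.speech_proj(speech_feat))
        t = torch.tanh(self.text_proj(text_feat))
        g = self.gate(torch.cat([s, t], dim=-1))  # gate values in (0, 1)
        fused = g * s + (1 - g) * t               # convex combination of modalities
        return self.classifier(fused)

# Dummy usage with random features standing in for upstream encoder
# outputs (e.g. a pretrained speech encoder and a text encoder).
model = GatedFusionClassifier()
speech = torch.randn(8, 128)  # batch of 8 utterance-level speech embeddings
text = torch.randn(8, 256)    # batch of 8 sentence embeddings
logits = model(speech, text)  # shape (8, 4), e.g. angry/happy/sad/neutral
print(logits.shape)
```

The gating makes the modality weighting input-dependent, so an utterance with flat prosody can lean on the transcript while an ambiguous transcript can lean on the audio.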

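The contrastive zero-shot direction can likewise be sketched in a few lines: score an utterance embedding against embeddings of textual label descriptions in a shared space, CLIP/CLAP-style. This is a hedged illustration, not the method of any cited paper; the encoders below are random stand-ins for pretrained audio and text towers aligned with a contrastive objective.

```python
# Hedged sketch: zero-shot emotion recognition by comparing an utterance
# embedding with embeddings of textual label prompts in a shared
# contrastive space. Encoders here are random stand-ins (assumptions);
# in practice they would be contrastively pretrained audio/text towers.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in "encoders": frozen projections into a shared 64-d space.
audio_encoder = torch.nn.Linear(128, 64)
text_encoder = torch.nn.Linear(300, 64)

labels = ["angry", "happy", "sad", "neutral"]
# Pretend these are text features for prompts like "an angry voice";
# prompts can be written in any language, enabling cross-lingual use.
label_feats = torch.randn(len(labels), 300)

with torch.no_grad():
    utterance = torch.randn(1, 128)                  # one audio feature vector
    a = F.normalize(audio_encoder(utterance), dim=-1)
    t = F.normalize(text_encoder(label_feats), dim=-1)
    sims = a @ t.T                                   # cosine similarities, (1, 4)
    pred = labels[sims.argmax(dim=-1).item()]

print(pred)  # label whose prompt embedding is closest to the utterance
```

Because labels are just text prompts rather than trained output heads, new emotion categories or new languages need only new prompts, which is what makes the approach zero-shot.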
Sources

Bridging Emotions and Architecture: Sentiment Analysis in Modern Distributed Systems

Group Decision-Making System with Sentiment Analysis of Discussion Chat and Fuzzy Consensus Modeling

Emotion Detection in Twitter Messages Using Combination of Long Short-Term Memory and Convolutional Deep Neural Networks

GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations

OmniVox: Zero-Shot Emotion Recognition with Omni-LLMs

Large Language Models Meet Contrastive Learning: Zero-Shot Emotion Recognition Across Languages

Hybrid Emotion Recognition: Enhancing Customer Interactions Through Acoustic and Textual Analysis

Automated UX Insights from User Research Videos by Integrating Facial Emotion and Text Sentiment

Sentiment Classification of Thai Central Bank Press Releases Using Supervised Learning
