Advances in Multimodal Emotion Recognition and Human-Computer Interaction

The field of affective computing is seeing significant advances in multimodal emotion recognition and human-computer interaction. Researchers are exploring approaches that integrate multiple modalities, such as audio, visual, and physiological signals, to improve the accuracy and reliability of emotion recognition systems. Notably, frameworks that simulate the integration of the brain's visual, auditory, and emotional pathways are improving the interpretability and efficiency of affective intelligence. Physics-informed multi-task pre-training and cross-modal disentanglement techniques are likewise improving the performance of human activity recognition and emotion detection systems. Noteworthy papers in this area include the Audio-Visual Fusion Emotion Generation Model, which introduces a novel framework for audio-visual emotion fusion and generation; MerGen, which proposes a generative neural network that simulates electrophysiological recordings as a realistic learning tool for clinicians; PIM, a physics-informed multi-task pre-training framework for improving inertial sensor-based human activity recognition; and ContourUSV, an efficient automated system for detecting rodent ultrasonic vocalizations that outperforms state-of-the-art systems in precision, recall, and F1 score.
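To make the multimodal fusion idea concrete, the sketch below shows a minimal late-fusion emotion classifier over pre-extracted audio and visual feature vectors. The layer sizes, feature dimensions, and fusion strategy are illustrative assumptions, not the architecture of any paper cited above.

```python
# Minimal late-fusion sketch for audio-visual emotion recognition (PyTorch).
# All dimensions and the 7-class emotion set are assumptions for illustration.
import torch
import torch.nn as nn


class LateFusionEmotionClassifier(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=512, hidden_dim=256, num_emotions=7):
        super().__init__()
        # Separate encoders per modality, mirroring distinct auditory and visual pathways.
        self.audio_encoder = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        self.visual_encoder = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
        # Fusion head operates on the concatenated modality embeddings.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_emotions),
        )

    def forward(self, audio_feats, visual_feats):
        a = self.audio_encoder(audio_feats)
        v = self.visual_encoder(visual_feats)
        return self.classifier(torch.cat([a, v], dim=-1))


# Usage with random stand-in features for a batch of 4 clips.
model = LateFusionEmotionClassifier()
audio = torch.randn(4, 128)
visual = torch.randn(4, 512)
logits = model(audio, visual)  # shape: (4, 7) emotion logits
```

Late fusion is only one option; many of the systems surveyed here instead learn joint or disentangled cross-modal representations, but the concatenate-then-classify pattern is a reasonable baseline for comparing modality combinations.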
Sources
GSound-SIR: A Spatial Impulse Response Ray-Tracing and High-order Ambisonic Auralization Python Toolkit
PIM: Physics-Informed Multi-task Pre-training for Improving Inertial Sensor-Based Human Activity Recognition