Advancements in Human Pose, Gesture, and Activity Recognition

Recent developments in human pose and gesture recognition, and in wearable human activity recognition, indicate a shift toward large-scale data and advanced model architectures as the primary levers for accuracy and efficiency. A notable trend is the exploration of foundation models and scaling laws for expressive human pose and shape estimation, aiming at generalist models that handle a wide range of scenarios. Wearable human activity recognition shows a similar move toward models that capture and fuse intra- and inter-sensor spatio-temporal signals for improved recognition accuracy. Another emerging direction is the application of graph convolutional networks and state-space models to skeleton-based action and gesture recognition, with a focus on better modeling of spatio-temporal dependencies and dynamic variations in skeletal motion; the sketch below illustrates the basic graph-convolution idea. There is also growing interest in multimodal, multi-party social signal prediction, which aims to understand complex social dynamics by integrating diverse social cues. Privacy-preserving technologies for gesture recognition in virtual reality, based on radar and inertial sensing rather than cameras, are gaining attention as well. Finally, multimodal sensor datasets for health monitoring, particularly for older adults recovering from lower-limb fractures, highlight the potential of machine learning in healthcare applications.
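
As a concrete illustration of the skeleton-based GCN direction, here is a minimal sketch of a single spatial graph-convolution layer over skeleton joints. The ST-GCN-style normalization, the `SkeletonGraphConv` class, and the toy five-joint skeleton are illustrative assumptions; this is not the architecture of HFGCN, DSTSA-GCN, or any other paper listed below.

```python
# Minimal sketch of a skeleton graph convolution: joints are graph nodes,
# bones define adjacency, and features propagate through a normalized
# adjacency matrix. Generic illustration only, not a specific paper's model.
import torch
import torch.nn as nn


class SkeletonGraphConv(nn.Module):
    """One spatial graph-convolution layer over skeleton joints."""

    def __init__(self, in_channels: int, out_channels: int, adjacency: torch.Tensor):
        super().__init__()
        # Symmetrically normalize A + I so each joint aggregates itself
        # and its neighbors: D^{-1/2} (A + I) D^{-1/2}.
        a_hat = adjacency + torch.eye(adjacency.size(0))
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        self.register_buffer("norm_adj", deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :])
        self.linear = nn.Linear(in_channels, out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, channels)
        x = torch.einsum("vw,btwc->btvc", self.norm_adj, x)  # aggregate neighbors
        return torch.relu(self.linear(x))                    # per-joint projection


if __name__ == "__main__":
    # Toy 5-joint chain skeleton (e.g., a single arm): 0-1-2-3-4.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    adj = torch.zeros(5, 5)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    layer = SkeletonGraphConv(in_channels=3, out_channels=16, adjacency=adj)
    poses = torch.randn(2, 30, 5, 3)  # (batch, frames, joints, xyz)
    print(layer(poses).shape)         # torch.Size([2, 30, 5, 16])
```

Stacking such spatial layers with temporal convolutions over the frame axis is the usual way these models capture spatio-temporal dependencies in skeletal motion.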

Noteworthy Papers

  • SMPLest-X: Introduces a family of generalist foundation models for expressive human pose and shape estimation, achieving state-of-the-art results through data and model scaling.
  • DecomposeWHAR: Proposes a novel model for wearable human activity recognition that effectively captures and fuses intra- and inter-sensor spatio-temporal signals, outperforming existing methods.
  • HFGCN: Presents a hypergraph fusion graph convolutional network for skeleton-based action recognition that improves accuracy by jointly modeling individual skeleton joints and higher-level body parts.
  • EgoHand: Offers a privacy-preserving solution for hand gesture recognition in virtual reality, using millimeter-wave radar and IMUs for accurate gesture detection.
  • MV-GMN: Introduces a state-space model for multi-view action recognition, demonstrating superior performance with reduced computational complexity; a generic sketch of the kind of state-space recurrence involved follows this list.
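
For the state-space direction, the following is a minimal sketch of the linear recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t that S4/Mamba-style blocks build on, applied here to per-frame action features. The diagonal parameterization, the `DiagonalSSMBlock` name, and the explicit sequential scan are simplifying assumptions for illustration; MV-GMN's actual multi-view architecture is only named in the sources below.

```python
# Minimal sketch of a diagonal linear state-space block over a sequence of
# per-frame features. Generic illustration of the SSM family, not MV-GMN.
import torch
import torch.nn as nn


class DiagonalSSMBlock(nn.Module):
    """y_t = C h_t, where h_t = A h_{t-1} + B x_t with diagonal A."""

    def __init__(self, channels: int, state_dim: int):
        super().__init__()
        # Sigmoid keeps each diagonal entry of A in (0, 1), so the
        # recurrence is stable over long sequences.
        self.a_logit = nn.Parameter(torch.zeros(state_dim))
        self.b = nn.Linear(channels, state_dim, bias=False)
        self.c = nn.Linear(state_dim, channels, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        a = torch.sigmoid(self.a_logit)          # (state_dim,)
        u = self.b(x)                            # (batch, time, state_dim)
        h = torch.zeros_like(u[:, 0])
        outputs = []
        for t in range(x.size(1)):               # sequential scan over frames
            h = a * h + u[:, t]
            outputs.append(self.c(h))
        return torch.stack(outputs, dim=1)       # (batch, time, channels)


if __name__ == "__main__":
    block = DiagonalSSMBlock(channels=64, state_dim=16)
    clip_features = torch.randn(4, 30, 64)  # 4 clips, 30 frames, 64-dim features
    print(block(clip_features).shape)       # torch.Size([4, 30, 64])
```

The appeal of this family over attention is that the recurrence costs linear time in sequence length, which is where the reduced computational complexity claimed for state-space action models comes from.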

Sources

SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation

Decomposing and Fusing Intra- and Inter-Sensor Spatio-Temporal Signal for Multi-Sensor Wearable Human Activity Recognition

HFGCN: Hypergraph Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition

Tracking Mouse from Incomplete Body-Part Observations and Deep-Learned Deformable-Mouse Model Motion-Track Constraint for Behavior Analysis

Refinement Module based on Parse Graph of Feature Map for Human Pose Estimation

Efficient Frame Extraction: A Novel Approach Through Frame Similarity and Surgical Tool Tracking for Video Segmentation

Survey on Hand Gesture Recognition from Visual Input

DSTSA-GCN: Advancing Skeleton-Based Gesture Recognition with Semantic-Aware Spatio-Temporal Topology Modeling

SMART-Vision: Survey of Modern Action Recognition Techniques in Vision

M3PT: A Transformer for Multimodal, Multi-Party Social Signal Prediction with Person-aware Blockwise Attention

EgoHand: Ego-centric Hand Pose Estimation and Gesture Recognition with Head-mounted Millimeter-wave Radar and IMUs

MV-GMN: State Space Model for Multi-View Action Recognition

Multimodal Sensor Dataset for Monitoring Older Adults Post Lower-Limb Fractures in Community Settings