Semantic and Language Integration in 3D Gaussian Splatting

Current Trends in 3D Scene Understanding with Gaussian Splatting

Recent advancements in 3D scene understanding have seen a significant shift towards integrating semantic and language features into 3D Gaussian Splatting (3DGS) models. This approach allows for more nuanced and interactive scene representations, enabling tasks such as open-vocabulary segmentation, object detection, and robotic grasping from sparse-view inputs. The field is moving towards more efficient and flexible methods that reduce dependency on dense multi-view inputs and complex data preprocessing, thereby enhancing real-world applicability.
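The core idea shared by these methods can be illustrated with a minimal sketch: each Gaussian carries a language feature (in practice distilled from a CLIP-style image encoder), and an open-vocabulary query selects the Gaussians whose features are most similar to the embedded text prompt. All names, dimensions, and data below are hypothetical stand-ins, not the API of any of the cited systems:

```python
import numpy as np

# Hypothetical setup: random stand-ins for a trained scene. In a real system,
# lang_feats would be distilled from a 2D vision-language encoder during training.
rng = np.random.default_rng(0)
num_gaussians, embed_dim = 1000, 64
positions = rng.normal(size=(num_gaussians, 3))           # Gaussian centers (xyz)
lang_feats = rng.normal(size=(num_gaussians, embed_dim))  # per-Gaussian language features
lang_feats /= np.linalg.norm(lang_feats, axis=1, keepdims=True)

def query_gaussians(text_embedding, threshold=0.2):
    """Return indices of Gaussians whose language feature matches the text query,
    using cosine similarity against the (normalized) text embedding."""
    text_embedding = text_embedding / np.linalg.norm(text_embedding)
    sims = lang_feats @ text_embedding  # cosine similarity per Gaussian
    return np.flatnonzero(sims > threshold)

# Stand-in for a text encoder's output for a prompt such as "a red chair".
query = rng.normal(size=embed_dim)
selected = query_gaussians(query)
print(f"{len(selected)} of {num_gaussians} Gaussians matched the query")
```

The selected Gaussians (and their `positions`) can then drive downstream tasks such as segmentation masks or grasp-pose proposals; the cited papers differ mainly in how the per-Gaussian features are learned and compressed, not in this query step.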

Noteworthy Developments:

  • A novel method introduces a semantic-scaffold representation to balance appearance and semantics, improving boundary delineation and segmentation accuracy.
  • An unsupervised framework achieves view-consistent scene understanding without 2D labels, with performance comparable to state-of-the-art methods.
  • A system enables multi-level interactions within 3D space using a 3D language field, enhancing both understanding and engagement.
  • A simple approach for language Gaussian Splatting achieves state-of-the-art results with significant speed improvements.
  • A robotic grasping system operates efficiently with sparse-view inputs, providing robust solutions for dynamic environments.

Sources

InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception

Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding

ChatSplat: 3D Conversational Gaussian Splatting

Occam's LGS: A Simple Approach for Language Gaussian Splatting

SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images

SparseLGS: Sparse View Language Embedded Gaussian Splatting
