Recent work in this area has focused on enhancing the spatial and physical reasoning capabilities of large language models (LLMs) and multimodal language models (MLMs). One significant trend is the development of specialized datasets and training methodologies for dynamic spatial reasoning, which is crucial for tasks involving perspective-taking and egocentric action recognition. There is also growing emphasis on integrating simulated data to strengthen physical reasoning, particularly for understanding and predicting object behavior in dynamic environments. These approaches not only improve performance on specific benchmarks but also prove robust in real-world applications. Notably, reinforcement learning with human and artificial intelligence feedback has shown promising results on complex physics problems, marking a shift toward more interactive and adaptive training. Finally, studies of the scaling properties of reinforcement learning from human feedback (RLHF) have clarified how to optimize performance within a fixed compute budget, pointing toward more efficient model training and deployment.
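To make the compute/performance trade-off mentioned above concrete, here is a minimal, illustrative sketch of best-of-n sampling, one simple inference-time strategy studied in the RLHF-scaling literature: a reward model scores n candidate generations and the highest-scoring one is kept, so n becomes a tunable compute knob. The `reward_model` and `toy_generate` functions below are hypothetical stand-ins, not components from any specific paper.

```python
import random

def reward_model(response: str) -> float:
    # Hypothetical stand-in for a learned reward model:
    # here it simply prefers longer responses and penalizes questions.
    return len(response) - 10 * response.count("?")

def best_of_n(generate, prompt: str, n: int) -> str:
    """Draw n candidate responses and keep the one the reward model
    scores highest. Larger n spends more inference compute in
    exchange for (diminishing) reward gains."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward_model)

# Toy generator standing in for sampling from an LLM.
def toy_generate(prompt: str) -> str:
    return random.choice(
        ["short", "a longer answer", "why?", "a much longer detailed answer"]
    )

random.seed(0)
print(best_of_n(toy_generate, "explain the scene", n=8))
```

Under this framing, the scaling question is where additional compute (a larger n here, or more RLHF training steps in general) stops paying for itself, which is the kind of budget-constrained optimization the cited line of work investigates.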