Research on large language models (LLMs) for reasoning and decision-making tasks is shifting toward more efficient and adaptive strategies. Increasingly, work focuses on frameworks that let LLMs learn and select effective strategies autonomously, reducing the need for iterative refinement and external guidance. This trend is exemplified by advances in reinforcement-learning-driven continuous self-improvement and by cross-task experience sharing, both of which enhance the generalization and adaptability of LLMs across diverse tasks. There is also a notable emphasis on improving the non-myopic generation capabilities of LLMs to raise planning accuracy and computational efficiency. These developments promise not only stronger performance on reasoning tasks but also lower computational costs, making such methods more viable in resource-constrained scenarios. Notably, frameworks that bridge diverse environments and optimize principled reasoning and acting mark a significant step forward in the adaptability and robustness of LLM-driven agents in complex, real-world applications.