Advancements in Large Language Models' Reasoning Capabilities

The field of large language models (LLMs) is seeing rapid progress in reasoning. Recent studies examine how LLMs acquire reasoning skills and exhibit 'aha moments', points at which a model reorganizes its approach and allocates more thinking time to a problem. New methods and frameworks, such as Retro-Search and ThoughtProbe, enable the exploration of more efficient and effective reasoning paths: Retro-Search revisits completed reasoning traces to explore untaken branches, while ThoughtProbe uses classifier guidance to steer exploration of the model's thought space. In parallel, new benchmarks such as AGITB broaden the evaluation of LLMs' reasoning abilities. Noteworthy papers include 'Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning', which demonstrates the potential of search algorithms for improving LLM reasoning, and 'AGITB: A Signal-Level Benchmark for Evaluating Artificial General Intelligence', which proposes a signal-level benchmark for evaluating machine intelligence.
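To make the search-over-reasoning-paths idea concrete, the sketch below implements a generic best-first search over candidate chain-of-thought continuations. It is illustrative only, not the actual Retro-Search or ThoughtProbe algorithm: the `expand`, `score`, and `is_final` callables are hypothetical stand-ins for an LLM step proposer, a value or classifier signal, and an answer detector.

```python
import heapq
from typing import Callable, List, Tuple


def best_first_reasoning_search(
    question: str,
    expand: Callable[[str], List[str]],   # hypothetical: proposes next reasoning steps
    score: Callable[[str], float],        # hypothetical: scores a partial reasoning chain
    is_final: Callable[[str], bool],      # hypothetical: detects a finished answer
    beam_width: int = 3,
    max_steps: int = 20,
) -> str:
    """Explore alternative ('untaken') reasoning branches instead of
    committing to a single greedy chain-of-thought."""
    # Max-heap behavior via negated scores; each entry is (neg_score, partial_chain).
    frontier: List[Tuple[float, str]] = [(-score(question), question)]
    best_final, best_final_score = question, float("-inf")

    for _ in range(max_steps):
        if not frontier:
            break
        next_frontier: List[Tuple[float, str]] = []
        for _neg_s, chain in frontier:
            # Branch: consider several candidate continuations, not just one.
            for step in expand(chain):
                candidate = chain + "\n" + step
                s = score(candidate)
                if is_final(candidate):
                    # Keep the best-scoring completed chain seen so far.
                    if s > best_final_score:
                        best_final, best_final_score = candidate, s
                else:
                    heapq.heappush(next_frontier, (-s, candidate))
        # Prune to the top `beam_width` partial chains for the next round.
        frontier = heapq.nsmallest(beam_width, next_frontier)

    return best_final
```

Read against the paper titles, both methods can be seen as variants of this branch-and-score loop, differing mainly in where the scoring signal comes from (e.g., a classifier over the model's own representations in ThoughtProbe's case).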

Sources

Understanding Aha Moments: from External Observations to Internal Mechanisms

Think When You Need: Self-Adaptive Chain-of-Thought Learning

Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition

Reasoning on Multiple Needles In A Haystack

Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning

AGITB: A Signal-Level Benchmark for Evaluating Artificial General Intelligence

The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning

Concise Reasoning via Reinforcement Learning

Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification

ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning

To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning

DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning
