Efficient Reasoning in Large Language Models

The field of large language models is moving towards more efficient and effective reasoning capabilities. Recent research has focused on reducing the computational cost and environmental impact of these models while also improving their accuracy. One key direction is the development of methods that dynamically terminate sampling once sufficient consistency is achieved, avoiding unnecessary computation. Another area of research is the use of adaptive reasoning modes, which allocate inference-time compute according to the characteristics of each task. Additionally, there is growing interest in modeling the latent thoughts that underlie the text generation process, which can significantly improve pretraining data efficiency.
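To make the idea of dynamically terminating sampling concrete, the sketch below shows one way a Wald-style sequential probability ratio test could be wrapped around self-consistency sampling. It is a minimal illustration, not the ConSol implementation: the `sample_answer` callable, the hypothesis probabilities `p0`/`p1`, and the error rates `alpha`/`beta` are all assumed placeholders.

```python
import math
from collections import Counter

def sprt_early_stop(sample_answer, p0=0.5, p1=0.8,
                    alpha=0.05, beta=0.05, max_samples=64):
    """Draw reasoning paths one at a time and stop as soon as Wald's
    sequential probability ratio test accepts that the current majority
    answer appears with probability at least p1 (versus the null p0).

    `sample_answer` is a hypothetical callable that runs one reasoning
    path and returns its final answer; all defaults are illustrative.
    """
    accept_h1 = math.log((1 - beta) / alpha)  # evidence needed to stop early
    counts = Counter()

    for n in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        leading, k = counts.most_common(1)[0]  # current majority answer and its count
        # Log-likelihood ratio for k occurrences of the leading answer in n draws,
        # under H1 (probability p1) versus H0 (probability p0).
        llr = k * math.log(p1 / p0) + (n - k) * math.log((1 - p1) / (1 - p0))
        if llr >= accept_h1:
            return leading, n  # consistent enough: terminate sampling early

    return counts.most_common(1)[0][0], max_samples  # fall back to a plain majority vote
```

A caller would pass something like `lambda: run_llm_once(prompt)` as `sample_answer`; the test then trades a small, controlled error probability for far fewer samples on easy questions.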

Noteworthy papers include the following. ConSol leverages sequential probability ratio testing to dynamically terminate sampling, matching the accuracy of self-consistency methods at a substantially reduced computational cost. ImageGen-CoT introduces a framework that inserts an explicit thought process before image generation, yielding an 80% performance gain for SEED-X on T2I-ICL tasks. Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging presents a comprehensive empirical study of model merging for long-to-short (L2S) reasoning, reducing average response length by up to 55% while preserving or improving baseline performance. Entropy-Aware Branching for Improved Mathematical Reasoning dynamically branches the generation process on demand, boosting the reasoning performance of small LLMs by up to 4.6% compared to conventional argmax decoding.
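To illustrate what "branching the generation process on demand" means in practice, the following sketch forks decoding whenever the next-token distribution is high-entropy and otherwise falls back to argmax. It is a simplified illustration under assumed interfaces, not the paper's method: the `next_token_dist` callable and every threshold are hypothetical.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_branching_decode(next_token_dist, prompt, max_new_tokens=64,
                             entropy_threshold=2.0, branch_width=3, max_paths=8):
    """Decode greedily, but fork a sequence whenever the model is uncertain:
    if the next-token entropy exceeds `entropy_threshold`, the top
    `branch_width` tokens each spawn their own continuation.

    `next_token_dist(tokens)` is a hypothetical callable returning
    (token_id, probability) pairs sorted by descending probability;
    the defaults here are illustrative, not taken from the paper.
    """
    paths = [list(prompt)]
    for _ in range(max_new_tokens):
        expanded = []
        for seq in paths:
            dist = next_token_dist(seq)
            if token_entropy([p for _, p in dist]) > entropy_threshold:
                # Uncertain step: explore several plausible tokens in parallel.
                expanded.extend(seq + [tok] for tok, _ in dist[:branch_width])
            else:
                # Confident step: behave like conventional argmax decoding.
                expanded.append(seq + [dist[0][0]])
        paths = expanded[:max_paths]  # keep the frontier bounded so branching stays cheap
    return paths  # candidate reasoning paths; pick a final answer by e.g. majority vote
```

The design point is that extra compute is spent only at uncertain steps, so most of the generation remains as cheap as plain greedy decoding.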

Sources

ConSol: Sequential Probability Ratio Testing to Find Consistent LLM Reasoning Paths Efficiently

Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities

Reasoning to Learn from Latent Thoughts

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning

Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking

Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence

Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging

Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection

SWI: Speaking with Intent in Large Language Models

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

Entropy-Aware Branching for Improved Mathematical Reasoning
