Advancements in Large Language Models' Reasoning Capabilities

Research on large language models (LLMs) is advancing rapidly, with particular focus on strengthening reasoning capabilities. Recent studies probe the mechanisms underlying LLM reasoning, including the interplay between memorization and genuine inference. Researchers have also identified concrete limitations in current models, such as the Reversal Curse, where a model trained that "A is B" fails to infer the reverse fact "B is A". To address these limitations, new methods have been proposed, including symbolic engines, generative evaluation frameworks, and techniques for mitigating reasoning inconsistencies. In parallel, new benchmarks and evaluation tools such as KUMO and YourBench enable more reliable and customizable assessment of LLMs' reasoning abilities. Overall, the field is moving toward more robust, interpretable, and generalizable models. Noteworthy papers include 'Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models', which introduces a tool for visually inspecting LLMs' reasoning paths, and 'Large (Vision) Language Models are Unsupervised In-Context Learners', which presents a joint inference framework for fully unsupervised adaptation.
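
To make the Reversal Curse concrete, below is a minimal probe sketch in Python. The `query_model` function is a hypothetical placeholder for whatever model API is in use, and the single fact pair is the canonical example from the Reversal Curse literature; this is an illustrative sketch, not any paper's actual evaluation harness.

```python
# Minimal Reversal Curse probe: ask for a fact in both directions
# and compare accuracy. A model exhibiting the curse answers the
# forward question but fails the reversed one.

# (forward question, forward answer, reverse question, reverse answer)
FACT_PAIRS = [
    ("Who is Tom Cruise's mother?", "Mary Lee Pfeiffer",
     "Who is Mary Lee Pfeiffer's son?", "Tom Cruise"),
]

def query_model(prompt: str) -> str:
    """Hypothetical placeholder: swap in a real LLM API call here."""
    return ""

def reversal_gap(pairs) -> tuple[float, float]:
    """Return (forward accuracy, reverse accuracy) over the fact pairs."""
    fwd = sum(a_fwd.lower() in query_model(q_fwd).lower()
              for q_fwd, a_fwd, _, _ in pairs)
    rev = sum(a_rev.lower() in query_model(q_rev).lower()
              for _, _, q_rev, a_rev in pairs)
    n = len(pairs)
    return fwd / n, rev / n

if __name__ == "__main__":
    f, r = reversal_gap(FACT_PAIRS)
    print(f"forward accuracy={f:.2f}, reverse accuracy={r:.2f}")
```

A large gap between forward and reverse accuracy on facts the model demonstrably knows in one direction is the signature the Reversal Curse work describes.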

Sources

Monte Carlo Sampling for Analyzing In-Context Examples

Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction

Benchmarking Systematic Relational Reasoning with Large Language and Reasoning Models

WinoWhat: A Parallel Corpus of Paraphrased WinoGrande Sentences with Common Sense Categorization

A SAT-centered XAI method for Deep Learning based Video Understanding

Boosting MLLM Reasoning with Text-Debiased Hint-GRPO

Implicit In-Context Learning: Evidence from Artificial Language Experiments

JudgeLRM: Large Reasoning Models as a Judge

Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B

Does "Reasoning" with Large Language Models Improve Recognizing, Generating, and Reframing Unhelpful Thoughts?

Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?

POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation

Towards Responsible and Trustworthy Educational Data Mining: Comparing Symbolic, Sub-Symbolic, and Neural-Symbolic AI Methods

YourBench: Easy Custom Evaluation Sets for Everyone

Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure

Large (Vision) Language Models are Unsupervised In-Context Learners

Reasoning Inconsistencies and How to Mitigate Them in Deep Learning

Generative Evaluation of Complex Reasoning in Large Language Models