Enhancing Reasoning and Decision-Making in LLMs and MLLMs

Recent developments in large language models (LLMs) and multimodal large language models (MLLMs) show a marked shift toward enhancing reasoning and decision-making through innovative prompting strategies. A notable trend is the integration of visual and textual information within the reasoning process itself, exemplified by interleaved-modal Chain-of-Thought (CoT) methods that incorporate images directly into the reasoning chain. These methods strengthen fine-grained associations between visual inputs and textual outputs, improving both the interpretability and the accuracy of model responses. There is also growing attention to prompt engineering that balances cost against accuracy, with new metrics such as the Economical Prompting Index (EPI) proposed to quantify cost-effectiveness. Work on zero-shot and few-shot reasoning continues to advance, with models designed to tackle tasks without prior task-specific examples by mimicking human thought processes; relatedly, cognitive processes such as association and counterfactual thinking are being leveraged to construct thought trees that support complex problem solving. Collectively, these developments point toward more sophisticated, context-aware models that can handle real-world scenarios more effectively.
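As background on the prompting strategies discussed above, zero-shot CoT prompting in its simplest form amounts to appending a reasoning-trigger phrase (the well-known "Let's think step by step" from Kojima et al.) to a question, with no task-specific examples. The sketch below is illustrative only and not the method of any single paper listed here; the model call is deliberately left as a stub so any chat-completion backend can be substituted.

```python
# Minimal sketch of zero-shot Chain-of-Thought prompt construction.
# The trigger phrase elicits step-by-step reasoning without few-shot examples.

def build_direct_prompt(question: str) -> str:
    """Baseline prompt with no reasoning trigger, for comparison."""
    return f"Q: {question}\nA:"

def build_zero_shot_cot_prompt(question: str) -> str:
    """Append the zero-shot CoT trigger to elicit intermediate reasoning steps."""
    return f"Q: {question}\nA: Let's think step by step."

def answer(prompt: str) -> str:
    """Stub for an LLM call; replace with any chat-completion API."""
    raise NotImplementedError("Wire up your model backend here.")

question = "A store sells pens in packs of 12. How many packs cover 150 pens?"
print(build_direct_prompt(question))
print(build_zero_shot_cot_prompt(question))
```

In practice, the CoT variant is compared against the direct baseline on a held-out set; this comparison is exactly where cost-accuracy metrics like the EPI become relevant, since the longer reasoning traces increase token cost.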

Sources

Interleaved-Modal Chain-of-Thought

Enhancing Zero-shot Chain of Thought Prompting via Uncertainty-Guided Strategy Selection

Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Reasoning

LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations

Can We Afford The Perfect Prompt? Balancing Cost and Accuracy with the Economical Prompting Index

PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving

Chain-of-Thought in Large Language Models: Decoding, Projection, and Activation

MTMT: Consolidating Multiple Thinking Modes to Form a Thought Tree for Strengthening LLM

Evolutionary Pre-Prompt Optimization for Mathematical Reasoning

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
