Balancing Text and Vision in Long-Context Models

Recent work on large language models (LLMs) and large vision-language models (LVLMs) has focused on strengthening long-context reasoning and multi-document processing. A recurring theme is the need to balance textual and visual information: in long-context settings, models tend to over-rely on text, and techniques such as context pruning and hierarchical prompt tuning have been proposed to preserve visual grounding while handling extended inputs. There is also growing attention to robustness and generalization, with reinforcement learning and contrastive objectives used to reduce overfitting to specific prompts or environments. Reward modeling is advancing through weak supervision and AI feedback, which offer scalable alternatives to extensive manual labeling. Finally, evaluations of document-level tasks have exposed the limitations of traditional metrics such as BLEU, motivating more nuanced evaluation paradigms. Taken together, the field is moving toward more sophisticated, context-aware models that can better handle complex, real-world tasks.
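To make one of these ingredients concrete, the sketch below shows a generic InfoNCE-style contrastive loss that pulls together the embeddings of paraphrased prompts for the same task and pushes apart prompts for different tasks. This is a minimal illustration of the general technique, not the specific method of the cited prompt-overfitting paper; the prompt encoder producing the embeddings and the batch layout (one anchor/positive paraphrase pair per task) are assumptions for the example.

```python
# Minimal sketch (assumed setup, not the cited paper's method): an InfoNCE-style
# contrastive loss that encourages prompt-invariant representations.
import torch
import torch.nn.functional as F

def prompt_contrastive_loss(anchor: torch.Tensor,
                            positive: torch.Tensor,
                            temperature: float = 0.1) -> torch.Tensor:
    """anchor, positive: [batch, dim] embeddings of two paraphrases per task."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    # Similarity of every anchor against every positive; the diagonal entries
    # are the matching paraphrase pairs, which serve as the positives.
    logits = anchor @ positive.t() / temperature
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)

# Example: 4 tasks, 128-dim embeddings from a hypothetical prompt encoder.
loss = prompt_contrastive_loss(torch.randn(4, 128), torch.randn(4, 128))
print(loss.item())
```

The design choice here is standard for contrastive objectives: using in-batch negatives keeps the loss cheap to compute while still discouraging the encoder from latching onto surface wording of any single prompt.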

Sources

Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models

Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting

SCULPT: Systematic Tuning of Long Prompts

Reward Modeling with Weak Supervision for Language Models

Instruction-Tuned LLMs Succeed in Document-Level MT Without Fine-Tuning -- But BLEU Turns a Blind Eye

LongReward: Improving Long-context Large Language Models with AI Feedback

UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function

Understanding Synthetic Context Extension via Retrieval Heads

Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback

MDCure: A Scalable Pipeline for Multi-Document Instruction-Following

Dynamic Uncertainty Ranking: Enhancing In-Context Learning for Long-Tail Knowledge in LLMs

On Positional Bias of Faithfulness for Long-form Summarization

What is Wrong with Perplexity for Long-context Language Modeling?
