Efficient and Interpretable Fact-Checking in LLMs

Research on fact-checking and hallucination detection for large language models (LLMs) is shifting toward more efficient, interpretable, and domain-agnostic solutions. Recent work reduces reliance on costly LLM fine-tuning and external knowledge bases, instead leveraging internal model states and compact, open-source models for faster, cheaper verification. Symbolic reasoning and natural logic inference are gaining traction, particularly for tabular data and arithmetic reasoning, making fact-checking systems more verifiable and flexible. There is also growing emphasis on frameworks that provide statistical guarantees for factuality testing, so that high-stakes applications can rely on LLM outputs with bounded error rates. Finally, new decoding methods improve factual accuracy without significant latency overhead, and unified approaches to reliability evaluation are being designed to operate flexibly across diverse contexts.
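
The lightweight, NLI-based direction can be illustrated with a minimal sketch: a compact entailment model scores whether retrieved evidence supports a generated claim, and low support flags a likely hallucination. The specific checkpoint, the 0.5 threshold, and the example passages below are illustrative assumptions, not details from any of the cited papers.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical choice of compact NLI checkpoint; any small open-source NLI
# model with an entailment label can be substituted.
MODEL = "microsoft/deberta-base-mnli"
tok = AutoTokenizer.from_pretrained(MODEL)
nli = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

@torch.no_grad()
def support_score(evidence: str, claim: str) -> float:
    """Probability that the evidence (premise) entails the claim (hypothesis)."""
    inputs = tok(evidence, claim, return_tensors="pt", truncation=True)
    probs = torch.softmax(nli(**inputs).logits, dim=-1)[0]
    # Assumes the model's entailment label contains "entail" (true for MNLI heads).
    entail_idx = next(i for i, name in nli.config.id2label.items()
                      if "entail" in name.lower())
    return probs[entail_idx].item()

# Flag a generated sentence that no retrieved passage supports.
passages = ["The Eiffel Tower was completed in 1889 for the World's Fair."]
claim = "The Eiffel Tower was completed in 1799."
if max(support_score(p, claim) for p in passages) < 0.5:  # illustrative threshold
    print("claim is unsupported by the retrieved evidence")
```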

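The statistical-guarantee angle can be made concrete with a generic split-conformal calibration sketch (not the procedure of any specific paper above): using a calibration set of answers known to be wrong, pick a confidence threshold so that a new wrong answer is accepted with probability at most alpha. The score distribution and the alpha value below are illustrative assumptions.

```python
import numpy as np

def calibrate_threshold(wrong_answer_scores, alpha=0.05):
    """Choose tau so that a new *incorrect* answer gets confidence > tau with
    probability at most alpha, assuming exchangeability with the calibration set."""
    s = np.sort(np.asarray(wrong_answer_scores))
    n = len(s)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample corrected quantile rank
    return np.inf if k > n else s[k - 1]     # np.inf => always abstain (too little data)

# Usage sketch with hypothetical calibration scores.
rng = np.random.default_rng(0)
wrong_scores = rng.beta(2, 5, size=500)      # model confidence on known-wrong answers
tau = calibrate_threshold(wrong_scores, alpha=0.05)
# At inference: answer only when confidence > tau, otherwise abstain.
```
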
Noteworthy developments include a decoding framework that improves factual accuracy by drawing on latent knowledge already present within the LLM, a unified approach to reliability evaluation that markedly improves performance across hallucination-detection benchmarks, and a lightweight fact-checker that uses compact NLI models to detect nonfactual output from retrieval-augmented generation systems in real time.
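
The decoding direction can be illustrated, in spirit only, with a generic layer-contrastive sketch: project an intermediate layer's hidden state through the output head and prefer tokens whose probability grows between that layer and the final layer. This is not the actual update rule of any paper listed below; the model choice (gpt2), the layer index, and the plausibility cutoff are assumptions for illustration.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")             # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def contrastive_next_token(prompt: str, early_layer: int = 8, cutoff: float = 0.1) -> str:
    out = model(**tok(prompt, return_tensors="pt"), output_hidden_states=True)
    log_p_final = torch.log_softmax(out.logits[0, -1], dim=-1)
    # Project an intermediate hidden state through the final norm and LM head (GPT-2 layout).
    early_hidden = out.hidden_states[early_layer][0, -1]
    early_logits = model.lm_head(model.transformer.ln_f(early_hidden))
    log_p_early = torch.log_softmax(early_logits, dim=-1)
    # Keep only tokens already plausible under the final layer, then prefer the
    # token whose log-probability grew the most across layers.
    mask = log_p_final >= log_p_final.max() + math.log(cutoff)
    contrast = (log_p_final - log_p_early).masked_fill(~mask, float("-inf"))
    return tok.decode(contrast.argmax().item())

print(contrastive_next_token("The capital of France is"))
```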

Sources

Is Our Chatbot Telling Lies? Assessing Correctness of an LLM-based Dutch Support Chatbot

FIRE: Fact-checking with Iterative Retrieval and Verification

Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output

TabVer: Tabular Fact Verification with Natural Logic

AMREx: AMR for Explainable Fact Verification

Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance?

SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models

Graph-based Confidence Calibration for Large Language Models

FactTest: Factuality Testing in Large Language Models with Statistical Guarantees

VERITAS: A Unified Approach to Reliability Evaluation

RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation

Measuring short-form factuality in large language models

Prompt-Guided Internal States for Hallucination Detection of Large Language Models
