Enhancing Security and Privacy in Large Language Models

Current Trends in Large Language Model Security and Privacy

Recent advancements in the field of Large Language Models (LLMs) have predominantly focused on enhancing security and privacy measures to mitigate risks associated with model misuse and unauthorized access. The research community is actively developing innovative techniques to assert model ownership, detect unauthorized usage, and protect sensitive information embedded within the models. Key areas of development include robust fingerprinting methods that withstand model merging, novel watermarking techniques for identifying machine-generated content, and advanced methods for detecting pretraining data. These innovations not only address current challenges but also pave the way for more secure and trustworthy LLMs.

Noteworthy Developments

  • Robust Fingerprinting against Model Merging: A novel approach introduces robust fingerprints that remain detectable even after model merging, addressing a significant gap in current methods.
  • Token-Level Characterization for Secret Detection: An innovative method leverages token probabilities to distinguish real from fake secrets in Code LLMs, significantly enhancing privacy protection.
  • Frequency-Based Watermarking: A new watermarking technique embeds frequency-based watermarks in LLM-generated text, enabling accurate identification while remaining robust against various attacks.
  • Fine-Tuning for Pretraining Data Detection: A method improves the detection of pretraining data by leveraging fine-tuning on unseen data, significantly enhancing the accuracy of current scoring functions.
  • Efficient Watermark Removal: An advanced framework effectively removes n-gram-based watermarks, exposing robustness limitations in current watermarking schemes.
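To make the watermarking items above concrete, the sketch below shows the generic green-list (n-gram-based) detection idea that schemes of this family rely on: each token's "green list" is pseudorandomly seeded by the preceding token, and a one-proportion z-test checks whether the text is biased toward green tokens. This is a minimal illustration only; the function names, hashing scheme, and parameters are assumptions for exposition and do not reproduce the exact methods of FreqMark, MergePrint, or De-mark.

```python
import hashlib
import math

def green_fraction(tokens, vocab_size=50_000, gamma=0.5):
    """Fraction of tokens falling in the 'green list' seeded by the previous token.

    gamma is the expected green fraction for unwatermarked text.
    """
    hits = 0
    for prev, tok in zip(tokens, tokens[1:]):
        # Seed a pseudorandom partition of the vocabulary from the previous token.
        seed = int(hashlib.sha256(str(prev).encode()).hexdigest(), 16)
        # A token counts as 'green' if its seeded hash lands in the bottom gamma slice.
        h = (seed ^ (tok * 2654435761)) % vocab_size
        hits += h < gamma * vocab_size
    return hits / max(len(tokens) - 1, 1)

def z_score(tokens, vocab_size=50_000, gamma=0.5):
    """One-proportion z-test: watermarked text should be biased toward green tokens."""
    n = len(tokens) - 1
    p = green_fraction(tokens, vocab_size=vocab_size, gamma=gamma)
    return (p - gamma) * math.sqrt(n) / math.sqrt(gamma * (1 - gamma))
```

A generator that nudges sampling toward green tokens produces text with a large positive z-score; removal attacks such as the one described above aim to paraphrase or re-tokenize text until this statistic falls below the detection threshold.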

Sources

MergePrint: Robust Fingerprinting against Merging Large Language Models

Decoding Secret Memorization in Code LLMs Through Token-Level Characterization

FreqMark: Frequency-Based Watermark for Sentence-Level Detection of LLM-Generated Text

Fine-tuning can Help Detect Pretraining Data from Large Language Models

Periodic autocorrelation of sequences

Incremental computation of the set of period sets

Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay

UTF: Undertrained Tokens as Fingerprints - A Novel Approach to LLM Identification

Stability properties for subgroups generated by return words

Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models

A Watermark for Order-Agnostic Language Models

De-mark: Watermark Removal in Large Language Models
