Privacy-Preserving Machine Learning

Report on Current Developments in Privacy-Preserving Machine Learning

General Direction of the Field

Recent advancements in privacy-preserving machine learning (PPML) have been driven by the need to protect sensitive information while maintaining the performance and scalability of machine learning models, particularly large language models (LLMs) and automatic speech recognition (ASR) systems. The field is moving towards more efficient and adaptive privacy mechanisms that balance computational cost against robust privacy guarantees. Innovation is concentrated in parameter-efficient fine-tuning, adaptive noise allocation, and novel cryptographic techniques that enhance privacy without compromising model utility.

One of the key trends is the integration of differential privacy (DP) with parameter-efficient fine-tuning methods, which mitigates privacy risks at reduced computational overhead. This approach is particularly beneficial for large models, where traditional DP training of all parameters can be prohibitively expensive. Additionally, there is growing interest in leveraging homomorphic encryption (HE) and other cryptographic protocols to enable privacy-preserving computation directly on encrypted data, addressing the privacy concerns associated with personalized model interactions.
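The core of DP training, whether applied to all parameters or only to a small fine-tuned subset, is the DP-SGD update: clip each per-example gradient, average, and add calibrated Gaussian noise. The sketch below is a minimal NumPy illustration of that step; the function name and hyperparameter values are illustrative and not drawn from any of the papers discussed here.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0,
                lr=0.1, rng=None):
    """One DP-SGD update on a parameter vector (illustrative sketch).

    Each per-example gradient is clipped to L2 norm `clip_norm`, the clipped
    gradients are averaged, and Gaussian noise with standard deviation
    noise_multiplier * clip_norm / batch_size is added before the step.
    """
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down (never up) so every example's influence is bounded.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0,
                       noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean_grad.shape)
    return -lr * (mean_grad + noise)
```

Parameter-efficient variants apply exactly this update, but only to the small set of trainable parameters, which is where the computational savings come from: per-example gradients need only be materialized for that subset.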

Another significant development is the exploration of adaptive privacy mechanisms that dynamically adjust privacy levels based on the context of the data and the specific requirements of the application. This adaptive approach aims to provide more granular control over privacy-utility trade-offs, making it feasible to deploy privacy-preserving models in real-time and high-stakes environments.
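One simple way to realize such context-dependent privacy is to interpolate the privacy parameter ε between a conservative floor and a permissive ceiling based on a per-query sensitivity score, then add Laplace noise scaled accordingly. The sketch below is a toy illustration of this idea; the interpolation scheme, parameter names, and values are invented for exposition and are not taken from any cited paper.

```python
import numpy as np

def adaptive_laplace(value, sensitivity, context_score,
                     eps_low=0.5, eps_high=2.0, rng=None):
    """Laplace mechanism with a context-dependent privacy budget (toy sketch).

    `context_score` in [0, 1] encodes how non-sensitive the query context is:
    0 -> use the strictest budget eps_low (most noise),
    1 -> use the loosest budget eps_high (least noise).
    """
    rng = rng or np.random.default_rng(0)
    eps = eps_low + (eps_high - eps_low) * context_score
    # Standard Laplace mechanism: noise scale = sensitivity / epsilon.
    return value + rng.laplace(0.0, sensitivity / eps)
```

A real adaptive mechanism must also account for the total budget spent across queries (e.g., via composition theorems), which this sketch omits.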

Furthermore, the use of knowledge distillation and federated learning techniques is gaining traction, particularly in scenarios where large models need to be fine-tuned on private data without exposing the data to external servers. These methods enable the transfer of knowledge from large, privacy-sensitive models to smaller, more manageable models while maintaining privacy through model-based rather than data-based sharing.
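The transfer step in such distillation pipelines typically minimizes the KL divergence between the teacher's and student's temperature-softened output distributions; only the teacher's soft labels, not the private training data, cross the trust boundary. The NumPy sketch below shows the standard distillation loss (the temperature value and function names are generic, not specific to any paper above).

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax along the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures
    (the usual convention from the distillation literature).
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return (T * T) * kl.mean()
```

In a privacy-sensitive setting, the teacher's logits themselves may additionally be perturbed or aggregated (as in PATE-style schemes) before the student sees them.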

Noteworthy Papers

  1. Differentially Private Parameter-Efficient Fine-tuning for Large ASR Models: This study sets a new performance benchmark for DP fine-tuning in ASR, achieving significant word error rate reductions while maintaining strong privacy guarantees.

  2. Adaptively Private Next-Token Prediction of Large Language Models: The introduction of Adaptive PMixED significantly reduces privacy loss while preserving utility, making private decoding more practical for LLMs.

  3. Scalable Differential Privacy Mechanisms for Real-Time Machine Learning Applications: The proposed Scalable Differential Privacy (SDP) framework enhances both privacy and performance, showcasing its potential for widespread adoption in sensitive domains.

  4. Encryption-Friendly LLM Architecture: The modified HE-friendly transformer architecture demonstrates significant computational speedups while maintaining comparable performance, providing a viable solution for privacy-preserving LLM services.

  5. Fine-Tuning Language Models with Differential Privacy through Adaptive Noise Allocation: The ANADP algorithm adaptively allocates noise based on parameter importance, narrowing the performance gap between regular and DP fine-tuning.

  6. DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech: DiDOTS achieves significant improvements in privacy performance while retaining utility, making it a promising solution for sensitive medical data.

  7. HF-NTT: Hazard-Free Dataflow Accelerator for Number Theoretic Transform: HF-NTT significantly improves the efficiency of polynomial multiplication, a critical operation in fully homomorphic encryption.

  8. FELLAS: Enhancing Federated Sequential Recommendation with LLM as External Services: FELLAS leverages LLMs to enhance federated sequential recommendation while ensuring privacy through sequence perturbation that satisfies dχ-privacy.

  9. Chameleon: An Efficient FHE Scheme Switching Acceleration on GPUs: Chameleon optimizes GPU-based FHE scheme switching, achieving significant speedups and making hybrid FHE schemes more practical.

  10. MORSE: An Efficient Homomorphic Secret Sharing Scheme Enabling Non-Linear Operation: MORSE introduces an efficient HSS scheme that supports non-linear operations with reduced overhead and improved performance.

  11. Bots can Snoop: Uncovering and Mitigating Privacy Risks of Bots in Group Chats: The proposed SnoopGuard system enhances privacy in group messaging by preventing chatbots from accessing unrelated messages and ensuring sender anonymity.

  12. LLM Cascade with Multi-Objective Optimal Consideration: This paper introduces a multi-objective optimization strategy for LLM cascades, enabling better alignment with real-world application demands.

  13. Private Language Models via Truncated Laplacian Mechanism: The proposed truncated Laplacian mechanism offers a novel approach to private word embedding with lower variance and high utility in the high privacy regime.
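To make the last mechanism concrete: a truncated Laplacian restricts Laplace noise to a bounded interval, which lowers its variance relative to the unbounded distribution at the same scale. The rejection sampler below is one standard, generic way to draw such noise; it is a sketch of the general idea, not the paper's exact mechanism or calibration.

```python
import numpy as np

def truncated_laplace(scale, bound, size=1, rng=None):
    """Sample from Laplace(0, scale) restricted to [-bound, bound].

    Uses simple rejection sampling: draw Laplace noise and keep only the
    draws that fall inside the truncation interval.
    """
    rng = rng or np.random.default_rng(0)
    out = []
    while len(out) < size:
        x = rng.laplace(0.0, scale, size=size)
        out.extend(x[np.abs(x) <= bound].tolist())
    return np.array(out[:size])
```

Because the tails are cut off, every sample's magnitude is bounded and the noise variance is strictly smaller than 2·scale², which is the intuition behind the utility gains in the high-privacy regime.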

Sources

Differentially Private Parameter-Efficient Fine-tuning for Large ASR Models

Adaptively Private Next-Token Prediction of Large Language Models

Scalable Differential Privacy Mechanisms for Real-Time Machine Learning Applications

Encryption-Friendly LLM Architecture

Fine-Tuning Language Models with Differential Privacy through Adaptive Noise Allocation

DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech

HF-NTT: Hazard-Free Dataflow Accelerator for Number Theoretic Transform

FELLAS: Enhancing Federated Sequential Recommendation with LLM as External Services

Extended Functional Representation Lemma: A Tool For Privacy, Semantic Representation, Caching, and Compression Design

KnowledgeSG: Privacy-Preserving Synthetic Text Generation with Knowledge Distillation from Server

Chameleon: An Efficient FHE Scheme Switching Acceleration on GPUs

MORSE: An Efficient Homomorphic Secret Sharing Scheme Enabling Non-Linear Operation

Bots can Snoop: Uncovering and Mitigating Privacy Risks of Bots in Group Chats

LLM Cascade with Multi-Objective Optimal Consideration

Private Language Models via Truncated Laplacian Mechanism
