Recent work on large language models (LLMs) reflects a shift toward more efficient and privacy-conscious approaches. Researchers are developing frameworks for the continual personalization of evolving LLMs with minimal computational resources, addressing the challenges posed by frequent model updates and limited access to fine-tuning datasets. There is also growing emphasis on enhancing the customizability of semi-open LLMs while safeguarding against recovery attacks, which has led to the exploration of designs with fewer closed-source layers. Finally, the need for private inference has driven the development of architectures optimized for processing encrypted inputs efficiently, reducing reliance on computationally intensive nonlinear operations. Together, these trends aim to make LLMs more adaptable, secure, and accessible across a broader range of applications, particularly in sensitive domains such as healthcare.
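To make the personalization trend concrete, one common training-free pattern keeps user-specific knowledge in an external vector memory and injects retrieved entries into the prompt, so the underlying model can be upgraded without any parameter updates. The sketch below is illustrative only, not the method of any particular paper: `embed` is a stand-in for an off-the-shelf sentence embedder, and names such as `UserMemory` and `personalized_prompt` are hypothetical.

```python
# Minimal sketch of training-free personalization via an external vector memory.
# Nothing here updates model weights, so swapping in a newer base LLM requires
# no re-tuning. `embed` is a placeholder for a real sentence embedder.
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedder: a deterministic pseudo-random unit vector per text."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

class UserMemory:
    """Stores (embedding, fact) pairs and retrieves the most similar facts."""
    def __init__(self) -> None:
        self.entries: list[tuple[np.ndarray, str]] = []

    def add(self, fact: str) -> None:
        self.entries.append((embed(fact), fact))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: -float(e[0] @ q))
        return [fact for _, fact in ranked[:k]]

def personalized_prompt(memory: UserMemory, query: str) -> str:
    """Prepend retrieved user facts so any base LLM sees the personalization."""
    context = "\n".join(memory.retrieve(query))
    return f"Known user context:\n{context}\n\nUser request: {query}"
```

Because the memory lives entirely outside the model, GPU memory usage stays at inference-only levels and the same store carries over unchanged when the base model is replaced.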
Noteworthy papers include one that introduces a training-free framework for the continual personalization of evolving LLMs, matching the performance of fine-tuning methods while using significantly less GPU memory. Another proposes a novel design for semi-open LLMs that enhances customizability while remaining resilient to recovery attacks, using a fine-tuning-free metric to estimate the maximum number of layers that can safely be made publicly accessible. A third presents an architectural optimization framework that reduces the number of nonlinear operations in LLMs, enabling more efficient private inference with substantial reductions in communication and latency.
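As a rough illustration of the semi-open design, a decoder-only transformer can be partitioned into a closed prefix that stays behind an API and an open suffix released for customization; keeping fewer layers closed gives users more to fine-tune but makes recovery attacks easier. The PyTorch sketch below shows only the split mechanics under that assumption; the paper's actual fine-tuning-free metric for choosing the split point is not reproduced here.

```python
# Hypothetical split of a layer stack into a closed-source prefix and an
# open-source suffix. In deployment the prefix would run server-side only.
import torch
import torch.nn as nn

def split_model(blocks: nn.ModuleList, n_closed: int) -> tuple[nn.Module, nn.Module]:
    """Return (closed prefix, open suffix). `n_closed` would be the total layer
    count minus the estimated maximum number of safely public layers."""
    closed = nn.Sequential(*blocks[:n_closed])
    open_ = nn.Sequential(*blocks[n_closed:])
    for p in closed.parameters():
        p.requires_grad_(False)  # closed weights are never updated or exposed
    return closed, open_

# Toy example: 8 identical MLP blocks standing in for transformer layers.
blocks = nn.ModuleList(nn.Sequential(nn.Linear(64, 64), nn.GELU()) for _ in range(8))
closed, open_suffix = split_model(blocks, n_closed=2)  # only 2 layers stay closed

h = torch.randn(1, 64)
with torch.no_grad():          # the closed prefix is inference-only for users
    h = closed(h)
out = open_suffix(h)           # users may fine-tune these layers freely
```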
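For the private-inference trend, the underlying observation is that encrypted or secret-shared computation handles additions and multiplications cheaply but pays heavily for exact nonlinearities such as GELU, so optimized architectures substitute low-degree polynomials that need only those cheap operations. The sketch below shows one such substitution; the coefficients and the `PolyGELU`/`make_pi_friendly` names are illustrative assumptions rather than the paper's method, and in practice models are typically distilled or fine-tuned after the swap to recover accuracy.

```python
# Hedged sketch: replace GELU with a degree-2 polynomial so each activation
# costs one extra multiplication under HE/MPC instead of an expensive protocol
# for the exact nonlinearity. Coefficients are illustrative, not from any paper.
import torch
import torch.nn as nn

class PolyGELU(nn.Module):
    """Quadratic stand-in for GELU: a*x^2 + b*x + c (illustrative coefficients)."""
    def __init__(self, a: float = 0.125, b: float = 0.25, c: float = 0.5):
        super().__init__()
        self.a, self.b, self.c = a, b, c

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.a * x * x + self.b * x + self.c

def make_pi_friendly(model: nn.Module) -> nn.Module:
    """Recursively swap every nn.GELU for PolyGELU, in place."""
    for name, child in model.named_children():
        if isinstance(child, nn.GELU):
            setattr(model, name, PolyGELU())
        else:
            make_pi_friendly(child)
    return model

# Example: an MLP block before and after the swap.
mlp = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
mlp = make_pi_friendly(mlp)
print(mlp)  # the GELU is now a PolyGELU
```

The same pressure applies to softmax and layer normalization, which such frameworks also approximate or restructure, and this is where the reported savings in communication and latency come from.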