Recent advances in large language models (LLMs) have focused on tuning and compression techniques that are efficient, privacy-preserving, and tailored to domain-specific applications, with methods increasingly judged by how well they balance computational cost, privacy protection, and model performance. A notable trend is the shift toward layer-wise compression and selective tuning, which cuts model size substantially with little loss of accuracy. These approaches rely on algorithms that dynamically estimate the importance of individual layers and adapt the retained layers to the target domain, yielding significant inference speedups and memory savings; a simplified sketch of layer-importance scoring follows below.

The integration of stochastic gates with low-rank adaptation during finetuning has also shown promise for improving accuracy while limiting computational overhead (see the gated-adapter sketch below). The field is likewise moving toward unified tuning-and-pruning frameworks that optimize model structure and finetuning jointly rather than in separate stages, improving performance on domain-specific tasks. Notably, techniques such as ScaleOT and ATP are setting new benchmarks in privacy-utility scalability and all-in-one tuning, respectively, and point to further advances in this rapidly evolving area.
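As a concrete illustration of layer-importance scoring, the following generic sketch (not the method of any specific work cited here) ranks transformer blocks by how much each one changes its input on a small calibration batch; blocks whose outputs barely differ from their inputs are natural candidates for removal. The `blocks` argument is assumed to be a list of modules mapping hidden states to hidden states of the same shape.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_layers(blocks, hidden):
    """Score each block by how much it transforms its input.

    A block whose output is nearly identical to its input (cosine
    similarity close to 1) contributes little and scores near 0.
    """
    scores = []
    for block in blocks:
        out = block(hidden)
        sim = F.cosine_similarity(hidden, out, dim=-1).mean()
        scores.append(1.0 - sim.item())  # higher score = more important layer
        hidden = out
    return scores

def keep_top_blocks(blocks, scores, keep):
    """Return the `keep` highest-scoring blocks, preserving depth order."""
    top = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)[:keep]
    return torch.nn.ModuleList(blocks[i] for i in sorted(top))
```

In practice the pruned model is usually re-adapted afterward to recover accuracy; the scoring above covers only the selection step.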
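The combination of stochastic gates and low-rank adaptation can likewise be sketched as a LoRA-style linear layer whose low-rank update is scaled by a learnable stochastic gate. This is a minimal sketch under common assumptions (a relaxed-Bernoulli gate with logistic noise, frozen base weights); the class and parameter names (`GatedLoRALinear`, `log_alpha`) are hypothetical, not taken from any cited method.

```python
import torch
import torch.nn as nn

class GatedLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank update scaled by a stochastic gate."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # base weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # update starts at zero
        self.scaling = alpha / rank
        self.log_alpha = nn.Parameter(torch.zeros(()))  # gate logit

    def gate(self) -> torch.Tensor:
        if self.training:
            # stochastic relaxation: sample a gate in (0, 1) via logistic noise
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            return torch.sigmoid(torch.log(u) - torch.log(1 - u) + self.log_alpha)
        # deterministic gate at inference time
        return torch.sigmoid(self.log_alpha)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = (x @ self.A.T) @ self.B.T * self.scaling
        return self.base(x) + self.gate() * delta
```

Because the gate is differentiable, adapters that do not help the target domain can learn to close, shrinking the effective update and its compute cost.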
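Building on the hypothetical `GatedLoRALinear` above, a unified tuning-and-pruning objective can be sketched as the task loss plus a penalty on the expected number of open gates, so that adapter weights and model structure are optimized in a single pass. The penalty form and the weight `lam` below are illustrative assumptions, not a formulation taken from ScaleOT or ATP.

```python
import torch

def expected_open_gates(gated_layers):
    """Expected number of active adapters under the relaxed gates."""
    return sum(torch.sigmoid(m.log_alpha) for m in gated_layers)

def joint_loss(task_loss, gated_layers, lam=1e-3):
    """Task loss plus a differentiable sparsity penalty on the gates."""
    return task_loss + lam * expected_open_gates(gated_layers)
```

Minimizing this objective drives uninformative gates toward zero, so structural pruning falls out of the same optimization that performs the finetuning.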