The field of large language models (LLMs) is moving toward greater efficiency and lower computational cost. Recent research has focused on novel compression techniques, quantization methods, and caching strategies that make practical deployment of LLMs feasible. These advances have substantially reduced memory consumption and improved inference speed, making LLMs more accessible for real-world applications. Approaches such as nested activation-aware decomposition, task-adaptive group-wise KV cache window selection, and log-distributed quantization have demonstrated strong performance and efficiency. Notable papers include "Large Language Model Compression via the Nested Activation-Aware Decomposition," which proposes a novel post-training compression paradigm for LLMs, and "LogQuant," a 2-bit quantization technique for the KV cache in LLM inference that delivers substantial memory savings while preserving model quality.
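To make the idea of log-distributed quantization concrete, here is a minimal sketch of 2-bit logarithmic quantization of a tensor, in the spirit of KV-cache compression. This is an illustrative toy, not LogQuant's actual algorithm: the function name, the per-tensor log-spaced levels, and the separate sign handling are all assumptions made for clarity.

```python
import numpy as np

def log_quantize_2bit(x, eps=1e-8):
    """Illustrative 2-bit log-scale quantization (not the LogQuant paper's method).

    Magnitudes are snapped to 4 log-spaced levels (2 bits per value);
    the sign is kept separately here for simplicity.
    """
    sign = np.sign(x)
    log_mag = np.log2(np.abs(x) + eps)  # work in exponent space
    lo, hi = log_mag.min(), log_mag.max()
    # 4 log-spaced bin centers between the observed min/max exponents
    levels = np.linspace(lo, hi, 4)
    # nearest-level index per element: this is the stored 2-bit code
    codes = np.argmin(np.abs(log_mag[..., None] - levels), axis=-1)
    # dequantize: back to linear scale, reattach sign
    dequant = sign * (2.0 ** levels[codes])
    return codes.astype(np.uint8), dequant

# toy "KV cache" slice: quantize and reconstruct
x = np.random.randn(4, 8).astype(np.float32)
codes, xq = log_quantize_2bit(x)
```

The log-spaced levels spend resolution near zero, matching the long-tailed magnitude distributions typical of cached activations, which is the intuition behind log-distributed schemes.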