The research area is shifting markedly toward more efficient, lightweight models, with a strong emphasis on model compression and optimization techniques. Innovations in parameter-efficient architectures, such as unified blocks that extend traditional convolutional neural networks (CNNs) to attention-based models, are pushing the limits of what compact models can achieve. These advances improve performance across tasks including visual recognition, dense prediction, and image generation, while keeping the models practical for deployment on resource-constrained devices.

There is also a growing focus on lossless model compression, with novel joint optimization strategies and theoretical frameworks introduced to minimize the performance gap caused by compression. These methods aim to stabilize and improve model efficiency without sacrificing accuracy, and they have proven effective across diverse network architectures and datasets. Notably, integrating low-rank factorization with quantization within these frameworks has been particularly effective for approaching lossless compression (see the sketch below). Overall, the field is progressing toward efficient, lightweight, high-performing models better suited to real-world applications.
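To make the combination of low-rank factorization and quantization concrete, here is a minimal sketch, illustrative only and not drawn from any specific paper surveyed above. It approximates a weight matrix with a truncated SVD and stores the two factors in int8; the matrix sizes, rank, and bit-width are arbitrary assumptions, and "lossless" in this literature typically refers to preserved task accuracy rather than bit-exact weight recovery.

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Truncated SVD: approximate W (m x n) as A @ B with A (m x r), B (r x n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

def quantize_int8(X: np.ndarray):
    """Symmetric per-tensor int8 quantization with a single float scale."""
    scale = np.abs(X).max() / 127.0
    q = np.clip(np.round(X / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    # Synthetic weight matrix with approximately low-rank structure
    # (illustrative stand-in for a trained layer's weights).
    rng = np.random.default_rng(0)
    U0 = rng.standard_normal((512, 64)).astype(np.float32)
    V0 = rng.standard_normal((64, 512)).astype(np.float32)
    W = U0 @ V0 + 0.01 * rng.standard_normal((512, 512)).astype(np.float32)

    # Factorize, then quantize each factor independently.
    A, B = low_rank_factorize(W, rank=64)
    qA, sA = quantize_int8(A)
    qB, sB = quantize_int8(B)

    # Reconstruct and measure how much the compressed form deviates.
    W_hat = dequantize(qA, sA) @ dequantize(qB, sB)
    rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)

    # One fp32 matrix vs. two int8 factors (ignoring the two float scales).
    ratio = W.nbytes / (qA.nbytes + qB.nbytes)
    print(f"relative reconstruction error: {rel_err:.4f}")
    print(f"compression ratio: {ratio:.1f}x")
```

The design point the sketch illustrates is that the two techniques compose: factorization shrinks the parameter count, quantization shrinks the bytes per parameter, and the joint optimization strategies mentioned above go further by tuning rank and quantization parameters together so the combined error stays within a lossless (accuracy-preserving) budget.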