Recent research on resource management and optimization for AI applications across computing environments, particularly the edge-cloud continuum and mobile devices, has advanced significantly. The field is moving toward holistic, adaptive frameworks that address multiple performance metrics simultaneously: latency, energy efficiency, accuracy, and throughput. These frameworks leverage techniques such as training-free neural architecture search, approximate computing, and dynamic context-aware scaling to improve performance and resource utilization. Notably, there is growing emphasis on privacy-preserving inference methods and on the integration of AI into ecological studies, reflecting a broadening application spectrum. Noteworthy papers in this area introduce solutions such as HE2C for comprehensive edge-cloud resource management, GradAlign for training-free model performance inference, and QuAKE for speeding up model inference with approximate kernels. These contributions not only advance technical capabilities but also broaden the applicability of AI in diverse, resource-constrained environments.
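To make the idea of training-free performance inference concrete, the following is a toy sketch of a gradient-alignment-style zero-cost proxy: score a randomly initialized model by how consistently its per-sample gradients point in the same direction, without any training. This is an illustrative simplification on a linear model with squared loss, not the actual GradAlign algorithm; all function names and values here are hypothetical.

```python
import numpy as np

def per_sample_gradients(w, X, y):
    """Analytic per-sample gradients of squared loss for a linear model w·x."""
    residuals = X @ w - y               # shape (n,)
    return residuals[:, None] * X       # shape (n, d): grad_i = (w·x_i - y_i) x_i

def alignment_score(G, eps=1e-12):
    """Mean pairwise cosine similarity between per-sample gradients.

    Higher alignment at initialization is used by some zero-cost proxies
    as a cheap, training-free indicator of trainability.
    """
    Gn = G / (np.linalg.norm(G, axis=1, keepdims=True) + eps)
    S = Gn @ Gn.T                       # cosine-similarity matrix
    n = len(G)
    return (S.sum() - np.trace(S)) / (n * (n - 1))  # average off-diagonal

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))
y = X @ rng.normal(size=8)              # synthetic targets
w_init = rng.normal(size=8)             # untrained weights

G = per_sample_gradients(w_init, X, y)
print(f"alignment score: {alignment_score(G):.3f}")
```

In a real training-free NAS setting, such a score would be computed for many candidate architectures on a single mini-batch, and the highest-scoring candidates would be selected without training any of them.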
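The approximate-computing trend can likewise be illustrated with a minimal sketch: replace an exact floating-point matrix multiply with an int8-quantized approximation, trading a small accuracy loss for the chance to use cheaper integer kernels. This is a generic approximate-kernel example, not QuAKE's actual method.

```python
import numpy as np

def quantize_int8(A):
    """Symmetric per-tensor int8 quantization: A ≈ scale * q."""
    scale = np.abs(A).max() / 127.0
    q = np.clip(np.round(A / scale), -127, 127).astype(np.int8)
    return q, scale

def approx_matmul(A, B):
    """Approximate matmul: multiply int8 operands, accumulate in int32,
    then rescale back to float."""
    qa, sa = quantize_int8(A)
    qb, sb = quantize_int8(B)
    return (qa.astype(np.int32) @ qb.astype(np.int32)) * (sa * sb)

rng = np.random.default_rng(1)
A = rng.normal(size=(64, 128)).astype(np.float32)
B = rng.normal(size=(128, 32)).astype(np.float32)

exact = A @ B
approx = approx_matmul(A, B)
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error: {rel_err:.4f}")
```

On hardware with fast integer units, this kind of substitution speeds up inference while keeping the output error small, which is the core bargain approximate kernels offer.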