Holistic AI Resource Management and Optimization

Recent research on resource management and optimization for AI workloads, particularly across the edge-cloud continuum and on mobile devices, has advanced significantly. The field is moving toward holistic, adaptive frameworks that optimize multiple performance metrics simultaneously: latency, energy efficiency, accuracy, and throughput. These frameworks leverage techniques such as training-free neural architecture search, approximate computing, and dynamic context-aware scaling to improve performance and resource utilization. There is also growing emphasis on privacy-preserving inference and on applying AI to ecological studies, reflecting a broadening application spectrum. Noteworthy papers introduce solutions such as HE2C, a holistic approach to allocating latency-sensitive AI tasks across edge and cloud; GradAlign, which infers model performance without training; and QuAKE, which accelerates model inference with quick, approximate kernels for exponential non-linearities. These contributions advance technical capability while broadening the applicability of AI in diverse, resource-constrained environments.
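To make the approximate-kernel idea concrete without reproducing QuAKE's actual kernels, the sketch below uses a well-known illustration of the same general technique: Schraudolph's (1999) bit-manipulation approximation of e^x, which trades a few percent of accuracy for a much cheaper computation than a true exponential. The function name `fast_exp` and the Python implementation are illustrative assumptions, not code from any of the cited papers.

```python
import struct

def fast_exp(x: float) -> float:
    """Approximate e**x via Schraudolph's method: write a scaled value of x
    directly into the exponent bits of an IEEE-754 double.

    NOTE: illustrative sketch only; relative error is up to ~4%, and the
    approximation is only valid for moderate |x| (no overflow handling).
    """
    a = 1048576 / 0.6931471805599453  # 2**20 / ln(2): maps x to exponent units
    b = 1072693248 - 60801            # 1023 * 2**20, minus a bias-correction term
    i = int(a * x + b)
    # Place i in the upper 32 bits of a 64-bit double (lower 32 bits zero).
    return struct.unpack('<d', struct.pack('<Ii', 0, i))[0]
```

In an attention layer this kind of kernel would stand in for the exact exponential inside softmax, where the softmax normalization absorbs much of the systematic error; papers like QuAKE develop more careful approximations with bounded accuracy loss.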

Sources

HE2C: A Holistic Approach for Allocating Latency-Sensitive AI Tasks across Edge-Cloud

GradAlign for Training-free Model Performance Inference

QuAKE: Speeding up Model Inference Using Quick and Approximate Kernels for Exponential Non-Linearities

AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices

TruncFormer: Private LLM Inference Using Only Truncations

Characterizing and Modeling AI-Driven Animal Ecology Studies at the Edge

Simplifying HPC resource selection: A tool for optimizing execution time and cost on Azure

ILASH: A Predictive Neural Architecture Search Framework for Multi-Task Applications

AI-Driven Resource Allocation Framework for Microservices in Hybrid Cloud Platforms

Cost-Performance Evaluation of General Compute Instances: AWS, Azure, GCP, and OCI

MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference

Approximate Top-$k$ for Increased Parallelism
