Advancements in Serverless Computing and Machine Learning Integration

Recent developments in this research area highlight a significant shift toward optimizing and integrating serverless computing with machine learning (ML) models, with a particular focus on large-scale and distributed systems. A notable trend is the exploration of serverless platforms for deploying Mixture-of-Experts (MoE) models, aiming to leverage the scalability and cost-effectiveness of serverless computing while addressing challenges such as skewed expert popularity and communication bottlenecks. Innovations in this space include advanced optimization frameworks and deployment algorithms that substantially reduce billed costs while maintaining high throughput.
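
To make the trade-off concrete, below is a minimal, hypothetical sketch of the deployment search such a framework has to perform: choosing a per-expert memory configuration that meets a latency target at minimum billed cost. The memory options, prices, latency model, and expert popularities are all invented for illustration, and a brute-force search stands in for the Bayesian optimization used in the actual work.

```python
# Hypothetical sketch: pick serverless memory sizes per MoE expert so that a
# latency SLO is met at minimum billed cost. All numbers and the latency model
# are invented; a brute-force search stands in for Bayesian optimization.
from itertools import product

MEMORY_OPTIONS_GB = [1, 2, 4, 8]          # hypothetical function memory sizes
PRICE_PER_GB_SECOND = 0.0000167           # illustrative serverless price
LATENCY_SLO_MS = 200                      # hypothetical per-request latency target

def predict_latency_ms(memory_gb: float, expert_popularity: float) -> float:
    """Toy latency model: more memory -> more CPU -> lower latency;
    popular experts see more queued requests."""
    return 120.0 / memory_gb + 400.0 * expert_popularity / memory_gb

def billed_cost(memory_gb: float, latency_ms: float, requests: int) -> float:
    """Serverless bill is roughly memory x duration x request volume."""
    return memory_gb * (latency_ms / 1000.0) * requests * PRICE_PER_GB_SECOND

def cheapest_config(expert_popularities, requests_per_expert):
    """Search every memory assignment and keep the cheapest one within the SLO."""
    best = None
    for combo in product(MEMORY_OPTIONS_GB, repeat=len(expert_popularities)):
        latencies = [predict_latency_ms(m, p)
                     for m, p in zip(combo, expert_popularities)]
        if max(latencies) > LATENCY_SLO_MS:
            continue
        cost = sum(billed_cost(m, l, requests_per_expert)
                   for m, l in zip(combo, latencies))
        if best is None or cost < best[1]:
            best = (combo, cost)
    return best

if __name__ == "__main__":
    print(cheapest_config([0.6, 0.3, 0.1], requests_per_expert=10_000))
```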

Another emerging direction is the collaboration between Large Language Models (LLMs) and small recommendation models (SRMs) in device-cloud settings. This approach seeks to combine the strengths of LLMs and SRMs, enhancing recommendation systems' ability to capture real-time user preferences efficiently. Strategies such as collaborative training and inference are being developed to improve the practicality and effectiveness of these hybrid models.
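
The sketch below illustrates, under purely hypothetical interfaces, the routing idea behind such device-cloud collaboration: the on-device SRM serves requests from fresh local behaviour and falls back to the cloud LLM only when its confidence is low. The confidence threshold, scoring functions, and toy models are assumptions, not the published design.

```python
# Hypothetical device-cloud routing: serve from the small on-device model
# when it is confident, otherwise pay for a cloud LLM call.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Recommendation:
    items: List[str]
    source: str  # "device-srm" or "cloud-llm"

def recommend(user_events: List[str],
              srm_score: Callable[[List[str]], Tuple[List[str], float]],
              llm_rank: Callable[[List[str]], List[str]],
              confidence_threshold: float = 0.7) -> Recommendation:
    """Route between the on-device SRM and the cloud LLM."""
    items, confidence = srm_score(user_events)   # cheap, real-time, on device
    if confidence >= confidence_threshold:
        return Recommendation(items, "device-srm")
    # Low confidence: accept the latency/cost of a cloud LLM call for better ranking.
    return Recommendation(llm_rank(user_events), "cloud-llm")

# Toy stand-ins so the sketch runs end to end.
def toy_srm(events):
    return sorted(set(events)), 0.5 if len(events) < 3 else 0.9

def toy_llm(events):
    return list(reversed(events))

if __name__ == "__main__":
    print(recommend(["shoes", "socks"], toy_srm, toy_llm))
    print(recommend(["shoes", "socks", "laces"], toy_srm, toy_llm))
```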

Furthermore, progress is being made on benchmarking the performance of different application types across heterogeneous cloud compute services. This research aims to provide insights into matching workloads to the most suitable cloud resources, considering factors such as cost, latency, and energy consumption.
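
As a rough illustration of the measurement side, the following sketch times a workload, reports latency statistics, and converts them into a per-service cost estimate. The service names and prices are placeholders; a real study would invoke the actual cloud services and measure energy where providers expose it.

```python
# Hypothetical benchmark harness: time a workload locally and estimate what
# the same mean duration would cost on a few illustrative services.
import statistics
import time
from typing import Callable, Dict

def benchmark(run_once: Callable[[], None], repetitions: int = 10) -> Dict[str, float]:
    """Time a workload several times and report latency statistics in seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return {
        "p50_s": statistics.median(samples),
        "p95_s": sorted(samples)[int(0.95 * (len(samples) - 1))],
        "mean_s": statistics.fmean(samples),
    }

# Illustrative per-second prices; real comparisons would use provider pricing.
PRICE_PER_SECOND = {"serverless-1gb": 0.0000167, "vm-2vcpu": 0.000026}

def cost_estimate(service: str, stats: Dict[str, float], requests: int) -> float:
    return PRICE_PER_SECOND[service] * stats["mean_s"] * requests

if __name__ == "__main__":
    stats = benchmark(lambda: sum(i * i for i in range(200_000)))
    for service in PRICE_PER_SECOND:
        print(service, stats["p50_s"], cost_estimate(service, stats, 1_000_000))
```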

In the realm of astronomical data processing, scalable solutions leveraging serverless cloud infrastructure are being introduced to facilitate large-scale inference tasks. These solutions aim to make deep learning models more accessible and efficient for processing astronomical images.
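
A common pattern behind such pipelines is fanning each image (or tile) out to an independent function invocation and gathering the predictions as they complete. The sketch below mimics that pattern locally, with a thread pool standing in for serverless invocations; the classify() stub and image identifiers are hypothetical.

```python
# Hypothetical fan-out inference: one task per image, results gathered as they
# finish. A thread pool stands in for real serverless function invocations.
from concurrent.futures import ThreadPoolExecutor, as_completed

def classify(image_id: str) -> dict:
    """Stand-in for one serverless invocation running model inference."""
    return {"image": image_id, "label": "galaxy", "score": 0.93}

def run_inference(image_ids, max_parallel=64):
    """Fan out one task per image and collect results as they complete."""
    results = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(classify, img): img for img in image_ids}
        for future in as_completed(futures):
            results.append(future.result())
    return results

if __name__ == "__main__":
    print(run_inference([f"tile-{i:04d}" for i in range(8)]))
```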

Lastly, the development of fully serverless data processing systems and adaptive caching frameworks for mobile edge LLM services represents a push towards more elastic, cost-efficient, and low-latency computing environments. These innovations address the challenges of performance variability and resource constraints in serverless and edge computing contexts.
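
As a minimal sketch of the caching side, the class below caches LLM responses by a normalized query key with LRU eviction and tracks the hit rate that an adaptive policy could react to. Real adaptive contextual caching matches queries semantically and learns its eviction and sizing policy; everything here, including the ContextualCache name, is an illustrative assumption.

```python
# Hypothetical contextual cache for an edge LLM service: cache-then-fall-back
# with LRU eviction and hit-rate tracking. Semantic matching and learned
# policies from the actual framework are not modeled here.
from collections import OrderedDict
from typing import Callable

class ContextualCache:
    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self.entries: OrderedDict[str, str] = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query: str) -> str:
        # Placeholder for semantic matching: here just normalized text.
        return " ".join(query.lower().split())

    def get_or_generate(self, query: str, generate: Callable[[str], str]) -> str:
        key = self._key(query)
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)        # refresh recency on a hit
            return self.entries[key]
        self.misses += 1
        response = generate(query)               # expensive LLM call on a miss
        self.entries[key] = response
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)      # evict least recently used
        return response

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

if __name__ == "__main__":
    cache = ContextualCache(capacity=2)
    llm = lambda q: f"answer({q})"
    for q in ["What is MoE?", "what is moe?", "Define serverless."]:
        print(cache.get_or_generate(q, llm))
    print("hit rate:", cache.hit_rate)
```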

Noteworthy Papers

  • Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing: Introduces a Bayesian optimization framework for cost-efficient MoE model deployment on serverless platforms, significantly reducing billed costs.
  • Collaboration of Large Language Models and Small Recommendation Models for Device-Cloud Recommendation: Proposes a device-cloud collaborative framework that integrates LLMs and SRMs for enhanced recommendation systems, focusing on real-time user preference capture.
  • Scalable Cosmic AI Inference using Cloud Serverless Computing with FMI: Presents a scalable solution for astronomical image data processing using serverless cloud infrastructure, improving accessibility and efficiency.
  • PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks: Develops a progressive inference system for LLMs that enhances throughput and reduces latency through cloud-edge collaboration; a minimal sketch of the underlying draft-then-refine idea follows this list.
  • Adaptive Contextual Caching for Mobile Edge Large Language Model Service: Introduces an adaptive caching framework for mobile edge LLM services, significantly improving cache hit rates and reducing retrieval latency.
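
The sketch below gives one hypothetical reading of the draft-then-refine pattern behind progressive cloud-edge serving: an edge model answers quickly, and the cloud model is consulted only when a cheap quality check rejects the draft. The function names, quality heuristic, and toy models are assumptions rather than PICE's actual mechanism.

```python
# Hypothetical progressive cloud-edge serving: edge draft first, cloud refine
# only when a cheap quality check says the draft is inadequate.
from typing import Callable

def progressive_answer(prompt: str,
                       edge_draft: Callable[[str], str],
                       cloud_refine: Callable[[str, str], str],
                       good_enough: Callable[[str, str], bool]) -> str:
    """Serve from the edge when possible; escalate to the cloud otherwise."""
    draft = edge_draft(prompt)                  # fast, low-cost, near the user
    if good_enough(prompt, draft):
        return draft                            # no cloud round trip needed
    return cloud_refine(prompt, draft)          # cloud model improves the draft

if __name__ == "__main__":
    edge = lambda p: p.upper()                           # toy edge model
    cloud = lambda p, d: d + " (refined by cloud)"       # toy cloud model
    check = lambda p, d: len(d) >= 20                    # toy quality check
    print(progressive_answer("short prompt", edge, cloud, check))
    print(progressive_answer("a much longer, more detailed prompt", edge, cloud, check))
```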

Sources

Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing

Collaboration of Large Language Models and Small Recommendation Models for Device-Cloud Recommendation

Benchmarking Different Application Types across Heterogeneous Cloud Compute Services

Scalable Cosmic AI Inference using Cloud Serverless Computing with FMI

An Empirical Evaluation of Serverless Cloud Infrastructure for Large-Scale Data Processing

Skyrise: Exploiting Serverless Cloud Infrastructure for Elastic Data Processing

PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks

Adaptive Contextual Caching for Mobile Edge Large Language Model Service

MoE$^2$: Optimizing Collaborative Inference for Edge Large Language Models
