20 Nov
Guest lecture: Assoc. Prof. Dr. Minxian Xu
at 13:00

You are invited to a guest lecture by Assoc. Prof. Dr. Minxian Xu, titled "Cloud-native System for Fine-grained and Dynamic Resource Management for LLMs". The lecture will take place on 20 November 2025 at FRI, from 13:00 to 14:00, and will be given in English.


Lecture abstract:
Large Language Models (LLMs) are revolutionizing artificial intelligence by enabling advanced natural language processing applications. However, the computational demands of LLM inference require scalable, efficient, and robust system support. This talk investigates cloud-native techniques to optimize LLM inference serving. Specifically, it explores (1) batching inference requests to optimize key-value (KV) cache utilization for enhanced GPU performance and throughput, (2) leveraging containerization for fine-grained resource control to achieve lightweight scheduling and replication, and (3) dynamically balancing workloads through adaptive resource allocation and scaling. In collaboration with industry leaders, this research integrates first-hand workload data into system design and validation. The outcomes aim to advance system-level support for LLMs, contributing to improved performance and reduced costs for cloud-native AI services.
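To illustrate point (1) of the abstract, the batching idea can be sketched as a greedy scheduler that admits queued requests into a batch only while their worst-case KV-cache footprint fits within a GPU memory budget. This is a minimal hypothetical sketch, not the speaker's actual system; the per-token KV size constant, the `Request` fields, and the `form_batch` function are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Request:
    req_id: str
    prompt_tokens: int
    max_new_tokens: int

# Hypothetical per-token KV-cache cost: 2 tensors (K and V) x 32 layers
# x 4096 hidden dim x 2 bytes (fp16). Real values depend on the model.
KV_BYTES_PER_TOKEN = 2 * 32 * 4096 * 2

def form_batch(queue, cache_budget_bytes):
    """Greedily pack queued requests into one batch while the worst-case
    KV-cache footprint (prompt + maximum generated tokens) stays within
    the GPU memory budget. Admitted requests are removed from the queue."""
    batch, used = [], 0
    for req in list(queue):  # iterate over a copy; we mutate the queue
        need = (req.prompt_tokens + req.max_new_tokens) * KV_BYTES_PER_TOKEN
        if used + need <= cache_budget_bytes:
            batch.append(req)
            used += need
            queue.remove(req)
    return batch, used
```

For example, with a 256 MiB budget and three requests of 200 tokens each (about 100 MiB of worst-case KV cache apiece under the constant above), only the first two are admitted; the third waits for the next batching round, which is where the adaptive scaling of point (3) would come in.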


About the speaker:

Dr. Minxian Xu is currently an Associate Professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (SIAT). His research interests include resource scheduling in clouds, such as load balancing and energy efficiency, and microservice management in cloud-native environments. He has published 80+ research papers in prominent journals and conferences (including 3 ESI highly cited papers), among them CSUR x3, TSC x3, TMC, TAAS, TOIT, TSUSC x5, TNSM, TCC, TGCN, TASE, TCE, ICSOC x4, and ICWS. This research work has attracted 5000+ citations (Google Scholar data).