1) 08/01/2026
WHO: Rana Shahout, Harvard University
WHEN: Thursday, January 8th 2026 at 12:00
WHERE: BUILDING 503 (Computer Science) AUDITORIUM
Title:
Efficient LLM Systems: From Algorithm Design to Deployment
Abstract:
Large Language Models (LLMs) have transformed what machines can do and how systems are designed to serve them. These models are demanding in both computation and memory, exposing the limits of traditional optimization methods that once sufficed for conventional systems. A central challenge in building LLM systems is improving system metrics while preserving response quality.
This talk presents approaches for reducing latency in LLM systems to support interactive applications, from scheduling algorithm design to deployment. It introduces scheduling frameworks that use lightweight predictions of request behavior to make informed decisions about prioritization and memory management across two core settings: standalone LLM inference and API-augmented LLMs that interact with external tools. Across both settings, prediction-guided scheduling delivers substantial latency reductions while remaining practical for deployment.
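(Illustrative sketch, not from the talk: one simple instance of prediction-guided scheduling is to order waiting requests by a lightweight estimate of how many output tokens each still needs, serving the shortest predicted request first. The names below, such as predict_remaining_tokens, are hypothetical placeholders for whatever predictor a real serving system would use.)

    # Toy shortest-predicted-job-first scheduler for LLM requests.
    import heapq
    from dataclasses import dataclass, field

    @dataclass(order=True)
    class Request:
        predicted_remaining: int              # lightweight estimate of tokens left to generate
        request_id: str = field(compare=False)
        prompt: str = field(compare=False)

    def predict_remaining_tokens(prompt: str) -> int:
        # Placeholder predictor: a real system would use a small learned model
        # or running statistics; here we simply guess from prompt length.
        return max(16, len(prompt.split()) * 4)

    def schedule(prompts: dict[str, str]) -> list[str]:
        """Return request ids in the order they would be served:
        shortest predicted remaining output first."""
        queue = [Request(predict_remaining_tokens(p), rid, p) for rid, p in prompts.items()]
        heapq.heapify(queue)
        order = []
        while queue:
            order.append(heapq.heappop(queue).request_id)
        return order

    if __name__ == "__main__":
        print(schedule({"a": "Summarize this paragraph.",
                        "b": "Write a long essay about operating systems."}))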
Bio:
Rana Shahout is a Postdoctoral Fellow at Harvard University, working with Michael Mitzenmacher and Minlan Yu. She received her Ph.D. in Computer Science from the Technion and previously worked as a Senior Software Engineer at Mellanox (now NVIDIA). Her research combines machine learning, systems, and algorithmic theory to design efficient and scalable AI systems. Rana is a recipient of the Eric and Wendy Schmidt Postdoctoral Award, the Zuckerman Postdoctoral Fellowship, the Weizmann Institute Women’s Postdoctoral Career Development Award, the VATAT Postdoctoral Fellowship, and first place in the ACC Feder Family Award for Best Student Work in Communications.