> Keynote Speakers
Prof. Michael R. Lyu
Title: Intelligent Knowledge Discovery for Reliable Cloud Operations
Abstract:
Reliable cloud operations are vital to our daily lives because many popular modern software systems are deployed in the cloud. A prompt and appropriate response to a cloud incident depends on extensive monitoring and on on-call engineers’ rich experience with cloud failures. In this talk, we introduce our experience in developing an AIOps (Artificial Intelligence for IT Operations) framework that uses intelligent knowledge discovery to improve the reliability of large-scale cloud systems. The framework includes anomaly combination detection over key performance indicators, service dependency mining for dynamic dependency discovery, propagation analysis of cascading failures, and system incident discovery for root cause analysis from various information sources such as logs, meter data, topology, alerts, and incident tickets. We also report extensive experiments on production data collected from large-scale industrial cloud systems, which demonstrate how the findings of intelligent knowledge discovery techniques can help on-call engineers handle cloud incidents comprehensively. Finally, we discuss potential roadmaps toward automated knowledge graph construction for reliable cloud operations.
Speaker Short Bio:
Michael R. Lyu is the Choh-Ming Li Professor of Computer Science & Engineering at the Chinese University of Hong Kong. He received a B.S. in Electrical Engineering from National Taiwan University, an M.S. in Electrical and Computer Engineering from the University of California, Santa Barbara, and a Ph.D. in Computer Science from the University of California, Los Angeles. His research interests include software reliability engineering, dependable computing, machine learning, artificial intelligence, and distributed systems. He published the widely cited McGraw-Hill Handbook of Software Reliability Engineering and a Wiley book on Software Fault Tolerance. He is a Fellow of the IEEE, a Fellow of the ACM, a Fellow of the AAAS, and an IEEE Reliability Society Engineer of the Year. He was also named to the AI 2000 Most Influential Scholars Annual List in 2020.