Collaborative Research: CNS Core: Small: Towards Automated and QoE-driven Machine Learning Model Selection for Edge Inference
(NSF-CNS-2007115 and NSF-CNS-2006630)

Overview

Edge devices, such as mobile phones, drones, and robots, have emerged as an increasingly important platform for deep neural network (DNN) inference. For an edge device, selecting an optimal DNN model out of many possibilities is crucial for maximizing the user's quality of experience (QoE), but doing so is significantly challenged by the high degree of heterogeneity in edge devices and constantly changing usage scenarios. The current practice commonly selects a single DNN model for many or all edge devices, which can provide a satisfactory QoE for only a small fraction of users at best. Alternatively, device-specific DNN model optimization is time-consuming and does not scale to a large diversity of edge devices. Moreover, existing approaches focus on optimizing a certain objective metric for edge inference, which may not translate into an improvement in the actual QoE perceived by users. By leveraging the predictive power of machine learning and keeping users in the loop, this project proposes an automated and scalable device-level DNN model selection engine for QoE-optimal edge inference. Specifically, the project includes two thrusts: first, it exploits online learning to predict QoE for each edge device, automating deployment-stage DNN model selection; and second, it builds a runtime QoE predictor and automatically selects an optimal DNN model given runtime contextual information.
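As a concrete illustration of the first thrust, the sketch below casts per-device model selection as a multi-armed bandit, in the spirit of the bandit approach in our IEEE Internet of Things Journal paper listed under Selected Publications. It is a minimal sketch only: the candidate model names and the measure_qoe reward function are hypothetical placeholders, and the project's actual algorithms are considerably more sophisticated.

```python
import math
import random

# Hypothetical candidate models; in practice these would be compressed
# variants of a DNN generated for a given task (names are placeholders).
CANDIDATE_MODELS = ["mobilenet_v2", "resnet18_pruned", "squeezenet_int8"]

counts = {m: 0 for m in CANDIDATE_MODELS}     # times each model was selected
rewards = {m: 0.0 for m in CANDIDATE_MODELS}  # cumulative observed QoE

def select_model(t):
    """UCB1 rule: balance exploring rarely tried models and exploiting good ones."""
    for m in CANDIDATE_MODELS:  # try every model at least once
        if counts[m] == 0:
            return m
    return max(CANDIDATE_MODELS,
               key=lambda m: rewards[m] / counts[m]
                             + math.sqrt(2 * math.log(t) / counts[m]))

def measure_qoe(model):
    """Placeholder for the QoE reward observed after running inference."""
    return random.random()

for t in range(1, 1001):
    model = select_model(t)
    qoe = measure_qoe(model)  # run inference with the chosen model, observe QoE
    counts[model] += 1
    rewards[model] += qoe
```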

This project represents an important departure from and an essential complement to the current practices in DNN model optimization. It can bring the benefits of DNN-enabled intelligence to many more resource-constrained edge devices with an optimal QoE. Additionally, it provides novel observations, insights and principles for edge inference, catalyzing the transformation of the design of DNN models into a new user-centric paradigm. This project also enables new opportunities to improve curriculum design and attract students, especially under-represented minorities, to engage in science, technology, engineering, and mathematics fields.

Challenges

Along with advances in embedded and mobile hardware, recent breakthroughs in DNN model compression (e.g., network pruning and weight quantization) have reduced model sizes by orders of magnitude with an acceptable accuracy loss, successfully making edge inference a reality. Naturally, to run inference on resource-constrained edge devices with a satisfactory user experience (which we refer to as quality of experience, or QoE), inference accuracy is not the sole metric to optimize. Instead, the employed DNN model architecture must be tailored, in an automated manner, to the specific edge device hardware and also adapted to users' input data at runtime, striking an optimal balance among important metrics such as accuracy, latency, and energy consumption. Nonetheless, this is challenged by the extremely high degree of heterogeneity in edge inference in terms of the underlying device hardware, usage scenarios, and DNN models.
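To make the notion of balancing metrics concrete, the snippet below sketches one possible scalar QoE score as a user-weighted combination of accuracy and normalized latency and energy terms. The weights, budgets, and normalization are illustrative assumptions, not the project's actual QoE model.

```python
# A minimal sketch of a scalar QoE score; all weights and budgets below
# are illustrative assumptions rather than the project's actual QoE model.
def qoe_score(accuracy, latency_ms, energy_mj,
              w_acc=0.5, w_lat=0.3, w_energy=0.2,
              lat_budget_ms=100.0, energy_budget_mj=50.0):
    """Higher is better; latency and energy are penalized against budgets."""
    lat_term = max(0.0, 1.0 - latency_ms / lat_budget_ms)
    energy_term = max(0.0, 1.0 - energy_mj / energy_budget_mj)
    return w_acc * accuracy + w_lat * lat_term + w_energy * energy_term

# An energy-sensitive user would weigh energy more heavily than latency:
print(qoe_score(0.91, 42.0, 30.0, w_acc=0.5, w_lat=0.1, w_energy=0.4))
```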

  • Edge device heterogeneity. Take edge inference on mobile platforms as an example. Mobile devices have extremely diverse computing and memory capabilities: some high-end devices have state-of-the-art CPUs along with dedicated graphics processing units (GPUs) and even purpose-built accelerators to speed up inference, whereas many others are powered by CPUs that are several years old.

[Figure: breakdown of mobile CPUs by design year]

As illustrated in the figure above, Facebook's 2018 statistics show that nearly 75% of smartphones use CPUs designed more than six years ago. Further, mobile system-on-a-chip (SoC) vendors typically design their own customized components with blocks licensed from third parties, further contributing to device heterogeneity, especially on the open-source Android platform. In fact, there are thousands of unique SoCs running on more than ten thousand different types of smartphones and tablets; only 30 SoCs each account for more than 1% of the market share, and together these top 30 SoCs cover just 51% of the whole market. Consequently, assuming universal availability of uniform (high-end) hardware for all edge devices is problematic, and edge device heterogeneity results in huge variability in QoE: e.g., even with fine-tuned models, inference latency varies by a factor of 10+ among Facebook mobile users.

  • Usage scenario heterogeneity. DNN models running on different edge devices can be exposed to significantly different usage scenarios: different locations, illumination conditions, temperatures, etc. All of these lead to drastically different distributions of users' input data, which can result in very different inference accuracies even for the same DNN model. Moreover, inference latency can differ significantly even across devices with the same configuration, due to system status (e.g., the number of concurrently running processes, battery conditions, etc.). Last but not least, users have different preferences among the metrics: some users are more energy-sensitive due to limited battery capacities, whereas others prefer to trade energy consumption for lower latency.

  • DNN model heterogeneity. Recent studies have proposed various DNN model compression techniques, such as network pruning, weight quantization, low-rank matrix approximation, and knowledge distillation. As a consequence, even with the same training data set, different compression techniques or optimization parameters, combined with neural architecture search, can generate hundreds or even more distinct DNN models. While many of the resulting lightweight DNN models can be deployed on a target edge device, they exhibit very different tradeoffs in a multi-dimensional space of important metrics (e.g., accuracy vs. latency vs. energy), and no single model can achieve optimality in all dimensions; only a Pareto-optimal subset of models is worth considering, as illustrated in the sketch after this list.

    Consequently, in view of extremely heterogeneous edge devices, usage scenarios, and DNN models, a new challenge arises: how can DNN models be automatically selected for edge inference with an optimal QoE?
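To illustrate the point about DNN model heterogeneity, the sketch below (referenced in the bullet above) filters a set of hypothetical candidate models down to the Pareto-optimal ones over accuracy, latency, and energy. Any dominated model can be discarded outright, while choosing among the remaining models still depends on device- and user-specific QoE; all model names and numbers are made up for illustration.

```python
# Minimal sketch: keep only candidate models on the Pareto frontier of
# (accuracy, latency, energy). Names and measurements are hypothetical.
candidates = {
    "model_a": {"accuracy": 0.92, "latency_ms": 120.0, "energy_mj": 60.0},
    "model_b": {"accuracy": 0.88, "latency_ms": 45.0,  "energy_mj": 35.0},
    "model_c": {"accuracy": 0.85, "latency_ms": 50.0,  "energy_mj": 40.0},
}

def dominates(x, y):
    """x dominates y: no worse in every metric and strictly better in one."""
    no_worse = (x["accuracy"] >= y["accuracy"]
                and x["latency_ms"] <= y["latency_ms"]
                and x["energy_mj"] <= y["energy_mj"])
    strictly_better = (x["accuracy"] > y["accuracy"]
                       or x["latency_ms"] < y["latency_ms"]
                       or x["energy_mj"] < y["energy_mj"])
    return no_worse and strictly_better

pareto = [name for name, m in candidates.items()
          if not any(dominates(o, m) for o in candidates.values() if o is not m)]
print(pareto)  # ['model_a', 'model_b']; model_c is dominated by model_b
```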

Principal Investigators

  1. Shaolei Ren (PI at UC Riverside)

  2. Jie Xu (PI at University of Miami)

Selected Publications

  1. B. Lu, J. Yang, J. Xu, and S. Ren, “Improving QoE of Deep Neural Network Inference on Edge Devices: A Bandit Approach,” IEEE Internet of Things Journal, Jun. 2022.

  2. Y. Bai, L. Chen, S. Ren, and J. Xu, “Automated Customization of On-Device Inference for Quality-of-Experience Enhancement,” IEEE Transactions on Computers, Sep. 2022.

  3. Y. Bai, L. Chen, L. Zhang, M. Abdel-Mottaleb, and J. Xu, “Adaptive Deep Neural Network Ensemble for Inference-as-a-Service on Edge Computing Platforms,” IEEE International Conference on Mobile Ad-Hoc and Smart Systems (MASS), 2021.

  4. Y. Bai, L. Chen, M. Abdel-Mottaleb, and J. Xu, “Automated Ensemble for Deep Learning Inference on Edge Computing Platforms,” IEEE Internet of Things Journal, vol. 9, no. 6, 2022.

Acknowledgement

This project is supported in part by the U.S. NSF under grants CNS-2007115 and CNS-2006630. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.