Evaluating DeepMon, Tensor Comprehension, and Glow: Frameworks for Optimizing Convolutional Networks

By: Poma Panezai   |   Pages: 19 - 27

Abstract

Neural networks share a problem with ordinary programs: they must be executed on an ever-growing number of different architectures. We therefore need a way to run a network efficiently on all of those devices without recreating and retraining it. Ordinary programs solve this problem with compilers that generate machine code for each target architecture from the same codebase, and neural networks can follow the same approach. Several frameworks exist for optimizing and compiling a trained network for different architectures during the inference phase. In this paper we compare “DeepMon”, a framework specialized for mobile GPUs, with “Tensor Comprehension” and “Glow”, two popular general-purpose frameworks. In our evaluations, DeepMon achieves a speedup of up to 5× over baseline GPU frameworks on mobile devices, while Tensor Comprehension and Glow outperform traditional frameworks such as TensorFlow and Caffe2 on server-grade GPUs: Glow achieves inference speeds up to 2.7 times faster than TensorFlow, and Tensor Comprehension matches or exceeds the performance of cuBLAS in the majority of benchmark categories.
DOI URL: https://doi.org/10.64820/AEPJMLDL.22.19.27.122025