We are lucky to have three incredible keynotes at the workshop -- provided by Andreas Moshovos (University of Toronto), Pete Warden (Google Brain), and Laurens van der Maaten (Facebook AI Research) -- which will cover a rich spectrum of the embedded and mobile deep learning space.

What's stopping acceleration? Practical lessons from deploying on specialized hardware
Pete Warden
Google Brain

Hardware such as DSPs and mobile GPUs is an attractive target for running deep learning models because of its low latency and power consumption compared to CPUs. These theoretical advantages can be hard to achieve in practice, though, mostly because adapting a model trained by researchers into a form that is effective on a particular chip can be very difficult. Typical challenges include lower-precision arithmetic, custom operations, memory consumption, and graph partitioning. This talk will discuss examples of these challenges and approaches to overcoming them, with case studies from Qualcomm HVX and iOS Metal integration.
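One of the challenges named above, lower-precision arithmetic, can be illustrated with a minimal sketch of symmetric 8-bit linear quantization. The function names and the single global scale here are simplifications for illustration, not the scheme used by any particular deployment toolchain:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization of a float tensor to int8.
    A single global scale maps [-max|x|, max|x|] onto [-127, 127];
    real toolchains typically use calibrated or per-channel scales."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the int8 values back to floats for error inspection."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(weights)
# Round-to-nearest bounds the reconstruction error by half a step.
error = np.max(np.abs(dequantize(q, s) - weights))
```

Even this tiny example shows why deployment is delicate: the quantization error depends entirely on the value range, so outlier weights or activations can destroy accuracy unless ranges are calibrated carefully.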

Pete Warden is the tech lead of TensorFlow's Mobile and Embedded team, and was previously CTO of Jetpac, acquired by Google in 2014.


Exploiting Value Content to Accelerate Inference with Convolutional Neural Networks
Andreas Moshovos
University of Toronto

Sufficiently capable computing hardware is essential for practical applications of Deep Learning. Until very recently, computing hardware capabilities had been increasing at an exponential rate; as a result, around 2010 they reached the level necessary to demonstrate Deep Learning’s true potential. Unfortunately, semiconductor technology scaling, the key enabler of this exponential growth, has slowed down dramatically. Fortunately, specialized hardware design has the potential to deliver another two to three orders of magnitude of improvement in computing capability.

Our goal is to develop the techniques necessary for boosting computing hardware capability, thereby enabling further innovation in Deep Learning. We are developing specialized computing hardware for Deep Learning networks whose key feature is that it exploits properties in the value domain. In particular, our accelerators take advantage of expected properties in the runtime-calculated value stream of Deep Learning networks, such as the value distribution of activations or even their bit content. Using image-classification convolutional neural networks, we have demonstrated up to 4.5x improvement over a state-of-the-art accelerator. In this talk we will review the need for specialized computing hardware for Deep Learning and summarize our efforts and future directions. We will also briefly touch upon the recently approved NSERC COHESA Strategic Partnership Network on Hardware Acceleration for Machine Learning, which brings together 19 researchers across multiple Canadian universities and 8 industrial partners.
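As a rough illustration of the kind of value-domain property such accelerators can exploit (a toy sketch only, not the design discussed in the talk): in a fixed-point encoding of ReLU activations, most bits are zero, and a bit-serial engine that skips zero bits does work proportional to the fraction of "effectual" (one) bits rather than to the full bit width:

```python
import numpy as np

def effectual_bit_fraction(activations, bits=8):
    """Fraction of bits that are 1 in an unsigned fixed-point
    encoding of non-negative activations. A bit-serial engine that
    skips zero bits performs work proportional to this fraction
    rather than to `bits`. Illustrative sketch only."""
    scale = activations.max() / (2 ** bits - 1)
    q = np.round(activations / scale).astype(np.uint16)
    ones = sum(bin(int(v)).count("1") for v in q.flat)
    return ones / (q.size * bits)

rng = np.random.default_rng(0)
# ReLU output on zero-mean inputs: roughly half the values are zero,
# and the nonzero values cluster well below the maximum.
acts = np.maximum(rng.standard_normal(1000), 0.0)
frac = effectual_bit_fraction(acts)
```

With roughly half the activations zeroed by ReLU and the rest concentrated at small magnitudes, well under half the bits are ones, which is the headroom this style of accelerator targets.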

Andreas Moshovos teaches how to design and optimize computing hardware engines at the University of Toronto, where he has the privilege of collaborating with several talented students on techniques to improve execution time, energy efficiency and cost for computing hardware. He has also taught at Northwestern University, USA, the University of Athens, Greece, the Hellenic Open University, Greece, and as an invited professor at the École Polytechnique Fédérale de Lausanne, Switzerland. He received the ACM SIGARCH Maurice Wilkes award in 2010, an NSF CAREER Award in 2000, two IBM Faculty awards, a Semiconductor Research Corporation Inventor Recognition award, and a MICRO Hall of Fame award. He has served as Program Chair for the ACM/IEEE International Symposium on Microarchitecture and the IEEE International Symposium on the Performance Analysis of Systems and Software. He studied computer science at the University of Crete, Greece, at New York University, USA, and at the University of Wisconsin-Madison, USA.


Convolutional Networks that Trade-Off Accuracy for Speed at Test Time
Laurens van der Maaten
Facebook AI Research

Convolutional networks constitute the core of state-of-the-art approaches to a range of problems in computer vision. Typical networks comprise tens or even hundreds of layers of convolutions with learned filters, which require substantial computational and memory resources. In this talk, I will introduce a new network architecture, called multi-scale DenseNets (MSDNets), that allows for the training of a cascade of multiple classifiers at intermediate layers of the network. This lets us train a single network that, at prediction time, dynamically decides how much of the network to evaluate: for "easy" images, only a small part of the network is evaluated, whilst for "difficult" images, we evaluate the full, high-quality network. MSDNets achieve state-of-the-art performance on image-classification benchmarks with much lower computational requirements. This talk presents joint work with Gao Huang, Danlu Chen, and Kilian Weinberger of Cornell University.
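The early-exit idea behind this cascade can be sketched in a few lines. This is a hypothetical toy stand-in (the stage functions and confidence threshold are invented for illustration), not the actual multi-scale DenseNet architecture: classifier stages run in order, and inference stops as soon as one stage is confident enough.

```python
import numpy as np

def anytime_predict(stages, x, threshold=0.9):
    """Early-exit inference in the spirit of cascaded classifiers:
    evaluate stages in order and stop once the softmax confidence
    of the current stage's logits exceeds `threshold`.
    Returns the predicted class and the number of stages evaluated."""
    for depth, stage in enumerate(stages, start=1):
        logits = stage(x)
        probs = np.exp(logits - logits.max())  # stable softmax
        probs /= probs.sum()
        if probs.max() >= threshold:
            return int(probs.argmax()), depth
    return int(probs.argmax()), depth  # fell through: use the last stage

# Toy stages: later (deeper) stages produce sharper, more confident logits.
stages = [lambda x, t=t: x * t for t in (0.5, 2.0, 8.0)]
label, depth = anytime_predict(stages, np.array([1.0, 0.2, -0.5]))
```

Here the first two stages are not confident enough, so all three run; for an input with a larger logit margin, the loop would exit after the first stage, which is exactly the compute saving on "easy" inputs.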

Laurens van der Maaten is a Research Scientist at Facebook AI Research in New York. Before that, he worked as an Assistant Professor at Delft University of Technology (The Netherlands) and as a post-doctoral researcher at the University of California, San Diego. He received his PhD from Tilburg University (The Netherlands) in 2009. He is an editorial board member of IEEE Transactions on Pattern Analysis and Machine Intelligence and is serving as an area chair for the NIPS and ICML conferences. Laurens is interested in a variety of topics in machine learning and computer vision. Specific research topics include learning embeddings for visualization and deep learning, visual reasoning, object tracking, and cost-sensitive learning.