Fastest DNN Execution Framework on Web Browser

About MIL WebDNN

Run Trained DNN Model on Web Browser

Deep neural networks (DNNs) are attracting much attention for use in many applications. However, they require a lot of computational resources, and setting up an execution environment based on hardware acceleration such as GPGPU involves many cumbersome steps. Providing DNN applications to end users is therefore very hard. WebDNN solves this problem by using the web browser as an installation-free DNN execution framework. The framework optimizes a trained DNN model to compress the model data and accelerate execution, and executes it with novel JavaScript APIs such as WebAssembly and WebGPU to achieve zero-overhead execution. Empirical evaluations showed that it achieved more than 200x acceleration.

Inference-phase-specialized Optimization

To achieve faster execution, optimizing the computation graph of DNN models is very important. Execution of a DNN consists of two phases: the training phase and the inference phase. The training phase updates the parameters with backpropagation. The inference phase makes predictions (forward propagation only) for the actual task. If the framework focuses only on the inference phase, it can optimize the computation graph more aggressively.

WebDNN focuses only on inference-phase execution on end-user devices and supports aggressive optimization. This optimization pipeline can be applied to models trained with various DNN frameworks. Editing the training code is not required.
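As an illustration of why inference-only execution allows more aggressive graph optimization, the sketch below folds a batch-normalization layer into the preceding linear layer. This is a widely used inference-time rewrite, chosen here only as an example; the text does not state which optimizations WebDNN actually applies. All function and field names (foldBatchNorm, linear, batchNorm, gamma, beta, mean, variance) are hypothetical. The fold is valid only because the normalization statistics are fixed after training.

```javascript
// Reference implementation: y = W x + b, one weight row per output channel.
function linear(weights, bias, x) {
  return weights.map((row, c) =>
    row.reduce((acc, w, i) => acc + w * x[i], bias[c]));
}

// Reference implementation of batch normalization at inference time,
// using fixed (already trained) statistics.
function batchNorm(y, bn, eps = 1e-5) {
  return y.map((v, c) =>
    bn.gamma[c] * (v - bn.mean[c]) / Math.sqrt(bn.variance[c] + eps) + bn.beta[c]);
}

// Fold the batch-normalization parameters into the linear layer so that
// the two graph nodes collapse into a single linear node.
function foldBatchNorm(weights, bias, bn, eps = 1e-5) {
  const foldedWeights = [];
  const foldedBias = [];
  for (let c = 0; c < weights.length; c++) {
    const scale = bn.gamma[c] / Math.sqrt(bn.variance[c] + eps);
    foldedWeights.push(weights[c].map((w) => w * scale));
    foldedBias.push((bias[c] - bn.mean[c]) * scale + bn.beta[c]);
  }
  return { weights: foldedWeights, bias: foldedBias };
}

// Illustrative values only (not taken from any real model).
const weights = [[1, 2], [3, 4]];
const bias = [0.5, -0.5];
const bn = { gamma: [2, 0.5], beta: [1, -1], mean: [0.1, 0.2], variance: [4, 9] };
const x = [1, -1];

const twoNodes = batchNorm(linear(weights, bias, x), bn);
const folded = foldBatchNorm(weights, bias, bn);
const oneNode = linear(folded.weights, folded.bias, x);
```

Here twoNodes and oneNode agree up to floating-point rounding, while the folded graph performs one operation instead of two. During training the same rewrite would not be possible, because the batch statistics change at every step.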

Next Generation JavaScript API

JavaScript is executed by an interpreter. It therefore incurs computing overhead and cannot fully harness the capacity of the CPU. The same problem arises on the GPU. Modern web browsers support WebGL, a JavaScript API for using the GPU. However, this API is designed for graphics processing and is not suitable for general-purpose computation. In addition, using WebGL for general-purpose computing incurs overhead costs.

WebDNN uses next-generation JavaScript APIs: WebGPU for GPU execution and WebAssembly for CPU execution. These APIs help bring out the full performance of the GPU and CPU.
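To give a feel for what CPU execution through WebAssembly looks like from JavaScript, here is a minimal sketch that loads a tiny hand-assembled WebAssembly module and calls its exported function. It is not WebDNN code; the module and its add export are made up for illustration. Only the standard WebAssembly.Module and WebAssembly.Instance constructors are used.

```javascript
// A hand-assembled WebAssembly binary that exports add(a, b) on 32-bit integers.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,                               // magic: "\0asm"
  0x01, 0x00, 0x00, 0x00,                               // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

// Synchronous compilation is fine for a module this small; larger modules
// would normally be compiled asynchronously with WebAssembly.instantiate.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule);

// The exported function runs as compiled code rather than interpreted JavaScript.
const sum = wasmInstance.exports.add(2, 3);
console.log(sum); // prints 5
```

A real DNN kernel would export numerical routines such as matrix multiplication and operate on a shared linear memory, but the loading and calling pattern is the same.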

Browser Compatibility

WebDNN supports three execution backend implementations: WebGPU, WebAssembly, and a fallback pure JavaScript implementation. With these three backends, WebDNN works in all major browsers.

WebGPU backend: Computes on the GPU via the WebGPU API. This backend is the fastest of the three, but the WebGPU API is currently supported only in Safari Technology Preview.
WebAssembly backend: Computes on the CPU via the WebAssembly API. This backend is faster than the GPU mode of Keras.js [1]. Combined with asm.js, it works in almost all modern browsers.
Fallback backend: Computes on the CPU using ECMAScript 3. This backend exists only for backward compatibility and is not very fast.
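
The priority among the three backends can be sketched as a simple selection routine. This is a minimal illustration, not WebDNN's actual API: the function name chooseBackend, the backend identifiers, and the shape of the availability object are all assumptions. How each flag is detected in a real browser is left out on purpose.

```javascript
// Priority order taken from the description above: fastest backend first.
const BACKEND_ORDER = ["webgpu", "webassembly", "fallback"];

// Return the first backend in `order` that is marked available.
// `availability` is a hypothetical object such as
// { webgpu: false, webassembly: true, fallback: true }.
function chooseBackend(availability, order = BACKEND_ORDER) {
  for (const name of order) {
    if (availability[name]) {
      return name;
    }
  }
  throw new Error("No usable backend found");
}

// A browser without WebGPU but with WebAssembly/asm.js support.
const chosen = chooseBackend({ webgpu: false, webassembly: true, fallback: true });
console.log(chosen); // prints "webassembly"
```

Keeping the order in a plain array makes it easy to force a specific backend for debugging, for example by passing ["fallback"] as the second argument.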

Browser Compatibility Table

Internet Explorer	Edge	Safari	Chrome	Firefox
10 - 11: WebAssembly/asm.js	- 15: WebAssembly/asm.js	Technology Preview: WebGPU	- 58: WebAssembly/asm.js	- 53: WebAssembly/asm.js
- 9: Fallback		- 10.1: WebAssembly/asm.js

In Safari Technology Preview, the WebGPU API is disabled by default. To enable it, select "Develop" > "Experimental Features" > "WebGPU" in the menu bar.

Benchmark

We measured execution time for VGG16 [2] and ResNet50 [3]. The figure below shows the results compared with Keras.js. Computation time per image is shown on the vertical axis on a logarithmic scale. All tests were run on an early 2015 MacBook Pro with an Intel Core i5 2.7 GHz CPU, 16 GB of memory, and an Intel Iris Graphics 6100 GPU. The web browser was Safari Technology Preview 30.

Neural Style Transfer

This example runs a Neural Style Transfer model [4]. The model is given two input images: a content image and a style image. It then generates an image that renders the content of the content image in the style of the style image.

We use the Chainer [5] implementation provided in [6] and the pre-trained model provided in [7]. The pre-trained model is transpiled by GraphTranspiler into a graph descriptor and then executed by DescriptorRunner. All computation is done in the web browser, not on a server.

ResNet50 Image Classification

In this example you can run a ResNet50 classification model trained on ImageNet [8]. The original pre-trained model is provided in [9]. All computation is done in the web browser, not on a server.

References

[1] https://github.com/transcranial/keras-js
[2] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", International Conference on Learning Representations (ICLR), 2015.
[3] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[4] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution", European Conference on Computer Vision (ECCV), 2016.
[5] https://github.com/pfnet/chainer
[6] https://github.com/yusuketomoto/chainer-fast-neuralstyle
[7] https://github.com/gafr/chainer-fast-neuralstyle-models
[8] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, "ImageNet: A Large-Scale Hierarchical Image Database", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[9] https://github.com/KaimingHe/deep-residual-networks