Norm matters: efficient and accurate normalization schemes in deep networks

Hoffer, Elad; Banner, Ron; Golan, Itay; Soudry, Daniel

(Submitted on 5 Mar 2018 (v1), last revised 8 Mar 2018 (this version, v2))

Abstract: Over the past few years batch-normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remained unanswered, with several shortcomings that hindered its use for certain tasks. In this work we present a novel view on the purpose and function of normalization methods and weight-decay, as tools to decouple weights' norm from the underlying optimized objective. We also improve the use of weight-normalization and show the connection between practices such as normalization, weight decay and learning-rate adjustments. Finally, we suggest several alternatives to the widely used L2 batch-norm, using normalization in L1 and L∞ spaces that can substantially improve numerical stability in low-precision implementations as well as provide computational and memory benefits. We demonstrate that such methods enable the first batch-norm alternative to work for half-precision implementations.

Subjects:	Machine Learning (stat.ML); Learning (cs.LG)
Cite as:	arXiv:1803.01814 [stat.ML]
	(or arXiv:1803.01814v2 [stat.ML] for this version)

Submission history

From: Elad Hoffer
[v1] Mon, 5 Mar 2018 18:16:43 GMT (663kb,D)
[v2] Thu, 8 Mar 2018 13:37:48 GMT (665kb,D)
