Presentation slides.
Video should be available from the conference site at some point.
Bibliography
Optimization
- Ge, Huang, Jin, Yuan COLT'15. Escaping from saddle points: online stochastic gradient for tensor decomposition.
- Jin, Ge, Netrapalli, Kakade, Jordan ICML'17. How to escape saddle points efficiently. Blog post.
- Second-order black-box methods for deep learning. Paper 1 by Agarwal et al. and Paper 2 by Carmon et al.
- Blog post by Arora and Ma: Framework for analysing nonconvex optimization. (Describes the "measure of progress" approach.)
- Non-black-box analyses of simpler problems (subcases of simple neural nets): topic modeling (Arora, Ge, Moitra), sparse coding, matrix completion.
- Ge, Jin, Zheng. No spurious local minima in nonconvex low rank problems: a unified geometric analysis.
- Analyses of multilayer linear nets.
- Hazan's tutorial on optimization in ML.
Generalization
- Zhang, Bengio, Hardt, Recht, Vinyals. Understanding deep learning requires rethinking generalization.
- Belkin et al.'18. To understand deep learning we need to understand kernel learning.
- Blog post 1 by Arora. Generalization and Deep Nets: An Introduction.
- Blog post 2: Proving generalization bounds for deep nets via compression.
- Langford and Caruana. (Not) Bounding the true error.
- Various generalization bounds: Bartlett and Mendelson'02, Neyshabur et al.'17, Neyshabur et al.'18, Arora et al.'18.
- Chaudhari et al. Entropy-SGD: biasing gradient descent into wide valleys.
- Morcos et al. On the role of single directions for generalization.
Expressibility/role of depth
- Eldan, Shamir COLT'16. The power of depth for feedforward neural networks.
- Telgarsky COLT'16. Benefits of depth in neural networks.
- Telgarsky lecture notes.
- Arora, Cohen, Hazan ICML'18. On the optimization of deep networks: implicit acceleration by overparameterization.
Unsupervised learning, GANs
- Bengio, Courville, Vincent 2012. Representation learning: a review and new perspectives.
- Blog post by Arora and Risteski. Unsupervised learning: one notion or many? (Explains a possible gap in thinking of unsupervised learning as distribution learning.)
- Goals and Principles of Representation Learning, blog post by Ferenc Huszár.
- NIPS'16 tutorial on GANs by Ian Goodfellow. (Good survey circa 2016.)
- Arora et al. ICML'17. Generalization and equilibrium in GANs. Blog post.
- Arora, Zhang, Risteski ICLR'18. Do GANs learn the distribution? Some theory and empirics. Blog post A and Blog post B.
Deep-learning-free text embeddings
- Arora. Introductory post, and Post 2 describing the connection to compressed sensing.
- Wieting et al.'16. Towards universal paraphrastic sentence embeddings.
- Our various linear text embeddings: SIF (a simple but tough-to-beat baseline) ICLR'17, DisC (via compressed sensing) ICLR'18, and à la carte ACL'18.
- The above embeddings are inspired by a theory of word embeddings, described in a blog post and in two papers: Paper 1 TACL'16 (RAND-WALK model) and Paper 2 TACL'18 (how polysemy relates to word embeddings; short answer: linearly).