On Optimization and Expressiveness in Deep Learning
Understanding deep learning calls for addressing three fundamental questions: expressiveness, optimization and generalization. Expressiveness refers to the ability of compactly sized deep neural networks to represent functions capable of solving real-world problems. Optimization concerns the effectiveness of simple gradient-based algorithms in solving non-convex neural network training programs. Generalization treats the phenomenon of deep learning models not overfitting despite having much more parameters than examples to learn from. This talk will describe a series of works aimed at unraveling some of the mysteries behind optimization and expressiveness. I will begin by discussing recent analyses of optimization for deep linear neural networks. By studying the trajectories of gradient descent, we will derive the most general guarantee to date for efficient convergence to global minimum of a gradient-based algorithm training a deep network. Moreover, in stark contrast to conventional wisdom, we will see that, sometimes, gradient descent can train a deep linear network faster than a classic linear model. In other words, depth can accelerate optimization, even without any gain in expressiveness, and despite introducing non-convexity to a formerly convex problem. In the second (shorter) part of the talk, I will present an equivalence between convolutional and recurrent networks --- the most successful deep learning architectures to date --- and hierarchical tensor decompositions. The equivalence brings forth answers to various questions concerning expressiveness, resulting in new theoretically-backed tools for deep network design.
Optimization works covered in this talk were in collaboration with Sanjeev Arora, Elad Hazan, Noah Golowich and Wei Hu. Expressiveness works were with Amnon Shashua, Or Sharir, Yoav Levine, Ronen Tamari and David Yakira.