Information Theory of Deep Learning - The computational benefits of the hidden layers
The surprising success of machine learning with deep neural networks poses two fundamental challenges. The first is understanding why these multilayer networks work so well on many different artificial intelligence tasks, in some cases reaching close to or better than human performance. The second is understanding what this success can tell us about human intelligence and our biological brain.
Our recent Information Bottleneck theory of Deep Learning provides new insights and answers to the first question. It shows that the layers of deep neural networks achieve the optimal information-theoretic tradeoff between training sample size and accuracy, and that this optimality is achieved through the noisy process of stochastic gradient descent. Moreover, it shows that the benefit of the multilayer structure is mainly computational: it exponentially shortens the time of convergence to these optimal representations. In this talk I will address the relevance of these findings to the emergence of hierarchies in deep neural networks and in biological brains.