The Department of Mathematical Sciences will host Dr. Benoît Dhérin of Google on Monday, March 10, from 11:20 a.m. to 12:20 p.m., as part of its math seminar series. Dr. Dhérin will present "Why Neural Networks Find Simple Solutions." This free event will take place in Armitage Hall, Room 124. Contact sm1380@camden.rutgers.edu for more information.

Abstract

Despite their ability to model highly complicated functions, and despite having enough parameters to grossly overfit the training data, overparameterized neural networks seem instead to learn simpler functions that generalize well. In this talk, we present the notions of Implicit Gradient Regularization (IGR) and Geometric Complexity (GC), which shed light on this perplexing phenomenon. IGR guides the learning trajectory toward flatter regions of parameter space for any overparameterized differentiable model. This effect can be derived mathematically using Backward Error Analysis, a powerful and flexible method borrowed from the numerical analysis of ODEs. For neural networks, we explain how IGR translates into a simplicity bias measured by the network's GC. We will also show how various common training heuristics put pressure on the GC, creating a built-in geometric Occam's razor in deep learning.
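To give a flavor of the IGR result, here is a standard statement taken from Dr. Dhérin's published work on implicit gradient regularization (with David Barrett), which the talk presumably follows: gradient descent with learning rate h, written as \(\theta_{t+1} = \theta_t - h\,\nabla L(\theta_t)\), tracks, to higher order in h, the gradient flow of a modified loss

    \[
    \tilde{L}(\theta) \;=\; L(\theta) \;+\; \frac{h}{4}\,\bigl\lVert \nabla L(\theta) \bigr\rVert^{2},
    \]

where L is the original training loss. The extra term penalizes large loss gradients, so the discrete dynamics are implicitly biased toward flatter regions of parameter space, exactly the effect the abstract describes.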
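As a rough illustration of the GC quantity (this is an assumed reading based on the related "geometric complexity" paper, not the speaker's code), GC is often estimated as the batch mean of the squared Frobenius norm of the model's input-output Jacobian. A minimal PyTorch sketch, where the function name geometric_complexity and the toy model are ours:

    import torch

    def geometric_complexity(model, x_batch):
        # Batch mean of the squared Frobenius norm of the input-output
        # Jacobian -- one common reading of the Dirichlet-energy-style
        # quantity the abstract calls Geometric Complexity.
        per_example = []
        for x in x_batch:
            # Jacobian of the model output with respect to a single input.
            jac = torch.autograd.functional.jacobian(model, x.unsqueeze(0))
            per_example.append(jac.pow(2).sum())  # squared Frobenius norm
        return torch.stack(per_example).mean()

    # Example usage with a small illustrative network:
    model = torch.nn.Sequential(
        torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
    )
    x_batch = torch.randn(8, 10)
    print(geometric_complexity(model, x_batch))

On this reading, a small GC means the learned function changes slowly with its inputs, which is one concrete sense in which the network has found a "simple" solution.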