Goal
To develop and investigate a new saddle-escaping optimiser for deep neural networks that converges faster and requires less hyperparameter tuning than existing optimisers.
Background
Analyses of deep-network loss surfaces indicate that they contain exponentially many local minima, almost all with error close to that of the global minimum, but exponentially more saddle points than minima, a phenomenon known as “saddle point proliferation” [1, 2]. This suggests that getting stuck in a high-error local minimum is less of a problem than the optimiser slowing down near saddle points and plateaux. Furthermore, Newton-based second-order methods, although they handle low-curvature regions well, are actively attracted to saddle points, which limits their utility in deep learning [1]. This project aims to explore a new class of cheap second-order methods that escape plateaux quickly without being attracted to saddle points.
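To make the attraction concrete: the Newton step rescales the gradient by the inverse curvature (1/lambda_i) along each Hessian eigenvector, so the step is reversed wherever curvature is negative and the iterate is pulled towards the saddle; rescaling by 1/|lambda_i| instead, as in the saddle-free Newton analysis of [1], restores descent along those directions. Below is a minimal, illustrative PyTorch sketch of both updates on a two-dimensional toy saddle f(x, y) = x^2 - y^2 (the toy problem and names are ours, not taken from the readings); it assumes an exact Hessian eigendecomposition, which is only viable at toy scale, whereas [1] uses a low-rank Krylov approximation and [3] avoids forming the Hessian altogether.

import torch

# Toy loss with a saddle point at the origin: f(x, y) = x^2 - y^2.
def f(p):
    return p[0] ** 2 - p[1] ** 2

p = torch.tensor([1.0, 1.0], requires_grad=True)
grad = torch.autograd.grad(f(p), p)[0]
hess = torch.autograd.functional.hessian(f, p)

# Eigendecomposition of the (symmetric) Hessian: here diag(2, -2).
eigvals, eigvecs = torch.linalg.eigh(hess)

# Plain Newton step rescales the gradient by 1/lambda_i along each eigenvector;
# the sign flip along the negative-curvature direction pulls the iterate onto
# the saddle: from (1, 1) it lands exactly at (0, 0).
newton_step = -eigvecs @ torch.diag(1.0 / eigvals) @ eigvecs.T @ grad

# Saddle-free rescaling by 1/|lambda_i| (as analysed in [1]) preserves descent
# along the negative-curvature direction: from (1, 1) it moves to (0, 2),
# away from the saddle.
sfn_step = -eigvecs @ torch.diag(1.0 / eigvals.abs()) @ eigvecs.T @ grad

print("Newton iterate:     ", p.detach() + newton_step)
print("Saddle-free iterate:", p.detach() + sfn_step)

Forming and diagonalising the full Hessian is of course infeasible for deep networks; the challenge for a cheap second-order method is to obtain this saddle-repelling behaviour at a per-step cost close to that of first-order methods, for example through Hessian-vector products as in [3].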
Reading
[1] Dauphin et al., Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, NeurIPS, 2014.
[2] Choromanska et al., The loss surfaces of multilayer networks, AISTATS, 2015.
[3] Henriques et al., Small steps and giant leaps: Minimal Newton solvers for deep learning, ICCV, 2019.
Requirements
This is primarily a theory project. A solid grounding in optimisation theory is essential, and familiarity with deep learning and the PyTorch framework is recommended. Suggested courses: COMP4670, COMP4680, COMP4691.