Project Description
This project focuses on developing architecture-aware, parameter-free optimisation algorithms for deep learning. It sits at the intersection of classical optimisation theory and deep learning, aiming to create optimisers that adapt automatically to different neural network architectures, removing the need for hyperparameter tuning.
Goal
The student will extend existing parameter-free optimisation techniques to additional architectures, such as transformers and residual networks, contributing both theoretically and practically. This work addresses key research challenges, particularly in settings where traditional assumptions from optimisation theory may not hold for deep learning.
Background
Classical optimisation theory often relies on assumptions such as Lipschitz smoothness to analyse gradient-based methods. However, recent research indicates that these assumptions do not consistently hold in deep learning, as shown in [1], motivating the development of theories explicitly tailored to neural architectures. This project builds on the parameter-free approach introduced in [2], which employs majorisation-minimisation techniques [3] to determine step sizes adaptively: perturbation bounds specific to the neural architecture are used to construct an upper bound on the loss, characterising the optimisation landscape more effectively than generic smoothness assumptions. The student's work will contribute to advancing this field by addressing these theoretical challenges and developing practical solutions to optimisation problems actively explored in the research community.
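To make the majorisation-minimisation idea concrete, here is a minimal sketch using the classical smoothness majoriser rather than the architecture-specific perturbation bounds developed in [2]. If the loss \(\mathcal{L}\) has \(L\)-Lipschitz gradients, then for any weight perturbation \(\Delta w\),
\[
\mathcal{L}(w + \Delta w) \;\le\; \mathcal{L}(w) + \nabla \mathcal{L}(w)^\top \Delta w + \tfrac{L}{2}\,\lVert \Delta w \rVert^2 ,
\]
and minimising this upper bound over \(\Delta w\) gives the update
\[
\Delta w^\star \;=\; -\tfrac{1}{L}\,\nabla \mathcal{L}(w),
\]
so the step size \(1/L\) falls out of the bound instead of being tuned by hand. Architecture-aware methods such as [2] replace the generic quadratic term with perturbation bounds derived from the network's layer structure, so the resulting update adapts to the architecture automatically.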
Reading:
[1] Cohen, Jeremy M., et al. "Gradient descent on neural networks typically occurs at the edge of stability." arXiv preprint arXiv:2103.00065 (2021).
[2] Bernstein, Jeremy, et al. "Automatic gradient descent: Deep learning without hyperparameters." arXiv preprint arXiv:2304.05187 (2023).
[3] Lange, Kenneth. MM Optimization Algorithms. Society for Industrial and Applied Mathematics, 2016.
Requirements
This is primarily a theory project. Fundamental knowledge of optimisation theory is essential, and familiarity with deep learning and the PyTorch framework is recommended. Suggested courses: COMP4670/STAT3040, COMP4680/COMP4691/MATH3514.
Outcomes
By participating in this project, the student will gain invaluable experience at the forefront of optimisation research, specifically in the growing field of parameter-free methods for deep learning. They will develop expertise in designing and implementing advanced optimisation techniques across diverse architectures, and gain insight into bridging classical optimisation principles with the challenges of deep learning. This project offers a unique opportunity to contribute to active research in both theoretical and applied contexts, building a skill set that is highly relevant to both academia and industry.
Contact
This project will be co-supervised by Evan Markou and Dylan Campbell. Please direct any expressions of interest to both.