Backpropagation-Free Gradient Estimation for Scalable Deep Learning

Training large neural networks efficiently without backpropagation

Dylan Campbell

4 Nov 2024

Project Description

As deep learning models grow in scale, traditional backpropagation faces computational challenges, particularly the memory bottleneck of storing intermediate activations for the backward pass. This project explores an alternative: estimating gradients with forward-mode automatic differentiation using random directional derivatives. Because no backward pass is required, this approach aims to improve training efficiency and scalability by approximating gradients within a structured subspace. Key methods leverage network structure and activation constraints to guide the gradient approximation, offering potential pathways toward efficient, backpropagation-free training of large models.
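As a rough illustration of the idea, the sketch below uses PyTorch's torch.func API to compute a directional derivative in a single forward-mode pass and form the forward-gradient estimate of [1]. The model, data, and learning rate are placeholders, not part of the project specification.

```python
import torch
from torch.func import functional_call, jvp

# Toy setup; the architecture, data, and learning rate are illustrative only.
model = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
loss_fn = torch.nn.MSELoss()
x = torch.randn(8, 32)           # dummy inputs
y = torch.randn(8, 1)            # dummy targets

params = dict(model.named_parameters())

def loss_of_params(p):
    # Evaluate the network with parameters `p` and return the scalar loss.
    return loss_fn(functional_call(model, p, (x,)), y)

# Random direction with the same structure as the parameters.
v = {k: torch.randn_like(p) for k, p in params.items()}

# One forward-mode pass yields the loss and the directional derivative <grad, v>.
loss, dir_deriv = jvp(loss_of_params, (params,), (v,))

# Forward-gradient update: theta <- theta - lr * <grad, v> * v
# (an unbiased gradient estimate when v ~ N(0, I)).
lr = 1e-3
with torch.no_grad():
    for k, p in params.items():
        p -= lr * dir_deriv * v[k]

print(f"loss: {loss.item():.4f}, directional derivative: {dir_deriv.item():.4f}")
```

Each step costs roughly one forward pass, and no intermediate activations need to be retained for a backward pass.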

Goal

Develop and evaluate backpropagation-free gradient estimation techniques that can match or approximate backpropagation performance. Investigate the efficacy of gradient guessing methods, including Activation Perturbation, Activation Mixing, and others, to optimize training outcomes for deep learning models.

Background

Backpropagation has long been the cornerstone of neural network optimisation, but it requires substantial memory to store intermediate states, limiting scalability. Recent studies [1, 2] reveal that gradients in neural networks lie within predictable low-dimensional subspaces determined by network structure and activation patterns. This insight opens avenues for computationally efficient gradient estimation, where directional derivatives computed through forward-mode differentiation offer a feasible alternative to backpropagation-based training in large networks.
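Concretely, the forward-gradient estimator of [1] samples a random direction v, computes the directional derivative of the loss f along v in one forward-mode pass, and rescales v by it; for a standard normal direction the estimate is unbiased:

```latex
g(\theta) = \big(\nabla f(\theta) \cdot v\big)\, v, \qquad v \sim \mathcal{N}(0, I),
\qquad
\mathbb{E}_v\!\left[ g(\theta) \right] = \mathbb{E}_v\!\left[ v v^\top \right] \nabla f(\theta) = \nabla f(\theta).
```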

Reading: [1] Baydin, Atılım Güneş, et al. “Gradients without backpropagation.” arXiv preprint arXiv:2202.08587 (2022). [2] Singhal, Utkarsh, et al. “How to guess a gradient.” arXiv preprint arXiv:2312.04709 (2023).

Requirements

This is primarily a theory project. Fundamental knowledge of multivariate calculus and optimisation theory is essential; familiarity with deep learning, automatic differentiation, and the PyTorch framework is recommended. Suggested courses: MATH1115, MATH1116, COMP4670/STAT3040, COMP4680.

Outcomes

Students will gain hands-on experience in gradient estimation techniques and forward-mode automatic differentiation. This project offers a foundation in scalable training algorithms and presents opportunities for significant contributions to efficient neural network optimisation strategies.

Contact

This project will be co-supervised by Evan Markou and Dylan Campbell. Please direct any expressions of interest to both supervisors.
