Approximate solutions and preconditioning using AMX, AVX and Tensor Cores

Rhys Hawkins

28 Nov 2023

This project is intended as a full-year Honours project, and a scholarship is available to the right candidate.

Many large-scale numerical problems involve the solution of large systems of equations using iterative techniques, which perform repeated approximate solves until they converge to a required tolerance. Recent hardware developments, primarily intended to accelerate machine-learning training, have introduced the capability to perform small matrix operations rapidly but at reduced precision.
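
As a rough illustration of the idea (a minimal sketch, not part of the project description), the loop below uses a half-precision approximate inverse as a preconditioner-style correction inside a double-precision refinement iteration. The float16 type merely stands in for the BF16/FP16 hardware units, and the matrix, sizes and tolerance are arbitrary illustrative choices.

```python
# Mixed-precision iterative refinement sketch (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 64
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b = rng.standard_normal(n)

# Approximate inverse stored in half precision: applying it is just a small
# matrix-vector product, the operation the low-precision hardware accelerates.
M = np.linalg.inv(A).astype(np.float16)

x = np.zeros(n)
for it in range(50):
    r = b - A @ x                                  # residual in full precision
    s = np.linalg.norm(r)
    if s < 1e-10 * np.linalg.norm(b):
        break
    # Scale the residual to O(1), apply the half-precision approximate inverse,
    # then accumulate the correction back in double precision.
    d = (M @ (r / s).astype(np.float16)).astype(np.float64) * s
    x = x + d

print(f"iterations: {it}, relative residual: "
      f"{np.linalg.norm(b - A @ x) / np.linalg.norm(b):.2e}")
```

In an actual hardware-accelerated setting the float16 product would be carried out by AMX/AVX-512 or Tensor Core kernels and the dense approximate inverse would be replaced by a cheaper preconditioner; the sketch only shows how low- and full-precision work can be combined in one iteration.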

Several recent attempts have been made to co-opt these architectural advances for numerical computing problems, with some success (see the references below).

The two main hardware developments (available on Gadi) are:

  • AMX BF16 and AVX-512 FP16 extensions available on the Sapphire Rapids microarchitecture
  • CUDA Tensor Cores available on multiple generations of NVIDIA compute/graphics cards

This project would look at applying one or more of these hardware-accelerated, low-precision matrix multiplication capabilities to numerical problems implemented with the spectral element method (SEM).
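
To make the connection to SEM concrete, the toy comparison below (an assumed, simplified structure, not the project's code) applies a small per-element operator matrix to a batch of element-local vectors in float16 and in float64, giving a feel for the accuracy that the low-precision matrix units trade for speed. The sizes are arbitrary illustrative choices.

```python
# Batched small-matrix application in low vs full precision (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
n_elem, p = 1024, 8                      # number of elements, nodes per element
D = rng.standard_normal((p, p))          # stand-in for an element operator matrix
u = rng.standard_normal((n_elem, p))     # element-local field values

ref = u @ D.T                            # reference result in float64
lo = (u.astype(np.float16) @ D.T.astype(np.float16)).astype(np.float64)

rel_err = np.max(np.abs(lo - ref)) / np.max(np.abs(ref))
print(f"max relative difference (float16 vs float64): {rel_err:.2e}")
```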

For more information, please see my homepage.
