[Domestic PhD scholarship] Towards a periodic table for large language models

$41,650 stipend plus training and travel allowances; includes a placement with Gradient Institute

Penny Kyburz

5 Dec 2024

This project is only available for domestic PhD students. Contact Penny Kyburz (penny.kyburz@anu.edu.au) with a short (<1 page) proposal outlining how you would approach this project and your skills/interest in it by Monday 27 January 2025.

This project is part of the Next Generation AI Graduates Program on Human-AI Interaction in the Metaverse: https://www.csiro.au/en/work-with-us/funding-programs/funding/Next-Generation-Graduates-Programs/Awarded-programs/Human-AI-Interaction-Metaverse

You will work in a multi-disciplinary team, in collaboration with industry partners and supervisors. The project includes a $41,650 stipend and a placement with our partner organisation, DSTG, as well as training provided by CSIRO, a training allowance of $5,000 per year (to cover courses, workshops, conferences, networking, and collaboration), and a travel allowance of $5,000 (in addition to ANU-provided funding). See funding details here: https://www.csiro.au/en/work-with-us/funding-programs/funding/Next-Generation-Graduates-Programs/NextGen-scholarship-information

Project Description

Despite the enormous power and ubiquity of large language models (LLMs) such as ChatGPT and Claude, we know remarkably little about why and how these models work as well as they do. We also have very little insight into what capabilities these models actually possess: although we can assess their performance on certain evaluations, this isn’t enough to tell us what is truly going on at a deeper level. If we don’t understand their capabilities, how can we trust that they will be safe to deploy? And how can we build scientifically informed tools to control them?

In this project, you will use recent research ideas from the field of “developmental interpretability” to help answer these questions, specifically the local learning coefficient from singular learning theory. The local learning coefficient reveals a connection between the mathematics of statistical learning and the internal structures and mechanisms of models. Through appropriate computational experiments, we will investigate how the local learning coefficient varies across different datasets as a way to unveil and enumerate the intricate skills these LLMs possess.

Think of this as trying to create a “periodic table” for LLM capabilities: we want to break their functionality down into fundamental components so that we can understand them at a more basic level. Our main goal is to use this knowledge to better assess what a given LLM is truly capable of, and eventually to find new ways to train and refine LLMs that give rise only to desirable capabilities, thus ensuring their safe deployment and use.
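To give a concrete flavour of the kind of computational experiment involved, below is a minimal sketch of estimating the local learning coefficient (LLC) of a trained model on a given dataset. It assumes a PyTorch model and the SGLD-based estimator λ̂ = nβ(E_w[L_n(w)] − L_n(w*)), where the expectation is over SGLD samples drawn from a posterior localised around the trained parameters w*. The function name and all hyperparameters here are illustrative choices, not part of any existing library or of the project's actual codebase.

```python
# Sketch: SGLD-based local learning coefficient (LLC) estimate for a PyTorch model.
import copy
import math

import torch


def estimate_llc(model, loss_fn, data_loader, n_burnin=50, n_samples=200,
                 step_size=1e-6, localization=100.0, device="cpu"):
    """Rough LLC estimate for `model` at its current parameters w*."""
    model = model.to(device)
    n = len(data_loader.dataset)
    beta = 1.0 / math.log(n)  # inverse temperature, ~1/log(n) as in the estimator

    # Mean training loss at the reference point w*.
    def mean_loss(m):
        m.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for x, y in data_loader:
                x, y = x.to(device), y.to(device)
                total += loss_fn(m(x), y).item() * len(x)
                count += len(x)
        return total / count

    loss_star = mean_loss(model)

    # Copy of the model whose parameters are perturbed by SGLD; w* stays fixed.
    sampler = copy.deepcopy(model)
    params = [p for p in sampler.parameters() if p.requires_grad]
    w_star = [p.detach().clone() for p in model.parameters() if p.requires_grad]

    loss_samples = []
    data_iter = iter(data_loader)
    for step in range(n_burnin + n_samples):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(data_loader)
            x, y = next(data_iter)
        x, y = x.to(device), y.to(device)

        sampler.train()
        sampler.zero_grad()
        loss = loss_fn(sampler(x), y)
        loss.backward()

        with torch.no_grad():
            for p, p0 in zip(params, w_star):
                # SGLD step: tempered loss gradient, a quadratic pull back
                # towards w*, and Gaussian noise.
                drift = n * beta * p.grad + localization * (p - p0)
                p.add_(-0.5 * step_size * drift)
                p.add_(math.sqrt(step_size) * torch.randn_like(p))

        if step >= n_burnin:
            loss_samples.append(loss.item())

    expected_loss = sum(loss_samples) / len(loss_samples)
    return n * beta * (expected_loss - loss_star)
```

Running such an estimator with the same model but different data loaders (say, arithmetic problems versus code versus natural-language text) and comparing the resulting values is one plausible way to probe how a model's effective complexity, and hence its skill profile, varies across tasks.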
