Verifying Indentation Sensitive Parsing

Picture of michael-norrish.md Michael Norrish

9 Oct 2023

Programming languages such as Haskell and Python (and the YAML notation) depend on whitespace as part of indentation to determine their syntax. The specification of this syntax is quite ugly, and very much tied to implementation. Luckily, a relatively recent paper:

Michael D. Adams, Principled Parsing for Indentation-Sensitive Languages (POPL 2013)

created some beautiful theory for capturing exactly how these languages really work. In even more recent work, my colleagues and I implemented parsing for a Haskell-like language (“PureCake”) using the PEG formalism. This is a formal implementation, but there is no proof connecting the PEG to a grammar in the style of Adams.

This project will require the student to:

  1. create a HOL theory of Indentation Sensitive Context Free Grammars
  2. create a sample Indentation Sensitive grammar for the YAML notation
  3. create a corresponding IS PEG for YAML
  4. prove that the PEG and grammar correspond

This project is suitable as a thesis project at either Honours or Masters level, and will require someone with a good background in theoretical computer science (formal language theory in particular) and who is keen to learn to use an interactive theorem-prover.

You are on Aboriginal land.

The Australian National University acknowledges, celebrates and pays our respects to the Ngunnawal and Ngambri people of the Canberra region and to all First Nations Australians on whose traditional lands we meet and work, and whose cultures are among the oldest continuing cultures in human history.

arrow-left bars search times arrow-up