Efficient validation and presentation of policies for MDPs

Picture of patrik-haslum.md Patrik Haslum

1 Dec 2023

The Markov Decision Process (MDP) is the canonical model of a sequential decision-making problem under uncertainty, and is solved by probabilistic AI planning methods. The solution to an MDP is a policy: a function that maps states of the process to the action to be taken. Validating a policy is the process of checking if it correct, and computing its expected value (or cost). This is, in principle, simply done by systematically simulating all possible executions of the policy.

However, practical MDPs are modelled in factored representation languages, like PPDDL or RDDL, which allow very large MDPs to be expressed compactly. This can also lead to very large policies, making validating them efficiently a challenge. The first aim of this project is to develop and implement efficient algorithms to validate policies for factored MDPs.

The second aim of the project is to create human-comprehensible presentations of such policies. Although policies may be very large, not all parts of them are equally important: in some states, the action to be taken is obvious, or dictated by necessity. Given as input some information about which decisions are considered important, we want to compute a smaller, abstract representation of a policy that focuses on the parts that matter, as shown in, for example, the graph below:

abstracted policy

This policy, which solves a small instance of a dice-rolling game, has 70 states and 462 actions: but the only decision that really matters, and that is shown in the graph above, is which dice to keep (i.e., not reroll).

Background reading

  • For an introduction to MDPs, their factored representation and basic policy validation algorithms, chapters 2 and 3.1-3.2 of Mausam and Kolobov’s book Planning with Markov Decision Processes is a good starting point.
  • Scott Sanner’s RDDL on github.
  • A description of probabilisitic PDDL and example domains from IPC 2004.
  • Basic policy validation and abstraction are implemented in inval.

Requirements

This is an algorithm design and implementation project. It will require good programming and software development skills.

You are on Aboriginal land.

The Australian National University acknowledges, celebrates and pays our respects to the Ngunnawal and Ngambri people of the Canberra region and to all First Nations Australians on whose traditional lands we meet and work, and whose cultures are among the oldest continuing cultures in human history.

arrow-left bars search times