In this lab, we cover type classes and ad hoc polymorphism, and how we can use these concepts to generalise functions that require some assumptions about the input type. We also continue the topic of trees from last lab, and introduce binary search trees, which are trees with a special ordering constraint that gives them a great advantage over binary trees in terms of computational efficiency.
- Pre-Lab Checklist
- Getting Started
- Types
- Type Classes
- Applications of Type Classes
- Instances
- Deriving Type Classes
- Binary Search Trees
- Exercise 4
- Exercise 5
- Exercise 6
- Exercise 7
- Exercise 8
- Extensions (Optional)
- Appendix: Bisection Search
Pre-Lab Checklist
- You have completed Lab 9, understand binary trees, and how to write recursive functions over them.
- You are comfortable with the concept of parametric polymorphism.
- You are comfortable with the recursive data type
Nat
from Lab 06
Getting Started
-
Go to the project’s dashboard under the Project tab and click on the Fork button. The gitlab url for this lab is https://gitlab.cecs.anu.edu.au/comp1100/2023s2/studentfiles/lab10-1100_s2_2023
-
You will be asked where to fork the repository. Click on the user or group to where you’d like to add the forked project. You should see your own name here. Once you have successfully forked a project, you will be redirected to a new webpage where you will notice your name before
> project-name
. This means you are now the owner of this repository. The url in your browser should reflect the change: https://gitlab.cecs.anu.edu.au/uXXXXXXX/lab10-1100_s2_2023 -
Finally, clone your forked repository to your computer according to the instructions in the Week 2 Lab.
-
Put your exercise solutions in the file
Lab10.hs
. This file is pre-filled with unfinished functions that match the exercises below.
This lab contains additional exercises for COMP1130 students. COMP1100 students are encouraged to attempt them once they have completed the compulsory exercises to receive the participation marks for the lab.
This is a long lab, and there are two main topics, both of which are of high importance. If you are still working on exercises 1-3 after the first hour, it is recommended that you skip to Exercise 4. Don’t stress though! This is just so you have a chance to ask your tutor questions about both type-classes, and binary search trees. As always, you can finish up any remaining questions afterwards before next week’s lab.
Types
We’ve seen before in Lab06 that some functions can be written to work on an arbitrary type, like length:
length :: [a] -> Integer
length list = case list of
[] -> 0
_:xs -> 1 + length xs
The actual values of the elements in the list do not concern us, we only want to know how many there are, which is why this function can be written in a way that’s parametrically polymorphic, that is, it will work regardless of the type of the elements in the list.
Now we would like to generalise other functions. Consider the following function
sumInteger
:
sumInteger :: [Integer] -> Integer
sumInteger list = case list of
[] -> 0
x:xs -> x + sumInteger xs
It’s possible to write the same function for types other than Integer
, like
Double
.
sumDouble :: [Double] -> Double
sumDouble list = case list of
[] -> 0
x:xs -> x + sumDouble xs
Clearly this function cannot be made parametrically polymorphic, as the addition
operator +
isn’t defined for all types. (We can’t add together strings or
Boolean’s, for example.)
So we want a way to define the sum
function to operate over all lists that
contain types that “act like numbers”, that is, types where addition is well
defined. This is a weaker form of polymorphism, called ad-hoc polymorphism.
If we had just tried to make the type of sum
as general as possible by
changing the type signature to
sumEverything :: [a] -> a
then we get the error message
No instance for (Num a) arising from a use of `+'
This is a rather cryptic way of saying “The input type can be anything, but
addition is only defined on things that are numbers, that is, types that belong
to the type class of Num
.” To solve this problem, we need to introduce type
classes.
Type Classes
Haskell handles ad-hoc polymorphism by using type classes, grouping types together into classes that all share similar properties, (they all act like numbers, they all have equality defined, etc.)
The main type classes you’re likely to see in Haskell are as follows:
Type Class | (some of the) functions defined | Description | Depends on? |
---|---|---|---|
Show |
show |
Can be printed as a string | N/A |
Eq |
(==) , (/=) |
Has equality defined | N/A |
Ord |
<= , max , compare |
Has ordering defined | Eq |
Num |
(+) , (*) , (-) , abs , negate , signum , fromInteger |
Numbers, can perform arithmetic | N/A |
Fractional |
(/) , etc. |
Fractional numbers, can be divided | Num |
Floating |
sqrt , exp , log , sin ,cos , pi , etc. |
Acts like real numbers, many mathematical functions defined | Fractional |
Real |
toRational |
Numeric types that can be written as a rational | Num ,Ord |
Enum |
succ , pred , toEnum , fromEnum , etc. |
Sequentially ordered types | N/A |
Integral |
div , mod , etc. |
Acts like whole numbers, with integer division | Real , Enum |
Some types depend on others, and the type classes form a hierarchy, as shown in the following diagram.
We’ve left out some functions here, if you want to see the full list, or if you
ever want to know more about a type, you can type in :info TypeName
into
GHCi
. It will also tell you what other types are instances, or belong to,
this type class.
ghci> :info Eq
class Eq a where
(==) :: a -> a -> Bool
(/=) :: a -> a -> Bool
instance Eq a => Eq [a] -- Defined in ‘GHC.Classes’
instance Eq Word -- Defined in ‘GHC.Classes’
instance Eq Ordering -- Defined in ‘GHC.Classes’
instance Eq Int -- Defined in ‘GHC.Classes’
instance Eq Float -- Defined in ‘GHC.Classes’
instance Eq Double -- Defined in ‘GHC.Classes’
instance Eq Char -- Defined in ‘GHC.Classes’
instance Eq Bool -- Defined in ‘GHC.Classes’
...
Here, we can see that members of the type class Eq
have the functions (==)
and (/=)
defined. We can also see that a few familiar types, Double, Char,
Bool, Int
all have equality defined on them. Digging deeper, we find the
interesting lines
instance Eq a => Eq [a] -- Defined in ‘GHC.Classes’
instance (Eq a, Eq b) => Eq (a, b) -- Defined in ‘GHC.Classes’
which tell us that if a type a
has equality defined, then so too does the type
[a]
. This is very powerful, as it saves on extra code that we don’t need
to write. As soon as you’ve told haskell how to compare integers, it can
automatically work out that [1,2] == [1,3]
should evaluate to False
.
The same goes for tuples: since Integer
and Char
both have equality defined,
so does (Integer, Char)
without any further work.
Applications of Type Classes
Consider the following function:
sum :: Num a => [a] -> a
sum list = case list of
[] -> 0
x:xs -> x + sum xs
The notation
Num a => [a] -> a
means that the function has type [a]
as input, and returns an element of type
a
as output, where a
is any type from the type class Num a
. So [Integer]
-> Integer
, or [Double] -> Double
are special cases of Num a => [a] -> a
,
but [Double] -> Integer
or [Bool] -> Bool
is not.
If we want a type to be a member of two or more type classes, we can enforce this.
areTheyEqual :: (Eq a, Show a) => a -> a -> String
areTheyEqual x y
|x == y = (show x) ++ " is equal to " ++ (show y)
|otherwise = (show x) ++ " is not equal to " ++ (show y)
Here, we are enforcing that the input type has equality defined, and can be
turned into a string. Sometimes specifying two type classes is redundant, e.g.
(Ord a, Eq a) => ...
is the same as Ord a => ...
as Ord
is a subclass of
Eq
.
Exercise 1 (Optional)
This exercise is optional, and is here just to help you further get your head around type classes. If you feel confident, feel free to skip ahead to Exercise 2.
See if you can work out what type classes the following types belong to. You
might like to experiment and try out some operations in GHCi
(like trying to
compare two elements with <=
, or trying to add two elements together) to see
what is defined.
Integer
Double
Char
Bool
(Integer, Integer)
(Integer, Double)
String
[Integer]
Integer -> Integer
Maybe Integer
Exercise 2
You can check your answers by asking
GHCi
what it had inferred the type to be, by entering :type functionName
.
-- cow :: ???
cow x y z = x && (y `elem` z)
-- foo :: ???
foo x y z = x `elem` y && y `elem` z
-- bar :: ???
bar x y = case y of
Nothing -> x
Just z -> x + z
-- snap :: ???
snap x y
|x > y = show x
|otherwise = show y
Submission required: Exercise 2 inferring types
Instances
When we create a new type, we would like to tell Haskell what type class our new type shall belong to. We use the following code,
instance TypeClass MyNewType where
...
which breaks down as:
instance
is a reserved keyword indicating we are adding a new type to a type classTypeClass
is the type class we want our new type to belong toMyNewType
is the name of our new type
We then have to define all the functions that a member of that type class should have defined.
- For the
Eq
type class, we need only(==)
, (and we get(/=)
for free.) - For
Ord
, we need to already be inEq
, and then also define either(<=)
orcompare
(and Haskell can work out all the other operators like(<)
,(>)
,(>=)
for free.)
If you’re interested, you can look at the documentation for the Prelude, and under each type class, the minimal complete definition tells you the fewest functions that you need to define, and then the rest are automatically derived for you.
For example, (see Fruit.hs
) suppose we define a new type for fruit,
data Fruit = Apple | Banana | Orange
and we wanted to define equality on our fruit. We need to tell Haskell what
(==)
means for fruit.
instance Eq Fruit where
Apple == Apple = True
Banana == Banana = True
Orange == Orange = True
_ == _ = False
So we define equality in the obvious way. All fruits are equal to themselves, and any fruit is not equal to a different fruit.
Note that there was nothing forcing me to define equality in this way,
I could have defined Apple == Banana
to be True
but
Banana == Apple
to be False
. This doesn’t mean that we should, though.
To do so would be very poor style, and go against the
Haskell standard.
I can also define what it means to show a Fruit
.
instance Show Fruit where
show fruit = case fruit of
Apple -> "An apple."
Banana -> "A banana! Yum!"
Orange -> "Yuck, an orange."
So when I type Banana
into GHCi
, a Banana
is printed to the terminal using
my custom definition for show
.
> Banana
A banana! Yum!
I can also define ordering on fruit. I’m going to order fruit by how much I
enjoy eating them: Banana
being the best, Apple
being okay, and Orange
is
the worst. (Deepest apologies to those that love oranges.)
If we give a full definition of the (<=)
function:
instance Ord Fruit where
(<=) Orange Apple = True
(<=) Apple Banana = True
(<=) Orange Banana = True
(<=) Banana Apple = False
(<=) Apple Orange = False
(<=) Banana Orange = False
(<=) Banana Banana = True
(<=) Apple Apple = True
(<=) Orange Orange = True
then Haskell can use that to define all the usual ordering operators:
> (Banana > Orange)
True
You’ll note how tedious it is to write out all the cases. Surely if
Orange <= Apple
and Apple <= Banana
, then Orange <= Banana
, right?
Sadly, Haskell doesn’t place any sensible constraints on the <=
function we
define. It’s up to the user to define <=
in such a way that the usual
ordering
properties
you would expect, are satisfied.
Deriving Type Classes
Rather than doing the hard work ourselves, we can ask Haskell to define a lot of
these properties for us, using the deriving
keyword.
data Fruit = Orange | Apple | Banana
deriving (Ord, Show)
Haskell will assume the equality you wanted is the “obvious” implementation, that every fruit is equal to itself, and that no two different fruits are equal.
It will also place a ordering on fruit, on the order they appear in the
definition of Fruit
, so Orange
is less than Apple
, which is in turn less
than Banana
. The default definitions of equality and ordering Haskell provides
satisfy all the nice properties you would expect them to. The default
implementation of show
will also print each fruit exactly as they are defined.
> show Banana
"Banana"
Type classes where you can ask Haskell to do the hard work (defining all the
functions automatically) for you are called derivable classes. Not all type
classes are derivable. If we tried to add arithmetic to our fruit by making it a
member of the type class Num
,
data Fruit = Orange | Apple | Banana
deriving (Ord, Show, Num)
then Haskell will throw an error
Fruit.hs:4:14:
Can't make a derived instance of `Num Fruit':
`Num' is not a derivable class
and inform you that it doesn’t know how to define arithmetic on Fruit
. What
should Apple + Banana
be? What about Banana * Orange
? There’s no obvious way
to define it.
Exercise 3A
Recall our definition of Nat
from
Lab 06.
data Nat = Z | S Nat
deriving Show
We’d like to define equality on natural numbers: Two arbitrary natural numbers
are equal if they have the successor S
applied the same number of times.
(==)
for
the type Nat
.
Do not use deriving Eq
here. You should define equality yourself.
Exercise 3B
It’s possible to place an ordering on natural numbers as follows:
Z < (S Z) < S (S Z) < ...
So if we want to ensure Nat
belongs to the type class Ord
, we need only to
define a definition of (<=)
, and Haskell can work out corresponding
definitions for all the other operators defined on Ord
: (<)
, (>)
, max
,
min
, compare
.
(<=)
for
the type Nat
.
Do not use deriving Ord
here. You should define equality yourself.
> Z >= (S Z)
False
> max (S Z) (S (S Z))
(S (S Z))
> (S Z) == (S Z)
True
> (S Z) == (S (S Z))
False
Submission required: Exercise 3A, 3B, instance Eq
, Ord
Binary Search Trees
We saw in Lab 9 how we can write functions over binary trees, and over rose trees, but we haven’t yet seen how efficient the tree data structure can be. You might have noticed that for the exercises in last lab, you were forced to look through all the elements in a tree to check if something was there, or to find the maximum/minimum element, as you would have to do with a list. So far, trees appear to just be lists, but more annoying to deal with.
We need to add one extra property to provide trees with much more power than they currently have, a property we call the binary search ordering constraint.
A binary tree satisfies the binary search ordering constraint, if either:
- It is a
Null
tree - Every element in the left hand sub-tree is strictly less than the root, and every element in the right hand sub-tree is strictly more than the root. Furthermore, both sub-trees also satisfy the binary search ordering constraint.
Binary trees that meet this constraint are called binary search trees.
Note that it is very important for the sub-trees to also pass the ordering
constraint, given the following example tree.
(This tree is included for you as notBStree
.)
See how on the left sub-tree, it is true that all elements \(\{1,3,2\}\) are less than the root node of 5. But if we consider the left sub-tree in isolation:
We can see that the left sub-tree does not satisfy the ordering constraint, as the root node 3, is bigger than the biggest element in the right sub-tree, 2.
This means the original tree is not a binary search tree.
In Haskell, binary search trees are no different structurally from binary trees, and so we define them appropriately
type BSTree a = BinaryTree a
while keeping in mind that any instance of a BSTree
had
better satisfy the ordering constraint, as there’s no easy way to enforce
that property in the type itself.
This also means the type a
that a BSTree
contains must be
a member of the type class Ord
.
This ordering property is what gives binary search trees a huge improvement in performance on lists. We can search through the tree, moving left if the root node is too large, and moving right if it’s too small. To see an example, see the appendix at the end of this lab. Otherwise, you can move on with the exercises.
Exercise 4
elemTree
function from Lab 9
but this time you may assume
that the input tree satisfies the binary search ordering constraint.
elemBSTree :: (Ord a) => a -> (BSTree a) -> Bool
Do not use the same function as before, you should be able to search more efficiently. As a reminder, this function should take an element, and a binary search tree, and check if that element is present in the tree.
> elemBSTree 5 goodTree
True
> elemBSTree 30 goodTree
False
Submission required: Exercise 4 elemBSTree
Exercise 5
treeMaximum
and treeMinimum
functions,
again assuming the input tree is a binary search tree. Be efficient!
treeBSMax :: (Ord a) => BSTree a -> a
treeBSMin :: (Ord a) => BSTree a -> a
> treeBSMax goodTree
31
> treeBSMin goodTree
1
Submission required: Exercise 5 treeMaximum
, treeMinimum
Exercise 6
isBSTree
that takes a binary tree as input, and
checks if the binary search constraint holds.
isBSTree :: (Ord a) => BinaryTree a -> Bool
> isBSTree goodTree
True
> isBSTree unbalancedTree
True
> isBSTree notBSTree
False
Submission required: Exercise 6 isBSTree
Exercise 7
treeInsert
that takes a binary search tree, and an element, and inserts
that element into the tree, ensuring the binary search ordering constraint
still holds.
treeInsert :: (Ord a) => BSTree a -> a -> BSTree a
(If the element is already in the tree, leave the tree unchanged.)
You may find the printTree
function useful for this exercise, so
that you can verify that the function inserts a new element into the correct
place.
> treeInsert smallTree 3
Node (Node Null 1 (Node Null 3 Null)) 5 (Node Null 10 Null)
> treeInsert smallTree 20
Node (Node Null 1 Null) 5 (Node Null 10 (Node Null 20 Null))
> treeInsert smallTree 1
Node (Node Null 1 Null) 5 (Node Null 10 Null)
Submission required: Exercise 7 treeInsert
Exercise 8
flattenTreeOrd
that flattens a binary tree, but preserves the ordering.
(That is, when a binary search tree is flattened, the resulting list should be sorted.)
flattenTreeOrd :: BinaryTree a -> [a]
> flattenTreeOrd smallTree
[1,5,10]
> flattenTreeOrd notBSTree
[1,3,2,5,6,7,9]
Submission required: Exercise 8 flattenTreeOrd
Extensions (Optional)
Extension 1
Write a function
treeDelete :: (Ord a) => (BSTree a) -> a -> (BSTree a)
that takes a binary search tree, and an element, and removes it from the tree (if it is present).
Note that this is much harder than insertion. Take the example tree above.
Deleting 5 is obvious, as we just trim off that sub-tree and replace it will
Null
. But what should we do if I wanted to delete 4, or even worse, delete
the root note, 3?
The entire tree falls into two pieces, and must be restructured into a new
tree that contains the remaining elements, while maintaining the ordering
property. I recommend drawing out a few examples by hand to get your head
around it first.
Submission required: Extension 1 treeDelete
Extension 2 (Tricky)
Depending on how much you care about efficiency, this problem can be very difficult to implement efficiently, and may require some external research. Many different kinds of trees (Red-Black, AVL, etc.) are specifically defined to make certain operations like balancing easier to do.
Write a function
treeBalance :: (Ord a) => BSTree a -> BSTree a
that takes a binary search tree of integers, and rearranges the structure of the tree so it is now balanced. You may have to do some research as to how to implement this. (Really Tricky: Do it without rebuilding the entire tree from scratch.)
Submission optional: Extension 2 treeBalance
Extension 3
Nat
acts like a number, so we’d also like it to be a member of the type class
Num
.
For those that are mathematically inclined, the operators (+)
and (*)
can
actually be far more general than just addition and multiplication. The general
term for any structure containing a set of values, a definition for \(+\) and
for \(\times\) is called an algebraic ring, which you may see if you go on
to study formal mathematics. In general, (+)
and (*)
can be any operator
that satisfies a set of “nice proprieties”, such as the law of
distributivity: x * (y + z) == (x * y) + (x * z)
. The full set of laws that
(+)
and (*)
should satisfy is available
here.
As an example, if we choose (+)
to be logical exclusive or, and (*)
to be
logical and, then we can define Bool
as an instance of Num
, and all the nice
properties hold.
(+),(-),(*),abs,signum,fromInteger
for the Nat
type.
Note that for an instance a
of Num
:
abs :: a -> a
is the absolute value functionsignum :: a -> a
returns the sign of a number, \(+1\) if the number is positive, \(0\) if the number is zero, and \(-1\) if it is negative.fromInteger :: Integer -> a
takes anInteger
, and converts it to the new number typea
.
You can get away without a definition for negate
, as once (-)
is defined, Haskell can work out negate
for free.
(Given that natural numbers cannot be negative anyway, the negate
function
is not very useful to us.)
Note that some of the functions required to make Nat
an instance of Num
may be sometimes undefined. There is no “minus one”
in Nat
, after all. Try to ensure as many functions are defined
as possible, for as many inputs as possible.
You should throw an error for attempting to perform any operation
undefined in the natural numbers, like trying to compute 2 - 3
.
> (S Z) + (S (S Z))
(S (S (S Z)))
> (S Z) - (S Z)
Z
> (S (S Z)) * (S (S (S Z)))
S (S (S (S (S (S Z)))))
> (fromInteger 2) :: Nat
S (S Z)
> (S Z) - (S (S Z))
*** Exception: Can't have negative natural numbers
Submission optional: Extension 3 Nat
as an instance
of Num
Extension 4
We’d like to be able to define equality on binary trees based on the elements in
the tree. If we just used deriving Eq
, then two trees would be identical if
they have precisely the same elements and structure.
For the purposes of this question, we will define equality on binary trees to be the following weaker definition: Two trees are defined to be equal if they share the same elements, with the same ordering (although the structure need not be the same).
Since we want this definition to work on any tree with elements who have equality defined on them, we use the following notation.
instance (Eq a) => Eq (BinaryTree a) where
-- (==) ...
which means that we are defining what ==
means for anything
of type BinaryTree a
, where a
must be a member of the type class Eq
.
> goodTree == unbalancedTree
True
Try come up with some other test cases to verify this function works. You may wish to draw them out by hand first.
Submission optional: Extension 4 instance (Eq a) => Eq (BinaryTree a)
Appendix: Bisection Search
Imagine you were trying to find a name in the telephone book, and it was stored as a list. You’d have to go through the names, one by one, till you found the one you were looking for. Not very efficient. In practise when we are trying to find a name in the telephone book, we go to the middle of the book, and see if the name we are looking for comes before or after the letter M, which is in the middle of the alphabet. We then take the section that’s left over, and divide it in half too. By using this method of bisection search, so called because we are halving, or bisecting, the search space in half each time, we can find a single name in a book of millions, in only a minute or so. Not bad. As we will see, binary search trees share some of these properties, which makes them very useful for quick retrieval of data.
Take for example the binary search tree below, included as goodTree
. (The
dashed circles represent nodes that we haven’t explored yet.)
If we were searching to see if the number 16 was present or not, we would look at the root node, and see if it was 16 (which it isn’t). 16 is bigger than 15, so if 16 was in the tree, it must be to the right, by the ordering constraint. So we can immediately discard the entire left subtree, and only check the right.
Is 23 the element we are looking for? No, it’s too big, so that means if 16 exists, it has to be to the left.
17 is still bigger, so we keep looking left.
There it is! And look at how much of the tree we never needed to look at! Had we been looking for 26, we could abandon the search even earlier, as we would not need to go down to the deepest layer. Had these numbers been in a list instead, we would have to manually check every single element until we found 16.
This is an ideal case however, as our tree was perfectly balanced, that is,
this tree cannot be restructured in any other way without increasing the depth.
We could have the same elements arranged in the following tree (included as
unbalancedTree
):
This tree is still a valid binary search tree, all elements to the left are smaller, all elements to the right are bigger. It’s just not a very useful tree, as we’d have to go through half of the elements in the tree to find 31, much more than the balanced tree. However, 1 is much easier to find, only after a single comparison.
Might there be circumstances when we would we want an unbalanced tree in this fashion? Think about this and discuss with a neighbour.
In the absolute worst case, all the elements are listed one by one in the right
subtrees, and all left subtrees are always Null
. We call such a tree a
degenerate tree, as it’s essentially the same as a list.
Usually, we want our trees to be balanced, to minimise the access time of any element.
For a balanced tree with \(n\) elements, on average, how many lookups would you have to perform to check if a particular element is present? What if the tree was degenerate? We will revisit these questions in more depth in Lab 11.