This lab covers two aspects of code quality: style, which is the way to write readable code; and verifying correctness via testing.

Table of Contents

Pre-lab Checklist

  • You should have watched the lectures from week 06.
  • You should be able to write doctests, as introduced in Lab03.
  • You should have completed all lab tasks till at least Lab05.

Learning Outcomes

  • Develop techniques to write Haskell code with consistent style
  • Develop understanding behind black box and white box testing and be able to write tests appropriately

Getting Started

  1. Go to the project’s dashboard under the Project tab and click on the Fork button. The gitlab url for this lab is https://gitlab.cecs.anu.edu.au/comp1100/2024s2/2024s2studentfiles/comp1100-2024s2-lab07

  2. You will be asked where to fork the repository. Click on the user or group to where you’d like to add the forked project. You should see your own name here. Once you have successfully forked a project, you will be redirected to a new webpage where you will notice your name before > project-name. This means you are now the owner of this repository. The url in your browser should reflect the change: https://gitlab.cecs.anu.edu.au/uXXXXXXX/comp1100-2024s2-lab07

  3. Finally, clone your forked repository to your computer according to the instructions in the Week 2 Lab.

In the following sections, the <TEXT> symbol means that you should replace the symbol with appropriate text, such as a module name, function name etc.

Haskell Style

This section will lead you through developing a program with good Haskell style. Assignment 2 will contain marks explicitly assigned to coding style, so it is worth your time to learn to write readable code according to these guidelines. This section has been heavily influenced by some online Haskell style guides; please see references at the end of the lab.

In this section, let us assume we have an online library database that stores videos: movies and shows along with their availability.

Module

A module is the name given by Haskell to a .hs file containing a collection of related types and functions.

Naming

Module names should

  • start with a capital letter
  • use CamelCase
  • be singular, for example Tree instead of Trees

Documentation

The first item in your module should be the module level documentation, using the following format:

{-|
Module      : <module name>
Description : <short general purpose of module>
Maintainer  : <your email>

<Add a longer description of the module here if appropriate.>

-}
module <module name> where
...

Exercise 1

Create a module (in VSCodium, create a new file) named Library.hs. (Does this obey all the naming rules? If the name does not obey the rules, modify it.)

You can create a new file in VSCodium by following these instructions:

  1. Right click in the left “Explorer” bar

  2. Select “New File” New File

  3. Give the file an appropriate name New File

Submission required: Exercise 1


Exercise 2

Add the module level documentation, following the rules and template provided above.

Submission required: Exercise 2


Type Declarations

Naming

Type names should

  • start with a capital letter
  • use CamelCase

Documentation

Constructors are documented like this, with comments for each constructor:

-- | This is the documentation for the datatype.
data T
  = C1 a b  -- ^ This is the documentation for the 'C1' constructor
  | C2 a b  -- ^ This is the documentation for the 'C2' constructor
  | ...

Exercise 3

Copy the code below to give type aliases for Name, Year and Availability:

type Name = String
type Year = Int
type Availability = Bool

Create a datatype for a video named Video with two constructors Movie and Show. Each constructor should have two fields Name and Year. Follow the style guideline above and write documentation for the datatype and the constructor fields.

Once you have defined Video, we can define a Library as:

type Library = [(Video, Availability)]

Submission required: Exercise 3


List Declarations

If a list is too long to be written down without exceeding 80 characters of width, you will have to use several lines. Align these lines with consistent indentation. As an example, we can list the availability of a number of videos:

exampleLibrary :: Library
exampleLibrary =
    [ (Show "Westworld" 2017, False)
    , (Movie "Harry Potter and the Prisoner of Azkaban" 2004, False)
    , (Show "Game of Thrones" 2011, True)
    , (Movie "Thor: Ragnarok" 2017, False)
    , (Movie "Avengers: Endgame" 2019, False)
    , (Show "Attack on Titan" 2009, True)
    , (Show "Stranger Things" 2016, False)
    , (Movie "Star Wars: The Force Awakens" 2015, True)
    , (Show "The Walking Dead" 2010, True)
    , (Movie "Deadpool" 2016, True)
    ]

You may find it useful to copy this list of videos in the current module.

Functions

Naming

Function names should

  • start with lower case
  • use camelCase e.g. sumList, sumUpTo
  • be descriptive to reflect the intended use of the function

Type Declaration

  • Every top-level function must have an explicit type declaration.

    For example, write

    lengthList :: [a] -> Int
    lengthList list = case list of
        []   -> 0
        _:xs -> 1 + lengthList xs
    

    instead of just

    lengthList list = case list of
        []   -> 0
        _:xs -> 1 + lengthList xs
    

    Not only does it make your code easier to read, it also means that Haskell will return nicer error messages if you try to input the wrong thing, as it will know what type you intended the function to have.

Documentation

  • Most top-level declarations (data types and functions) should be preceded by a short comment, written with Haddock syntax.

    -- | The 'find' function checks if a given element is inside
    -- a list.
    find :: Integer -> [Integer] -> Bool
    find elem list = case list of
        [] -> False
        x:xs -> x == elem || find elem xs
    
  • Comments should say what the code does, not how it does it. It should be obvious how the code works, just by looking at it. If this is not the case, you need to rewrite the code.

  • Do not over-comment.

    Very many or very long comments (especially within the body of a function) are more distracting than helpful. Long comments may appear at the top of a file or section of code, if you need to explain the overall design of the code or refer to sources that have more information about the algorithms or data structures. All other comments in the file should be as short as possible. Judicious choice of variable names can help minimise the need for comments.

  • Avoid comments that state the obvious:

    -- | This function increments its argument
    inc :: Integer -> Integer
    inc int =  int + 1
    
  • Use proper English.

    Comments need not always be written in complete sentences, but when they are, standard rules of English grammar apply. Spelling also counts.


Exercise 4

Using the guidelines outlined above, define a function, using an appropriate name, that uses the name of a video to check whether a movie or a show is available in a given library.

What would be the type signature of this function? Once you figure it out, check with your classmates and/or your tutor to see what they think.

Hint: You may find it useful to define a helper function getName that takes a Video as input and returns the name.

Submission required: Exercise 4


Formatting

Line Length

Maximum line length is 80 characters. If your code is longer than 80 characters, you should

  • think about helper functions to do part of task
  • otherwise, wrap your code syntactically

Indentation

  • Do not use tabs. Use spaces for indenting.
  • Indent your code blocks with at least two spaces, and be consistent throughout your code.

Exercise 5

Using all the guidelines outlined above, define a function, using an appropriate name, that takes a library and returns a list of available videos. The function should return an empty list if none of the videos in the input library is available or if the library is empty.

What should be the type signature of this function? Discuss this with your classmates and/or tutor.

Submission required: Exercise 5


Warnings

Code you write should always be compiled with -Wall. There should be no warnings, if there are, make sure to fix your code to get rid of them.

Testing in Haskell

The aim of this section is to guide you through the formulation of unit tests which will help you to establish the correctness of functions that you write.

In the provided Haskell module BoxTesting.hs, we will be writing a function that finds the second smallest value in a list of Int, if there is one.

The type signature of the function will be:

secondMinimum :: [Int] -> Maybe Int

DO NOT START WRITING CODE YET!!!

As discussed in Code Quality lecture, we will use two testing techniques:

  • Black box testing: testing without looking at the code, and usually designed before the code is written;
  • White box testing: testing inspired by the code that is written.

Exercise 6: Black Box Testing

Our input is a list of integers, so we will want to choose lists of integers that help us to test various behaviours of our future program.

  • On an empty list we should return Nothing.
  • Are there any other cases that should return Nothing? Could you separate them into different testing groups? What examples from each group would you choose to test?
  • This code is supposed to return the second smallest value; so on the list [1,2,1] it should return 2. Is this a good example to test? Why or why not?
  • Are there any other testing groups suggested by this problem, before you write a line of code? What examples from these groups could you test?

After you have thought of as many instances of black box tests as you think you need, write these tests down as doctests. Please re-visit Lab 3 if you are unsure how to do this.

Submission required: Exercise 6: write doctests for second minimum


Exercise 7

Write the function secondMinimum. Check that it passes all your black box tests.

Ensure you have followed the style guidelines discussed in the first section of the lab.

Submission required: Exercise 7


Exercise 8: White Box Testing

As you write the function, you may notice that you require more tests to be sure that your function is correct. For example, you may decide to use one or more helper functions; you can write tests specifically for these functions. These white box tests are motivated by your code, and because there are many different ways to write secondMinimum, your white box tests may be quite different from those of others in your lab.

In particular, once you have finished your code, you should see if you require any further tests motivated by the following code features:

  • If your function contains case statements or guards, you should include tests for each case or guard. If there are boundaries between your cases or guards where they meet or overlap, write tests especially for those boundaries;
  • If you use recursion, you should test the base case, the single element case and the general step case.

Write white box tests as needed (there is no need to repeat testing groups already covered by your black tests). Check that your program passes all tests. Use a comment to clearly label the separation between your black box and white box tests, so that your tutor can verify that you have completed both exercises.

Submission required: Exercise 8 write white box tests

Note: We chose to use doctest as the testing framework in this lab, since this is familiar to you from previous labs. You will notice that using a large number of doctests can make your source file look quite “bulky”. It is better style in such a case to use a testing framework involving a separate file, as in the assignments, but you are not required to do so for this lab.


Exercise 9

Add some tests to the functions isAvailable and availableVids in the Library.hs module from previous section if you have not done so already.

Submission required: Exercise 9

You are required to submit your attempts at the exercises above in order to receive full participation marks for this lab. You have until the start of your next lab to do this. You should submit by committing and pushing all your changes to your Gitlab repository.


Extension

[Optional]

Given the task to evaluate an expression like "3 + 4 * 5 ^ 6" when typed into GHCi as a String, how would you go about it? Of course one has to think about operator precedence, but leaving that aside, first think how you would evaluate an expression like "3 + 4 - 5 + 6" and maybe expressions with floating point numbers like "3.2 - 4.2 - 5.3 + 6.3".

Before we go too far ahead, we need to think about how to represent the expression effectively. Having an expression like that represented as a String while you are working on it will turn into a programming massacre very quickly. Hence, every evaluation system—compiler, interpreter, or even your GHCi—will always perform what is called lexical scanning or tokenisation, converting the program text you type into a more structured (typed) form first. For instance we could store expressions like the one above as a “list of tokens”:

type Expression = [Token]
data Token = Plus | Minus | Num Double
    deriving (Show, Eq)

So "3.2 - 4.2 - 5.3 + 6.3" translates into:

[Num 3.2, Minus, Num 4.2, Minus, Num 5.3, Plus, Num 6.3]

This translation is done by going through the original string once, from left to right, and converting it into a list of tokens. This function is often called a scanner (since it scans from beginning to end), or a lexer (because it recognises the lexical units or lexemes of some language) or a tokeniser (as lexemes are often called tokens). It is an essential step for lexical analysis as the first phase of translating programs written in a given programming language. If you have the pleasure of implementing a programming language in a few years (perhaps even one you design yourself), then you will certainly learn a lot more about this.

For now, we have done part of the hard work for you with the following functions present in Expression.hs:

tokenise :: String -> Expression
showExpression :: Expression -> String
evalStringExpression :: String -> String

Extension 10

Your task is to write a function eval that will evaluate such expressions, as in:

>>> eval [Num 3.2, Minus, Num 4.2, Minus, Num 5.3, Plus, Num 6.3]
[Num 0.0]

so that you can evaluate strings which correspond to arithmetical expressions:

>>> evalStringExpression "3.2 - 4.2 - 5.3 + 6.3"
"0.0"

In this simplified version of this problem, we will work only with non-negative numbers, so that we do not have to deal with the unary minus operator, as in expressions such as 3.2 + (-4.2) - 5.3 where (-) is used as both a unary and a binary operator.

The idea will be to break the problem into handy parts. For instance, if a token list starts with: [Num x, op, Num y, opNext, …] we could simplify the job by evaluating the first bit ([Num x, op, Num y]) so that it will become a single number, and then re-evaluating the now smaller list [Num (x op y), opNext, …]. If we keep doing this, will this eventually result in a single number? When might it not work?

Start implementing your eval function and see how far you get. We recommend the following steps:

  1. Determine the type signature of the function
  2. Figure out the shortest lists the function should be able to handle (“base case(s)”)
  3. Implement Plus and Minus

Plus and Minus

If you have a function signature (or you want to see what Haskell thinks of your function first), then start with the following pattern:

Num x : op : Num y : nextOp : remainingExp

Analyse carefully each element in this pattern. What does it mean? Which variables correspond to what values? What are the types of the individual elements? What length of lists can this pattern match?

Before you lose sleep over what to return, the return value corresponding to the above pattern is as follows:

eval ((eval [Num x, op, Num y]) ++ nextOp : remainingExp)

We are breaking down the evaluation of the whole list into the two problems:

  1. The first operator
  2. The smaller list

You will need to handle the shorter case that we just used in the result, of course, which is [Num x, op, Num y]. Make sure you understand what this pattern could match, too, and use a second case expression inside the main one to solve it for each op.

Test your eval function on something simple with Plus and Minus in it, without any negative numbers:

eval [Num 3.0, Minus, Num 1.0, Minus, Num 1.0, Minus, Num 1.0]

Make sure you follow the style and testing guidelines provided early in the lab throughout the design of eval and any helper functions.


The rest of these exercises are optional for all students.

Negative Numbers

With the above exercise complete, we can now work on negative numbers. The tokeniser does not link minus signs to numbers, since we only have the single token Minus for both negation and subtraction. Consider the following expression:

[Num 3.1, Plus, Minus, Num 4.2, Minus, Num 5.2, Plus, Num 6.3]

You should write a pattern to match this situation, and the beginning of the list should translate into:

[Num 3.1, Plus, Num (-4.2), ...]

Test your implementation a few times to make sure it works.

More operations

Implement operations with different operator precedence, such as Times, DividedBy and Power. You will need to extend your Token** data type.

data Token = Plus | Minus | Times | DividedBy | Power | Num Double
    deriving (Show, Eq)

You will also need to extend the functions tokenise and showExpression to cater for these extra tokens as well.

Order of Operations

Now for the final step: operators with different binding power, or precedence. Times, DividedBy are of higher precedence (binding power) than Plus and Minus, and Power has the highest precedence of all.

Consider the expression that we wrote earlier:

eval ((eval [Num x, op, Num y]) ++ nextOp : remainingExp)

This is only correct if the precedence of op is greater than or equal to that of nextOp. Otherwise, we want to evaluate nextOp first, with the following pattern:

eval (Num x: op: eval (Num y: nextOp: remainingExp))

You need two things for correct evaluation:

  1. A function that gives you the binding power or precedence of an operation (call it bindingPower).
  2. A guard to pick the right evaluation depending on the binding power of the two operations (bindingPower op >= bindingPower nextOp)

If you are confused how to add a guard to a case expression, there is an example in the code for tokenise. Make sure that bindingPower returns larger values for Power than Times and DividedBy, and that those are in turn larger than Plus and Minus.

After you add the guards to the right place in your eval function, your function will be complete! You should now be able to evaluate any valid expression based on the given tokens. Check this carefully by testing against meaningful examples that you have developed through black box and white box testing.


References

  1. Haddock: A Haskell Documentation Tool https://haskell-haddock.readthedocs.io/en/latest/index.html
  2. Haskell Style Guide
  3. Haskell Style Guide by Johan Tibbel https://github.com/tibbe/haskell-style-guide/blob/master/haskell-style.md
bars search times