Lab 9 - Time complexity. More practice on debugging and programming.

Objectives#

The purpose of this week’s lab is to:

  • Analyse time complexity of certain codes.
  • Practice debugging code again.
  • Practice more programming problems.

If you have any questions about any of the material in the previous labs, now is a good time to ask your tutor for help during the lab.

Note: If you do not have time to finish all exercises (in particular, the programming problems) during the lab time, you should continue working on them later.

Time complexity#

As we have seen in the lecture last week, the big-O notation is used to denote the worst-case time complexity of an algorithm to solve a certain problem. If the problem size is \(n\) and the algorithm needs \(c \times n\) numerical operations to solve it, where \(c\) is a constant number (thus independent of \(n\)), we say that the algorithm has a linear time complexity \(O(n)\). Here, we assume that each operation has a constant time complexity \(O(1)\). (Otherwise, the algorithm won’t have a linear complexity anymore.)

For example, the following function finds the index of an element in a sequence, (returning -1 if the element is not found):

def find_element(sequence, element):
    for i in range(len(sequence)):
        if sequence[i] == element:
            return i
    return -1        

It has time complexity of \(O(n)\), where \(n\) is the length of the sequence. This is because the main for loop needs \(n\) comparisons in the worst-case, namely when the element we are looking for does not appear in the sequence.

As a rule of thumb, a for or while loop with \(n\) iterations indicates that the code may have a complexity of \(O(n)\). If there are two loops nested within each other, the code might have a quadratic time complexity of \(O(n^2)\).

Let’s look at an example: the following code computes the minimum absolute difference between any pairs of elements in a numerical sequence:

def smallest_difference(sequence):
    min_diff = abs(sequence[0]-sequence[1])
    for i in range(len(sequence)-1):
        for j in range(i+1, len(sequence)):
            diff = abs(sequence[i] - sequence[j])
            if min_diff > diff:
                min_diff = diff
    return min_diff

Let’s analyse the time complexity of this code. Let \(n\) be the length of the input sequence. The outer for loop has \(n-1\) iterations using index \(i\). In the first iteration of the outer loop, i.e., when \(i=0\), the inner loop iterates in range(1,n), each doing four operations (subtraction, abs, comparison, and assignment), totalling \(4(n-1)\) operations. In the 2nd iteration of the outer loop (\(i=1\)), the inner loop iterates over range(2,n), which means \(n-2\) iterations. And so on. Summing all together, the total number of operations is:

\[4(n-1) + 4(n-2) + \ldots + 4 = 4 n \frac{n-1}{2} = 2n^2 - 2n\]

Now, when \(n\) is large enough, the \(n^2\) term will dominate over the \(n\) term. We can also ignore the constant \(2\) multiplying this leading term. That means that the function smallest_difference has time complexity \(O(n^2)\).

Exercise 1.0#

Each of the following three functions takes as input an integer n. For each function, give its computational complexity in big-O notation in terms of n.

def func_a(n):
    total = 0
    for i in range(n):
        for j in range(n):
            total = total + i * j
    return total

def fun_b(n):
    total = 0
    for i in range(100 * n):
        for j in range(10):
            total = total + i * j
    return total

def fun_c(n):
    total = 0
    for i in range(100):
        for j in range(n):
            total = total + i * j
        for k in range(n):
            total = total + i * k
        for l in range(100):
            total = total + i * l
    return l

Exercise 1.1#

It turns out that there is another algorithm for smallest_difference with a time complexity of only \(O(n \log(n))\). Can you figure that out? Write another function with that time complexity.

HINT: What if the sequence is sorted? What is the time for sorting?

Exercise 1.2#

Here is an implementation to compute the maximum absolute difference in a similar merit to smallest_difference:

def largest_difference(sequence):
    max_diff = abs(sequence[0]-sequence[1])
    for i in range(len(sequence)-1):
        for j in range(i+1, len(sequence)):
            diff = abs(sequence[i] - sequence[j])
            if max_diff < diff:
                max_diff = diff
    return max_diff

It has a time complexity of \(O(n^2)\). You can of course follow the same idea from exercise 1.1 to reduce the complexity to \(O(n \log(n))\).

However, there is another algorithm with a linear time complexity of only \(O(n)\). Can you figure that out? Write another function with that time complexity.

Exercise 1.3#

We now want to plot the real running times of these codes in a computer. Run the following code to define a plot_time function:

import random
import time
import matplotlib.pyplot as plt
def plot_time(func, steps):
    seq = [ random.random() for i in range(steps) ]
    elapsed_times = []
    for n in range(2,steps):
        start_time = time.time()
        # run 10 times to reduce fluctuation
        for i in range(10):
            func(seq[:n])
        end_time = time.time()
        elapsed_times.append((end_time - start_time)/10)
    plt.plot(range(2,steps),elapsed_times)
    plt.show()
    print("Total runtime: " + str(sum(elapsed_times)) + " seconds")

Then you can run plot_time(smallest_difference, 300) to plot the running time (in seconds) of the smallest_difference function for sequences of up to 300 randomly-generated floats.

If you have a faster implementation, say, for example smallest_difference2, you can run plot_time(smallest_difference2, 300) and compare these two plots to see how different the runtimes are. Do your observations match the time complexity analysis of the algorithms?

Debugging problems#

Learning to read, understand and debug code is a very important skill, so here are some debugging exercises to practice this skill.

Exercise 2.1#

The following are attempts to define a function that takes three (numeric) arguments and checks if any one of them is equal to the sum of the other two. For example, any_one_is_sum(1, 3, 2) should return True (because 3 == 1 + 2), while any_one_is_sum(0, 1, 2) should return False.

(a) All of the functions below are incorrect. For each of them, find examples of arguments that cause it to return the wrong answer.

Function 1#

def any_one_is_sum(a,b,c):
    sum_c=a+b
    sum_b=a+c
    sum_a=b+c
    if sum_c == a+b:
        if sum_b == c+a:
            if sum_a == b+c:
                return True
    else:
        return False

Function 2#

def any_one_is_sum(a,b,c):
    if b + c == a:
        print(True)
    if c + b == a:
        print(True)
    else:
        print(False)
    return False

Function 3#

def any_one_is_sum(a, b, c):
    if a+b==c and a+c==b:
        return True
    else:
        return False

(b) For each of the three functions above, can you work out how they are intended to work? That is, what was the idea of the programmer who wrote them? What comments would be useful to add to explain the thinking? Is it possible to fix them by making only a small change to each function?

Exercise 2.2#

Here is a function that is meant to return the position (index) of a given element in a sequence; if the element does not appear in the sequence, it returns the length of the sequence. For example, find_element([3,2,1,4], 1) should return 2, since that is the index where we find a 1.

def find_element(sequence, element):
    i = 0
    while sequence[i] != element:
        if i < len(sequence):
            i = i + 1
        i = i + 1
    return i

However, the function is not correct. For some inputs it will cause a runtime error. Find an example of arguments that cause an error to occur. Can you correct the error without introducing another?

Exercise 2.3#

Here is a function that is meant to count the number of repetitions of a substring in a string. For example, if the string is 'abcdabcf' and the substring is 'ab', the count should be 2. (Of course, for any of the substrings 'a', 'b', 'c', 'ab', 'bc' or 'abc', the count should be 2.)

def count_repetitions(string, substring):
    '''
    Count the number of repetitions of substring in string. Both
    arguments must be strings.
    '''
    n_rep = 0
    # p is the next position in the string where the substring starts
    p = string.find(substring)
    # str.find returns -1 if the substring is not found
    while p >= 0:
        n_rep = n_rep + 1
        # find next position where the substring starts
        p = string[p + 1:len(string) - p].find(substring)
    return n_rep

However, the function is not correct. For some inputs it will return the wrong answer, and for some inputs it will get stuck in an infinite loop. Find examples of arguments that cause these errors to happen.

Among the many string methods, there is one that does what this function is meant to do: 'abcdabcf'.count('abc') will return 2. Can you fix the function above without using the string count method, i.e., keeping the idea of the original function and only correcting the error?

Exercise 2.4#

Here is another string function. This function is meant to remove all occurrences of a substring from a string, and return the result. For example, if the string is 'abcdabcf' and the substring is 'bc', we want the function to return 'adaf'.

def remove_substring_everywhere(string, substring):
    '''
    Remove all occurrences of substring from string, and return
    the resulting string. Both arguments must be strings.
    '''
    p = string.find(substring)
    lsub = len(substring) # length of the substring
    while p >= 0:
        string[p : len(string) - lsub] = string[p + lsub : len(string)]
        p = string.find(substring)
    return string

This function is also not correct, and should cause a runtime error on almost any input. Like in the previous problem, try to make the function do what it is supposed to while keeping the idea of the original function Can you also find a string method that does (or can be made to do) what this function is intended to do?

Programming problems#

Note: We don’t expect everyone to finish all these problems during the lab time. If you do not have time to finish these programming problems in the lab, you should continue working on them later (at home, in the CSIT labs after teaching hours, or on one of the computers available in the university libraries or other teaching spaces).

Reading image files#

Portable pixmap, or ppm, is a simple, non-compressed image format. A ppm file can be stored in text or binary form; let’s start with reading it in text form.

A text-form ppm file starts with the magic string P3. That means the first two characters in the file are always P3. This identifies the file format. After this comes three positive integers (each written out with the digits 0-9): the first two are the width and height of the image, and the third is the maximum colour value, which is less than or equal to 255. Then follows width times height times 3 integers (again, each written with digits 0-9); these represent the red, green and blue value for each pixel. The pixels are written left-to-right by row from the top, meaning the first triple of numbers is the colours of left-most pixel in the top row, then the second from the left in the top row, and so on.

Here is a small example:

P3
3 2
255
255   0   0
0   255   0
0     0 255
255 255   0
255 255 255
0     0   0

This image has width 3 and height 2, which means it has 6 pixels. All the numbers in the file are separated with whitespace, which can be one or more spaces (' ') or a newlines ('\n'). The format does not require that all pixels in a row are on one line. In the example above, they are written one pixel per line, but the following would also be a correct representation of the same image:

P3
3 2
255
255   0   0	0   255   0	0     0 255
255 255   0	255 255 255	0     0   0

To display the image after reading it, you can use the imshow function from the matplotlib.pyplot module. You will need to create a 3-dimensional array, where the sizes of the dimensions are the width, the height, and 3. The array entries at i,j,0, i,j,1 and i,j,2 are the red, green and blue colour values for the pixel at row i column j in the image. For example, to show the image above, you can do the following:

import numpy as np
import matplotlib.pyplot as plt

image = np.zeros((2,3,3))     # create the image array (fill with 0s)
image[0,0,:] = 1.0, 0.0, 0.0  # RGB for top-left (0,0) pixel
image[0,1,:] = 0.0, 1.0, 0.0  # RGB for top-middle (1,0) pixel
image[0,2,:] = 0.0, 0.0, 1.0  # RGB for top-right (2,0) pixel
image[1,0,:] = 1.0, 1.0, 0.0  # RGB for row 2 left (0,1) pixel
image[1,1,:] = 1.0, 1.0, 1.0  # RGB for row 2 middle (1,1) pixel
image[1,2,:] = 0.0, 0.0, 0.0  # RGB for row 2 right (2,1) pixel
plt.imshow(image, interpolation='none')
plt.show()

Note that each of the colour values above (1.0 or 0.0) is the result of taking the corresponding colour value from the image file (255 or 0) and dividing it by the maximum colour value (255). The extra argument interpolation='none' to imshow disables image interpolation, which it may otherwise do if the image has low resolution.

Write a function that displays the image read from a file. You need to create an array of the right size (as read from the image file) and replace the part that fills in the values above with some kind of loop that fills in the values read from the file.

Here are two image files (of the kind that the internet is most famous for) that you can test your program on: cat_picture_1.ppm, cat_picture_2.ppm.

Advanced: As mentioned above, the ppm format also has a binary form. It is very similar the text format, except that each colour value is stored as a single byte, rather than written out as text. The magic string for this format is P6. The width, height and max colour value are still written out with digits, and there should be a newline before the start of the binary image data.

Modify your program so that it can read images in both text and binary format. (To decide which format a given file is, you will need to open it and read the first two characters.) Note that to read the binary format correctly, you will have to open the file in binary mode. You can also use function file filein.read(3) to read three consecutive bytes for the pixel colours.

Here are the two image files above encoded in the binary form: cat_picture_1_binary.ppm, cat_picture_2_binary.ppm.

bars search times arrow-up