Lab 4

Note: Some of the content of this lab depends on material that we will only cover on Monday’s lecture. Please make it a priority to attend or listen to the lecture prior to attempting the lab questions.

Note 2: There may be more exercises in this lab than you can finish during the lab time. If you do not have time to finish all exercises (in particular, the programming problems), you can continue working on them later. You can do this at home (if you have a computer with python set up), in the CSIT lab rooms outside teaching hours, or on one of the computers available in the university libraries or other teaching spaces.

Objectives#

The purpose of this week’s lab is to:

  • understand the indexing of (1-dimensional) sequence types, such as lists and strings;
  • write functions using loops (for and while) to solve problems that require iterating through sequences;
  • investigate some of the functions available for doing things with different sequence types.

The string type#

We have already seen a number of times that all values in python have a type, such as int, float, str, etc. To determine the type of a value we can use the function type(_some expression_). The type of a value (“object”) determines what we can do with it: For example, we can do arithmetic on numbers, and we can ask for the length of a string (using the len function), but we cannot ask for the length of a number, or do much arithmetic with strings. Some operations can be done on several value types, but have different effects for different types. For example, using the + operator on two strings concatenates them: thus, "1.234" + "1.234" evaluates to "1.2341.234", whereas adding two floating-point numbers 1.234 + 1.234 evaluates to 2.468.

Exercise 0#

You should begin the lab by working in a python shell

Run the following in the python shell:

In [1]: string1 = "1.234"
In [2]: print(string1)
...

In [3]: type(string1)
Out [4]: ...

In [5]: string2 = string1 + string1

In [6]: print(string2)
...

In [7]: len(string1)
Out [7]: ...

In [8]: len(string2)
Out [8]: ...

In [9]: float1 = float(string1)

In [10]: type(float1)
Out [10]: ...

In [11]: float1 + float1
Out [11]: ...

Writing string literals#

Recall that a literal is an expression that represents a constant value - like a 1 represents the integer one, or "abc" represents the three-letter string abc. String literals in python are written by enclosing them in either single, double or triple quotation marks:

In [1]: str1 = 'a few words'

In [2]: str2 = "a few words"

In [3]: str3 = '''a few words'''

In [4]: str1 == str2
Out [4]: ...

In [5]: str2 == str3
Out [5]: ...

In [6]: type(str1)
Out [6]: ...

(Note that the triple quote is written by typing three single quotes.) There is no difference between single- and double-quoted strings. The triple quote has the special ability that it can stretch over several lines:

In [1]: many_line_string = '''This is line 1.
This is the second line.
And this is line three.'''

In [2]: many_line_string
Out [2]: ...

In [3]: print(many_line_string)
...

Note the difference between how many_line_string is displayed when you evaluate the expression and when you print it. What is the '\n' character?

Remember that a string enclosed in one type of quotation marks can not contain the same type of quotation marks (unless they are escaped - see the lecture slides on strings), but can contain the other types. (If you happen to like video tutorials better than lecture slides, here are two elementary ones on strings: http://www.youtube.com/watch?v=jECazLigEH8 and http://www.youtube.com/watch?v=yjRkEX4OTyc. The second one also looks at some string methods, which show up a little later in this lab.)

Exercise 1#

Write string literals for each of the following sentences:

  • The possessive form of ‘it’ is ‘its’
    - “it’s” is an abbreviation of “it is”.
  • A ‘’’ is three ‘ but two ‘ do not make a “.

Note that the first sentence should be on two lines.

Use the print function to verify that your strings are displayed correctly. Try writing them both with and without using triple quotes.

Character encoding#

A string is a sequence (ordered collection) of characters.

Characters (like every other type of information stored in a computer) are represented by numbers. Interpreting a number as a character requires an encoding; python 3 uses the unicode standard for encoding characters in strings.

Python provides the ord and chr functions for translating between numbers and the characters they represent: ord(a_character) returns the corresponding character code (as an int) while chr(an_integer) returns the character that the integer represents.

Try the following:

In [1]: ord('a')
Out [1]: ...

In [2]: ord('A')
Out [2]: ...

In [3]: chr(91)
Out [3]: ...

In [4]: chr(92)
Out [4]: ...

In [5]: chr(93)
Out [5]: ...

In [6]: chr(21475)+chr(20986)
Out [6]: ...

In [7]: chr(5798) + chr(5794) + chr(5809) + chr(5835) + chr(5840) + chr(5830) + chr(5825) + chr(5823)
Out [7]: ...

Remember that characters outside the ASCII range (unicode numbers above 255) may not display properly, if the computer you are using does not have a font for showing those characters. Also, many of the characters below number 32 are so called “non-printable” control characters, which may not be displayed.

Introduction to Lists#

Like strings, lists are another sequence type that exists in python. However, where each of the elements in a string has to be a character, the elements of a list can be of any type, str, int, bool, etc., even other lists.

For today’s lab we will largely only consider lists where every element is a number, and we will return to lists again after the break to examine them in more detail.

There are several ways to create lists. For each of the following, write a small piece of python code to create that list. Try finding at least two different ways of making each of the lists.

  • Create a list containing 100 elements that all have the same value, e.g. all are 1. Try to find three different approaches. Discuss your three alternatives with your neighbour - did you find the same three, or different ones?

  • Create a list of 100 integers whose value is the same as their index (position in the list) plus 1, that is, mylist[0] == 1, mylist[1] == 2, mylist[2] == 3, etc.

Just like strings, the list data type has several methods that can be helpful in solving these problems. For example, read help(list.append), help(list.insert) and help(list.extend) to see the documentation of some of the methods that modify lists.

List comprehension (optional)#

Python has a mechanism for writing compact expressions that create lists, called list comprehension. The general form of a comprehension is

[ element_exp for varname in iterable_exp ]

iterable_exp should be an expression that evaluates to an iterable type (for example, a sequence), and the comprehension creates a list whose elements are the result of evaluating element_exp for each element in the iterable value. The variable varname is assigned each value from the iterable in turn, and can be used in the element_exp. For example,

[ 2 ** k for k in [0, 1, 2, 3, 4] ]

will create the list with elements 2 ** 0, 2 ** 1, etc, up to 2 ** 4, i.e, the list [ 1, 2, 4, 8, 16 ] and

[ ord(char) for char in "Hello!" ]

will create a list whose elements are the numbers corresponding to the encoding of the characters in the string "Hello!".

You can find out more about them in Section “List comprehensions” of Chapter 19 in Downey’s book, or Section 7.10 in Punch & Enbody’s book.

Try writing expressions to generate the lists in the exercise above using comprehensions.

Indexing sequences#

Every element in a sequence has an index (position). The first element is at index 0. The length of a sequence is the number of elements in the sequence. The index of the last element is the length minus one. The built-in function len returns the length of any sequence.

Indexing a sequence selects a single element from the sequence (for example, a character if the sequence is a string). Python also allows indexing sequences from the end, using negative indices. That is, -1 also refers to the last element in the sequence, and -len(seq) refers to the first.

Exercise 2(a)#

Run the following examples. For each expression, try to work out what the output will be before you evaluate the expression.

In [1]: my_string = "as if on"

In [2]: my_string[1]
Out [2]: ...

In [3]: my_string[4]
Out [3]: ...

In [4]: my_string[-1]
Out [4]: ...

In [5]: L = len(my_string)

In [6]: my_string[L - 1]
Out [6]: ...

In [7]: my_string[1 - L]
Out [7]: ...

Exercise 2(b)#

The statement

In [1]: my_list = [1, 2, 3, 4, 5, 6, 7, 8]

assigns the variable my_list a list of the integers 1,2,…,8. (Thus, the length of this list is the same as the length of the string used in the previous exercise.) Try the indexing expressions from the previous exercise on my_list instead of my_string. They should all work. Is the result of all expressions what you expect?

Iteration over sequences#

Python has two kinds of loop statements: the while loop, which repeatedly executes a suite as long as a condition is true, and the for loop, which executes a suite once for every element of a sequence. (To be precise, the for loop works not only on sequences but on any type that is iterable. All sequences are iterable, but later in the course we will see examples of types that are iterable but not sequences.)

Both kinds of loop can be used to iterate over a sequence. Which one is most appropriate to implement some function depends on what the function needs to do with the sequence. The for loop is simpler to use, but only allows you to look at one element at a time. The while loop is more complex to use (you must initialise and update an index variable, and specify the loop condition correctly) but allows you greater flexibility; for example, you can skip elements in the sequence (increment the index by more than one) or look at elements in more than one position in each iteration.

The syntax and execution of while and for loops is described in the lectures, and in the text books (Downey: Sections “The while Statement” in Chapter 7 and “Traversal with a for loop” in Chapter 8; Punch & Enbody: Sections 2.1.4, and 2.2.10 to 2.2.13).

Exercise 3(a)#

From here onwards you should switch from working in the shell, to working in files.

The following function takes one argument, a list, and counts the number of elements in the list that are negative. It is implemented using a while loop.

def count_negative(input_list):
    count = 0
    index = 0
    while index < len(input_list):
        if input_list[index] < 0:
            count = count + 1
        index = index + 1
    return count

Rewrite this function so that it uses a for loop instead.

To test your function, you can use the following inputs - note that while we use list comprehensions to quickly create lists to ‘test’ our functions, you are not required to be familiar with list comprehensions for the mid-semester exam:

  • [i for i in range(-2, 2)] (2 negative numbers)
  • [i for i in range(2, -3, -1)] (2 negative numbers)
  • [i for i in range(-3, 2)] (3 negative numbers)
  • [math.sin(2 * math.pi * i / 50) for i in range(50)] (25 negative numbers)
  • [0 for i in range(10)]] (0 negative numbers)

You can create more test cases by making variations of these, or using other list-creating functions.

Exercise 3(b)#

Write a function called is_increasing that takes an array (of numbers) and returns True if and only if the elements in the array are in (non-strict) increasing order. This means that every element is less than or equal to the next one after it. For example,

  • for [1, 5, 9] the function should return True
  • for [3, 3, 4] the function should return True
  • for [3, 4, 2] the function should return False

Is it best to use a for loop or a while loop for this problem?

Test your function with the examples above, and with the examples you used for exercise 3(a).

Also test your function on an empty list (that is, a list with no elements). An empty list can be created with the expression [] or list(). Does your function work? Does it work on a list with one element?

Try calling your function with a string argument (for example, "a B c D"). Does it work? Does it return a correct answer? (Remember that character comparison is based on the corresponding unicode number: a < B is the same as ord('a') < ord('B').)

Slicing#

Python’s built-in sequence types (which include type str) provide a mechanism, called slicing, to select parts of a sequence (that is, substrings if the sequence happens to be a string). It is done using the notation sequence[start:end]. There is also an extended form of slicing, which takes three arguments, written sequence[start:end:step].

(Punch & Enbody’s book has a detailed description of slicing, including its extended form, in Section 4.1.5 (page 183). Downey’s book discusses slicing in Section “String Slices” in Chapter 8; the extended form of slicing is only briefly mentioned in Exercise 8-3.)

Exercise 4(a)#

For these exercises 4(a), 4(b) and String Methods, switch back to writing your code in the shell.

To make sure you understand what the arguments in a slicing expression mean, run through the following examples. For each expression, try to work out what the output will be before you evaluate the expression.

In [1]: my_string = "Angry Public Swamp Methods"

In [2]: L = len(my_string)

In [3]: my_string[1:L]
Out [3]: ...

In [4]: my_string[0:L - 1]
Out [4]: ...

In [5]: my_string[0:L:2]
Out [5]: ...

In [6]: my_string[L:0:-1]
Out [6]: ...

In [7]: my_string[6:6+6]
Out [7]: ...

In [8]: my_string[11:11-6:-1]
Out [8]: ...

In [9]: my_string[2*L]
Out [9]: ...

In [10]: my_string[0:2*L]
Out [10]: ...

(“Angry Public Swamp Methods” - Why such a strange, non-sense string? It’s designed to make it easier for you to see what happens when you try different slicing expressions. Can you see what’s special about it? Hint: Remember from the previous exercise that ord('S') is not equal to ord('s') - ‘S’ and ‘s’ are different characters.)

Exercise 4(b)#

As in Exercise 2(b) above, use the statement

In [1]: my_list = [i for i in range(1, 27)]

to assign my_list an array of integers of the same length as the string used in the previous exercise, and try the slicing expressions on my_list instead of my_string. Is the result of all expressions what you expect?

String methods#

A “method” is the same thing as a function, but it uses a slightly different call syntax. A method is always called on a particular object (value). A method call,

object.method(...)

can be thought of as “on this object, perform that method”.

For now, we will just explore some of the rich set of methods that python provides for performing operations on built-in data types, such as strings. You can find the documentation of pythons string methods at http://docs.python.org/3/library/stdtypes.html#text-sequence-type-str, or by using the built-in help function in the python shell.

Try the following:

In [1]: sentence = "the COMP1730 lectures are boring, but the labs are great"

In [2]: sentence.capitalize()
Out [2]: ...

In [3]: sentence.swapcase()
Out [3]: ...

In [4]: sentence.count("e")
Out [4]: ...

In [5]: sentence.find("lectures")
Out [5]: ...

In [6]: sentence.find("exciting")
Out [6]: ...

In [7]: sentence.split(" ")
Out [7]: ...

In [8]: sentence.upper().find("LECTURES")
Out [8]: ...

In [9]: sentence.find("LECTURES").upper()
Out [9]: ...

In [10]: sentence.swapcase   # Note the lack of `(` and `)`
Out [10]: ...

After each method call, try print(sentence). Which of the methods modify the string? Why?

Exercise 5#

For the remainder of the lab, you should switch back to writing your code in files.

There are five string methods that modify case: capitalize, title, swapcase, upper and lower.

  • Look up each of these methods in the python on-line documentation or using the built-in help function. (Note: To find the documentation of a string method using help, you must prefix the method name with str.; that is, use help(str.capitalize) instead of help(capitalize).)

  • Based on your understanding, can you predict what will be the effect of each of these methods on the following strings:

    s1 = "turner"
    s2 = "north lyneham"
    s3 = "AINSLIE"
    s4 = "NewActon"?
    

    To check if your understanding is correct, write python code to perform the operation, run it and see.

(From Punch & Enbody, Chapter 4: Question 14, on page 221.)

Programming problems#

Note: These are more substantial programming problems. We do not expect that everyone will finish them within the lab time. If you do not have time to finish them during the lab, you should continue working on them later (at home, in the CSIT labs after teaching hours, or on one of the computers available in the university libraries or other teaching spaces).

Closest matches#

(a) Write two functions, smallest_greater(seq, value) and greatest_smaller(seq, value), that take as argument a sequence and a value, and find the smallest element in the sequence that is greater than the given value, and the greatest element in the sequence that is smaller than the given value, respectively.

For example, if the sequence is “qazrfvujm” and the target value is ‘h’, the smallest greater element is ‘j’ and the greatest smaller element is ‘f’.

  • You can assume that all elements in the sequence are of the same type as the target value (that is, if the sequence is a string, then the target value is a letter; if the sequence is a list of numbers, then the target value is a number).
  • You can not assume that the elements of the sequence are in any particular order.
  • You should not assume that the sequence is of any particular type; it could be, for example, a list, a string, or some other sequence type. Use only operations on the sequence that are valid for all sequence types.
  • What happens in your functions if the target value is smaller or greater than all elements in the sequence?

(b) Same as above, but assume the elements in the sequence are sorted in increasing order; can you find an algorithm that is more efficient in this case?

Counting duplicates#

If the same value appears more than once in a sequence, we say that all copies of it except the first are duplicates. For example, in [-1, 2, 4, 2, 0, 4], the second 2 and second 4 are duplicates; in the string “Immaterium”, the ‘m’ is duplicated twice (but the ‘i’ is not a duplicate, because ‘I’ and ‘i’ are different characters).

Write a function count_duplicates(seq) that takes as argument a sequence and returns the number of duplicate elements (for example, it should return 2 for both the sequences above). Your function should work on any sequence type (for example, both lists and strings), so use only operations that are common to all sequence types. For the purpose of deciding if an element is a duplicate, use standard equality, that is, the == operator.

bars search times arrow-up