Homework 4

This is the fourth homework assignment. Your goal in this assignment is to write a function that performs a calculation on sequence data and returns a value. Like the previous two homework exercises, we provide you with a testing framework which will run tests of your function. It is important that you learn to use the testing program effectively, since we will be using this kind of automated testing for the remaining homework as well as for the programming exam at the end of the course.

Practical information

The assignment is due on Sunday the 15th of April, 2018, at 6pm. To submit your solution, you will upload a single python file via wattle. Here is the assignment submission link.

In addition to submitting your solution, you must attend the following lab (in week 7). In the lab, your tutor will ask you some questions about your submission, and give you feedback if there is anything you need to improve. This discussion with the tutor is also part of the assessment. If you do not show up for the discussion with the tutor, you will not receive any marks for this assignment. If you do not submit a solution, you may still get the marks that are based on the tutor’s assessment of your understanding.

The assignment can be done together in pairs, but not in a group of more than two students. You may also do the assignment on your own if you prefer.

If you work in a pair, both students must submit solution files, and both students must attend the following lab and answer the tutor's questions. In your solution file, write a comment (using python comment syntax) to say who you worked with, giving their ANU id. Both of you must be able to explain every part of your submitted solution. The tutor will choose who to address each question to, and only the student addressed may answer. It is not acceptable to divide the assignment up so that one student does half and the other student the other half.

As usual, you should have followed last week’s lectures and worked through the exercises in lab 4 before starting on the assignment. The assignment should not take more than one or two hours to complete.

The problem

The relative frequency of a letter in a string is the number of times that the letter appears in the string divided by the total number of letters in the string. Note that this is always a number between 0 and 1.
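As a small illustration of the definition, the sketch below computes the relative frequency of one chosen letter. It assumes Python 3 (where / is true division) and a string made up only of lower-case letters, so it does not deal with the case and non-letter rules described further down. The variable names are chosen for this example only.

```python
# Relative frequency of one chosen letter in a string that contains
# only lower-case letters (an assumption that keeps this sketch simple).
text = 'banana'
letter = 'a'

occurrences = text.count(letter)   # 'a' appears 3 times
total_letters = len(text)          # 'banana' has 6 letters

relative_frequency = occurrences / total_letters
print(relative_frequency)          # prints 0.5
```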

Write a function max_relative_frequency(s) which takes as argument a string and returns the highest relative frequency of any letter in the string. This value is unique. There can be several letters that occur with an equal (highest) frequency, but in that case their relative frequency is the same.

  • Consider only letters of the English alphabet (that is, 'a' through 'z').
  • For the purpose of counting occurrences, consider letters that differ only by case to be the same letter. For example, the letter 'n' occurs three times in the string "Non-even" (once in upper case, twice in lower case).
  • For the purpose of counting the total number of letters in the string, count only letters. For example, the string '0 << c++ << 9' contains only one letter, 'c', so the relative frequency of 'c' in this string is 1.
  • If the string contains no letter, the relative frequency is undefined and your function should return 0.

Examples:

  • The string 'sufficit' has 8 letters, and the most frequent letters are f and i, which both occur twice. Thus, the highest relative frequency is 2/8.

  • The string 'Non-even' has 7 letters (- is not a letter), and the most frequent letter is n which occurs three times (one N and two n). Thus, the highest relative frequency is 3/7.

Assumptions and restrictions:

  • You can assume that the argument is a string.

  • Your function must return a number between 0 and 1 (inclusive).
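The rules and examples above can be turned into a few quick checks. The helper below is only a sketch: check_examples is a hypothetical name, it is not part of the provided testing program, and it covers only the cases mentioned in this text plus one extra string without letters. It assumes Python 3 and takes the function to check as an argument.

```python
def check_examples(candidate):
    """Check a candidate function against the examples in the text.

    candidate is any function that takes a string and returns a number.
    Returns the list of input strings for which the result was wrong.
    """
    expected = [
        ('sufficit', 2 / 8),      # f and i each occur twice among 8 letters
        ('Non-even', 3 / 7),      # N, n, n among 7 letters
        ('0 << c++ << 9', 1),     # the only letter is c
        ('', 0),                  # no letters at all
        ('123 + 456', 0),         # digits and symbols, but no letters
    ]
    failed = []
    for text, value in expected:
        result = candidate(text)
        # Allow for tiny floating-point differences.
        if abs(result - value) > 1e-9:
            failed.append(text)
    return failed
```

Calling check_examples(max_relative_frequency) from the interpreter should return an empty list if your function handles all of these cases. Passing these checks does not replace running the testing program.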

As a starting point, we provide you with a skeleton code file: homework4.py. Download this file and write in it your implementation of the function.

Using the testing program

To use the testing program, you must first download these two files:

Save both of them in the same directory as the file homework4.py. To run the testing program, you just need to run test_hw4.py. The testing program will read the file homework4.py and test the function max_relative_frequency defined in that file, and print out results of the tests. If your function fails any of the tests, the program will print a detailed error message and stop.

Remember that the testing program will test the file named homework4.py which is located in the same directory. If you change the name of the file with your implementation, you must also change the file name on the last line in the testing program to test the right file.

For the testing program to work, your code file must meet certain requirements. These requirements also apply to the file that you submit.

  • It must be syntactically correct python code.
  • It must contain only function definitions and comments.
  • It must define a function called max_relative_frequency that takes one argument.
  • You may not use import statements for this homework, not even for standard library modules.

Your function definitions may contain docstrings (as shown in the week 2 lecture), but it is not permitted to use strings as comments anywhere other than on the first line inside a function suite.
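As an illustration of these requirements, here is a sketch of what a conforming file can look like. The helper is_english_letter and the ANU id u1234567 are hypothetical and used only for this example. Your own file must also define max_relative_frequency.

```python
# Worked together with: u1234567  (hypothetical ANU id)

def is_english_letter(c):
    """Return True if the one-character string c is in a-z or A-Z."""
    # Ordinary comments like this one are fine anywhere in the file.
    # A bare string used as a comment at this point would not be
    # permitted, because it is not on the first line of the suite.
    return 'a' <= c <= 'z' or 'A' <= c <= 'Z'
```

Note that the file contains no import statements and no code outside of function definitions.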

If your submitted file does not satisfy these requirements, it will not be marked.

Marking

Upload the file homework4.py with your implementation.

Remember that you must upload a single python code file. The name of the file does not matter, except that it must end with the extension .py. Do NOT zip it or convert it to another format.

As mentioned above, you must also attend the following lab (in week 7) and answer your tutor’s questions about your solution. This discussion is part of the assessment. You should be prepared to answer the following questions, and to demonstrate your answers where relevant:

  • Can you download the file that you submitted from wattle?
  • Can you run that file in the python interpreter (using an IDE of your choice) on the CSIT lab computer?
  • If the file has syntax errors, can you use the error messages from the interpreter or IDE to identify where the syntax errors are?
  • Does your submitted file meet the requirements stated above? Does it contain anything that is not a function definition or a comment (for example, an import statement)? If so, can you point it out?
  • Can you download and run the testing program?
  • Does your implementation pass all the tests run by the testing program?
  • What is the type of the value returned by your function?
  • In order to find the relative frequency of each letter in the input you needed to iterate over the characters in the string (or you used some function or method that does). How many times does your solution iterate through the entire string? What is the smallest number of complete iterations over the string that is necessary to find the highest relative frequency?
  • Did you use any of python’s built-in string methods? If so, can you explain how you would write code to do what those methods do, if you had to produce a solution without using them?
  • What values do you store (in variables) while computing the answer? What data structure, if any, do you use to store them? Is there any other data structure that could be used instead?
  • Did you implement your solution with just one function, or divide it into several functions? Does the functional abstraction make your code easier to read and understand?
  • Are the functions in your submitted file documented (with docstrings and/or comments)? Does the function documentation adequately describe what the function does and what its assumptions and limitations are? Are the names of variables, parameters and auxiliary functions descriptive of their purpose?

The marking scale for this assignment is as follows:

  • The submitted file is syntactically correct and meets the submission requirements: 0.5 marks.
  • The implementation of the max_relative_frequency function returns correct values for all valid arguments: 1 mark.
  • The submitted code has the right level of documentation and commenting: 1 mark.
  • Your ability to use the tools (including the testing program) and your understanding of possible solutions to the problem, as demonstrated in your discussion with the tutor: up to 2.5 marks.

The assignment is worth 5% of your final mark.
