Homework 5

This is the fifth and final homework assignment. Your goal in this assignment is to implement a function that calculates the average rank of columns in a table, based on a comparison over several rows.

Practical information

The assignment is due on Monday the 2nd of May, at 9:00am (Canberra time).

To submit your solution, upload a single Python file via Wattle. Here is the assignment submission link.

In addition to submitting your solution, you must attend the following lab (in semester week 9). In the lab, your tutor will ask you some questions about your solution, and give you feedback if there is anything you need to improve. This discussion with the tutor is also part of the assessment.

If you fail to show up for the discussion with the tutor, you will receive zero marks for this assignment. If you do not submit a solution, you may still get partial marks if you are able to show the tutor that you have a solution that is at least partially functional.

The homework is individual. You must write your own solution, and you are expected to be able to explain every aspect of it.

As usual, you should have followed the previous weeks' lectures and worked through the lab exercises before starting on the assignment. The assignment should not take more than a few hours to complete.

The problem

In the data science lecture in week 6, you learnt how to store and process a table represented as a list of lists. In this homework, you are given a table of numeric values. The aim is to compute the ranking of the values within each row, and then use these rankings to calculate the average rank of each column.

Here is an example of a table with 8 rows and 11 columns:

        col 1   col 2   col 3   col 4   col 5   col 6   col 7   col 8   col 9  col 10  col 11
row 1      40      16       7      10      52       4      80      67      52     121      51
row 2     571     200     352     187     616     147     914     406     635     712    1914
row 3     353     108     216     202     204     146     373     778     303     595     449
row 4       9       2       9     280       2       0       4       1       1       0       0
row 5      95     495    1201     704      47    3646      45       9       5      19      29
row 6      41     434    1897     215      17     536       2       2       0       0      18
row 7    1428      88       9      47     122       0     161       3       5       1       4
row 8     350       0       0       0       5       0      60      30     860      53      50

A higher value means a better score for the column in a given row. For each row, we can calculate a ranking of the columns, where the best column gets rank 1, the second best gets rank 2, and so on. For example, the rankings corresponding to the table above are:

        col 1   col 2   col 3   col 4   col 5   col 6   col 7   col 8   col 9  col 10  col 11
row 1       7       8      10       9     4.5      11       2       3     4.5       1       6
row 2       6       9       8      10       5      11       2       7       4       3       1
row 3       5      11       7       9       8      10       4       1       6       2       3
row 4     2.5     5.5     2.5       1     5.5      10       4     7.5     7.5      10      10
row 5       5       4       2       3       6       1       7      10      11       9       8
row 6       5       3       1       4       7       2     8.5     8.5    10.5    10.5       6
row 7       1       4       6       5       3      11       2       9       7      10       8
row 8       2     9.5     9.5     9.5       7     9.5       3       6       1       4       5

Note that in some cases two (or more) columns have equal scores (for example, column 5 and column 9 in the first row). In this situation, we assign all of them the average of the ranks that they occupy in the overall order (in the first row, ranks 4 and 5, giving 4.5 each).
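The tie rule described above can be illustrated for a single row. The following is only a sketch of one possible approach, not the expected solution; the function name rank_row is a hypothetical choice and does not appear in the skeleton file.

```python
def rank_row(row):
    """Return the rank of each value in row (the highest value gets rank 1).

    Tied values share the average of the rank positions they occupy.
    """
    # Indices of the row, ordered from the highest value to the lowest.
    order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    ranks = [0.0] * len(row)
    start = 0
    while start < len(order):
        # Extend the group while the next value equals the current one.
        end = start
        while end + 1 < len(order) and row[order[end + 1]] == row[order[start]]:
            end += 1
        # Positions start..end are 0-based, while ranks are 1-based.
        shared_rank = (start + 1 + end + 1) / 2
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


first_row = [40, 16, 7, 10, 52, 4, 80, 67, 52, 121, 51]
print(rank_row(first_row))
# prints [7.0, 8.0, 10.0, 9.0, 4.5, 11.0, 2.0, 3.0, 4.5, 1.0, 6.0]
```

The two values of 52 occupy positions 4 and 5 in the sorted order, so both receive rank 4.5, matching the first row of the rankings table.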

Each row gives a (potentially different) ranking of all the columns. From this, we can calculate the average rank for each column, as the average of its ranking over all the rows. From the table above, we get the average ranks:

   col 1   col 2   col 3   col 4   col 5   col 6   col 7   col 8   col 9  col 10  col 11
  4.1875    6.75    5.75  6.3125    5.75  8.1875  4.0625     6.5  6.4375  6.1875   5.875
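As a check on the arithmetic, the averaging step can be reproduced from the rankings table above. This is a minimal sketch that assumes the per-row rankings have already been computed; the name column_averages is hypothetical.

```python
def column_averages(rank_table):
    """Return the average of each column of rank_table (a list of rows)."""
    number_of_rows = len(rank_table)
    # zip(*rank_table) yields one tuple of values per column.
    return [sum(column) / number_of_rows for column in zip(*rank_table)]


# The rankings from the example above, one sublist per row.
rankings = [
    [7, 8, 10, 9, 4.5, 11, 2, 3, 4.5, 1, 6],
    [6, 9, 8, 10, 5, 11, 2, 7, 4, 3, 1],
    [5, 11, 7, 9, 8, 10, 4, 1, 6, 2, 3],
    [2.5, 5.5, 2.5, 1, 5.5, 10, 4, 7.5, 7.5, 10, 10],
    [5, 4, 2, 3, 6, 1, 7, 10, 11, 9, 8],
    [5, 3, 1, 4, 7, 2, 8.5, 8.5, 10.5, 10.5, 6],
    [1, 4, 6, 5, 3, 11, 2, 9, 7, 10, 8],
    [2, 9.5, 9.5, 9.5, 7, 9.5, 3, 6, 1, 4, 5],
]
print(column_averages(rankings))
# prints [4.1875, 6.75, 5.75, 6.3125, 5.75, 8.1875, 4.0625, 6.5, 6.4375, 6.1875, 5.875]
```

For column 1, the ranks sum to 33.5 over 8 rows, which gives 4.1875.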

Task:

Your task is to implement a function average_rank(table) that computes the average rank for each column in a table.

The table is represented as a list of lists, with one sublist per row. You can assume that:

  • The table has at least one row.
  • All rows in the table have the same number of columns.
  • All values are numeric (integers or floating point numbers).
  • When ranking the values in each row, a higher value is better, and therefore receives a lower rank.

Your function should return the list of average ranks. The list must have the same length as the number of columns in the table, and the averages must appear in the same order as the columns of the table.
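The assumptions about the table listed above can be written down as a small checker, which may be handy when you construct your own test tables. This is only a sketch: is_valid_table is a hypothetical helper that is not part of the skeleton file, and your average_rank function is not required to validate its argument.

```python
def is_valid_table(table):
    """Return True if table is a non-empty list of equal-length numeric rows."""
    if not isinstance(table, list) or len(table) == 0:
        return False
    if not all(isinstance(row, list) for row in table):
        return False
    number_of_columns = len(table[0])
    for row in table:
        if len(row) != number_of_columns:
            return False
        for value in row:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                return False
    return True


print(is_valid_table([[1, 2.5], [3, 4]]))   # prints True
print(is_valid_table([[1, 2], [3]]))        # prints False (rows differ in length)
```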

As a starting point, we provide you with a skeleton code file: average_rank.py. Download this file and write your implementation of the function in it.

Testing

The skeleton file has two testing functions: test_average_rank_set1() and test_average_rank_set2(). Both will run some tests on your average_rank function, and will raise an error if any of the tests fail. If all tests pass, each testing function prints the message “all set 1 tests passed” or “all set 2 tests passed” at the end. Tests in set 1 are a bit simpler, while the tests in set 2 introduce more corner cases.

Remember that testing only checks a small number of predefined cases; it can never prove that your function works correctly for all valid arguments. You should examine the test cases that are provided, and think about whether there are any important ones that are missing.
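One way to add your own cases is sketched below. The tables and expected values were worked out by hand and are not taken from the provided test sets; the names EXTRA_CASES, results_match and run_extra_cases are hypothetical. Comparing with math.isclose rather than == avoids spurious failures caused by floating point rounding.

```python
import math

# Each case pairs a table with its hand-computed average ranks.
EXTRA_CASES = [
    ([[5]], [1.0]),                               # a single value
    ([[2, 2, 2]], [2.0, 2.0, 2.0]),               # every value tied
    ([[1.5, -3, 0]], [1.0, 3.0, 2.0]),            # floats and negative values
    ([[1, 2], [2, 1]], [1.5, 1.5]),               # rows that disagree
    ([[3, 1, 2], [1, 1, 5]], [1.75, 2.75, 1.5]),  # a tie in one row only
]


def results_match(actual, expected):
    """Return True if both lists have equal length and nearly equal values."""
    if len(actual) != len(expected):
        return False
    return all(math.isclose(a, e) for a, e in zip(actual, expected))


def run_extra_cases(function_under_test):
    """Check function_under_test on EXTRA_CASES; return the number checked."""
    for table, expected in EXTRA_CASES:
        actual = function_under_test(table)
        assert results_match(actual, expected), (table, actual, expected)
    return len(EXTRA_CASES)
```

Calling run_extra_cases(average_rank) would then run all five cases against your function. Because EXTRA_CASES is a module-level assignment rather than a function definition, a sketch like this belongs in a separate file, not in the file you submit.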

Note that you can define additional functions, if you think it helps you decompose the problem or write a better solution. Your function definitions should contain docstrings, but you may not use strings as comments anywhere other than on the first line inside a function, or at the beginning of the file.

Marking

Code quality

In this homework (like the last two) we will also be marking your submission for its code quality. This includes aspects such as:

  • Using good function, parameter and variable names. The name of the function you have to write is fixed, but if you define additional functions (to decompose the problem) then they should be given descriptive names.
  • Appropriate use of comments and docstrings.

    This means not too few comments, but also not too many. Comments should be accurate, relevant, and readable. A docstring should appear as the first statement in every function definition.

  • Good code organisation.

    This includes appropriate use of functions to decompose a problem and avoid code repetition. Also, do not import modules that you do not use.

What to submit

You should edit the skeleton file average_rank.py, then upload only this file, containing your implementation of the function, using the assignment submission link on Wattle.

Remember that you must upload a single Python code file. Do NOT zip it or convert it to another format.

The file that you submit must meet the following requirements:

  • It must be syntactically correct Python code.
  • Like the file you downloaded, it should contain only function definitions and, optionally, import statements. However, it is not necessary to use any module to solve the problem, and you should only import modules that you actually use. Comments, including docstrings (if they are used appropriately), are of course fine to include. Anything that is not a function definition or an import statement will be ignored when we check your submission.
  • The library scipy.stats has a rankdata function that performs a similar task. You are NOT allowed to use this function.
  • You are NOT allowed to use the pandas library.

As mentioned above, you must also attend the following lab and answer your tutor’s questions about your solution. This discussion is part of the assessment. You should be prepared to answer, or demonstrate your answer to, the following questions:

  • Can you download the file that you submitted from Wattle?
  • Can you run that file in the Python interpreter (using an IDE of your choice)?
  • If the file has syntax errors, can you use the error messages from the interpreter or IDE to identify where the syntax errors are?
  • Does your submitted file meet the requirements stated above? Does it contain anything that is not a function definition? If so, can you point it out?
  • Does your implementation pass all the tests run by the unmodified testing functions?
  • Is your implementation correct for all valid arguments?
  • Does your function always return a value of the correct type?
  • Did you think of any other test cases that should be used to test the function, in addition to or in place of those provided?
  • What is the difference between the print function and the return statement?

In marking this assignment we will consider the following:

  • Does your submitted file satisfy the requirements specified above?
  • Does your implementation compute the correct value for all valid arguments?
  • The quality of your submitted Python code, including its organisation, naming and documentation (with docstrings and comments).
  • Your ability to use the tools (e.g., the IDE or Python interpreter), your understanding of Python’s error messages, and your understanding of the solution, as demonstrated in your discussion with the tutor.

The assignment is worth 4% of your final mark. Two marks are based on the functionality of your submission, and two marks on the quality and readability of your code.
