Homework 5

This is the fifth and final homework assignment. Your goal in this assignment is to implement a function that calculates the average rank of a set of items (“competitors”), based on a comparison over several tests.

Practical information

The assignment is due on Friday the 1st of October, at 11:55pm (Canberra time). Note that this is the Friday in semester week 8; this is because the Monday in week 9 is a public holiday. To submit your solution, you will upload a single python file via wattle. Here is the assignment submission link.

In addition to submitting your solution, you must attend the following lab (in semester week 9). In the lab, your tutor will ask you some questions about your solution, and give you feedback if there is anything you need to improve. This discussion with the tutor is also part of the assessment.

If you fail to show up for the discussion with the tutor, you will receive zero marks for this assignment. If you do not submit a solution, you may still get partial marks if you are able to show the tutor that you have a solution that is at least partially functional.

The homework is individual. You must write your own solution, and you are expected to be able to explain every aspect of it.

As usual, you should have followed the previous weeks’ lectures and worked through the exercises in the labs before starting on the assignment. The assignment should not take more than a few hours to complete.

The problem

In the lecture on data science, we discussed the problem of identifying from a candidate set of models the one that gives the best result in a set of tests. The data from this example is:

         model 1   model 2   model 3   model 4   model 5   model 6   model 7   model 8   model 9  model 10  model 11
test 1        40        16         7        10        52         4        80        67        52       121        51
test 2       571       200       352       187       616       147       914       406       635       712      1914
test 3       353       108       216       202       204       146       373       778       303       595       449
test 4         9         2         9       280         2         0         4         1         1         0         0
test 5        95       495      1201       704        47      3646        45         9         5        19        29
test 6        41       434      1897       215        17       536         2         2         0         0        18
test 7      1428        88         9        47       122         0       161         3         5         1         4
test 8       350         0         0         0         5         0        60        30       860        53        50

(Note that we have flipped the table here so that the models are the columns and the tests are the rows.)

A higher value means a better result for the model in a given test. For each test, we can calculate a ranking of the models, where the best model gets rank 1, the second best gets rank 2, and so on. For example, the rankings corresponding to the table above are:

         model 1   model 2   model 3   model 4   model 5   model 6   model 7   model 8   model 9  model 10  model 11
test 1         7         8        10         9       4.5        11         2         3       4.5         1         6
test 2         6         9         8        10         5        11         2         7         4         3         1
test 3         5        11         7         9         8        10         4         1         6         2         3
test 4       2.5       5.5       2.5         1       5.5        10         4       7.5       7.5        10        10
test 5         5         4         2         3         6         1         7        10        11         9         8
test 6         5         3         1         4         7         2       8.5       8.5      10.5      10.5         6
test 7         1         4         6         5         3        11         2         9         7        10         8
test 8         2       9.5       9.5       9.5         7       9.5         3         6         1         4         5

Note that in some cases there are two (or more) models with equal score (for example, model 5 and model 9 in the first test). In this situation, we assigned all of them the average of the ranks that they occupy (in the first test, rank 4 and rank 5) in the overall order.
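The tie rule can be illustrated with a small sketch. The helper name rank_row is hypothetical (it is not part of the skeleton file), and this is only one possible way to rank a single row, assuming that a higher score is better.

```python
def rank_row(scores):
    """Return the rank of each score in a row; tied scores share their average rank."""
    # Positions of the scores, ordered from the best (highest) score to the worst.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        # Extend the group while the next score equals the current one.
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        # The group occupies ranks start+1 .. end+1; each member gets their average.
        shared_rank = (start + 1 + end + 1) / 2
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


# Scores of the eleven models in test 1.
test_1 = [40, 16, 7, 10, 52, 4, 80, 67, 52, 121, 51]
print(rank_row(test_1))
```

For test 1 this prints [7.0, 8.0, 10.0, 9.0, 4.5, 11.0, 2.0, 3.0, 4.5, 1.0, 6.0], which agrees with the first row of the ranking table: model 5 and model 9 both score 52 and share rank (4 + 5) / 2 = 4.5.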

Each test gives a (potentially different) ranking of all the models. From this, we can calculate the average rank for each model, as the average of its ranking over all the tests. From the table above, we get the average ranks:

         model 1   model 2   model 3   model 4   model 5   model 6   model 7   model 8   model 9  model 10  model 11
          4.1875      6.75      5.75    6.3125      5.75    8.1875    4.0625       6.5    6.4375    6.1875     5.875
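As a check of these numbers, the following sketch recomputes the average ranks from the ranking table. The variable names are illustrative only, and the ranks are typed in by hand rather than computed from the scores.

```python
# Ranking table from the text: one row per test, one column per model.
rank_table = [
    [7, 8, 10, 9, 4.5, 11, 2, 3, 4.5, 1, 6],
    [6, 9, 8, 10, 5, 11, 2, 7, 4, 3, 1],
    [5, 11, 7, 9, 8, 10, 4, 1, 6, 2, 3],
    [2.5, 5.5, 2.5, 1, 5.5, 10, 4, 7.5, 7.5, 10, 10],
    [5, 4, 2, 3, 6, 1, 7, 10, 11, 9, 8],
    [5, 3, 1, 4, 7, 2, 8.5, 8.5, 10.5, 10.5, 6],
    [1, 4, 6, 5, 3, 11, 2, 9, 7, 10, 8],
    [2, 9.5, 9.5, 9.5, 7, 9.5, 3, 6, 1, 4, 5],
]

# zip(*rank_table) regroups the values by column (model) instead of by row (test).
average_ranks = [sum(column) / len(column) for column in zip(*rank_table)]
print(average_ranks)
```

For model 1, for example, the ranks 7, 6, 5, 2.5, 5, 5, 1 and 2 add up to 33.5, and 33.5 / 8 = 4.1875.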

Here is a different example. The following table lists, for a few countries, the best time achieved by a swimmer from that country in the qualifying round of some of the women’s swimming events at the latest Olympics:

                    Aus      Den      Swe      USA
50m freestyle     24.02    24.12    24.26    24.37
100m freestyle    52.13    52.96    52.91    53.21
200m freestyle  1:55.87       --       --  1:55.28

Again, for each event we can calculate a ranking of the countries (though here a lower time is better, and therefore gets a lower rank):

                    Aus      Den      Swe      USA
50m freestyle         1        2        3        4
100m freestyle        1        3        2        4
200m freestyle        2      3.5      3.5        1

Because neither a Danish nor a Swedish swimmer competed in the 200m freestyle event, we assign both countries the average of the last two ranks, (3 + 4) / 2 = 3.5. From this table, we can compute the average rank, over these three events, of each of the countries:

                    Aus      Den      Swe      USA
                    4/3    8.5/3    8.5/3        3

Task:

Your task is to implement a function average_rank(table) that computes the average rank for each column in a table. Each column represents one “competitor”, and each row represents one “test”, which ranks each of the competitors.

The table is represented as a list of lists, with one sublist per row. You can assume that

  • The table has at least one row.
  • All rows in the table have the same number of columns.
  • All values are numeric (integers or floating point numbers); you do not have to consider missing values.
  • When ranking the values in each row, a higher value is better, and therefore receives a lower rank (like in the first example above).

Your function should return the list of average ranks. The list must have the same length as the number of columns in the table (i.e., the number of “competitors”), and the averages must be in the same order as the columns of the table.
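To make the representation concrete, here is a small sketch that writes the first two tests of the first example as a list of lists and spells out the stated assumptions as checks. The variable names are illustrative and not taken from the skeleton file.

```python
# The first two tests from the first example, one sublist per row (test).
table = [
    [40, 16, 7, 10, 52, 4, 80, 67, 52, 121, 51],
    [571, 200, 352, 187, 616, 147, 914, 406, 635, 712, 1914],
]

number_of_rows = len(table)
number_of_columns = len(table[0])

# The assumptions listed above, written out as checks.
assert number_of_rows >= 1
assert all(len(row) == number_of_columns for row in table)
assert all(isinstance(value, (int, float)) for row in table for value in row)

# For this table, average_rank(table) should return a list with 11 entries,
# ordered as model 1, model 2, ..., model 11.
print(number_of_rows, number_of_columns)
```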

As a starting point, we provide you with a skeleton code file: average_rank.py. Download this file and write in it your implementation of the function.

Testing

The skeleton file has two testing functions: test_average_rank_set1() and test_average_rank_set2(). Both will run some tests on your average_rank function, and will raise an error if any of the tests fail. If all tests pass, each testing function prints the message “all tests passed” at the end. Tests in set 1 are a bit simpler, while the tests in set 2 introduce more corner cases.

Remember that testing only checks a small number of predefined cases; it can never prove that your function works correctly for all valid arguments. You should examine the test cases that are provided, and think about whether there are any important ones that are missing.
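One kind of additional check you might consider is a property that every correct result must satisfy. With n competitors, the ranks in each row always add up to 1 + 2 + ... + n = n(n + 1)/2, even when ties are averaged, so the average ranks must add up to the same number. The sketch below uses a hypothetical helper, has_expected_rank_sum, which is not part of the skeleton file. It tests a necessary condition only: a wrong answer can still pass it.

```python
def has_expected_rank_sum(average_ranks, tolerance=1e-9):
    """Return True if the average ranks add up to 1 + 2 + ... + n."""
    n = len(average_ranks)
    expected_sum = n * (n + 1) / 2
    # Compare with a small tolerance, since the averages are floating point numbers.
    return abs(sum(average_ranks) - expected_sum) <= tolerance


# Average ranks from the two examples in the problem description.
models = [4.1875, 6.75, 5.75, 6.3125, 5.75, 8.1875, 4.0625, 6.5, 6.4375, 6.1875, 5.875]
countries = [4 / 3, 8.5 / 3, 8.5 / 3, 3]

print(has_expected_rank_sum(models))
print(has_expected_rank_sum(countries))
```

Both calls print True: the eleven model averages add up to 66 = 11 × 12 / 2, and the four country averages add up to 10 = 4 × 5 / 2.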

Note that you can define additional functions, if you think it helps you decompose the problem or write a better solution. Your function definitions should contain docstrings, but you may not use strings as comments anywhere other than on the first line inside a function, or at the beginning of the file.
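The following sketch shows the intended use of docstrings and comments. The function mean is a made-up example and is not required for the assignment.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # An ordinary comment starts with a hash sign; the string above is the
    # docstring, and it is the first statement in the function body.
    return sum(values) / len(values)


print(mean([1, 2, 3]))
```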

Marking

Code quality

In this homework (like the last two) we will also be marking your submission for its code quality. This includes aspects such as:

  • Using good function, parameter and variable names. The name of the function you have to write is fixed, but if you define additional functions (to decompose the problem) then they should be given descriptive names.
  • Appropriate use of comments and docstrings.

    This means not too few comments, but also not too many. Comments should be accurate, relevant, and readable. A docstring should appear as the first statement in every function definition.

  • Good code organisation.

    This includes appropriate use of functions to decompose a problem and avoid code repetition. Also, do not import modules that you do not use.

What to submit

You should edit the skeleton file average_rank.py, then upload only this file, with your implementation of the function, using the assignment submission link on wattle.

Remember that you must upload a single python code file. Do NOT zip it or convert it to another format.

The file that you submit must meet the following requirements:

  • It must be syntactically correct python code.
  • Like the file you downloaded, it should contain only function definitions and, optionally, import statements. However, it is not necessary to use any module to solve the problem, and you should only import modules that you actually use. Comments, including docstrings (if they are used appropriately), are of course ok to include. Anything that is not a function definition or an import statement will be ignored when we check your submission.

As mentioned above, you must also attend the following lab and answer your tutor’s questions about your solution. This discussion is part of the assessment. You should be prepared to answer, or to demonstrate your answer to, the following questions:

  • Can you download the file that you submitted from wattle?
  • Can you run that file in the python interpreter (using an IDE of your choice)?
  • If the file has syntax errors, can you use the error messages from the interpreter or IDE to identify where the syntax errors are?
  • Does your submitted file meet the requirements stated above? Does it contain anything that is not a function definition? If so, can you point it out?
  • Does your implementation pass all the tests run by the unmodified testing functions?
  • Is your implementation correct for all valid arguments?
  • Does your function always return a value of the correct type?
  • Did you think of any other test cases that should be used to test the function, in addition to or in place of those provided?
  • What is the difference between the print function and the return statement?

In marking this assignment we will consider the following:

  • Does your submitted file satisfy the requirements specified above?
  • Does your implementation compute the correct value for all valid arguments?
  • The quality of your submitted python code, including its organisation, naming and documentation (with docstrings and comments).
  • Your ability to use the tools (e.g., the IDE or python interpreter), your understanding of python’s error messages, and your understanding of the solution, as demonstrated in your discussion with the tutor.

The assignment is worth 4% of your final mark. 2 marks are based on the functionality of your submission, and 2 marks on the quality and readability of your code.
