This is the 4th homework assignment. Your goal in this assignment is to implement a solution to the common problem of approximating an unknown function based on a sample of function values.

Your solution to this homework will also be marked on code quality (1%), which is distinct from code functionality (2%). To gain full marks, your submission must be correct with high code quality.

Practical information#

The assignment is due Sunday, end of semester break, 14th April 2024 at 11:55pm. To submit your solution, you will upload a single python file via wattle. Here is the assignment submission link.

NOTE: No excuses about technical problems like internet connection or computer issues are accepted. You can actually submit the homework as many times as you like, as the last submitted file will be marked. Do not wait until the last minute!

The homework is individual. You must write your own solution, and you are expected to be able to explain every aspect of it. You are not allowed to share your solution (code) with other students; this includes posting it (or parts of it) to the discussion forum, or to any other on-line forum. You may be contacted for an additional oral assessment, which may result in a change of mark.

If you have questions?

The problem#

Linear inter- and extra-polation, called together as linear prediction, is a method of computing the approximate value of a function in one argument, given only samples of the function at a set of points. This is commonly used where the values of a function are difficult or expensive to obtain. For example, we may have GDP or population data of a country only for some certain time points (e.g., years) and we want to predict these data for other time points (e.g., at a certain day or month).

Suppose we know the function value at a set of points \(y_i = f(x_i)\), for \(i = 1\ldots n\). To approximate the function at a new point \(x'\), we find two nearby points \(x_{near1}\) and \(x_{near2}\), where \(x_{near1} < x_{near2}\). If \(x'\) is between the smallest and largest samples \(x\) (interpolation case), we pick \(x_{near1}\) and \(x_{near2}\) as the closest below and above points to \(x'\). If \(x'\) is smaller than all samples \(x\) (extrapolation case), we pick \(x_{near1}\) and \(x_{near2}\) as the two smallest samples \(x\). If \(x'\) is larger than all samples \(x\) (also extrapolation), we pick \(x_{near1}\) and \(x_{near2}\) as the two largest samples \(x\).

We then draw an infinite straight line through \((x_{near1}, y_{near1})\) and \((x_{near2}, y_{near2})\), and take the value \(y'\) where this line is at \(x'\). The general form of the straight line is \(a * x + b\), where the slope \(a = (y_{near1} - y_{near2})/(x_{near1} - x_{near2})\) and the intercept \(b = y_{near1} - a * x_{near1}\). Using this, we can calculate \(y' = a * x' + b\). This is similar to linear regression introduced in Lecture 7 considering only two (nearby) points.

Example:

Suppose we have the sample points \(f(1) = 1\), \(f(3) = 9\) and \(f(5) = 25\).

If we want to compute an approximation of the value of \(f\) at \(x' = 2\): Because \(x' = 2\) is between the smallest (\(1\)) and largest (\(5\)) samples of \(x\), we find the first nearby point that is less than \(2\) (namely \(1\)) and the other that is greater than \(2\) (namely \(3\)), and draw a straight line through the known points \((1, 1)\) and \((3, 9)\). The line equation becomes \(4 * x - 3\), from which we get the answer \(4 * 2 - 3 = 5\). This is the interpolation case.

Now if we want to approximate the function value at \(x' = 0.5\): Because \(0.5\) is smaller than all samples \(x\), the two nearby (smallest) points are \(1\) and \(3\). The line is the same as above: \(4 * x - 3\), and we get \(y' = 4*0.5-3 = -1\). This is the extrapolation case.

Another example is to approximate the value of \(f\) at \(4\) and \(6\). The two nearest points for both cases are \(3\) and \(5\), from which we get the line \(8 * x - 15\). Therefore, we get \(8*4-15=17\) and \(8*6-15=33\) as the answers, respectively.

The following figure illustrates how these values were calculated:

example of linear prediction

Task:

Your task is to implement a function linear_prediction(x, y, x_test) that computes the linear prediction of the unknown function f at a new point x_test. The sample is given in the form of two sequences x and y. Both sequences have the same length, and their elements are numbers. The x sequence contains the points where the function has been sampled, and the y sequence contains the function value at the corresponding point. In other words, y[i] = f(x[i]).

Assumptions and restrictions:

  • You can assume that the arguments are as described: that is, x and y are sequences, both have the same length of at least 2, and their elements are numbers.
  • You should NOT make any assumption about what type of sequence the x and y arguments are (e.g., list, tuple).
  • You can assume that the values in x are unique (that is, there are no repeated x-values).
  • You cannot assume that the values in x are ordered.
  • You can assume that x_test is a number.
  • x_test might possibly equal to a value in the sequence x. If x_test is equal to a sample value (a value in the input x sequence), your function should simply return the corresponding function value from y.
  • Your function must return a number.
  • There are ready-to-use functions like scipy.interpolate and pandas.DataFrame.interpolate from scipy and pandas libraries, which performs various kinds of interpolation, including linear interpolation. Obviously, you may not use such function, or any other functions that provides a ready-made solution to the problem, since the goal of the assignment is for you to demonstrate that you can implement the function yourself.
  • You may import the math, numpy, and matplotlib (for plotting) module if you wish.
  • You must not import any other modules.

We provide you with a skeleton code file: linear_prediction.py. Download this file and write in it your implementation of the function. The function linear_prediction in this file has a pass statement, which you should replace it with your own code. You may define additional functions (for example, to break the problem up into smaller subproblems) in your solution file.

NOTE: the parameters to the function will satisfy the assumptions stated above, thus you do not need to explicitly check their validity.

The other is a testing function that you should use to test if your function is working correctly (see below) and a plotting function plot_linear_prediction which you can use to visualise the results for a sequence of testing \(x'\) values.

Testing#

The skeleton file has a testing function that you must not modify:

  • test_linear_prediction() This function runs a number of tests of your function, and produces an error message if any of the tests fail. If all tests are ok, it will print the message “all tests passed”.

Note that this testing function only test a limited number of test cases. Passing all the tests does not necessarily means that your code is correct. But failing at least one test case means that your code is wrong.

Marking#

Code quality

In this homework we will also be marking your submission for its code quality. This includes aspects such as:

  • Using good function, parameter and variable names. The names of some functions in the homework are fixed, but if you define additional functions (to decompose the problem) then they should be given descriptive names.

  • Appropriate use of comments and docstrings. This means not too little comments but also not too much. Comments should be accurate, relevant, and readable. A docstring should appear as the first statement in every function definition.

  • Good code organisation. This includes appropriate use of functions to decompose a problem and avoid code repetition.

What to submit

You should edit the skeleton file linear_prediction.py, then upload only this file with your implementations of the function using the assignment submission link on wattle.

Remember that you must upload a single python code file. Do NOT zip it or convert it to another format.

The file that you submit must meet the following requirements:

  • It must be syntactically correct python code.
  • It must be named linear_prediction.py.
  • It must contain your uID and name to sign the agreement of academic integrity.
  • Like the file you downloaded, it should contain only function definitions, comments and docstrings.

In marking this assignment we will consider the following:

  • Does your submitted file satisfy the requirements specified above?
  • Does your implementation compute the correct value for all valid arguments?
  • The quality of your submitted python code, including its organisation, naming and documentation (with docstrings and comments).

If you are selected for additional oral assessment, you must attend the appointed lab and answer your tutor’s questions about your solution. The discussions may include, for example:

  • Can you download the file that you submitted from wattle?
  • Can you run that file in the python interpreter (using an IDE of your choice)?
  • Can you explain how you solved the problem?
  • If the file has errors, can you use the error messages from the interpreter or IDE to identify where the errors are?
  • Does your submitted file meet the requirements stated above? Does it contain anything that is not a function definition? If so, can you point it out?
  • Does your implementation pass all the tests run by the unmodified testing function?
  • Is your implementation of the function correct for any valid arguments?
  • Does your function always return a value of the correct type?
  • Did you think of any other test cases that should be used to test the function, in addition to or in place of those provided? Why do you think these test cases are useful. What situations or inputs are not covered by the existing test cases?
  • What is the difference between the print function and the return statement?

The assignment is worth 3% of your final mark. 2 marks are based on the functionality of your submission, and 1 mark on the quality and readability of your code.

bars search times arrow-up