Week 12: Lab 8

Note: this is a long, exploratory study which may consume more time for reading and consideration than for doing the actual code refactoring. You may be concerned with other (more important) issues right now, and decide not to do the whole exercise meticulously. That’s OK, but do try to learn an important lesson here — what is a lazy data structure, how it can be defined and why can it be useful.

Objectives

Functional programming has a number of defining traits including immutability, first class functions, pure functions (referential transparency), and lazy evaluation. Like other “industrial” programming languages such as C++, C#, and JavaScript, Java is not a functional language, but it has recently adopted new programming features inspired by the functional programming paradigm. We have learnt some of these features, like λ-expressions and stream methods which require λ-expressions as an argument (and may therefore be described as high-order functions). We have also discovered that a stream pipeline which does not involve a terminal operation defines a lazy computation: the result of the stream pipeline is defined, but is not calculated immediately. A stream pipeline is executed only if it is consumed by a terminal operation.

In our final lab, we will learn how a traditional data structure (a container such as a list) can be defined lazily such that its elements are defined but not actually created (present in memory) until the time when they are actually required by the program. Lazy data structures are common place in true functional languages like Haskell, but their construction in Java requires certain inventiveness.

The exercises below are based on the discussion in Chapter 14.3 of “Java 8 in Action” book by Urma, Fusco and Mycroft.

Exercise One: Recursively Streaming Primes

Recall Homework 3, the factorisation of an integer, and the way it can be done using a stream pipeline based approach (this is sketched in slide 5 of Lecture F5; here we are using somewhat different naming).

We want to compute the list of all primes. We start with an infinite stream of integers, which are filtered by the predicate of primality, defined in another method. The method which returns the primes and the primality tester are defined as follows:

public static Stream<Integer> primes(int n) {
    return Stream.iterate(2, i -> i + 1)
                 .filter(i -> isPrime(i))
                 .limit(n);
}

public static boolean isPrime(int candidate) {
	int candidateRoot = (int) Math.sqrt((double) candidate);
	return IntStream.rangeClosed(2, candidateRoot)
		.noneMatch(i -> candidate % i == 0);
}

The program which calculates (first n) primes using two of the above methods is FindingPrimes.java in the labs/lab8 of the repository. You should update your repository (u1234567/comp6700-2017) as instructed in Lab 7. Compile and run the code, and make sure you understand its operation before continuing.

Computationally, this is a very expensive solution because we iterate through every number every time to check if it divides the candidate value. You have probably already guessed that you need only check each candidate for divisibility by the already discovered primes in the primality tester — this is used in the Sieve of Eratosthenes algorithm of prime computation. Thus, an approach to improve the above computation of primes which involves streams can be specified as the following steps:

  1. Issue a stream of numbers from which primes will be selected

  2. Take the first element — the so called head — from the list (it starts from 2 which is the first prime)

  3. Take the numbers from the tail of the stream (tail is what remains after removing head), and filter out all divisible by head

  4. The resulting stream (“filtered tail”) is a new stream which is recursively used to repeat the same procedure (filter out on the next head, 3, then next …).

Let’s write the above steps as methods (below the interface java.util.stream.IntStream is featured; it has the method findFirst which return the value of OptionalInt, a value wrapped into a specialized Optional container (java.util.OptionalInt) which is extracted by the method getAsInt:

// Step 1
static IntStream numbers(){
      return IntStream.iterate(2, n -> n + 1);
}
// Step 2
static int head(IntStream numbers){
      return numbers.findFirst().getAsInt();
}
// Step 3
static IntStream tail(IntStream numbers){
      return numbers.skip(1);
}

If the method head(stream) gives us the first element, then the filtering can be done as follows:

IntStream numbers = numbers();
int head = head(numbers);
IntStream filtered = tail(numbers).filter(n -> n % head != 0);

and the last step written like this:

// Step 4
static IntStream primes(IntStream numbers) {
    int head = head(numbers); // terminal op, stream's closed -:(
    return IntStream.concat(
             IntStream.of(head),
             primes(tail(numbers).filter(n -> n % head != 0))
           );
}

The principle looks OK, but there are problems with the above code (it won’t compile).

Exercise Two: Lazy Streams

The first problem is that the code above has a terminal operation inside the stream pipeline. The head method calls the terminal operation findFirst to get the first element of the stream, thereby closing the stream. After this, any further calls to intermediate stream methods such as filter will throw a java.lang.IllegalStateException.

The second problem is that the static method IntStream.concat needs two instances of a stream, and the second instance is an expression which recursively calls primes resulting in an infinite recursion (no base case).

The approach which originally seemed so elegant now looks like a “no-go”. Surprisingly, one can salvage it by resorting to good old OO.

Instead of using the standard API streams, we will use lazy lists — custom defined list containers which have operational properties of streams, but are lists (there are no terminal methods which can exhaust them), yet without advance presence in memory (they are lazy).

By using lists instead of streams, we shall solve the first problem (a consumed stream cannot be reused). The second problem will stay, but it is expected since we want to compute all the primes (their number is infinite, Euclid knew it 2,000 years ago). Either memory, or, rather, method call stack limits will prevent achieving this lofty goal, though.

Lazy Lists are defined as implementation of a List (MyList here, since we will use a custom interface methods) interface using Linked List data structure (discussed in A1 and practiced Lab 7). A Linked List is an object with two fields:

  • head: T (list element of type T)
  • tail: LinkedList<T> (reference to remaining list)

LinkedList

A Lazy List also has head and tail, but the latter is not a value (reference to an object of Linked List type), but a supplier function which returns such object when evaluated:

  • head: T
  • tail: () -> LazyList<T>

LazyList

In the code provided to you in this lab, the LazyList class is defined by implementing a custom MyList interface:

interface MyList<T> {
     T head();
     MyList<T> tail();
     default boolean isEmpty() {
         return true;
     } 
}

If to use a lazy list to represent primes instead of a normal stream, the crucial Step 4 from the algorithm above can be rewritten as follows:

// Step 4 revisited
public static MyList<Integer> primes(MyList<Integer> numbers) {
    return new LazyList<>(
        numbers.head(),
        () -> primes(
            numbers.tail()
                   .filter(n -> n % numbers.head() != 0)
            )
    );
}

Although it may look a bit intimidating, the primes method reads quite simply:

the list of primes is created by taking the first prime number from the list of all positive integers > 1, and concatenating it with the list of emaining numbers which a filtered to retain only those which are not divisible by the first number.

It is a direct implementation of the Sieve of Eratosthenes algorithm.

The filtering requires a method (lazy list are not streams, remember?) included in MyList interface and implemented accordingly in LazyList:

public MyList<T> filter(Predicate<T> p) {
    return isEmpty() ?
        this :  // new EmptyList() is also possible
        p.test(head()) ?
            new LazyList<T>(head(), () -> tail().filter(p)) :
            tail().filter(p);
}

Study the LazyList implementation. The code also contains two implementations of MyList interface: Empty and LazyList. The significance of EmptyList is obvious from the filter definition. Refactor FindingPrimes.java to make use of LazyList to represent primes. As an additional exercise, you can retain the original code (which performance is timed in the main method), and perform comparative performance measurements for calculating first 100,000 primes (if this will not break the method call stack – check when this happens, if the limit is less, try to work with a smaller figure) using both the original stream based method and algorithmically optimized one, which uses the lazy lists.

Exercise Three (advanced)

A correct solution to Exercise 2 is recursive, and therefore the primes computation will not run forever because at some point the method stack overflow exception will be thrown and the program will terminate. (Have you experienced this?) Java, because of the JVM design limitations, will not perform an automatic Tail-Call Elimination (we discussed this in A4 lecture), and the only way of removing the stack depth limitation is to optimize the code by hands. Explore possibilities of TCE in the recursive solution of Exercise 2.

Updated:  13 Jun 2017/ Responsible Officer:  Head of School/ Page Contact:  Alexei Khorev