Week 06: Lab 4

Objectives

  • Get more experience in using the String class and arrays, learn how to process a text using String’s methods like split(), and/or using the Scanner class and its methods;

  • Learn stream-based text processing techniques using the NIO.2 API.

This lab is rather long, and you do not have to complete all the exercises during the class. But do find time to finish them afterwards: this lab is the last one before the mid-semester break, so consider it our parting gift to you. -:)

Exercise One (warm up)

Given the array:

int num[] = {1,2,3,6,8,10,12,14,15,17,19,21};

write a program that inspects each element of the array and does the following:

  1. If the number is divisible by 2 (num % 2 == 0) then print out “The number is divisible by 2”.

  2. If the number is divisible by 3 (num % 3 == 0) then print out “The number is divisible by 3”.

  3. If the number is divisible by 5 (num % 5 == 0) then print out “The number is divisible by 5”.
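The three checks above can be sketched as follows. This is only one possible reading of the task: it assumes the checks are independent (so 6 produces two messages), and the class name DivisibilityCheck and the helper describe() are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class DivisibilityCheck {
    // Hypothetical helper: returns one message per divisor (2, 3, 5) that divides n exactly.
    static List<String> describe(int n) {
        List<String> messages = new ArrayList<>();
        for (int d : new int[] {2, 3, 5}) {
            if (n % d == 0) {
                messages.add("The number is divisible by " + d);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        int[] num = {1, 2, 3, 6, 8, 10, 12, 14, 15, 17, 19, 21};
        for (int n : num) {
            System.out.println(n + ":");
            for (String message : describe(n)) {
                System.out.println("  " + message);
            }
        }
    }
}
```

Keeping the checks in a separate method means that only main has to change when the numbers come from the command line or from the standard input instead of the array.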

Then modify the program so that this time the numbers are read in from the command line, i.e., if your program is called ListNonPrimes then you will enter a command like:

% java ListNonPrimes  1 2 3 4 5 6 9 12 15 18 19

Use of Homework Three (Factorisation problem) is quite appropriate, of course.

Additional: Unix/Linux shells provide a redirection facility: the standard input can be read from a file, and the standard output (or the standard error channel) can be written to a file, either overwriting the content of an existing file or appending the new output to it. This is very useful if you need to run the same program many times and want easier control over what goes into the program as input. Editing the file from which the input is redirected is simpler than editing the command line every time; more importantly, it is easier when you run the same program multiple times from a shell script.

To discover the redirection facility, first change the code in ListNonPrimes.java so that it reads the integers from the standard input when the program prompts for them:

% java ListNonPrimes
Type in integers: 1 2 3 4 5 6 9 12 15 18 19

(just as above: all the numbers at once, typed by the user after the prompt). Then make the program read in the numbers (the Scanner.nextLine() method is most suitable), examine them one by one, and print the same output as above. The change to the program should be minimal: instead of using the args array of strings, use String.split() to obtain an array of strings by splitting the string returned by the Scanner; the rest is then processed as before.

Typing the same input into a program every time you run it is awkward. Therefore, learn how to use the shell (command-line) redirection facilities. First, copy the command-line arguments into a plain text file, call it data.in, and run your program as follows:
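A minimal sketch of the nextLine() plus split() step is given here. The class name ReadLineSketch and the helper parseLine() are hypothetical, and the sketch assumes that every token on the line is a valid integer (Integer.parseInt throws NumberFormatException otherwise).

```java
import java.util.Scanner;

public class ReadLineSketch {
    // Hypothetical helper: splits one line of whitespace-separated integers into an int array.
    static int[] parseLine(String line) {
        String trimmed = line.trim(); // avoids an empty first token when the line starts with spaces
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        String[] tokens = trimmed.split("\\s+");
        int[] values = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Integer.parseInt(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Type in integers: ");
        if (in.hasNextLine()) { // false when the input is empty, e.g. an empty redirected file
            for (int n : parseLine(in.nextLine())) {
                System.out.println(n); // placeholder: apply the divisibility checks here
            }
        }
    }
}
```

Because the program reads System.in rather than the keyboard as such, the same code works unchanged when the input comes from a file.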

% java ListNonPrimes < data.in

Then you should see the same outcome. The standard output/error redirection is achieved with the > operator: the output is written into a file (overwriting it if the file already exists). With the “append” redirection >>, the output is added to the end of the file if it already exists (this is particularly useful for logging the information that a long-running program, such as an OS or a server, continuously writes to the standard output/error). Try this:

% java ListNonPrimes < data.in > data.out

and examine the result; try the append redirection (>>), too.

Exercise Two

Write a program which reads in a string of ten characters or less and copies each character in the string to an array of characters:

char[] charArray = new char[10];

Note: You may find the charAt() method of String objects useful. You would probably use it as follows:

String s = "hello";
char firstChar=s.charAt(0);

What happens if you enter more than ten characters? Can you guard against this possibility using an if branch? Try to rewrite your program to quit with a warning message if more than ten characters are entered.
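Without a guard, writing to charArray[10] or beyond throws an ArrayIndexOutOfBoundsException. One way to add the guard is sketched here; the class name CharCopy and the choice to signal "too long" by returning null are assumptions made for illustration, not the required design.

```java
import java.util.Arrays;
import java.util.Scanner;

public class CharCopy {
    static final int CAPACITY = 10;

    // Hypothetical helper: copies s into a fresh char[10], or returns null if s does not fit.
    static char[] copy(String s) {
        if (s.length() > CAPACITY) {
            return null;
        }
        char[] charArray = new char[CAPACITY];
        for (int i = 0; i < s.length(); i++) {
            charArray[i] = s.charAt(i);
        }
        return charArray; // unused slots keep the default value '\u0000'
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        String s = in.hasNextLine() ? in.nextLine() : "";
        char[] charArray = copy(s);
        if (charArray == null) {
            System.err.println("Warning: more than " + CAPACITY + " characters entered");
            return; // quit without copying anything
        }
        System.out.println(Arrays.toString(charArray));
    }
}
```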

Exercise Three

Write a program which will prompt a user to input their full name. Then write this name back out again with the surname first, followed by a comma and then the first two names.

For example, the program would respond to the input:

John Stuart Average

by typing

Average, John Stuart

You can assume that the user will input 3 names.

There are two ways of writing this program.

  1. You can just read in a string and then search that string for the spaces (a single space, " ", or multiple spaces) which separate the names. This will involve a lot of counting and careful indexing to make sure that you split up the string properly. It is very good practice.

  2. A better way is to use the java.util.Scanner class (see the examples from lecture J10, and discover more applications of this very useful class by reading the textbook and/or the API documentation). Here you need to create a Scanner instance and attach it to the standard input (similar to the example ByteCounter2.java, a version of ByteCounter.java discussed in lectures, in which java.io.InputStream was used):

    String input = new Scanner(System.in).nextLine();
    String[] names = input.split("\\s+");
    

    java.lang.String and java.util.Scanner are two very important classes, and it pays to practise the methods they provide for processing various text data. You will find it much easier to learn effective use of the standard API, and then use it constantly in your code, than to descend to a low level of programming and reinvent the wheel again and again.
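Putting the second approach together, a sketch might look like this. The class name SurnameFirst and the helper reorder() are hypothetical, and the code relies on the stated assumption that the user enters exactly three names (fewer would cause an ArrayIndexOutOfBoundsException).

```java
import java.util.Scanner;

public class SurnameFirst {
    // Hypothetical helper: assumes exactly three whitespace-separated names.
    static String reorder(String fullName) {
        String[] names = fullName.trim().split("\\s+");
        return names[2] + ", " + names[0] + " " + names[1];
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter your full name: ");
        if (in.hasNextLine()) {
            System.out.println(reorder(in.nextLine()));
        }
    }
}
```

The regular expression \\s+ treats any run of whitespace as a single separator, which covers the "multiple spaces" case that makes the first approach tedious.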

Exercise Four: token delimiter in Scanner

Simple

Write a short program which reads in a text consisting of words and numbers, in which the latter were meant to mark the line count. For some reason the formatting got messed up, and now the entire text is one very long line with numbers and words mixed together. Try to make Scanner split the text into lines and restore the original format. As an example, use the file total_mess.txt (you can assume that the only numbers which occur in this text are the line number markers).
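One possible approach, sketched here under assumptions, is to treat each line number marker as the Scanner delimiter, so that every token is the text of one original line. The exact layout of total_mess.txt is not reproduced here; the sample string, the class name NumberedLines and the helper splitLines() are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class NumberedLines {
    // Hypothetical helper: returns the text between consecutive number markers.
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            // A delimiter is a run of digits plus any whitespace around it.
            sc.useDelimiter("\\s*\\d+\\s*");
            while (sc.hasNext()) {
                String line = sc.next().trim();
                if (!line.isEmpty()) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample input, not the contents of total_mess.txt.
        String sample = "1 It was a dark 2 and stormy night 3 said the editor";
        int n = 1;
        for (String line : splitLines(sample)) {
            System.out.printf("%d %s%n", n++, line);
        }
    }
}
```

To process the real file, the Scanner could be constructed from new File("total_mess.txt") instead of a string; that constructor throws FileNotFoundException, which then has to be handled or declared.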

More interesting

Now, let’s practise the use of Scanner with non-default delimiters, where the input text is split into tokens not on whitespace, but on a different string or a regular-expression pattern.

An example presented in J10 used Scanner.useDelimiter() to count the paragraphs in a text:

try (Scanner sc = new Scanner(new File(filename))) {
    // An empty line (start of line immediately followed by end of line) separates paragraphs
    sc.useDelimiter("(?m:^$)");
    int npara = 0;
    while (sc.hasNext()) {
       sc.next();   // consume the paragraph; without this the loop never ends
       npara++;
    }
    System.out.printf("%s has %3d paragraphs%n", filename, npara);
}

Extending this program (we assume that you have already embedded it inside a class and its main method) to count the number of sentences in each paragraph would not be difficult at all: the only problem to solve is to figure out which regular expression to use to catch the end of a sentence.
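As a starting point, here is a deliberately naive sketch which treats any run of '.', '!' or '?' as the end of a sentence. It is an assumption rather than the intended answer: abbreviations such as "e.g." or decimal numbers will be counted incorrectly, so the pattern needs refining. The class name SentenceCount and the helper countSentences() are hypothetical.

```java
import java.util.Scanner;

public class SentenceCount {
    // Hypothetical helper: counts the non-empty tokens between sentence terminators.
    static int countSentences(String paragraph) {
        int n = 0;
        try (Scanner sc = new Scanner(paragraph)) {
            // Naive end-of-sentence pattern: terminators followed by optional whitespace.
            sc.useDelimiter("[.!?]+\\s*");
            while (sc.hasNext()) {
                if (!sc.next().trim().isEmpty()) {
                    n++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countSentences("Hello there. How are you? Fine!"));
    }
}
```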

Exercise Five: streaming data with NIO.2

The following simple program reads in the command-line arguments, inserts them into a map, and then reports the frequency of each word:

import java.util.*;

public class Freq {
   public static void main(String[] args) { 
      Map<String,Integer> m = new TreeMap<>();
      for (String word : args) {
         m.put(word, m.getOrDefault(word, 0) + 1);
      } 
      m.forEach((w, f) -> System.out.format("frequency %d: %s\n", f, w));
   }
}

(its code is in Freq.java). The forEach method called on the map object is a new form of (internal) iterator which can be executed on a container-type object (Collection, List, Set, Map). For a map, the elements of the container are (key, value) pairs. The above iterator simply prints all the pairs. (As an exercise, you can rewrite the above statement with an explicit iterator, which needs a call to java.util.Map.keySet() to extract the keys and iterate through them.)

The NIO.2 API (namely, java.nio.file.Files, java.nio.file.Paths and java.nio.file.Path) provides an easy way to read in the contents of a text file line by line; examples are given in the two sample programs JacksonSamplerNIO.java and JacksonSamplerStream.java of Assignment One (inside the parsing directory).
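For comparison, the explicit-iterator version of the exercise might look like the sketch below. The class name FreqExplicit and the helpers count() and report() are hypothetical; the output format is copied from Freq.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class FreqExplicit {
    // Same counting logic as in Freq.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> m = new TreeMap<>();
        for (String word : words) {
            m.put(word, m.getOrDefault(word, 0) + 1);
        }
        return m;
    }

    // Builds the report by walking the key set with an explicit (external) iterator.
    static String report(Map<String, Integer> m) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = m.keySet().iterator();
        while (it.hasNext()) {
            String w = it.next();
            sb.append(String.format("frequency %d: %s\n", m.get(w), w));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report(count(args)));
    }
}
```

Because the map is a TreeMap, keySet() yields the words in sorted order, so the report has the same ordering as the forEach version.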

Combine the approaches in the class Freq above and the Jackson sample programs to build the word-frequency dictionary (map) of a text file. Before a word is included in the map, it has to be normalised: converted to lower case and stripped of any trailing punctuation characters. Hint: the methods java.lang.String.trim(), which removes leading and trailing white space, and java.lang.String.split("[\\P{L}]+"), which splits a string on non-letters, can be useful. You can test your program on any text file (take care in choosing the right StandardCharsets constant, as discussed in the lectures), but the essay In the Beginning was the Command Line by Neal Stephenson is as good as any.
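A possible skeleton for the combination is sketched below. It does not reproduce the Jackson sample programs; it only uses Files.lines from the standard API. The class name FileFreq and the helper addLine() are hypothetical, and UTF-8 is an assumption about the encoding of the input file. Note that splitting on non-letters also breaks words such as "don't" into "don" and "t".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class FileFreq {
    // Hypothetical helper: adds every word of one line to the map, normalised to lower case.
    static void addLine(String line, Map<String, Integer> m) {
        for (String word : line.trim().split("[\\P{L}]+")) {
            if (!word.isEmpty()) { // split yields "" first when the line starts with a non-letter
                m.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> m = new TreeMap<>();
        // Assumption: the file named by args[0] is encoded in UTF-8.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            lines.forEach(line -> addLine(line, m));
        }
        m.forEach((w, f) -> System.out.format("frequency %d: %s%n", f, w));
    }
}
```

The try-with-resources block closes the file once the stream has been consumed; Map.merge is used here as a compact alternative to the getOrDefault idiom of Freq.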

Updated:  28 Mar 2017/ Responsible Officer:  Head of School/ Page Contact:  Alexei Khorev