Description#
The first lab is intended to familiarise you with the build
environment and the structure of the project files, while you
implement the scanner for the MoJo language. The scanner takes input
from the standard input stream (`System.in`) and writes output to the
standard output stream (`System.out`). It should print each
recognised token and its corresponding lexeme on the standard output,
one per line. Input files can be any sequence of MoJo tokens; they
are not required to be valid MoJo programs.

This lab is not assessed, but it will prepare you for the first
assessed project assignment, in which you will build the parser for
MoJo.
Getting Started#
You should fork the Lab 1 repository on GitLab. It contains a
template file, a grading script, and some example test cases. The
JavaCC template is in the file `src/mojo/Parser.jj` and looks like
this:
```
/* Copyright (C) 1997-2023, Antony L Hosking.
 * All rights reserved. */

options {
  DEBUG_PARSER = false;
  DEBUG_LOOKAHEAD = false;
  DEBUG_TOKEN_MANAGER = false;
  STATIC = false;
  JDK_VERSION = "1.9";
}

PARSER_BEGIN(Parser)
public class Parser {}
PARSER_END(Parser)

/**************************************************
 * The lexical spec starts here                   *
 **************************************************/

TOKEN_MGR_DECLS :
{
  int comment, pragma;

  public static void main(String[] args) {
    SimpleCharStream stream = new SimpleCharStream(System.in);
    ParserTokenManager scanner = new ParserTokenManager(stream);
    while (true) {
      try {
        Token token = scanner.getNextToken();
        // Print any special tokens chained before this token.
        for (Token t = token.specialToken; t != null; t = t.specialToken)
          System.out.println(tokenImage[t.kind] + " " + t);
        if (token.kind == EOF) break;
        System.out.println(tokenImage[token.kind] + " " + token);
      } catch (TokenMgrError e) {
        System.err.println(e.getMessage());
        System.exit(-1);
      }
    }
  }
}

/* WHITE SPACE */
SKIP : { " " | "\t" | "\n" | "\r" | "\13" | "\f" }

/* KEYWORDS */
TOKEN :
{
  "break" | "class" | "const" | "else" | "extends" | "for" | "if" | "loop" | "method" |
  "override" | "proc" | "return" | "struct" | "type" | "until" | "val" | "var" | "while"
}

/* OPERATORS */
TOKEN :
{ "||" | "<" | "<=" | "+" | "-" | "{" | "}" | ";" | ","
| "&&" | ">" | ">=" | "*" | "/" | "(" | ")" | ":" | "."
| "==" | "!" | "!=" | ".." | "%" | "[" | "]" | ":=" | "=" | "^" }

/* TODO: comments */
SKIP : { "/* */" }

/* TODO: identifiers */
TOKEN : { < ID : "TODO" > }

/* TODO: numbers */
TOKEN : { < NUMBER : "42" > }

/* TODO: characters */
TOKEN : { < CHAR : "'a'" > }

/* TODO: texts (strings) */
TOKEN : { < TEXT : "\"TODO\"" > }
```
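The `TODO` productions are where your work goes. As a rough
illustration only (the exact character classes and rules must come
from the MoJo lexical definition, not from this sketch; the token
names simply match the placeholders above), an identifier and a
number token could be written along these lines:

```
/* Illustrative sketch only: a letter followed by letters, digits, or
 * underscores for ID, and an unsigned decimal literal for NUMBER.
 * Replace with the rules required by the MoJo specification. */
TOKEN :
{
  < ID : ["A"-"Z","a"-"z"] (["A"-"Z","a"-"z","0"-"9","_"])* >
| < NUMBER : (["0"-"9"])+ >
}
```

Note that JavaCC matches the longest possible token and, when two
rules match the same text, prefers the one declared first, so the
keywords declared above `ID` will still be recognised as keywords
rather than identifiers.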
To run the scanner, first compile the project with the command
`make`, then invoke the class `ParserTokenManager` so that it reads
from standard input (`System.in`):
```sh
java -cp bin ParserTokenManager
```
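When experimenting interactively, type some tokens and then send
end-of-file (Ctrl-D on Unix-like systems) to finish. You can also
redirect one of the provided test inputs into the scanner; the file
name below is just a placeholder for whatever test file you choose:

```sh
make
java -cp bin ParserTokenManager < some-test-input
```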
Expected Output#
Your program should tokenise its input, ignoring whitespace and comments. The output should be the tokens your scanner recognises, one per line, like this:

```
<TOKEN> lexeme
<TOKEN> lexeme
```

For example, if you enter `'c'` you should see the recognised token
echoed back as `<CHAR> 'c'`.
The lexeme portion comes from the input program: it is the actual
characters that were matched for that token. If the input contains an
invalid token, your program should produce `!ERROR` on the standard
output and then exit (an error message will also be written to the
standard error stream).
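For instance, once you have completed the `ID` and `CHAR` rules, a
quick session might look like the following; treat this as a sketch
of the format rather than the exact graded output, since the token
images depend on how you declare your tokens:

```sh
$ echo "x42 'c'" | java -cp bin ParserTokenManager
<ID> x42
<CHAR> 'c'
```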
Testing#
The project repository from which you fork contains some test cases
and the expected outputs. You can run these tests using the grade.sh
script contained at the top level of the repository. Feel free to
devise additional test cases (we don’t promise that what we have given
you is comprehensive).
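Assuming the script is executable, running the provided tests from
the top level of your clone is just:

```sh
./grade.sh
```

If it is not marked executable in your checkout, you may need to add
execute permission with `chmod +x grade.sh` first.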
If you’re stuck, you can reach out for help at any time; the course help page or discussion forum is a good place to start.