Week 5 Code Critique Tutorial
In this week’s lab, you will learn about:
- Code review: a process in which code in reviewed by someone other than the author
- Best practices of writing code; following the principles of code comprehension, code consistency across codebase, through testing. The principles mentioned would be agnostic with respect to programming languages / domain. In general, additional principles should also be followed for different programming languages (as probably mentioned in their documentation), however these principles would provide a solid foundation.
Prerequisites
- You have installed
Java
(preferably with IntelliJ as the IDE), and have some experience in programming in it. - (Preferable) You also have
GitKraken
installed.
Overview
The general principles of good code are inspired from MIT’s code review tutorial, under the CC BY 4.0 license [1]
We would be using 3 specific questions from Advent of Code 2021. It is a set of algorithmic programming puzzles for people in a variety of skillsets / levels.
Aside: Code Review Flow @ Google
From the SE Engineering @ Google book, a typical code review goes through the following steps:
- A user commits and pushes changes to the codebase into their workspace, until it is ready to be merged. Upon being ready, a PR is also created with the appropriate description of the changes.
- Automated code reviews (such as Continuous Integration tests) run and allow the author to self-review the changes. When the author is satisfied, they mail the change to one/more reviewers.
- The reviewer opens the change and checks the
diff
of the feature. If something needs to be worked upon, they post comments on the diff, requesting explicit resolution/extra information (move to step 4). Else, if they are happy with the latest stage of the change, they accept it by posting the commentLGTM
(Looks good to me) (move to step 5). Only one LGTM by any reviewer is required by default. - The author modifies the codebase based on the requested changes and uploads additional commits based on that. They then reply back to the reviewers (move back to step 3). Note that step 3 and 4 may be repeated multiple times.
- After a change is marked LGTM, the author is allowed to merge the changes to the codebase.
Since Google uses a Centralized VCS, they follow the strategy of review-before-commit (i.e. reviews before each and every commit). Other projects (especially Open Source) which use git
use the policy of reviewing code before merging it in the main
branch
Reading: Task 1
For this task, we would be comparing on how 2 solutions which are functionally equivalent to a given program can vary greatly in design, and what are some good code practices to follow.
Discuss on what makes code have good design by following the principles of readability and maintainabity.
We would be looking at a solution with bad code practices being refactored into one with good set of practices for the given Advent of Code 2021’s question: https://adventofcode.com/2021/day/2
For this task, compare the codebases provided by:
See how the refactoring was done by following the principles listed below.
-
<a href=# onclick=”showSelectiveLink(1)”> Don’t repeat yourself (DRY) </a>
-
<a href=# onclick=”showSelectiveLink(2)”> Comments where needed </a>
-
<a href=# onclick=”showSelectiveLink(3)”> Fail fast </a>
-
<a href=# onclick=”showSelectiveLink(4)”> Avoid magic numbers / strings </a>
-
<a href=# onclick=”showSelectiveLink(5)”> One purpose for each variable </a>
-
<a href=# onclick=”showSelectiveLink(6)”> Use good names </a>
-
<a href=# onclick=”showSelectiveLink(7)”> Use formatting and whitespace to help the reader </a>
-
<a href=# onclick=”showSelectiveLink(8)”> Don’t use static global variables </a>
-
<a href=# onclick=”showSelectiveLink(9)”> Functions should return results, not print them </a>
-
<a href=# onclick=”showSelectiveLink(10)”> Avoid special case code </a>
Don’t repeat yourself (DRY)#
Example: Bad Quality Code Snippet#
if (direction == "up") {
this.direction = Direction.UP;
}
else if (direction == "down") {
this.direction = Direction.DOWN;
}
else if (direction == "forward") {
this.direction = Direction.FORWARD;
}
else if (direction == "backward") {
this.direction = Direction.BACKWARD;
}
else {
throw new IOException("Invalid Input");
}
Example: Good quality Code Snippet#
this.direction = switch (direction) {
case "up" -> Direction.UP;
case "down" -> Direction.DOWN;
case "forward" -> Direction.FORWARD;
case "backward" -> Direction.BACKWARD;
default -> throw new IOException("Invalid input");
};
Duplicated code is a risk to safety. If one has identical / very similar code in two places, then the risk of having a bug is being increased unecessarily (by the virtue of the increased code base’s size), and also, if there is a bug found, then one would have to find out all the copies where the bug happened and the code maintainer has to ensure that the code is fixed in all instances.
In the above case, we see that this.direction
(as a class field) and direction
(as the parameter) is only being defined once.
Comments where needed#
Example: Bad Quality Code Snippet#
public ArrayList<DirectionContainer> read_input(String filename) {
// Somehow reads files: don't change!
ArrayList<DirectionContainer> arr = new ArrayList<>();
...
return arr;
}
Example: Good quality Code Snippet#
/**
* Reads an input file according to the specification of each line
* being a word and a number separated by a whitespace
*
* @param filename the input file name to be read
* @return An array of Movement Directions
*/
public ArrayList<DirectionContainer> read_input(String filename) {
// List of direction steps to be taken in type-safe form
ArrayList<DirectionContainer> arr = new ArrayList<>();
// Use BufferedReader for performance
// See: https://docs.oracle.com/javase/8/docs/api/java/io/BufferedReader.html
try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
String line;
// While reading till EOF
while ((line = reader.readLine()) != null) {
// The first word is the direction (in String)
// and the second word is the number of steps
String[] command = line.split("\\s+");
arr.add(new DirectionContainer(command[0], Integer.parseInt(command[1])));
}
} catch (IOException x) {
System.err.format("Input file: failed reading %s%n", x);
}
return arr;
}
While one should strive for self-documenting code, care should be taken in writing judicious comments. Good comments increases readability, reduces bugs (because the assumptions of pre/post conditions have already been documented), and readier for change.
The code with bad style does not use any comments for context for the reader, and the single comment is not helpful to the reader. In contrast, the below example uses the following principles:
- Specification in the form of
javadoc
, which appears above a function or above a class and documents the behaviour of the function/class. - Writing the reasoning of some design choices in the code (such as using
BufferedReader
), although this can also be mentioned separately in a report. - Specifies the source of a piece of code that was copied/adapted/used in reasoning. Again, we see that the reasoning for the use of
BufferedReader
. - Bad and unnecessary comments (for example literal meaning of each statement) is not given. One should understand that the reader has the proficiency to read
Java
syntax.
Fail fast#
Example: Bad Quality Code Snippet#
switch (md.getDirection()) {
case "forward" -> x += md.getSteps();
case "backward" -> x -= md.getSteps();
case "down" -> y += md.getSteps();
case "up" -> y -= md.getSteps();
default -> throw new IOException("Bad Input");
}
Example: Good quality Code Snippet#
// In Submarine.calculatePosition
switch (md.getDirection()) {
case FORWARD -> x += md.getSteps();
case BACKWARD -> x -= md.getSteps();
case DOWN -> y += md.getSteps();
case UP -> y -= md.getSteps();
}
Failing fast means that code should reveal its bugs as early in development as possible. Static languages such as Java
provide reasonable type support as compared to dynamically typed languages like Python
, which is more prone to runtime errors. In the example given above, after getting the input, we assume that it is sanitized and store it as a new enum
type in Java, such as when a switch statement is run, we can be assured that most of the input would already be validated and corruption in function exection would not happen.
Avoid magic numbers / strings#
Example: Bad Quality Code Snippet#
s.read_input("inputs/input.txt");
Example: Good quality Code Snippet#
public static final String INPUT_FILE = "inputs/input.txt";
...
s.read_input(INPUT_FILE);
According to wikipedia, Magic Numbers are unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants.
In the example given above, we don’t know what is “inputs/input.txt” just from the use of it. It is generally better to rename it as INPUT_FILE
within the respective class, so as to make it easier to change (as constants - generally noted by the final
keyword in Java by convention, need to change in the future). Note that we have defined INPUT_FILE
as static
to save memory.
One purpose for each variable#
Example: Bad Quality Code Snippet#
public ArrayList<DirectionContainer> read_input(String filename, ArrayList<DirectionContainer> arr) {
...
// Make operations with arr
...
return arr;
}
Example: Good quality Code Snippet#
public ArrayList<DirectionContainer> read_input(String filename) {
// Define arraylist locally
ArrayList<DirectionContainer> arr = new ArrayList<>();
...
// Make operations with arr
...
return arr;
}
In the bad example given above, the parameter arr
is reused to compute a very different value compared from the beginning (by appending) – the return value of the function. Instead, it would be better to make this a pure function, with only one parameter fileName
and defining arr
in the local scope of the function.
Try not to reuse parameters and reuse variables. Their declaration does not carry a heavy overhead in programming, and minimalizating change of state would also tend to make a function with less/no side-effects. Embedding functional programming principles in Java would have the properties of safety-from-bugs and ready-to-change.
Use good names#
Example: Bad Quality Code Snippet#
public static void main(String[] args) {
Question q = new Question();
ArrayList<DirectionContainer> md = q.read_input(INPUT_FILE);
int fd = s.fDive(s.fPos(movementDirections));
System.out.printf("Final multiplication position relative to origin: %d", fd);
}
Example: Good quality Code Snippet#
// Read the input file and calculate the final dive
public static void main(String[] args) {
Submarine s = new Submarine();
ArrayList<DirectionContainer> movementDirections = s.read_input(INPUT_FILE);
Position finalPos = s.calculatePosition(movementDirections);
int finalDive = s.calculateDive(finalPos);
System.out.printf("Final multiplication position relative to origin: %d", finalDive);
}
Good function/variables names are generally long and self-descriptive in nature. For example, instead of:
int tmp = 3600; // tmp is the number of seconds in an hour
we can write:
int secondsPerHour = 3600;
Java-like languages also use snake-case naming to distinguish global constants (as seen from INPUT_FILE
). But the local variables declared inside the function are read in camelCase notion (such as movementDirections
).
Effectively Naming Software Thingies and Robert Martin’s Clean Code (Chapter 2) is highly recommended for best code practices in OOP languages.
Use formatting and whitespace to help the reader#
Example: Bad Quality Code Snippet#
ArrayList<DirectionContainer> md = q.read_input("inputs/input.txt", new ArrayList<>());
Example: Good quality Code Snippet#
// Think about what would happen in multiple arguments
ArrayList<DirectionContainer> md = q.read_input("inputs/input.txt",
new ArrayList<>());
Having consistent style guidelines (not too restrictive though) can help maintain code more easily (easy to refactor / increasing readability / etc). It is followed by the following principles:
- IntelliJ has an auto code formatter invoked by
Ctrl + Alt + Shift + L
which helps in using constistent identation / whitespaces in code. - Putting linebreaks and appropriate spaces if the function signature is too large (like in the example provided above).
- Using spaces over tabs (although holy wars have been fought over this). See [2]
Don’t use static global variables#
Example: Bad Quality Code Snippet#
// In Question.java
public static int x, y;
Example: Good quality Code Snippet#
// In Position.java
public class Position {
private int x, y;
public Position(int x, int y) {
this.x = x;
this.y = y;
}
...
// getters and setters
From [3]:
Static variables are generally considered bad because they represent global state and are therefore much more difficult to reason about. In particular, they break the assumptions of object-oriented programming (OOP). In OOP, each object has its own state, represented by instance (non-static) variables. Static variables represent state across instances which can be much more difficult to unit test. This is mainly because it is more difficult to isolate changes to static variables to a single test.
If one is going to declare a global variable, one should aspire to make it immutable (to be used as an environment variable / global constant). This can be done by using the final
keyword.
Functions should return results, not print them#
Example: Bad Quality Code Snippet#
public void calculateDive(Position p) {
System.out.println(Math.abs(x * y));
}
Example: Good quality Code Snippet#
public Position calculateDive(Position p) {
// Return value instead of printing it within the function
return Math.abs(p.getX() * p.getY());
}
In the bad example provided above, calculateDive()
sends the string to standard output, thereby limited the exposure of usign the result in other contexts than what is present in the function scope. If this is somewhat needed in the future due to less maintainability, the part of the codes using this function would have to rewritten. Hence, in the below example calculateDive
returns the Integer result to be potentially used in further computer / seen by the programmer.
In general, only the highest-level parts of a program should interact with the human user or the console. Lower-level parts should strive to be pure functions (without side-effects). The sole exception here is debugging output, which can of course be printed on standard output. But that kind of output should not reach the design of production code, only a part of how you debug your message.
Avoid special case code#
Example: Bad Quality Code Snippet#
public void calculatePosition(ArrayList<DirectionContainer> movementDirections) {
if (movementDirections.size() == 0) System.out.println("0");
int x = 0;
int y = 0;
for (DirectionContainer md : movementDirections) {
switch (md.getDirection()) {
case FORWARD -> x += md.getSteps();
case BACKWARD -> x -= md.getSteps();
case DOWN -> y += md.getSteps();
case UP -> y -= md.getSteps();
}
...
}
Example: Good quality Code Snippet#
public void calculatePosition(ArrayList<DirectionContainer> movementDirections) {
// Remove special case handling
int x = 0;
int y = 0;
for (DirectionContainer md : movementDirections) {
switch (md.getDirection()) {
case FORWARD -> x += md.getSteps();
case BACKWARD -> x -= md.getSteps();
case DOWN -> y += md.getSteps();
case UP -> y -= md.getSteps();
}
...
}
The starting if
statement in the bad code example is unnecesary. It it were ommitted, and the movementDirections
array is empty, then the block in for
loop is not run and we get the same expected output. In face, handling this special case separately has led to a potential bug - a difference in the way an empty array is handled, compared to a non-empty array that happens to have no direction instructions to it.
Hence, actively resist the temptation to handle special cases separately. Tackle the general case first, and see whether all the scenarios could be handled by it’s current version / very slight modification of it.
Write a clean, simple, general-case algorithm first, and optimize it later (if possible through special cases), only if there is evidence that it would actually help.
Task 2
Code reviews generally require a combination of a process and a tools supporting that process. We learnt about part of the process last week by using tools such as the git
with Trunk-Based development, and GitLab for code review.
This week would require us to also see how to main code more readable and maintainable while following this process. You would be improving at the codebase provided here. This is the solution to Question 2 of Advent of Code 2020 edition.
Discuss and work in groups (with the same methodology as in previous labs - forking the repo, adding developers and branching strategy) on improving the codebase using the style guidelines mentioned in the previous exercise.
When you are satisfied with the code changes, show your work to the tutor for additional review.
Task 3
Pair Programming is a knowledge-sharing process while developing software collaboratively. The classic pair programming context has 2 roles:
- The Driver: The person at the keyboard, focused on completing the specific goal at hand, ignoring the larger picture for the moment. He follows the tactical aspects of the work by focusing on the current task.
- The Navigator: Reviews the code being typed, and guides the driver on what to do according to how the strategic direction of the work being done.
The Driver and the Navigator frequently switch roles over a certain period (generally after every 30 mins), so as to facilitate shift of perspective, and in the process exchange new ideas.
{ .do-box } Form groups of 2 people each. Look at the question provided here, and decide on how you would go about this as a pair.
If you cannot get the exercise done in time of the tutorial, you can complete the question by remote pair programming. There are tools such as Visual Studio Live Share available, which can help you complete the task. The question for the extension was given as a simple requiring less code to complete, however, if you feel upto the task and want to solve a more difficult problem, look here.
Extension: Task 4
Look at Google’s guidelines on code formatting and linting practices for a specific language, and apply them in further Assignments or previous projects.
References
[1] MIT 6.031 - Code Review Recitation
[2] Tabs vs Spaces