## Outline

In this week’s lab you will:

1. Learn about new C operators and control flow statements
2. Learn to manipulate character arrays and strings and read/write characters and strings from/to standard input/output
3. Learn to do pointer arithmetic and pass pointers/arrays to functions as arguments
4. Learn how function pointers work in C
5. Practice writing interesting utility programs in C

## Preparation

In lab 1 you forked the lab template in GitLab, and cloned it to your computer. Now we need to update it with the materials for this week’s lab! This means telling Git about the remote repo to fetch changes from, and merging these changes into your local repo.

1. First, we register the remote repo we want to fetch changes from. To do this, run the following in a terminal open at your local repo root. No need to change the URL here: this time we really do want to use the template repo URL.
git remote add upstream https://gitlab.cecs.anu.edu.au/comp2310/2022/comp2310-labs.git


This command tells Git that the name ‘upstream’ refers to the URL we gave it. We could have picked any name we liked, but ‘upstream’ is typical for this purpose.

2. Next let’s verify that the remote was added correctly. Run the following command
git remote --verbose


If all went well, you should see the following

$ git remote --verbose
origin  https://gitlab.cecs.anu.edu.au/uXXXXXXX/comp2310-labs.git (fetch)
origin  https://gitlab.cecs.anu.edu.au/uXXXXXXX/comp2310-labs.git (push)
upstream        https://gitlab.cecs.anu.edu.au/comp2310/2022/comp2310-labs.git (fetch)
upstream        https://gitlab.cecs.anu.edu.au/comp2310/2022/comp2310-labs.git (push)


Where uXXXXXXX is your UID. If the origin remote does not contain your UID, you haven’t cloned your personal fork!

3. If one of the URLs is wrong, we can fix it with the git remote set-url command. For example, to fix the origin remote to point to your fork, run the following (with uXXXXXXX replaced with your UID)
git remote set-url origin https://gitlab.cecs.anu.edu.au/uXXXXXXX/comp2310-labs.git

4. Now that we’ve confirmed the remotes are set up correctly, we can pull in changes from the upstream remote. Do this with the following commands
git fetch upstream

git pull --no-ff --no-edit upstream main


The --no-ff flag tells Git to always create a merge commit, even if the changes could be applied as a fast-forward. Depending on your Git version and configuration, Git may refuse to do the pull if you do not tell it how to combine the two histories, which this flag does.

The --no-edit flag tells Git to use the default message for the merge commit. If you leave out this flag Git might ask you to write a message to describe what is being pulled into your repo.

And that’s it. You should now have a folder called lab2 in your repo. Open that folder in VS Code and continue on with the lab.

If you have any questions about the above, ask your tutors for help! It’s important to do this correctly from the beginning, so you are comfortable with it for future labs and the assignments.

## Introduction

This tutorial introduces new operators and statements in C. We also explore useful functions from the C standard library for reading and writing characters from standard input (keyboard) and reading strings. You will also gain more familiarity with C’s pointer arithmetic, and write a few interesting programs to practice your C skills. For instance, you will write a utility program that counts the number of words a user types from the keyboard. This is a poor man’s version of the wc command in Linux distributions. In addition to the word count program, you will write a number of utility programs to manipulate text and strings. Finally, this tutorial introduces generic pointers and pointers to functions that are useful for implementing generic types and data structures in C.

We cannot cover all aspects of the language in a few tutorials. We therefore provide reference material for you to consult (as needed) along the way.

Some sections in today’s tutorial require a lot of reading. Don’t worry. You will practice writing C code after the reading is over.

## Operators

The C programming language has a rich set of operators. We will cover two important types of operators below: bitwise and relational. See this tutorial and the Wikipedia page if you are interested in a complete list of operators in C.

### Bitwise Operators

You have seen bitwise AND/OR operations on binary numbers and also shifts and rotations in COMP2300. C provides a number of bitwise operators for bit-level manipulation. These operators can only be applied to integral operands.

Operator Meaning
& Bitwise AND
| Bitwise inclusive OR
^ Bitwise exclusive OR
<< Left shift
>> Right shift
~ One’s complement / bitwise NOT (unary)

The bitwise AND operator & can be used to turn off bits. The bitwise OR operator | can be used to turn bits on. The shift operators perform left and right shifts of their left operand by the number of bit positions given by their right operand. Consider the following examples

int n = 255;
printf("n = %i\n",n);
n = n & 0xFF;
printf("n = %i\n",n);
n = n & 0xF;
printf("n = %i\n",n);
n = n | 0xFF;
printf("n = %i\n",n);


The final value of n printed on screen is the same as the initial value. Can you see why? Can you predict the intermediate values of n?

Consider the following statements

int k = 1;
int m;
m = k << 10;
m = k >> 1;


We first do a left shift of k by 10 and store the result in m. We then do a right shift of k by 1, which overwrites m. Neither statement changes k itself. Notice the placement of the shift operators in relation to the number we want to shift left or right, and the shift amount.

Open the file src/bitwise-and-shifts.c and predict the outcomes of the statements in the program. Now is a good time to practice the bitwise operators!

Open the file src/bitwise-puzzles.c and write missing code for the functions in the file. Each function involves solving a puzzle with the provided constraints. The comments provide the detail for solving each puzzle including the constraints, such as, which operators to use for each puzzle. Write a main() function to test your code.

### Relational Operators

The relational operators include

Operator Meaning
== Equal to
!= Not equal to
< Less than
> Greater than
<= Less than or equal to
>= Greater than or equal to

Each of these operators returns 1 if the specified relationship is true, and 0 if it is false. The result type is int. Consider the following example

int a = 4;
int b = 10;
int c = (a == b);
int d = (a <= b);
int e = (c == d);


The variable c will have the value of 0 after the third statement executes. Similarly, the variable d will have the value of 1. Can you predict the value of e? Write a simple program to test your hypothesis if you want confirmation. Relational operators are widely used in conjunction with control statements that we cover next.

## Control Flow

In this section, we will look at

• Selection statements (if and switch)
• Iterative statements (for and while)
• Jump statements (continue, break, and return)

### if

The if statement enables the C programmer to conditionally execute a substatement depending on the value of a ‘condition expression’. The condition expression determines which statements are executed. The expression typically evaluates a condition using a relational operator. We can thus write code to produce different outputs based on different inputs. Consider the following if statement

int sel = 1;
int x = 0;

if (sel == 1) {
x = 2;
printf("x = %i\n", x);
}


In the above example, the variable x is assigned a value of 2 and then printed only if sel is 1. Recall == is an operator that checks if the values of two operands are equal or not. If yes, then the condition becomes true. Since sel is indeed 1, the condition in the if expression evaluates to true or 1. Therefore, the contents of the if block are executed.

It is a common pitfall to forget the curly braces around the statements in the if block. If we omit the curly braces from the above example, only the assignment x = 2 will execute conditionally. The printf statement will execute unconditionally.

The following if statement includes an else clause

if (sel == 1) {
x = 2;
} else {
x = 3;
}


In this form, statements inside the if block are executed when the corresponding expression (sel == 1 in this case) is true. Otherwise, the statements in the else block are executed. One of the two blocks is always executed, but never both.

In both forms, the conditionally executed block can be another if statement. A common way to use the so-called if...else ladder is shown below

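void printgrade(unsigned int marks) {
if (marks >= 80) {
// print the grade for marks of 80 and above
} else if (marks >= 70) {
// print the grade for marks from 70 to 79
} else if (marks >= 60) {
// print the grade for marks from 60 to 69
} else if (marks >= 50) {
// print the grade for marks from 50 to 59
} else {
// print the grade for marks below 50
}
}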


The above if...else ladder tests the unsigned int parameter marks against a series of ranges. It first tests whether marks is greater than or equal to 80. If so, the function prints the student’s grade. Otherwise, it tests whether marks is greater than or equal to 70, and so forth down the if...else ladder. We could equivalently write it as

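void printgrade(unsigned int marks) {
if (marks >= 80) {
// print the grade for marks of 80 and above
} else {
if (marks >= 70) {
// print the grade for marks from 70 to 79
} else {
if (marks >= 60) {
// print the grade for marks from 60 to 69
} else {
if (marks >= 50) {
// print the grade for marks from 50 to 59
} else {
// print the grade for marks below 50
}
}
}
}
}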


### switch

The switch statement works like the if...else ladder with the exception that the controlling expression must have an integer type. Consider the following example code with a switch statement

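switch(marks / 10) {
case 10:
case 9:
case 8:
// print the grade for marks of 80 and above
break;
case 7:
// print the grade for marks from 70 to 79
break;
case 6:
// print the grade for marks from 60 to 69
break;
case 5:
// print the grade for marks from 50 to 59
break;
default:
// print the grade for marks below 50
break;
}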


The switch statement causes control to jump to one of the many cases, depending on the value of the controlling expression (marks / 10) and the constant expression in each case label (10, 9, 8, etc.). If none of the cases match the result of the controlling expression, the default case is executed. Following the jump, code is executed sequentially until the next control flow statement is reached. The break statement terminates the execution of the switch statement, causing control to jump to the execution of the statement directly following the overall switch statement. Forgetting to include break before the next case label is a common source of errors.

Why did we exclude break in the first two cases above? What happens if we omit the first two cases from our switch statement? Compile and run the code in src/grades.c to test your understanding. Note that you can comment out a statement by using // at the start of a statement.

### while and for

Iterative statements cause substatements (or compound statements) to be executed zero or more times, subject to a termination criterion. Iteration means repetition and thus iterative statements are called loops. The while statement causes the loop body to execute repeatedly until the condition expression evaluates to zero. Consider the following example

int x = 4;
while (x > 0) {
printf("%i is greater than 0\n", x);
x = x - 1;
}

// Output:
// 4 is greater than 0
// 3 is greater than 0
// 2 is greater than 0
// 1 is greater than 0


The evaluation x > 0 happens at the start of every loop iteration. If x is initially 0 or less than 0, then the while loop exits without doing anything. Once the end of the loop is reached, the controlling expression is tested again.

The for statement repeatedly executes a statement, and it is typically used when the number of iterations is known in advance. Consider the following code

for (int i = 0; i < 10; i = i + 1) {
printf("i = %i\n", i);
}

// Output:
// i = 0
// i = 1
// i = 2
// ...
// i = 8
// i = 9

• The clause int i = 0 declares the loop counter i and initialises it to 0. The above for loop prints the loop counter i on the screen from 0 to 9.
• The condition expression i < 10 is evaluated before each execution of the loop body.
• The expression i = i + 1 is evaluated after each execution of the loop body.

It is legal to declare or initialise i before the loop. But if declared as shown above, the scope of i is limited to the for statement (its header and its body).

The expression i = i + 1 is so popular that C provides an increment operator (++). Therefore, we could use i++ instead of i = i + 1 in the above example.

Common errors involving for loops include forgetting to provide a termination condition, such as i < 10, or forgetting to include a way for the loop to progress, such as i++. Note that there is also a decrement operator (--) to subtract 1 from an integer variable. The decrement operator i-- can be used in place of i = i - 1.

Despite their popularity, writing C loops demands care. Can you spot the problem in the following for loop and fix it? The programmer’s intention is to print the integers 0 to 9 on screen.

for (int i = 1; i <= 10; i = i + 1) {
printf("i = %i\n", i--);
}


### continue and break

We can use the continue statement inside a loop to skip the remaining execution of the current loop iteration. A break statement terminates execution of a switch or iteration (for, while) statement. We have already seen the use of break in the switch statement. When used in loops, a break causes the loop to terminate and the program execution to resume at the statement following the loop. Can you predict the outcome of the code below?

int i = 0;
while (1) {
i++;
if (i == 5) {
continue;
}
if (i == 10) {
break;
}
printf("i = %i\n", i);
}

printf("loop finishes with i = %i\n", i);


### return

The return statement terminates the execution of the current function and returns control to the caller. When an arbitrary function A calls another function B, we say that A is the caller and B is the callee. Recall that functions take input arguments and (optionally) return a value. You have already seen examples of return statements in the previous tutorial. Remember that a return statement can simply return, or it can return an expression. Within a void function (a function that doesn’t return a value), the return statement simply returns. Consider the following examples

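// declare the callee functions before they are used in main
void swap(int a, int b);
int sum(int a, int b);

int main() {
int a = 1; // declare a variable a of type int with value 1
int b = 2;
int c = 0;
swap(a, b);
c = sum(a, b);
return 0; // main returns an int, so it must return a value
}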

void swap(int a, int b) {
// some code...
return;
}

int sum(int a, int b) {
return a + b;
}


In the above example, main is the caller and swap and sum are two callee functions. You will be writing functions later in the tutorial, so familiarise yourself with what functions in C look like, and come back here for reference if needed.

Open the program src/loops.c. Can you guess what will be printed on the screen by the two printf statements in the main function? Run the program to test your guesses.

## Character Input/Output

C uses the char type to store characters. Interestingly, under the hood, the char type is an integer type: C stores integer numbers instead of characters. More specifically, a char variable is stored as one byte in memory, with a value typically ranging from $-128$ to $127$ (whether plain char is signed or unsigned depends on the platform). To represent characters, the computer must map each integer to a corresponding character using a numerical code. ASCII, which stands for American Standard Code for Information Interchange, is the most common numerical code. Take a look at the ASCII table.

The original ASCII defined characters with values from 0 to 127. Later, many countries used the remaining 128 values in a byte to support their local character set, or more symbols. This led to the possibility that an email sent from one country could appear corrupted when read in another. The contents were identical, but the computers were picking different (most likely nonsensical) characters to display!

The following code checks to see if a char value is a valid letter from the English alphabet. Note that the condition checks for both lower-case and upper-case letters.

char c = 'M';
if (
(('a' <= c) && (c <= 'z')) ||
(('A' <= c) && (c <= 'Z'))
) {
printf("Valid letter!\n");
}


You might be tempted to write 'a' <= c <= 'z' like Python allows, but in C this would not evaluate as you might expect. It first evaluates 'a' <= c to 0 or 1, and then compares if this is less than or equal to 'z', which is always true! This is why we split the comparisons and join the results with &&.

Checking if c is between the first and last letters works in ASCII because the numerical values of letters are consecutive from a to z and from A to Z (but not from Z to a! There is a gap with symbols in between).

The header file ctype.h provides a number of useful functions to classify and convert characters. You can find several resources with a list and description of these functions. Here is one useful resource.

### getchar and putchar

It is useful to be able to read one character at a time from the keyboard. Each time it is called, getchar reads one character from the keyboard. The code

int c = getchar();


reads a character from the keyboard and assigns it to the integer variable c. The function putchar prints a character each time it is called:

putchar(c);


prints the contents of the integer variable c as a character on the screen.

When interacting with your program through a terminal, getchar does not return as soon as you press a key. The character data is available only when the user presses Enter to indicate they are happy with what is typed.

### Practice Writing Utility Functions

We will now look at two programs that use control flow and character manipulation. But before that, we want to introduce two useful elements for writing cool programs. The first one is another compiler directive, namely #define. The #define directive lets you define named constant values for use throughout your code. Consider the code below

#include <stdio.h>

#define HIGH 1
#define LOW 0

int main() {
int high = HIGH;
int low  = LOW;
printf("High value is %i\n", high);
printf("Low  value is %i\n", low);
return 0;
}


Once we define HIGH and LOW to be 1 and 0, respectively, we can use them anywhere in the code. Any occurrence of HIGH will be replaced with 1 and LOW with 0. More generally, HIGH is a symbolic constant and 1 is the replacement text. Symbolic constants are useful for defining constants at the start of a program and judicious use can improve code readability.

A useful symbolic constant defined in stdio.h is EOF (end-of-file) and it indicates there is no more possible input to read. Detecting the end of input is useful when reading input characters from the keyboard as we will see shortly.

The standard input (keyboard) is treated as a special file in C. A program can read input from the keyboard until the end-of-file is encountered. A user can indicate end-of-file by typing a special key combination. If you are typing at the terminal and you want to provoke an end-of-file, use Ctrl-D (Linux, macOS), or Ctrl-Z (Windows). The following C program reads input characters from the keyboard and prints them on screen until the end-of-file is typed.

int c = getchar();
while (c != EOF) {
putchar(c);
c = getchar();
}


Note that EOF is not a character and therefore we must use the larger int type for the variable c against which we compare EOF.

Open the file src/copy1.c and carefully read the code. Compile and run the code. Try to write a condensed version of src/copy1.c in src/copy2.c. Hint: You can do an assignment and test a condition as part of the condition in the while loop. To test the compiled program, make sure to press Enter each time you input a character from the keyboard.

Open the file src/lines.c and carefully read the code. Can you guess what the code is doing? Compile and run the code to test your guess.

Write a program in src/wordcount.c that counts the number of words typed on the standard input (via keyboard). We define a word as follows: a sequence of characters that does not contain a blank or newline. The program terminates when EOF is encountered.

## C Strings

Strings are used for representing text and exchanging information between users and applications. Recall that there are multiple ways to declare and initialise strings in C,

char *string = "Hello";
char string[] = "Hello";
char string[6] = "Hello";
char string[] = {'H', 'e', 'l', 'l', 'o', '\0'};


When we initialise the string as "Hello", the compiler automatically adds the null terminator (\0).

Why did we specify array length as 6 in the third statement above?

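In particular, notice that char string[] and char *string are closely related, but not identical, declarations. The first declares an array that holds its own copy of the characters, while the second declares a pointer to a string literal, which must not be modified. In most expressions, however, the compiler converts an array to a pointer to its first element, and in a function parameter list the two forms mean exactly the same thing. How should we pass arrays to functions? Consider the following code,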

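// declare the functions before they are used in main
void string_cpy(char *dst, char *src);
void reverse(char *dst, char *src);

int main() {
char str1[6] = "HELLO";
char str2[6];
char str3[6];
string_cpy(str2, str1);  // copy str1 into str2
reverse(str3, str2);   // reverse str2 and store in str3
return 0;
}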

void string_cpy(char *dst, char* src) {
// not implemented
return;
}

void reverse(char *dst, char* src) {
// not implemented
return;
}


We use char *src and char *dst for string parameters. We can pass the array arguments str1, str2, and str3 that are all declared as character arrays to the string_cpy and reverse functions.

The string_cpy function copies the string from the source string (src) to the destination string (dst), including the terminating null byte. The string_cpy function returns nothing. The reverse function copies the source string src to the destination string dst in reverse order. The function reads the string backward and returns nothing.

Open the src/string-util.c file and implement string_cpy and reverse functions. Don’t forget to set the trailing null bytes!

## Printing and Casting Pointers

We covered basics of pointers in the last tutorial. Here, we will look further at manipulating pointers, and write practice programs.

It is sometimes useful to print pointers on the screen for debugging purposes. The following statements print the addresses of the marks array from the previous lab.

// %p expects a void pointer, so we cast the addresses
printf("ptr = %p\n", (void *) &marks[0]);
printf("ptr = %p\n", (void *) &marks[1]);


Compile and run the program called src/print-ptrs.c and verify that the elements of the marks array are indeed four bytes apart.

What if we want to print the address of individual bytes in the marks array and not integers? The following program reinterprets int * as char * and then prints the addresses of individual bytes.

//type conversions
char *byte0 = (char*) marks;
char *byte1 = (char*) marks + 1;
char *byte2 = (char*) marks + 2;

//print statements
printf("byte0 = %p\n", byte0);
printf("byte1 = %p\n", byte1);
printf("byte2 = %p\n", byte2);


Compile and run src/print-ptrs.c again and observe memory addresses of individual bytes that make up the array elements in the marks array.

## Pointers As Function Arguments

Recall what functions in C look like from the last tutorial. Functions take input arguments and they optionally return a value. C passes arguments to functions by value (affectionately called pass by value or call by value). These phrases mean that when we provide an argument to a function, the value of that argument is copied into a distinct variable for use within the function. Consider the following function that tries to swap two arguments. We call this version of swap version 1, or v1.

void swap_v1(int a, int b) {
int temp = a;
a = b;
b = temp;
printf("a = %i, b = %i\n", a, b);
}


We first safely store a in a temporary local variable called temp. We then do the swapping. Run the program src/swap-v1.c and you will find something strange. Although a and b have been successfully swapped inside the swap function, when we print a and b from the main function, we still see their original (non-swapped) values.

Why do you think this is happening?

We can use pointers to rewrite the swap function. We can use the indirection (*) operator to declare pointers and dereference them. Consider the following skeleton of the swap_v2 function and a call to it

int main() {
int a = 2, b = 3;
swap_v2(&a, &b);  // pass the memory addresses of a and b
}

void swap_v2(int *a, int *b) {
int temp = ...;
}


Passing arguments in the above fashion is commonly called pass by reference (strictly speaking, C still passes each pointer by value, but the pointer lets the function reach the caller’s variable). We are not providing the swap_v2 function the actual values of a and b. Instead, we are passing a reference to a and b. In other words, we are providing the swap_v2 function with the memory addresses of a and b.

Open src/swap-v2.c and write the code to swap a and b in the swap_v2 function. Test that your new function definition swaps the input arguments, so even from the main function, when we print a and b, we observe their values swapped. Does your new function work?

Pointers as function arguments serve another important purpose. Recall that C functions can return only a single value. What if we want our functions to return more than one value? We can pass the address of a variable to a function (i.e., pass a pointer as the input argument). The function can then assign a value to the variable pointed-to by the input argument using the dereference operator.

Read the program in src/return2.c. What value of ret1 and ret2 do you expect to see on the screen? Run the program and test your hypothesis.

Open src/sum.c and complete the function definition of the sum function. Make sure it correctly sums up the elements of the input array to the function. Note again that array arguments can be passed to functions as pointers, and we can do the usual pointer arithmetic on these input arguments. So, sum(int *array) is the same as sum(int array[]).

Write a function str_cmp that compares two strings character-wise and returns 1 if the two strings are equal. Otherwise, the function returns 0. Write your code in src/strcmp.c. You can rely on the caller to pass null-terminated strings to the str_cmp function.

Write a function that takes a null-terminated string as an input argument. It computes and returns the length of the string. Test your function by writing a C program with a main function. Write the code in src/strlen.c.

## Function Pointers

Function pointers are pointers that point to executable code (typically other functions). They are used to treat functions as regular data. This means that it is possible to define pointers to functions, which can be assigned, placed in arrays, passed to functions, and even returned by functions. Unlike regular pointers, the type of a function pointer is described in terms of a return value and parameters that the function accepts. Declarations for function pointers look as follows:

int (*match)(int *key1, int *key2);


The above declaration means that we can set match to point to any function that accepts two int pointers and returns an integer. If we have a function match_int as below,

int match_int (int *k1, int *k2) {
if (*k1 == *k2) return 1;
return 0;
}


We can set match to point to the above function with the following statement:

match = match_int;


To execute a function referenced by the function pointer, we simply use the function pointer where we would normally use the function itself. For example,

int x = 10;
int y = 12;
int val = match(&x, &y);


Function pointers are useful for encapsulating functions into data structures. Typically, a function pointer is made a member of a data structure, and the pointer is used to invoke one of the many functions based on the type of data that is stored in the data structure. An (optional) exercise at the end of this handout helps you to explore this use of function pointers further.

Open the file src/reduction.c and read the comments to fill in the missing parts of the code. The compiled code should either reduce the samples array using reduce1 or reduce2 depending on whether the user runs the compiled binary with -r1 or -r2 argument on the command prompt. If you write the code we ask for properly, you will observe the output 55 for -r1, and 3628800 for -r2.

If you have come this far, congratulations! If you manage to solve the problems below, send an email to the course convener with the title Lab2-Advanced-Finished.

Write the program tail, which prints the last n lines of its input. By default, n is 5, but it can be changed by an optional argument, so that tail -n prints the last n lines. The program should behave gracefully regardless of the input or the value of n. You can store the lines in a two-dimensional array of fixed size. If the user enters more lines than a threshold, or lines longer than the maximum length, the program must terminate gracefully. Note that we do not provide you a template for this program. You need to write this program from scratch.

If you already know how malloc() works, try to make the best use of available memory instead of using a two-dimensional array of fixed size.

The next exercise explores the power of function pointers with a sorting program that sorts lines input by the user (via keyboard) either lexicographically or numerically. Specifically, if the optional argument -n is given, the program will sort the input lines numerically. A sort typically consists of three parts: (1) a comparison that determines the ordering of a pair of objects (e.g., numbers), (2) an exchange that reverses their order, and (3) a sorting algorithm that makes a sequence of comparisons and exchanges until the objects are in a proper order. Note that the sorting algorithm is independent of the comparison and exchange operations, so by passing different comparison and exchange functions to it, we can arrange to sort by different criteria. Let’s explore this decoupling of concerns via function pointers in the exercise below.

Open the source file src/qsort.c. The source code reads lines from the input and sorts them lexicographically using the quick sort algorithm. You do not need to understand quick sort to solve this problem, but if you have the time to explore it, that would be great! Run the program, type a bunch of lines, then enter EOF and observe the output. Now, we would like to add a -n option to the program that sorts the lines entered from the keyboard numerically. Write a function called numcmp that takes two input strings, converts them to double values, and returns -1 if the first argument is less than the second argument, 1 if the first argument is greater than the second argument, and 0 otherwise. You can use the atof utility function to convert a string to a floating point value. If the user enters -n at the command prompt, then the program should compare the input lines numerically. To test numerical sort, you can input one integer per line from the keyboard and then enter EOF. Note that you will need to change the line in the main function that calls quick_sort.

Note that the src/qsort.c program makes use of generic void* pointers. Normally, C allows assignments only between pointers of the same type. A generic pointer in C is declared as a void pointer, and it can be assigned to, or from, a pointer to any type of data. Once again, generic pointers are useful for implementing data structures, which we will explore in the future.