Outline and Preparation#

In this week’s lab, you will:

  • Learn to fork and properly reap child processes
  • Learn to execute new programs in the context of an existing process
  • Learn how shells and other similar utilities are implemented
  • Learn how signal handlers are written
  • Practice writing and debugging (concurrent) systems programs

This lab uses a new repository, make sure you fork and clone it!

  1. Log in with your uni ID/password and open the lab template on GitLab here.

  2. Fork the repository by clicking on the fork button. This will create your own copy of the lab repository.

  3. Check the URL bar and make sure it contains your uid (eg. gitlab.cecs.anu.edu.au/*your-uid-here*/comp2310-2024-lab-pack-2). This means you are looking at your copy of the lab, and not the template. Once you’ve done this, click Clone and copy the address next to “Clone with HTTPS”.

  4. Open a terminal and clone with the Git clone command:

git clone address_from_step_3_here

Type in your uid and password when requested.

  1. You should now have a local copy of the repo on your computer. Try moving to the correct directory with the cd and ls commands from earlier.

Introduction#

In the previous tutorials you’ve practised C code, as well as understood a bit on how that C code is compiled into the assembly language you learnt all about in COMP2300. Now we will teach you to write systems software, the cornerstone of systems programming. Systems programming involves writing low-level code that interfaces directly with the operating systems kernel and system libraries. In the coming tutorials, you will write and implement code for controlling processes and handling signals, and even write a memory allocator for manual memory management (which you have so far done via the C standard library’s malloc()/free() functions.) Knowing the art of systems programming will make you a powerful programmer. With the skills gained, you can start hacking virtual machines, interpreters, editors, shell environments, web browser internals, etc. This specific tutorial deals with managing processes and handling signals.

Reading man pages#

There are a lot of details about system-level functions that we cannot cover in lectures or tutorial handouts. However, fortunately Unix-like operating systems have manual pages that allow easily looking up these functions, which you can access with the command man.

Read the manual pages for fork and execve. Don’t worry if there are parts that you don’t understand yet — you’ll get more hands-on experience with these system calls during this lab.

In future labs we will often provide less about system calls and functions and instead direct you to the man pages. It is a good idea to get used to using man now. Some useful tips:

  1. man itself has a manual page! If you run man man you will see important information man itself and the arguments you can pass it.

  2. Man pages are split into different sections. For example, section 2 is the section that contains man pages for system calls (such as fork) and section 3 is for library functions (such as printf). When you type man some_name to find a specific manual page, man will automatically search for a matching man page in these different sections.

    You may find that the man page you get is not the one you wanted, if it belongs to multiple sections. You can tell man to search in a specific section by running man sectnumber name.

    For example, if you run man printf you will most likely get the man page for the shell command printf, rather than the C standard library function. If you run man 3 printf, you will instead be given the manual page for the C function.

    There are also man pages for terminal commands like ls and grep, which are in section 1.

  3. You can see which section the manual page you are current viewing belongs by looking in the top left corner. The section is in brackets after the manual page name (e.g. printf(3)).

  4. Linux Man Pages Online contains mirrors of all manual pages present on a linux system. You can use this if you find using the CLI interface cumbersome.

Shell and Terminal#

You have been using a shell and terminal to run commands throughout this course so far, but what you may not have realised is that the shell program is not part of the OS. It is a user-level application which interfaces with the OS, and in fact you can write a shell program yourself! If you’ve been using Linux or WSL, the specific shell you’ve been using is likely to be BASH (the Bourne-Again SHell), and you can find the binary for this program at /bin/bash. We proceed to define what a “shell” actually is, in abstract terms.

A shell is an interactive user application program that runs other programs on behalf of the user. Note that a terminal is an interface that provides a display for output and a keyboard for input to a shell session. The shell is the interpreter that executes commands typed as strings. Typically, shells can run a job in the foreground or background. While a foreground job runs, the shell waits for the job to finish before interpreting the following command the user types. The shell user can submit a background job by appending an & after the command. Below is a typical session with the shell environment:

> ls
forks
> ./forks 1
Parent has x = 0
Bye from process 2647667 with x = 0
Child has x = 2
Bye from process 2647668 with x = 2
>./forks 1 &
[1] 2647669
>
Parent has x = 0
Bye from process 2647669 with x = 0
Child has x = 2
Bye from process 2647670 with x = 2

By convention we use > at the beginning of lines in the above snippet to indicate a command typed into the shell. Other lines are the output of those commands.

Depending on the particular shell you are using it may be a different character, such as % or $.

You have hopefully had some practice working in a shell environment, so this should seem familiar to you. A few aspects of user interaction to note in the above session:

  • Some commands are built-in shell commands. In the above session, the examples are the echo and cd commands.

    The shell can also run any program for the user. For example, ls and forks are external programs. To run them, we need to inform the shell where the specific program resides. There are two ways of doing this:

    1. Using the PATH environment variable. This is a special variable that your shell uses to search for executable programs, consisting of different directories it should use to search for executable programs. The program ls will be somewhere in your PATH1.

    2. Directly telling the shell where the program is. The forks program is not present in the shell’s PATH, so we invoke it as ./forks. The ./ prefix tells the shell to look in the current directory for an executable named forks.


  • When we type ./forks 1, the shell waits for forks to finish executing and only returns to read/interpret the next command after forks terminates.

  • When we run forks in the background by using an &, the shell returns immediately, printing some information about the child process it creates. You can verify that & indeed sends the job in the background by writing a program with a while (1) {;} loop, submitting it in the background, and then using ps to verify the program is still running. Try running the same program in the foreground without the &.

Make sure you understand the difference between foreground and background jobs as explained above. This will be important for completing Exercise 2.

ps and execve#

Linux and most Unix-like operating systems provide a utility called ps (“Process Status”) for viewing information about processes on a system. The ps command is used to list the currently running processes and their PIDs, along with some other information depending on what flags the ps is run with.

You will need to use ps in Exercise 1, so make sure you understand the above explanation. You can read the manual page for ps by running the command man ps if you would like more information.

By default, simply typing ps shows information about the processes running in the current shell.

The lecture slides from the third week cover all these system-level functions in detail. Please refer to the slides in the case you need to brush up your understanding.

Here, we discuss execve() again for your convenience. The execve function loads and runs a new program in the “context” of the current process.

#include <unistd.h>

int execve(const char *filename, const char *argv[], const char *envp[]);

This function loads and runs the executable object file filename with the argument list argv and the environment variable list envp. Execve returns to the calling program only if there is an error (e.g., filename does not exist). To see how the argument list and the list of environment variables look like before and after the call to execve, check the relevant section of the Lecture Slides (slides 49–52).

Note that execve belongs to the exec family of system calls. The v in execve comes from the fact that it takes an argument argv to be the vector of arguments to the program (the same way the main function of a C program will). The e comes from its second set of arguments, envp, which is a vector of the environment variables (key/value pairs like LANG=english).

Signal Handling#

A signal is a small message that notifies a process that an event of some type has occurred in the system. Each signal has a name and a number. For example, on Linux SIGINT, SIGKILL, and SIGSEGV are well-known signals with numbers 2, 9, and 11 respectively. Signals provide the kernel a mechanism to expose low-level hardware exceptions (e.g., divide by zero) to the user processes. Imagine a process attempts to divide by zero, then the kernel sends a SIGFPE signal (number 8) to the process.

You have most likely encountered the dreaded “Segmentation Fault” by now - when a process makes an illegal memory access, the kernel sends it a SIGSEGV.

If you type Ctrl+C in your shell, the kernel sends a SIGINT (for “interrupt”) to the foreground process. A program that receives a SIGINT signal will usually terminate, though as you will see later in this lab it is possible for a process to ignore this signal.

The SIGKILL signal is used to forcibly terminate a process. It cannot be blocked or ignored.

When a child process terminates or stops, the kernel sends a SIGCHLD signal to the parent process.

The kernel delivers a signal to a process for different reasons, including when:

  • The kernel has detected an event such as divide-by-zero error or a child process has terminated.
  • A process has invoked kill to explicitly request the kernel to send a signal to the destination process.

There is a default action associated with each type of signal. A process can modify the default action associated with a signal by installing a signal handler. A process can install a signal handler by using the signal function. This function takes two arguments:

  1. the number of the signal
  2. a signal handler associated with the signal.

The prototype of the signal function is as follows:

#include <signal.h>

typedef void (*sighandler_t)(int);

sighandler_t signal(int signum, sighandler_t handler);

The function returns a pointer to the previous handler if everything goes well, and SIGERR on error.

For reasons of portability, you should always use the names of signals in your programs and not the numbers (make sure to include the signal.h header file). This is because the actual integer associated with a signal may differ between systems.

You can install a signal handler by calling the signal function and providing the signal number and a pointer to a function that returns void and take an integer argument. When a process catches a signal of type k, the handler installed for signal k is invoked with a single integer argument set to k. This argument allows the same handler function to catch different types of signals.

Using kill#

Processes send signals to other processes (including themselves) by calling the kill function. kill is also available as a shell command.

#include <sys/types.h>
#include <signal.h>

int kill(pid_t pid, int sig);

If pid is greater than zero, then the kill function sends the signal number sig to process pid.

Printing from Signal Handlers#

It is unsafe to call printf, sprintf, malloc, and many other system-level functions inside a signal handler. In technical terms this is because these functions are not reentrant. A reentrant function is one that is able to be interrupted during its execution and safely re-called before the previous invocation completes its execution. This typically means reentrant functions make use of local variables only. Non-reentrant functions maintain state across function calls (e.g., by declaring variables with the static storage duration).

Exercise 1: Understanding fork()#

The file forks.c contains various functions that make use of fork(), wait() and waitpid() in different ways. You can run the program by providing a number as an input argument. To run the first function called fork1(), run ./forks 1.

Read the code for each function and then, without running, determine the function’s behaviour, including what will be printed on the screen. You will need to predict if the function will create any zombies. Verify your prediction using a combination of output on the screen and the ps utility. Terminate any zombie processes with the kill command.

You may find it useful to draw a process graph to understand the behaviour of the different functions. Process graphs were covered in the lectures — see slides 37–42 of the week 3 lectures for a reminder.

Exercise 2: Reaping Background Jobs in shellx#

Programs such as shells in Unix-like operating systems and web servers make heavy use of the fork and execve system calls. This exercise provides you with a basic template for building a shell in shellx.c.

Real shells such as Bash have many features beyond what we have discussed, such as output redirection. This program shellx is a very miniature example of a shell. But it demonstrates how you can build a shell as a user level program using system calls!

This shell program will properly reap foreground jobs, but it will not properly reap background jobs. This will leave “zombie” processes in the systems. Your task is to understand what the program is doing and then fix it so it correctly reaps the background jobs by using a signal handler.

Open the file shellx.c and try to understand the code, including key functions. You should have an intuition what the program is doing: reading and parsing user commands, and interpreting them one-by-one, and reaping foreground jobs properly.

Don’t worry too much as this stage if the details of the parseline function does not make sense to you. All you need to know is that it “sets” up the argument list required to execute a function.

In shellx.c, install a signal handler for the SIGCHLD signal. Every time the kernel sends the parent process the SIGCHLD signal, the parent should reap the defunct child process. Your signal handler should not exit the program, and return control to the main program.

Exercise 3: Reaping Multiple Child Processes#

In this exercise, we provide a program (child_handler.c) that attempts to use signals to allow a parent process to count events. Each event corresponds to a terminated child.

The idea is that the parent process tracks the number of terminated children in a global variable. It creates N child processes and sets ccount to N. Then, the signal handler decrements ccount each time it is called. If everything is correct, the parent process will eventually terminate. The parent will run forever if something is wrong with the program. Can you see what the program is doing? Can you verify that the parent indeed runs forever?

We provide you with the following property regarding signals that is key to finding and fixing the bug: Pending signals of a specific type are not queued.

The kernel maintains a pending bit vector containing exactly one bit for every signal type. Therefore, there can be at most one pending signal of a particular type. The existence of a pending signal merely indicates that at least one signal has arrived. If the signal handler is executing and two more signals are delivered to the destination process, then the kernel only keeps track of the fact that there is at least one pending signal of type k. It does not keep track of the number of signals of type k delivered while the signal handler runs.

Open the file child_handler.c. Identify and try to fix the bug in the child_handler function.

Volatile#

In child_handler.c file you may have noticed the line

volatile int ccount = 0;

The volatile keyword in C is a a qualifier that is applied to a variable when it is declared. Its main purpose is to inform the compiler that the value of the variable may change at any time, even in the absence of any updates to a variable in the the code the compiler finds nearby.

When compilers generate code for the rest of the program they sometimes cannot predict that the value of a variable can be changed by a signal handler. These handlers are not tightly coupled with the rest of the program like normal routines/functions.

Consider the following code:

int _exit = 0;

main() {
  while (!_exit) {
    /* tight spin loop which is completely visible to the compiler */
  }
  return;
}

The compiler can see that the loop body and the rest of the main() function does not touch (or write/update) _exit . Therefore, it is allowed to convert the loop to a while (1) loop. Even if the _exit variable is set to 1 by a signal handler for SIGINT, the compiler has no way to know that.

Similarly, the compiler can notice that a variable never changes its value, and stores (caches) its value in a fast register (treating it as a constant) instead of reading it from memory. Then, if the signal handler updates the value in memory, the main function will never see the change.

However, if the _exit variable is declared volatile, the compiler will generate assembly code that will load the most up-to-date value of the variable _exit from memory every time. Thus, the volatile qualifier tells the compiler that the specific variable it qualifies can be modified elsewhere.

Note that the combination of your specific compiler and operating system may lead to different behaviour. Write a couple signal handlers as part of the lab exercises and let us know if you observe anything strange!

Exercise 4: Snooze#

The sleep function suspends a process for a specified period of time.

#include <unistd.h>

unsigned int sleep(unsigned int secs);

It returns zero if the requested amount of time has elapsed before the program resumed, and the number of seconds still left to sleep otherwise. The latter case is possible if the sleep function returns prematurely because it was interrupted by a signal.

Write a wrapper function for sleep called wakeup, with the following interface:

unsigned int wakeup(unsigned int secs);

The wakeup function behaves exactly as the sleep function, except that it prints a message describing when the process actually woke up:

Woke up at 4 seconds.

Now, write a program called snooze that takes a single command-line argument, calls the wakeup function with this argument, and then terminates. Write your program so that the user can interrupt the snooze function by entering Ctrl+C. For example:

shell> ./snooze 10
CTRL+C
Slept for 3 of 5 seconds.
shell>

If the program does not get interrupted and sleep returns normally, it should print the above message instead.

Exercise 5: Inter-Process Communication#

A process can kill another process using the kill() function (as explained in the background section). In this exercise, we ask you to kill the child processes running in a forever loop.

Open the file kill_children.c. You can see that the parent process creates children and then reaps them. Unfortunately, the children will keep executing forever. Your job is to use kill() to send a SIGINT signal to the child processes. Recall that by default on receipt of the SIGINT function the recipient process terminates.

Push your work in kill_children.c. The CI will test your kill_children program by running it and checking it terminates before 10 seconds has elapsed.

Exercise 6: Understanding Concurrency Bugs#

It is challenging to manage concurrent flows that read and write the same program data. We need a way to synchronize the concurrent flows. Otherwise, we end up with interleaving of instructions (or program statements) that lead to the wrong outcome. In this exercise, we explore a subtle concurrency bug in a cooked-up shell program.

We will study concurrency in more detail in the second half of the semester, but it is useful to start developing an intuition for the causes of and solutions to concurrency bugs now. First: some background on blocking and unblocking signals.

By default, the kernel blocks any pending signals of the type currently being handled by a handler. More importantly, applications can explicitly block and unblock selected signals using the sigprocmask function and its helpers. The sigprocmask function changes the set of currently blocked signals by manipulating the blocked bit vector. Read the code below carefully to see how sigprocmask and its helper functions are used.

/* Create masks */
sigset_t mask, prev_mask;
/* Initialize the mask to empty set. None of the signals are in the set yet */
sigemptyset(&mask);
/* Add the SIG_INT signal to the set */
sigaddset(&mask, SIG_INT);
/* Block SIGINT and save previous blocked set */
sigprocmask(SIG_BLOCK, &mask, &prev_mask);
/* Removes SIGINT from blocked */
sigprocmask(SIG_UNBLOCK, &mask, &prev_mask);
/* Restores previous blocked set */
sigprocmask(SIG_SETMASK, &prev_mask, NULL);

Finally, the sigfillset function adds every signal to the mask.

Open the file procmask1.c. You will note that in main(), the parent process adds a child process to a list. After receiving a signal, the handler removes the child process from the list. In essence, the program keeps track of which processes are currently running. Note that we have intentionally kept the functions empty to focus on the interleaving of events that leads to a concurrency bug. Your job is to identify the sequence of events that may lead to an erroneous outcome.

An example of an erroneous outcome is when a child is removed from the list before it is added to the list by the parent process. Once you have identified the problem, change the code to fix the concurrency bug.

Extension Task: sigaction#

Throughout this lab you have used the signal function to install signal handlers. If you read the man page for signal (online mirror is available here as you may see something different depending on your specific system) you may have noticed that it says:

WARNING: the behavior of signal() varies across UNIX versions, and has also also varied historically across different versions of Linux. Avoid its use: use sigaction(2) instead.

This lab has stuck to using signal as it has a simpler interface. If you feel confident, you should read into how sigaction works and attempt to re-write your code to use the recommended sigaction instead.

Re-write any signal handling code to use sigaction to install signal handlers, rather than signal.

Extension Task: Extending shellx.c#

Some of these extension ideas are quite complex and go beyond the scope of this course. Don’t stress if you get stuck! These are just here if you want an “extreme” challenge.

As an extension exercise, think about some of the additional features your shell has that shellx.c lacks. For example, you have seen the following features already:

  1. You have previously run commands of the form make prog_name && ./prog_name. The && tells your shell to run the command to the left of the && first, and then run the command on the right after it has finished.
  2. The cd command allows your shell to change what the “active” directory is. This can’t be implemented as an external program like ls is2 – you can add support for changing directories as a “built in command”. You can use the system call chdir to do so.
  3. Running a command of the form ./prog_name > outfile will redirect the contents of prog_name’s STDOUT to outfile. Modify shellx to be able to parse and run redirected commands correctly.

Extend shellx to include some of these “bonus” features.

  1. You can examine your $PATH environment variable by running echo $PATH in your shell. 

  2. Think about why this would be the case based on your understanding of fork and exec

bars search times