Outline and Preparation#
In this week’s lab, you will:
- Learn to fork and properly reap child processes
- Learn to execute new programs in the context of an existing process
- Learn how shells and other similar utilities are implemented
- Learn how signal handlers are written
- Practice writing and debugging (concurrent) systems programs
This lab uses a new repository, make sure you fork and clone it!
-
Log in with your uni ID/password and open the lab template on GitLab here.
-
Fork the repository by clicking on the fork button. This will create your own copy of the lab repository.
-
Check the URL bar and make sure it contains your uid (eg.
gitlab.cecs.anu.edu.au/*your-uid-here*/comp2310-2025-lab-pack-2). This means you are looking at your copy of the lab, and not the template. Once you’ve done this, click Clone and copy the address next to “Clone with HTTPS”. -
Open a terminal and clone with the Git clone command:
git clone address_from_step_3_here
Type in your uid and password when requested.
- You should now have a local copy of the repo on your computer. Try moving to the
correct directory with the
cdandlscommands from earlier.
Introduction#
In the previous tutorials you’ve practised C code, as well as understood a bit on
how that C code is compiled into the assembly language you learnt all about in
COMP2300. Now we will teach you to write systems software, the cornerstone of
systems programming. Systems programming involves writing low-level code that
interfaces directly with the operating systems kernel and system libraries. In
the coming tutorials, you will write and implement code for controlling processes
and handling signals, and even write a memory allocator for manual memory
management (which you have so far done via the C standard library’s malloc()/free()
functions.) Knowing the art of systems programming will make you a powerful
programmer. With the skills gained, you can start hacking virtual machines,
interpreters, editors, shell environments, web browser internals, etc. This
specific tutorial deals with managing processes and handling signals.
Reading man pages#
There are a lot of details about system-level functions that we cannot cover in
lectures or tutorial handouts. However, fortunately Unix-like operating systems
have manual pages that allow easily looking up these functions, which you can access
with the command man.
Read the manual pages for fork and execve. Don’t worry if there are parts
that you don’t understand yet — you’ll get more hands-on experience with these
system calls during this lab.
In future labs we will often provide less about system calls and functions
and instead direct you to the man pages. It is a good idea to get used to
using man now. Some useful tips:
-
manitself has a manual page! If you runman manyou will see important informationmanitself and the arguments you can pass it. -
Man pages are split into different sections. For example, section 2 is the section that contains man pages for system calls (such as
fork) and section 3 is for library functions (such asprintf). When you typeman some_nameto find a specific manual page,manwill automatically search for a matching man page in these different sections.You may find that the man page you get is not the one you wanted, if it belongs to multiple sections. You can tell man to search in a specific section by running
man sectnumber name.For example, if you run
man printfyou will most likely get the man page for the shell commandprintf, rather than the C standard library function. If you runman 3 printf, you will instead be given the manual page for the C function.There are also man pages for terminal commands like
lsandgrep, which are in section 1. -
You can see which section the manual page you are current viewing belongs by looking in the top left corner. The section is in brackets after the manual page name (e.g.
printf(3)). -
Linux Man Pages Online contains mirrors of all manual pages present on a linux system. You can use this if you find using the CLI interface cumbersome.
Shell and Terminal#
You have been using a shell and terminal to run commands throughout this course so far, but what you may not have realised is that the shell program is not part of the OS. It is a user-level application which interfaces with the OS, and in fact you can write a shell program yourself! If you’ve been using Linux or WSL, the specific shell you’ve been using is likely to be BASH (the Bourne-Again SHell), and you can find the binary for this program at /bin/bash. We proceed to define what a “shell” actually is, in abstract terms.
A shell is an interactive user application program that runs other programs on behalf of the user. Note that a terminal is an interface that provides a display for output and a keyboard for input to a shell session. The shell is the interpreter that executes commands typed as strings. Typically, shells can run a job in the foreground or background. While a foreground job runs, the shell waits for the job to finish before interpreting the following command the user types. The shell user can submit a background job by appending an & after the command. Below is a typical session with the shell environment:
> ls
forks
> ./forks 1
Parent has x = 0
Bye from process 2647667 with x = 0
Child has x = 2
Bye from process 2647668 with x = 2
>./forks 1 &
[1] 2647669
>
Parent has x = 0
Bye from process 2647669 with x = 0
Child has x = 2
Bye from process 2647670 with x = 2
By convention we use > at the beginning of lines in the above snippet to
indicate a command typed into the shell. Other lines are the output of those
commands.
Depending on the particular shell you are using it may be a different character,
such as % or $.
You have hopefully had some practice working in a shell environment, so this should seem familiar to you. A few aspects of user interaction to note in the above session:
-
Some commands are built-in shell commands. In the above session, the examples are the
echoandcdcommands.The shell can also run any program for the user. For example,
lsandforksare external programs. To run them, we need to inform the shell where the specific program resides. There are two ways of doing this:-
Using the
PATHenvironment variable. This is a special variable that your shell uses to search for executable programs, consisting of different directories it should use to search for executable programs. The programlswill be somewhere in yourPATH1. -
Directly telling the shell where the program is. The
forksprogram is not present in the shell’sPATH, so we invoke it as./forks. The./prefix tells the shell to look in the current directory for an executable namedforks.
-
-
When we type
./forks 1, the shell waits forforksto finish executing and only returns to read/interpret the next command afterforksterminates. -
When we run
forksin the background by using an&, the shell returns immediately, printing some information about the child process it creates. You can verify that&indeed sends the job in the background by writing a program with awhile (1) {;}loop, submitting it in the background, and then usingpsto verify the program is still running. Try running the same program in the foreground without the&.
Make sure you understand the difference between foreground and background jobs as explained above. This will be important for completing Exercise 2.
ps and execve#
Linux and most Unix-like operating systems provide a utility called ps (“Process Status”)
for viewing information about processes on a system. The ps command is used to
list the currently running processes and their PIDs, along with some other
information depending on what flags the ps is run with.
You will need to use ps in Exercise 1, so make sure you
understand the above explanation. You can read the manual page for ps by
running the command man ps if you would like more information.
By default, simply typing ps shows information about the processes running
in the current shell.
The lecture slides from the third week cover all these system-level functions in detail. Please refer to the slides in the case you need to brush up your understanding.
Here, we discuss execve() again for your convenience. The execve function
loads and runs a new program in the “context” of the current process.
#include <unistd.h>
int execve(const char *filename, const char *argv[], const char *envp[]);
This function loads and runs the executable object file filename with the
argument list argv and the environment variable list envp. Execve returns
to the calling program only if there is an error (e.g., filename does not
exist). To see how the argument list and the list of environment variables look
like before and after the call to execve, check the relevant section of the
Lecture Slides (slides 49–52).
Note that execve belongs to the exec family of system calls. The v in
execve comes from the fact that it takes an argument argv to be the vector of
arguments to the program (the same way the main function of a C program will).
The e comes from its second set of arguments, envp, which is a vector of the
environment variables (key/value pairs like LANG=english).
Signal Handling#
A signal is a small message that notifies a process that an event of some type
has occurred in the system. Each signal has a name and a number. For example,
on Linux SIGINT, SIGKILL, and SIGSEGV are well-known signals with numbers
2, 9, and 11 respectively. Signals provide the kernel a mechanism to expose
low-level hardware exceptions (e.g., divide by zero) to the user processes.
Imagine a process attempts to divide by zero, then the kernel sends a
SIGFPE signal (number 8) to the process.
You have most likely encountered the dreaded “Segmentation Fault” by now - when
a process makes an illegal memory access, the kernel sends it a SIGSEGV.
If you type Ctrl+C in your shell, the kernel sends a
SIGINT (for “interrupt”) to the foreground process. A program that receives
a SIGINT signal will usually terminate, though as you will see later in this
lab it is possible for a process to ignore this signal.
The SIGKILL signal is used to forcibly terminate a process. It cannot be
blocked or ignored.
When a child process terminates or stops, the kernel sends a SIGCHLD signal
to the parent process.
The kernel delivers a signal to a process for different reasons, including when:
- The kernel has detected an event such as divide-by-zero error or a child process has terminated.
- A process has invoked
killto explicitly request the kernel to send a signal to the destination process.
There is a default action associated with each type of signal. A process can
modify the default action associated with a signal by installing a signal handler.
A process can install a signal handler by using the signal function. This function
takes two arguments:
- the number of the signal
- a signal handler associated with the signal.
The prototype of the signal function is as follows:
#include <signal.h>
typedef void (*sighandler_t)(int);
sighandler_t signal(int signum, sighandler_t handler);
The function returns a pointer to the previous handler if everything goes well,
and SIGERR on error.
For reasons of portability, you should always use the names of signals in your
programs and not the numbers (make sure to include the signal.h header file).
This is because the actual integer associated with a signal may differ between
systems.
You can install a signal handler by calling the signal function and providing the signal number and a pointer to a function that returns void and take an integer argument. When a process catches a signal of type k, the handler installed for signal k is invoked with a single integer argument set to k. This argument allows the same handler function to catch different types of signals.
Using kill#
Processes send signals to other processes (including themselves) by calling the
kill function. kill is also available as a shell command.
#include <sys/types.h>
#include <signal.h>
int kill(pid_t pid, int sig);
If pid is greater than zero, then the kill function sends the signal number
sig to process pid.
Printing from Signal Handlers#
It is unsafe to call printf, sprintf, malloc, and many other system-level
functions inside a signal handler. In technical terms this is because these
functions are not reentrant. A reentrant function is one that is able to be
interrupted during its execution and safely re-called before the previous
invocation completes its execution. This typically means reentrant functions
make use of local variables only. Non-reentrant functions maintain state across
function calls (e.g., by declaring variables with the static storage duration).
Exercise 1: Understanding fork()#
The file forks.c contains various functions that make use of fork(), wait()
and waitpid() in different ways. You can run the program by providing a number
as an input argument. To run the first function called fork1(), run ./forks 1.
Read the code for each function and then, without running, determine the
function’s behaviour, including what will be printed on the screen. You will
need to predict if the function will create any zombies. Verify your prediction
using a combination of output on the screen and the ps utility. Terminate any
zombie processes with the kill command.
You may find it useful to draw a process graph to understand the behaviour of the different functions. Process graphs were covered in the lectures — see slides 37–42 of the week 3 lectures for a reminder.
Exercise 2: Reaping Background Jobs in shellx#
Programs such as shells in Unix-like operating systems and web servers make
heavy use of the fork and execve system calls. This exercise provides you
with a basic template for building a shell in shellx.c.
Real shells such as Bash have many features beyond what we have discussed, such as output
redirection. This program shellx is a very miniature example of a shell. But it demonstrates
how you can build a shell as a user level program using system calls!
This shell program will properly reap foreground jobs, but it will not properly reap background jobs. This will leave “zombie” processes in the systems. Your task is to understand what the program is doing and then fix it so it correctly reaps the background jobs by using a signal handler.
Open the file shellx.c and try to understand the code, including key functions.
You should have an intuition what the program is doing: reading and parsing
user commands, and interpreting them one-by-one, and reaping foreground jobs
properly.
Don’t worry too much as this stage if the details of the parseline function
does not make sense to you. All you need to know is that it “sets” up the
argument list required to execute a function.
In shellx.c, install a signal handler for the SIGCHLD signal. Every time the
kernel sends the parent process the SIGCHLD signal, the parent should reap the
defunct child process. Your signal handler should not exit the program, and
return control to the main program.
Exercise 3: Reaping Multiple Child Processes#
In this exercise, we provide a program (child_handler.c) that attempts to use
signals to allow a parent process to count events. Each event corresponds to a
terminated child.
The idea is that the parent process tracks the number of terminated children
in a global variable. It creates N child processes and sets ccount to N.
Then, the signal handler decrements ccount each time it is called.
If everything is correct, the parent process will eventually terminate. The
parent will run forever if something is wrong with the program. Can you see what
the program is doing? Can you verify that the parent indeed runs forever?
We provide you with the following property regarding signals that is key to finding and fixing the bug: Pending signals of a specific type are not queued.
The kernel maintains a pending bit vector containing exactly one bit for
every signal type. Therefore, there can be at most one pending signal of a
particular type. The existence of a pending signal merely indicates that at
least one signal has arrived. If the signal handler is executing and two
more signals are delivered to the destination process, then the kernel only
keeps track of the fact that there is at least one pending signal of type k.
It does not keep track of the number of signals of type k delivered
while the signal handler runs.
Open the file child_handler.c. Identify and try to fix the bug in the
child_handler function.
Volatile#
In child_handler.c file you may have noticed the line
volatile int ccount = 0;
The volatile keyword in C is a a qualifier that is applied to a variable when it is declared. Its main purpose is to inform the compiler that the value of the variable may change at any time, even in the absence of any updates to a variable in the the code the compiler finds nearby.
When compilers generate code for the rest of the program they sometimes cannot predict that the value of a variable can be changed by a signal handler. These handlers are not tightly coupled with the rest of the program like normal routines/functions.
Consider the following code:
int _exit = 0;
main() {
while (!_exit) {
/* tight spin loop which is completely visible to the compiler */
}
return;
}
The compiler can see that the loop body and the rest of the main() function
does not touch (or write/update) _exit . Therefore, it is allowed to convert
the loop to a while (1) loop. Even if the _exit variable is set to 1 by
a signal handler for SIGINT, the compiler has no way to know that.
Similarly, the compiler can notice that a variable never changes its value, and
stores (caches) its value in a fast register (treating it as a constant) instead
of reading it from memory. Then, if the signal handler updates the value in
memory, the main function will never see the change.
However, if the _exit variable is declared volatile, the compiler will
generate assembly code that will load the most up-to-date value of the variable
_exit from memory every time. Thus, the volatile qualifier tells the compiler
that the specific variable it qualifies can be modified elsewhere.
Note that the combination of your specific compiler and operating system may lead to different behaviour. Write a couple signal handlers as part of the lab exercises and let us know if you observe anything strange!
Exercise 4: Snooze#
The sleep function suspends a process for a specified period of time.
#include <unistd.h>
unsigned int sleep(unsigned int secs);
It returns zero if the requested amount of time has elapsed before the program
resumed, and the number of seconds still left to sleep otherwise. The latter
case is possible if the sleep function returns prematurely because it was
interrupted by a signal.
Write a wrapper function for sleep called wakeup, with the following interface:
unsigned int wakeup(unsigned int secs);
The wakeup function behaves exactly as the sleep function, except that it
prints a message describing when the process actually woke up:
Woke up at 4 seconds.
Now, write a program called snooze that takes a single command-line argument,
calls the wakeup function with this argument, and then terminates. Write your
program so that the user can interrupt the snooze function by entering
Ctrl+C. For example:
shell> ./snooze 10
CTRL+C
Slept for 3 of 5 seconds.
shell>
If the program does not get interrupted and sleep returns normally, it should print the above message instead.
Exercise 5: Inter-Process Communication#
A process can kill another process using the kill() function (as explained in
the background section). In this exercise, we ask you to kill
the child processes running in a forever loop.
Open the file kill_children.c. You can see that the parent process creates
children and then reaps them. Unfortunately, the children will keep executing
forever. Your job is to use kill() to send a SIGINT signal to the child
processes. Recall that by default on receipt of the SIGINT function the
recipient process terminates.
Push your work in kill_children.c. The CI will test your kill_children
program by running it and checking it terminates before 10 seconds has elapsed.
Exercise 6: Understanding Concurrency Bugs#
It is challenging to manage concurrent flows that read and write the same program data. We need a way to synchronize the concurrent flows. Otherwise, we end up with interleaving of instructions (or program statements) that lead to the wrong outcome. In this exercise, we explore a subtle concurrency bug in a cooked-up shell program.
We will study concurrency in more detail in the second half of the semester, but it is useful to start developing an intuition for the causes of and solutions to concurrency bugs now. First: some background on blocking and unblocking signals.
By default, the kernel blocks any pending signals of the type currently being
handled by a handler. More importantly, applications can explicitly block and
unblock selected signals using the sigprocmask function and its helpers. The
sigprocmask function changes the set of currently blocked signals by
manipulating the blocked bit vector. Read the code below carefully to see how
sigprocmask and its helper functions are used.
/* Create masks */
sigset_t mask, prev_mask;
/* Initialize the mask to empty set. None of the signals are in the set yet */
sigemptyset(&mask);
/* Add the SIG_INT signal to the set */
sigaddset(&mask, SIG_INT);
/* Block SIGINT and save previous blocked set */
sigprocmask(SIG_BLOCK, &mask, &prev_mask);
/* Removes SIGINT from blocked */
sigprocmask(SIG_UNBLOCK, &mask, &prev_mask);
/* Restores previous blocked set */
sigprocmask(SIG_SETMASK, &prev_mask, NULL);
Finally, the sigfillset function adds every signal to the mask.
Open the file procmask1.c. You will note that in main(), the parent process
adds a child process to a list. After receiving a signal, the handler removes
the child process from the list. In essence, the program keeps track of which
processes are currently running. Note that we have intentionally kept the
functions empty to focus on the interleaving of events that leads to a
concurrency bug. Your job is to identify the sequence of events that may lead to
an erroneous outcome.
An example of an erroneous outcome is when a child is removed from the list before it is added to the list by the parent process. Once you have identified the problem, change the code to fix the concurrency bug.
Extension Task: sigaction#
Throughout this lab you have used the signal function to install signal
handlers. If you read the man page for signal (online mirror is
available here as you may
see something different depending on your specific system) you may have noticed
that it says:
WARNING: the behavior of
signal()varies across UNIX versions, and has also also varied historically across different versions of Linux. Avoid its use: usesigaction(2)instead.
This lab has stuck to using signal as it has a simpler interface. If you feel
confident, you should read into how sigaction works and attempt to re-write
your code to use the recommended sigaction instead.
Re-write any signal handling code to use sigaction to install signal handlers,
rather than signal.
Extension Task: Extending shellx.c#
Some of these extension ideas are quite complex and go beyond the scope of this course. Don’t stress if you get stuck! These are just here if you want an “extreme” challenge.
As an extension exercise, think about some of the additional features your
shell has that shellx.c lacks. For example, you have seen the following features
already:
- You have previously run commands of the form
make prog_name && ./prog_name. The&&tells your shell to run the command to the left of the&&first, and then run the command on the right after it has finished. - The
cdcommand allows your shell to change what the “active” directory is. This can’t be implemented as an external program likelsis2 – you can add support for changing directories as a “built in command”. You can use the system callchdirto do so. - Running a command of the form
./prog_name > outfilewill redirect the contents ofprog_name’sSTDOUTtooutfile. Modifyshellxto be able to parse and run redirected commands correctly.
Extend shellx to include some of these “bonus” features.