Outline and Preparation#
In this week’s lab, you will:
- Learn to fork and properly reap child processes
- Learn to execute new programs in the context of an existing process
- Learn how shells and other similar utilities are implemented
- Learn how signal handlers are written
- Practice writing and debugging (concurrent) systems programs
This lab uses a new repository, make sure you fork and clone it!
-
Log in with your uni ID/password and open the lab template on GitLab here.
-
Fork the repository by clicking on the fork button. This will create your own copy of the lab repository.
-
Check the URL bar and make sure it contains your uid (eg.
gitlab.cecs.anu.edu.au/*your-uid-here*/comp2310-2024-lab-pack-2
). This means you are looking at your copy of the lab, and not the template. Once you’ve done this, click Clone and copy the address next to “Clone with HTTPS”. -
Open a terminal and clone with the Git clone command:
git clone address_from_step_3_here
Type in your uid and password when requested.
- You should now have a local copy of the repo on your computer. Try moving to the
correct directory with the
cd
andls
commands from earlier.
Introduction#
In the previous tutorials you’ve practised C code, as well as understood a bit on
how that C code is compiled into the assembly language you learnt all about in
COMP2300. Now we will teach you to write systems software, the cornerstone of
systems programming. Systems programming involves writing low-level code that
interfaces directly with the operating systems kernel and system libraries. In
the coming tutorials, you will write and implement code for controlling processes
and handling signals, and even write a memory allocator for manual memory
management (which you have so far done via the C standard library’s malloc()
/free()
functions.) Knowing the art of systems programming will make you a powerful
programmer. With the skills gained, you can start hacking virtual machines,
interpreters, editors, shell environments, web browser internals, etc. This
specific tutorial deals with managing processes and handling signals.
Reading man
pages#
There are a lot of details about system-level functions that we cannot cover in
lectures or tutorial handouts. However, fortunately Unix-like operating systems
have manual pages that allow easily looking up these functions, which you can access
with the command man
.
Read the manual pages for fork
and execve
. Don’t worry if there are parts
that you don’t understand yet — you’ll get more hands-on experience with these
system calls during this lab.
In future labs we will often provide less about system calls and functions
and instead direct you to the man pages. It is a good idea to get used to
using man
now. Some useful tips:
-
man
itself has a manual page! If you runman man
you will see important informationman
itself and the arguments you can pass it. -
Man pages are split into different sections. For example, section 2 is the section that contains man pages for system calls (such as
fork
) and section 3 is for library functions (such asprintf
). When you typeman some_name
to find a specific manual page,man
will automatically search for a matching man page in these different sections.You may find that the man page you get is not the one you wanted, if it belongs to multiple sections. You can tell man to search in a specific section by running
man sectnumber name
.For example, if you run
man printf
you will most likely get the man page for the shell commandprintf
, rather than the C standard library function. If you runman 3 printf
, you will instead be given the manual page for the C function.There are also man pages for terminal commands like
ls
andgrep
, which are in section 1. -
You can see which section the manual page you are current viewing belongs by looking in the top left corner. The section is in brackets after the manual page name (e.g.
printf(3)
). -
Linux Man Pages Online contains mirrors of all manual pages present on a linux system. You can use this if you find using the CLI interface cumbersome.
Shell and Terminal#
You have been using a shell and terminal to run commands throughout this course so far, but what you may not have realised is that the shell program is not part of the OS. It is a user-level application which interfaces with the OS, and in fact you can write a shell program yourself! If you’ve been using Linux or WSL, the specific shell you’ve been using is likely to be BASH (the Bourne-Again SHell), and you can find the binary for this program at /bin/bash
. We proceed to define what a “shell” actually is, in abstract terms.
A shell is an interactive user application program that runs other programs on behalf of the user. Note that a terminal is an interface that provides a display for output and a keyboard for input to a shell session. The shell is the interpreter that executes commands typed as strings. Typically, shells can run a job in the foreground or background. While a foreground job runs, the shell waits for the job to finish before interpreting the following command the user types. The shell user can submit a background job by appending an &
after the command. Below is a typical session with the shell environment:
> ls
forks
> ./forks 1
Parent has x = 0
Bye from process 2647667 with x = 0
Child has x = 2
Bye from process 2647668 with x = 2
>./forks 1 &
[1] 2647669
>
Parent has x = 0
Bye from process 2647669 with x = 0
Child has x = 2
Bye from process 2647670 with x = 2
By convention we use >
at the beginning of lines in the above snippet to
indicate a command typed into the shell. Other lines are the output of those
commands.
Depending on the particular shell you are using it may be a different character,
such as %
or $
.
You have hopefully had some practice working in a shell environment, so this should seem familiar to you. A few aspects of user interaction to note in the above session:
-
Some commands are built-in shell commands. In the above session, the examples are the
echo
andcd
commands.The shell can also run any program for the user. For example,
ls
andforks
are external programs. To run them, we need to inform the shell where the specific program resides. There are two ways of doing this:-
Using the
PATH
environment variable. This is a special variable that your shell uses to search for executable programs, consisting of different directories it should use to search for executable programs. The programls
will be somewhere in yourPATH
1. -
Directly telling the shell where the program is. The
forks
program is not present in the shell’sPATH
, so we invoke it as./forks
. The./
prefix tells the shell to look in the current directory for an executable namedforks
.
-
-
When we type
./forks 1
, the shell waits forforks
to finish executing and only returns to read/interpret the next command afterforks
terminates. -
When we run
forks
in the background by using an&
, the shell returns immediately, printing some information about the child process it creates. You can verify that&
indeed sends the job in the background by writing a program with awhile (1) {;}
loop, submitting it in the background, and then usingps
to verify the program is still running. Try running the same program in the foreground without the&
.
Make sure you understand the difference between foreground and background jobs as explained above. This will be important for completing Exercise 2.
ps
and execve
#
Linux and most Unix-like operating systems provide a utility called ps
(“Process Status”)
for viewing information about processes on a system. The ps
command is used to
list the currently running processes and their PIDs
, along with some other
information depending on what flags the ps
is run with.
You will need to use ps
in Exercise 1, so make sure you
understand the above explanation. You can read the manual page for ps
by
running the command man ps
if you would like more information.
By default, simply typing ps
shows information about the processes running
in the current shell.
The lecture slides from the third week cover all these system-level functions in detail. Please refer to the slides in the case you need to brush up your understanding.
Here, we discuss execve()
again for your convenience. The execve
function
loads and runs a new program in the “context” of the current process.
#include <unistd.h>
int execve(const char *filename, const char *argv[], const char *envp[]);
This function loads and runs the executable object file filename
with the
argument list argv
and the environment variable list envp
. Execve returns
to the calling program only if there is an error (e.g., filename
does not
exist). To see how the argument list and the list of environment variables look
like before and after the call to execve
, check the relevant section of the
Lecture Slides (slides 49–52).
Note that execve
belongs to the exec
family of system calls. The v
in
execve
comes from the fact that it takes an argument argv
to be the vector of
arguments to the program (the same way the main
function of a C program will).
The e
comes from its second set of arguments, envp
, which is a vector of the
environment variables (key/value pairs like LANG=english
).
Signal Handling#
A signal is a small message that notifies a process that an event of some type
has occurred in the system. Each signal has a name and a number. For example,
on Linux SIGINT
, SIGKILL
, and SIGSEGV
are well-known signals with numbers
2, 9, and 11 respectively. Signals provide the kernel a mechanism to expose
low-level hardware exceptions (e.g., divide by zero) to the user processes.
Imagine a process attempts to divide by zero, then the kernel sends a
SIGFPE
signal (number 8) to the process.
You have most likely encountered the dreaded “Segmentation Fault” by now - when
a process makes an illegal memory access, the kernel sends it a SIGSEGV
.
If you type Ctrl+C in your shell, the kernel sends a
SIGINT
(for “interrupt”) to the foreground process. A program that receives
a SIGINT
signal will usually terminate, though as you will see later in this
lab it is possible for a process to ignore this signal.
The SIGKILL
signal is used to forcibly terminate a process. It cannot be
blocked or ignored.
When a child process terminates or stops, the kernel sends a SIGCHLD
signal
to the parent process.
The kernel delivers a signal to a process for different reasons, including when:
- The kernel has detected an event such as divide-by-zero error or a child process has terminated.
- A process has invoked
kill
to explicitly request the kernel to send a signal to the destination process.
There is a default action associated with each type of signal. A process can
modify the default action associated with a signal by installing a signal handler.
A process can install a signal handler by using the signal
function. This function
takes two arguments:
- the number of the signal
- a signal handler associated with the signal.
The prototype of the signal
function is as follows:
#include <signal.h>
typedef void (*sighandler_t)(int);
sighandler_t signal(int signum, sighandler_t handler);
The function returns a pointer to the previous handler if everything goes well,
and SIGERR
on error.
For reasons of portability, you should always use the names of signals in your
programs and not the numbers (make sure to include the signal.h
header file).
This is because the actual integer associated with a signal may differ between
systems.
You can install a signal handler by calling the signal
function and providing the signal number and a pointer to a function that returns void and take an integer argument. When a process catches a signal of type k, the handler installed for signal k is invoked with a single integer argument set to k. This argument allows the same handler function to catch different types of signals.
Using kill
#
Processes send signals to other processes (including themselves) by calling the
kill
function. kill
is also available as a shell command.
#include <sys/types.h>
#include <signal.h>
int kill(pid_t pid, int sig);
If pid
is greater than zero, then the kill
function sends the signal number
sig
to process pid
.
Printing from Signal Handlers#
It is unsafe to call printf
, sprintf
, malloc
, and many other system-level
functions inside a signal handler. In technical terms this is because these
functions are not reentrant. A reentrant function is one that is able to be
interrupted during its execution and safely re-called before the previous
invocation completes its execution. This typically means reentrant functions
make use of local variables only. Non-reentrant functions maintain state across
function calls (e.g., by declaring variables with the static storage duration).
Exercise 1: Understanding fork()
#
The file forks.c
contains various functions that make use of fork()
, wait()
and waitpid()
in different ways. You can run the program by providing a number
as an input argument. To run the first function called fork1()
, run ./forks 1
.
Read the code for each function and then, without running, determine the
function’s behaviour, including what will be printed on the screen. You will
need to predict if the function will create any zombies. Verify your prediction
using a combination of output on the screen and the ps
utility. Terminate any
zombie processes with the kill
command.
You may find it useful to draw a process graph to understand the behaviour of the different functions. Process graphs were covered in the lectures — see slides 37–42 of the week 3 lectures for a reminder.
Exercise 2: Reaping Background Jobs in shellx
#
Programs such as shells in Unix-like operating systems and web servers make
heavy use of the fork
and execve
system calls. This exercise provides you
with a basic template for building a shell in shellx.c
.
Real shells such as Bash have many features beyond what we have discussed, such as output
redirection. This program shellx
is a very miniature example of a shell. But it demonstrates
how you can build a shell as a user level program using system calls!
This shell program will properly reap foreground jobs, but it will not properly reap background jobs. This will leave “zombie” processes in the systems. Your task is to understand what the program is doing and then fix it so it correctly reaps the background jobs by using a signal handler.
Open the file shellx.c
and try to understand the code, including key functions.
You should have an intuition what the program is doing: reading and parsing
user commands, and interpreting them one-by-one, and reaping foreground jobs
properly.
Don’t worry too much as this stage if the details of the parseline
function
does not make sense to you. All you need to know is that it “sets” up the
argument list required to execute a function.
In shellx.c
, install a signal handler for the SIGCHLD
signal. Every time the
kernel sends the parent process the SIGCHLD
signal, the parent should reap the
defunct child process. Your signal handler should not exit the program, and
return control to the main program.
Exercise 3: Reaping Multiple Child Processes#
In this exercise, we provide a program (child_handler.c
) that attempts to use
signals to allow a parent process to count events. Each event corresponds to a
terminated child.
The idea is that the parent process tracks the number of terminated children
in a global variable. It creates N
child processes and sets ccount
to N
.
Then, the signal handler decrements ccount
each time it is called.
If everything is correct, the parent process will eventually terminate. The
parent will run forever if something is wrong with the program. Can you see what
the program is doing? Can you verify that the parent indeed runs forever?
We provide you with the following property regarding signals that is key to finding and fixing the bug: Pending signals of a specific type are not queued.
The kernel maintains a pending bit vector containing exactly one bit for
every signal type. Therefore, there can be at most one pending signal of a
particular type. The existence of a pending signal merely indicates that at
least one signal has arrived. If the signal handler is executing and two
more signals are delivered to the destination process, then the kernel only
keeps track of the fact that there is at least one pending signal of type k
.
It does not keep track of the number of signals of type k
delivered
while the signal handler runs.
Open the file child_handler.c
. Identify and try to fix the bug in the
child_handler
function.
Volatile#
In child_handler.c
file you may have noticed the line
volatile int ccount = 0;
The volatile keyword in C is a a qualifier that is applied to a variable when it is declared. Its main purpose is to inform the compiler that the value of the variable may change at any time, even in the absence of any updates to a variable in the the code the compiler finds nearby.
When compilers generate code for the rest of the program they sometimes cannot predict that the value of a variable can be changed by a signal handler. These handlers are not tightly coupled with the rest of the program like normal routines/functions.
Consider the following code:
int _exit = 0;
main() {
while (!_exit) {
/* tight spin loop which is completely visible to the compiler */
}
return;
}
The compiler can see that the loop body and the rest of the main()
function
does not touch (or write/update) _exit
. Therefore, it is allowed to convert
the loop to a while (1)
loop. Even if the _exit
variable is set to 1
by
a signal handler for SIGINT
, the compiler has no way to know that.
Similarly, the compiler can notice that a variable never changes its value, and
stores (caches) its value in a fast register (treating it as a constant) instead
of reading it from memory. Then, if the signal handler updates the value in
memory, the main
function will never see the change.
However, if the _exit
variable is declared volatile, the compiler will
generate assembly code that will load the most up-to-date value of the variable
_exit
from memory every time. Thus, the volatile
qualifier tells the compiler
that the specific variable it qualifies can be modified elsewhere.
Note that the combination of your specific compiler and operating system may lead to different behaviour. Write a couple signal handlers as part of the lab exercises and let us know if you observe anything strange!
Exercise 4: Snooze#
The sleep
function suspends a process for a specified period of time.
#include <unistd.h>
unsigned int sleep(unsigned int secs);
It returns zero if the requested amount of time has elapsed before the program
resumed, and the number of seconds still left to sleep otherwise. The latter
case is possible if the sleep
function returns prematurely because it was
interrupted by a signal.
Write a wrapper function for sleep
called wakeup
, with the following interface:
unsigned int wakeup(unsigned int secs);
The wakeup
function behaves exactly as the sleep
function, except that it
prints a message describing when the process actually woke up:
Woke up at 4 seconds.
Now, write a program called snooze
that takes a single command-line argument,
calls the wakeup function with this argument, and then terminates. Write your
program so that the user can interrupt the snooze function by entering
Ctrl+C. For example:
shell> ./snooze 10
CTRL+C
Slept for 3 of 5 seconds.
shell>
If the program does not get interrupted and sleep returns normally, it should print the above message instead.
Exercise 5: Inter-Process Communication#
A process can kill another process using the kill()
function (as explained in
the background section). In this exercise, we ask you to kill
the child processes running in a forever loop.
Open the file kill_children.c
. You can see that the parent process creates
children and then reaps them. Unfortunately, the children will keep executing
forever. Your job is to use kill()
to send a SIGINT
signal to the child
processes. Recall that by default on receipt of the SIGINT
function the
recipient process terminates.
Push your work in kill_children.c
. The CI will test your kill_children
program by running it and checking it terminates before 10 seconds has elapsed.
Exercise 6: Understanding Concurrency Bugs#
It is challenging to manage concurrent flows that read and write the same program data. We need a way to synchronize the concurrent flows. Otherwise, we end up with interleaving of instructions (or program statements) that lead to the wrong outcome. In this exercise, we explore a subtle concurrency bug in a cooked-up shell program.
We will study concurrency in more detail in the second half of the semester, but it is useful to start developing an intuition for the causes of and solutions to concurrency bugs now. First: some background on blocking and unblocking signals.
By default, the kernel blocks any pending signals of the type currently being
handled by a handler. More importantly, applications can explicitly block and
unblock selected signals using the sigprocmask
function and its helpers. The
sigprocmask
function changes the set of currently blocked signals by
manipulating the blocked bit vector. Read the code below carefully to see how
sigprocmask
and its helper functions are used.
/* Create masks */
sigset_t mask, prev_mask;
/* Initialize the mask to empty set. None of the signals are in the set yet */
sigemptyset(&mask);
/* Add the SIG_INT signal to the set */
sigaddset(&mask, SIG_INT);
/* Block SIGINT and save previous blocked set */
sigprocmask(SIG_BLOCK, &mask, &prev_mask);
/* Removes SIGINT from blocked */
sigprocmask(SIG_UNBLOCK, &mask, &prev_mask);
/* Restores previous blocked set */
sigprocmask(SIG_SETMASK, &prev_mask, NULL);
Finally, the sigfillset
function adds every signal to the mask.
Open the file procmask1.c
. You will note that in main()
, the parent process
adds a child process to a list. After receiving a signal, the handler removes
the child process from the list. In essence, the program keeps track of which
processes are currently running. Note that we have intentionally kept the
functions empty to focus on the interleaving of events that leads to a
concurrency bug. Your job is to identify the sequence of events that may lead to
an erroneous outcome.
An example of an erroneous outcome is when a child is removed from the list before it is added to the list by the parent process. Once you have identified the problem, change the code to fix the concurrency bug.
Extension Task: sigaction
#
Throughout this lab you have used the signal
function to install signal
handlers. If you read the man page for signal
(online mirror is
available here as you may
see something different depending on your specific system) you may have noticed
that it says:
WARNING: the behavior of
signal()
varies across UNIX versions, and has also also varied historically across different versions of Linux. Avoid its use: usesigaction(2)
instead.
This lab has stuck to using signal
as it has a simpler interface. If you feel
confident, you should read into how sigaction
works and attempt to re-write
your code to use the recommended sigaction
instead.
Re-write any signal handling code to use sigaction
to install signal handlers,
rather than signal
.
Extension Task: Extending shellx.c
#
Some of these extension ideas are quite complex and go beyond the scope of this course. Don’t stress if you get stuck! These are just here if you want an “extreme” challenge.
As an extension exercise, think about some of the additional features your
shell has that shellx.c
lacks. For example, you have seen the following features
already:
- You have previously run commands of the form
make prog_name && ./prog_name
. The&&
tells your shell to run the command to the left of the&&
first, and then run the command on the right after it has finished. - The
cd
command allows your shell to change what the “active” directory is. This can’t be implemented as an external program likels
is2 – you can add support for changing directories as a “built in command”. You can use the system callchdir
to do so. - Running a command of the form
./prog_name > outfile
will redirect the contents ofprog_name
’sSTDOUT
tooutfile
. Modifyshellx
to be able to parse and run redirected commands correctly.
Extend shellx
to include some of these “bonus” features.