Lab 9: Concurrency 1

Note that this lab has a new lab pack to clone. Normally we would release multiple weeks worth of labs at the same time (i.e. next weeks lab), but we want to go over the content more closely to make sure it helps prepare you for your second assignment more directly.

The lab material for next week will be added to the same repository, you will need to perform an upstream pull when that occurs.

Outline & Preparation#

In this week’s lab, you will:

Learn the basics of pthreads and their use in writing concurrent applications.
Learn how to use pthread locks to synchronise between threads.
Learn about condition variables and using them to implement shared data between threads.

This lab uses a new repository, make sure you fork and clone it!

Clone the lab pack 4 repo from here.

This is our first lab on concurrency using threads. You have already seen another way of approaching concurrency when you looked at Multiprocessing in Lab 4. We encourage you to think about the differences and similarities between multithreading and multiprocessing as you read through this lab.

Introduction to POSIX Threads#

The Week 8 and Week 9 covers concurrency and the pthread interface in quite a lot detail. We recommend watching these lectures and/or reading through the slides if you find yourself getting stuck.

POSIX Threads (hereafter referred to as “pthreads”) are a standard interface for using threads in C programs. A thread is a “logical flow that runs in the context of a process”. They are a way of writing concurrent programs in C.

The code you are given to work on this week may feel trivial and not the sort of programs that demonstrate the practical benefits of concurrency. This is because the lab is mostly focused on introducing you to the basics of using pthreads. In our next (and final) lab we will put these concepts together to create something more substantial.

Creating Threads#

A pthread is created by calling the pthread_create function:

#include <pthread.h>

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine) (void *), void *arg);

This will launch a new thread in the calling process. The “thread id” of this new thread will be stored at the location pointed to by the first argument thread. The newly-created thread will start executing the function pointed to by the start_routine argument, which which takes a single void * as an argument. We will often refer to this start_routine function as the thread’s “top level function”. The attr argument is used to define certain attributes of the thread (this will often just be left NULL).

You may remember function pointers from all the way back in Lab 3, where we said:

This is a more advanced C feature that will become more relevant when covering concurrency later in the semester.

If you’re feeling confused by function pointers, have a quick read through of the explanation on that page (you don’t have to complete the exercises if you don’t want to) and ask your tutor if you have any questions.

Run man pthread_create to read about the pthread_create function in more detail. Alternatively, click this handy link for access to an online copy.

Thread Routines#

The start_routine argument provided to pthread_create is a pointer to a function that returns a “void pointer” and accepts a void pointer as its sole argument. While it may seem prohibitive to only allow one argument to be passed to a thread routine, remember that a void pointer in C represents a “generic pointer” to an unknown type (e.g. the return type of malloc). If you need to pass multiple arguments to a thread routine you can pass a pointer to a struct type with all the necessary data in it.

Joining Threads#

What happens to the thread created by a call to pthread_create when it returns from the given start_routine? Unlike a child process created with fork, it will not continue following the same logical control flow as its “parent”. Instead, a thread that returns from its top level function it will terminate. The operating system kernel will handle reclaiming certain resources created for the thread, but only after the another thread in the process has indicated it is ok to do so by called a specific function — pthread_join:

int pthread_join(pthread_t thread, void **retval);

This function waits for the thread specified by thread to terminate. If that thread has already terminated, then pthread_join returns immediately. Otherwise the caller will block until has done so. The retval argument is a pointer to a void pointer, which if non-NULL will be used to store the return value of the terminated thread.

Note that returning from a top-level function isn’t the only way a thread can terminate, though it is the way we will be relying on for the first part of this lab.

What are the similarities this has to calling wait when reaping child processes created by fork?

Exercise 1: Creating & Joining Threads#

Your first task is to look at a very simple (and broken) usage of the functions we described above. If you open the simple-pthreads.c file in your repository, you will see a program that will launch a number of threads that each run the following thread_func function:

void *thread_func(void *arg) {
  size_t thread_num = *((size_t *)arg);
  printf("[Thread #%lu] ", thread_num);
  printf("Hello from %s %lu!\n", "thread number", thread_num);
  printf("[Thread #%lu] ", thread_num);
  printf("Has finished executing, %s\n", "and will return now.");
  return NULL;
}

Each thread created will print two messages to stdout, both of which are tagged with their “thread number”.

The creation (and eventual joining) of these threads happens in the create_threads function:

printf("[Main Thread] launching %ld threads.\n", num_threads);
for (idx = 0; idx < num_threads; idx++)
  pthread_create(&tids[idx], NULL, thread_func, &idx);

Compile and run this program. Does it do what you expect?

If you run the program without specifying a number of threads, it will default to launching 2. If you run ./simple-pthreads you may see output that looks like the following correct behaviour:

[Main Thread] Creating 2 threads.
[Thread #0] Hello from thread number 0!
[Thread #1] Hello from thread number 1!
[Thread #1] Has finished executing, and will return now.
[Thread #0] Has finished executing, and will return now.
[Main Thread] has created all threads. Joining threads now.
[Main Thread] joined thread 0
[Main Thread] joined thread 1
[Main Thread] Joined all threads. Exiting now.

Or you may instead see something more garbled:

[Main Thread] Creating 2 threads.
[Thread #1] [Thread #2] Hello from thread number 2!
[Thread #2] Has finished executing, and will return now.
[Main Thread] has created all threads. Joining threads now.
Hello from thread number 1!
[Thread #1] Has finished executing, and will return now.
[Main Thread] joined thread 0
[Main Thread] joined thread 1
[Main Thread] Joined all threads. Exiting now.

If you re-run the program many times, you will probably see different kinds of unexpected output.

Your first task is to determine what parts of the sample code are causing the program to behave incorrectly. Note that there is more than one issue with this code. You may attempt to fix the bugs yourself now, or keep reading through the rest of the lab before coming back to this task once you’ve learned a bit more about concurrency bugs.

One frustrating aspect of debugging concurrent programs is that bugs won’t always emerge, especially if you are working with short-lived programs or a small number of threads. Try running with a higher number of threads given as an argument (e.g. ./simple-pthreads 6).

If you’re not sure where to start looking for bugs, here are a few pointers for you:

Consider how the thread number is passed to thread_func: should this value be passed by reference? What will happen if the main thread increments idx before the newly created thread executes the assignment statement?
How do the threads write to stdout? Is it possible to guarantee that only one thread is writing to stdout at once? Would collapsing the printf statements into one prevent this problem?
While it is most likely not going to cause problems for this specific program, notice that some of these functions should have their return value checked for errors. For example, how can pthread_create fail?

If you inspect this code closely, you may even discover a bug we didn’t include intentionally. In that case, excellent work!

Exercise 2: Locking and Synchronisation#

Using Mutexes#

The lectures on concurrency have covered the use of semaphores as a way of performing synchronisation between threads already. While they are an important concept to learn about, we will not be using them in these labs.¹ Instead, we will focus on a closely related concept - the mutexes provided as part of the pthread API. There are a different types of mutexes that are made available, but for now we will start by focusing on “the standard” mutex (pthread_mutex_t). This mutex allows only one thread to access the resource at any given time.

The following functions are used to initialise, use and destroy mutexes:

// Initialise `mutex`
int pthread_mutex_init(pthread_mutex_t *mutex, pthread_mutexattr_t *attr);

// Free any resources used by `mutex`
int pthread_mutex_destroy(pthread_mutex_t *mutex);

// Attempt to lock the mutex. If it is already locked, calling this function
// will block the thread until it is free.
int pthread_mutex_lock(pthread_mutex_t *mutex);

// Unlock the mutex. If there are any threads waiting to lock this mutex, they
// may have a chance to acquire the lock.
int pthread_mutex_unlock(pthread_mutex_t *mutex); 

// Similar to pthread_mutex_lock, but an already-locked mutex will not cause the
// calling thread to block.
int pthread_mutex_trylock(pthread_mutex_t *mutex);

Read through the manual pages for each of these functions. In particular focus on understanding what each of the arguments mean, how they are meant to be used and what errors can occur when using them.

Understanding the Problem#

Your second exercise will get you to practice working with this interface for creating and managing mutexes. In counting.c you have been provided with some template code that creates two threads, each of which increment a shared counter variable a set number of times.

Read through the code in counting.c. If you have been paying attention, you likely already know what the issue with this code will be. Run the code anyway to confirm your suspicions. A similar warning as for exercise 1 applies - you may have to set a high iteration count before you see any divergence in the expected and computed sums.

Hopefully you are starting to develop an intuition for the cause of these sorts of issues. In this case, it is because both threads will be reading from and writing to the same location in memory concurrently. Each of these updates requires loading the value of counter that is currently stored in memory, incrementing it by 1, and storing that updating value back. In pseudo-x86 assembly, this would look something like the following three instructions:

I1. movq counter(%rip), %rax 
I2. addq $1, %rax
I3. movq %rax, counter(%rip)

Discuss with either your tutor or your fellow students a way the two threads could be interleaved in such a way that causes an update to become “lost”.

You may wish to lay your ideas out in table form:

Step	Thread Running	Instruction Executed	Counter Value
1.	T1/T2	`I1`	0
2.
3.
4.
5.
6.
7.
8.

Fixing the Problem (With Mutexes!)#

Modify the code in counting.c to add a global variable pthread_mutex_t counter_lock used to protect updates to the counter. You will need to make sure you initialise the mutex by pthread_mutex_init with the correct arguments, and that a thread will only increment the counter when it is holding the lock.

Is it better to lock and unlock around the outside or inside of the while loop? The former case means that one thread completes all of it’s iterations before the other thread can start incrementing at all. The latter case could incur a large performance hit, as locking and unlocking each iteration introduces overhead. Could you instead perform some N < iterations updates before releasing the lock?

If you want extra work, try experimenting with different locking approaches and see which is the fastest³.

Exercise 3: Producer-Consumer Threads#

The concept of shared buffers and the Producer/Consumer Problem has already been covered in the lectures already, so you may already be familiar. If not, the idea of the “Producer-Consumer” pattern is to designate certain threads as producer threads and others as consumer threads. The role of a producer is to perform some form of computation and/or I/O to produce data needed by other parts of the program. In this scenario the “other parts of the program” are one or more consumer threads. The consumer will use data produced by a consumer thread in some way.

This exercise will look at an example of a very simple Producer-Consumer pattern, with a Producer feeding data read from standard input to a Consumer performing computations on that data. The producer-consumer pattern will usually have multiple producers and/or consumers, but for simplicities sake we will focus on this small example to start with.

Shared Buffers#

How does a producer send data to a consumer? We use a shared data structure for producers to insert items into and consumers to remove items from. This shared data structure will need to be carefully managed to ensure that only one producer or consumer is accessing it at any given time. This shared data structure often takes the form of a queue.

A shared buffer will need:

To keep track of its total capacity, and how many items it currently contains. This is to ensure producers don’t cause a buffer overflow by inserting more items than there is allocator space for.
A locking mechanism to ensure that producers and consumers don’t attempt to modify the contents of the shared buffer concurrently.
A way for producer threads to indicate to a waiting consumer thread that they have inserting a new item, and the buffer is no longer empty.
A way for consumer threads to indicate to a waiting producer thread that they have removed an item from the buffer, and there is now room for the insertion of more data.

You have seen an implementation of a shared buffer in the lectures already, that used semaphores to indicate to producers and consumers when they could add/remove new data:

typedef struct {
    int *buf;          /* Buffer array */         
    int n;             /* Maximum number of slots */
    int front;         /* buf[(front+1)%n] is first item */
    int rear;          /* buf[rear%n] is last item */
    sem_t mutex;       /* Protects accesses to buf */
    sem_t slots;       /* Counts available slots */
    sem_t items;       /* Counts available items */
} sbuf_t;

You can view the full sbuf.c and sbuf.h files to help you understand the idea behind a shared buffer more.

Bad Idea: Shared Buffer With No Synchronisation#

Open the producer-consumer.c file in your repository and inspect the code provided. The program will create two threads — one of which is a producer and the a consumer. The producer thread will read integer values entered into standard input by calling scanf. Whenever it reads a new integer, it checks whether there is room in the buffer (shared_buffer) before inserting this value. The consumer will wait until it sees that there is an item in the shared buffer to consume. It will add the value of the item in the buffer to a running total.

Compile and run the producer-consumer program. You can either enter numbers manually (enter Ctrl+D) to send EOF, or use the numbers.input file present in your repository:

./producer-consumer < numbers.input

If you run this command multiple times, do you get the same result every time?

Currently this program suffers from a similar problem to those we have looked at already. The sbuf_cond_t structure definition has all the components we need to implement proper sharing between the producer and consumer:

typedef struct shared_buffer {
  pthread_mutex_t lock;       // Lock for all updates to to our buffer/
  pthread_cond_t empty, full; // Condition variables for signalling.
  int count;                  // Number of items currently in the shared buffer.
  int data;                   // Our shared buffer only has room for one integer.
} sbuf_cond_t;

Rather than using semaphores like the above, we’re relying on “condition variables” to signal the current status of the buffer to the different threads.

Condition Variables#

The pthreads API also includes a useful tool for facilitating communication between different threads — condition variables.

The idea of condition variables is simple. They exist to allow for threads to signal to other threads that certain conditions have been met, or allow for threads to wait for another thread to tell it certain conditions have been met.

The main functions of relevance are:

int pthread_cond_init(pthread_cond_t *cond, pthread_condattr_t *attr);

int pthread_cond_destroy(pthread_cond_t *cond);

int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex); 

int pthread_cond_signal(pthread_cond_t *cond);

At this point, the init and destroy functions should be intuitive. The cond_wait function is a bit more involved, however:

The pthread_cond_wait function atomically blocks the current thread
waiting on the condition variable specified by cond, and releases the mutex
specified by mutex. The waiting thread unblocks only after another thread
calls pthread_cond_signal, or pthread_cond_broadcast with the same
condition variable, and the current thread reacquires the lock on mutex.

Look at the relevant man pages for the pthread conditional variable functions.

In particular, pay attention to the section of the manpage for cond_wait that says it needs to have possession of the mutex passed as an argument, and that returning from the wait call does not guarantee that the condition being waited on is True!

This may seem like a counter-intuitive design choice, but there is a rationale behind it (as explained in the man page).

In general, typical usage of a condition variable will look like the following:

int some_cond;
pthread_cond_t cond;
pthread_mutex_t lock;

// Run by thread 1
void foo() {
  pthread_mutex_lock(&lock); 
  // --- Critical Region Begins ---
  while (some_cond == 0)
    pthread_cond_wait(&cond, &lock); // Wait for another thread to signal change in condition.
                                     // Temporarily releases the lock.
  // At this point, some_cond is 'true' and the condition has been signalled.
  // The caller has also regained control of the lock, so we are still in a 
  // critical region.
  ... /* perform some work that relied on 'some_cond' being true */ ...
  pthread_mutex_unlock(&lock); 
  // --- Critical Region Ends ---
}

// Run by thread 2
void bar() {
  pthread_mutex_lock(&lock); 
  // --- Critical Region Begins ---

  ... /* perform some work which allows `some_cond` to become true */...

  some_cond = 1;
  pthread_cond_signal(&cond); // Signal change in condition
  pthread_mutex_unlock(&lock); 
} // --- Critical Region Ends ---

Adding Locks and Condition Variables#

Modify the code so that the consumer will wait on the full signal before attempting to remove items from the shared buffer, and the producer will wait for the empty signal before attempting to add items.

When the consumer has removed an item, it should also signal the empty condition variable so the producer can wake up and start populating the buffer again.

Extension 1: Supporting More Items#

Our shared buffer only has capacity for a single item. As an extension task, modify the code to allow for multiple integers to be stored in it at once.

It would be a good idea to take inspiration from the sbuf.c code from the lectures, especially in regards to how it keeps track of which slots of the buffer are in use.

Extension 2: Supporting More Threads#

Modify the code to allow for multiple consumers or producers to exist, all using the same shared buffer. It is up to you to decide the purpose of the additional produces and consumers is — you could allow for multiple producers by having each read from a different file, perhaps.

Does your implementation still work now that you have more than two threads waiting and signalling on the same condition variables? Don’t be discouraged if you find additional bugs — this is quite a difficult thing to get fully correct.

If you’ve reached this point, you have reached the end of our introduction to using pthreads in C. This knowledge will be very helpful in both next weeks lab and for your second assignment.

More Locking: Read/Write Locks#

There is no associated exercise with this section. We have included it here at the end of the lab because it will become more relevant for next week’s lab, and your second assignment.

As noted in the lectures, when sharing resources between multiple threads, it is often safe to allow multiple readers to access the resource simultaneously as they would not alter any data, but only one writer should be allowed to access the resource at any given time.

The standard mutex, pthread_mutex_t, allows only one thread to access the resource, irrespective of whether the thread is a reader or writer. This behaviour is safe, but not very efficient, as the readers are made to wait even though they are not going to modify the data.

The pthread API offers support for a variation of the standard mutex described above that allows for multiple readers to access a resource at the same time.

The pthread_rwlock_t which is a read-write lock allows multiple readers to access the resource, but allows only one writer at any given time.

pthread_rwlock_t rwlock;

We can initialize rwlock as follows,

pthread_rwlock_init(&rwlock,NULL);

We can destroy the lock to avoid memory leaks as follows,

pthread_rwlock_destroy(&rwlock);

Next, we can apply a read or a write lock depending on the scenario as follows,

pthread_rwlock_rdlock(&rwlock);
pthread_rwlock_wrlock(&rwlock);

Read the manual pages for read-write locks, and think about situations in which they would be useful.

If you are working on MacOS, you may notice an inability to use the semaphore-related functions shown in lectures. ↩
While you won’t have to worry about the two threads overwriting the other’s register values, you may find it useful to keep track of regardless. ↩
You probably won’t get faster performance than by simply calculating iterations << 1 in the main thread, but it is a good idea to familiarise yourself with these kinds of design questions for when we start working on more complex programs. ↩