Outline#

In this week’s lab you will:

  1. Practice reverse engineering x86-64 assembly and hand-optimizing assembly code
  2. Learn about composite data types with structs in C
  3. Learn about optimizing data structure layout for modern memory systems
  4. Learn about dynamically allocating memory on the C heap
  5. Practice writing code for traversing statically and dynamically allocated data structures in C

Preparation#

Like in lab 2, you are going to need to update your local fork of the lab repo with the materials for this week’s lab from the main lab repo. If you are still using the same local fork as with lab 2, then go ahead and follow the below instructions. If that isn’t the case, go back to the lab 2 preparation and follow them through instead.

You can check to make sure your changes from lab 2 are present by running the following in a terminal open at your local repo root:

git remote --verbose

If the output looks like:

$ git remote --verbose
origin  https://gitlab.cecs.anu.edu.au/uXXXXXXX/comp2310-labs.git (fetch)
origin  https://gitlab.cecs.anu.edu.au/uXXXXXXX/comp2310-labs.git (push)
upstream        https://gitlab.cecs.anu.edu.au/comp2310/2022/comp2310-labs.git (fetch)
upstream        https://gitlab.cecs.anu.edu.au/comp2310/2022/comp2310-labs.git (push)

# Where uXXXXXXX is your UID.

then you’re good to follow along with a simple update. If it doesn’t, you’ll need to go back to the lab 2 preparation.

Pull in changes from the upstream remote. Do this with the following commands

git fetch upstream

git pull --no-ff --no-edit upstream main

If you want an explanation of what these do, have a read of the lab 2 preparation.

You should now have a folder called lab3 in your repo. Open that folder in VS Code and continue on with the lab.

If you don’t have that folder then something has gone wrong, if you’re unsure or not able to resolve it yourself then ask your tutors for help!

C to x86-64 Assembly#

We have already discussed the basics of x86-64 assembly and converting C code to x86-64 assembly in the lectures. It is now time to apply the knowledge we gained in practice. In the first exercise below, we provide you with the C code for a recursive function, and your job is to compile the C code using gcc, disabling all optimizations.

Next, you should study the assembly code carefully and become comfortable with the correspondence between the C code and its translation into x86-64 assembly. If you understand the assembly code, you may notice that the code generated by the compiler is far from efficient, let alone optimal. In the exercise below, we ask you to optimize the assembly code by hand.

You will need to generate the executable binary from your hand-optimized assembly code. You should also ensure that the executable binary generated from your assembly code produces the same output as that produced by the executable generated from the original C code.

When aiming for difficult optimizations that lead to greater code efficiency or fewer instructions, you will need to use the GNU debugger (gdb) for debugging your assembly code. You should first read the description of tasks below and skip this section on using gdb. Once you need to debug your optimized assembly code, you should consult this section.

Using gdb#

  • You can run gdb from the command line by typing gdb NAME_OF_EXECUTABLE.
  • Once you see the prompt, you can provide arguments to the program using set args 4 1. (Alternatively, and more conveniently, you can provide arguments to the program when you are directly invoking gdb like so: gdb --args NAME_OF_EXECUTABLE 4 1.)
  • You can then run the program to completion with the r command.
  • Unfortunately, running the program to completion is not very useful for debugging. You can instead set a breakpoint. A breakpoint pauses the program until further instructions are given to the debugger. It is an intentional stopping or pausing place in a program, put in place for debugging purposes. You can set a breakpoint at the main function with b main. It is more convenient and useful to set a breakpoint at functions we are interested in, such as the jacobsthal function in the first exercise.
  • Now, type si to step through the instructions one instruction at a time. gdb will print the last instruction executed on the screen. s is used to step through one source-level statement, in this case the C source code.
  • Use ni to step through the instructions one instruction at a time while stepping over call instructions. n is used to step through one source-level statement while stepping overs calls, i.e. the call is treated as one source-level statement instead of multiple.
  • It is useful to print the register values at any given point. You can type info registers to print all registers.
  • You can print only a specific register using x/x $rax. The first x stands for examine and second x is hex (printing format). You can see other formats using help x. On some versions of gdb, the p command also works for printing register values. Try p $rax.
  • (Optional) The display command enables automatic displaying of registers each time gdb stops at a breakpoint or after a step. Try display $rax.

If you are using Apple silicon (e.g an M1/M1 Pro/M2 CPU), there are two things you need to do in order to be able to complete this lab properly. Firstly, the default Makefile in the lab repo will produce the wrong type of assembly (specifically, ARMv8 assembly instead of x86-64). This can be corrected by adding -arch x86_64 to the flags in your Makefile (the line beginning with CFLAGS=...).

Secondly, you may find that you get an error like this in your terminal trying to run gdb:

zsh: command not found: gdb

If not, great! Go on with the lab as scheduled. If so, you will instead need to use a slightly different debugger, with much of the same functionality, called lldb. The instructions above mostly apply, with a few minor differences, some of which are listed below:

  • To provide program arguments, you should input settings append target.run-args 4 2; which appends to the specific setting target.run-args. Of course, the more convenient option of simply providing the arguments on invoking lldb still exists as in gdb.
  • si won’t print the assembly instructions by default - you will need to use the disassemble command to see the assembly in your terminal. It is highly recommended to carefully read the help disassemble page and try out different options to work out your display preference here.
  • Printing the register values can be done with register read instead of info registers.
  • The x/x syntax in lldb reads only from the memory of the target process, if you want to print only a specific register, you should instead use register read (name) - try register read $rax.

Note that if you’re ever unsure how to do something, you can start an lldb instance and then use the help command for a full list of debugger commands.

Open the file src/jacobsthal.c and run make assembly to compile the C code with the -O0 flag. The -O0 flag turns off all optimizations, generating assembly code that closely resembles the C code. Run the executable and try to understand what the program is doing. Note down the output of the program for a few input values.

Open the generated assembly file from the previous step and try to fully understand the key function, namely jacobsthal. You can ignore the startup and finalization code that interacts with the operating system and assembler directives. You need to focus only on the call to jacobsthal in main and the body of the jacobsthal function. Do you see how the recursive calls in the C code are translated into assembly? Can you set n to a small value (e.g., 2) and draw the call stack to gain insight into how recursion works? Note that call stack is a fancy name for all active stack frames (one corresponding to each active function) on the program’s stack.

Your next task is to optimize the assembly code by hand. You may notice on a quick inspection that there is a lot of low-hanging fruit! Has the compiler made efficient use of the stack space (i.e., main memory resource)? Has the compiler made efficient use of general purpose registers? Are there any instructions you can omit without changing the semantics of the C program? You can generate an executable binary from your assembly code with the make jacobsthal command. Study the make assembly and make jacobsthal commands carefully to know which sources are they using to generate the executable file.

We leave it to your creativity and insight to figure out as many optimizations as possible. If you perform five or more optimizations, please email the course convener with the subject LAB3-REC-OPT. If you feel stuck after a few optimizations, you should prioritize moving on to the rest of the lab handout and exercises.

Data Structures in C#

So far, we have only dealt with variables of a single data type. For example, we have written code to declare and manipulate integer and character variables. This tutorial will introduce grouping objects (or variables) of different data types into a single entity. Specifically, we will introduce structs in C. We will also practice writing code that involves structures.

Structs#

The C struct declaration creates a data type that group variables of different types into a new type. The different components that make up the struct can be referenced via names. Similar to arrays, the different components of a structure are stored in a contiguous region in memory. A pointer to a struct is the address of its first byte. As an example, consider the following declaration of a new composite data type called record,

struct record {
    int id;
    char name[10];
};

The above declaration of a struct creates a new composite structure called record. The structure contains two fields or members: a 4-byte variable of type int, and a ten-element array of type char. Note that the character array is embedded within the structure. The byte offset of the first character from the start of the structure is 4 (an integer is 4 bytes).

The above struct declaration only defines a new type. No space is reserved in memory yet for the members of the record structure. In fact, no record even exists. We can define a new variable of type record and initialise its members as follows,

struct record r1 = {10, "Amanda"};

Once a structure is both declared and a variable of its type defined, we can reference (read or write) its members. We can use the dot operator (.) to reference members of a structure. If we have a pointer to a structure, we can reference its members with the structure pointer (->) operator. Consider the following example,

struct record r1;
r1.id = 10;
r1.name = "Amanda";

In the above example, we have used the dot operator to initialise the members of the struct r1. Now, consider the following code,

struct record r1 = {10, "Amanda"};
struct record *r2 = &r1;
r2->id = 17;
r2->name = "Helen";

We first define a record r1 and initialise its members (the first statement). We then define a pointer with the type record and assign it the address of r1. The r2 pointer contains the memory address of the first element of the r1 structure. Now, we can use the -> operator to access (and read/update) r1 members via the r2 pointer. To properly access the second member (name) of the r1 structure (the third statement), the compiler will add the appropriate offset (4 bytes) to the starting address of r1. We call this process translation from the symbolic reference of a struct member (r2->name) to the appropriate memory address.

When defining a structure such as r1 above, it is cumbersome to prefix struct record before the variable name. C provides a facility called typedef for creating new data type names for convenience. The following declaration,

typedef struct record new_record;

makes the name new_record a synonym for struct record. We can then define a new record variable as follows.

new_record r1 = {17, "Jack"};

When we pass structures to function as arguments, the whole structure (i.e., every member) is copied on the stack frame. It is more efficient to pass a pointer to the structure.

Open the program src/record.c and read the code. Try to guess the output of the program. Compile and run the program to check your guess.

Note that we have declared a new structure outside the main function. We have also declared several functions that we define later in the code (after the main function). Declaring the function header first before defining it in informs the compiler that we intend to define the function later (and use it before its definition in the code).

Open the program src/record.c and this time complete the definitions of the two functions equal and equal_with_copy. Both functions return 1 if the two structures passed as arguments are equal. Otherwise, they return 0. We define equality as follows: (1) the integer ids are equal, (2) the records contain similar names. Use your string utility functions from the previous lab or use an appropriate function from string.h. You can find the available functions in string.h via an online search.

Open the program src/buggy-record.c and read the code. There is a serious bug in the function create_record. Use your understanding of how functions operate at the assembly level and try to explain the bug. You can revisit call stack and stack frames from the lecture slides.

Linked List#

We will now use C structures to build a simple linked list data structure. A linked list is a data structure with a collection of elements or nodes and each node contains a pointer to the next element in the list. A linked list allows traversal over all its elements starting from the head node. To create a linked list of records, we first need to add a new member to the structure that points to the next element in the list. The revised declaration of the record structure is as follows,

struct record {
    int id;
    char name[10];
    struct record *next;
};

The following code creates a linked list of three student records and connects them together into a linked list.

new_record student3 = {23, "Anson", NULL};
new_record student2 = {22, "Steve", &student3};
new_record student1 = {10, "Jack", &student2};

Note that student3 is the last element in the linked list and points to nothing. The NUll value is a built-in C constant that has a value of zero. In fact, it is a synonym for the character 0 used to terminate strings in C. Null can also be the value of a pointer, which is the same as zero.

Open the program src/linkedlist.c and compile and run the program. Observe carefully the definitions of the two functions to print the record and a linked list of records.

It is more convenient to have a function that adds a new record to the end of the linked list of records. So we can write code as follows:

new_record student1 = {10, "Jack", NULL};
new_record student2 = {22, "Steve", NULL};
new_record student3 = {23, "Anson", NULL};

insert(&student1, &student2);
insert(&student1, &student3);

Open the program src/linkedlist.c again and complete the function definition of the insert function. Test your function definition is correct by printing the linked list of records.

We would like to have the ability to delete records from the linked list given the values for the members of a record. Complete the definition of the delete_record function. The function takes the head node and integer id and name as arguments. If a record has matching id and name, it is deleted from the linked list. Test your code by printing the linked list.

Array of Structs#

We can declare an array of structs the same way we can declare arrays of ordinary types. The following declares and defines an array of 100 records,

new_record records[100];

We can access element of the records array the same way we access elements of ordinary arrays. The following example assigns an id of 40 to the tenth element of the records array,

records[9].id = 40;

Alternatively, we can use pointer arithmetic and add the appropriate offset to the starting address of the array of records. Consider the following

(records + 9)->id = 40;

As discussed before, the name of an array of type T is in essence a pointer of type T. When we add 9 to a pointer of type new_record, the compiler under the hood multiplies 9 by the size of the structure to access the tenth element of the array. The -> operator then accesses the id of that element.

What do you expect the sizeof operator to return when applied on the new_record type? Write a small program to test your understanding.

Case Study: Struct of Arrays#

We will now do a case study to gain a deep insight into how data structures interact with the memory hierarchy of modern compute systems. Consider the following structure that represents a point in a 3-dimensional space,

struct point3D {
    int i;
    int j;
    int k;
};

// array of structs
struct point3D points[LEN];

Suppose we have an array of several 3D points (call the array points) and we want to sum all the points along the k dimension. Each point3D element of the points array is 12 bytes in size (4 bytes for i, 4 bytes for j, and 4 bytes for k). Recall that members of a structure are placed contiguously in memory. Recall also that elements of an array are also placed contiguously in memory. The k members of two 3D points, points[0] and points[1], are therefore 12 bytes apart in memory. In fact, each time the CPU accesses a point in the k dimension, it jumps to an address 12 bytes away from the previous access.

We would now like to discuss an aspect of the memory systems in modern CPUs. The main memory in modern computers is built using Dynamic Random Access Memory (DRAM) technology. Each DRAM cell uses a capacitor to store charge. Storing and retrieving charge to/from capacitors is slower than a different memory technology, namely Static Random Access Memory (SRAM). SRAM is built using flip-flops or cross-coupled inverters. SRAM is expensive but fast and DRAM is cheap but slow. Therefore, modern computers use a memory hierarchy that consists of a small amount of fast SRAM (called a cache) and a large amount of slow DRAM. The cache is tightly integrated tightly with the CPU, i.e., it is present on the same die (chip) as the CPU. On the other hand, main memory resides off-chip or outside the CPU die.

On a memory read operation, the CPU first check the fast SRAM (called a cache) for data. If the data is not found in the cache, only then the CPU sends the request to slow main memory (DRAM). A cache hit means the CPU finds the requested data in the cache. A cache miss means the data is not found in the cache. Since DRAM accesses are slow, the CPU tries to be efficient and brings a lot more information than a single 64-bit word. Typically, the CPU loads 64 bytes from memory. This data from memory is then stored in the SRAM cache. The expectation is that if the CPU needs a word at an address A at a given point in time, then it will likely need the neighboring word at address A+1 soon afterward. Surprisingly, a lot of real-world programs exhibit this so-called spatial locality in program references.

The amount of data loaded from memory in each memory access is not related to the size of the CPU’s registers. Instead, it is the CPU’s cache line size, which is the amount of memory loaded in to each cache entry. For example, early Intel Pentium 3 processors were 32-bit processors (i.e. 32 bit registers and ALU), with 512kB caches using a 64 byte cache line size.

Now, let’s go back to the points array. Now that we have more knowledge about the underlying memory system of a modern computer, we can be smart about how we sum all the points along the k dimension in the 3D space. As a start, we can change the representation of our points in space. Consider the following declaration,

struct pointarray3D {
    int i[LEN];
    int j[LEN];
    int k[LEN];
};

Instead of placing the points in the k dimension 12-bytes apart in the array of structs approach, we have placed all the points along the k dimension in the 3D space next to each other. This second representation above is called the struct of arrays approach. By placing the points next to each other, we expect the CPU to more frequently find the k points in the fast SRAM cache. The result is that the number of times the CPU needs to wait for slow main memory is reduced significantly. This way the program that uses the second approach runs faster than the one using the array of structs approach.

Open the programs src/aos.c and src/soa.c and complete the definitions of the sum function. Test your code for correctness. You can measure the time it takes for each program to finish using the time command. For example, if you type time ./program_name on the terminal, the time command reports the total time the program takes to run or execute. You should only look at the real component of the output.

scanf#

You can skip this section if you already know how to use scanf in your programs.

Before moving any further, we would like to introduce the scanf function that you will need for your second assignment. The scanf function receives formatted input from the terminal the same way printf prints formatted strings to the standard output. Consider the following C code for receiving the user name and integer identifier from the standard input or keyboard.

#include <stdio.h>

void main () {

    int id;
    char str1[20];

    printf("Enter name: ");
    scanf("%19s", str1);

    printf("Enter your identifier: ");
    scanf("%i", &id);

    return;
}

Open the program src/scanf.c and read the code carefully to understand what the program is doing. Compile and run the program to make sure your understand how to use scanf in your C programs.

Static vs. Dynamic Memory Allocation#

So far, we have only allocated memory for variables, arrays, and data structures on the call stack or simply the stack. Next, in this tutorial, we will see how we can allocate memory in the global data segment and the heap. Consider the following diagram showing the memory map of a typical program. The entire address space is typically divided into a number of segments. The figure explains the use of different segments.

memory-map

The dynamic data segment holds the stack and the heap. Recall that space for variables on the stack is allocated and deallocated automatically. Specifically, the space for variables on the stack is allocated when functions create new variables that cannot fit into the registers, and deallocated when functions return to their callers.

A heap is a large, subdividable block of memory that is managed by a memory manager. The heap stores data that is allocated by the program during runtime. The memory allocated on the heap is also called dynamically allocated memory. Such memory is used when the exact memory requirements for a program are unknown before runtime (i.e., when the program is written or when it is being compiled). The lifetime of heap objects extend from when they are allocated by the programmer and until they are deallocated explicitly by the programmer. Let’s explore the reasons we need a heap further.

Statically Allocated Memory#

Every object in C has a storage duration that determines its lifetime (i.e., how long it lasts during program execution). Objects declared in file scope in C have static duration or lifetime (a.k.a. storage duration). Such objects (scalar variables, arrays, and data structures) are allocated in the global data segment of the process’ address space. The benefit of declaring variables with global scope is that they can be accessed from anywhere in the source code. The storage duration of such variables lasts the entire execution of the program. (Note here that variables on the stack have automatic storage duration, but we do not need to explicitly specify this duration when declaring variables in C programs. Here, automatic implies the stack space grows and shrinks automatically in response to functions executing and exiting.)

Open the program src/static-array.c and familiarise yourself with the program.

Once you have understood the example program, answer the following questions:

  • Where are the inc function’s variables static_int and stack_int stored?

  • What is the difference between a variable allocated inside the main function and a variable allocated at the file scope, outside of any function?

  • The global_array is globally visible to any function in the source program. Unfortunately, it’s length is fixed at compile time. What should we do if we want a globally visible variable-length array?

  • What are the disadvantages of allocating very large arrays on the stack frame? What are some of the limitations of allocating objects on the stack frame?

Check your answers with a tutor in your lab.

Dynamic Memory Allocation#

We hope you understand the motivation for the program heap while answering the questions above. Now, we will show you and practice dynamically allocating memory on the program heap. Once again, dynamically allocated memory is used when the exact memory requirements of a program are unknown at compile time. Dynamically allocated memory has allocated storage duration. The lifetime of an allocated object extends from its allocation until its deallocation.

Growing the stack downward and the heap upward makes the management of the dynamic data segment easy. Of course, the operating system ensures that stack and the heap never grow into each other.

An easy example of a program that does not know its memory requirements at compile time is a text editor - the program does not know how many words the user is going to type.

The C standard library defines memory management functions for allocating and deallocating (freeing) heap memory. The functions we are concerned with include malloc and free. The malloc function allocates space for an object of a specified size. The size is specified in terms of the number of bytes of memory to be allocated. The malloc function returns a pointer to the starting address of the newly allocated memory. The sizeof operator is really useful for specifying the size of memory to be allocated. The following examples allocate memory on the heap for ten integers, ten characters, and ten structures.

struct student {
    int id;
    char name[20];
};

typedef struct student student_t;

int *int_array = (int *) malloc(10 * sizeof(int));
char *char_array = (char *) malloc(10 * sizeof(char));
student_t *struct_array = (student_t *) malloc(10 * sizeof(student_t));

The malloc function returns NULL if the heap is full. Otherwise, it returns a (void) pointer to the allocated space. Note that in each case in the example above, we cast the pointer returned by malloc to a pointer to the type of the declared object.

A careful programmer must deallocate dynamically allocated memory when it’s no longer needed by calling the free function.

Why is it important to deallocate unused heap memory?

Open the program src/item-array.c and read the code. How big is item_t? What is the total amount of memory consumed by the item_array? Recalculate the consumption accounting for the sizes of individual members contained in each item in the item_array. Can you draw how the item_array looks like on the heap?

Two-Dimensional Array#

We will now dynamically allocate a 2-dimensional array (matrix) with N rows and M columns on the heap. We provide the partial code for the exercise in src/2-dim-array.c. Your task is to initialize each array element to a calculated value, then free the array from memory. This exercise is fairly similar to the previous exercise, with a subtle difference in its solution when freeing memory that is both severely consequential and easy to miss.

Follow the instructions in src/2-dim-array.c and complete the program. Check your solution with a tutor when complete.

Linked List with Dynamic Memory#

We will now revisit the linked list program. We have rewritten the program to ask the user if there is another student to be added to the list. If the user types 1, we then ask for additional details regarding the student’s record, and create and add a new student to the linked list. At any point, if the user types a -1, we terminate the program. This program demonstrates the true power of dynamic allocation. Indeed, we have no clue at compile time, how many students the user intends to add to the linked list. We dynamically allocate memory for each student’s record as the user wishes. If the user types 2, the program asks for the details of the student’s record and removes the appropriate element from the list.

In order to create and destroy linked list elements, you will need to use malloc and free respectively. There are a few other subtle differences this time that are worth noting:

  • The insert and delete operate on the list head, which is a globally defined variable. Thus it is no longer passed as a function argument, and can be modified from within the function.

  • Because head is globally defined and modifiable by functions, you should be able to delete any element of the list - including the first element.

  • head is initially a NULL pointer - Your insert function should not assume that head -> next exists, as it will not exist when the list is empty.

Open the program src/linkedlist-dynamic.c and complete the definitions of insert() and delete(). Check your implementation with a lab tutor when you think you have completed the exercise, or if you are completely stuck.

Most students have solved the linked list exercise with an iterative program using loops. Consider - could you solve the exercise using recursion?

Memory Bugs#

The C programming language requires manual memory management (unlike Java and Python, which offer automatic memory management or garbage collection). The programmer needs to track when an object is no longer needed (i.e., the object is dead) and can be safely freed. Not freeing dead objects leads to memory leaks and the inefficient use of physical (main) memory. Memory leaks can even lead to dreadful out-of-memory errors. Unfortunately, programmers frequently make mistakes and either forget to free memory or free it prematurely.

Your task in this section is to identify and fix all the memory errors in the provided program. The program declares three arrays at run-time with malloc, initializes their elements to zero, and frees them. There are many bugs in the program. Use your intuition about how pointers and dynamic memory allocation work to fix as many bugs as possible. We will revisit memory-related bugs in C programs in a future lecture.

Open src/bug-galore.c, identify, and correct, all bugs in the program.

bars search times arrow-up