Outline#

Rather than having dedicated exercises for the lab this week, in this week’s lab we will go over how to implement the two memory allocators from last week, which you can use as a starting point for your first assignment. If you have already finished the Lab 5 content, you may ask questions about and work on your first assignment instead.

Introduction#

We assume you have read and even attempted Lab 5 before starting this lab. (We will not introduce the relevant topics again.) We strongly advise you to try Lab 5 before seeing the solutions in this lab. We will provide intermediate patches for each step and a complete solution at the end.

Basic setup#

You may either wish to continue using the mymalloc-mmap.c and mymalloc-implicit.c files from lab5. Alternatively, you can copy the lab5 directory into a newly created lab6 directory and work from there.

Exercise 1: Pointer Arithmetic#

This is a warm-up exercise/pop quiz to review how Pointer Arithmetic works in C. If you remember the section on Arrays and Pointer Arithmetic from Lab 1 this will be very familiar to you. Be warned — not understanding how this works means you will struggle quite a bit with implementing your memory allocator.

Consider the following C code, which defines pointers to various different types and performs pointer arithmetic on them:

// Assume sizeof(point) == 24
typedef struct point {
  int x;
  long y;
  int z; 
} point; 

int main(void) {
  int   *ptr_to_int   = (int   *) 0x100000000;
  char  *ptr_to_char  = (char  *) 0x200000000;
  long  *ptr_to_long  = (long  *) 0x300000000;
  point *ptr_to_point = (point *) 0x400000000; 

  int   *A = ptr_to_int   + 1;
  char  *B = ptr_to_char  + 5;
  long  *C = ptr_to_long  + 2;
  point *D = ptr_to_point + 3;
  int   *E = (int *)  (((char *) ptr_to_int)   + 8);
  char  *F = (char *) (((long *) ptr_to_point) + 6) + 1; 
}

Determine what the values of A, B, C, D, E and F will be. You may assume:

  1. sizeof(char) = 1
  2. sizeof(int) = 4
  3. sizeof(long) = 8
  4. sizeof(point) = 241
int   *A = ???;
char  *B = ???;
long  *C = ???;
point *D = ???;
int   *E = ???;
char  *F = ???; 
Click here to reveal the answers when you are ready.
Variable Value Explanation
A 0x100000004 ptr_to_int is a pointer to an int, so ptr_to_int+1 adds 4 bytes to the address (as sizeof(int) == 4).
B 0x200000005 ptr_to_char is a pointer to a char, so ptr_to_char + 5 adds 5 * sizeof(char) bytes to the address.
C 0x300000010 sizeof(long) == 8, so 2 * 8 == 16 is added to ptr_to_long.
D 0x400000048 sizeof(point) == 24, meaning 3*24 = 72 (0x48 in hexadecimal) is added to ptr_to_point.
E 0x100000008 Since ptr_to_int is cast to a char * before adding 8, this adds 8 bytes to ptr_to_int.
F 0x400000031 Similar to E, ptr_to_point is cast to a long * before adding 6 (resulting in 0x400000030). Then it is cast to a char * before adding 1, resulting in 0x400000031.

If you are still confused about how pointer arithmetic works, you should ask your tutor to explain it to you — understanding this is very important in implementing your memory allocator(s).

Exercise 2: An mmap-based allocator#

We first draw our attention to the mymalloc-mmap.c file.

#include "mymalloc-mmap.h"

void *my_malloc(size_t size) {
  /** TODO */
  return NULL;
}

void my_free(void *ptr) {
  /** TODO **/
}

Allocating memory#

For this mmap allocator, if my_malloc is called to allocate \(N\) bytes, we directly call mmap to allocate a chunk with size no less than \(N\) bytes. Each mmap-ed chunk has a piece of meta-data memory for us to record the size of the chunk. So later when we call munmap, we will always know the size of the mmap-ed chunk. We want to have this meta-data as otherwise we may not know what the size of the allocation was.

The mymalloc-mmap.h file defines the Chunk data structure you will be using, and defines two useful constants for you to use — kMetadataSize and
kMaxAllocationSize. The purpose of the Chunk data structure is to keep track of the size of the allocation it points to. The contents of mymalloc-mmap.h includes:

...
// Chunk data structure
typedef struct {
  size_t size;
} Chunk;

// Size of meta-data per Chunk
const size_t kMetadataSize = sizeof(Chunk);
// Maximum allocation size (16 MB)
const size_t kMaxAllocationSize = (16ull << 20) - kMetadataSize;
...

If you’re unsure about why we need to store metadata, think about what will happen when it comes time to implement my_free.

We need to un-map the region of memory returned by mmap using munmap. If you look at the definition of munmap you will see that you need to pass a len argument:

int munmap(void *addr, size_t len);

You won’t be able to free any memory without keeping track of the size!

We can now implement the my_malloc function by calling mmap with the appropriate arguments. The function signature for mmap is:

void *mmap(void *addr, size_t len, int prot, int flags, int fd, off_t off);

Remember to re-read the relevant section of Lab 5 for an explanation on what addr, prot, flags, fd and off mean, and what they should be set to.

Since we will be storing the size of the Chunk in the first word of the allocation, we want to return the address of the data portion of our allocated memory, so we need to add +1 to the chunk before returning. Re-read the section on Pointer Arithmetic above if you’re confused what the result of adding 1 will be.

We also need to make sure we have enough space reserved for the metadata itself, so we need to add this to the size argument we pass to mmap. The metadata size was helpfully defined for us already in the mymalloc-mmap.h header file in the variable kMetadataSize.

Add your code to mymalloc-mmap.c to implement the my_malloc function as follows:

...
#include <sys/mman.h> /* Remember to include this! */

void *my_malloc(size_t size) {
  size_t chunk_size = kMetadataSize + size;
  Chunk *chunk = mmap(NULL,
                      chunk_size, 
                      PROT_READ | PROT_WRITE, 
                      MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
  chunk->size = chunk_size;
  return chunk+1;
}
...

If you don’t understand why we add 1 to chunk before returning this, think about what would happen if we didn’t. In particular, given the following program which uses my_malloc to allocate enough space for a 5-element array and then writes values to it:

int main(void) {
  long *array = my_malloc(sizeof(long) * 5); 
  for (int i=0; i<5; i++) {
    array[i] = i; 
  }

  my_free(array); 
}

Would this cause problems2? What will happen to the size field of the Chunk we set in my_malloc?

We also want to make sure that mmap hasn’t failed. If it has then we want to return NULL. From mmap’s man page, we can see that mmap returns a value of MAP_FAILED when mmap fails. Hence we can check for errors like so:

void *my_malloc(size_t size) {
  size_t chunk_size = kMetadataSize + size;
  Chunk *chunk = mmap(NULL,
                      chunk_size, 
                      PROT_READ | PROT_WRITE, 
                      MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
  if (chunk == MAP_FAILED) {
    return NULL;
  }
  chunk->size = chunk_size;
  return chunk+1;
}

mmap size#

An important thing to consider about mmap is that it is used to map pages of memory. This means that when you call mmap with a given length it will “round up” the number of bytes you request to the nearest multiple of the system’s page size. On most Linux systems, the page size is 4KB (4096 bytes). If you are using an ARM Mac, this will be 16KB.

Deallocating memory#

Now that our my_malloc function is (more-or-less) complete, let us now tackle implementing the my_free function. Because we used mmap to allocate memory, we should use munmap to de-allocate memory.

In my_free, the first thing we need to do is reverse the operation we performed in my_malloc to “skip over” the metadata:

void my_free(void *ptr) {
  Chunk *chunk = ((Chunk *) ptr) - 1; 
  munmap(chunk, chunk->size); 
}

As with mmap, munmap can fail. We need to check the result of munmap and abort the execution of the program if we have encountered an error. Generally if we are writing programs, we want to gracefully handle errors as it allows the program to redo or retry certain actions, however, memory allocator libraries are different. Since such a library is so low-level, any error in the allocator will almost certainly result in catastrophic failure of the program using the allocator. Hence, it is best that if we find an error state in our allocator, we inform the program of the error and then exit.

#include <string.h> /* Defines `strerror` */
#include <errno.h>  /* Defines `errno`    */
#include <stdlib.h> /* Defines `exit`     */

...

void my_free(void *ptr) {
  Chunk *chunk = ((Chunk *) ptr) - 1; 
  int ret = munmap(chunk, chunk->size); 
  if (ret < 0) {
    fprintf(stderr, "my_free error: %s\n", strerror(errno));
    exit(1);
  }
}

Our implementation of my_malloc doesn’t exit the program when we encounter an error, and instead returns NULL. Can you think of any reason we would want to exit when munmap fails, but not when mmap fails?

Corner cases#

To be comprehensive, we need to resolve a few corner cases:

  • For allocations with zero or greater than the maximum allocation size, my_malloc should return NULL. In such cases, the user of the my_malloc library must check they aren’t using a NULL pointer by accident.
  • Freeing a NULL pointer is a no-op.
...

void *my_malloc(size_t size) {
  size_t chunk_size = size + kMetadataSize;
  if (size == 0 || size > kMaxAllocationSize) {
    return NULL;
  }
  Chunk *chunk = mmap(NULL, chunk_size, 
                      PROT_READ | PROT_WRITE, 
                      MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
  if (chunk == MAP_FAILED) {
    return NULL;
  }
  chunk->size = chunk_size;
  return chunk+1;
}

void my_free(void *ptr) {
  if (ptr == NULL) {
    return NULL;
  }
  Chunk *chunk = ((Chunk *) ptr) - 1; 
  int ret = munmap(chunk, chunk->size); 
  if (ret < 0) {
    fprintf(stderr, "my_free error: %s\n", strerror(errno));
    exit(1);
  }
}

All Together#

The finished source code should look something like this:

#include "mymalloc-mmap.h"
#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

inline static size_t round_up(size_t size, size_t alignment) {
  const size_t mask = alignment - 1;
  return (size + mask) & ~mask;
}

void *my_malloc(size_t size) {
  size_t chunk_size = size + kMetadataSize;
  if (size == 0 || size > kMaxAllocationSize) {
    return NULL;
  }
  Chunk *chunk = mmap(NULL, chunk_size, 
                      PROT_READ | PROT_WRITE, 
                      MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
  if (chunk == MAP_FAILED) {
    return NULL;
  }
  chunk->size = chunk_size;
  return chunk+1;
}

void my_free(void *ptr) {
  if (ptr == NULL) {
    return;
  }
  Chunk *chunk = ((Chunk *) ptr) - 1; 
  int ret = munmap(chunk, chunk->size); 
  if (ret < 0) {
    fprintf(stderr, "my_free error: %s\n", strerror(errno));
    exit(1);
  }
}

Running Tests#

After finishing the implementation of my_malloc and my_free, you need to use the testing script to run tests. Testing is required to check the correctness of your implementation. Since we named our file mymalloc-mmap.c, we run the tests like so:

$ python3 ./test.py -m mymalloc-mmap

Other usages:

  • Run a single test: python3 ./test.py -t align.
  • Run tests with release build: python3 ./test.py --release.
  • Enable logging: python3 ./test.py --log.
    • Use LOG("Hello %s\n", "world"); for logging in your source code.

Using a Debugger#

If your tests are failing and you aren’t sure why, the best approach is to Use a Debugger. Rather than running the test script directly, you may find it easier to compile the tests manually. You can do so by running the following commands (for macOS, replace gdb with lldb):

make test LOG=1 
gdb ./tests/test-name

Commit and push your changes to mymalloc-mmap.c. The CI will run the same tests as those present in your tests/ directory. Make sure everything is working as expected before moving on.

Exercise 3: Implicit Free-List Allocator#

We will now implement a simple implicit free-list allocator.

Acquire Initial Memory#

To reduce the number of expensive system calls, we can get a large chunk of memory in one go and manage it on our own. Let us say we will acquire 64 MB of memory once from the operating system. We can then service allocation requests from the 64 MB chunk, and this chunk size is enough for most of our current tests.

Before we can acquire memory, we must think about how to represent a memory block. Note that a “block” here is just a name for the allocation primitive we will work with. That is to say, when we are asked to allocate an object in my_malloc, we will allocate a “block” that contains meta-data followed by the actual allocation. We define a Block to have two pieces of meta-data:

  • whether the Block is currently allocated or not;
  • what the allocation size of the Block is (including its meta-data size):

This has been defined for you in mymalloc-implicit.h:

typedef struct Block {
  // Size of the block, including meta-data size.
  size_t size;
  // Is the block allocated or not?
  bool allocated;
} Block;

We can now acquire memory from the OS using mmap if we are allocating for the first time. Add the following code to mymalloc-implicit.c:

#include "mymalloc-implicit.h"
#include <sys/mman.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

Block *start_block = NULL;

void *initialise(void) {
  start_block = mmap(NULL, kMemorySize, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, 0, 0);
  if (start_block == MAP_FAILED) {
    LOG("mmap_failed: %s\n", strerror(errno));
    return NULL;
  }
  start_block->size = kMemorySize;
  start_block->allocated =  false;
  return start_block;
}

void *my_malloc(size_t size) {
  if (start_block == NULL) {
    start_block = initialise();
  }
  return NULL;
}

... /* rest of the file down here */

We don’t need to explicitly set start_block->allocated = false, since the memory returned by mmap will be initialised to all zeroes. This means it will be set to false already — we just include it here for clarity.

Allocate Memory from the Global Free List#

We use a global implicit free list to manage the memory blocks. Instead of adding explicit next/prev pointers to block meta-data, we use the block start address and block size to traverse blocks inside a contiguous piece of memory. This as an implicit free list[^3].

[^3:] An explicit free list (where blocks contain prev/next pointers) is what you will build for your first assignment.

The only block meta-data we need to maintain is the free/allocated flag and the block size. For allocation, we traverse the list, find the first free block large enough, and split it into two blocks if necessary.

We first fill out the two helper functions, block_size and is_free, which return the block size and allocation status, respectively:

size_t block_size(Block *block) {
  return block->size;
}

int is_free(Block *block) {
  return block->allocated;
}

It may seem unnecessary to write these functions, as they are simple wrappers around struct member accesses. When you being working on more complex optimisations as part of your assignment it will be useful to have helper functions like these, so we want you to write them now rather than forget to later.

We fix the traversal order by always starting from the first block (i.e., the lowest address) and moving towards the last block (i.e., the highest address). To implement this traversal, we need to implement a function that will return the next block from a given block. This will require being able to add the block size number of bytes to the “current” block — for this we will define a helper macro ADD_BYTES and a helper function to get the next block in memory:

#define ADD_BYTES(ptr, n) ((void *) (((char *) (ptr)) + (n)))

Block *get_next_block(Block *block) {
  if (!block) {
    return NULL;
  }
  Block *next_block = ADD_BYTES(block, block->size);

  if (next_block >= (Block *) ADD_BYTES(start_block, kMemorySize)) {
    return NULL;
  }
  return next_block;
}

Traversing through the entire free list then becomes a matter of repeatedly calling get_next_block in a while loop (starting with start_block) until it returns NULL:

Block *current = start_block;
while (current != NULL) {
  current = get_next_block(current); 
}

Now that we have a function that can traverse through our free list, we need to find an algorithm for choosing an appropriate free block in which to place a newly allocated block. We will implement the first fit algorithm. First fit searches the free list from the beginning and chooses the first free block that fits. The idea is to traverse the free list until we find a Block large enough to fit our allocation. We may end up wasting memory if the chosen block is larger than the requested size. In that case, we may decide to split the block into two parts. We will worry about splitting blocks later in the lab. If no Block can be found, we will return NULL. Putting it together, we have the following function to traverse through our free list:

Block *find_free_block(size_t size) {
  Block *current = start_block;
  while (current != NULL) {
    if (current->size >= size && !current->allocated)
      break;
    current = get_next_block(current); 
  }
  return current; 
}

Now we can slot this into my_malloc to find and allocate a block of memory! We also make sure to check whether find_free_block failed and set errno to ENOMEM.

void *my_malloc(size_t size) {
  if (start_block == NULL) {
    start_block = initialise();
  }
  size_t total_size = size + kMetadataSize;
  Block *block = find_free_block(total_size); 
  if (block == NULL) {
    errno = ENOMEM;
    return NULL;
  }

  block->allocated = true; 
  return ADD_BYTES(block, kMetadataSize); 
}

Splitting Blocks#

Before we implement my_free, let us address the fact our my_malloc implementation is only able to allocate one block before running out of available space, and is not currently any more space efficient than the mmap implementation, by implementing splitting a given Block into two:

/** Splits a the block starting at the given address into two. 
 *  Assumes size includes metadata size. */
Block *split_block(Block *block, size_t size) {
  size_t original_size = block->size; 
  block->size = size; 
  Block *right = get_next_block(block); 
  right->size = original_size - block->size; 
  right->allocated = false;
  return block; 
}

We now use this split_block function to implement my_malloc by splitting the block we find with our first-fit algorithm if it is too large:

void *my_malloc(size_t size) {
  if (start_block == NULL) {
    start_block = initialise();
  }
  size_t total_size = size + kMetadataSize;
  Block *block = find_free_block(size); 
  if (block == NULL) {
    errno = ENOMEM;
    return NULL;
  } else if (block->size > (total_size + kMetadataSize + kMinAllocationSize)) {
    block = split_block(block, total_size); 
  }

  block->allocated = true; 
  return ADD_BYTES(block, kMetadataSize); 
}

Freeing Memory#

To release a memory block, we get the block start address from the given pointer, and mark the block as free:

void my_free(void *ptr) {
  Block *block = ADD_BYTES(ptr, 0-kMetadataSize); 
  block->allocated = false;
}

Alignment#

If you run all the tests now , you’ll see one important test (align) failing. We see that it failed with a message:

align: tests/align.c:12: int main(): Assertion 'is_word_aligned(ptr)' failed.

Checking tests/align.c line 12, we can see that the first allocation void *ptr = mallocing(1); may not return a word-aligned address. The issue lies in the fact that we are not aligning our size to the next word. We can fix this by rounding up our allocation size correctly (note how we had to change the minimum allocation size as well):


/** Note that this function only works if `alignment` is a power of 2. **/
static inline size_t round_up(size_t size, size_t alignment) {
  const size_t mask = alignment - 1;
  return (size + mask) & ~mask;
}

void *my_malloc(size_t size) {
  if (start_block == NULL) {
    start_block = initialise();
  }
  size_t total_size = round_up(size + kMetadataSize, kMetadataSize); 
  Block *block = find_free_block(total_size); 
  if (!block) {
    return NULL;
  } else if (block->size > (total_size + kMetadataSize + kMinAllocationSize)) {
    block = split_block(block, total_size); 
  }
  
  block->allocated = true;
  return ADD_BYTES(block, kMetadataSize); 
}

Handling Corner Cases (again)#

We now have a mostly correct implementation of an implicit free-list allocator. We just have to handle a few corner cases again, just like for the mmap allocator.

void *my_malloc(size_t size) {
  if (size == 0 || size >= kMaxAllocationSize) {
    return NULL; 
  }
  if (start_block == NULL) {
    start_block = initialise();
    if (start_block == MAP_FAILED) {
      return NULL;
    }
  }
  size_t total_size = round_up(size + kMetadataSize, kMetadataSize); 
  Block *block = find_free_block(total_size); 
  if (!block) {
    return NULL;
  } else if (block->size > (total_size + kMetadataSize + kMinAllocationSize)) {
    block = split_block(block, total_size); 
  }
  
  block->allocated = true;
  return ADD_BYTES(block, kMetadataSize); 
}

void my_free(void *ptr) {
  if (!ptr) return; 
  Block *block = ADD_BYTES(ptr, 0-kMetadataSize); 
  block->allocated = false;
}

All Together#

The full implicit free list allocator:

#include "mymalloc-implicit.h"
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define ADD_BYTES(ptr, n) ((void *)(((char *)(ptr)) + (n)))

Block *start_block = NULL;

static inline size_t round_up(size_t size, size_t alignment) {
  const size_t mask = alignment - 1;
  return (size + mask) & ~mask;
}

void *initialise(void) {
  start_block = mmap(NULL, kMemorySize, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, 0, 0);
  if (start_block == MAP_FAILED) {
    LOG("mmap_failed: %s\n", strerror(errno));
    return NULL;
  }
  start_block->size = kMemorySize;
  start_block->allocated = false;
  return start_block;
}

Block *get_next_block(Block *block) {
  if (!block) {
    return NULL;
  }
  Block *next_block = ADD_BYTES(block, block->size);

  if (next_block >= (Block *)ADD_BYTES(start_block, kMemorySize)) {
    return NULL;
  }
  return next_block;
}

/** Splits a the block starting at the given address into two.
 *  Assumes size includes metadata size. */
Block *split_block(Block *block, size_t size) {
  size_t original_size = block->size;
  block->size = size;
  Block *right = get_next_block(block);
  right->size = original_size - block->size;
  right->allocated = false;
  return block;
}

Block *find_free_block(size_t size) {
  Block *current = start_block;
  while (current != NULL) {
    if (current->size >= size && !current->allocated)
      break;
    current = get_next_block(current);
  }
  return current;
}

size_t block_size(Block *block) {
  return block->size;
}

int is_free(Block *block) {
  return block->allocated;
}

void *my_malloc(size_t size) {
  if (size == 0 || size >= kMaxAllocationSize) {
    return NULL;
  }
  if (start_block == NULL) {
    start_block = initialise();
    if (start_block == MAP_FAILED) {
      return NULL;
    }
  }
  size_t total_size = round_up(size + kMetadataSize, kMetadataSize);
  Block *block = find_free_block(total_size);
  if (!block) {
    return NULL;
  } else if (block->size > (total_size + kMetadataSize + kMinAllocationSize)) {
    block = split_block(block, total_size);
  }

  block->allocated = true;
  return ADD_BYTES(block, kMetadataSize);
}

void my_free(void *ptr) {
  if (!ptr) return;
  Block *block = ADD_BYTES(ptr, 0 - kMetadataSize);
  block->allocated = false;
}

Running Tests#

After finishing the my_malloc and my_free implementations, rerun the tests to see if they are passing or not.

To run the testing script with the implicit free list allocator, invoke the testing script as follows:

python3 ./test.py -m mymalloc-implicit

Note that all tests passing does not necessarily mean that your implementation is correct! Try making some small tests that use your my_malloc and my_free functions to allocate memory dynamically. Try to think of edge cases and other conditions that may break the assumptions of your implementation.

If you have understand all of the above, good work! Having a working implicit free list allocator (and more importantly understanding how it all works) will make tackling your first assignment a breeze ✨!

  1. The reason it is not 16 bytes (sizeof(int) + sizeof(long) + sizeof(int)) is because of padding

  2. Whenever we ask this question in a lab document the answer is “yes, it will cause problems”. 

bars search times