Task 1: Quantitative Analysis
In this task, you will quantitatively compare syscall I/O to memory-mapped file I/O for small I/O accesses (up to 64 bytes). Specifically, we ask you to write a simple benchmarking framework that performs a large number of small read/write I/O operations on a file using each of the following techniques. We leave it to you to create the file and write the C code from the ground up.
- Use the native read()/write() system calls to perform a number of I/O operations on a file
- Use the standard C library fread() and fwrite() functions to perform I/O operations
- Use the robust I/O (RIO) package from the course website to perform I/O operations (use the appropriate read/write functions from csapp.h)
- Use memory-mapped file I/O (MMIO) to perform I/O operations (use pointer arithmetic where necessary)
Your benchmark code should use #define to create constants for: (1) the number of read and write operations and (2) the size of each I/O operation.
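As a starting point, the first technique might be sketched as follows. This is only one possible layout, not a required one: the constant names NUM_OPS and IO_SIZE, their values, and the function name bench_syscall are assumptions made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define NUM_OPS 100000  /* number of writes, then of reads (assumed value) */
#define IO_SIZE 64      /* bytes per I/O operation (assumed value) */

/* Write NUM_OPS chunks of IO_SIZE bytes with write(), then read them
 * back with read(). Returns the number of bytes read, or -1 on error. */
long bench_syscall(const char *path)
{
    char buf[IO_SIZE];
    long total = 0;

    assert(IO_SIZE <= 64);          /* the task targets small accesses */
    memset(buf, 'a', sizeof buf);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return -1; }

    for (long i = 0; i < NUM_OPS; i++) {        /* one syscall per write */
        if (write(fd, buf, IO_SIZE) != IO_SIZE) {
            perror("write"); close(fd); return -1;
        }
    }

    if (lseek(fd, 0, SEEK_SET) < 0) {           /* back to the start */
        perror("lseek"); close(fd); return -1;
    }

    for (long i = 0; i < NUM_OPS; i++) {        /* one syscall per read */
        ssize_t n = read(fd, buf, IO_SIZE);
        if (n != IO_SIZE) { close(fd); return -1; }
        total += n;
    }

    close(fd);
    return total;
}
```

A main() that calls bench_syscall() with a file name of your choice gives a binary that can be timed on its own; the other techniques can follow the same shape.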
Recall that the standard C library and the robust I/O package maintain an internal buffer to reduce the number of system calls. They copy the requested data (mostly) from the internal buffer to the user-provided buffer. This copy is avoidable when using native read()/write() system calls. In both cases, the OS kernel does its own buffering in the page cache to amortize the high cost of disk transfers.
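The buffering described above can be seen in a stdio variant of the benchmark. The sketch below is a minimal illustration, again with assumed names (NUM_OPS, IO_SIZE, bench_stdio) and assumed values: most fwrite() and fread() calls here only copy to or from the library's internal buffer, and a system call is issued only when that buffer fills up or runs empty.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NUM_OPS 100000  /* number of writes, then of reads (assumed value) */
#define IO_SIZE 64      /* bytes per I/O operation (assumed value) */

/* Write NUM_OPS chunks with fwrite(), then read them back with fread().
 * Returns the number of bytes read, or -1 on error. */
long bench_stdio(const char *path)
{
    char buf[IO_SIZE];
    long total = 0;

    memset(buf, 'b', sizeof buf);

    FILE *fp = fopen(path, "w+");               /* create/truncate, read+write */
    if (fp == NULL) { perror("fopen"); return -1; }

    for (long i = 0; i < NUM_OPS; i++) {
        /* Usually just a copy into the stdio buffer, no system call. */
        if (fwrite(buf, 1, IO_SIZE, fp) != IO_SIZE) {
            perror("fwrite"); fclose(fp); return -1;
        }
    }

    /* A positioning call is required between writing and reading on an
     * update stream; it also pushes the buffered data to the kernel. */
    if (fseek(fp, 0L, SEEK_SET) != 0) {
        perror("fseek"); fclose(fp); return -1;
    }

    for (long i = 0; i < NUM_OPS; i++) {
        /* Usually served from the stdio buffer. */
        size_t n = fread(buf, 1, IO_SIZE, fp);
        if (n != IO_SIZE) { fclose(fp); return -1; }
        total += (long)n;
    }

    fclose(fp);
    return total;
}
```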
Recall that mapping a disk file into virtual memory with mmap() returns a pointer that the programmer can use to access the file. The first access to a page of the file (e.g., a read through the pointer) triggers a page fault, and the kernel transfers an entire page (4 KB) from the disk drive to main memory. Subsequent accesses to that page cause no further transfer, and with MMIO there are no copy operations like the ones in syscall I/O. The program has direct access to the file contents transferred by the kernel into main memory.
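A memory-mapped variant could look roughly like the sketch below. The names NUM_OPS, IO_SIZE, and bench_mmap and the chosen values are assumptions for illustration. The file is first sized with ftruncate() so that the whole mapping is backed by the file, and each "I/O operation" is then plain pointer arithmetic on the mapped region.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define NUM_OPS 100000  /* number of writes, then of reads (assumed value) */
#define IO_SIZE 64      /* bytes per I/O operation (assumed value) */

/* Write and then read NUM_OPS chunks through a shared file mapping.
 * Returns the number of bytes read back correctly, or -1 on error. */
long bench_mmap(const char *path)
{
    size_t len = (size_t)NUM_OPS * IO_SIZE;
    long matched = 0;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return -1; }

    /* Grow the file to the mapping size; touching mapped pages that lie
     * beyond the end of the file would raise SIGBUS. */
    if (ftruncate(fd, (off_t)len) < 0) {
        perror("ftruncate"); close(fd); return -1;
    }

    char *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping remains valid after close */
    if (base == MAP_FAILED) { perror("mmap"); return -1; }

    for (size_t i = 0; i < NUM_OPS; i++) {
        /* "Write": store directly into the mapped file contents. */
        memset(base + i * IO_SIZE, 'c', IO_SIZE);
    }

    for (size_t i = 0; i < NUM_OPS; i++) {
        /* "Read": load through the pointer, no intermediate buffer. */
        const char *p = base + i * IO_SIZE;
        for (size_t j = 0; j < IO_SIZE; j++)
            matched += (p[j] == 'c');
    }

    if (munmap(base, len) < 0) { perror("munmap"); return -1; }
    return matched;
}
```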
Bottom line: MMIO uses the virtual memory abstraction and relies on page faults for disk-to-memory transfers. Its advantage is direct (pointer) access to file contents. Syscall I/O, on the other hand, incurs at least one and up to two memory copies in addition to disk-to-memory transfers. How these tradeoffs play out in a real system depends on the underlying architecture and the program’s I/O behavior.
You should use the Linux shell’s time command to measure your benchmark’s execution time for each of the above I/O techniques.
Please send an email to shoaib.akram@anu.edu.au with your results and analysis.
Task 2: Understanding functions in the robust I/O package
Read the code for the following functions from the RIO package and work through how each one is implemented.
static ssize_t rio_read(rio_t *rp, char *usrbuf, size_t n)
ssize_t rio_readnb(rio_t *rp, void *usrbuf, size_t n)
ssize_t rio_readlineb(rio_t *rp, void *usrbuf, size_t maxlen)
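Before reading the RIO source, it may help to see the general idea of a user-level read buffer in isolation. The sketch below is not the code from csapp.c: the type mini_rio_t and the functions mini_init, mini_read, and mini_readn are hypothetical names invented for this illustration, and the buffer size is an assumed value. It shows a buffered read that refills from read() only when the buffer is empty, and a wrapper that loops over short counts.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MINI_BUFSIZE 8192           /* assumed buffer size for this sketch */

typedef struct {
    int fd;                         /* descriptor being buffered */
    ssize_t cnt;                    /* unread bytes left in buf */
    char *ptr;                      /* next unread byte in buf */
    char buf[MINI_BUFSIZE];         /* internal buffer */
} mini_rio_t;

void mini_init(mini_rio_t *rp, int fd)
{
    rp->fd = fd;
    rp->cnt = 0;
    rp->ptr = rp->buf;
}

/* Buffered counterpart of read(): refill the internal buffer when it is
 * empty, then copy min(n, cnt) bytes to the caller. */
ssize_t mini_read(mini_rio_t *rp, char *usrbuf, size_t n)
{
    while (rp->cnt <= 0) {
        rp->cnt = read(rp->fd, rp->buf, sizeof rp->buf);
        if (rp->cnt < 0) {
            if (errno != EINTR)     /* retry only if interrupted */
                return -1;
        } else if (rp->cnt == 0) {
            return 0;               /* end of file */
        } else {
            rp->ptr = rp->buf;      /* buffer refilled, reset cursor */
        }
    }
    size_t take = n < (size_t)rp->cnt ? n : (size_t)rp->cnt;
    memcpy(usrbuf, rp->ptr, take);
    rp->ptr += take;
    rp->cnt -= (ssize_t)take;
    return (ssize_t)take;
}

/* Read up to n bytes, looping over short counts; returns fewer than n
 * bytes only at end of file, or -1 on error. */
ssize_t mini_readn(mini_rio_t *rp, void *usrbuf, size_t n)
{
    size_t left = n;
    char *p = usrbuf;

    while (left > 0) {
        ssize_t got = mini_read(rp, p, left);
        if (got < 0) return -1;
        if (got == 0) break;        /* end of file */
        left -= (size_t)got;
        p += got;
    }
    return (ssize_t)(n - left);
}
```

When reading the actual RIO functions, compare how they handle the same three cases: an empty buffer, an interrupted read(), and end of file.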
Task 3: Understanding additional flags for opening files
Use the man pages to understand the meaning and use of the O_DIRECT and O_NONBLOCK flags during file creation.
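One way to observe the effect of O_NONBLOCK is with a FIFO, where a blocking read would otherwise wait for data. The sketch below is an illustration under assumptions: the function name nonblocking_read_errno and the FIFO path passed to it are made up for this example, and the behaviour described is that of Linux. O_DIRECT is left out of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a FIFO, open it non-blocking, and try to read from it while it
 * is empty. Returns the errno set by read(), 0 if read() did not fail,
 * or -1 if the setup failed. */
int nonblocking_read_errno(const char *fifo_path)
{
    char byte;
    int result;

    unlink(fifo_path);                          /* ignore "does not exist" */
    if (mkfifo(fifo_path, 0600) < 0) { perror("mkfifo"); return -1; }

    /* Without O_NONBLOCK, this open() would block until a writer opens
     * the FIFO. With it, open() returns immediately. */
    int rfd = open(fifo_path, O_RDONLY | O_NONBLOCK);
    if (rfd < 0) { perror("open reader"); unlink(fifo_path); return -1; }

    /* Give the FIFO a writer, but never write any data to it. */
    int wfd = open(fifo_path, O_WRONLY);
    if (wfd < 0) {
        perror("open writer"); close(rfd); unlink(fifo_path); return -1;
    }

    /* A blocking read would wait here; a non-blocking read fails at once. */
    ssize_t n = read(rfd, &byte, 1);
    result = (n < 0) ? errno : 0;

    close(wfd);
    close(rfd);
    unlink(fifo_path);
    return result;
}
```

On Linux the returned value is expected to be EAGAIN, which is how a non-blocking descriptor reports that no data is available yet.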
Task 4: Understanding fflush() and fsync()
Use Google and man pages to understand the need and use of (1) C library function fflush() and (2) system-level function fsync().
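The difference between the two calls can be made visible with a small experiment, sketched below under assumptions: the helper names kernel_file_size and flush_demo are invented for this example, and the "before" observation relies on stdio fully buffering a regular file, which is the usual behaviour. fflush() moves data from the C library's buffer to the kernel, while fsync() asks the kernel to push its cached data to the storage device.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Size of the file as the kernel currently sees it, or -1 on error. */
static long kernel_file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Write 5 bytes through stdio and record the kernel-visible file size
 * before and after fflush(). Returns 0 on success, -1 on error. */
int flush_demo(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL) { perror("fopen"); return -1; }

    if (fwrite("hello", 1, 5, fp) != 5) { fclose(fp); return -1; }

    /* The bytes typically still sit in the stdio buffer at this point. */
    *before = kernel_file_size(path);

    /* Step 1: user-space buffer -> kernel (page cache). */
    if (fflush(fp) != 0) { perror("fflush"); fclose(fp); return -1; }
    *after = kernel_file_size(path);

    /* Step 2: page cache -> storage device. */
    if (fsync(fileno(fp)) != 0) { perror("fsync"); fclose(fp); return -1; }

    return fclose(fp) == 0 ? 0 : -1;
}
```

Note that fsync() operates on a file descriptor, so data written through a FILE * needs fflush() first; otherwise fsync() has nothing from that stream to write out.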