# COMP2310/COMP6310 Systems, Networks, & Concurrency Convener: Shoaib Akram

### **A System Using Physical Addressing**



 Used in "simple" systems like embedded microcontrollers in devices like cars, elevators, and digital picture frames

### **A System Using Virtual Addressing**




- Used in all modern servers, laptops, and smart phones
- One of the great ideas in computer science

### **Address Spaces**

Linear address space: Ordered set of contiguous non-negative integer addresses:

- Virtual address space: Set of N = 2<sup>n</sup> virtual addresses {0, 1, 2, 3, ..., N-1}
- Physical address space: Set of M = 2<sup>m</sup> physical addresses {0, 1, 2, 3, ..., M-1}

# Why Virtual Memory (VM)?

#### Uses main memory efficiently

Use DRAM as a cache for parts of a virtual address space

#### Simplifies memory management

Each process gets the same uniform linear address space

#### Isolates address spaces

- One process can't interfere with another's memory
- User program cannot access privileged kernel information and code

### VM as a Tool for Caching

- Conceptually, virtual memory is an array of N contiguous bytes stored on disk.
- The contents of the array on disk are cached in *physical memory* (*DRAM cache*)
  - These cache blocks are called pages (size is P = 2<sup>p</sup> bytes)



## **DRAM Cache Organization**

#### DRAM cache organization driven by the enormous miss penalty

- DRAM is about **10x** slower than SRAM
- Disk is about **10,000x** slower than DRAM

#### Consequences

- Large page (block) size: typically 4 KB, sometimes 4 MB
- Fully associative
  - Any VP can be placed in any PP
  - Requires a "large" mapping function different from cache memories
- Highly sophisticated, expensive replacement algorithms
  - Too complicated and open-ended to be implemented in hardware
- Write-back rather than write-through

### **Enabling Data Structure: Page Table**

- A page table is an array of page table entries (PTEs) that maps virtual pages to physical pages.
  - Per-process kernel data structure in DRAM



### Page Hit

 Page hit: reference to VM word that is in physical memory (DRAM cache hit)



### Page Fault

 Page fault: reference to VM word that is not in physical memory (DRAM cache miss)



- Page miss causes page fault (an exception)
- Page fault handler selects a victim to be evicted (here VP 4)
- Offending instruction is restarted: page hit!
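The hit, fault, evict, and restart sequence above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the sizes `NUM_VP` and `NUM_PP`, the type names, and the FIFO victim choice are all hypothetical, since the slides do not specify a replacement policy.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_VP 8   /* virtual pages (hypothetical toy size) */
#define NUM_PP 4   /* physical pages in the "DRAM cache" (hypothetical) */

typedef struct {
    bool valid;    /* true if the page is cached in DRAM */
    int  ppn;      /* physical page number when valid */
} pte_t;

typedef struct {
    pte_t table[NUM_VP];  /* toy page table */
    int   owner[NUM_PP];  /* VP held by each PP, -1 if free */
    int   next_victim;    /* FIFO hand (assumed policy) */
    int   faults;         /* number of page faults so far */
} vm_t;

void vm_init(vm_t *vm) {
    for (int i = 0; i < NUM_VP; i++) vm->table[i] = (pte_t){false, -1};
    for (int i = 0; i < NUM_PP; i++) vm->owner[i] = -1;
    vm->next_victim = 0;
    vm->faults = 0;
}

/* Returns the PPN holding vpn, paging it in first if necessary. */
int vm_access(vm_t *vm, int vpn) {
    if (vm->table[vpn].valid) return vm->table[vpn].ppn;   /* page hit */

    vm->faults++;                                  /* page fault */
    int ppn = vm->next_victim;                     /* select a victim */
    vm->next_victim = (vm->next_victim + 1) % NUM_PP;
    int old = vm->owner[ppn];
    if (old >= 0) vm->table[old] = (pte_t){false, -1};     /* evict it */
    vm->owner[ppn] = vpn;                          /* page in the new page */
    vm->table[vpn] = (pte_t){true, ppn};
    return vm_access(vm, vpn);                     /* restart: now a hit */
}
```

After `vm_init`, the first `vm_access` to any page faults and every repeat access to a resident page hits; once all `NUM_PP` frames are in use, the next new page evicts the oldest resident one.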



### **Allocating Pages**

#### Allocating a new page (VP 5) of virtual memory.



### Locality to the Rescue Again!

- Virtual memory seems terribly inefficient, but it works because of locality.
- At any point in time, programs tend to access a set of active virtual pages called the *working set* 
  - Programs with better temporal locality will have smaller working sets
- If (working set size < main memory size)
  - Good performance for one process after compulsory misses
- If (SUM(working set sizes) > main memory size)
  - Thrashing: Performance meltdown where pages are swapped (copied) in and out continuously
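To make the working-set condition concrete, the sketch below counts page faults for a reference string. It assumes LRU replacement, which the slides do not prescribe; `count_faults` and `MAX_FRAMES` are hypothetical names.

```c
#include <stddef.h>

#define MAX_FRAMES 16   /* upper bound on simulated physical pages (assumed) */

/* Counts page faults for refs[0..n-1] with `frames` physical pages under
 * LRU replacement. Returns -1 if frames is out of range. */
int count_faults(const int *refs, size_t n, int frames) {
    int  page[MAX_FRAMES];      /* resident virtual pages */
    long last_use[MAX_FRAMES];  /* time of most recent access per frame */
    int  used = 0, faults = 0;

    if (frames < 1 || frames > MAX_FRAMES) return -1;

    for (size_t t = 0; t < n; t++) {
        int hit = -1;
        for (int i = 0; i < used; i++)
            if (page[i] == refs[t]) { hit = i; break; }
        if (hit >= 0) { last_use[hit] = (long)t; continue; }

        faults++;
        int slot;
        if (used < frames) {
            slot = used++;                     /* free frame available */
        } else {
            slot = 0;                          /* evict least recently used */
            for (int i = 1; i < used; i++)
                if (last_use[i] < last_use[slot]) slot = i;
        }
        page[slot] = refs[t];
        last_use[slot] = (long)t;
    }
    return faults;
}
```

For a loop that cycles over 4 pages, 4 frames give only the 4 compulsory misses, while 3 frames make every single reference fault under LRU, which is the thrashing pattern described above.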

### VM as a Tool for Memory Management

**Key idea: each process has its own virtual address space** 

- It can view memory as a simple linear array
- Mapping function scatters addresses through physical memory
  - Well-chosen mappings can improve locality



### VM as a Tool for Memory Management

#### Simplifying memory allocation

- Each virtual page can be mapped to any physical page
- A virtual page can be stored in different physical pages at different times
- Sharing code and data among processes
  - Map virtual pages to the same physical page (here: PP 6)



# **Simplifying Linking and Loading**

#### Linking

- Each program has a similar virtual address space
- Code, data, and heap always start at the same addresses.

#### Loading

- execve allocates virtual pages for .text and .data sections & creates PTEs marked as invalid
- The .text and .data sections are copied, page by page, on demand by the virtual memory system



### VM as a Tool for Memory Protection

- Extend PTEs with permission bits
- MMU checks these bits on each access
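A minimal sketch of the check the MMU performs on each access is shown below. The bit layout (`PTE_VALID`, `PTE_READ`, `PTE_WRITE`, `PTE_EXEC`, `PTE_SUP`) and the function `access_ok` are hypothetical and chosen only for illustration; real architectures define their own encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission bits stored in each PTE. */
enum {
    PTE_VALID = 1u << 0,
    PTE_READ  = 1u << 1,
    PTE_WRITE = 1u << 2,
    PTE_EXEC  = 1u << 3,
    PTE_SUP   = 1u << 4   /* page accessible only in supervisor mode */
};

typedef enum { ACC_READ, ACC_WRITE, ACC_EXEC } access_t;

/* Returns true if the access is permitted. A real MMU would instead raise
 * an exception on failure, which the kernel handles. */
bool access_ok(uint32_t pte, access_t kind, bool supervisor) {
    if (!(pte & PTE_VALID)) return false;            /* not mapped in DRAM */
    if ((pte & PTE_SUP) && !supervisor) return false;
    switch (kind) {
    case ACC_READ:  return (pte & PTE_READ)  != 0;
    case ACC_WRITE: return (pte & PTE_WRITE) != 0;
    case ACC_EXEC:  return (pte & PTE_EXEC)  != 0;
    }
    return false;
}
```

With these bits, a code page would typically carry `PTE_READ | PTE_EXEC` but not `PTE_WRITE`, so a stray store into it is rejected.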



### Summary: Why virtual memory?

#### Illusion of a large address space

 Use physical memory as a cache for disk-resident virtual address space

#### Efficient memory management

- Processes have a uniform memory map and layout
- Easy for compiler/linkers to target a uniform address space

#### Sharing of code and data

- Processes can share code and data efficiently
- Multiple virtual pages mapped to one physical copy of data

#### Protection

Use bits in the page table entries to protect against invalid accesses

### **VM Address Translation**

- Virtual Address Space
  - *V* = {0, 1, ..., *N*−1}
- Physical Address Space
  - *P* = {0, 1, ..., *M*−1}
- Address Translation
  - MAP: $V \rightarrow P \cup \{\emptyset\}$
  - For virtual address a:
    - MAP(a) = a' if data at virtual address a is at physical address a' in P
    - $MAP(a) = \emptyset$  if data at virtual address *a* is not in physical memory
      - Either invalid or stored on disk

# **Summary of Address Translation Symbols**

#### Basic Parameters

- N = 2<sup>n</sup>: Number of addresses in virtual address space
- M = 2<sup>m</sup>: Number of addresses in physical address space
- P = 2<sup>p</sup> : Page size (bytes)

#### Components of the virtual address (VA)

- TLBI: TLB index
- TLBT: TLB tag
- **VPO**: Virtual page offset
- VPN: Virtual page number

#### Components of the physical address (PA)

- **PPO**: Physical page offset (same as VPO)
- **PPN:** Physical page number
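The relationship between these symbols can be sketched in a few lines of C. The helper names `split_va` and `make_pa` are hypothetical; `p` is the number of page-offset bits from P = 2<sup>p</sup>.

```c
#include <stdint.h>

typedef struct {
    uint64_t vpn;   /* virtual page number: the high n - p bits */
    uint64_t vpo;   /* virtual page offset: the low p bits */
} va_parts_t;

/* Splits a virtual address for a page size of 2^p bytes (p < 64). */
va_parts_t split_va(uint64_t va, unsigned p) {
    va_parts_t r;
    r.vpo = va & ((UINT64_C(1) << p) - 1);
    r.vpn = va >> p;
    return r;
}

/* Builds a physical address: the PPN concatenated with the PPO,
 * where the PPO is identical to the VPO. */
uint64_t make_pa(uint64_t ppn, uint64_t vpo, unsigned p) {
    return (ppn << p) | vpo;
}
```

With 4 KB pages (p = 12), the virtual address `0x12345678` splits into VPN `0x12345` and VPO `0x678`; only the VPN is translated, and the offset is carried over unchanged.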

### **Address Translation With a Page Table**




### **Address Translation: Page Hit**



- 1) Processor sends virtual address to MMU
- 2-3) MMU fetches PTE from page table in memory
- 4) MMU sends physical address to cache/memory
- 5) Cache/memory sends data word to processor

### **Address Translation: Page Fault**



- 1) Processor sends virtual address to MMU
- 2-3) MMU fetches PTE from page table in memory
- 4) Valid bit is zero, so MMU triggers page fault exception
- 5) Handler identifies victim (and, if dirty, pages it out to disk)
- 6) Handler pages in new page and updates PTE in memory
- 7) Handler returns to original process, restarting faulting instruction

### **Integrating VM and Cache**



VA: virtual address, PA: physical address, PTE: page table entry, PTEA: PTE address

### Speeding up Translation with a TLB

- Page table entries (PTEs) are cached in L1 like any other memory word
  - PTEs may be evicted by other data references
  - PTE hit still requires a small L1 delay

#### Solution: Translation Lookaside Buffer (TLB)

- Small set-associative hardware cache in MMU
- Maps virtual page numbers to physical page numbers
- Contains complete page table entries for small number of pages

### Accessing the TLB

MMU uses the VPN portion of the virtual address to access the TLB:



### **TLB Hit**



A TLB hit eliminates a memory access

### **TLB Miss**



#### A TLB miss incurs an additional memory access (the PTE)

Fortunately, TLB misses are rare. Why?
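One way to see why is to count misses for a sequential scan. The sketch below models a tiny TLB that is fully associative with FIFO replacement; that is a simplification of the set-associative design described above, and `TLB_ENTRIES`, `PAGE_BITS`, and the function name are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS   12   /* 4 KB pages (assumed) */
#define TLB_ENTRIES 16   /* hypothetical TLB capacity */

typedef struct {
    uint64_t vpn[TLB_ENTRIES];  /* cached virtual page numbers */
    int      used;              /* number of valid entries */
    int      next;              /* FIFO replacement position */
    size_t   hits, misses;
} tlb_t;

/* Records one memory access at virtual address va. */
void tlb_access(tlb_t *t, uint64_t va) {
    uint64_t vpn = va >> PAGE_BITS;
    for (int i = 0; i < t->used; i++)
        if (t->vpn[i] == vpn) { t->hits++; return; }

    t->misses++;                         /* the PTE must be fetched */
    t->vpn[t->next] = vpn;
    t->next = (t->next + 1) % TLB_ENTRIES;
    if (t->used < TLB_ENTRIES) t->used++;
}
```

Scanning four pages byte by byte costs one miss per page, after which the remaining 4095 accesses to that page all hit. Spatial locality means one cached translation serves many nearby accesses.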

### **Multi-Level Page Tables**

Suppose:

- 4KB (2<sup>12</sup>) page size, 48-bit address space, 8-byte PTE

**Problem:**

- Would need a 512 GB page table!
  - 2<sup>48</sup> \* 2<sup>-12</sup> \* 2<sup>3</sup> = 2<sup>39</sup> bytes

**Common solution: Multi-level page table**

**Example: 2-level page table**

- Level 1 table: each PTE points to a page table (always memory resident)
- Level 2 table: each PTE points to a page (paged in and out like any other data)
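The 512 GB figure can be checked with a short calculation. The function names here are hypothetical; the parameters are the ones from the slide (48-bit addresses, 4 KB pages, 8-byte PTEs).

```c
#include <stdint.h>

/* Size in bytes of a single-level page table:
 * 2^(va_bits - page_bits) PTEs, each 2^pte_log2 bytes. */
uint64_t flat_table_bytes(unsigned va_bits, unsigned page_bits,
                          unsigned pte_log2) {
    return UINT64_C(1) << (va_bits - page_bits + pte_log2);
}

/* Number of PTEs that fit in one page-sized table. */
uint64_t ptes_per_page(unsigned page_bits, unsigned pte_log2) {
    return UINT64_C(1) << (page_bits - pte_log2);
}
```

`flat_table_bytes(48, 12, 3)` gives 2<sup>39</sup> bytes, and `ptes_per_page(12, 3)` gives 512, so a page-sized table is indexed by 9 bits of the VPN. A multi-level design only allocates the page-sized tables that cover regions of the address space actually in use.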

### **A Two-Level Page Table Hierarchy**



### **Translating with a k-level Page Table**
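In a k-level table, the VPN is divided into k index fields, one per level, and the VPO is passed through unchanged. As a sketch, the code below extracts the per-level indices assuming k = 4 levels of 9 bits each with 12-bit page offsets (the x86-64 style layout); these constants and the name `vpn_index` are assumptions, not taken from the slide text.

```c
#include <stdint.h>

#define LEVELS     4    /* k (assumed) */
#define INDEX_BITS 9    /* bits of VPN consumed per level (assumed) */
#define PAGE_BITS  12   /* p: page offset bits (assumed) */

/* Returns the index into the level-`level` page table,
 * where level 1 is the top-level table and LEVELS is the last. */
unsigned vpn_index(uint64_t va, int level) {
    unsigned shift = PAGE_BITS + INDEX_BITS * (unsigned)(LEVELS - level);
    return (unsigned)((va >> shift) & ((1u << INDEX_BITS) - 1));
}
```

A walk would use `vpn_index(va, 1)` to select an entry in the top-level table, follow it to the next table, and repeat until the level-k entry supplies the PPN.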



### **Summary**

#### Programmer's view of virtual memory

- Each process has its own private linear address space
- Cannot be corrupted by other processes

#### System view of virtual memory

- Uses memory efficiently by caching virtual memory pages
  - Efficient only because of locality
- Simplifies memory management and programming
- Simplifies protection by providing a convenient interpositioning point to check permissions

### Agenda

- Simple memory system example
- Case study: Core i7/Linux memory system
- Memory mapping

### **Review of Symbols**

#### Basic Parameters

- N = 2<sup>n</sup>: Number of addresses in virtual address space
- **M** = 2<sup>m</sup>: Number of addresses in physical address space
- P = 2<sup>p</sup> : Page size (bytes)

#### Components of the virtual address (VA)

- TLBI: TLB index
- **TLBT**: TLB tag
- VPO: Virtual page offset
- VPN: Virtual page number

#### Components of the physical address (PA)

- **PPO**: Physical page offset (same as VPO)
- **PPN:** Physical page number
- **CO**: Byte offset within cache line
- **CI:** Cache index
- **CT**: Cache tag

### Simple Memory System Example

#### Addressing

- 14-bit virtual addresses
- 12-bit physical addresses
- Page size = 64 bytes



### **1. Simple Memory System TLB**

- 16 entries
- 4-way associative



| Set | Tag | PPN | Valid | Tag | PPN | Valid | Tag | PPN | Valid | Tag | PPN | Valid |
|-----|-----|-----|-------|-----|-----|-------|-----|-----|-------|-----|-----|-------|
| 0   | 03  | -   | 0     | 09  | 0D  | 1     | 00  | -   | 0     | 07  | 02  | 1     |
| 1   | 03  | 2D  | 1     | 02  | -   | 0     | 04  | -   | 0     | 0A  | -   | 0     |
| 2   | 02  | -   | 0     | 08  | -   | 0     | 06  | -   | 0     | 03  | -   | 0     |
| 3   | 07  | -   | 0     | 03  | 0D  | 1     | 0A  | 34  | 1     | 02  | -   | 0     |

### 2. Simple Memory System Page Table

Only the first 16 entries (out of 256) are shown

| VPN | PPN | Valid |
|-----|-----|-------|
| 00  | 28  | 1     |
| 01  | -   | 0     |
| 02  | 33  | 1     |
| 03  | 02  | 1     |
| 04  | -   | 0     |
| 05  | 16  | 1     |
| 06  | -   | 0     |
| 07  | -   | 0     |
| 08  | 13  | 1     |
| 09  | 17  | 1     |
| 0A  | 09  | 1     |
| 0B  | -   | 0     |
| 0C  | -   | 0     |
| 0D  | 2D  | 1     |
| 0E  | 11  | 1     |
| 0F  | 0D  | 1     |

### **3. Simple Memory System Cache**

- 16 lines, 4-byte block size
- Physically addressed
- Direct mapped



| Idx | Tag | Valid | B0 | B1 | B2 | B3 | Idx | Tag | Valid | B0 | B1 | B2 | B3 |
|-----|-----|-------|----|----|----|----|-----|-----|-------|----|----|----|----|
| 0   | 19  | 1     | 99 | 11 | 23 | 11 | 8   | 24  | 1     | 3A | 00 | 51 | 89 |
| 1   | 15  | 0     | -  | -  | -  | -  | 9   | 2D  | 0     | -  | -  | -  | -  |
| 2   | 1B  | 1     | 00 | 02 | 04 | 08 | A   | 2D  | 1     | 93 | 15 | DA | 3B |
| 3   | 36  | 0     | -  | -  | -  | -  | B   | 0B  | 0     | -  | -  | -  | -  |
| 4   | 32  | 1     | 43 | 6D | 8F | 09 | C   | 12  | 0     | -  | -  | -  | -  |
| 5   | 0D  | 1     | 36 | 72 | F0 | 1D | D   | 16  | 1     | 04 | 96 | 34 | 15 |
| 6   | 31  | 0     | -  | -  | -  | -  | E   | 13  | 1     | 83 | 77 | 1B | D3 |
| 7   | 16  | 1     | 11 | C2 | DF | 03 | F   | 14  | 0     | -  | -  | -  | -  |

### **Address Translation Example #1**

#### Virtual Address: 0x03D4



#### **Physical Address**
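The lookup for this example can be traced in code. The sketch below hard-codes the TLB, page table, and cache contents from the three tables above and walks one access through them. The function `mem_access` and the `result_t` type are hypothetical, and VPNs beyond the 16 entries shown are simply reported as faults because their contents are not given.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint8_t tag, ppn; bool valid; } tlb_entry_t;
typedef struct { uint8_t ppn; bool valid; } pte_t;
typedef struct { uint8_t tag; bool valid; uint8_t b[4]; } line_t;

/* TLB: 4 sets x 4 ways, copied from the TLB table. */
static const tlb_entry_t TLB[4][4] = {
    {{0x03, 0, 0},    {0x09, 0x0D, 1}, {0x00, 0, 0},    {0x07, 0x02, 1}},
    {{0x03, 0x2D, 1}, {0x02, 0, 0},    {0x04, 0, 0},    {0x0A, 0, 0}},
    {{0x02, 0, 0},    {0x08, 0, 0},    {0x06, 0, 0},    {0x03, 0, 0}},
    {{0x07, 0, 0},    {0x03, 0x0D, 1}, {0x0A, 0x34, 1}, {0x02, 0, 0}},
};

/* Page table entries for VPN 0x00-0x0F. */
static const pte_t PT[16] = {
    {0x28, 1}, {0, 0},    {0x33, 1}, {0x02, 1},
    {0, 0},    {0x16, 1}, {0, 0},    {0, 0},
    {0x13, 1}, {0x17, 1}, {0x09, 1}, {0, 0},
    {0, 0},    {0x2D, 1}, {0x11, 1}, {0x0D, 1},
};

/* Direct-mapped cache: 16 lines of 4 bytes. */
static const line_t CACHE[16] = {
    {0x19, 1, {0x99, 0x11, 0x23, 0x11}}, {0x15, 0, {0}},
    {0x1B, 1, {0x00, 0x02, 0x04, 0x08}}, {0x36, 0, {0}},
    {0x32, 1, {0x43, 0x6D, 0x8F, 0x09}}, {0x0D, 1, {0x36, 0x72, 0xF0, 0x1D}},
    {0x31, 0, {0}},                      {0x16, 1, {0x11, 0xC2, 0xDF, 0x03}},
    {0x24, 1, {0x3A, 0x00, 0x51, 0x89}}, {0x2D, 0, {0}},
    {0x2D, 1, {0x93, 0x15, 0xDA, 0x3B}}, {0x0B, 0, {0}},
    {0x12, 0, {0}},                      {0x16, 1, {0x04, 0x96, 0x34, 0x15}},
    {0x13, 1, {0x83, 0x77, 0x1B, 0xD3}}, {0x14, 0, {0}},
};

typedef struct {
    bool     tlb_hit, page_fault, cache_hit;
    uint16_t pa;     /* 12-bit physical address (valid if no page fault) */
    uint8_t  byte;   /* byte read (valid only on a cache hit) */
} result_t;

result_t mem_access(uint16_t va) {
    result_t r = {0};
    uint8_t vpn = (va >> 6) & 0xFF, vpo = va & 0x3F;  /* 8-bit VPN, 6-bit VPO */
    uint8_t tlbi = vpn & 0x3, tlbt = vpn >> 2, ppn = 0;

    for (int w = 0; w < 4; w++)                       /* search the TLB set */
        if (TLB[tlbi][w].valid && TLB[tlbi][w].tag == tlbt) {
            r.tlb_hit = true;
            ppn = TLB[tlbi][w].ppn;
        }
    if (!r.tlb_hit) {                                 /* fall back to page table */
        if (vpn >= 16 || !PT[vpn].valid) { r.page_fault = true; return r; }
        ppn = PT[vpn].ppn;
    }
    r.pa = (uint16_t)((ppn << 6) | vpo);
    uint8_t co = r.pa & 0x3, ci = (r.pa >> 2) & 0xF, ct = r.pa >> 6;
    if (CACHE[ci].valid && CACHE[ci].tag == ct) {
        r.cache_hit = true;
        r.byte = CACHE[ci].b[co];
    }
    return r;
}
```

For `0x03D4` this gives VPN `0x0F`, TLBI 3, and TLBT `0x03`, which hits in the TLB with PPN `0x0D`. The physical address is `0x354` (CT `0x0D`, CI 5, CO 0), which hits in the cache and returns byte `0x36`.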



### **Address Translation Example #2**

#### Virtual Address: 0x0020



#### **Physical Address**
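Working through the tables above by hand for this address gives the following derivation. It uses the field widths implied by the system parameters: a 6-bit VPO, an 8-bit VPN with a 2-bit TLBI, and a 2-bit CO with a 4-bit CI.

$$
\begin{aligned}
\text{VA} &= \texttt{0x0020} = 00000000\;100000_2 \\
\text{VPN} &= \texttt{0x00}, \quad \text{VPO} = \texttt{0x20} \\
\text{TLBI} &= \texttt{0x0}, \quad \text{TLBT} = \texttt{0x00} \;\Rightarrow\; \text{TLB miss (set 0, tag 00 is invalid)} \\
\text{PPN} &= \texttt{0x28} \quad \text{(page table entry for VPN 00 is valid, so no page fault)} \\
\text{PA} &= \texttt{0x28} \cdot 2^{6} + \texttt{0x20} = \texttt{0xA20} = 101000\;100000_2 \\
\text{CT} &= \texttt{0x28}, \quad \text{CI} = \texttt{0x8}, \quad \text{CO} = \texttt{0x0} \;\Rightarrow\; \text{cache miss (line 8 holds tag 24)}
\end{aligned}
$$

The data word therefore has to be fetched from memory.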



### **Address Translation Example #3**

#### Virtual Address: 0x0020



#### **Physical Address**




### **Intel Core i7 Memory System**




### **End-to-end Core i7 Address Translation**



### **Core i7 Level 1-3 Page Table Entries**

| 63 | 62-52  | 51-12                            | 11-9   | 8 | 7  | 6 | 5 | 4  | 3  | 2   | 1   | 0   |
|----|--------|----------------------------------|--------|---|----|---|---|----|----|-----|-----|-----|
| XD | Unused | Page table physical base address | Unused | G | PS |   | A | CD | WT | U/S | R/W | P=1 |

#### Each entry references a 4K child page table. Significant fields:

- **P:** Child page table present in physical memory (1) or not (0). When P=0, the remaining bits are available for the OS (e.g., the page table's location on disk).
- **R/W:** Read-only or read-write access permission for all reachable pages.
- **U/S:** User or supervisor (kernel) mode access permission for all reachable pages.
- **WT:** Write-through or write-back cache policy for the child page table.
- **A:** Reference bit (set by MMU on reads and writes, cleared by software).
- **PS:** Page size either 4 KB or 4 MB (defined for Level 1 PTEs only).
- **Page table physical base address:** 40 most significant bits of physical page table address (forces page tables to be 4KB aligned)
- **XD:** Disable or enable instruction fetches from all pages reachable from this PTE.

#### **Supplementary slide**


### **Core i7 Level 4 Page Table Entries**

| 63 | 62-52  | 51-12                      | 11-9   | 8 | 7 | 6 | 5 | 4  | 3  | 2   | 1   | 0   |
|----|--------|----------------------------|--------|---|---|---|---|----|----|-----|-----|-----|
| XD | Unused | Page physical base address | Unused | G |   | D | A | CD | WT | U/S | R/W | P=1 |

#### Each entry references a 4K child page. Significant fields:

- **P:** Child page is present in memory (1) or not (0). When P=0, the remaining bits are available for the OS (e.g., the page's location on disk).
- **R/W:** Read-only or read-write access permission for child page
- **U/S:** User or supervisor mode access
- **WT:** Write-through or write-back cache policy for this page
- **A:** Reference bit (set by MMU on reads and writes, cleared by software)
- **D:** Dirty bit (set by MMU on writes, cleared by software)
- **Page physical base address:** 40 most significant bits of physical page address (forces pages to be 4KB aligned)
- **XD:** Disable or enable instruction fetches from this page.
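The bit positions in the Level 4 layout can be written as masks, which is roughly how kernel code inspects an entry. The macro and function names below are hypothetical; the bit numbers are the ones shown in the table (P at bit 0, R/W at 1, U/S at 2, WT at 3, CD at 4, A at 5, D at 6, G at 8, base address in bits 51-12, XD at 63).

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P   (UINT64_C(1) << 0)    /* present */
#define PTE_RW  (UINT64_C(1) << 1)    /* 1 = writable */
#define PTE_US  (UINT64_C(1) << 2)    /* 1 = user-mode access allowed */
#define PTE_WT  (UINT64_C(1) << 3)    /* write-through caching */
#define PTE_CD  (UINT64_C(1) << 4)    /* caching disabled */
#define PTE_A   (UINT64_C(1) << 5)    /* referenced */
#define PTE_D   (UINT64_C(1) << 6)    /* dirty */
#define PTE_G   (UINT64_C(1) << 8)    /* global */
#define PTE_XD  (UINT64_C(1) << 63)   /* 1 = instruction fetch disabled */
#define PTE_ADDR_MASK UINT64_C(0x000FFFFFFFFFF000)   /* bits 51..12 */

/* Physical base address of the 4 KB page this entry maps. */
uint64_t pte_page_base(uint64_t pte) {
    return pte & PTE_ADDR_MASK;
}

/* True if a user-mode store to the page is permitted. */
bool pte_user_writable(uint64_t pte) {
    return (pte & PTE_P) && (pte & PTE_RW) && (pte & PTE_US);
}

/* True if instructions may be fetched from the page. */
bool pte_executable(uint64_t pte) {
    return (pte & PTE_P) && !(pte & PTE_XD);
}
```

Because the low 12 bits of the entry hold flags rather than address bits, masking them off always yields a 4 KB aligned address, which is the alignment constraint noted above.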

#### **Supplementary slide**


### **Core i7 Page Table Translation**



### **Cute Trick for Speeding Up L1 Access**



#### Observation

- Bits that determine CI identical in virtual and physical address
- Can index into cache while address translation taking place
- Generally we hit in TLB, so PPN bits (CT bits) available next
- "Virtually indexed, physically tagged"
- Cache carefully sized to make this possible
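The sizing constraint can be checked numerically: the cache index and block offset bits together must fit within the page offset, which is not changed by translation. In the sketch below the function names are hypothetical, and the example figures used afterwards (32 KB, 8-way, 64-byte blocks, 4 KB pages) are an assumption about the Core i7 L1 data cache rather than something stated in the slide text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Integer log2 for powers of two. */
static unsigned log2u(uint64_t x) {
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

/* True if CI and CO bits lie entirely within the page offset, so the
 * cache set can be selected before address translation completes. */
bool can_index_before_translation(uint64_t cache_bytes, unsigned ways,
                                  unsigned block_bytes, uint64_t page_bytes) {
    uint64_t sets = cache_bytes / ((uint64_t)ways * block_bytes);
    unsigned ci_bits = log2u(sets);
    unsigned co_bits = log2u(block_bytes);
    return ci_bits + co_bits <= log2u(page_bytes);
}
```

With the assumed parameters there are 64 sets, so CI and CO take 6 bits each and exactly fill the 12-bit page offset. Doubling the cache size at the same associativity would add a seventh index bit and break the trick.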

### **Virtual Address Space of a Linux Process**



## Linux Organizes VM as Collection of "Areas"



#### Australian National University

### **Linux Page Fault Handling**




### **Memory Mapping**

- VM areas initialized by associating them with disk objects.
  - Process is known as *memory mapping*.
- Area can be *backed by* (i.e., get its initial values from):
  - Regular file on disk (e.g., an executable object file)
    - Initial page bytes come from a section of a file
  - Anonymous file (e.g., nothing)
    - First fault will allocate a physical page full of 0's (*demand-zero page*)
    - Once the page is written to (*dirtied*), it is like any other page
- Dirty pages are copied back and forth between memory and a special swap file.
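Demand-zero pages can be observed from user code by mapping an anonymous area. This is a minimal sketch assuming a Linux-like system where `MAP_ANONYMOUS` is available; `map_zero_pages` is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Maps len bytes of private anonymous memory.
 * Returns NULL on failure. No physical page is allocated until
 * the first access to each page faults it in. */
unsigned char *map_zero_pages(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Every byte of the returned area reads as 0 on first touch. Once a page has been written to, it is dirty and handled like any other page. The area should be released with `munmap` when it is no longer needed.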


### **Sharing Revisited: Shared Objects**



 Process 1 maps the shared object.

- Process 2 maps the shared object.
- Notice how the virtual addresses can be different.
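Sharing can be demonstrated with two processes that map the same physical page. The sketch below uses an anonymous `MAP_SHARED` area inherited across `fork` rather than a file-backed object, so unlike the figure both processes see it at the same virtual address; that simplification, and the name `shared_roundtrip`, are assumptions of this example.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes `value` into a shared page; the parent reads it back.
 * Returns the value the parent observes, or -1 on error. */
int shared_roundtrip(int value) {
    int *cell = mmap(NULL, sizeof *cell, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (cell == MAP_FAILED) return -1;
    *cell = 0;

    pid_t pid = fork();
    if (pid < 0) { munmap(cell, sizeof *cell); return -1; }
    if (pid == 0) {               /* child: write to the shared page */
        *cell = value;
        _exit(0);
    }
    waitpid(pid, NULL, 0);        /* parent: wait, then read */
    int seen = *cell;
    munmap(cell, sizeof *cell);
    return seen;
}
```

Because both page tables point at the same physical page, the parent sees the child's write.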

# Sharing Revisited: Private Copy-on-write (COW) Objects



- Two processes mapping a *private copy-on-write (COW)* object.
- Area flagged as private copy-on-write.
- PTEs in private areas are flagged as read-only.
- Instruction writing to private page triggers protection fault.
- Handler creates new R/W page.
- Instruction restarts upon handler return.
- Copying deferred as long as possible!
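The effect of copy-on-write is visible from user code: after `fork`, a write by one process to a private page does not change what the other process sees. This sketch assumes a Linux-like system; the function name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child overwrites a private page; returns the value the parent
 * still sees afterwards, or -1 on error. */
int parent_view_after_child_write(int initial, int child_value) {
    int *cell = mmap(NULL, sizeof *cell, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (cell == MAP_FAILED) return -1;
    *cell = initial;

    pid_t pid = fork();
    if (pid < 0) { munmap(cell, sizeof *cell); return -1; }
    if (pid == 0) {
        *cell = child_value;      /* write fault: child gets its own copy */
        _exit(*cell == child_value ? 0 : 1);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    int seen = *cell;             /* parent's page is untouched */
    munmap(cell, sizeof *cell);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    return seen;
}
```

The child observes its own write while the parent still reads `initial`, which shows that the page was copied only when the write occurred.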

### **User-Level Memory Mapping**

`void *mmap(void *start, size_t len, int prot, int flags, int fd, off_t offset)`

Maps `len` bytes starting at offset `offset` of the file specified by file descriptor `fd`, preferably at address `start`.

- `start`: may be 0 for "pick an address"
- `prot`: PROT\_READ, PROT\_WRITE, ...
- `flags`: MAP\_ANON, MAP\_PRIVATE, MAP\_SHARED, ...

Returns a pointer to the start of the mapped area (may not be `start`).




# Example: Using mmap to Copy Files

Copying a file to stdout without transferring data to user space.

```c
/* mmapcopy.c */
#include "csapp.h"

/* Write size bytes of the file open on fd to stdout via a private mapping */
void mmapcopy(int fd, int size)
{
    /* Ptr to memory mapped area */
    char *bufp;

    bufp = Mmap(NULL, size,
                PROT_READ,
                MAP_PRIVATE,
                fd, 0);
    Write(1, bufp, size);
    return;
}
```

```c
/* mmapcopy driver (continuation of mmapcopy.c) */
int main(int argc, char **argv)
{
    struct stat stat;
    int fd;

    /* Check for required cmd line arg */
    if (argc != 2) {
        printf("usage: %s <filename>\n",
               argv[0]);
        exit(0);
    }

    /* Copy input file to stdout */
    fd = Open(argv[1], O_RDONLY, 0);
    Fstat(fd, &stat);
    mmapcopy(fd, stat.st_size);
    exit(0);
}
```