QuAC Extensions

Given here (in roughly increasing order of difficulty) is a list of potential extensions you may want to consider for QuAC. Note that any extensions must be backwards compatible with the base ISA (you can’t redefine instructions that are already defined, for example). You may implement several extensions if you like (note that some extensions will be incompatible with others; whereas some extensions will require others to be completed first). Not all the extensions merely add extra instructions to the computer, but some fundamentally change architectural assumptions about the current design.

We are seeking quality over quantity, especially with respect to the design document. It is far preferable to have one extension that is implemented cleanly and documented well, than many rushed extensions that are hard to follow.

The following extensions have been grouped into a rough idea of grade ranges based on the difficulty of the extension and assuming a good quality report and circuit design.
However, these are just provided as a guide and as mentioned, other aspects such as the quality of the report, quality of the circuits, etc. also factor in to the grade. This means that just because you have successfully implemented one of the extensions in grade range X, does not mean that you are guaranteed a grade in range X.

Pass (50 - 59%)#

ALU Instructions#

Add additional ALU operations that are not included in the base ISA. Some ideas for new operations include:

orr Logical OR
not Invert all bits of the register
inc/dec Increment/decrement
lsl/lsr Logical shift left/right by 1 bit
asr Arithmetic shift right by 1 bit
swap Swap the high and low bytes of a register

Also consider how your new instruction(s) might (or might not) affect the flags register.

Sign Extension Instructions#

The movl instruction sets the low byte to a value, and clears the high byte to zero. This is great normally, but makes setting a register to negative value a little annoying.

A movn (‘move negative’) instruction could be added that moves an 8 bit immediate to the low byte, and sign extends the high byte, i.e. sets bits 15-8 to the contents of bit 7 (we use zero-based indexing, so the least significant bit is bit 0). Another idea is to have a sext instruction which sign extends the low byte of a register (overwriting what was in the high byte).

See below for some examples:

movl r1, -4          @ r1 has the value `0x00FC`
sext r1              @ r1 now has the value `0xFFFC`, i.e. -4

movn r2, 4           @ r2 has the value `0x0004`
movn r3, -4          @ r3 has the value `0xFFFC`, i.e. -4

Credit (60 - 69%)#

ALU Immediates#

It’s frustrating to have to use up an extra register increment a different register. Allowing operations with immediates would allow add r2, 1 directly, without having to first do movl r1, 1 or similar.

Implementing this would allow us perform ALU operations with a register and constant (e.g. and r1, 0x3F, sub r2, 15), and also allow for PC relative jumps (i.e. add pc, 15).

This extension would also synergise well with base+offset addressing, e.g. something like ldr/str rd, [rb, 1] would make it easy to access different parts of a data structure stored at the address in rb.

Memory Addressing Immediates#

Similarly to ALU immediates, could add the ability to store or load directly from a specified memory location without first needing to load the address into another register.

e.g. ldr rd, 0x12 instead of needing to do movl r1, 0x12; ldr rd, r1

This means you could easily load constants and variables into registers without first needing to load their address. This is especially useful given the limited number of general purpose registers available.

Conditional Execution#

As written, the ISA only conditionally executes on the Zero (Z) flag. You could extend this functionality such that the processor can execute on any arbitrary condition, such as the negative flag, or if one register was greater than another register, and so on.

For example, how could you extend the ISA to make it easy to execute the following code?

if x > y:
    z = x
else:
    z = y

Base+Offset Addressing#

The other register slot in the encoding for load and store is unused. This could be extended to use a register as the base address, and either an immediate or another register as the offset address, allowing instructions like ldr/str rd, [ra,rb] which would have the effect of rd := [ra+rb] or [ra+rb] := rd as appropriate.

Distinction (70 - 79%)#

Carry Lookahead Adder#

Add the capability of a carry lookahead adder into the ALU. This should be implemented as a custom component that replaces the ordinary adder in your ALU. Justify the speed up in exchange for power/efficiency in your report, which should include a critical path analysis comparing to the ripple carry adder.

Bit Shifts#

Add instructions for shifting a register by a variable number of bits, determined by another register (e.g. lsl r1, r2). If you’ve done ALU immediates, you may also want to allow shifts by immediates (e.g. lsl r1, 7).

There are many kinds of shifts that could be done (e.g. logical vs arithmetic, shift vs. rotate, shifting in zeroes vs. shifting in the carry bit, etc.).

Restricts the use of:

Barrel Shifter

Pre/Post Offset Memory Operations#

When loading out values from an array of data, it’s often useful to have the ability to load from an address, and then increment that address afterwards to move to the next piece of data. This can be done with a post-offset memory access:

ldr/str rd, [ra], rb

which is equivalent to

ldr/str rd, [ra]
add ra, rb

It is also useful to reverse the order of operations and update the value of the address first and then store/load to memory. This is a pre-offset memory access.

ldr/str rd, [ra, rb]!

which is equivalent to

add ra, rb
ldr/str rd, [ra]

The difference here is that a pre/post offset takes only a single instruction cycle, whereas the two instructions it represents take two, which speeds up execution of code.

To complete this extension, you should first implement a dual-ported register file. You will need to change your register file such that it has two inputs, allowing two registers to be written to in the same cycle. If the same register is chosen for both destinations, then one takes precedence.

If you have completed this extension, and immediates, then you are well on the way to implementing Function Calls and the Stack!

Expanded Register File#

The ISA as written only has 4 general purpose registers. This is fairly restrictive, and quite small as far as most CPUs go. Using reserved bits, could you add additional registers without fundamentally changing the instruction format to maintain backwards compatibility?

(Note: Only adding 1 register would not be sufficient for D range, on the other hand, significantly expanded register options could move to HD range).

Some ideas/hints:

restricting destination registers to gain more bits for source registers (or vice versa)
add register shuffling for extra registers that may not be able to be used in all instructions

Another approach is to have different banks (sets) of registers, and add instruction to switch the current bank. e.g.:

movl r1, 0x12       @ r1 contains 0x12
usebank 2           @ r1's contents are undefined
movl r1, 0x34       @ r1 contains 0x34
usebank 1           @ r1 contains 0x12 again
usebank 2           @ r1 contains 0x34 again

The real-life Z80 CPU had a similar system of shadow registers.

Accessing More RAM#

Modern computers have far more than 64K words of RAM.

One way of allowing your CPU to access more RAM would be to introduce segment registers, whose value gets left shifted by some constant (e.g. by 16 bits) and then implicitly added to all memory accesses to form a 24 or 32 bit address.

Here’s an example:

movl rseg, 0
movl r1, 0x3456
ldr r2, r1            @ this accesses memory address 0x003456

movl rseg, 0x12
ldr r2, r1            @ this now accesses memory address 0x123456

Special care will be needed when changing segment registers, as it will change which instruction the CPU executes next! Some ways of fixing this might be:

have instruction fetches ignore the segment register (i.e. code must be located within the first 64K words)
have a bit in the instruction to determine whether a segment register is used or not
have a seperate segment registers for instruction fetching than for str/ldr
the segment register is only used on certain addresses (e.g. only those with bit 15 set)

If you implement the harder wider registers extension then accessing more RAM would be trivial.

Single Instruction, Multiple Data (SIMD)#

SIMD refers to instructions that can perform the same operation (e.g. addition, multiplication, logical/memory operations) on multiple data points simultaneously. This provides a form of parallelism, where multiple computations can be made in a single cycle without needing multiple CPUs. SIMD instructions can be thought of as operations that act on vectors, modifying each component of a vector simultaneously. In the real world, SIMD is used all the time to speed up lengthy mathematical operations (especially linear algebra computations, thanks to the similarity to vectors).

There are various ways to implement SIMD, although typically it means adding a new set of instructions to perform vectorised operations. These instructions usually operate on single registers by partitioning them into a number of smaller sections and performing its operation on each section individually (as if it was a vector consisting of smaller registers). For example, a 32-bit register might be treated as four 8-bit registers by a SIMD instruction, where each smaller 8-bit register is operated on at the same time.

The SIMD operations could use the existing 16-bit registers as their vectors, but this is a fairly restrictive size for the registers - it may be worth thinking about using wider registers for carrying out SIMD operations.

There are a variety of ways to implement this extension, and depending on the complexity of your implementation, this extension could definitely be worth a HD.

High Distinction (80% - 100%)#

Some of the extensions listed from this point will cause the part-1 tests to fail (which is okay). Ensure that you are leaving your standard CPU in basic-cpu.dig and are completing your extension in extended-cpu.dig.

Multiplication#

Add a multiplication instruction to the ALU (do not use the built in component for this). Doing some long multiplication (by hand!) of various numbers in base 2 can be a useful starting point for how such a circuit can be designed.

Some things to consider:

Will your multiply circuit handle signed (negative) numbers?
Multiplying two 16 bit values gives a 32 bit result. Are you going to truncate the result, or write the high 16 bits to another register?
Will it execute in one cycle, or take multiple cycles? If it were implemented on a real CPU, how might this affect the clock speed?

A basic, naive multiplier is unlikely to be enough for an HD - but a design taking some of these points into account would move it into the HD range.

Restricts the use of:

Multiply

Division#

Similarly, a division instruction could be added (do not use the built in component for this). Division is even more complex than multiplication, and it has many of the same things to consider.

Other things to consider are what you do with the remainder, and divison by zero is handled. (Will it produce unreliable results? Will it lock up? Will it set a flag to indicate an error?)

Again, doing short/long division in binary can be a useful starting point for thinking about how to design this.

Restricts the use of:

Divide

Hardware Acceleration#

In a similar vein to multipliers and dividers, hardware accelerators are separate dedicated processing units that use hardware to perform a complicated computation in a single cycle, whereas it would traditionally take many simpler instructions to perform. This doesn’t mean doing an entire algorithm in one cycle, but it does mean speeding up a collection of computations in an algorithm that would normally take many more cycles to perform. This is a very open-ended extension - the difficulty of this task will vary depending on what you decide to implement, and your mark/grade will reflect that. Some suggestions are:

A Fibonacci calculator that can calculate the next (or even an arbitrary) element in the Fibonacci sequence in one cycle
An encryption unit that can perform a simple encryption operation in a single cycle

However, feel free to think about what other kinds of computations might be interesting and/or useful to accelerate using hardware!

Function Calls and the Stack#

Requires: Pre/Post Offset Memory Operations

Functions are a very common feature of programming languages, allowing you to write a piece of code within a function and then use it over and over again by calling the function instead of copypasting the code everywhere. This decreases the size of programs greatly.

To implement function calls in assembly, you need to jump to where the function’s code starts upon a call, and also have a mechanism that allows jumping back to where you were in the main code after the function is done. For nested function calls, you will need to keep track of multiple addresses to jump back to.

This is where the “stack” data structure facilitated by most modern ISAs is useful. This is a dynamically sized data structure in memory that acts as an array that grows downwards as you “push” (append) and “pop” (remove) the elements. The ISA typically defines a fixed base address that the stack starts at and a “stack pointer” register that keeps track of the address where the stack ends, i.e. of the last element of the stack (typical mmemonic is sp).

Pre/post offset memory operations are very useful for implementing the stack, as performing a push or pop instruction you need to both update the value of the stack pointer and store or load (for push and pop respectively) in the same instruction.

Your main task for this extension is to add the stack and stack pointer register sp. Once a stack has been added, function calls are now also possible and can be implemented with composite instructions. call¹ acts like a jump (overwriting the pc) that also stores pc+1 onto the stack as a return address. Another instruction ret¹ to return from functions would then just pop the return address into the pc register. You can implement these as psuedo-instructions in quac.json or as a hardware instruction that does the two steps at once.

push and pop, for pushing and popping data to/from the stack can also be defined as pseudo-instructions. Note that while this section has mostly described storing return addresses on the stack it can be used to store other data in a first-in-first out order as well.

Textbook Chapter: 6

Wider Registers#

Extend the ALU, and some registers from 16 bits wide to 32 bits wide. The extension needs to remain backwards compatible with the original CPU (e.g. by default, a register with 0xFFFF in it must wrap around to zero when incremented).

Some potential ways of doing this:

Introduce new registers which are able to store 32 bit results. The other registers will remain at 16 bit (and therefore will truncate larger values that are passed to them).
Make all registers 32 bits wide, but introduce a new flag in the fl register. When the flag is clear, results are truncated to 16 bits before being placed in a register. If the flag is set, then the registers will store the full 32 bits.
Use one of the unused bits in the instruction encoding to determine if the register(s) will use all 32 bits, or just 16 bits.

You could also extend memory addressing to 32 bits, to allow the CPU to access 2^32 words (8GB).

I/O#

The computer doesn’t have anything to talk to. You could add some built in components in Digital (a keyboard, a screen) and get the CPU to talk to those devices in a sensible fashion.

This can be done by adding dedicated I/O instructions, or adding memory mappings, which means mapping some addresses in memory to state registers for your I/O devices so that writing to those addresses will send that data to the I/O device. For instance, say memory address 0x4000 corresponds to the OFF/ON state of an LED, so writing a 1 to that address will turn on the LED. Doing memory mapped I/O requires adding an extra logic unit that chooses whether to send data to RAM or to your I/O devices depending on the address, which is called an I/O bridge.

You will learn more about memory-mapped I/O in the microbit part of the course!

Simply displaying some part of the CPU state (e.g. the program counter, or a register) via a segmented display is not sufficient for this extension.

Digital has some built in peripheral components you could use (under Components > IO > Peripherals). With these components, you are some ideas you could try:

Reading keyboard input from the user
Outputing audio via MIDI
Displaying text via a terminal
Displaying text via VGA (very challenging)
Remote communication using telnet (very challenging)

Multi-Cycle Execution#

Up to this point the labs and the course have implicitly been leading you on a path to a single-cycle computer, one where each instruction is executed in a single clock cycle. This is unrealistic, as most real CPUs can often take a varying number of clock cycles depending on complexity. This allows splitting up the amount of logic that has to occur within a single clock cycle for long instructions and keep the clock period small, and hence single-cycle instructions will occur quickly.

With multi-cycle execution you could have instructions that take a 16-bit immediate, which is useful for jumping to a 16 bit absolute address.

Textbook Chapter: 7.4

Intermediate Registers#

Requires: Multi-Cycle Execution

Many complex instructions may take more than one cycle to execute. You could add intermediate registers that are only accessible by multi-cycle instructions to use as temporary scratch space without clobbering other registers. For example, call rd may be implemented as

add tmp, pc, 1
push tmp
mov pc, rd

Extreme Ideas#

We caution against the following extensions, and recommend them for only the bravest, most adventurous students. Many of these require a fundamental redesign of the entire computer, and/or require research into topics we don’t cover. These extensions are not required for a HD mark, but are only included here for those that are interested.

Stack Instructions#

Requires: Function Calls and the Stack

You could define instructions that take zero operands, and operate directly onto the data on the stack e.g. addst or subst could pop the first two items off the stack, add/subtract them, and push the result back to the stack. This could be implemented with either the understanding that general purpose registers may be clobbered in the process, or by adding extra temporary registers that only stack instructions may use.

Interrupts#

Requires: Pre/Post Offset Memory Operations, I/O

The CPU can respond to urgent tasks (handling a keyboard press) by interrupting the current task it is working on, save its state somewhere, handle the urgent task, and then return to what it was doing afterwards. This may require introducing non-synchronous memory units.

Complex Hardware Instructions#

Add instructions that allow the CPU to calculate square roots, exponents, trig functions, etc, in hardware rather than writing a software algorithm to compute it.

Floating Point Unit#

Implement a seperate ALU able to perform floating point arithmetic (called an FPU), and implement instructions for converting between values in integer format, and those in floating point format.

Either create new instructions for the FPU, or introduce new floating point registers that use the FPU instead of the ALU when operated on.

Pipelining#

Requires: Multi-Cycle Execution

As the computer currently stands, if we were to build it in hardware, it might be quite slow, as the computer waits until the entire instruction has been executed before trying to fetch the next one. A trick to speeding up a computer is called pipelining, where while the computer is executing the current instruction, it is also busy fetching the next instruction simultaneously. This can induce more problems, called hazards where executing a jump can mean the instruction fetched is the wrong one, or executing instructions that depend on the result from the previous instruction.

Textbook Chapter: 7.5

Multi-Core#

The entire CPU can be duplicated, with two machines simultaneously reading and writing to the same memory. Semaphores are implemented to prevent memory from being clobbered by simultaneous writes. Likely requires knowledge from COMP2310 to complete.

Dual-Issue CPU#

Another way to speed up CPU execution is via a 2-issue CPU, which always executes the next two instructions in an instruction sequence simultaneously, rather than executing one after another. This is different to multi-core, as a single CPU would be performing both these instructions in parallel. There will be many edge cases to consider (for example, what happens when two simultaneous str instructions try to store data at the same memory location?) that need to be handled either in hardware or in software (by mandating a particular policy as to how to order instructions within a program).

You may call these instructions something else if you wish. ↩ ↩²