reveal.js

# Instruction Execution

---

CS 130 // 2025-11-05

## Assignment 6 Reminder

- [Assignment 6](../../assignments/assignment-6/): do these too 
    - `and`, `or`, `xor`, `andi`, `ori`, `xori`, `slt`, `slti`
    - Extra Credit: do `lw`, `sw`, `beq`, and/or `bne`
    - Or... design another instruction set and archtecture of your choosing

# Datapath and Control Review

## Example: `addi`

```mips 
addi $8, $0, 5
```
001000 00000 01000 0000000000000101

<div>

</div>
<div>

Using this picture, highlight all datapath lines that are used and indicate what values they should have

Show the value each control line should have

</div>
</div>

```mips 
addi $8, $0, 5
```
001000 00000 01000 0000000000000101

<div>

</div>
<div>

ALUop:</br>
ALUSrc:</br>
Branch:</br>
MemRead:</br>
MemWrite:</br>
MemtoReg:</br>
RegWrite:</br>
RegDst:

</div>
</div>

# Supporting More Instructions: Load

```mips 
lw $8, 4($9)
```

100011 01001 01000 0000000000000100

<div>

</div>
<div>

ALUop:</br>
ALUSrc:</br>
Branch:</br>
MemRead:</br>
MemWrite:</br>
MemtoReg:</br>
RegWrite:</br>
RegDst:

</div>
</div>

# Exercises: Store and Branch

```mips 
sw $8, 4($9)
```

101011 01001 01000 0000000000000100

<div>

</div>
<div>

ALUop:</br>
ALUSrc:</br>
Branch:</br>
MemRead:</br>
MemWrite:</br>
MemtoReg:</br>
RegWrite:</br>
RegDst:

</div>
</div>

```mips 
beq $8, $9, 5 #jump ahead 5 instructions
```

000100 01000 01001 0000000000000101

<div>

</div>
<div>

ALUop:</br>
ALUSrc:</br>
Branch:</br>
MemRead:</br>
MemWrite:</br>
MemtoReg:</br>
RegWrite:</br>
RegDst:

</div>
</div>

# Performance Issues

## Performance Issues
- Longest delay determines clock period
- 
 Some stages of the datapath are idle waiting for others to finish
- 
 Can improve performance by **pipelining**

# Pipelining

#### Breaking down instruction execution

- Five stages:
    1. **IF**: Instruction Fetch 
     - read it from instruction memory
    2. **ID**: Instruction Decode and Register Read
     - split instruction into parts, read register data
    3. **EX**: Execute
     - ALU calculates result 
    4. **MEM**: Memory Access
     - read from or write to memory
    5. **WB**: Write back
     - put new data back into a register

## Pipeline Analogy
- Suppose you need to do four loads of laundry
- Each load of laundry needs to be
    1. Washed via the washing machine
    2. Dried via the dryer
    3. Folded
    4. Put away in the closet
- For simplicity, assume that each task takes 30 mins

## Pipeline Analogy
- How long does it take to complete four loads?
- 
 One approach uses only one stage at a time and does nothing in parallel:
    ![nonpipelined_laundry](/Fall2025/CS130/assets/images/COD/unpipelined_laundry.png)
- 
 Notice that the washer is unused 3/4 of the time

## Pipeline Analogy
- Another approach is harnessing parallelism by running independent stages simultaneously
    ![pipelined_laundry](/Fall2025/CS130/assets/images/COD/pipelined_laundry.png)
- 
 How much of a speedup does this approach give us?
    + 
 $8/3.5 = 2.3\times$ speedup
    + 
 $2n/0.5n = 4\times$ speedup if running continuously

## Pipelined Datapath

## Pipelined Datapath
- Five stages:
    1. **IF**: Instruction Fetch
    2. **ID**: Instruction Decode
    3. **EX**: Execute
    4. **MEM**: Memory access
    5. **WB**: Write back

## Pipeline Performance
- Assume time for stages is:
    + `$100\text{ps}$` for register read/write
    + `$200\text{ps}$` for other stages

![Pipeline Performance](/Fall2025/CS130/assets/images/COD/pipeline_performance_table.png)

## Without a Pipeline
![Pipeline Performance](/Fall2025/CS130/assets/images/COD/nonpipelined_mips_instructions.png)

- Why must the clock be set to `$800\text{ps}$` when some instructions like `beq` could be completed in `$500\text{ps}$`?
    + 
 Clock speed is limited by **slowest** instruction: `lw`

## With a Pipeline
![Pipeline Performance](/Fall2025/CS130/assets/images/COD/pipelined_mips_instructions.png)

- 
 How much of a speedup does this approach give us?
    + 
 `$2400/1400 = 1.7\times$` speedup
    + 
 `$800n/200n = 4\times$` if running continuously

## Pipeline Performance
- Does using a pipeline increase the efficiency of executing **individual** instructions?
    + 
 No, it slows them down from `$800\text{ps}$` to `$1000\text{ps}$`
    + 
 Performance benefits come from increased **throughput** due to the parallelism

## Why MIPS is Good for Pipelining
- All MIPS instructions are the **same length**
    + Easy to fetch instruction in cycle 1
    + Easy to decode instruction in cycle 2
- 
 MIPS has only **a few instruction formats**
    + Registers will always be in same location
    + Easy to decode instructions