reveal.js

# Pipelining

---

CS 130 // 2025-11-10

## Preparing for C programming

Installing what you need on your own computer</br>
[install guide](../../resources/installing-visual-studio-code)

Set up an account on GitHub and use Codespaces:</br>  
https://github.com/codespaces

## Suggestions on using the RAM unit in your assignment

* Use 32 bit data width

* Use 16 bit address width, but only use some of the bits coming out of the ALU

</div>

# Review

```mips 
beq $8, $9, 5 #jump ahead 5 instructions
```

000100 01000 01001 0000000000000101

<div>

</div>
<div>

ALUop: "sub"</br>
ALUSrc: 0</br>
Branch: 1</br>
MemRead: 0</br>
MemWrite: 0</br>
MemtoReg: x</br>
RegWrite: 0</br>
RegDst: x

</div>
</div>

# Performance Issues

## Performance Issues
- Longest delay determines clock period
- 
 Some stages of the datapath are idle waiting for others to finish
- 
 Can improve performance by **pipelining**

# Pipelining

#### Breaking down instruction execution

- Five stages:
    1. **IF**: Instruction Fetch 
     - read it from instruction memory
    2. **ID**: Instruction Decode and Register Read
     - split instruction into parts, read register data
    3. **EX**: Execute
     - ALU calculates result 
    4. **MEM**: Memory Access
     - read from or write to memory
    5. **WB**: Write back
     - put new data back into a register

## Pipeline Analogy
- Suppose you need to do four loads of laundry
- 
 Each load of laundry needs to be
    1. Washed via the washing machine
    2. Dried via the dryer
    3. Folded
    4. Put away in the closet
- 
 For simplicity, assume that each task takes 30 mins

## Pipeline Analogy
- How long does it take to complete four loads?
- 
 One approach uses only one stage at a time and does nothing in parallel:
    ![nonpipelined_laundry](/Fall2025/CS130/assets/images/COD/unpipelined_laundry.png)
- 
 Notice that the washer is unused 3/4 of the time

## Pipeline Analogy
- Another approach is harnessing parallelism by running independent stages simultaneously
    ![pipelined_laundry](/Fall2025/CS130/assets/images/COD/pipelined_laundry.png)
- 
 How much of a speedup does this approach give us?
    + 
 $8/3.5 = 2.3\times$ speedup
    + 
 $2n/0.5n = 4\times$ speedup if running continuously

## Pipelined Datapath

## Pipelined Datapath
- Five stages:
    1. **IF**: Instruction Fetch
    2. **ID**: Instruction Decode
    3. **EX**: Execute
    4. **MEM**: Memory access
    5. **WB**: Write back

## Pipeline Performance
- Assume time for stages is:
    + `$100\text{ps}$` for register read/write
    + `$200\text{ps}$` for other stages

![Pipeline Performance](/Fall2025/CS130/assets/images/COD/pipeline_performance_table.png)

## Without a Pipeline
![Pipeline Performance](/Fall2025/CS130/assets/images/COD/nonpipelined_mips_instructions.png)

- Why must the clock be set to `$800\text{ps}$` when some instructions like `beq` could be completed in `$500\text{ps}$`?
    + 
 Clock speed is limited by **slowest** instruction: `lw`

## With a Pipeline
![Pipeline Performance](/Fall2025/CS130/assets/images/COD/pipelined_mips_instructions.png)

- 
 How much of a speedup does this approach give us?
    + 
 `$2400/1400 = 1.7\times$` speedup
    + 
 `$800n/200n = 4\times$` if running continuously

## Pipeline Performance
- Does using a pipeline increase the efficiency of executing **individual** instructions?
    + 
 No, it slows them down from `$800\text{ps}$` to `$1000\text{ps}$`
    + 
 Performance benefits come from increased **throughput** due to the parallelism

## Why MIPS is Good for Pipelining
- All MIPS instructions are the **same length**
    + Easy to fetch instruction in cycle 1
    + Easy to decode instruction in cycle 2
- 
 MIPS has only **a few instruction formats**
    + Registers will always be in same location
    + Easy to decode instructions

# Hazards

## Hazards
- Up until now, we have pretended that each instruction is **independent** of the others and that there are no conflicts
- 
 In reality, instructions often depends on previous ones, which may cause naive pipelining to fail

### Exercise

```mips
add $s0, $t0, $t1
sub $t2, $s0, $t3
```
- What stage does `add` write the result of `$s0` into the register file?

- What stage does `sub` read from `$s0`?

- Why is this a problem?

- Can you think of any ways to fix this?

## Types of Hazards
- Situations that prevent immediately executing the next instruction in the pipeline are called **hazards**
- 
 Three types of hazards:
    1. Structure hazards
    2. Data hazards
    3. Control hazards

# Structure Hazards

## Structure Hazards
- In our laundry example, we were assuming the "fold" and "put away" stages could be done simultaneously
    + However, if you are working by yourself, you would have to do them sequentially
- 
 A **structure hazard** occurs when a required resource is busy performing another task

## Structure Hazards
- Recall that our MIPS pipeline has five stages:
    + IF, ID, EX, MEM, WB
- 
 The ID and WB stages read and write to the register file simultaneously
    + Does this create a structure hazard?
    + 
 No, by design, the register file supports simultaneous reading and writing

## Structure Hazards
- Suppose that we stored instruction memory and data memory in the same location
    + Would this create a structure hazard?
    + 
 Yes, the IF and the MEM stages would need to simultaneously read from the same memory
    + 
 If using a single memory, the pipeline would need to **stall** to wait for the resource to become available

# Data Hazards

## Data Hazards
- A **data hazard** occurs when one instruction depends on the result of a previous instruction
- 
 Our example:
    ```mips
    add $s0, $t0, $t1
    sub $t2, $s0, $t3
    ```
- 
 What stage does `add` write the result of `$s0` into the register file?
    + 
 Stage 5: Write Back
- 
 What stage does `sub` read from `$s0`?
    + 
 Stage 2: Instruction Decode

## Data Hazards
- Would need to stall for two clock cycles in order to wait for the value $s0 to be available for reading

![Data Hazard Stall](/Fall2025/CS130/assets/images/COD/data_hazard_stall.png)

## Forwarding (aka Bypassing)
- One way to avoid some data hazards without stalling is to **forward** the result to the next instruction immediately when it is available

![Data Hazard Forwarding](/Fall2025/CS130/assets/images/COD/data_hazard_forwarding.png)

### Discuss: What does this look like on the datapath?

<div>

![Data Hazard Forwarding](/Fall2025/CS130/assets/images/COD/data_hazard_forwarding.png)

</div>
<div>

</div>
</div>

## Forwarding (aka Bypassing)
- Can you think of a situation where forwarding cannot resolve a data hazard?
    ```mips
    lw $s0, 20($t1)
    sub $t2, $s0, $t3
    ```
- 
 What stage does `lw` produce the bits of `$s0`?
    + 
 After Stage 4: MEM

## Forwarding (aka Bypassing)

- Why does this create an unavoidable stall?
    ![Data Hazard Load Stall](/Fall2025/CS130/assets/images/COD/data_hazard_load.png)

- 
 We cannot send data **backwards in time**

## Data Hazard Exercise
- Consider the following MIPS code:
    ```mips
    lw $t0, 40($a3)
    add $t6, $t0, $t2
    sw $t6, 40($a3)
    ```
- 
 Assuming there is no forwarding implemented, are any stalls necessary?
- 
 How many clock cycles are required to execute these three lines of code without forwarding?

## Data Hazard Exercise
- Consider the following MIPS code:
    ```mips
    lw $t0, 40($a3)
    add $t6, $t0, $t2
    sw $t6, 40($a3)
    ```
- Assuming there IS forwarding implemented, are any stalls necessary?
- 
 How many clock cycles are required to execute these three lines of code with forwarding?

## Rearranging Instructions
- Another way to avoid data hazards is by rearranging instructions
- Consider the following MIPS code:
    ```mips
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
    ```
- 
 Identify any stalls that are necessary even with forwarding and hazard detection active

## Rearranging Instructions
- These two stalls could be avoided by rearranging the code in the following way:
    ```mips
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
    ```

# Control Hazards

## Control Hazards
- A **control hazard** (aka branching hazard) is when the next instruction to be executed is not yet known
- 
 Caused by **branching** instructions such as `beq`
- 
 During a `beq` instruction, at what pipeline stage do we know which branch will be taken?
    + 
 After Stage 3: EX

## Control Hazards
- One way to avoid control hazards is by stalling
    ![Stall on Branch](/Fall2025/CS130/assets/images/COD/stall_on_branch.png)
- 
 After every branch statement, we stall for one cycle

## Control Hazards
- The pros of the "always stall" approach are:
    1. 
 Simple and easy to understand
    2. 
 Will always work
- 
 The con of "always stall" is:
    + It is slow

## Control Hazards
- An alternative to the always stall approach is **branch prediction**
    + Make an educated guess on what the next instruction will be and execute that
    + 
 If incorrectly guessed, "undo" the steps and go to the correct branch

## Control Hazards
- **Static branch prediction** will always predict a certain branch depending on the branching behavior
    + 
 Predict forward branches not taken
    + 
 Predict backward branches taken (loops)
- 
 **Dynamic branch prediction** keeps track of how many times a branch is taken and updates its predictions based on history

# Pipelined Datapath Design

## Pipelined Datapath Design
<img src='/Fall2025/CS130/assets/images/COD/piplined_datapath.png' height='550'>

## Using Registers (Has Bug)
<img src='/Fall2025/CS130/assets/images/COD/pipelined_datapath_with_registers.png' height='500'>

## Using Registers (Bug Fixed)

# Pipelined Control

## Pipelined Control Simplified

![Pipeline Control Simplified](/Fall2025/CS130/assets/images/COD/pipelined_control_simplified.png)

## Pipelined Control Registers
- Control signals derived from instruction and passed through the relevant registers

![Pipeline Control Registers](/Fall2025/CS130/assets/images/COD/pipelined_control_registers.png)

## Pipelined Control Complete