# Pipeline Hazards And Cache --- CS 130 // 2025-11-12

# Announcements

## Preparing for C programming

Installing what you need on your own computer: [install guide](../../resources/installing-visual-studio-code)
or
Set up an account on GitHub and use Codespaces: https://github.com/codespaces

## Preparing for Exam 3

* In class on Wednesday, November 19th
* You may prepare and use one piece of 8.5"x11" paper with *your* handwritten notes
* Check the review exercises posted on Blackboard

# Review

## Review Discussion

- what is pipelining?
- what is a pipeline hazard?
- what are the different types of pipeline hazards?
- what are some ways to deal with pipeline hazards?

## Review: Data Hazards

- A **data hazard** occurs when one instruction depends on the result of a previous instruction
- Our example:

```mips
add $s0, $t0, $t1
sub $t2, $s0, $t3
```

- In what stage does `add` write its result to `$s0` in the register file?
    + Stage 5: Write Back, *happens at time 5*
- In what stage does `sub` read from `$s0`?
    + Stage 2: Instruction Decode, *happens at time 3*

## Data Hazards

- We would need to stall for two clock cycles to wait for the value in `$s0` to be available for reading



## Forwarding (aka Bypassing)

- One way to avoid some data hazards without stalling is to **forward** the result to the next instruction as soon as it is available



## Forwarding (aka Bypassing)

- Forwarding cannot always resolve a data hazard

```mips
lw  $s0, 20($t1)
sub $t2, $s0, $t3
```

- In what stage does `lw` produce the value for `$s0`?
    + After Stage 4: MEM

## Forwarding (aka Bypassing)

- Why does this create an unavoidable stall?



- We cannot send data **backwards in time**

# More on Pipeline Hazards

## Data Hazard Exercise

* Consider the following MIPS code:

```mips
lw  $t0, 40($a3)
add $t6, $t0, $t2
sw  $t6, 40($a3)
```

* Draw timing diagrams showing
    * how many clock cycles are required when not doing forwarding
    * how many clock cycles are required with forwarding - show the forwarding lines

## Rearranging Instructions

- Another way to avoid data hazards is by rearranging instructions
- Consider the following MIPS code:

```mips
lw  $t1, 0($t0)
lw  $t2, 4($t0)
add $t3, $t1, $t2   # needs $t2 from the lw immediately before it
sw  $t3, 12($t0)
lw  $t4, 8($t0)
add $t5, $t1, $t4   # needs $t4 from the lw immediately before it
sw  $t5, 16($t0)
```

- We need to stall the first add (because it needs `$t2`)
- We need to stall the second add (because it needs `$t4`)

## Rearranging Instructions

- These two stalls can be avoided by rearranging the code in the following way:

```mips
lw  $t1, 0($t0)
lw  $t2, 4($t0)
lw  $t4, 8($t0)     # moved up: independent of the adds
add $t3, $t1, $t2
sw  $t3, 12($t0)
add $t5, $t1, $t4
sw  $t5, 16($t0)
```
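For reference, here is one C fragment that could plausibly compile to the sequence above, assuming `$t0` holds the base address of an `int` array `a` (this mapping is an illustration, not something stated in the lecture):

```c
void combine(int *a) {
    /* roughly: $t1 = a[0], $t2 = a[1], $t4 = a[2] */
    a[3] = a[0] + a[1];   /* lw, lw, add, sw */
    a[4] = a[0] + a[2];   /* lw, add, sw */
}
```

Because the three loads are independent of one another, the compiler (or programmer) is free to issue all of them before the first `add`, which is exactly what the rearranged MIPS version does.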
# Control Hazards

## Control Hazards

- A **control hazard** (aka branching hazard) occurs when the next instruction to be executed is not yet known
- Caused by **branching** instructions such as `beq`
- During a `beq` instruction, at what pipeline stage do we know whether the branch will be taken?
    + After Stage 3: EX

## Control Hazards

- One way to avoid control hazards is by stalling



- After every branch instruction, we stall for one cycle

## Control Hazards

- The pros of the "always stall" approach are:
    1. Simple and easy to understand
    2. Will always work
- The con of "always stall" is:
    + It is slow

## Control Hazards

- An alternative to the always-stall approach is **branch prediction**
    + Make an educated guess about which instruction comes next and execute it
    + If the guess was wrong, "undo" those steps and continue down the correct branch

## Control Hazards

- **Static branch prediction** always makes the same prediction for a branch, based on its direction
    + Predict forward branches not taken
    + Predict backward branches taken (loops)
- **Dynamic branch prediction** keeps track of how often each branch has been taken and updates its predictions based on that history (one simple scheme is sketched below)
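A minimal sketch in C of one common dynamic scheme, a table of 2-bit saturating counters; the table size and function names here are illustrative assumptions, not necessarily the scheme used in lecture:

```c
#include <stdbool.h>
#include <stdint.h>

#define PREDICTOR_ENTRIES 1024   /* illustrative table size */

/* One 2-bit saturating counter per entry:
   0 = strongly not taken, 1 = weakly not taken,
   2 = weakly taken,       3 = strongly taken.   */
static uint8_t counters[PREDICTOR_ENTRIES];      /* all start at 0 */

/* Index the table with low bits of the branch's address. */
static unsigned predictor_index(uint32_t branch_pc) {
    return (branch_pc >> 2) % PREDICTOR_ENTRIES;
}

/* Predict taken when the counter is in one of the two "taken" states. */
bool predict_taken(uint32_t branch_pc) {
    return counters[predictor_index(branch_pc)] >= 2;
}

/* After the branch resolves, nudge the counter toward the actual outcome. */
void update_predictor(uint32_t branch_pc, bool actually_taken) {
    uint8_t *c = &counters[predictor_index(branch_pc)];
    if (actually_taken && *c < 3) (*c)++;
    else if (!actually_taken && *c > 0) (*c)--;
}
```

The two-bit counter means a single surprising outcome (such as the final iteration of a loop) does not immediately flip the prediction.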
# Pipelined CPU Design

## Pipelined Control Complete

### Pipelined Architecture with Hazard Detection and Forwarding
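As a rough software model of the checks that the forwarding and hazard-detection units perform, here is a sketch in C; the pipeline-register field names (`reg_write`, `mem_read`, `rs`, `rt`, `rd`) follow common textbook conventions and are assumptions, not names taken from the lecture datapath:

```c
#include <stdbool.h>

/* Simplified slices of the pipeline registers: which register an older
   instruction will write, and which registers a younger one reads. */
struct ex_mem { bool reg_write; int rd; };          /* instruction now in MEM */
struct mem_wb { bool reg_write; int rd; };          /* instruction now in WB  */
struct id_ex  { bool mem_read;  int rs; int rt; };  /* instruction now in EX  */
struct if_id  { int rs; int rt; };                  /* instruction now in ID  */

/* Forwarding unit: pick the source for the ALU's first input.
   Prefer the newer result (EX/MEM) over the older one (MEM/WB). */
int forward_a(struct ex_mem m, struct mem_wb w, struct id_ex e) {
    if (m.reg_write && m.rd != 0 && m.rd == e.rs) return 2;  /* from EX/MEM   */
    if (w.reg_write && w.rd != 0 && w.rd == e.rs) return 1;  /* from MEM/WB   */
    return 0;                                                /* register file */
}

/* Hazard detection: a load in EX produces a value the very next
   instruction (in ID) needs, so forwarding alone cannot help -- stall. */
bool must_stall(struct id_ex e, struct if_id d) {
    return e.mem_read && (e.rt == d.rs || e.rt == d.rt);
}
```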
# Caches

## Memory Organization

- When a program uses memory, it tends to use it in *predictable ways*
- As a result, it is possible to speed up memory access dramatically by creating a **memory hierarchy**
    + The register file is small but ridiculously fast
    + SRAM is larger but slower
    + DRAM is larger still but even slower
    + Hard disks are HUGE but also the slowest

## Terminology

- **Cache:**
    + An auxiliary memory from which high-speed retrieval is possible
- **Block:**
    + The minimum unit of information that can either be present or not present in the cache
- **Hit:**
    + The CPU finds what it is looking for in the cache
- **Miss:**
    + The CPU doesn't find what it's looking for in the cache

## Designing a Cache

- Having a hierarchy of memories to speed up our computations is great, but we also face several design challenges
- If we are looking for a value at memory address `x`, how do we know whether it is already in the cache?

## Direct Mapped Cache

- One idea is to use a **direct mapped cache**, where the address `x` itself tells us where to look in the cache
- If there are `n` slots in the cache, then we look for `x` at index `x % n` of the cache

## Direct Mapped Cache



## Direct Mapped Cache

- If the number of blocks in the cache is `n = 2^k`, then `x % n` is simply the last `k` bits of `x`
    + Makes it extremely efficient to find where a block belongs in the cache
- Since multiple blocks may map to the same index in the cache, how do we know if the block stored there is the one we're looking for?
    + Include a **tag** that uniquely identifies the block

## Direct Mapped Cache

- Some parts of the cache can be empty and/or underutilized if their index, by chance, doesn't come up as often
- How can we improve utilization?

## Fully Associative Cache

- The "extreme" alternative to direct mapped caching is **fully associative** caching
    + Any block can be stored in *any* index of the cache

---

- **Advantage**: Every spot in the cache will be used, so fewer cache misses will occur
- **Disadvantage**: We need to search the entire cache every time, so the hit time will increase

## Set-Associative Cache

- The compromise between these two approaches is the **set-associative** cache
    + Blocks are *grouped* into sets of `n` blocks
    + The block number determines which set a block goes in
- Still requires searching through all `n` blocks in a set
- Can be tuned to strike a good balance between hit rate and hit time
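To make the index/tag arithmetic from the direct-mapped slides concrete, here is a small C sketch; the cache size (8 blocks, so `k = 3`) and the sample address are assumptions chosen for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 8    /* n = 2^k blocks; here k = 3 (an assumed size) */
#define INDEX_BITS 3

int main(void) {
    uint32_t addr = 0x16;                 /* 10110 in binary */

    /* Because NUM_BLOCKS is a power of two, addr % NUM_BLOCKS keeps
       exactly the low INDEX_BITS bits of the address. */
    uint32_t index = addr % NUM_BLOCKS;   /* 110 in binary = 6 */
    uint32_t tag   = addr >> INDEX_BITS;  /* remaining upper bits: 10 = 2 */

    printf("address %u -> index %u, tag %u\n", addr, index, tag);
    return 0;
}
```

With these assumed sizes, the address 10110 maps to cache index 110 with tag 10; the same split (low `k` bits for the index, remaining bits for the tag) applies to any power-of-two cache size.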
## Handling a Cache Miss

1. Check the cache for a memory address
    + Results in a miss
2. Fetch corresponding block from RAM or disk
    + Wait for block to be retrieved (stall)
    + Write block to cache
3. Continue pipeline (unstall)
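A minimal C sketch of that lookup / miss / fill flow for a direct-mapped cache; the line structure, sizes, and the dummy memory fetch are illustrative assumptions, not the hardware from lecture:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS 8
#define INDEX_BITS 3

struct cache_line { bool valid; uint32_t tag; uint32_t data; };
static struct cache_line cache[NUM_BLOCKS];   /* all lines start invalid */

/* Stand-in for the slow fetch from RAM (or disk) during the stall. */
static uint32_t fetch_block_from_memory(uint32_t block_addr) {
    return block_addr * 100;   /* dummy data for the sketch */
}

/* 1. Check the cache; 2. on a miss, fetch the block and write it into
   the cache (the pipeline would stall here); 3. return the data.      */
uint32_t cache_read(uint32_t block_addr) {
    uint32_t index = block_addr % NUM_BLOCKS;
    uint32_t tag   = block_addr >> INDEX_BITS;
    struct cache_line *line = &cache[index];

    if (!line->valid || line->tag != tag) {    /* miss */
        line->data  = fetch_block_from_memory(block_addr);
        line->tag   = tag;
        line->valid = true;
    }
    return line->data;                         /* hit (possibly after the fill) */
}
```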
**Exercise:** Show what happens if the next three memory accesses are to addresses 10110, 00000, and 10111

## Multilevel Caches

- Usually, caches are implemented in multiple "levels" for maximal efficiency
    + L1 is smallest/fastest
    + L2 is larger/slower
    + L3 is largest/slowest
- Modern multi-core processors typically have dedicated L1 and L2 caches for each core and a shared L3 cache
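A rough C sketch of how a load conceptually falls through the levels, from fastest to slowest; the per-level sizes and cycle counts are made-up round numbers for illustration only:

```c
#include <stdbool.h>
#include <stdint.h>

/* One direct-mapped level; the arrays are sized for the largest level
   and the sizes themselves are illustrative, not real hardware. */
struct level { uint32_t num_blocks; bool valid[4096]; uint32_t tag[4096]; };

static struct level l1 = { .num_blocks = 64   };
static struct level l2 = { .num_blocks = 512  };
static struct level l3 = { .num_blocks = 4096 };

static bool present(const struct level *c, uint32_t block_addr) {
    uint32_t index = block_addr % c->num_blocks;
    return c->valid[index] && c->tag[index] == block_addr / c->num_blocks;
}

/* Check each level in turn; a miss at one level costs a longer trip. */
int load_latency_cycles(uint32_t block_addr) {
    if (present(&l1, block_addr)) return 4;    /* L1: smallest, fastest */
    if (present(&l2, block_addr)) return 12;   /* L2: larger, slower    */
    if (present(&l3, block_addr)) return 40;   /* L3: largest, slowest  */
    return 200;                                /* main memory (DRAM)    */
}
```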