Details about L1, L2, and L3 caches

This article is not assessed by the IB but may be helpful to deepen your understanding. Plus, I think it's cool.

The Big Idea

Cache memory stores copies of instructions and data that the CPU is likely to reuse soon, organized into three levels:

  • L1 cache: closest and fastest
  • L2 cache: larger, slower, and usually still private to each core
  • L3 cache: shared between cores, even larger and slower

Every level holds the same kind of thing: copies of small blocks of memory (cache lines). What differs is how much each level can hold, how close it sits to the CPU, and how soon the CPU is predicted to need the data again.
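
To make the lookup order concrete, here is a minimal sketch in C of a toy three-level hierarchy. It is not how real hardware is built: the direct-mapped layout, the 64-byte line size, and the names Level, make_level, and access_cycles are all assumptions chosen for illustration, and the latencies are whatever the caller passes in. Each access is charged the latency of the first level that already holds the line, and every level that missed keeps a copy for next time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64    /* bytes per cache line (assumed; common on x86) */
#define MAX_SLOTS 1024  /* upper bound on slots per toy level */

/* One toy cache level. Direct-mapped: each line can live in exactly one slot. */
typedef struct {
    uint64_t tags[MAX_SLOTS];   /* line number currently held in each slot */
    bool     valid[MAX_SLOTS];  /* false until the slot has been filled */
    size_t   slots;             /* number of slots this level actually uses */
    int      latency;           /* illustrative cost of a hit, in cycles */
} Level;

/* Build an empty level with the given capacity and hit latency. */
Level make_level(size_t slots, int latency)
{
    assert(slots > 0 && slots <= MAX_SLOTS);
    Level lv = {0};
    lv.slots = slots;
    lv.latency = latency;
    return lv;
}

/* Return true on a hit; on a miss, install the line and return false. */
static bool level_access(Level *lv, uint64_t line)
{
    size_t slot = (size_t)(line % lv->slots);
    if (lv->valid[slot] && lv->tags[slot] == line)
        return true;
    lv->valid[slot] = true;
    lv->tags[slot] = line;
    return false;
}

/* Try L1, then L2, then L3; fall back to RAM if every level misses.
   The result is the latency of whichever level served the request. */
int access_cycles(Level *levels, size_t n_levels, int ram_latency, uint64_t addr)
{
    uint64_t line = addr / LINE_SIZE;  /* all bytes in one line share a slot */
    for (size_t i = 0; i < n_levels; i++) {
        if (level_access(&levels[i], line))
            return levels[i].latency;
    }
    return ram_latency;
}
```

With levels of 8, 64, and 1024 slots, reading address 0 twice costs the RAM latency the first time and the L1 latency the second time. If another line then pushes it out of the small L1, the next read is served from L2 instead.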

 

L1 Cache – “Immediate Working Set”

Purpose:
Hold the instructions the CPU core is executing now or will execute in the next few clock cycles, along with the data those instructions operate on.

Examples of what’s stored:

  • The current instruction being executed (e.g., ADD RAX, RBX)
  • The next few instructions about to enter the pipeline (prefetched from L2, L3, or RAM)
  • Operands and memory addresses used repeatedly in tight loops
    Example: in the loop for (i = 0; i < 1000; i++) total += price[i];
    L1 cache might store:
    • The variable total (unless the compiler keeps it in a register, which is common)
    • The current and next few price[i] values
    • The loop counter i (likewise often held in a register)
  • Function call return addresses (so the CPU doesn’t have to fetch from L2 or RAM)
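
The loop above can be written out as a small C sketch, together with a rough count of how many cache lines the array occupies. The function names sum_prices and lines_spanned, the 64-byte line size, and the assumption that the array starts on a line boundary are mine, not part of the original example.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_SIZE 64  /* assumed cache-line size in bytes */

/* The loop from the text: one sequential pass over price[]. */
long sum_prices(const int *price, size_t n)
{
    assert(price != NULL || n == 0);
    long total = 0;  /* an optimizing compiler usually keeps this in a register */
    for (size_t i = 0; i < n; i++)
        total += price[i];  /* neighbouring elements share a cache line */
    return total;
}

/* Number of cache lines an n-element array spans,
   assuming the array starts exactly on a line boundary. */
size_t lines_spanned(size_t n, size_t elem_size)
{
    size_t bytes = n * elem_size;
    return (bytes + LINE_SIZE - 1) / LINE_SIZE;  /* round up */
}
```

For the 1000-element loop with 4-byte elements, lines_spanned(1000, 4) gives 63. Sixteen elements fit in each line, so roughly one access in sixteen has to bring in a new line and the rest can be served from L1.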

Analogy: The notes right on your desk—things you are using this very second.

 

L2 Cache – “Per-Core Scratchpad”

Purpose:
Store data and instructions that the same core used recently or is likely to use again soon, but does not need right this instant.

Examples of what’s stored:

  • Recently used arrays, small data structures, or results from function calls
    Example: a 2D matrix currently being processed for an image filter
    • Rows recently processed may sit in L2
    • The next few rows will soon move into L1
  • Code from a small function (e.g., a math library call like sqrt() or sin())
  • Partial computation results or temporary lookup tables
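
The matrix example depends on the order in which the rows are visited. As a rough sketch, the hypothetical function line_changes below counts how often a walk over a 2D array moves to a different cache line from one access to the next. It assumes the row-major layout C uses, a 64-byte line, and an array that starts on a line boundary. It does not model a real cache; it only shows why walking along rows is friendlier to L1 and L2 than walking down columns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LINE_SIZE 64  /* assumed cache-line size in bytes */

/* Count how many times consecutive accesses land in a different cache line.
   row_walk = true  visits the matrix row by row (matches the memory layout);
   row_walk = false visits it column by column. */
size_t line_changes(size_t rows, size_t cols, size_t elem_size, bool row_walk)
{
    assert(elem_size > 0 && elem_size <= LINE_SIZE);
    size_t changes = 0;
    size_t prev_line = (size_t)-1;  /* sentinel: no line touched yet */
    size_t outer = row_walk ? rows : cols;
    size_t inner = row_walk ? cols : rows;

    for (size_t a = 0; a < outer; a++) {
        for (size_t b = 0; b < inner; b++) {
            size_t r = row_walk ? a : b;
            size_t c = row_walk ? b : a;
            size_t offset = (r * cols + c) * elem_size;  /* row-major address */
            size_t line = offset / LINE_SIZE;
            if (line != prev_line) {
                changes++;
                prev_line = line;
            }
        }
    }
    return changes;
}
```

For a 64×64 block of 1-byte pixels, the row-by-row walk changes line 64 times, once per row. The column-by-column walk changes line on every one of its 4096 accesses, because each step jumps a full row ahead in memory.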

Analogy: Your desk drawer—things you used recently and might need again soon.

 

L3 Cache – “Shared Repository”

Purpose:
Coordinate data between multiple cores and store less-frequently accessed but still useful data.

Examples of what’s stored:

  • Shared data between threads (e.g., shared variables in multithreaded programs)
  • Portions of large datasets (e.g., large arrays, textures, or data blocks in a simulation)
  • Common libraries or instruction sequences used across threads (e.g., OS kernel routines)
  • Prefetched blocks from RAM that the CPU predicts will be used soon

Analogy: A shared bookshelf beside your workspace—accessible to all team members (CPU cores).

 

Example Scenario: Image Processing Program

Memory Level  Example Data / Instructions Stored                              Access Time (approx.)
------------  --------------------------------------------------------------  ---------------------
L1 Cache      Current pixel brightness values, loop counter, add instruction  1–2 cycles
L2 Cache      Next 64×64 pixel block being processed, filter coefficients     3–14 cycles
L3 Cache      Entire image region shared between threads                      30–50 cycles
RAM           Whole image file loaded from disk                               100+ cycles
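
To see what these latencies mean for a whole program, the sketch below estimates the average cost of one memory access. The function name average_access_cycles is hypothetical, and the model is simplified: each access is charged only the latency of the level that finally serves it. The hit rates are inputs you would have to measure or guess; nothing in this article supplies them.

```c
#include <assert.h>
#include <stddef.h>

/* Expected cycles per access for a hierarchy of n_levels caches plus RAM.
   hit_rate[i] is the fraction of accesses reaching level i that hit there. */
double average_access_cycles(const double *hit_rate, const double *latency,
                             size_t n_levels, double ram_latency)
{
    double expected = 0.0;
    double reach = 1.0;  /* probability an access gets as far as this level */

    for (size_t i = 0; i < n_levels; i++) {
        assert(hit_rate[i] >= 0.0 && hit_rate[i] <= 1.0);
        expected += reach * hit_rate[i] * latency[i];
        reach *= 1.0 - hit_rate[i];  /* the remainder misses and moves on */
    }
    return expected + reach * ram_latency;  /* whatever is left goes to RAM */
}
```

Using the upper ends of the table's ranges (2, 14, and 50 cycles, with 100 for RAM) and made-up hit rates of 90%, 80%, and 50%, the result is about 4.42 cycles per access. Even though RAM is fifty times slower than L1 in this model, the average stays close to the L1 figure because so few accesses get that far.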