Details about L1, L2, and L3 caches

The Big Idea

Cache memory stores copies of instructions and data that the CPU is likely to reuse soon. Most modern CPUs organize it into three levels:

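L1 cache: smallest, fastest, and closest to the core (usually split into separate instruction and data caches)
L2 cache: larger and slower, typically still private to each core
L3 cache: larger and slower again, usually shared between cores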

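Every level holds the same kind of thing: fixed-size cache lines (commonly 64 bytes) of instructions or data. What differs is capacity and latency, so where a given line sits depends on how recently it was used and on what the hardware prefetchers predict will be needed next.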
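
One way to make the size trade-off concrete is to ask which level is the smallest one that could hold a program's whole working set. The sketch below does that with hypothetical capacities (32 KiB, 1 MiB, 32 MiB); these numbers and the function name are assumptions for illustration, not figures for any particular CPU.

```c
#include <stddef.h>

/* Hypothetical capacities, chosen for illustration only. */
enum {
    L1_BYTES = 32 * 1024,
    L2_BYTES = 1024 * 1024,
    L3_BYTES = 32 * 1024 * 1024
};

/* Returns 1, 2, or 3 for the smallest cache level that could hold the
   whole working set, or 0 if it is larger than the assumed L3. */
int smallest_fitting_level(size_t working_set_bytes)
{
    if (working_set_bytes <= L1_BYTES) return 1;
    if (working_set_bytes <= L2_BYTES) return 2;
    if (working_set_bytes <= L3_BYTES) return 3;
    return 0;
}
```

Under these assumed sizes, an array of 1000 doubles (8000 bytes) fits in L1, while an 8 MiB image would only fit in L3.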

L1 Cache – “Immediate Working Set”

Purpose:
Hold the instructions the core is executing right now and the data they touch, plus whatever it will need within the next few clock cycles.

Examples of what’s stored:

The current instruction being executed (e.g., ADD RAX, RBX)
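The next few instructions in program order, fetched ahead of execution from L2 or a lower level (L1 holds raw instruction bytes; where a CPU caches decoded micro-ops, they live in a separate structure)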
Operands and memory addresses used repeatedly in tight loops
Example: in the loop for (i = 0; i < 1000; i++) total += price[i];
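L1 cache might store:
- The current and next few price[i] values, brought in one cache line at a time
- The variable total and the loop counter i, but only if the compiler keeps them in memory; in optimized code both usually live in registers
Function call return addresses and other hot stack data (the top of the stack is touched so often that it rarely leaves L1)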
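
The price loop above can be written out as a small function. This is a minimal sketch: the function name and the double element type are assumptions, and the comment about 64-byte lines reflects a common line size rather than a guarantee.

```c
#include <stddef.h>

/* Sums n prices. With optimization enabled, `total` and `i` normally stay
   in registers; the price[] elements are what get loaded through the L1
   data cache. */
double sum_prices(const double *price, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Sequential access: with 64-byte lines (an assumption), one line
           fill serves eight consecutive doubles. */
        total += price[i];
    }
    return total;
}
```

Because the accesses are sequential, most iterations hit a line that is already in L1, and hardware prefetchers can usually fetch the following lines before the loop reaches them.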

Analogy: The notes right on your desk—things you are using this very second.

L2 Cache – “Per-Core Scratchpad”

Purpose:
Store data and instructions that were recently used or are likely to be used soon again by the same core, but not in the immediate instruction window.

Examples of what’s stored:

Recently used arrays, small data structures, or results from function calls
Example: a 2D matrix currently being processed for an image filter
- Rows recently processed may sit in L2
- The next few rows will soon move into L1
Code from a small function (e.g., a math library call like sqrt() or sin())
Partial computation results or temporary lookup tables
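
The matrix example can be sketched as a simple image filter that walks the image row by row. The filter below (a vertical 3-tap average on an 8-bit grayscale image in row-major order) and its name are assumptions chosen for illustration. The row just processed is read again as the "above" row on the next iteration, which is the kind of reuse the L1 and L2 caches serve.

```c
#include <stddef.h>
#include <stdint.h>

/* Vertical 3-tap average over a row-major 8-bit grayscale image.
   The first and last rows are copied unchanged. */
void vertical_blur3(const uint8_t *in, uint8_t *out,
                    size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++) {
        const uint8_t *row = in + y * width;
        uint8_t *dst = out + y * width;

        if (y == 0 || y + 1 == height) {
            for (size_t x = 0; x < width; x++) dst[x] = row[x];
            continue;
        }

        /* These two rows were, or soon will be, the "current" row, so
           they are likely to be found in L1 or L2 rather than RAM. */
        const uint8_t *above = row - width;
        const uint8_t *below = row + width;

        for (size_t x = 0; x < width; x++) {
            unsigned sum = (unsigned)above[x] + row[x] + below[x];
            dst[x] = (uint8_t)(sum / 3);
        }
    }
}
```

Traversing in row order matters here: each inner loop reads three contiguous runs of memory, so every fetched cache line is fully used before the loop moves on.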

Analogy: Your desk drawer—things you used recently and might need again soon.

L3 Cache – “Shared Repository”

Purpose:
Act as the shared last-level cache: hold data that several cores use, and keep lines that no longer fit in a core's private L1 and L2 but are still likely to be useful.

Examples of what’s stored:

Shared data between threads (e.g., shared variables in multithreaded programs)
Portions of large datasets (e.g., large arrays, textures, or data blocks in a simulation)
Common libraries or instruction sequences used across threads (e.g., OS kernel routines)
Prefetched blocks from RAM that the CPU predicts will be used soon

Analogy: A shared bookshelf beside your workspace—accessible to all team members (CPU cores).

Example Scenario: Image Processing Program

Memory Level	Example Data / Instructions Stored	Access Time (approx.)
L1 Cache	Current pixel brightness values, loop counter, `add` instruction	about 4–5 cycles
L2 Cache	Next 64×64 pixel block being processed, filter coefficients	about 10–14 cycles
L3 Cache	Entire image region shared between threads	30–50 cycles
RAM	Whole image file loaded from disk	100+ cycles
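
To see how per-level latencies combine, the average memory access time can be estimated from each level's hit latency and miss rate. The sketch below is a worked example only: the function name is hypothetical, and the latencies and miss rates in the usage note are illustrative assumptions, not measurements.

```c
/* Average memory access time in cycles for a three-level hierarchy.
   t_* are hit latencies in cycles; miss_* are local miss rates in [0, 1],
   i.e. the fraction of accesses reaching that level that miss in it. */
double amat_cycles(double t_l1, double miss_l1,
                   double t_l2, double miss_l2,
                   double t_l3, double miss_l3,
                   double t_ram)
{
    return t_l1 + miss_l1 * (t_l2 + miss_l2 * (t_l3 + miss_l3 * t_ram));
}
```

For instance, amat_cycles(4, 0.05, 12, 0.2, 40, 0.5, 200) works out to 4 + 0.05 × (12 + 0.2 × (40 + 0.5 × 200)) = 6 cycles: even though RAM is assumed to cost 200 cycles, the average stays close to the L1 latency because most accesses hit in L1.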