The Big Idea
Cache memory stores copies of instructions and data that the CPU is likely to reuse soon. On most modern processors it is organized into three levels:
- L1 cache: closest to the core and fastest
- L2 cache: larger and slower, usually private to each core
- L3 cache: even larger and slower, usually shared by all cores
In practice each level ends up holding different information. What lands where depends on how recently and how often it was used, on each level's capacity, and on hardware prefetchers that predict future accesses.
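The lookup order can be illustrated with a toy model in C. This is a hypothetical sketch, not how any real CPU is built: the three levels are tiny direct-mapped caches with made-up sizes (4, 16 and 64 lines of an assumed 64 bytes each), checked in order L1, L2, L3 before falling back to RAM. On a miss, the line is copied into every faster level, so a repeated access hits in L1, and a line pushed out of L1 can still be found in L2. Real caches are set-associative, far larger, and use more elaborate fill and eviction policies.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 64u   /* assumed cache-line size in bytes */
#define MAX_LINES 64u   /* storage per level; each level uses num_lines of it */

/* One toy cache level: direct-mapped, so each memory line has exactly one slot. */
typedef struct {
    uint64_t tags[MAX_LINES];  /* which memory line occupies each slot */
    int valid[MAX_LINES];      /* nonzero if the slot holds a line */
    unsigned num_lines;        /* capacity of this level, in lines */
} Level;

/* Hypothetical capacities: L1 smallest, L3 largest. */
static Level levels[3] = {
    { .num_lines = 4 },   /* "L1" */
    { .num_lines = 16 },  /* "L2" */
    { .num_lines = 64 },  /* "L3" */
};

/* Returns 1, 2 or 3 for the cache level that served the access, or 4 for RAM. */
int access_addr(uint64_t addr) {
    uint64_t line = addr / LINE_SIZE;
    int served_by = 4;
    /* Check the levels in order, closest to the core first. */
    for (int i = 0; i < 3; i++) {
        assert(levels[i].num_lines > 0 && levels[i].num_lines <= MAX_LINES);
        unsigned slot = (unsigned)(line % levels[i].num_lines);
        if (levels[i].valid[slot] && levels[i].tags[slot] == line) {
            served_by = i + 1;
            break;
        }
    }
    /* Copy the line into every faster level that missed (may evict an older line). */
    for (int i = 0; i < served_by - 1 && i < 3; i++) {
        unsigned slot = (unsigned)(line % levels[i].num_lines);
        levels[i].tags[slot] = line;
        levels[i].valid[slot] = 1;
    }
    return served_by;
}
```

For example, the first `access_addr(0x1000)` returns 4 (RAM) and the second returns 1 (L1). A call to `access_addr(0x1100)` maps to the same toy L1 slot and evicts that line, so the next `access_addr(0x1000)` returns 2 (L2).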
L1 Cache – “Immediate Working Set”
Purpose:
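Hold the instructions the core is executing right now and the data those instructions operate on, plus what the next few clock cycles will need. On most CPUs, L1 is split into an instruction cache (L1i) and a data cache (L1d).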
Examples of what’s stored:
- The instructions currently being executed (e.g., `ADD RAX, RBX`) and the next few instructions, fetched ahead of time for the pipeline (L1i)
- Operands and data used repeatedly in tight loops (L1d)
- Function call return addresses on the stack (so the CPU doesn’t have to fetch them from L2 or RAM)
Example: in the loop `for (i = 0; i < 1000; i++) total += price[i];`
L1 cache might store:
- The current and next few `price[i]` values (a typical 64-byte cache line holds 16 four-byte `int`s or 8 `double`s)
- The variables `total` and `i`, if they live in memory; an optimizing compiler usually keeps them in registers instead
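A small C sketch of the loop above shows why spatial locality matters. It assumes 64-byte cache lines and an array that starts on a line boundary; `sum_prices` and `lines_touched` are hypothetical helpers, not library functions. `lines_touched` counts how many distinct cache lines a sequential scan touches: on a cold cache, each new line costs one miss, and the remaining elements in that line are then served from L1.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_SIZE 64  /* assumed cache-line size in bytes */

/* The loop from the text. With optimization enabled, a compiler usually keeps
   `total` and `i` in registers; only price[] is read from memory (via L1d). */
long sum_prices(const int *price, int n) {
    long total = 0;
    for (int i = 0; i < n; i++)
        total += price[i];
    return total;
}

/* Counts the distinct cache lines touched by a sequential scan of n elements,
   assuming the array starts on a line boundary. */
size_t lines_touched(size_t n, size_t elem_size) {
    assert(elem_size > 0 && elem_size <= LINE_SIZE);
    size_t count = 0;
    size_t last_line = (size_t)-1;  /* sentinel: no line seen yet */
    for (size_t i = 0; i < n; i++) {
        size_t line = (i * elem_size) / LINE_SIZE;
        if (line != last_line) {    /* first element of a new line: one miss */
            count++;
            last_line = line;
        }
    }
    return count;
}
```

For 1000 four-byte elements, `lines_touched(1000, 4)` returns 63, so roughly 63 misses serve 1000 loads; with 8-byte elements, `lines_touched(1000, 8)` returns 125.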
Analogy: The notes right on your desk—things you are using this very second.
L2 Cache – “Per-Core Scratchpad”
Purpose:
Hold data and instructions that the same core used recently or is likely to use again soon, but that are no longer in the immediate working set, for example lines recently evicted from L1 or fetched early by the hardware prefetcher.
Examples of what’s stored:
- Recently used arrays, small data structures, or results from function calls
  Example: a 2D matrix currently being processed for an image filter
  - Rows recently processed may sit in L2
  - The next few rows will soon move into L1
- Code from a small function (e.g., a math library call like `sqrt()` or `sin()`)
- Partial computation results or temporary lookup tables
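The image-filter example can be sketched in C as a vertical 3-row box filter over a row-major grayscale image (`blur_vertical` is a hypothetical function for illustration). Output row `y` reads input rows `y-1`, `y` and `y+1`, so every interior input row is read by three consecutive output rows. Each reuse comes shortly after the previous one, which is the access pattern that tends to keep recent rows in L2 (or L1); whether they actually stay there depends on the image width and the cache sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Vertical 3-row box filter over a row-major image of width x height floats.
   Interior output rows average three input rows; edge rows are copied as-is
   to keep the sketch short. `in` and `out` must not overlap. */
void blur_vertical(const float *in, float *out, size_t width, size_t height) {
    assert(in != NULL && out != NULL && in != out);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            if (y == 0 || y + 1 == height) {
                out[y * width + x] = in[y * width + x];
            } else {
                /* Rows y-1 and y were read for the previous output rows,
                   so they are likely still cached. */
                out[y * width + x] = (in[(y - 1) * width + x] +
                                      in[y * width + x] +
                                      in[(y + 1) * width + x]) / 3.0f;
            }
        }
    }
}
```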
Analogy: Your desk drawer—things you used recently and might need again soon.
L3 Cache – “Shared Repository”
Purpose:
Act as a large pool shared by all cores: it lets cores exchange data without a round trip to RAM and holds less frequently accessed data that no longer fits in the private L1 and L2 caches.
Examples of what’s stored:
- Data shared between threads (e.g., shared variables in multithreaded programs)
- Portions of large datasets (e.g., large arrays, textures, or data blocks in a simulation)
- Common libraries or instruction sequences used across threads (e.g., OS kernel routines)
- Prefetched blocks from RAM that the CPU predicts will be used soon
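Data shared between threads can be illustrated with POSIX threads: the calling thread fills an array, and two worker threads each sum one half of it. This is a hypothetical sketch (`parallel_sum`, `Task` and `sum_range` are made-up names); older toolchains need it linked with `-pthread`. If the workers run on other cores, they may pick up the recently written lines from the shared L3 rather than from RAM, as long as the array fits, but the code does not and cannot control this; cache placement is entirely up to the hardware.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Work item for one thread: sum data[begin, end). */
typedef struct {
    const long *data;   /* shared, read-only input */
    size_t begin, end;  /* half-open range */
    long sum;           /* result written by the thread */
} Task;

static void *sum_range(void *arg) {
    Task *t = arg;
    long s = 0;  /* local accumulator, typically kept in a register */
    for (size_t i = t->begin; i < t->end; i++)
        s += t->data[i];
    t->sum = s;
    return NULL;
}

/* Sums data[0, n) using two threads that read the same shared array. */
long parallel_sum(const long *data, size_t n) {
    assert(data != NULL || n == 0);
    Task tasks[2] = {
        { data, 0, n / 2, 0 },
        { data, n / 2, n, 0 },
    };
    pthread_t threads[2];
    int started[2] = { 0, 0 };
    for (int i = 0; i < 2; i++) {
        if (pthread_create(&threads[i], NULL, sum_range, &tasks[i]) == 0)
            started[i] = 1;
        else
            sum_range(&tasks[i]);  /* fallback: do this half on the calling thread */
    }
    for (int i = 0; i < 2; i++)
        if (started[i])
            pthread_join(threads[i], NULL);
    return tasks[0].sum + tasks[1].sum;
}
```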
Analogy: A shared bookshelf beside your workspace—accessible to all team members (CPU cores).
Example Scenario: Image Processing Program
| Memory Level | Example Data / Instructions Stored | Typical Latency (approx., CPU cycles) |
|---|---|---|
| L1 Cache | Current pixel brightness values, add instruction | ~3–5 |
| L2 Cache | Next 64×64 pixel block being processed, filter coefficients | ~10–20 |
| L3 Cache | Larger image regions shared between threads | ~30–70 |
| RAM | Whole image file loaded from disk | hundreds (roughly 50–100 ns) |
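Per-level latencies combine into an average memory access time (AMAT), which shows why a small, fast L1 matters so much. The C sketch below uses the standard recursive formula with local miss rates, AMAT = t1 + m1 × (t2 + m2 × (t3 + m3 × t_RAM)), evaluated from RAM upward; `CacheLevel` and `amat` are hypothetical names, and the numbers in the usage note are illustrative assumptions, not measurements.

```c
#include <assert.h>

/* One cache level, with illustrative (not measured) figures. */
typedef struct {
    double hit_time;   /* latency in cycles when this level hits */
    double miss_rate;  /* local miss rate: fraction of accesses reaching this level that miss */
} CacheLevel;

/* AMAT for n cache levels (levels[0] is L1) backed by RAM:
   cost = t_i + m_i * (cost of the next level down), starting from ram_time. */
double amat(const CacheLevel *levels, int n, double ram_time) {
    assert(n >= 0);
    double cost = ram_time;
    for (int i = n - 1; i >= 0; i--) {
        assert(levels[i].miss_rate >= 0.0 && levels[i].miss_rate <= 1.0);
        cost = levels[i].hit_time + levels[i].miss_rate * cost;
    }
    return cost;
}
```

With L1 = {4, 0.10}, L2 = {12, 0.50}, L3 = {40, 0.30} and 200 cycles for RAM, `amat` returns about 10.2 cycles. Only 0.1 × 0.5 × 0.3 = 1.5% of accesses go all the way to RAM, so the average stays much closer to L1's latency than to RAM's.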