The Big Idea
Both the CPU (Central Processing Unit) and the GPU (Graphics Processing Unit) are made up of cores: small processing engines that execute instructions.
Each core contains the same basic functional units:
- Control Unit (CU) – directs the flow of data and instructions
- Arithmetic Logic Unit (ALU) – performs calculations and logical comparisons
- Registers – ultra-fast storage for immediate values and addresses
- Internal Buses – connect the parts of the core to move data quickly
Where they differ is in how many of these elements they have, how they’re organized, and what they’re optimized for.
CPU Core Structure
A CPU core is designed for general-purpose, sequential tasks — running an operating system, managing files, performing logical decisions, and controlling overall program flow.
Inside a CPU Core
- Arithmetic Logic Unit (ALU)
Performs integer arithmetic (+, −, ×, /) and logical operations (AND, OR, NOT).
- Control Unit (CU)
Fetches instructions from memory, decodes them, and sends control signals to other parts of the CPU.
- Registers
Store temporary values used during instruction execution. Common ones include:
  - Program Counter (PC): holds the address of the next instruction
  - Instruction Register (IR): holds the current instruction
  - Memory Address Register (MAR) and Memory Data Register (MDR): manage data transfers between memory and CPU
  - Accumulator (AC): stores intermediate arithmetic results
- Cache Memory
Extremely fast memory close to the core. Cache levels (L1, L2, sometimes L3) reduce the time needed to fetch data from main memory.
- Pipelines and Branch Units (HL concept)
Modern CPU cores use pipelining: the fetch–decode–execute cycle is divided into overlapping stages, so multiple instructions are in different stages of execution at once. Branch units predict the outcome of conditional jumps so the pipeline can stay full.
- Vector or SIMD Units (some CPUs)
Handle operations on multiple data elements simultaneously, which is useful for multimedia or numerical work.
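The register roles listed above can be traced in a toy fetch-decode-execute loop. This is only a sketch: the four opcodes (LOAD, ADD, STORE, HALT), the tuple encoding of instructions, and the memory layout are hypothetical and chosen for readability, not taken from any real instruction set.

```python
# Toy accumulator machine illustrating the fetch-decode-execute cycle.
# The opcodes (LOAD, ADD, STORE, HALT) and the memory layout are hypothetical.

def run(memory):
    """Execute the program in `memory`; return the final register values."""
    pc = 0  # Program Counter: address of the next instruction
    ac = 0  # Accumulator: intermediate arithmetic results
    while True:
        # Fetch: MAR holds the address, MDR receives the word, IR keeps it.
        mar = pc
        mdr = memory[mar]
        ir = mdr
        pc += 1
        # Decode: split the instruction into an opcode and an operand address.
        opcode, operand = ir
        # Execute: every memory access goes through MAR and MDR.
        if opcode == "LOAD":
            mar = operand
            mdr = memory[mar]
            ac = mdr
        elif opcode == "ADD":
            mar = operand
            mdr = memory[mar]
            ac = ac + mdr  # the ALU performs the addition
        elif opcode == "STORE":
            mar = operand
            mdr = ac
            memory[mar] = mdr
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return {"PC": pc, "IR": ir, "MAR": mar, "MDR": mdr, "AC": ac}


memory = [
    ("LOAD", 4),     # address 0: AC <- memory[4]
    ("ADD", 5),      # address 1: AC <- AC + memory[5]
    ("STORE", 6),    # address 2: memory[6] <- AC
    ("HALT", None),  # address 3: stop
    7,               # address 4: first operand
    5,               # address 5: second operand
    0,               # address 6: result goes here
]
state = run(memory)
print(state["AC"], memory[6])  # prints "12 12"
```

Program and data share one memory list here, so the PC, MAR, and MDR all work with the same addresses. A real core does this in hardware, with caches sitting between the registers and main memory.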
Key point:
CPU cores are individually powerful but few in number (often 4–16 per chip). Each core handles a small number of threads (typically one or two at a time) of complex, branching logic.
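The benefit of the pipelining mentioned above can be estimated with a little arithmetic. The sketch below assumes an idealized three-stage pipeline (fetch, decode, execute) where every stage takes one clock cycle and there are no stalls or mispredicted branches. Real pipelines have more stages and do stall, so treat the numbers as an upper bound on the speed-up.

```python
# Idealized pipeline timing: one cycle per stage, no stalls or hazards (assumption).

STAGES = ("F", "D", "E")  # fetch, decode, execute


def cycles_without_pipeline(n_instructions, n_stages=len(STAGES)):
    """Each instruction finishes every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_with_pipeline(n_instructions, n_stages=len(STAGES)):
    """The first instruction takes n_stages cycles; each later one finishes one cycle after it."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def timing_diagram(n_instructions, stages=STAGES):
    """One row per instruction, one column per clock cycle; '.' means idle."""
    total = cycles_with_pipeline(n_instructions, len(stages))
    rows = []
    for i in range(n_instructions):
        row = "." * i + "".join(stages)  # instruction i starts i cycles late
        rows.append(row.ljust(total, "."))
    return rows


for row in timing_diagram(4):
    print(row)
print(cycles_without_pipeline(4), cycles_with_pipeline(4))  # prints "12 6"
```

Each printed row is one instruction and each column is one clock cycle, so reading down a column shows which stage every instruction is in at that moment.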
GPU Core Structure
A GPU is designed for massively parallel tasks: running thousands of very small, similar computations simultaneously across its many cores.
Inside a GPU Core
While CPUs emphasize control, GPUs emphasize throughput. A GPU is built from hundreds or thousands of simpler cores, each capable of performing simple arithmetic quickly.
- Streaming Multiprocessors (SMs)
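Groups of small execution units (SM is NVIDIA's term; AMD calls the equivalent a compute unit). Each SM has:
  - Multiple ALUs (sometimes called CUDA cores or shader units)
  - A control unit shared among them
  - A register file divided among the running threads, plus a small shared memory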
- ALUs Everywhere
Each multiprocessor contains many ALUs, which allows vector or matrix operations on huge data sets (for example, every pixel in an image).
- Minimal Control Logic
Because GPUs repeat the same instruction across large data sets, they use fewer, simpler control units. This saves space and power.
- Specialized Memory Hierarchy
  - Global memory (large, slower)
  - Shared memory within each multiprocessor (fast, for cooperation among cores)
  - Texture and constant memory (cached, read-only data; optimized for graphics or AI workloads)
- Parallel Execution Model
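GPUs execute the same operation on many pieces of data simultaneously. This model is called SIMD (Single Instruction, Multiple Data); NVIDIA describes its thread-based variant as SIMT (Single Instruction, Multiple Threads).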
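The "same operation on many pieces of data" idea can be sketched in plain Python. This is a conceptual model only, not GPU code: the `brighten` kernel, the `run_simd` helper, and the lane width of 4 are hypothetical names and values chosen for illustration, and the inner loop stands in for work that real hardware would do in a single step.

```python
# Conceptual model of SIMD execution: one instruction is issued per group of lanes.
# `brighten`, `run_simd`, and the lane width are hypothetical, for illustration only.

def brighten(pixel, amount=40):
    """The single operation applied to every data element (clamped to 255)."""
    return min(pixel + amount, 255)


def run_simd(kernel, data, lanes=4):
    """Apply `kernel` to `data` in groups of `lanes` elements.

    Returns the results and the number of instruction issues. On real hardware
    every lane in a group would compute at the same time; here the group is
    processed in a loop, but it is counted as a single step.
    """
    results = []
    steps = 0
    for start in range(0, len(data), lanes):
        group = data[start:start + lanes]
        results.extend(kernel(value) for value in group)
        steps += 1  # one shared control unit issues one instruction per group
    return results, steps


pixels = [0, 100, 200, 250, 30, 60, 90, 240]
simd_result, simd_steps = run_simd(brighten, pixels, lanes=4)
scalar_result, scalar_steps = run_simd(brighten, pixels, lanes=1)
print(simd_result)               # prints "[40, 140, 240, 255, 70, 100, 130, 255]"
print(simd_steps, scalar_steps)  # prints "2 8"
```

With four lanes the eight pixels need two instruction issues instead of eight, which is why a single shared control unit is enough when every element receives the same operation.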
Comparing CPU and GPU Cores
| Feature | CPU Core | GPU Core |
|---|---|---|
| Purpose | General-purpose processing | Highly parallel numerical computation |
| ALU Count | Few, complex | Many, simple |
| Control Unit | Sophisticated, one per core | Simple, shared by many ALUs |
| Memory Hierarchy | Large caches | Many small, fast local memories |
| Parallelism | A few threads per core, dozens per chip | Thousands of threads across the chip |
| Best for | Logic-heavy, sequential tasks | Data-heavy, repetitive tasks (graphics, AI) |
In Summary
Inside every core — CPU or GPU — are the same basic building blocks: ALU, CU, registers, and buses.
What changes is the architecture philosophy:
- CPUs are optimized for diverse instructions and complex control.
- GPUs are optimized for massive data parallelism and arithmetic throughput.