Inside a CPU core and a GPU core | Computer Science KB

The Big Idea

Both the CPU (Central Processing Unit) and the GPU (Graphics Processing Unit) are made up of cores: miniature processing engines that actually execute instructions.
Each core contains the same basic functional units:

Control Unit (CU) – directs the flow of data and instructions
Arithmetic Logic Unit (ALU) – performs calculations and logical comparisons
Registers – ultra-fast storage for immediate values and addresses
Internal Buses – connect the parts of the core to move data quickly

Where they differ is in how many of these elements they have, how they’re organized, and what they’re optimized for.

CPU Core Structure

A CPU core is designed for general-purpose, sequential tasks — running an operating system, managing files, performing logical decisions, and controlling overall program flow.

Inside a CPU Core

Arithmetic Logic Unit (ALU)
Performs integer arithmetic (+, −, ×, ÷) and logical operations (AND, OR, NOT).
Control Unit (CU)
Fetches instructions from memory, decodes them, and sends control signals to other parts of the CPU.
Registers
Store temporary values used during instruction execution. Common ones include:
- Program Counter (PC): holds the address of the next instruction
- Instruction Register (IR): holds the current instruction
- Memory Address Register (MAR) and Memory Data Register (MDR): manage data transfers between memory and CPU
- Accumulator (AC): stores intermediate arithmetic results
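
To make the roles of these registers concrete, here is a minimal sketch of a toy accumulator machine in Python. The opcodes (LOAD, ADD, STORE, HALT), the tuple instruction format, and the memory layout are all assumptions invented for illustration; no real instruction set works exactly this way.

```python
# Toy accumulator machine. The opcodes and memory layout are hypothetical,
# chosen only to show how PC, IR, MAR, MDR, and AC cooperate.

def run(memory):
    """Run a program stored in `memory` and return the final accumulator."""
    pc = 0   # Program Counter: address of the next instruction
    ac = 0   # Accumulator: holds intermediate arithmetic results
    while True:
        # Fetch: copy PC into MAR, read memory into MDR, then into IR.
        mar = pc
        mdr = memory[mar]
        ir = mdr
        pc += 1

        # Decode: split the instruction into an opcode and an operand address.
        opcode, operand = ir

        # Execute: the control unit selects the action for this opcode.
        if opcode == "LOAD":
            mar = operand
            mdr = memory[mar]
            ac = mdr
        elif opcode == "ADD":
            mar = operand
            mdr = memory[mar]
            ac = ac + mdr          # the ALU performs the addition
        elif opcode == "STORE":
            mar = operand
            mdr = ac
            memory[mar] = mdr
        elif opcode == "HALT":
            return ac
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [
    ("LOAD", 4),    # AC <- memory[4]
    ("ADD", 5),     # AC <- AC + memory[5]
    ("STORE", 6),   # memory[6] <- AC
    ("HALT", 0),
    7,              # data at address 4
    35,             # data at address 5
    0,              # result is written to address 6
]
result = run(program)
```

Every memory access in the sketch goes through MAR (the address) and MDR (the value), which is the division of labour the list above describes.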
Cache Memory
Extremely fast memory close to the core. Cache levels (L1, L2, and often an L3 that is shared between cores) reduce the time needed to fetch data from main memory.
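
As a rough illustration of why a cache helps, the sketch below counts hits and misses for a direct-mapped cache, which is the simplest possible organization. The function name `simulate_cache` and the sizes (4 lines, 4 addresses per block) are assumptions for the example; real CPU caches are larger and usually more elaborate.

```python
def simulate_cache(addresses, num_lines=4, block_size=4):
    """Count hits and misses for a direct-mapped cache (hypothetical sizes)."""
    lines = [None] * num_lines   # each entry stores the tag of the cached block
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size      # which memory block holds this address
        index = block % num_lines       # which cache line that block maps to
        tag = block // num_lines        # distinguishes blocks sharing a line
        if lines[index] == tag:
            hits += 1
        else:
            misses += 1
            lines[index] = tag          # load the block, evicting the old one
    return hits, misses


# Sequential access: each block of 4 addresses costs one miss, then three hits.
hits, misses = simulate_cache(range(16))
```

With sequential access most requests are served from the cache. Alternating between two addresses that map to the same line (for example 0 and 16 with these sizes) would miss every time, which is one reason real designs are more sophisticated.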
Pipelines and Branch Units (HL concept)
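Modern CPU cores use pipelining: the fetch–decode–execute cycle is divided into overlapping stages, so several instructions are in different stages of execution at once. A branch unit predicts the outcome of conditional jumps so the pipeline can keep fetching instructions instead of waiting for the condition to be resolved.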
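
The overlap can be sketched as a timeline. The Python below assumes an idealized three-stage pipeline with one cycle per stage and no stalls; the function name `pipeline_timeline` is hypothetical, and real pipelines have more stages and do stall.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_timeline(n_instructions, stages=STAGES):
    """Return, for each clock cycle, which instruction occupies each stage."""
    if n_instructions == 0:
        return []
    # The first instruction needs len(stages) cycles; each later one adds one.
    total_cycles = len(stages) + (n_instructions - 1)
    timeline = []
    for cycle in range(total_cycles):
        row = {}
        for s, stage in enumerate(stages):
            instr = cycle - s          # instruction i reaches stage s at cycle i + s
            if 0 <= instr < n_instructions:
                row[stage] = instr
        timeline.append(row)
    return timeline


timeline = pipeline_timeline(4)
pipelined_cycles = len(timeline)          # 3 + (4 - 1)
unpipelined_cycles = 4 * len(STAGES)      # each instruction runs start to finish alone
```

Under these assumptions, four instructions finish in 6 cycles rather than 12, and in the middle cycles all three stages are busy with different instructions.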
Vector or SIMD Units (some CPUs)
Handle operations on multiple data elements simultaneously — useful for multimedia or numerical work.
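
A simple way to picture a SIMD unit is to count how many instructions are needed to add two arrays. The sketch below is only a model: the 4-lane width and the function names are assumptions, and Python still performs the additions one at a time, whereas the hardware would perform each group at once.

```python
def scalar_add(a, b):
    """Add element by element: one 'instruction' per element."""
    out, instructions = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def simd_add(a, b, lanes=4):
    """Add `lanes` elements per 'instruction', as a SIMD unit would."""
    out, instructions = [], 0
    for i in range(0, len(a), lanes):
        # In hardware these additions happen at the same time;
        # here the generator only imitates that.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


a = list(range(8))
b = [10] * 8
scalar_result, scalar_count = scalar_add(a, b)
simd_result, simd_count = simd_add(a, b)
```

Both versions produce the same sums, but the modelled SIMD unit issues 2 instructions where the scalar version issues 8.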

Key point:
CPU cores are individually powerful but few in number (often 4–16 per consumer chip). Each core handles a small number of threads, typically one or two at a time, of complex, branching logic.

GPU Core Structure

A GPU is designed for massively parallel tasks: running thousands of very small, similar computations simultaneously. Each individual GPU core is simple; the performance comes from how many of them work at once.

Inside a GPU Core

While CPUs emphasize control, GPUs emphasize throughput. A GPU is built from hundreds or thousands of simpler cores, each capable of performing simple arithmetic quickly.

Streaming Multiprocessors (SMs)
Groups of small execution units. Each SM has:
- Multiple ALUs (sometimes called CUDA cores or shader units)
- A control unit shared among them
- A small register file and shared memory
ALUs Everywhere
Each SM (the closest GPU equivalent of a CPU core) contains many ALUs, which allows vector or matrix operations on huge data sets (for example, every pixel in an image).
Minimal Control Logic
Because GPUs repeat the same instruction across large data sets, they use fewer, simpler control units. This saves space and power.
Specialized Memory Hierarchy
- Global memory (large, slower)
- Shared memory within each multiprocessor (fast, for cooperation among cores)
- Texture and constant memory (optimized for graphics or AI workloads)
Parallel Execution Model
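GPUs execute the same operation on many pieces of data simultaneously. This model is called SIMD (Single Instruction, Multiple Data); NVIDIA describes its thread-based variant as SIMT (Single Instruction, Multiple Threads).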
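
The idea of one operation applied across many data elements can be sketched as a "kernel" that every thread runs on its own element. The block and thread terminology is borrowed from CUDA, but `launch` and `brighten_kernel` are hypothetical names, and this Python model runs the threads one after another, whereas a GPU would run them side by side.

```python
def brighten_kernel(thread_id, pixels, out, amount):
    """Work done by one thread: brighten exactly one pixel."""
    if thread_id < len(pixels):                 # guard threads past the end
        out[thread_id] = min(255, pixels[thread_id] + amount)


def launch(kernel, num_blocks, threads_per_block, *args):
    """Run the same kernel once per thread index (sequentially in this model)."""
    for block in range(num_blocks):
        for thread in range(threads_per_block):
            # Each thread derives a unique index from its block and position.
            global_id = block * threads_per_block + thread
            kernel(global_id, *args)


pixels = [0, 100, 200, 250, 30]
out = [0] * len(pixels)
launch(brighten_kernel, 2, 4, pixels, out, 20)   # 8 threads for 5 pixels
```

Every thread executes identical code and differs only in the index it works on, which is why a single shared control unit can drive many ALUs. The three threads whose index falls past the end of the image simply do nothing.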

Comparing CPU and GPU Cores

Feature	CPU Core	GPU Core
Purpose	General-purpose processing	Highly parallel numerical computation
ALU Count	Few, complex	Many, simple
Control Unit	Sophisticated per core	One per many cores
Memory Hierarchy	Large caches	Many small, fast local memories
Parallelism	A few threads per core	Thousands of threads across all cores
Best for	Logic-heavy, sequential tasks	Data-heavy, repetitive tasks (graphics, AI)

In Summary

Inside every core — CPU or GPU — are the same basic building blocks: ALU, CU, registers, and buses.
What changes is the architectural philosophy:

CPUs are optimized for diverse instructions and complex control.
GPUs are optimized for massive data parallelism and arithmetic throughput.