A1.1.3 Explain the differences between the CPU and the GPU. (HL only)

• Differences in their design philosophies, usage scenarios

• Differences in their core architecture, processing power, memory access, power efficiency

• CPUs and GPUs working together: task division, data sharing, coordinating execution

📚 You can find additional information in the course companion pages 8 to 12

Big Idea:

The CPU (Central Processing Unit) and the GPU (Graphics Processing Unit) are both types of processors, but they are optimized for different types of tasks. The CPU is a general-purpose processor, ideal for sequential and logic-heavy operations, while the GPU is a specialized processor designed for high-throughput, massively parallel computations, such as rendering graphics or performing deep learning computations.

1. Design Philosophies and Usage Scenarios

Feature	CPU	GPU
Design Philosophy	Optimized for low-latency, general-purpose computing with complex control logic	Optimized for high-throughput, data-parallel computing, often using SIMD (Single Instruction, Multiple Data)
Usage Scenarios	OS operations, logic branching, interactive applications, databases, compiling	3D graphics rendering, image/video processing, deep learning, scientific simulations

CPU example task: Executing instructions in a program with many branches (e.g., an operating system scheduler).
GPU example task: Rendering the pixels of a frame in parallel (thousands of identical operations, each on different data).
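
The two example tasks can be sketched in code to show how different the work looks. This is only an illustration: plain Python runs on the CPU, and the scheduler rules, the process fields, and the brighten function are all made up for this example.

```python
def pick_next_process(ready_queue):
    """CPU-style task: a hypothetical scheduler with data-dependent branches."""
    if not ready_queue:
        return None
    # Made-up rule: real-time processes go first, then highest priority,
    # then whichever has waited longest.
    realtime = [p for p in ready_queue if p["realtime"]]
    candidates = realtime if realtime else ready_queue
    chosen = max(candidates, key=lambda p: (p["priority"], p["waiting_ms"]))
    return chosen["name"]


def brighten(pixel, amount=40):
    """GPU-style task: one small operation, independent for every pixel."""
    return min(pixel + amount, 255)


ready = [
    {"name": "editor", "priority": 2, "waiting_ms": 30, "realtime": False},
    {"name": "audio", "priority": 1, "waiting_ms": 5, "realtime": True},
    {"name": "backup", "priority": 2, "waiting_ms": 90, "realtime": False},
]
next_process = pick_next_process(ready)      # "audio"

image = [0, 100, 200, 250]                   # a tiny 4-pixel greyscale image
# On a GPU, each pixel could be handled by its own thread at the same time.
brightened = [brighten(p) for p in image]    # [40, 140, 240, 255]
```

The scheduler's result depends on a chain of decisions about the data, while every pixel gets exactly the same treatment and no pixel depends on another.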

2. Architectural Differences

a. Core Count and Structure

CPU: Few cores (typically 4–32 in consumer systems; server CPUs can have more), each very powerful and capable of handling complex tasks and branching logic.
GPU: Hundreds to thousands of simpler cores, optimized for executing the same instruction across many data points simultaneously (SIMD; NVIDIA calls its variant SIMT, Single Instruction, Multiple Threads).

b. Instruction Handling

CPU: Supports complex instructions, out-of-order execution, speculative execution, and heavy branch prediction.
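GPU: Designed for predictable, uniform execution. Branching is avoided where possible, because threads in the same group that take different paths must run those paths one after another, which reduces SIMD efficiency.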
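
To see why branching hurts a GPU, here is a minimal sketch of a simplified lockstep model. It assumes that a group of lanes (threads) must all execute the same instruction, so an if/else where lanes disagree costs two passes instead of one. The function name and the group size of 8 are assumptions for illustration, not real hardware values.

```python
def lockstep_passes(conditions):
    """Count the passes a group of lanes needs for one if/else.

    Simplified model: if every lane takes the same side, one pass is enough.
    If lanes disagree, the 'if' side and the 'else' side run one after the
    other, with the lanes that are not on that side switched off (masked).
    """
    takes_if = any(conditions)
    takes_else = not all(conditions)
    return int(takes_if) + int(takes_else)


uniform = [True] * 8           # every lane takes the 'if' side
divergent = [True, False] * 4  # lanes alternate between 'if' and 'else'

uniform_passes = lockstep_passes(uniform)      # 1
divergent_passes = lockstep_passes(divergent)  # 2
```

In this model the divergent group does the same useful work in twice the number of passes, which is the efficiency loss the notes refer to.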

3. Processing Power, Memory Access, and Power Efficiency

a. Raw Processing Power

CPU: Higher per-core performance; excels at tasks that require low latency and logic-heavy control flows.
GPU: Much higher aggregate throughput, especially in floating-point or vector math operations.
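
The difference between per-core performance and aggregate throughput can be worked through with a small model. The core counts and timings below are hypothetical round numbers chosen for the arithmetic, not measurements of any real processor, and the model assumes the work items are fully independent.

```python
import math


def batch_time_ms(items, cores, ms_per_item):
    """Time to finish `items` independent work items with perfect parallelism."""
    rounds = math.ceil(items / cores)  # each round, every core does one item
    return rounds * ms_per_item


# Hypothetical: 8 fast cores vs 2048 cores that are each 10x slower.
cpu_single = batch_time_ms(1, cores=8, ms_per_item=1)             # 1
gpu_single = batch_time_ms(1, cores=2048, ms_per_item=10)         # 10
cpu_batch = batch_time_ms(1_000_000, cores=8, ms_per_item=1)      # 125000
gpu_batch = batch_time_ms(1_000_000, cores=2048, ms_per_item=10)  # 4890
```

For a single item the CPU finishes first (lower latency), but for a million items the GPU finishes far sooner (higher throughput).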

b. Memory Access

CPU: Large cache hierarchy (L1, L2, L3), optimized for low-latency access and random access patterns.
GPU: High-bandwidth memory (e.g., GDDR6, HBM), optimized for streaming large amounts of data in parallel but not for irregular, scattered memory access patterns.
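
One way to picture why access patterns matter on a GPU: memory is fetched in fixed-size segments, so neighbouring threads that read neighbouring addresses can share one fetch. The sketch below counts how many segments a group of threads touches. The 128-byte segment size and 32-thread group are assumptions for illustration, and real hardware is more complicated than this model.

```python
def memory_transactions(addresses, segment_bytes=128):
    """Count the distinct memory segments touched by one group of threads."""
    return len({addr // segment_bytes for addr in addresses})


THREADS = 32  # assumed group size
ELEMENT = 4   # bytes per 32-bit value

# Thread t reads element t: addresses 0, 4, 8, ... all fall in one segment.
contiguous = [t * ELEMENT for t in range(THREADS)]
# Thread t reads element 32 * t: every address lands in a different segment.
strided = [t * ELEMENT * 32 for t in range(THREADS)]

contiguous_tx = memory_transactions(contiguous)  # 1
strided_tx = memory_transactions(strided)        # 32
```

Both patterns read 32 values, but in this model the scattered pattern needs 32 times as many memory fetches.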

c. Power Efficiency

CPU: Consumes more power per core due to its complex logic and versatility.
GPU: More power-efficient per operation when performing uniform, parallel tasks.
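
"Efficient per operation" means operations completed per joule of energy, not total power drawn. A quick worked example, using hypothetical figures chosen only to make the arithmetic easy (they do not describe any real chip):

```python
def ops_per_joule(ops_per_second, watts):
    """Energy efficiency: one watt is one joule per second."""
    return ops_per_second / watts


# Hypothetical figures for a uniform, highly parallel workload.
cpu_eff = ops_per_joule(ops_per_second=1_000_000_000_000, watts=100)
gpu_eff = ops_per_joule(ops_per_second=20_000_000_000_000, watts=400)

ratio = gpu_eff / cpu_eff  # 5.0
```

With these made-up numbers the GPU draws four times the power but does twenty times the work, so it completes five times as many operations per joule. The advantage only holds when the workload keeps the GPU's cores busy.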

4. CPU and GPU Working Together

In modern systems, especially in high-performance computing (HPC), AI, and gaming, the CPU and GPU work together, each taking on the tasks it's best suited for.

a. Task Division

CPU: Manages system logic, high-level coordination, branching code, and irregular control flows.
GPU: Performs massively parallel computation, such as matrix multiplications, pixel shading, or training neural networks.

b. Data Sharing

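On systems with a discrete GPU, data is usually transferred between CPU memory and GPU memory via a bus (e.g., PCIe), which is slower than either processor's access to its own memory.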
Shared memory models (e.g., Unified Memory in CUDA, AMD's HSA) allow more seamless memory access across both processors.
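
Because data has to cross the bus, sending work to the GPU is only worthwhile if the time saved on computation outweighs the transfer time. The sketch below is a rough model under stated assumptions: a round figure of 16 GB/s for the bus, results the same size as the input, and hypothetical compute times.

```python
def offload_time_ms(data_mb, gpu_compute_ms, bus_gb_per_s=16.0):
    """Total time to copy data to the GPU, compute, and copy results back.

    Assumes 1 GB = 1000 MB, so a bus speed in GB/s equals MB per millisecond.
    """
    one_way_ms = data_mb / bus_gb_per_s
    return 2 * one_way_ms + gpu_compute_ms


# Hypothetical job: 160 MB of data, 5 ms of GPU compute.
gpu_total_ms = offload_time_ms(data_mb=160, gpu_compute_ms=5)  # 25.0

light_cpu_ms = 20   # hypothetical: the CPU alone would need 20 ms
heavy_cpu_ms = 200  # hypothetical: the CPU alone would need 200 ms

offload_light_job = gpu_total_ms < light_cpu_ms  # False: transfer dominates
offload_heavy_job = gpu_total_ms < heavy_cpu_ms  # True: compute dominates
```

In this model, 20 of the 25 ms are spent moving data, which is why small or light jobs are often left on the CPU.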

c. Coordinated Execution

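The CPU launches GPU kernels (functions that run in parallel across many GPU threads).
The CPU then either waits for the kernel to complete or, with asynchronous execution, continues its own work and checks back later, so both processors can operate concurrently.
Platforms and APIs such as CUDA, OpenCL, and Vulkan provide the interfaces for CPU-GPU coordination.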
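
The launch, continue, synchronize pattern can be imitated with Python's standard library. This is only an analogy: a worker thread stands in for the GPU, and `fake_gpu_kernel` and `cpu_side_work` are made-up names, not part of CUDA, OpenCL, or Vulkan.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_gpu_kernel(data):
    """Stand-in for a GPU kernel: the same operation on every element."""
    return [x * x for x in data]


def cpu_side_work():
    """Stand-in for work the CPU does while the 'GPU' is busy."""
    return "input handled"


with ThreadPoolExecutor(max_workers=1) as device:
    # "Launch": submit returns immediately with a handle to the pending result.
    future = device.submit(fake_gpu_kernel, [1, 2, 3, 4])
    # The CPU carries on with its own work in the meantime.
    cpu_result = cpu_side_work()
    # "Synchronize": block until the kernel has finished, then read the result.
    gpu_result = future.result()  # [1, 4, 9, 16]
```

The key idea is that launching the work and collecting the result are separate steps, which leaves the CPU free in between.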

Summary Table:

Feature	CPU	GPU
Purpose	General-purpose, sequential logic	Specialized, parallel processing
Cores	Few, complex	Many, simple
Strength	Flexibility, logic-heavy tasks	High throughput, vector operations
Memory	Low-latency cache hierarchy	High-bandwidth memory, streaming access
Power	Higher per-core power usage	Efficient per operation (for parallel tasks)
Collaboration	Manages logic and coordination	Executes bulk computations under CPU direction