A1.1.3 Explain the differences between the CPU and the GPU. (HL only)
• Differences in their design philosophies, usage scenarios
• Differences in their core architecture, processing power, memory access, power efficiency
• CPUs and GPUs working together: task division, data sharing, coordinating execution
📚 You can find additional information in the course companion pages 8 to 12
Big Idea:
The CPU (Central Processing Unit) and the GPU (Graphics Processing Unit) are both types of processors, but they are optimized for different types of tasks. The CPU is a general-purpose processor, ideal for sequential and logic-heavy operations, while the GPU is a specialized processor designed for high-throughput, massively parallel computations, such as rendering graphics or performing deep learning computations.
1. Design Philosophies and Usage Scenarios
| Feature | CPU | GPU |
|---|---|---|
| Design Philosophy | Optimized for low-latency, general-purpose computing with complex control logic | Optimized for high-throughput, data-parallel computing, often using SIMD (Single Instruction, Multiple Data) or the closely related SIMT (Single Instruction, Multiple Threads) model |
| Usage Scenarios | OS operations, logic branching, interactive applications, databases, compiling | 3D graphics rendering, image/video processing, deep learning, scientific simulations |
- CPU example task: Executing instructions in a program with many branches (e.g., an operating system scheduler).
- GPU example task: Shading every pixel of a frame in parallel (a 1920×1080 screen has over two million pixels, each needing the same operation on different data).
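To make the contrast concrete, here is a small Python sketch of both kinds of task. The `Process` class, its priority rule and the brightness adjustment are hypothetical simplifications, not how a real OS scheduler or graphics pipeline works. The point is that the scheduler's control flow depends on each item's data, while the pixel operation is identical and independent for every pixel.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Process:
    name: str
    priority: int   # hypothetical convention: higher value = more urgent
    blocked: bool   # e.g. waiting for input/output


def pick_next(ready: List[Process]) -> Optional[Process]:
    """CPU-style task: which branch runs depends on each process's data."""
    best: Optional[Process] = None
    for p in ready:
        if p.blocked:
            continue                       # skip processes that cannot run
        if best is None or p.priority > best.priority:
            best = p
    return best


def brighten(pixel: int) -> int:
    """GPU-style task: the same small operation, applied to one pixel."""
    return min(255, pixel + 40)            # clamp to the 8-bit range


procs = [Process("shell", 1, False),
         Process("editor", 5, True),
         Process("compiler", 3, False)]
print(pick_next(procs).name)               # compiler (editor is blocked)

frame = [0, 100, 230, 255]                 # a tiny grayscale "frame"
# No pixel depends on any other, so on a GPU each could get its own thread.
print([brighten(p) for p in frame])        # [40, 140, 255, 255]
```

The scheduler has to look at each process and branch; the pixel loop has no branches that depend on neighbors, so its iterations can run in any order or all at once.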
2. Architectural Differences
a. Core Count and Structure
- CPU: Relatively few cores (typically 4–32 in desktops and laptops; servers can have many more), but each is very powerful and capable of handling complex tasks and branching logic.
- GPU: Hundreds to thousands of simpler cores, optimized for executing the same instruction across many data points simultaneously (SIMD).
b. Instruction Handling
- CPU: Supports complex instructions, out-of-order execution, speculative execution, and sophisticated branch prediction.
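- GPU: Designed for predictable, uniform execution. Code should avoid divergent branches where possible: when threads in the same group take different paths, the group executes each path in turn while the other threads sit idle, which reduces SIMD efficiency.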
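When lanes in a SIMD group take different sides of an `if`/`else` (called divergence), the group cannot run both sides at once. The sketch below simulates this with a per-lane mask and counts how many code paths the group issues. It is a simplified model under our own assumptions (a four-lane group, a boolean mask, a path counter); real GPUs differ in detail, but masking off lanes and running each path in turn is broadly how divergence is handled.

```python
from typing import Callable, List, Tuple


def run_if_else(values: List[int],
                cond: Callable[[int], bool],
                then_op: Callable[[int], int],
                else_op: Callable[[int], int]) -> Tuple[List[int], int]:
    """Simulate one SIMD group running `x = then_op(x) if cond(x) else else_op(x)`.

    Each list element is one lane. Returns the per-lane results and the number
    of code paths the group had to issue.
    """
    mask = [cond(v) for v in values]       # every lane evaluates the condition
    results = list(values)
    paths_issued = 0

    if any(mask):                          # some lane needs the 'then' path
        paths_issued += 1
        for lane, v in enumerate(values):  # conceptually, all lanes step together
            if mask[lane]:                 # lanes whose mask is False sit idle
                results[lane] = then_op(v)

    if not all(mask):                      # some lane needs the 'else' path
        paths_issued += 1
        for lane, v in enumerate(values):
            if not mask[lane]:
                results[lane] = else_op(v)

    return results, paths_issued


def is_even(x: int) -> bool:
    return x % 2 == 0


def times_ten(x: int) -> int:
    return x * 10


def plus_one(x: int) -> int:
    return x + 1


print(run_if_else([2, 4, 6, 8], is_even, times_ten, plus_one))  # ([20, 40, 60, 80], 1)
print(run_if_else([1, 2, 3, 4], is_even, times_ten, plus_one))  # ([2, 20, 4, 40], 2)
```

With uniform data every lane agrees and a single path suffices; with mixed data the group issues both paths, so at any moment some lanes are doing no useful work.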
3. Processing Power, Memory Access, and Power Efficiency
a. Raw Processing Power
- CPU: Higher per-core performance; excels at tasks that require low latency and logic-heavy control flows.
- GPU: Much higher aggregate throughput, especially in floating-point or vector math operations.
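A common back-of-the-envelope comparison of raw throughput is peak operations per second = cores × clock rate × operations per core per cycle. The figures below are hypothetical round numbers chosen for illustration, not the specifications of any real CPU or GPU. They show how many slower, simpler cores can out-produce a few fast ones in aggregate, even though each GPU core does far less per cycle.

```python
def peak_flops(cores: int, clock_hz: int, flops_per_cycle_per_core: int) -> int:
    """Theoretical peak floating-point operations per second (an upper bound;
    real programs rarely reach it)."""
    return cores * clock_hz * flops_per_cycle_per_core


# Hypothetical CPU: 8 cores at 3.5 GHz, each assumed to do 32 FLOPs per cycle
# using its vector (SIMD) units.
cpu = peak_flops(cores=8, clock_hz=3_500_000_000, flops_per_cycle_per_core=32)

# Hypothetical GPU: 2048 simple cores at 1.5 GHz, each doing one fused
# multiply-add (counted as 2 FLOPs) per cycle.
gpu = peak_flops(cores=2048, clock_hz=1_500_000_000, flops_per_cycle_per_core=2)

print(f"CPU peak: {cpu / 1e9:.0f} GFLOP/s")   # CPU peak: 896 GFLOP/s
print(f"GPU peak: {gpu / 1e9:.0f} GFLOP/s")   # GPU peak: 6144 GFLOP/s
print(f"GPU / CPU: {gpu / cpu:.1f}x")         # GPU / CPU: 6.9x
print(f"Per core: CPU {cpu / 8 / 1e9:.0f} vs GPU {gpu / 2048 / 1e9:.0f} GFLOP/s")  # Per core: CPU 112 vs GPU 3 GFLOP/s
```

The comparison flips for work that cannot be split up: a single sequential chain of operations can only use one core, and there the CPU's faster, more capable core wins.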
b. Memory Access
- CPU: Large cache hierarchy (L1, L2, L3), optimized for low-latency access and random access patterns.
- GPU: High-bandwidth memory (e.g., GDDR6, HBM), optimized for streaming large amounts of data in parallel. It performs best when neighboring threads access neighboring addresses and is much less efficient with irregular, scattered access patterns.
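Streaming access can be made concrete by counting how many memory segments a group of GPU threads touches in a single load. This sketch assumes a 32-thread group, 4-byte elements and 128-byte memory segments; these numbers are chosen for illustration (they resemble common GPU configurations), and the model ignores caches and other hardware details.

```python
from typing import List

GROUP_SIZE = 32      # assumed number of threads that issue a load together
ELEMENT_BYTES = 4    # e.g. one 32-bit float per thread
SEGMENT_BYTES = 128  # assumed size of one memory transaction


def segments_touched(addresses: List[int]) -> int:
    """Count distinct memory segments needed to serve one load from the group."""
    return len({addr // SEGMENT_BYTES for addr in addresses})


# Thread t reads element t: neighboring threads read neighboring addresses.
contiguous = [t * ELEMENT_BYTES for t in range(GROUP_SIZE)]

# Thread t reads element 32 * t: every thread lands in a different segment.
strided = [t * 32 * ELEMENT_BYTES for t in range(GROUP_SIZE)]

print(segments_touched(contiguous))  # 1
print(segments_touched(strided))     # 32
```

Both loads deliver the same 128 bytes of useful data, but the strided pattern needs 32 times as many transactions, which is why GPU code is usually arranged so that adjacent threads access adjacent memory.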
c. Power Efficiency
- CPU: Consumes more power per core due to its complex logic and versatility.
- GPU: More power-efficient per operation when performing uniform, parallel tasks.
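"Power-efficient per operation" can be expressed as energy per operation (watts divided by operations per second, giving joules per operation) or, inversely, as operations per watt. The peak throughputs and power draws below are hypothetical round numbers, not measurements of real hardware, and the GPU's advantage only holds when the workload keeps its cores busy.

```python
def gflops_per_watt(peak_gflops: float, power_watts: float) -> float:
    """Operations delivered per watt of power: higher is more efficient."""
    return peak_gflops / power_watts


def picojoules_per_flop(peak_gflops: float, power_watts: float) -> float:
    """Energy spent per floating-point operation: lower is more efficient."""
    joules_per_flop = power_watts / (peak_gflops * 1e9)
    return joules_per_flop * 1e12


# Hypothetical figures for illustration only.
cpu_gflops, cpu_watts = 896.0, 125.0
gpu_gflops, gpu_watts = 6144.0, 300.0

for name, gflops, watts in [("CPU", cpu_gflops, cpu_watts),
                            ("GPU", gpu_gflops, gpu_watts)]:
    print(f"{name}: {gflops_per_watt(gflops, watts):.1f} GFLOP/s per W, "
          f"{picojoules_per_flop(gflops, watts):.0f} pJ per FLOP")
```

With these numbers the GPU draws more total power yet does almost three times as much work per watt. If only a few GPU cores have useful work, that advantage can shrink or disappear.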
4. CPU and GPU Working Together
In modern systems, especially in high-performance computing (HPC), AI, and gaming, the CPU and GPU work together, each taking on the tasks it's best suited for.
a. Task Division
- CPU: Manages system logic, high-level coordination, branching code, and irregular control flows.
- GPU: Performs massively parallel computation, such as matrix multiplications, pixel shading, or training neural networks.
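A minimal sketch of this division of labor, using plain Python as a stand-in for both processors: the "host" function checks shapes and decides what to do (branching, irregular logic), while the "kernel" function computes a single output element of a matrix product. Every output element is independent, which is exactly the kind of work a GPU spreads across thousands of threads. The function names are hypothetical and nothing here actually runs on a GPU.

```python
from typing import List

Matrix = List[List[float]]


def matmul_element(a: Matrix, b: Matrix, row: int, col: int) -> float:
    """'Kernel' work: one output element, independent of all the others."""
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """'Host' work: validation and control flow, then hand out the parallel part."""
    if not a or not b:
        raise ValueError("empty matrix")
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    rows, cols = len(a), len(b[0])
    # On a GPU, each (row, col) pair below would map to its own thread.
    return [[matmul_element(a, b, r, c) for c in range(cols)] for r in range(rows)]


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```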
b. Data Sharing
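- On systems with a discrete GPU, data is usually copied between system memory and GPU memory over an interconnect such as PCIe, which has far less bandwidth than the GPU's own memory.
- Unified or shared virtual memory models (e.g., Unified Memory in CUDA, AMD's Heterogeneous System Architecture (HSA)) give both processors a single address space, reducing the need for explicit copies in the program.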
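Because moving data between CPU and GPU memory takes time, offloading only pays off when the GPU's speed-up outweighs the cost of the transfer. The sketch below estimates this with a simple model, transfer time = bytes moved ÷ link bandwidth. The bandwidth and timing values are hypothetical placeholders, not measured PCIe figures, and the model ignores overlapping transfers with computation.

```python
def offload_time_s(bytes_to_gpu: float, bytes_from_gpu: float,
                   link_bytes_per_s: float, gpu_compute_s: float) -> float:
    """Total time if the work is sent to the GPU: copy in, compute, copy out."""
    transfer_s = (bytes_to_gpu + bytes_from_gpu) / link_bytes_per_s
    return transfer_s + gpu_compute_s


def should_offload(cpu_compute_s: float, bytes_to_gpu: float, bytes_from_gpu: float,
                   link_bytes_per_s: float, gpu_compute_s: float) -> bool:
    """Offload only if the GPU path, including copies, beats staying on the CPU."""
    gpu_total = offload_time_s(bytes_to_gpu, bytes_from_gpu,
                               link_bytes_per_s, gpu_compute_s)
    return gpu_total < cpu_compute_s


LINK = 25e9  # hypothetical effective bus bandwidth: 25 GB/s

# Heavy work on 1 GB of data: 0.08 s of copying + 0.02 s on the GPU beats 0.5 s on the CPU.
print(should_offload(0.5, 1e9, 1e9, LINK, 0.02))   # True

# Light work on the same data: the copies alone cost more than the CPU needs.
print(should_offload(0.05, 1e9, 1e9, LINK, 0.02))  # False
```

Unified memory removes the explicit copies from the program, but on a discrete GPU the data may still have to cross the bus behind the scenes, so the same trade-off applies.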
c. Coordinated Execution
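- The CPU launches GPU kernels (functions that run in parallel across many GPU threads).
- The CPU can either wait for a kernel to finish or, because launches are typically asynchronous, continue with its own work and check for completion later, so both processors run concurrently.
- Programming platforms and APIs such as CUDA, OpenCL, and Vulkan provide the interfaces for this CPU-GPU coordination.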
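The launch-then-synchronize pattern can be imitated on the CPU alone with Python's `concurrent.futures`. In CUDA, for example, a kernel launch returns to the host immediately and the host later waits with a call such as `cudaDeviceSynchronize()`. Here, submitting blocks to a thread pool plays the role of the launch and waiting on the futures plays the role of synchronization. The function names, the grid and block sizes, and the SAXPY workload (`a * x + y`) are illustrative choices, and Python threads only model the coordination: they do not deliver GPU-style speed-ups.

```python
import math
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def saxpy_block(block_idx: int, block_dim: int, a: float,
                x: List[float], y: List[float], out: List[float]) -> None:
    """One 'thread block': each simulated thread computes one element of a*x + y."""
    for thread_idx in range(block_dim):
        i = block_idx * block_dim + thread_idx  # global index, like blockIdx.x * blockDim.x + threadIdx.x
        if i < len(x):                          # the last block may run past the end of the data
            out[i] = a * x[i] + y[i]


def launch(pool: ThreadPoolExecutor, grid_dim: int, block_dim: int, a: float,
           x: List[float], y: List[float], out: List[float]) -> List[Future]:
    """Asynchronous 'kernel launch': queue every block and return immediately."""
    return [pool.submit(saxpy_block, b, block_dim, a, x, y, out) for b in range(grid_dim)]


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n
block_dim = 4
grid_dim = math.ceil(n / block_dim)   # 3 blocks cover 10 elements

with ThreadPoolExecutor(max_workers=4) as pool:
    pending = launch(pool, grid_dim, block_dim, 2.0, x, y, out)
    cpu_side = sum(range(1_000))      # the CPU keeps doing its own work meanwhile
    for f in pending:                 # 'synchronize': wait for every block to finish
        f.result()                    # also re-raises any exception from the block

print(cpu_side)  # 499500
print(out)       # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0]
```

Each block writes to a disjoint slice of `out`, so no locking is needed, mirroring how GPU threads usually each own one output element.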
Summary Table:
| Feature | CPU | GPU |
|---|---|---|
| Purpose | General-purpose, sequential logic | Specialized, parallel processing |
| Cores | Few, complex | Many, simple |
| Strength | Flexibility, logic-heavy tasks | High throughput, vector operations |
| Memory | Low-latency cache hierarchy | High-bandwidth memory, streaming access |
| Power | Higher per-core power usage | Efficient per operation (for parallel tasks) |
| Collaboration | Manages logic and coordination | Executes bulk computations under CPU direction |