The Big Idea
A Graphics Processing Unit (GPU) is designed to perform millions of mathematical calculations in parallel.
Most of these calculations involve floating-point numbers: numbers that can represent fractions, very large values, and very small values, each stored with a fixed number of significant digits.
A floating-point operation (FLOP) is simply any arithmetic operation — such as addition, subtraction, multiplication, or division — performed on floating-point numbers.
GPUs are measured by how many of these operations they can perform per second, called FLOPS (Floating-Point Operations Per Second). Modern GPUs can perform trillions of FLOPs per second, which is what makes them ideal for graphics, scientific simulations, and machine learning.
Why Floating-Point Matters in Graphics
Images and 3D scenes depend on continuous quantities: color intensity, light reflection, distance, angle, velocity, and so on.
These quantities are not whole numbers — they require decimal precision. For example:
| Quantity | Example | Needs Fractional Precision? |
|---|---|---|
| Pixel color intensity | 0.73 (on a scale from 0.0 to 1.0) | Yes |
| Vertex coordinate | (12.25, 8.75, 4.5) | Yes |
| Light brightness | 0.004 → 1.0 | Yes |
| Rotation angle | 45.3° | Yes |
To handle these, the GPU must use floating-point arithmetic rather than integer arithmetic.
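Floating-point formats store real numbers in a binary form of scientific notation: a sign, a significand (the significant digits), and an exponent that scales the significand by a power of two.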
This allows very large or small numbers to be represented efficiently — essential for graphics and physical simulation.
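To make the scientific-notation idea concrete, here is a minimal sketch that splits a value into the three fields of the 32-bit IEEE 754 format. The helper names `decompose_float32` and `rebuild` are hypothetical, and the sketch only handles normal numbers (not zero, subnormals, infinities, or NaN).

```python
import struct

def decompose_float32(x):
    """Return (sign, unbiased exponent, fraction bits) of x rounded to FP32."""
    # Reinterpret the 4 bytes of the FP32 encoding as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                        # 1 bit
    exponent = ((bits >> 23) & 0xFF) - 127   # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF               # 23 bits after the implicit leading 1
    return sign, exponent, fraction

def rebuild(sign, exponent, fraction):
    """Recombine the fields: (-1)^sign * 1.fraction * 2^exponent (normal numbers only)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** exponent

# 12.25 is 1.53125 * 2^3, so the exponent field decodes to 3.
sign, exponent, fraction = decompose_float32(12.25)
print(sign, exponent, fraction, rebuild(sign, exponent, fraction))
```

The same layout, with wider or narrower exponent and fraction fields, underlies the other floating-point formats a GPU supports.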
Floating-Point Hardware Inside the GPU
Each GPU contains thousands of floating-point units (FPUs) — specialized parts of the arithmetic logic units (ALUs) designed to handle real-number operations.
These FPUs are grouped into streaming multiprocessors (SMs, NVIDIA's term; AMD calls the equivalent block a compute unit), each of which executes many threads at once.
Each SM can:
- Perform addition, multiplication, or fused multiply-add (FMA) operations on floating-point data.
- Execute SIMD (Single Instruction, Multiple Data) instructions — applying the same operation to many data points simultaneously.
- Handle different precision modes depending on the workload.
Common Floating-Point Precisions
| Type | Bits | Approximate Precision | Typical Use |
|---|---|---|---|
| FP32 (single precision) | 32 | ~7 decimal digits | Games, general rendering |
| FP16 (half precision) | 16 | ~3–4 decimal digits | AI inference, mobile GPUs |
| FP64 (double precision) | 64 | ~15 decimal digits | Scientific and engineering simulations |
Modern GPUs support several of these precisions, and the software chooses one per workload to balance speed and accuracy. For instance, an AI model might be trained in FP32 but run (inference) in FP16 for efficiency.
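The effect of the precision choice can be seen by rounding one value to each format. This is a small sketch using Python's standard `struct` module, whose `e`, `f`, and `d` codes correspond to the 16-, 32-, and 64-bit IEEE 754 formats; the helper `round_to` is a hypothetical name, and the code runs on the CPU, not on a GPU.

```python
import struct

def round_to(fmt, x):
    """Round x to the format named by fmt and return the stored value."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

value = 0.1
fp16 = round_to("<e", value)  # half precision
fp32 = round_to("<f", value)  # single precision
fp64 = round_to("<d", value)  # double precision (Python's native float)

# The rounding error shrinks as the format gets wider.
for name, stored in (("FP16", fp16), ("FP32", fp32), ("FP64", fp64)):
    print(f"{name}: stored={stored!r} error={abs(stored - value):.3e}")

# FP16 has only 11 significant bits, so the integer 2049 cannot be stored exactly.
print(round_to("<e", 2049.0))
```

Narrower formats trade this accuracy for smaller storage and higher throughput, which is why the choice depends on the workload.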
Floating-Point Operations in Practice
1. Rendering and Shading
Every pixel color and light reflection in a 3D scene is computed through floating-point math.
The GPU repeats these calculations for millions of pixels per frame, which adds up to billions of floating-point operations every second.
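The text does not name a particular shading formula, so as an illustration assume the simplest common one, Lambertian diffuse shading, where brightness is proportional to the cosine of the angle between the surface normal and the light direction. The function names and sample values below are hypothetical; a real GPU would run this per pixel in a shader, not in Python.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambert_shade(normal, light_dir, albedo, light_intensity):
    """Diffuse color = albedo * light intensity * max(0, N . L)."""
    n = normalize(normal)
    l = normalize(light_dir)
    # Dot product of unit vectors is the cosine of the angle between them.
    cos_theta = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(channel * light_intensity * cos_theta for channel in albedo)

albedo = (0.73, 0.5, 0.25)  # surface color, each channel on a 0.0 to 1.0 scale

facing = lambert_shade((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), albedo, 1.0)
away = lambert_shade((0.0, 0.0, 1.0), (0.0, 0.0, -1.0), albedo, 1.0)
print(facing)  # light hits the surface head-on: full albedo
print(away)    # light is behind the surface: black
```

Even this small formula costs a few dozen floating-point operations per pixel, and every one of them is independent of the neighboring pixels, which is what lets the GPU compute them in parallel.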
2. Physics and Simulation
In a simulation (e.g., cloth movement, explosions), each vertex’s position and velocity are updated by floating-point formulas using small time steps.
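A time-stepped update of this kind can be sketched as follows. The sketch assumes semi-implicit Euler integration (velocity is updated first, then position uses the new velocity), which is one common choice, not necessarily what any given engine uses; the function name, gravity value, and 60 Hz step are illustrative assumptions.

```python
def euler_step(position, velocity, acceleration, dt):
    """Advance one vertex by one time step using semi-implicit Euler."""
    new_velocity = tuple(v + a * dt for v, a in zip(velocity, acceleration))
    new_position = tuple(p + v * dt for p, v in zip(position, new_velocity))
    return new_position, new_velocity

position = (12.25, 8.75, 4.5)   # starting vertex coordinate
velocity = (0.0, 0.0, 0.0)
gravity = (0.0, -9.81, 0.0)     # assumed constant acceleration in m/s^2
dt = 1.0 / 60.0                 # assumed step: one frame at 60 frames per second

# Simulate one second of free fall, one frame at a time.
for _ in range(60):
    position, velocity = euler_step(position, velocity, gravity, dt)

print(position, velocity)
```

Each step costs six multiplies and six adds per vertex, so a mesh with a million vertices at 60 steps per second already needs hundreds of millions of floating-point operations per second for this update alone.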
3. Machine Learning
Neural networks use floating-point weights and activations. A GPU’s throughput in floating-point operations determines how quickly models can train or make predictions.
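To show where those operations come from, here is a minimal sketch of one fully connected layer written as plain loops. The ReLU activation, the function names, and the sample sizes are assumptions made for illustration; real frameworks run the same arithmetic as a matrix multiply on the GPU.

```python
def dense_forward(weights, activations, bias):
    """Compute ReLU(W . x + b) for one layer, one output neuron per weight row."""
    outputs = []
    for row, b in zip(weights, bias):
        acc = b
        for w, a in zip(row, activations):
            acc += w * a              # one multiply and one add: 2 FLOPs
        outputs.append(max(0.0, acc))  # ReLU (an assumed activation)
    return outputs

def dense_flops(n_out, n_in):
    """Approximate FLOP count of the multiply-adds in one forward pass."""
    return 2 * n_out * n_in

weights = [[0.5, 0.25], [1.0, -2.0]]
activations = [2.0, 4.0]
bias = [0.5, 1.0]

print(dense_forward(weights, activations, bias))
print(dense_flops(4096, 4096))  # a single 4096 x 4096 layer, one input vector
```

A single 4096 by 4096 layer needs about 33.5 million floating-point operations per input, and a model stacks many such layers over many inputs, which is why FLOPS throughput dominates training and inference time.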
Numerical Accuracy and Rounding
Floating-point representation is approximate. Not all decimal fractions (like 0.1) can be represented exactly in binary. This leads to rounding errors — small differences that can accumulate.
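A short demonstration of both effects, the inexact representation and the accumulation of error. Python's `float` is FP64; the same behavior appears, with larger errors, in FP32 and FP16.

```python
import math

# 0.1, 0.2 and 0.3 are each rounded to the nearest binary value,
# so the rounded sum does not match the rounded 0.3.
print(0.1 + 0.2 == 0.3)

# Adding 0.1 ten times rounds after every addition, and the errors accumulate.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0, total)

# math.fsum tracks the lost low-order bits and rounds only once at the end.
compensated = math.fsum([0.1] * 10)
print(compensated == 1.0)
```

The usual defenses are the ones this example hints at: compare with a tolerance instead of `==`, and use compensated or higher-precision accumulation when summing many values.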
GPUs include hardware logic to:
- Round results consistently (often to nearest-even).
- Detect overflow/underflow conditions.
- Use fused multiply-add (FMA) to improve precision by performing (a × b) + c in one step without intermediate rounding.
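The benefit of skipping the intermediate rounding can be shown with a small emulation. The helper `fma_emulated` is a hypothetical name: it is not a hardware FMA, but it computes the exact product and sum with rational arithmetic and rounds once at the end, which is the behavior an FMA unit provides. The inputs are chosen so that the separate multiply loses the entire answer.

```python
from fractions import Fraction

def fma_emulated(a, b, c):
    """Compute a*b + c exactly, then round to the nearest float a single time."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Exact product is 1 - 2^-60, which is too close to 1.0 to survive rounding.
separate = a * b + c           # rounds after the multiply, then after the add
fused = fma_emulated(a, b, c)  # rounds once, after the add

print(separate)  # the small term has been rounded away
print(fused)     # the small term is preserved
```

With two roundings the result is zero; with one rounding it is the correct value, negative 2 to the power of negative 60.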
Measuring GPU Performance: FLOPS
GPU performance is often quoted in GFLOPS (billions) or TFLOPS (trillions).
For example:
- A gaming GPU might reach 20 TFLOPS (20 trillion floating-point operations per second).
- A data-center GPU used for AI may exceed 100 TFLOPS using FP16 precision.
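The formula for peak FLOPS is:
Peak FLOPS = (number of floating-point units) × (clock frequency in Hz) × (floating-point operations per unit per cycle)
The last factor is typically 2 when each unit can issue one FMA per cycle, because an FMA counts as a multiply plus an add.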
This measure helps compare GPU models and understand computational capability.
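A commonly quoted estimate of peak FLOPS multiplies the number of floating-point units by the clock frequency and by the operations each unit completes per cycle (2 if it can issue one FMA per cycle). The sketch below applies that estimate; the function name and the unit count and clock speed are hypothetical round numbers, not the specification of any real GPU.

```python
def peak_flops(num_fp_units, clock_hz, flops_per_unit_per_cycle=2):
    """Theoretical peak: units * cycles per second * FLOPs per unit per cycle."""
    return num_fp_units * clock_hz * flops_per_unit_per_cycle

# Hypothetical GPU: 10,000 FP32 units at 1 GHz, one FMA per unit per cycle.
hypothetical = peak_flops(10_000, 1.0e9)
print(hypothetical / 1e12, "TFLOPS")
```

This is an upper bound that assumes every unit issues an FMA on every cycle; real workloads reach only a fraction of it because of memory bandwidth limits and instructions that are not FMAs.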
CPU vs GPU Floating-Point Performance
| Feature | CPU | GPU |
|---|---|---|
| Floating-point units | Few, complex | Thousands, simple |
| Throughput | Hundreds of GFLOPS to a few TFLOPS | Tens to hundreds of TFLOPS |
| Strength | Complex logic, serial tasks | Massive parallel arithmetic |
| Example task | Spreadsheet calculation | 3D rendering, AI matrix math |
A CPU runs a small number of floating-point calculations at a time with low latency; a GPU runs thousands of them simultaneously. Both use the same floating-point formats, so the difference is throughput, not precision.
In Summary
Floating-point operations are the numerical foundation of everything a GPU does.
Because GPUs contain thousands of floating-point units that work in parallel, they can render realistic graphics and simulate physics in real time, and train deep neural networks far faster than a CPU could.
The ability to process enormous numbers of floating-point operations in parallel is what distinguishes a GPU from general-purpose processors.