Floating-point operation

This article is not assessed by the IB but may be helpful to deepen your understanding. Plus, I think it's cool.

The Big Idea

A Graphics Processing Unit (GPU) is designed to perform millions of mathematical calculations in parallel.
Most of these calculations involve floating-point numbers: numbers that can represent fractions, very large values, and very small values, each to a fixed number of significant digits.

A floating-point operation (FLOP) is simply any arithmetic operation — such as addition, subtraction, multiplication, or division — performed on floating-point numbers.

GPUs are measured by how many of these operations they can perform per second, called FLOPS (Floating-Point Operations Per Second). Modern GPUs can perform trillions of FLOPs per second, which is what makes them ideal for graphics, scientific simulations, and machine learning.


Why Floating-Point Matters in Graphics

Images and 3D scenes depend on continuous quantities: color intensity, light reflection, distance, angle, velocity, and so on.
These quantities are rarely whole numbers, so they need fractional precision. For example:

Quantity              | Example                            | Needs Fractional Precision?
Pixel color intensity | 0.73 (on a scale from 0.0 to 1.0)  | Yes
Vertex coordinate     | (12.25, 8.75, 4.5)                 | Yes
Light brightness      | 0.004 → 1.0                        | Yes
Rotation angle        | 45.3°                              | Yes

To handle these, the GPU must use floating-point arithmetic rather than integer arithmetic.
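Floating-point formats store real numbers using a binary form of scientific notation: a sign, a significand (also called the mantissa), and an exponent.

$$\text{value} = (-1)^{\text{sign}} \times \text{significand} \times 2^{\text{exponent}}$$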

This allows very large or small numbers to be represented efficiently — essential for graphics and physical simulation.
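To make the storage format concrete, here is a minimal sketch that splits a number into the three fields used by IEEE 754 single precision (1 sign bit, 8 exponent bits stored with a bias of 127, and 23 fraction bits). The function names `decode_fp32` and `rebuild_fp32` are my own for illustration, and the rebuild step only handles ordinary (normal) numbers, not zero, infinity, or NaN.

```python
import struct

def decode_fp32(x):
    """Round x to FP32 and split its 32 bits into sign, exponent, fraction."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit: 0 = positive, 1 = negative
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF       # 23 bits after the implicit leading 1
    return sign, exponent, fraction

def rebuild_fp32(sign, exponent, fraction):
    """Recompute the value of a normal FP32 number from its three fields."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)

fields = decode_fp32(12.25)          # 12.25 = 1.53125 x 2^3
print(fields, rebuild_fp32(*fields))
```

The stored exponent for 12.25 is 130 because 130 - 127 = 3, which matches 12.25 = 1.53125 × 2³.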


Floating-Point Hardware Inside the GPU

Each GPU contains thousands of floating-point units (FPUs), the parts of its arithmetic hardware designed to handle real-number operations.
These FPUs are grouped into blocks that NVIDIA calls streaming multiprocessors (SMs), each of which executes many threads at once.

Each SM can:

  • Perform addition, multiplication, or fused multiply-add (FMA) operations on floating-point data.
  • Execute SIMD (Single Instruction, Multiple Data) instructions — applying the same operation to many data points simultaneously.
  • Handle different precision modes depending on the workload.
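The SIMD idea in the list above can be sketched in a few lines: one "instruction" (here a multiply-add) is applied to every lane of data. This is only a model of the concept, since Python runs the lanes one after another while GPU hardware runs them at the same time. The name `multiply_add_lanes` is hypothetical.

```python
def multiply_add_lanes(a, b, c):
    """Apply the same multiply-add to every lane: result[i] = a[i] * b[i] + c[i].

    Note: Python rounds after the multiply and again after the add,
    so this is not a true fused multiply-add.
    """
    return [x * y + z for x, y, z in zip(a, b, c)]

# Three lanes of data, one operation.
print(multiply_add_lanes([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [0.5, 0.5, 0.5]))
```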

Common Floating-Point Precisions

Type                    | Bits | Approximate Precision | Typical Use
FP32 (single precision) | 32   | ~7 decimal digits     | Games, general rendering
FP16 (half precision)   | 16   | ~3–4 decimal digits   | AI inference, mobile GPUs
FP64 (double precision) | 64   | ~15 decimal digits    | Scientific and engineering simulations

Modern GPUs support several of these precisions in hardware, and the software running on them chooses which one to use to balance speed and accuracy. For instance, an AI model might train in FP32 but run (infer) in FP16 for efficiency.
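You can see the effect of each precision without a GPU. The sketch below rounds the same value to FP16, FP32, and FP64 using Python's standard `struct` module, whose `e`, `f`, and `d` format codes correspond to those three IEEE 754 sizes. The helper name `round_to` is my own.

```python
import struct

def round_to(fmt, x):
    """Round a Python float (which is FP64) to the given format and back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

value = 0.1
fp16 = round_to("<e", value)   # half precision
fp32 = round_to("<f", value)   # single precision
fp64 = round_to("<d", value)   # double precision (no change)

for name, stored in (("FP16", fp16), ("FP32", fp32), ("FP64", fp64)):
    print(f"{name}: {stored:.17f}  error = {abs(stored - value):.2e}")
```

The fewer bits a format has, the further the stored value drifts from 0.1, which is the speed-versus-accuracy trade-off described above.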


Floating-Point Operations in Practice

1. Rendering and Shading

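Every pixel color and light reflection in a 3D scene is computed through floating-point math, for example by scaling a surface's color according to the angle between the surface and the light.

The GPU repeats calculations like this for millions of pixels per frame, which adds up to billions of floating-point operations every second.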
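As one illustration of per-pixel floating-point work, here is a sketch of Lambertian (diffuse) shading, where brightness depends on the dot product of the surface normal and the direction to the light. This particular model is my choice of example, not necessarily the formula this article originally showed, and the function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3D vector to length 1."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def lambert_intensity(normal, to_light, light_intensity=1.0):
    """Diffuse brightness: light_intensity * max(0, N . L)."""
    n = normalize(normal)
    l = normalize(to_light)
    return light_intensity * max(0.0, dot(n, l))

# A surface facing straight up, lit from 45 degrees.
print(lambert_intensity((0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Each call costs a handful of multiplies, adds, a square root, and divisions. A GPU runs the equivalent for every pixel on screen, every frame.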

2. Physics and Simulation

In a simulation (e.g., cloth movement, explosions), each vertex’s position and velocity are updated by floating-point formulas using small time steps.
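A minimal sketch of one such update, assuming the simple (semi-implicit) Euler method: velocity is nudged by acceleration, then position is nudged by the new velocity. Real physics engines often use more careful integrators, and the name `euler_step` is my own.

```python
def euler_step(position, velocity, acceleration, dt):
    """Advance one vertex by a small time step dt (semi-implicit Euler)."""
    new_velocity = tuple(v + a * dt for v, a in zip(velocity, acceleration))
    new_position = tuple(p + v * dt for p, v in zip(position, new_velocity))
    return new_position, new_velocity

# One vertex falling under gravity (rounded to -10 m/s^2) for half a second.
pos, vel = euler_step((0.0, 10.0, 0.0), (1.0, 0.0, 0.0), (0.0, -10.0, 0.0), 0.5)
print(pos, vel)
```

That is 6 multiplies and 6 adds per vertex per step. A cloth mesh with thousands of vertices, updated many times per second, quickly becomes millions of floating-point operations.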

3. Machine Learning

Neural networks use floating-point weights and activations. A GPU’s throughput in floating-point operations determines how quickly models can train or make predictions.
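To show where the floating-point work in a neural network comes from, here is a sketch of one fully connected layer with a ReLU activation, plus a rough FLOP count for it. The function names are hypothetical, and the count ignores the bias adds and the activation, which are small by comparison.

```python
def dense_layer(inputs, weights, biases):
    """out[j] = max(0, biases[j] + sum over i of inputs[i] * weights[i][j])."""
    outputs = []
    for j, bias in enumerate(biases):
        total = bias
        for i, x in enumerate(inputs):
            total += x * weights[i][j]   # one multiply + one add = 2 FLOPs
        outputs.append(max(0.0, total))  # ReLU activation
    return outputs

def dense_layer_flops(n_inputs, n_outputs):
    """Approximate FLOPs for one layer: 2 per weight."""
    return 2 * n_inputs * n_outputs

print(dense_layer([1.0, 2.0], [[0.5, -1.0], [0.25, -1.0]], [0.0, 1.0]))
print(dense_layer_flops(1024, 1024))
```

A single 1024-by-1024 layer already needs about two million FLOPs per input, which is why FLOPS throughput matters so much for training and inference.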


Numerical Accuracy and Rounding

Floating-point representation is approximate. Not all decimal fractions (like 0.1) can be represented exactly in binary. This leads to rounding errors — small differences that can accumulate.
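A short demonstration of rounding error and how it accumulates, using ordinary Python floats (which are FP64). The helper `naive_sum` is my own name; `math.fsum` is the standard-library function that tracks the lost low-order bits to give a correctly rounded total.

```python
import math

def naive_sum(values):
    """Add values one at a time, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

print(0.1 + 0.2 == 0.3)            # False: neither side is stored exactly

tenths = [0.1] * 10
print(naive_sum(tenths))           # slightly below 1.0
print(math.fsum(tenths))           # 1.0
print(math.isclose(naive_sum(tenths), 1.0))  # True: compare with a tolerance
```

This is why floating-point results are usually compared with a tolerance rather than with `==`.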

GPUs include hardware logic to:

  • Round results consistently (by default to the nearest representable value, with ties going to the even neighbor).
  • Detect overflow/underflow conditions.
  • Use fused multiply-add (FMA) to improve precision by performing (a × b) + c in one step without intermediate rounding.
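The benefit of FMA can be shown by emulating it. The sketch below computes (a × b) + c exactly with Python's `fractions.Fraction` and rounds only once at the end, which is what a hardware FMA does. This is an emulation for illustration, not how a GPU implements it, and both function names are hypothetical.

```python
from fractions import Fraction

def unfused_multiply_add(a, b, c):
    """Round after the multiply, then round again after the add."""
    return a * b + c

def fused_multiply_add(a, b, c):
    """Emulate FMA: compute a * b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 in FP64.
print(unfused_multiply_add(a, b, c))   # 0.0: the small term was rounded away
print(fused_multiply_add(a, b, c))     # equals -2**-60: the small term survives
```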

Measuring GPU Performance: FLOPS

GPU performance is often quoted in GFLOPS (billions) or TFLOPS (trillions).
For example:

  • A gaming GPU might reach 20 TFLOPS (20 trillion floating-point operations per second).
  • A data-center GPU used for AI may exceed 100 TFLOPS using FP16 precision.

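The usual formula for theoretical peak FLOPS is:

$$\text{Peak FLOPS} = \text{number of cores} \times \text{clock speed (Hz)} \times \text{FLOPs per core per cycle}$$

A core that completes one fused multiply-add per cycle counts as 2 FLOPs per cycle, because the FMA contains a multiply and an add.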

This measure helps compare GPU models and understand computational capability.
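As a worked example, a common estimate of peak FLOPS multiplies the number of cores by the clock speed and by the FLOPs each core completes per cycle (2 if it performs one fused multiply-add per cycle). The GPU figures below are made up for illustration and do not describe any real product.

```python
def peak_flops(cores, clock_hz, flops_per_cycle=2):
    """Theoretical peak = cores x clock speed x FLOPs per core per cycle."""
    return cores * clock_hz * flops_per_cycle

# Hypothetical GPU: 10,000 cores at 1.5 GHz, one FMA per core per cycle.
flops = peak_flops(cores=10_000, clock_hz=1.5e9)
print(f"{flops / 1e12:.1f} TFLOPS")
```

Real programs rarely reach this number, because it assumes every core does useful arithmetic on every cycle and never waits for memory.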


CPU vs GPU Floating-Point Performance

Feature              | CPU                            | GPU
Floating-point units | Few, complex                   | Thousands, simple
Throughput           | Tens of GFLOPS to a few TFLOPS | Tens to hundreds of TFLOPS
Strength             | Complex logic, serial tasks    | Massive parallel arithmetic
Example task         | Spreadsheet calculation        | 3D rendering, AI matrix math

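A CPU handles a small number of floating-point calculations at a time, each with very low delay; a GPU handles thousands of them simultaneously. Both use the same floating-point formats, so the difference is in how much work is done at once, not in how precise each result is.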


In Summary

Floating-point operations are the numerical foundation of everything a GPU does.
Because GPUs contain thousands of floating-point units that work in parallel, they can render realistic graphics and simulate physics in real time, and train deep neural networks far faster than a CPU could.

The ability to process enormous numbers of floating-point operations in parallel is what distinguishes a GPU from general-purpose processors.