Time Quantum

This article is not assessed by the IB but may be helpful to deepen your understanding. Plus, I think it's cool.

The big idea

A time quantum (also called a time slice) is the fixed interval of CPU time that a pre-emptive scheduler grants to a runnable thread before it may be forcibly interrupted and another thread allowed to run. In effect, the quantum is the metronome that turns a single physical CPU into the rapid illusion of many simultaneously executing tasks.


Why a quantum exists

  • Fairness and interactivity – by pre-empting long-running threads, the OS guarantees that newly-arrived or I/O-bound tasks see the CPU within a bounded delay.
  • Fault containment – an infinite-looping user process cannot monopolise the processor.
  • Policy enforcement – once execution is quantised, higher-level rules (priority, cgroups, real-time classes) can be layered atop the basic round-robin rotation.
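
The fairness and fault-containment points above can be illustrated with a small simulation. This is a minimal sketch, not a real scheduler: the task names and burst lengths are made up, all tasks are assumed to arrive at time 0, and context-switch cost is ignored.

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate round-robin on one CPU; all tasks arrive at time 0.

    bursts: dict mapping task name -> CPU time still needed.
    Returns (first_run, finish): dicts mapping task name -> time.
    Context-switch cost is ignored in this sketch.
    """
    queue = deque(bursts.items())
    first_run, finish = {}, {}
    clock = 0
    while queue:
        name, remaining = queue.popleft()
        first_run.setdefault(name, clock)   # record the first time it gets the CPU
        ran = min(quantum, remaining)       # run until done or quantum expiry
        clock += ran
        if remaining > ran:
            queue.append((name, remaining - ran))  # pre-empted: back of the queue
        else:
            finish[name] = clock
    return first_run, finish

# Hypothetical workload: one CPU hog and two short interactive tasks.
workload = {"hog": 1000, "editor": 3, "shell": 2}
first_run, finish = round_robin(workload, quantum=10)
print(first_run)  # {'hog': 0, 'editor': 10, 'shell': 13}
print(finish)     # {'editor': 13, 'shell': 15, 'hog': 1005}
```

With a quantum of 10 the editor waits 10 time units for its first turn; with a quantum of 1000 (effectively no pre-emption) it would wait for the hog to finish.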

Choosing the “right” quantum size

| If the quantum is too short | If the quantum is too long |
| --- | --- |
| Context-switch overhead rises (each switch incurs pipeline flushes, TLB reloads, cache pollution). | Scheduler degenerates toward first-come-first-served; interactive response time suffers. |
| Throughput falls because more wall-clock time is spent in kernel mode. | Long bursts delay timer-driven latency deadlines. |
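
The two failure modes can be put into rough numbers. The sketch below assumes every thread uses its whole quantum and that each switch costs a constant amount of time; the 0.01 ms switch cost and the 20 runnable threads are illustrative assumptions, not measurements.

```python
def cpu_efficiency(quantum_ms, switch_cost_ms):
    """Fraction of wall-clock time spent on useful work rather than switching."""
    return quantum_ms / (quantum_ms + switch_cost_ms)

def worst_case_wait_ms(n_runnable, quantum_ms, switch_cost_ms):
    """Longest a thread waits for its next turn under plain round-robin."""
    return (n_runnable - 1) * (quantum_ms + switch_cost_ms)

# Assumed values: 10 microsecond switch cost, 20 runnable threads.
for q in (0.1, 1, 10, 100):
    eff = cpu_efficiency(q, 0.01)
    wait = worst_case_wait_ms(20, q, 0.01)
    print(f"quantum={q} ms  efficiency={eff:.3f}  worst-case wait={wait:.1f} ms")
```

Shrinking the quantum lowers efficiency, while growing it lengthens the worst-case wait roughly in proportion, which is the tension the table describes.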

A classic back-of-the-envelope target is 1–2 × the average I/O “think-time” of interactive programs. Experiments show that when the quantum approaches the mean CPU-burst length, waiting time grows rapidly, while when it is an order of magnitude smaller, context switches dominate system time. For example, a single 10-unit CPU burst is interrupted nine times if the quantum is 1 unit, but never if the quantum is 10 units or more; the extra switches lengthen total run time, and avoiding them shortens it (Computer Science Stack Exchange).
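
The arithmetic behind that example is simple enough to check directly. The helper name below is hypothetical; it counts how often a single burst is cut off by quantum expiry.

```python
def switches_for_burst(burst, quantum):
    """Number of times one CPU burst is interrupted by quantum expiry.

    The burst is split into ceil(burst / quantum) slices, and every slice
    except the last ends in a forced switch.
    """
    slices = -(-burst // quantum)  # integer ceiling division
    return slices - 1

for q in (1, 6, 10, 12):
    print(f"quantum={q}: {switches_for_burst(10, q)} switches")
# quantum=1: 9 switches
# quantum=6: 1 switches
# quantum=10: 0 switches
# quantum=12: 0 switches
```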


Concrete values in real systems

| Operating system / scheduler | How the quantum is derived | Typical range |
| --- | --- | --- |
| Windows 10/11 client | Fixed clock-tick model: two clock ticks of 15.625 ms on client SKUs (Medium); server SKUs default to a longer quantum of 12 ticks | ≈ 31 ms per thread before pre-emption |
| Linux CFS (Completely Fair Scheduler) | No global constant; each task’s slice = sched_latency / n (clamped to min-granularity), where n = number of runnable threads (Stack Overflow, Unix & Linux Stack Exchange) | 1 – 6 ms for desktop loads; tens of µs inside real-time cgroups |
| Classic UNIX “round-robin” | Quantum tied to scheduler clock (often 10 ms), multiplied by priority factor | 10 – 100 ms |
| Embedded / hard real-time kernels | Time slicing often disabled; a task runs until it blocks or a higher-priority task becomes ready, under strict priorities | No quantum (time-slice timer disabled) |
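
The CFS entry can be sketched as a one-line formula. This is a simplification that assumes all tasks have equal weight; the 6 ms latency target and 0.75 ms minimum granularity are assumed values chosen to match the range quoted above, and actual settings vary by kernel version and CPU count.

```python
SCHED_LATENCY_MS = 6.0      # assumed target latency
MIN_GRANULARITY_MS = 0.75   # assumed lower bound on a slice

def cfs_slice_ms(n_runnable,
                 latency_ms=SCHED_LATENCY_MS,
                 min_gran_ms=MIN_GRANULARITY_MS):
    """Approximate per-task slice for n equally weighted runnable tasks."""
    if n_runnable < 1:
        raise ValueError("need at least one runnable task")
    # Divide the latency target among the tasks, but never go below the floor.
    return max(latency_ms / n_runnable, min_gran_ms)

for n in (1, 3, 8, 100):
    print(f"{n} runnable -> {cfs_slice_ms(n)} ms")
# 1 runnable -> 6.0 ms
# 3 runnable -> 2.0 ms
# 8 runnable -> 0.75 ms
# 100 runnable -> 0.75 ms
```

The same arithmetic gives the Windows client figure: two ticks of 15.625 ms come to 31.25 ms.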

Implementation details behind the scenes

  1. Periodic timer interrupt – the quantum boundary is detected by a hardware timer (APIC, HPET, or ARM SysTick).
  2. Kernel pre-emption path – the interrupt handler updates the running thread’s accounting, sets a need-resched flag, and exits; the low-level context switch happens on the return path from the interrupt, typically just before control goes back to user mode.
  3. Accounting and decay – per-thread counters feed load averages, vruntime (CFS), or CPU share calculations; these influence the next quantum length or priority boost.
  4. NUMA & SMP nuance – on multi-core systems a thread usually consumes its whole quantum on one core and may migrate afterwards; the longer it has run, the more cache state it leaves behind, so the cost of re-warming caches after a migration grows with quantum length.
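
Steps 1 to 3 above can be mimicked with a toy model. This is a sketch of the control flow only, not kernel code: the class, its method names, and the two-tick quantum are all hypothetical.

```python
class Cpu:
    """Toy model of one CPU's scheduler tick (illustrative, not kernel code)."""

    def __init__(self, threads, quantum_ticks):
        self.run_queue = list(threads)      # first entry is the running thread
        self.quantum_ticks = quantum_ticks
        self.ticks_left = quantum_ticks
        self.need_resched = False
        self.cpu_ticks = {t: 0 for t in threads}  # per-thread accounting

    def timer_interrupt(self):
        """Step 1 and 3: the periodic tick charges the running thread."""
        current = self.run_queue[0]
        self.cpu_ticks[current] += 1
        self.ticks_left -= 1
        if self.ticks_left == 0:
            self.need_resched = True        # only a flag; no switch happens here

    def return_from_interrupt(self):
        """Step 2: the switch itself happens on the way out of the interrupt."""
        if self.need_resched:
            self.run_queue.append(self.run_queue.pop(0))  # rotate round-robin
            self.ticks_left = self.quantum_ticks
            self.need_resched = False
        return self.run_queue[0]

cpu = Cpu(["A", "B"], quantum_ticks=2)
trace = []
for _ in range(6):
    cpu.timer_interrupt()
    trace.append(cpu.return_from_interrupt())
print(trace)          # ['A', 'B', 'B', 'A', 'A', 'B']
print(cpu.cpu_ticks)  # {'A': 4, 'B': 2}
```

Separating "set the flag" from "perform the switch" mirrors the split between the interrupt handler and its exit path.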

Secondary effects of quantum tuning

  • System call batching: Longer slices favour vectorised I/O because applications stay in cache longer.
  • Energy efficiency: Mobile OSs sometimes extend the quantum when the screen is off to reduce wake-ups.
  • Latency-sensitive networking: Datapath threads in NFV stacks may request a real-time priority under the SCHED_FIFO policy, which has no time slice, to bypass the general quantum altogether.
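
As a rough illustration of the last point, a process on Linux can ask for SCHED_FIFO through Python's os module. This is a sketch under assumptions: the helper name and the priority value 1 are arbitrary, the call normally needs root or CAP_SYS_NICE, and on platforms without these functions the helper simply reports failure.

```python
import os

def try_set_fifo(priority=1, pid=0):
    """Try to move a process (pid 0 = the caller) to SCHED_FIFO.

    Returns True on success, False if the platform lacks the call or the
    kernel refuses it (for example, insufficient privileges).
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # not available on this platform
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False  # covers PermissionError and invalid priorities
    return True
```

Calling `try_set_fifo()` from an unprivileged shell will usually return False. A SCHED_FIFO thread that never blocks can starve everything at lower priority on its core, so this is best reserved for threads that are known to yield.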

Practical guidance

  1. Measure first – collect context-switch and schedstat counters (on Linux, for example, the ctxt line in /proc/stat and the contents of /proc/schedstat); a rate above 20 000 switches per second on desktop workloads usually signals an overly small quantum.
  2. Tune with care – on Linux kernels that expose the CFS tunables, raising kernel.sched_min_granularity_ns raises the lower bound on a slice, while decreasing sched_latency_ns shrinks per-task slices and improves GUI snappiness at a mild throughput cost.
  3. Use class separation – isolate real-time or latency-critical tasks in dedicated cgroups or RT priority bands instead of globally shrinking the quantum for everyone.
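
For the "measure first" step, one possible approach on Linux is to sample the system-wide `ctxt` counter in /proc/stat twice and divide by the interval. The sketch below parses text rather than reading the file directly so that it runs anywhere; the two sample snapshots are made-up values, and the function names are hypothetical.

```python
def parse_ctxt(proc_stat_text):
    """Extract the cumulative context-switch count from /proc/stat contents."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")

def switch_rate_hz(before, after, interval_s):
    """Context switches per second between two samples."""
    return (after - before) / interval_s

# Made-up snapshots taken 0.5 s apart; on a real system, read /proc/stat twice.
sample_t0 = "cpu  100 0 50 1000 0 0 0 0 0 0\nctxt 1500000\nbtime 1700000000\n"
sample_t1 = "cpu  110 0 55 1090 0 0 0 0 0 0\nctxt 1512500\nbtime 1700000000\n"

rate = switch_rate_hz(parse_ctxt(sample_t0), parse_ctxt(sample_t1), 0.5)
print(rate)  # 25000.0
```

Note that `ctxt` sums switches across all CPUs, so the figure should be read against the number of cores before comparing it with any threshold.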

Looking forward

Modern schedulers increasingly adapt the quantum dynamically or abandon the fixed-slice model altogether. Linux’s EEVDF scheduler, for example, which replaced CFS in kernel 6.6, computes virtual deadlines rather than assigning identical slices (students.mimuw.edu.pl). Nonetheless, the foundational idea of a bounded execution window remains central to pre-emptive multitasking, and understanding the trade-offs around quantum length is key to designing responsive, efficient software systems.
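
To make the virtual-deadline idea concrete, here is a minimal sketch of the textbook "earliest eligible virtual deadline first" rule: a request's virtual deadline is its eligible time plus its requested slice divided by its weight, and the scheduler picks the eligible task with the earliest deadline. It is not the Linux implementation; the class, field names, and task values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    eligible_time: float  # virtual time at which the request becomes eligible
    request: float        # requested slice length
    weight: float         # relative share of the CPU

    @property
    def virtual_deadline(self):
        # A short request or a heavy weight gives an earlier deadline.
        return self.eligible_time + self.request / self.weight

def pick_next(tasks, virtual_now):
    """Among tasks already eligible, choose the earliest virtual deadline."""
    eligible = [t for t in tasks if t.eligible_time <= virtual_now]
    if not eligible:
        return None
    return min(eligible, key=lambda t: t.virtual_deadline)

tasks = [
    Task("batch", eligible_time=0.0, request=6.0, weight=1.0),
    Task("ui", eligible_time=0.0, request=1.0, weight=1.0),
    Task("late", eligible_time=5.0, request=0.5, weight=1.0),
]
print(pick_next(tasks, virtual_now=0.0).name)  # ui
```

The task asking for a short slice gets an earlier deadline and runs first, which is how latency-sensitive work is favoured without shrinking everyone's slice.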