The big idea
A time quantum (also called a time slice) is the fixed interval of CPU time that a pre-emptive scheduler grants to a runnable thread before it may be forcibly interrupted and another thread allowed to run. In effect, the quantum is the metronome that turns a single physical CPU into the rapid illusion of many simultaneously executing tasks.
Why a quantum exists
- Fairness and interactivity – by pre-empting long-running threads, the OS guarantees that newly-arrived or I/O-bound tasks see the CPU within a bounded delay.
- Fault containment – an infinite-looping user process cannot monopolise the processor.
- Policy enforcement – once execution is quantised, higher-level rules (priority, cgroups, real-time classes) can be layered atop the basic round-robin rotation.
Choosing the “right” quantum size
| If the quantum is too short | If the quantum is too long |
|---|---|
| • Context-switch overhead rises (each switch incurs pipeline flushes, TLB reloads, cache pollution).<br>• Throughput falls because more wall-clock time is spent in kernel mode. | • Scheduler degenerates toward first-come-first-served; interactive response time suffers.<br>• Long bursts delay timer-driven latency deadlines. |
A classic back-of-the-envelope target is 1–2 × the average I/O “think-time” of interactive programs. Experiments show that when the quantum approaches the mean CPU-burst length, waiting time grows rapidly, while when it is an order of magnitude smaller than the mean burst, context switches dominate system time. For example, a single 10-unit CPU burst is pre-empted nine times if the quantum is 1 unit, but never if the quantum is ≥10 units, so the same work pays for nine context switches in the first case and none in the second (Computer Science Stack Exchange).
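The burst example above can be reproduced with a small round-robin simulation. This is a minimal sketch, not a model of any real scheduler: the function name `round_robin` is hypothetical, all bursts are assumed to arrive at time 0, and the switch itself is assumed to cost nothing, so only the number of pre-emptions and the average waiting time are reported.

```python
from collections import deque


def round_robin(bursts, quantum):
    """Simulate round-robin over CPU bursts that all arrive at t = 0.

    Returns (preemptions, average_waiting_time). A pre-emption is counted
    each time a quantum expires while the running task still has work left.
    """
    if not bursts:
        return 0, 0.0
    queue = deque(enumerate(bursts))  # (task index, remaining time)
    clock = 0
    preemptions = 0
    finish = [0] * len(bursts)
    while queue:
        idx, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        remaining -= run
        if remaining > 0:
            # Quantum expired with work left: pre-empt and re-queue.
            preemptions += 1
            queue.append((idx, remaining))
        else:
            finish[idx] = clock
    # Waiting time = completion time minus the task's own CPU demand.
    waits = [finish[i] - bursts[i] for i in range(len(bursts))]
    return preemptions, sum(waits) / len(waits)


if __name__ == "__main__":
    for q in (1, 6, 12):
        print(q, round_robin([10], q))
```

Running it with several bursts and a range of quantum values makes the trade-off in the table visible: small quanta inflate the pre-emption count, large quanta push the waiting time toward first-come-first-served behaviour.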
Concrete values in real systems
| Operating system / scheduler | How the quantum is derived | Typical range |
|---|---|---|
| Windows 10/11 client | Quantum counted in clock ticks: two ticks of 15.625 ms on client SKUs; server SKUs default to a longer quantum of 12 ticks (Medium) | ≈ 31 ms per thread before pre-emption on client SKUs |
| Linux CFS (Completely-Fair Scheduler) | No global constant; each task’s slice ≈ sched_latency / n for n equal-weight runnable threads (scaled by nice weight otherwise), never below min-granularity (Stack Overflow, Unix & Linux Stack Exchange) | 1 – 6 ms for desktop loads; tens of µs inside real-time cgroups |
| Classic UNIX “round-robin” | Quantum tied to scheduler clock (often 10 ms), multiplied by priority factor | 10 – 100 ms |
| Embedded / hard real-time kernels | Time slicing often disabled; a task runs until it completes, blocks, or is pre-empted by a higher-priority task | None (no time-slice timer) |
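The arithmetic behind the first two rows of the table can be sketched as follows. The function names are hypothetical, the CFS formula is the simplified equal-weight form given in the table (the kernel additionally scales by nice weight), and the default values of 6 ms latency and 0.75 ms minimum granularity are assumed baseline figures that real kernels scale with CPU count.

```python
def cfs_slice_ms(n_runnable, sched_latency_ms=6.0, min_granularity_ms=0.75):
    """Simplified CFS slice: latency target split evenly, with a floor.

    Assumes all runnable tasks have equal weight (same nice value).
    """
    if n_runnable < 1:
        raise ValueError("need at least one runnable task")
    return max(sched_latency_ms / n_runnable, min_granularity_ms)


def windows_quantum_ms(ticks=2, tick_ms=15.625):
    """Tick-based quantum: a whole number of clock ticks."""
    return ticks * tick_ms


if __name__ == "__main__":
    for n in (1, 3, 8, 16):
        print(n, "runnable ->", cfs_slice_ms(n), "ms")
    print("client quantum:", windows_quantum_ms(), "ms")
```

The floor is what keeps the slice from collapsing under heavy load: once the even share would fall below the minimum granularity, every task gets the minimum instead and the overall latency target is allowed to stretch.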
Implementation details behind the scenes
- Periodic timer interrupt – the quantum boundary is detected by a hardware timer (APIC, HPET, or ARM SysTick).
- Kernel pre-emption path – the interrupt handler updates the running thread’s accounting, sets a need-resched flag, and exits; the low-level context switch happens on the return path from the interrupt (back to user mode, or to kernel code if the kernel itself is pre-emptible).
- Accounting and decay – per-thread counters feed load averages, vruntime (CFS), or CPU share calculations; these influence the next quantum length or priority boost.
- NUMA & SMP nuance – on multi-core systems the quantum may be consumed entirely on one core, after which the thread can migrate; cache-warming penalties therefore grow with quantum length.
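The tick, flag, and switch sequence described in the list above can be illustrated with a toy model. Everything here is hypothetical (`Thread`, `timer_tick`, `return_from_interrupt`, and the two-tick quantum); it only mirrors the order of events on a single core and is not how any particular kernel structures its code.

```python
from collections import deque
from dataclasses import dataclass

QUANTUM_TICKS = 2  # hypothetical quantum length, in timer ticks


@dataclass
class Thread:
    name: str
    ticks_left: int = QUANTUM_TICKS
    cpu_ticks: int = 0          # accounting: total ticks consumed
    need_resched: bool = False  # set by the tick handler, acted on later


def timer_tick(current):
    """Interrupt handler: update accounting and flag an expired quantum."""
    current.cpu_ticks += 1
    current.ticks_left -= 1
    if current.ticks_left <= 0:
        current.need_resched = True


def return_from_interrupt(current, run_queue):
    """Return path: perform the actual switch if the flag is set."""
    if not current.need_resched:
        return current
    current.need_resched = False
    current.ticks_left = QUANTUM_TICKS  # refill for its next turn
    if not run_queue:
        return current                  # nothing else to run
    run_queue.append(current)
    return run_queue.popleft()


def run(ticks, names):
    """Run the toy scheduler and return which thread ran on each tick."""
    queue = deque(Thread(n) for n in names)
    current = queue.popleft()
    trace = []
    for _ in range(ticks):
        trace.append(current.name)
        timer_tick(current)
        current = return_from_interrupt(current, queue)
    return trace


if __name__ == "__main__":
    print(run(6, ["A", "B"]))
```

Separating the handler from the return path is the point of the sketch: the handler only records that a switch is due, and the switch itself is deferred until the interrupt is about to return.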
Secondary effects of quantum tuning
- System call batching: Longer slices favour vectorised I/O because applications stay in cache longer.
- Energy efficiency: Mobile OSs sometimes extend the quantum when the screen is off to reduce wake-ups.
- Latency-sensitive networking: Datapath threads in NFV stacks may request a real-time priority under SCHED_FIFO scheduling (for example with chrt -f), which has no time slice, to bypass the general quantum altogether.
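On Linux, a thread can ask for SCHED_FIFO from Python through the standard `os` module. The sketch below is illustrative only: the helper name `try_set_fifo` and the priority value 10 are assumptions, the call normally needs root or CAP_SYS_NICE, and a runaway SCHED_FIFO thread can starve the rest of the system, so it should be tried with care.

```python
import os


def try_set_fifo(priority=10, pid=0):
    """Try to move a process (pid 0 = the caller) to SCHED_FIFO.

    Returns True on success, False if the platform lacks the call,
    the priority is invalid, or the caller lacks permission.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # not available on this platform
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        # Covers PermissionError (EPERM) and invalid arguments (EINVAL).
        return False
    return True


if __name__ == "__main__":
    if try_set_fifo():
        print("now running under SCHED_FIFO")
    else:
        print("could not switch to SCHED_FIFO (privileges or platform)")
```

Returning a boolean instead of raising keeps the caller's fallback path simple: a datapath thread that cannot obtain real-time scheduling can log the fact and continue under the default policy.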
Practical guidance
- Measure first – collect context_switches and schedstat counters; a switch rate > 20 000 Hz on desktop workloads usually signals an overly small quantum.
- Tune with care – on Linux kernels that use CFS, kernel.sched_min_granularity_ns sets the lower bound on a slice; decreasing sched_latency_ns shrinks per-task slices and improves GUI snappiness at a mild throughput cost. Since kernel 5.13 these tunables live in debugfs under /sys/kernel/debug/sched/ rather than in sysctl.
- Use class separation – isolate real-time or latency-critical tasks in dedicated cgroups or RT priority bands instead of globally shrinking the quantum for everyone.
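One way to take the measurement suggested above is to sample the system-wide `ctxt` counter in `/proc/stat` twice and divide by the interval. This is a rough sketch for Linux only; the helper names are hypothetical, and the figure it yields covers all CPUs combined, so it should be compared with the 20 000 Hz guideline with that in mind.

```python
import time


def parse_ctxt(proc_stat_text):
    """Extract the cumulative context-switch count from /proc/stat text."""
    for line in proc_stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")


def switch_rate(before, after, interval_s):
    """Context switches per second between two counter samples."""
    return (after - before) / interval_s


def sample_rate(interval_s=1.0, path="/proc/stat"):
    """Read the counter twice, interval_s apart, and return the rate."""
    with open(path) as f:
        before = parse_ctxt(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return switch_rate(before, after, interval_s)


if __name__ == "__main__":
    print(f"{sample_rate():.0f} context switches per second")
```

Keeping the parsing separate from the file access makes the logic easy to check against captured `/proc/stat` output from another machine.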
Looking forward
Modern schedulers increasingly adapt the quantum dynamically or abandon the fixed-slice model altogether; for example, Linux’s EEVDF scheduler, which replaced CFS in kernel 6.6, computes virtual deadlines rather than assigning identical slices (students.mimuw.edu.pl). Nonetheless, the foundational idea of a bounded execution window remains central to pre-emptive multitasking, and understanding the trade-offs around quantum length is key to designing responsive, efficient software systems.