A1.3.5 Explain the role of the operating system in managing multitasking and resource allocation. (HL only)
• The challenges of multitasking and resource allocation, including task scheduling, resource contention and deadlock
The big idea
An operating system (OS) turns a single collection of physical components into the illusion of many concurrently running programs, each safely and efficiently sharing finite resources. It does this by multitasking—rapidly switching the processor among contexts—and by resource allocation—deciding which task may consume CPU cycles, memory pages, disk/I-O bandwidth, network sockets, or other critical assets at any instant.
There have been some student questions about what happens when a process is sleeping (or blocked) and a network request comes in for that process. Read this article for clarification on interrupts, schedulers, and processes.
1 Multitasking in depth
| Concept | Precise meaning | How the OS realises it |
|---|---|---|
| Task | An execution context: a process with its own virtual address space and descriptors, or a kernel thread/lightweight process that shares an address space. | PCB/TCB structures store registers, page-table base, open-file table, credentials, cgroup membership.
Registers
Page-table base
Open-file table
Credentials
cgroup membership
At a conceptual level: the PCB/TCB is the OS’s memory of a task, capturing everything needed to stop it, manage it safely, and resume it later as if nothing happened. |
| Context switch | The act of saving the CPU state of the current task and restoring that of the next runnable task. | Triggered by timer interrupt, I/O completion, system call, or explicit yield; involves TLB flush or ASID update, pipeline drain, cache interference.
Timer interrupt
I/O completion
System call
Explicit yield
TLB (Translation Lookaside Buffer)
At a conceptual level: the TLB exists to make virtual memory fast, but it also explains why context switches are expensive—because cached translations may no longer be valid when the active task changes. TLB flush
ASID update
Pipeline drain
Cache interference
At a conceptual level: a context switch is triggered by control events and incurs real hardware costs, because the CPU and memory system must be reoriented to safely and correctly execute a different task. |
| Pre-emption | Forcible suspension of a running task by the kernel before the task voluntarily blocks or exits. | Driven by the scheduler tick or higher-priority wake-ups; guarded by pre-emption disable sections inside the kernel. Scheduler tick
Higher-priority wake-up
Pre-emption disable sections
At a conceptual level: pre-emption is policy-driven but carefully controlled, allowing responsive scheduling without corrupting kernel state. |
1.1 Task scheduling
The policy and mechanism that choose which task’s context is loaded next.
- Policy layer — defines goals (fair share, priority, energy) and computes eligibility (e.g., CFS’s virtual-runtime minimum, EDF’s earliest deadline).
- Mechanism layer — an O(1) or O(log n) ready-queue data structure (red-black tree, bitmap + array, skip-list) and per-core run queue selection.
Policy layer
- The part of the scheduler that defines what the system is trying to optimise.
- It encodes scheduling goals such as fairness, priority enforcement, deadline compliance, or energy efficiency, and computes which tasks are eligible to run and in what order.
Fair share
- A scheduling goal where CPU time is distributed proportionally among tasks or groups.
- It prevents starvation by ensuring no task can monopolise the processor over time.
Priority
- A numerical or ordered value assigned to a task that influences its preference for CPU access.
- Higher-priority tasks are selected to run before lower-priority ones when contention exists.
Energy (energy-aware scheduling)
- A scheduling goal that considers power consumption alongside performance.
- The scheduler may consolidate tasks onto fewer cores or prefer energy-efficient cores to reduce overall energy use.
Virtual runtime (CFS)
- A logical time value representing how much processor time a task has received, weighted by priority.
- The Completely Fair Scheduler selects the task with the smallest virtual runtime to approximate ideal multitasking fairness.
Earliest deadline (EDF)
- A real-time scheduling policy where tasks are ordered by their absolute deadlines.
- The task with the closest deadline is always selected to run next.
Mechanism layer
- The part of the scheduler that defines how scheduling decisions are implemented.
- It provides concrete data structures and algorithms to efficiently select the next runnable task.
Ready queue
- A kernel-managed data structure holding tasks that are ready to execute but not currently running.
- Its organisation directly affects scheduling overhead and scalability.
O(1) / O(log n)
- Descriptions of the time complexity required to select the next task to run.
- O(1) schedulers make constant-time decisions, while O(log n) schedulers trade slightly higher cost for more precise ordering.
Red-black tree
- A self-balancing binary search tree used to keep tasks ordered by scheduling criteria.
- It allows efficient insertion, deletion, and minimum-value selection.
Bitmap + array
- A scheduling structure where priority levels map to arrays, tracked by a bitmap.
- It enables constant-time lookup of the highest-priority runnable task.
Skip list
- A layered probabilistic data structure that maintains ordered elements with logarithmic access time.
- It offers simpler implementation than balanced trees with similar performance.
Per-core run queue
- A separate ready queue maintained for each CPU core in a multicore system.
- It improves scalability and cache locality by reducing contention between cores.
At a conceptual level: the policy layer decides who should run, and the mechanism layer makes that decision fast enough to be practical.
Challenges
- High utilisation vs. overhead – short time slices improve responsiveness but increase cache-destructive context switches.
- Fairness vs. predictability – weight-proportional share conflicts with hard real-time latency bounds.
- Priority inversion – a low-priority task holding a lock needed by a high-priority task stalls the latter until the kernel boosts (“priority inheritance”).
2 Resource allocation
2.1 Definition
The kernel must partition, arbitrate, and account for finite resources so that no task corrupts another’s state and overall performance meets design goals.
| Resource | Allocation techniques |
|---|---|
| CPU | Scheduler quanta, CPU affinity masks, cgroups CPU, share, energy-aware placement. Scheduler quanta (time quantum / time slice)
CPU affinity masks
cgroups CPU
CPU share
Energy-aware placement
At a conceptual level: these parameters tune how aggressively and where tasks run, shaping fairness, performance predictability, and energy efficiency in modern schedulers. |
| Memory | Demand paging, copy-on-write, NUMA page migration, working-set size limits, slab/slub caches. Demand paging
Copy-on-write (CoW)
NUMA page migration
Working-set size limits
Slab / SLUB caches
NUMA (Non-Uniform Memory Access)
Slab allocator
SLUB allocator
At a conceptual level: these mechanisms control when, where, and how memory is allocated, balancing efficiency, isolation, and performance under memory pressure. |
| Disk / NVMe | Per-I/O priority queues (CFQ, BFQ), IOThrottle, tagged command queuing. Per-I/O priority queues
CFQ (Completely Fair Queuing)
BFQ (Budget Fair Queuing)
IOThrottle
Tagged Command Queuing (TCQ)
At a conceptual level: modern I/O scheduling separates requests by intent and priority, then carefully meters access to slow devices to keep the system responsive. |
| Network | Token Bucket Filters, XDP, QoS classes, eBPF shaping, receive-side scaling (RSS) steering.
XDP (eXpress Data Path)
QoS (Quality of Service) classes
eBPF shaping
Receive-Side Scaling (RSS) steering
At a conceptual level: modern network scheduling combines rate control, programmable filtering, and parallelism to manage high-throughput traffic efficiently. |
| Kernel objects | Reference counts for files, sockets; semaphore counts; quota tables. Reference counts (files, sockets)
Semaphore counts
Quota tables
At a conceptual level: counting mechanisms let the OS safely share finite resources without leaks, races, or starvation. |
2.2 Resource contention
Two or more runnable tasks require the same non-shareable resource concurrently.
- Symptoms – rising wait times in run queue, lock contention spin-time, cache-line bouncing (false sharing), elevated latency.
- Kernel mitigations – adaptive spin locks, MCS locks for fairness, NUMA-aware memory placement, per-CPU data structures, I/O merging, lock elision with transactional memory.
3 Deadlock
3.1 Formal definition
A deadlock exists when a set of tasks is permanently blocked because each holds at least one resource and waits to acquire a resource held by another task in the same set. Coffman’s four necessary conditions:
- Mutual exclusion
- Hold-and-wait
- No pre-emption of held resources
- Circular wait (closed wait-for graph)
3.2 OS strategies
| Strategy | Principle | Example |
|---|---|---|
| Prevention | Remove at least one Coffman condition | Force tasks to request all locks at once (hold-and-wait gone); impose global lock ordering (circular wait gone). |
| Avoidance | Admit a request only if system would remain in a safe state | Banker’s algorithm for fixed-size resource vectors. |
| Detection & recovery | Periodically scan wait-for graph; if cycle found, kill or roll back a victim task | Linux’s CONFIG_DETECT_HUNG_TASK, databases’ deadlock detectors. |
| Static analysis | Prove absence of cycles at design time in synchronous reactive systems | Model checking of mutex protocols in avionics RTOS. |
4 Putting it all together: the OS control loop
- Hardware timer fires → trap to kernel.
- Scheduler charges the running task’s wall-time and consults the policy (Wall-time is the actual elapsed real-world time during which a task is running on the CPU. It includes all time spent executing, regardless of instruction count or work performed.)
- If pre-emption needed, context switch to next eligible task.
- That task’s memory pages are ensured resident via page-fault handling.
- I/O requests are queued to device schedulers that observe per-task quotas.
- Synchronisation primitives gate critical regions; kernel monitors for deadlock hints (long lock hold times, wait-for cycles).
- Accounting subsystem logs CPU, memory, and I/O usage for fairness rebalancing and billing.
Throughout, the kernel maintains invariants: no address-space overlap, no unauthorised device register access, no starvation beyond configured policy limits.
Key takeaways
- Multitasking turns scarce CPU cycles into the appearance of parallel threads through pre-emption and rapid context switches.
- Resource allocation guarantees isolation and efficient usage of all hardware resources; its correctness depends on the scheduler, memory manager, and I/O subsystems cooperating.
- Resource contention and deadlock are inherent risks; kernels deploy prevention, avoidance, detection, and recovery mechanisms, plus modern techniques like lock-free data structures and eBPF-driven scheduling, to sustain throughput without sacrificing correctness.