Data Alignment

This article is not assessed by the IB but may be helpful to deepen your understanding. Plus, I think it's cool.

Big idea


Data alignment is the rule that certain pieces of data should start at memory addresses that are multiples of their natural size (on some architectures they must). A 4-byte integer is “aligned” when it begins at an address divisible by 4; an 8-byte double is aligned at an address divisible by 8. Alignment lets the CPU fetch each value in the fewest possible memory accesses, using the natural width of its data bus and word size.

When the rule is broken (misalignment), the CPU may need several memory accesses or extra microcode steps, or it may simply raise a fault. This makes alignment a fundamental design constraint for performance, correctness, and portability.


1. Why alignment exists: the relationship to word size

CPUs have a natural “word” size (typically 32 or 64 bits) that determines the width of the registers and the ALU, and usually the amount of data moved to or from memory in a single access. The IB syllabus explicitly covers the importance of word size and memory interactions under A1.1 and A1.2 (CPU architecture and binary representation).

A word-aligned access means:

  • A 4-byte load on a 32-bit CPU can be fetched in one bus cycle.
  • An 8-byte load on a 64-bit CPU can be fetched in one bus cycle.

If a value straddles a word boundary (for example, a 4-byte value at an address not divisible by 4 on a 32-bit CPU), the CPU may need two memory transactions, and possibly a merge operation, to reconstruct the value.

Example:
Assume a 32-bit CPU (4-byte word). If a 4-byte integer starts at address 0x1002, the CPU must:

  1. Load bytes from 0x1000–0x1003
  2. Load bytes from 0x1004–0x1007
  3. Merge the correct halves

The extra access and merge step add latency and can stall the pipeline.


2. How alignment interacts with binary encoding

The syllabus topic A1.2.2 asks students to explain how binary is used to store data; alignment determines where these binary patterns reside in memory.

Each piece of encoded data—integers, characters, floating-point values, structures—has:

  • A binary representation (the bits)
  • A size (number of bytes)
  • A preferred alignment (boundary requirements)

How each level of representation relates to alignment:

Concept       | What it means                      | Why alignment matters
--------------|------------------------------------|----------------------
Bit           | The smallest unit of information   | Bits are not individually addressable; alignment applies to the bytes and words that group them.
Byte (8 bits) | Hardware’s basic addressable unit  | Alignment keeps a multi-byte value from straddling a bus-width boundary.
Word          | CPU’s natural unit                 | Alignment ensures each word read/write matches the bus width.
Encoding rule | Interpretation of bit patterns     | Encoded structures satisfy alignment so decoders can load fields quickly.
Metadata      | Extra bits that describe structure | Structured formats (e.g., headers) include padding bytes to preserve alignment expectations.

3. Alignment in compound data structures

Compilers for languages such as C, C++, and Rust apply alignment rules when laying out structs or records in memory.

Example:

struct Record {
    char flag;        // 1 byte
    int count;        // 4 bytes, must align to 4
    double score;     // 8 bytes, must align to 8
};

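On a typical 64-bit platform, the compiler inserts padding bytes:

  • flag occupies offset 0 (1 byte)
  • 3 bytes of padding (offsets 1–3) ensure count starts at a 4-byte boundary (offset 4)
  • count occupies offsets 4–7, so the next free offset is 8, which is already a multiple of 8: score starts there with no further padding

Result: The structure uses 16 bytes even though the fields total only 13 bytes. Field order matters: declaring the same fields as flag, score, count would need 7 padding bytes after flag plus 4 trailing bytes (so that every element in an array of Record keeps score aligned), for a total of 24 bytes.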

This padding preserves alignment so the CPU can fetch each field efficiently.


4. Alignment and performance

Alignment affects:

1. Memory access speed

Aligned loads avoid multi-cycle read–merge operations.

2. Cache behavior

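Aligned data fits neatly into cache lines. A 2-, 4-, 8-, or 16-byte value aligned to its own size can never straddle two lines, because the line size (commonly 64 bytes) is a multiple of that alignment. A misaligned value can span two lines and cost two cache accesses.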

3. Pipeline efficiency

Misaligned loads increase stalls, especially in superscalar or pipelined architectures (A1.1.6 on pipelining).

4. Vector instructions (SIMD)

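Vector instructions operate on 16-, 32-, or 64-byte registers (for example SSE, AVX, and AVX-512 on x86). Their aligned-load forms expect the address to be a multiple of the register width and fault otherwise; the unaligned forms accept any address but can be slower, especially when a load crosses a cache-line boundary.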


5. Alignment and operating systems

The IB syllabus discusses OS memory management under A1.3 (memory, registers, efficiency). Alignment guarantees appear throughout the system software stack:

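  • Heap allocators (malloc in C, new in C++) return memory aligned for any fundamental type: at least alignof(max_align_t), which is 8 or 16 bytes on common 64-bit platforms.
  • Pages (commonly 4 KB, 16 KB on some systems) always start at addresses that are multiples of the page size.
  • Stack frames are typically kept 16-byte aligned on modern systems; the x86-64 System V ABI, for example, requires this at every function call.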

This allows system libraries, compilers, and the CPU to cooperate safely.


6. Alignment and networking / data formats

Binary protocols and file formats often include padding so that fields and records stay aligned: TCP and IPv4 headers, for instance, pad their options so the header ends on a 32-bit boundary, and RIFF files (such as WAV) pad odd-length chunks to an even length. Even though the IB networks topic (A2) does not explicitly name alignment, it is implicit in how structured binary data is serialised and parsed within protocol headers.

Conclusion

Data alignment is the bridge between binary encoding and computer architecture. It ensures that the CPU’s natural unit of work, the word, can be applied directly to the binary data stored in memory. Without alignment, many memory operations would be slower and less predictable, and on some processors they would fail outright.

Linking topics

In the IB Computer Science context, alignment deepens student understanding of:

  • how binary data is structured (A1.2)
  • how the CPU fetches and manipulates memory (A1.1)
  • how higher-level abstractions depend on lower-level constraints