Checksums

This article is not assessed by the IB but may be helpful to deepen your understanding. Plus, I think it's cool.

The big idea

A checksum is a compact numerical summary of a block of data.
By recalculating the checksum after transmission or storage and comparing it with the original value, a system can detect whether any bits flipped—quickly, with very little additional data.


1 What a checksum really is

Given a message $M$ viewed as a sequence of bytes or bits, a checksum function $C(\cdot)$ produces an integer that accompanies the message:

$$\text{transmit } \bigl(M,\; C(M)\bigr)$$

At the destination the same algorithm recomputes $C(M)$.
If the new value differs from the received checksum, an error must have occurred somewhere in $M$ or in the checksum itself.
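
As a minimal sketch of this idea, the snippet below pairs a message with its checksum and re-checks it on arrival. The function names `transmit` and `is_intact` are made up for illustration, and `zlib.crc32` is just one convenient stand-in for $C(\cdot)$.

```python
import zlib
from typing import Callable, Tuple

Checksum = Callable[[bytes], int]  # any function playing the role of C(.)

def transmit(message: bytes, checksum: Checksum = zlib.crc32) -> Tuple[bytes, int]:
    """Sender: pair the message M with C(M)."""
    return message, checksum(message)

def is_intact(message: bytes, received: int, checksum: Checksum = zlib.crc32) -> bool:
    """Receiver: recompute C(M) and compare it with the received value."""
    return checksum(message) == received

message, value = transmit(b"HELLO")
print(is_intact(message, value))      # True: nothing changed
print(is_intact(b"HELLP", value))     # False: the message was damaged
print(is_intact(message, value ^ 1))  # False: the checksum itself was damaged
```

Note that the receiver cannot tell which of the two parts was damaged, only that they no longer agree.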


2 Why checksums work

Accidental corruption (thermal noise, EMI, flaky memory, cosmic rays) changes the pattern of bits.
Most checksum algorithms are designed so that any single-bit error (and, with appropriate design, most multi-bit patterns) yields a different checksum.
Thus a simple equality test flags corruption without having to inspect the entire payload in detail.
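
To make this concrete, here is a small sketch using the simplest checksum of all, a single parity bit. The helpers `parity` and `flip_bit` are hypothetical names introduced for this example. It tries every possible single-bit flip in a short message and confirms that each one changes the parity, then shows that two flips cancel out.

```python
from functools import reduce

def parity(data: bytes) -> int:
    """Even-parity bit: the XOR of every bit in the data (0 or 1)."""
    folded = reduce(lambda a, b: a ^ b, data, 0)  # XOR all bytes together
    return bin(folded).count("1") & 1             # then XOR the 8 bits that remain

def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with bit number `index` inverted."""
    corrupted = bytearray(data)
    corrupted[index // 8] ^= 1 << (index % 8)
    return bytes(corrupted)

original = b"HELLO"
changed = [parity(flip_bit(original, i)) != parity(original)
           for i in range(len(original) * 8)]
print(all(changed))  # True: every single-bit flip changes the parity

two_flips = flip_bit(flip_bit(original, 0), 9)
print(parity(two_flips) == parity(original))  # True: two flips go unnoticed
```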


3 Common checksum families

| Family | Basic idea | Typical size | Where you meet it |
|---|---|---|---|
| Parity | XOR of all bits (even/odd) | 1 bit | DRAM, legacy serial links |
| Modular sums | Sum of bytes/words mod $2^{n}$ (Internet Checksum, Fletcher-16/32) | 8–32 bits | IP, UDP, TCP, embedded firmware |
| CRC (Cyclic Redundancy Check) | Treat message as polynomial; divide by generator polynomial; remainder is checksum | 16–64 bits | Ethernet, SATA, USB, MPEG-2 TS |
| Adler-32 | Two running sums of bytes (slightly stronger than simple sum) | 32 bits | zlib streams |
| Cryptographic hashes (MD5, SHA-256) | One-way compression function with avalanche effect | 128–512 bits | Software distribution, digital signatures |
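
Several of these families are available in Python's standard library, so you can compare their output sizes directly. The sketch below is only an illustration: `fletcher16` is a hand-written helper (not a library function) and uses the modulo-255 variant, which is one common choice.

```python
import hashlib
import zlib

def fletcher16(data: bytes) -> int:
    """Fletcher-16 (modulo-255 variant): two running sums packed into 16 bits."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255   # plain running sum
        sum2 = (sum2 + sum1) % 255   # sum of the sums, sensitive to byte order
    return (sum2 << 8) | sum1

data = b"HELLO"
print(f"mod-256 sum : {sum(data) % 256:#04x}")
print(f"Fletcher-16 : {fletcher16(data):#06x}")
print(f"Adler-32    : {zlib.adler32(data):#010x}")
print(f"CRC-32      : {zlib.crc32(data):#010x}")
print(f"SHA-256     : {hashlib.sha256(data).hexdigest()}")
```

The printed values grow from 2 hex digits to 64, which mirrors the "Typical size" column.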

Checksum vs. cryptographic hash
Both summarise data, but a checksum is optimised for detecting random errors with minimal overhead, whereas a cryptographic hash is designed to resist intentional tampering (pre-image and collision attacks). Use CRC for noisy channels; use SHA-256 for verifying a software download.


4 How a checksum is calculated — two concrete examples

4.1 Internet Checksum (RFC 1071)

  1. Split the message into 16-bit words.
  2. Add them using one’s-complement arithmetic (end-around carry).
  3. One’s-complement the final sum.

If any single bit flips, the sum at the receiver cannot match, so IP, UDP and TCP discard the packet.
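
The three steps can be sketched in a few lines of Python. This is a teaching sketch, not a drop-in network implementation: the name `internet_checksum` and the sample bytes are chosen for illustration, and odd-length input is padded with a zero byte.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:                 # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):  # step 1: read big-endian 16-bit words
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                # step 2: fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF            # step 3: one's complement of the sum

words = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(words)
print(hex(checksum))                                           # 0x220d
print(internet_checksum(words + checksum.to_bytes(2, "big")))  # 0
```

The second line shows a handy property: running the same routine over the data plus its checksum gives 0, so a receiver can verify a packet without extracting the checksum field first.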

4.2 CRC-32 (Ethernet)

  1. Append 32 zero bits to the frame.
  2. Divide the bitstring by the generator polynomial
    $G(x)=x^{32}+x^{26}+x^{23}+x^{22}+x^{16}+x^{12}+x^{11}+x^{10}+x^{8}+x^{7}+x^{5}+x^{4}+x^{2}+x+1$.
  3. Replace the zeros with the 32-bit remainder (the CRC field).

Any burst error of ≤ 32 bits—or most longer random patterns—changes the remainder, so the receiver’s CRC check fails and the frame is dropped.
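
Here is a rough sketch of the three steps using Python integers as bit strings. The names `poly_mod` and `append_crc` are invented for this example. It follows the textbook division only; real Ethernet hardware additionally inverts some bits and uses a particular bit order, so these numbers will not match `zlib.crc32`.

```python
GENERATOR = 0x104C11DB7  # G(x) written as a 33-bit integer, one bit per term

def poly_mod(value: int, generator: int = GENERATOR) -> int:
    """Remainder of binary polynomial division (long division with XOR)."""
    width = generator.bit_length()
    while value.bit_length() >= width:
        # Line the generator up with the leading 1 bit and cancel it.
        value ^= generator << (value.bit_length() - width)
    return value

def append_crc(message: bytes) -> int:
    """Steps 1-3: append 32 zero bits, divide, put the remainder in their place."""
    shifted = int.from_bytes(message, "big") << 32
    return shifted | poly_mod(shifted)

codeword = append_crc(b"HELLO")
print(poly_mod(codeword))               # 0: an intact codeword divides evenly

burst = 0xFFFFFFFF << 7                 # 32 consecutive flipped bits
print(poly_mod(codeword ^ burst) != 0)  # True: the burst is detected
```

The receiver's check is simply "divide the whole codeword by $G(x)$ and expect a remainder of 0".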


5 Strengths and limitations

| Aspect | Checksum | CRC | Cryptographic hash |
|---|---|---|---|
| Detects single-bit errors | ✅ | ✅ | ✅ |
| Detects many multi-bit patterns | ⚠︎ (depends) | ✅ (provably strong for bursts up to the degree of the generator polynomial) | ✅ |
| Protects against malicious alteration | ❌ | ❌ | ✅ (by design) |
| Computational cost | Very low | Low (table-driven) | Moderate–high |
| Checksum size vs. protection | Good | Better | Much larger |

Checksums are excellent for catching noise but not for proving authenticity.


6 Typical workflow in a network protocol

  1. Sender side
    • Build the packet.
    • Compute checksum over certain header + payload fields.
    • Insert checksum into a dedicated header field.
    • Transmit.
  2. Receiver side
    • Recompute checksum over the same fields (checksum field treated as zero if required).
    • Compare with the value in the packet.
    • If mismatch: discard or request retransmission; if match: accept.

This pattern repeats at multiple layers: Ethernet adds a CRC in the frame trailer; IP adds the Internet Checksum to protect its header; TCP protects its header and payload with another checksum.
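
The sender and receiver steps above can be sketched with a toy frame format: payload followed by a 4-byte CRC-32 trailer. The layout and the names `build_frame` and `parse_frame` are assumptions made for this example and do not correspond to any real protocol.

```python
import struct
import zlib
from typing import Optional

def build_frame(payload: bytes) -> bytes:
    """Sender: compute the checksum and append it as a 4-byte big-endian trailer."""
    return payload + struct.pack("!I", zlib.crc32(payload))

def parse_frame(frame: bytes) -> Optional[bytes]:
    """Receiver: return the payload if the trailer matches, else None (discard)."""
    if len(frame) < 4:
        return None                       # too short to even hold a trailer
    payload, trailer = frame[:-4], frame[-4:]
    (received,) = struct.unpack("!I", trailer)
    return payload if zlib.crc32(payload) == received else None

frame = build_frame(b"HELLO")
print(parse_frame(frame))                 # b'HELLO'

damaged = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit in transit
print(parse_frame(damaged))               # None
```

Returning `None` stands in for "discard or request retransmission"; what happens next is up to the protocol.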


A pocket-size checksum example

Goal: detect any single-bit error in a tiny 5-byte message using the simplest possible algorithm—
an 8-bit mod-256 sum (often called “addition with wrap-around”).


7 Simplified example

1 Sender side                     

| Plaintext | ASCII code (decimal) |
|---|---|
| H | 72 |
| E | 69 |
| L | 76 |
| L | 76 |
| O | 79 |
| Sum | 372 |
| Sum mod 256 | 116 |

  • We add all five bytes: $72+69+76+76+79 = 372$.
  • Because we keep only 8 bits, we take $372 \bmod 256 = 116$.
  • 116 (0x74) is appended as the checksum.

Transmitted frame: H E L L O 0x74
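
If you want to check the sender's arithmetic yourself, a few lines of Python will do it. The helper name `checksum8` is made up for this sketch.

```python
def checksum8(data: bytes) -> int:
    """Sum of all bytes, keeping only the low 8 bits (mod 256)."""
    return sum(data) % 256

message = b"HELLO"
frame = message + bytes([checksum8(message)])  # append the checksum byte

print(list(message))   # [72, 69, 76, 76, 79]
print(sum(message))    # 372
print(frame.hex(" "))  # 48 45 4c 4c 4f 74
```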


2 Receiver side                     

  1. Receives H E L L O 0x74.
  2. Recomputes the sum of the first five bytes ⇒ 372 ⇒ 372 mod 256 = 116.
  3. Compares with the received checksum (0x74 = 116).
    • If equal, accept the message.
    • If not equal, discard or request retransmission.

3 Error detection in action

Suppose noise corrupts the second byte so that E (0x45) becomes F (0x46), which flips its two lowest bits.

| Byte | Good value | Corrupted | Δ |
|---|---|---|---|
| E | 69 | 70 | +1 |

New sum $= 373$ ⇒ $373 \bmod 256 = 117$ (0x75).
But the frame still carries the old checksum 0x74 → mismatch detected; the packet is rejected.
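
The receiver's check and the corrupted case can be replayed in Python. As before, `checksum8` and `verify` are illustrative names, and the last byte of the frame is assumed to be the checksum.

```python
def checksum8(data: bytes) -> int:
    """Sum of all bytes, keeping only the low 8 bits (mod 256)."""
    return sum(data) % 256

def verify(frame: bytes) -> bool:
    """Receiver: recompute the sum over the data bytes and compare with the last byte."""
    data, received = frame[:-1], frame[-1]
    return checksum8(data) == received

good = b"HELLO" + b"\x74"  # frame as transmitted
bad = b"HFLLO" + b"\x74"   # E corrupted to F, checksum byte unchanged

print(verify(good))               # True
print(verify(bad))                # False
print(hex(checksum8(b"HFLLO")))   # 0x75
```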


Key points

  • Computation is trivial—just an 8-bit adder.
  • Any single-byte change (and most multi-bit errors) alters the result, so errors are spotted.
  • The noisier the channel, the larger and more sophisticated the checksum should be (e.g., a CRC instead of a simple sum).
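
To see why "most" multi-bit errors is not "all", here is a short sketch of two corruptions that the mod-256 sum misses, with `zlib.crc32` shown alongside for comparison. The corrupted strings are made up for this example.

```python
import zlib

def checksum8(data: bytes) -> int:
    """Sum of all bytes, keeping only the low 8 bits (mod 256)."""
    return sum(data) % 256

original = b"HELLO"
swapped = b"HLELO"     # two bytes swapped: addition does not care about order
cancelled = b"IDLLO"   # H+1 and E-1: the two changes cancel in the sum

for corrupted in (swapped, cancelled):
    sum_misses = checksum8(corrupted) == checksum8(original)
    crc_catches = zlib.crc32(corrupted) != zlib.crc32(original)
    print(corrupted, sum_misses, crc_catches)  # both columns print True
```

Both corruptions slip past the simple sum but change the CRC-32, which is the trade-off the last key point describes.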

 

 

Selecting a checksum in practice

| If your channel is… | And your error model is… | …choose |
|---|---|---|
| Low-speed I²C bus on PCB | Mostly single-bit flips | 8-bit parity or Fletcher-16 |
| Gigabit Ethernet link | Burst noise up to 64 bits | CRC-32 |
| Firmware update over OTA | Needs tamper evidence | SHA-256 with a digital signature |
| Real-time voice packets | Some loss acceptable; overhead must be tiny | UDP with optional 16-bit checksum |

Key take-aways

  • A checksum enables fast, lightweight error detection by sending a short summary alongside the data.
  • Simple sums or parities are cost-effective but weaker; CRCs provide mathematically proven detection of burst errors; cryptographic hashes add security against deliberate changes.
  • Modern systems often layer these mechanisms: CRC at the link layer, modular checksum in the transport header, and a hash or MAC in the application payload.
  • Choose the simplest algorithm that meets your error environment and threat model—and always verify it at the receiver before trusting the data.