Checksums | Computer Science KB

The big idea

A checksum is a compact numerical summary of a block of data.
By recalculating the checksum after transmission or storage and comparing it with the original value, a system can quickly detect, with high probability, whether any bits have flipped, at the cost of very little additional data.

1 What a checksum really is

Given a message $M$ viewed as a sequence of bytes or bits, a checksum function $C(\cdot)$ produces an integer that accompanies the message:

$$\text{transmit } \bigl(M,\; C(M)\bigr)$$

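At the destination, the same algorithm is applied to the message as received, $M'$, producing $C(M')$.
If $C(M')$ differs from the received checksum, an error must have occurred somewhere in $M$ or in the checksum itself. A match is strong evidence, not proof, that the data arrived intact: some error patterns leave the checksum unchanged.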
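The $(M, C(M))$ pattern can be sketched in Python as follows. This is a minimal illustration, assuming the checksum is appended as 4 big-endian bytes and using the standard library’s `zlib.crc32` as a stand-in for $C(\cdot)$; the helper names `attach` and `verify` are hypothetical.

```python
import zlib

def attach(message: bytes) -> bytes:
    """Sender: transmit (M, C(M)) as the message followed by a 4-byte CRC-32."""
    return message + zlib.crc32(message).to_bytes(4, "big")

def verify(frame: bytes) -> bool:
    """Receiver: recompute C over the received message and compare."""
    message, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(message) == received

frame = attach(b"hello, world")
print(verify(frame))                                 # True

# Flip the lowest bit of the first byte to simulate noise.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
print(verify(corrupted))                             # False
```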

2 Why checksums work

Accidental corruption (thermal noise, EMI, flaky memory, cosmic rays) changes the pattern of bits.
Most checksum algorithms are designed so that any single-bit error, and with appropriate design most multi-bit patterns, yields a different checksum.
Thus a simple equality test flags corruption without the receiver needing a second copy of the original data.

3 Common checksum families

Family	Basic idea	Typical size	Where you meet it
Parity	XOR of all bits (even/odd)	1 bit	DRAM, legacy serial links
Modular sums	Sum of bytes/words with wrap-around, either mod $2^{n}$ or one’s-complement with end-around carry (mod $2^{n}-1$), e.g. Internet Checksum, Fletcher-16/32	8–32 bits	IP, UDP, TCP, embedded firmware
CRC (Cyclic Redundancy Check)	Treat message as polynomial; divide by generator polynomial; remainder is checksum	16–64 bits	Ethernet, SATA, USB, MPEG-2 TS
Adler-32	Two running sums of bytes mod 65521 (a Fletcher variant; stronger than a single sum)	32 bits	zlib streams
Cryptographic hashes (MD5, SHA-256)	One-way compression function with avalanche effect	128–512 bits	Software distribution, digital signatures (SHA-256; MD5 is no longer collision-resistant)
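For a feel of how light the simpler families are, here is a minimal Python sketch of even parity, Fletcher-16 and Adler-32 as they are commonly defined (Fletcher-16 with both sums mod 255; Adler-32 mod 65521 with the first sum starting at 1). The function names are illustrative; Adler-32 is cross-checked against the standard library’s `zlib.adler32`.

```python
import zlib

def even_parity_bit(data: bytes) -> int:
    """Parity: XOR of all bits, i.e. 1 if the count of 1-bits is odd."""
    return sum(bin(b).count("1") for b in data) % 2

def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums modulo 255, packed into 16 bits."""
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) % 255
        s2 = (s2 + s1) % 255      # second sum weights earlier bytes more
    return (s2 << 8) | s1

def adler32(data: bytes) -> int:
    """Adler-32: like Fletcher, but modulo 65521 and with s1 starting at 1."""
    s1, s2 = 1, 0
    for b in data:
        s1 = (s1 + b) % 65521
        s2 = (s2 + s1) % 65521
    return (s2 << 16) | s1

print(hex(fletcher16(b"abcde")))                              # 0xc8f0
print(adler32(b"Wikipedia") == zlib.adler32(b"Wikipedia"))    # True
```

The second running sum is what lets Fletcher and Adler notice reordered bytes, which a plain sum cannot.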

Checksum vs. cryptographic hash
Both summarise data, but a checksum is optimised for detecting random errors with minimal overhead, whereas a cryptographic hash is designed to resist intentional tampering (pre-image and collision attacks). Use CRC for noisy channels; use SHA-256 for verifying a software download.
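One way to see why a CRC cannot resist deliberate tampering: CRC-32 is affine over XOR, so for three equal-length messages $a$, $b$, $c$, $\mathrm{crc}(a \oplus b \oplus c) = \mathrm{crc}(a) \oplus \mathrm{crc}(b) \oplus \mathrm{crc}(c)$. An attacker can therefore predict exactly how chosen bit flips change the check value (and, with no key involved, can simply recompute it anyway). The sketch below checks this with `zlib.crc32` and contrasts SHA-256 from `hashlib`; the messages are arbitrary placeholders.

```python
import hashlib
import zlib

def xor_bytes(*chunks: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)

def sha256_int(message: bytes) -> int:
    """SHA-256 digest as an integer, so digests can be XORed."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

# Three equal-length example messages (arbitrary placeholders).
a = b"PAY ALICE $0100"
b = b"PAY BOB   $0100"
c = b"PAY ALICE $9900"
mixed = xor_bytes(a, b, c)

# CRC-32: the CRC of the combination follows directly from the parts.
print(zlib.crc32(mixed) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c))  # True

# SHA-256: no such algebraic shortcut exists.
print(sha256_int(mixed) == sha256_int(a) ^ sha256_int(b) ^ sha256_int(c))
```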

4 How a checksum is calculated — two concrete examples

4.1 Internet Checksum (RFC 1071)

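Split the message into 16-bit words, padding an odd-length message with one zero byte.
Add them using one’s-complement arithmetic (end-around carry: any carry out of the top bit is added back into the lowest bit).
One’s-complement the final sum; the result is the checksum.

At the receiver, the one’s-complement sum of all words, checksum included, should be 0xFFFF. A single flipped bit changes one word by $\pm 2^{k}$ (with $0 \le k \le 15$), which always changes that sum, so IP, UDP and TCP detect the error and discard the packet.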
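The three steps translate almost directly into code. A minimal Python sketch, assuming big-endian word order as on the wire; the eight sample bytes are arbitrary, and their one’s-complement sum works out to 0xDDF2.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF                         # final complement

data = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(data)
print(hex(csum))                                            # 0x220d

# Receiver: data plus checksum sums to 0xFFFF, so the function returns 0.
print(internet_checksum(data + csum.to_bytes(2, "big")))    # 0
```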

4.2 CRC-32 (Ethernet)

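Append 32 zero bits to the frame.
Divide the resulting bitstring by the generator polynomial using modulo-2 arithmetic (subtraction is XOR, with no carries):
$G(x)=x^{32}+x^{26}+x^{23}+x^{22}+x^{16}+x^{12}+x^{11}+x^{10}+x^{8}+x^{7}+x^{5}+x^{4}+x^{2}+x+1$.
Replace the zeros with the 32-bit remainder (the CRC field).

Any burst error of 32 bits or fewer is guaranteed to change the remainder, and a longer random error pattern escapes detection with a probability of only about $2^{-32}$. When the remainder changes, the receiver’s CRC check fails and the frame is dropped. Real Ethernet hardware additionally presets the CRC register to all ones, processes each byte least-significant bit first, and inverts the final remainder; the steps above show the core arithmetic.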
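The textbook procedure can be sketched with Python integers standing in for bitstrings: bit $i$ of an integer is the coefficient of $x^{i}$, and modulo-2 subtraction is XOR. This sketch follows the simplified steps only (no preset, bit reflection or final inversion), so its values will not match `zlib.crc32`; the names `poly_mod` and `textbook_crc32` are hypothetical.

```python
# CRC-32 generator: bit i is set for every x^i term of G(x), including x^32.
G = 0x104C11DB7

def poly_mod(value: int, poly: int = G) -> int:
    """Remainder of value / poly in GF(2) polynomial arithmetic."""
    deg = poly.bit_length() - 1
    while value.bit_length() - 1 >= deg:
        # Align poly under the leading 1-bit and "subtract" with XOR.
        value ^= poly << (value.bit_length() - 1 - deg)
    return value

def textbook_crc32(message: bytes) -> int:
    """Append 32 zero bits, divide by G, keep the 32-bit remainder."""
    return poly_mod(int.from_bytes(message, "big") << 32)

msg = b"HELLO"
crc = textbook_crc32(msg)
frame = (int.from_bytes(msg, "big") << 32) | crc   # zeros replaced by remainder

# Receiver: the whole frame is divisible by G.
print(poly_mod(frame) == 0)                        # True

# A 32-bit burst (first and last bit set) at any offset is detected.
burst = (1 << 31) | 1
print(all(poly_mod(frame ^ (burst << k)) != 0 for k in range(len(msg) * 8)))  # True
```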

5 Strengths and limitations

Aspect	Simple checksum (sum, parity)	CRC	Cryptographic hash
Detects single-bit errors	✅	✅	✅
Detects many multi-bit patterns	⚠︎ (depends on algorithm)	✅ (all bursts up to the polynomial degree)	✅
Protects against malicious alteration	❌	❌	✅ (by design)
Computational cost	Very low	Low (table-driven)	Moderate–high
Check-value size	Small (1–32 bits)	Small (16–64 bits)	Much larger (128–512 bits)

Checksums are excellent for catching noise but not for proving authenticity.
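The “Low (table-driven)” cost comes from precomputing the effect of every possible byte so the CRC advances eight bits per lookup. Below is a minimal Python sketch of the CRC-32 variant used by Ethernet and zlib: bits are processed least-significant first (hence the bit-reversed constant 0xEDB88320), the register starts at all ones, and the result is inverted. It is cross-checked against `zlib.crc32` and the standard check value for the string “123456789”.

```python
import zlib
from typing import List

REFLECTED_POLY = 0xEDB88320  # 0x04C11DB7 with its 32 bits reversed

def _make_table() -> List[int]:
    """Precompute the register update for each of the 256 byte values."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):                  # one bit-serial division step per bit
            c = (c >> 1) ^ REFLECTED_POLY if c & 1 else c >> 1
        table.append(c)
    return table

TABLE = _make_table()

def crc32_table(data: bytes) -> int:
    crc = 0xFFFFFFFF                        # preset register to all ones
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)   # one lookup per byte
    return crc ^ 0xFFFFFFFF                 # final inversion

print(hex(crc32_table(b"123456789")))                          # 0xcbf43926
print(crc32_table(b"123456789") == zlib.crc32(b"123456789"))   # True
```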

6 Typical workflow in a network protocol

Sender side
- Build the packet.
- Compute the checksum over the header and payload fields the protocol specifies, with the checksum field itself set to zero.
- Insert checksum into a dedicated header field.
- Transmit.
Receiver side
- Recompute checksum over the same fields (checksum field treated as zero if required).
- Compare with the value in the packet.
- If mismatch: discard or request retransmission; if match: accept.

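This pattern repeats at multiple layers: Ethernet carries a CRC-32 in the frame trailer (the frame check sequence); IPv4 places an Internet Checksum over its own header (IPv6 dropped this field); and TCP applies the same Internet Checksum algorithm to its header, its payload, and a pseudo-header that includes the IP addresses.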
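The sender and receiver steps can be sketched for a made-up packet format: a hypothetical 4-byte header holding a 2-byte type and a 2-byte checksum field, both in network byte order, followed by the payload. The checksum is the Internet Checksum from section 4.1, re-implemented so the block stands alone; real protocols define their own layouts, and TCP and UDP also cover a pseudo-header, which this sketch omits. `build_packet` and `accept_packet` are illustrative names.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_packet(msg_type: int, payload: bytes) -> bytes:
    """Sender: compute over header (checksum field zeroed) + payload, then insert."""
    zeroed = struct.pack("!HH", msg_type, 0) + payload
    csum = internet_checksum(zeroed)
    return struct.pack("!HH", msg_type, csum) + payload

def accept_packet(packet: bytes) -> bool:
    """Receiver: zero the checksum field, recompute, compare with stored value."""
    msg_type, stored = struct.unpack("!HH", packet[:4])
    recomputed = internet_checksum(struct.pack("!HH", msg_type, 0) + packet[4:])
    return recomputed == stored

pkt = build_packet(7, b"temperature=21.5")
print(accept_packet(pkt))                    # True
print(accept_packet(pkt[:-1] + b"6"))        # False: last payload byte altered
```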

A pocket-size checksum example

Goal: detect any single-bit error in a tiny 5-byte message using one of the simplest algorithms available, an 8-bit mod-256 sum (often called “addition with wrap-around”).

7 Simplified example

1 Sender side

Character	ASCII code (decimal)
H	72
E	69
L	76
L	76
O	79
Sum	372
Sum mod 256	116

We add all five bytes: $72+69+76+76+79 = 372$.
Because we keep only 8 bits, we take $372 \bmod 256 = 116$ .
116 (0x74) is appended as the checksum.

Transmitted frame: H E L L O 0x74
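The sender’s arithmetic in Python, as a minimal sketch (`sum8` is an illustrative name):

```python
def sum8(data: bytes) -> int:
    """8-bit additive checksum: sum of all bytes, keeping only the low 8 bits."""
    return sum(data) % 256

message = b"HELLO"
checksum = sum8(message)
print(sum(message), checksum, hex(checksum))   # 372 116 0x74
frame = message + bytes([checksum])            # H E L L O 0x74
```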

2 Receiver side

Receives H E L L O 0x74.
Recomputes the sum of the first five bytes ⇒ 372 ⇒ 372 mod 256 = 116.
Compares with the received checksum (0x74 = 116).
- If equal, accept the message.
- If not equal, discard or request retransmission.
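A matching receiver sketch, assuming the checksum is always the last byte of the frame; returning `None` stands in for “discard or request retransmission”.

```python
from typing import Optional

def sum8(data: bytes) -> int:
    """8-bit additive checksum (sum of bytes mod 256)."""
    return sum(data) % 256

def receive(frame: bytes) -> Optional[bytes]:
    """Split off the trailing checksum byte; return the message only if it verifies."""
    message, received = frame[:-1], frame[-1]
    if sum8(message) == received:
        return message        # checksums agree: accept
    return None               # mismatch: discard or ask for retransmission

print(receive(b"HELLO\x74"))  # b'HELLO'
```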

3 Error detection in action

Suppose noise flips the lowest bit of E (0x45), turning it into D (0x44).

Byte	Good value	Corrupted	Δ
E	69 (0x45)	68 (0x44, D)	−1

New sum = $371$ ⇒ $371 \bmod 256 = 115$ (0x73).
But the frame still carries the old checksum 0x74 → mismatch detected; the packet is rejected.
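The claim that every single-bit error is caught can be checked exhaustively for this frame. Flipping one bit changes a single byte by $\pm 2^{k}$ with $0 \le k \le 7$, which is never a multiple of 256, so either the recomputed sum or the checksum byte itself must differ. A minimal Python sketch that flips each of the 48 bits in turn:

```python
def sum8(data: bytes) -> int:
    """8-bit additive checksum (sum of bytes mod 256)."""
    return sum(data) % 256

frame = b"HELLO" + bytes([sum8(b"HELLO")])     # 5 message bytes + checksum

detected = 0
total = len(frame) * 8
for bit in range(total):
    corrupted = bytearray(frame)
    corrupted[bit // 8] ^= 1 << (bit % 8)       # flip exactly one bit
    if sum8(corrupted[:-1]) != corrupted[-1]:   # receiver's check fails
        detected += 1
print(f"{detected}/{total} single-bit errors detected")  # 48/48 single-bit errors detected
```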

Key points

Computation is trivial: just an 8-bit adder.
Any change confined to a single byte alters the result, so single-bit errors are always spotted; multi-bit errors that cancel in the sum (for example, +1 in one byte and −1 in another, or two swapped bytes) go unnoticed.
The noisier the channel, the longer and more sophisticated the checksum you should choose (e.g., a CRC instead of a simple sum).
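Where the simple sum stops helping: errors that cancel in the sum slip through, while a CRC still catches them. The sketch compares the 8-bit sum with `zlib.crc32` on three made-up corruptions of “HELLO”; each corruption stays within 16 consecutive bits, a burst short enough that CRC-32 is guaranteed to detect it.

```python
import zlib

def sum8(data: bytes) -> int:
    """8-bit additive checksum (sum of bytes mod 256)."""
    return sum(data) % 256

original = b"HELLO"
corruptions = {
    "E -> F (one byte)": b"HFLLO",
    "H and E swapped": b"EHLLO",
    "H -> G and E -> F (+1 and -1)": b"GFLLO",
}
for label, received in corruptions.items():
    sum_ok = sum8(received) != sum8(original)
    crc_ok = zlib.crc32(received) != zlib.crc32(original)
    print(f"{label}: sum8 {'detects' if sum_ok else 'misses'}, "
          f"CRC-32 {'detects' if crc_ok else 'misses'}")
# E -> F (one byte): sum8 detects, CRC-32 detects
# H and E swapped: sum8 misses, CRC-32 detects
# H -> G and E -> F (+1 and -1): sum8 misses, CRC-32 detects
```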

Selecting a checksum in practice

If your channel is…	And your error model is…	…choose
Low-speed I²C bus on a PCB	Mostly single-bit flips	A parity bit per byte, or Fletcher-16
Gigabit Ethernet link	Burst errors up to 32 bits	CRC-32
Firmware update over the air (OTA)	Needs tamper evidence	SHA-256 with a digital signature
Real-time voice packets	Some loss acceptable; overhead must be tiny	UDP with its 16-bit checksum (optional over IPv4)

Key take-aways

A checksum enables fast, lightweight error detection by sending a short summary alongside the data.
Simple sums or parities are cost-effective but weaker; CRCs are mathematically guaranteed to detect every burst error no longer than the CRC width; cryptographic hashes add security against deliberate changes.
Modern systems often layer these mechanisms: a CRC at the data-link layer, a modular checksum in the transport header, and a hash or MAC in the application payload.
Choose the simplest algorithm that meets your error environment and threat model—and always verify it at the receiver before trusting the data.
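As a sketch of the “hash or MAC” layer, Python’s standard `hmac` and `hashlib` modules can attach an HMAC-SHA256 tag to an application message. The key and messages below are placeholders; unlike a CRC or an unkeyed hash, the tag cannot be recomputed for an altered message without the shared key.

```python
import hashlib
import hmac

KEY = b"example-shared-secret"   # hypothetical key; provision real keys securely

def tag(message: bytes) -> bytes:
    """HMAC-SHA256 tag: only holders of KEY can compute it."""
    return hmac.new(KEY, message, hashlib.sha256).digest()

def authentic(message: bytes, received_tag: bytes) -> bool:
    """Constant-time comparison avoids leaking how many tag bytes matched."""
    return hmac.compare_digest(tag(message), received_tag)

msg = b"set speed=80"
t = tag(msg)
print(authentic(msg, t))                # True
print(authentic(b"set speed=200", t))   # False
```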