The big idea
A checksum is a compact numerical summary of a block of data.
By recalculating the checksum after transmission or storage and comparing it with the original value, a system can detect whether any bits flipped—quickly, with very little additional data.
1 What a checksum really is
Given a message $M$, viewed as a sequence of bytes or bits, a checksum function $C$ produces an integer $c$ that accompanies the message:
$$c = C(M)$$
At the destination the same algorithm recomputes $C(M')$ over the received message $M'$.
If the new value differs from the received checksum, an error must have occurred somewhere in $M'$ or in the checksum itself.
2 Why checksums work
Accidental corruption (thermal noise, EMI, flaky memory, cosmic rays) changes the pattern of bits.
Most checksum algorithms are designed so that any single-bit error (and, with appropriate design, most multi-bit error patterns) yields a different checksum.
Thus a simple equality test flags corruption without having to inspect the entire payload in detail.
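To make the equality test concrete, here is a minimal Python sketch using the simplest checksum of all, a single even-parity bit. The helper name `parity_bit` is illustrative and not taken from any library. It also shows the limit of such a small check: one flipped bit is caught, two flipped bits cancel out.

```python
def parity_bit(data: bytes) -> int:
    """Even parity: XOR of every bit in the message (result is 0 or 1)."""
    acc = 0
    for byte in data:
        acc ^= byte          # XOR the bytes together, column by column
    # Fold the 8 accumulated bits down to a single bit.
    acc ^= acc >> 4
    acc ^= acc >> 2
    acc ^= acc >> 1
    return acc & 1

original = b"HELLO"
one_flip = bytes([original[0] ^ 0x01]) + original[1:]   # one bit flipped
two_flips = bytes([original[0] ^ 0x03]) + original[1:]  # two bits flipped

print(parity_bit(original) != parity_bit(one_flip))   # True: error detected
print(parity_bit(original) != parity_bit(two_flips))  # False: error missed
```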
3 Common checksum families
| Family | Basic idea | Typical size | Where you meet it |
|---|---|---|---|
| Parity | XOR of all bits (even/odd) | 1 bit | DRAM, legacy serial links |
| Modular sums | Sum of bytes/words modulo a fixed value (Internet Checksum, Fletcher-16/32) | 8–32 bits | IP, UDP, TCP, embedded firmware |
| CRC (Cyclic Redundancy Check) | Treat message as polynomial; divide by generator polynomial; remainder is checksum | 16–64 bits | Ethernet, SATA, USB, MPEG-2 TS |
| Adler-32 | Two running sums of bytes (slightly stronger than simple sum) | 32 bits | zlib streams |
| Cryptographic hashes (MD5, SHA-256) | One-way compression function with avalanche effect | 128–512 bits | Software distribution, digital signatures |
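As an illustration of the modular-sum family, the following Python sketch implements Fletcher-16 in its common textbook form (two running sums, each reduced modulo 255). The function name `fletcher16` is an assumption made for this example. Because the second sum accumulates the intermediate values of the first, the result depends on byte order, which a plain sum does not.

```python
def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums modulo 255, packed into 16 bits."""
    sum1 = 0  # plain running sum of the bytes
    sum2 = 0  # running sum of sum1, which makes byte order matter
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1

print(hex(fletcher16(b"abcde")))               # 0xc8f0
print(fletcher16(b"ab") != fletcher16(b"ba"))  # True: reordering is noticed
```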
Checksum vs. cryptographic hash
Both summarise data, but a checksum is optimised for detecting random errors with minimal overhead, whereas a cryptographic hash is designed to resist intentional tampering (pre-image and collision attacks). Use CRC for noisy channels; use SHA-256 for verifying a software download.
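One way to see the difference is that CRC-32 is linear over XOR: for equal-length messages, the CRC of a combined message can be predicted from the CRCs of its parts, which is what makes forging a matching CRC easy. The sketch below uses Python's standard `zlib.crc32` and `hashlib.sha256`; the three example messages and the helper `xor3` are made up for illustration.

```python
import hashlib
import zlib

def xor3(a: bytes, b: bytes, c: bytes) -> bytes:
    """Byte-wise XOR of three equal-length byte strings."""
    return bytes(x ^ y ^ z for x, y, z in zip(a, b, c))

def sha(message: bytes) -> bytes:
    return hashlib.sha256(message).digest()

# Three hypothetical messages of identical length.
a, b, c = b"pay alice $10", b"pay bobby $10", b"pay carol $99"
mixed = xor3(a, b, c)

# CRC-32 of the XOR-combined message equals the XOR of the three CRCs.
crc_predictable = zlib.crc32(mixed) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)

# SHA-256 has no such structure, so the same trick is expected to fail.
sha_predictable = sha(mixed) == xor3(sha(a), sha(b), sha(c))

print(crc_predictable)  # True
print(sha_predictable)  # False
```

The price of that resistance is size and cost: a CRC-32 value fits in 4 bytes, while a SHA-256 digest occupies 32.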
4 How a checksum is calculated — two concrete examples
4.1 Internet Checksum (RFC 1071)
- Split the message into 16-bit words.
- Add them using one’s-complement arithmetic (end-around carry).
- One’s-complement the final sum.
If any single bit flips, the sum at the receiver cannot match, so IP, UDP and TCP discard the packet.
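The three steps above can be sketched in Python as follows. This is a minimal illustration in the style of RFC 1071, not a tuned implementation; the function name `internet_checksum` and the choice to pad odd-length input with a zero byte are assumptions of this sketch. The sample bytes are the ones used in the RFC's own numerical example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF  # one's complement of the folded sum

sample = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(sample)
print(hex(checksum))  # 0x220d

# Receiver: running the same sum over data plus checksum gives 0 when intact.
print(internet_checksum(sample + checksum.to_bytes(2, "big")))  # 0
```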
4.2 CRC-32 (Ethernet)
- Append 32 zero bits to the frame.
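- Divide the bitstring, using modulo-2 (XOR) arithmetic, by the generator polynomial $x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1$.
- Replace the zeros with the 32-bit remainder (the CRC field).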
Any burst error of ≤ 32 bits—or most longer random patterns—changes the remainder, so the receiver’s CRC check fails and the frame is dropped.
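The division described above can be sketched bit by bit in Python. This follows the simplified three-step description only: the CRC-32 actually used on the wire (and computed by `zlib.crc32`) adds bit-ordering and inversion conventions that are omitted here, so its values differ from this sketch. The names `POLY` and `crc32_remainder` are illustrative.

```python
POLY = 0x04C11DB7  # CRC-32 generator polynomial; the x^32 term is implicit

def crc32_remainder(data: bytes) -> int:
    """Remainder of (data followed by 32 zero bits) divided by POLY, modulo 2."""
    reg = 0
    for byte in data:
        reg ^= byte << 24                  # feed the next 8 message bits
        for _ in range(8):
            if reg & 0x80000000:           # top bit set: subtract (XOR) the generator
                reg = ((reg << 1) ^ POLY) & 0xFFFFFFFF
            else:
                reg = (reg << 1) & 0xFFFFFFFF
    return reg

frame = b"HELLO"
crc = crc32_remainder(frame)
sent = frame + crc.to_bytes(4, "big")      # zeros replaced by the remainder

print(crc32_remainder(sent) == 0)          # True: intact frame divides evenly

damaged = bytes([sent[0] ^ 0x10]) + sent[1:]
print(crc32_remainder(damaged) == 0)       # False: the bit flip is caught
```

A receiver using this scheme does not need to compare two values: dividing the whole received frame, CRC field included, leaves a zero remainder exactly when no detectable error occurred.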
5 Strengths and limitations
| Aspect | Simple checksum (sum/parity) | CRC | Cryptographic hash |
|---|---|---|---|
| Detects single-bit errors | ✅ | ✅ | ✅ |
| Detects many multi-bit patterns | ⚠︎ (depends on the algorithm) | ✅ (guaranteed for every burst no longer than the polynomial's degree) | ✅ |
| Protects against malicious alteration | ❌ | ❌ | ✅ (by design) |
| Computational cost | Very low | Low (table-driven) | Moderate–high |
| Size vs. protection | Small, modest protection | Small, stronger protection | Much larger digest |
Checksums are excellent for catching noise but not for proving authenticity.
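The "depends" entry for simple sums can be illustrated with a short Python sketch. Addition ignores byte order, so two bytes exchanged in transit leave an 8-bit sum unchanged, while CRC-32 (here via the standard `zlib.crc32`) catches the same error because the damage spans only 16 bits. The helper `sum8` is a made-up name for this example.

```python
import zlib

def sum8(data: bytes) -> int:
    """8-bit additive checksum: sum of all bytes modulo 256."""
    return sum(data) % 256

good = b"HELLO"
swapped = b"EHLLO"  # first two bytes exchanged

print(sum8(good) == sum8(swapped))              # True: the sum misses the error
print(zlib.crc32(good) == zlib.crc32(swapped))  # False: the CRC catches it
```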
6 Typical workflow in a network protocol
- Sender side
- Build the packet.
- Compute checksum over certain header + payload fields.
- Insert checksum into a dedicated header field.
- Transmit.
- Receiver side
- Recompute checksum over the same fields (checksum field treated as zero if required).
- Compare with the value in the packet.
- If mismatch: discard or request retransmission; if match: accept.
This pattern repeats at multiple layers: Ethernet appends a CRC-32 in the frame trailer; IPv4 protects its own header with the Internet Checksum; TCP covers its header and payload with another Internet Checksum.
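The sender and receiver steps can be sketched as a pair of Python functions. The packet layout used here (payload followed by a 4-byte big-endian CRC-32 trailer) is a hypothetical format chosen for the example, not that of any real protocol, and `build_packet` and `accept_packet` are illustrative names.

```python
import struct
import zlib
from typing import Optional

def build_packet(payload: bytes) -> bytes:
    """Sender: compute the checksum and place it in a trailer field."""
    return payload + struct.pack(">I", zlib.crc32(payload))

def accept_packet(packet: bytes) -> Optional[bytes]:
    """Receiver: return the payload if the checksum matches, else None."""
    if len(packet) < 4:
        return None  # too short to even hold the checksum field
    payload, trailer = packet[:-4], packet[-4:]
    (received,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) != received:
        return None  # mismatch: discard (or request retransmission)
    return payload

packet = build_packet(b"sensor=21.5C")
print(accept_packet(packet))                                # b'sensor=21.5C'

corrupted = bytes([packet[0] ^ 0x04]) + packet[1:]          # one bit flipped in transit
print(accept_packet(corrupted))                             # None
```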
A pocket-size checksum example
Goal: detect any single-bit error in a tiny 5-byte message using the simplest possible algorithm, an 8-bit mod-256 sum (often called "addition with wrap-around").
7 Simplified example
1 Sender side
| Plaintext | ASCII code (decimal) |
|---|---|
| H | 72 |
| E | 69 |
| L | 76 |
| L | 76 |
| O | 79 |
| Sum | 372 |
| Sum mod 256 | 116 |
- We add all five bytes: $72 + 69 + 76 + 76 + 79 = 372$.
- Because we keep only 8 bits, we take $372 \bmod 256 = 116$.
- 116 (0x74) is appended as the checksum.
Transmitted frame:
H E L L O 0x74
2 Receiver side
- Receives H E L L O 0x74.
- Recomputes the sum of the first five bytes ⇒ 372 ⇒ 372 mod 256 = 116.
- Compares with the received checksum (0x74 = 116).
- If equal, accept the message.
- If not equal, discard or request retransmission.
3 Error detection in action
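Suppose noise corrupts the second byte so that E (0x45) becomes F (0x46), which flips its two lowest bits. A true single-bit flip, such as E becoming D (0x44), is caught in exactly the same way.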
| Byte | Good value | Corrupted | Δ |
|---|---|---|---|
| E | 69 | 70 | +1 |
New sum = $372 + 1 = 373$ ⇒ $373 \bmod 256 = 117$ (0x75).
But the frame still carries the old checksum 0x74 → mismatch detected; the packet is rejected.
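The whole walk-through fits in a few lines of Python. The names `checksum8` and `verify` are illustrative; the numbers are the ones from the tables above.

```python
def checksum8(data: bytes) -> int:
    """8-bit additive checksum: sum of all bytes, keeping only the low 8 bits."""
    return sum(data) % 256

def verify(frame: bytes) -> bool:
    """Receiver check: recompute over the data bytes and compare with the last byte."""
    return checksum8(frame[:-1]) == frame[-1]

# Sender side: append the checksum byte to the message.
message = b"HELLO"
frame = message + bytes([checksum8(message)])
print(sum(message), hex(frame[-1]))  # 372 0x74

# Receiver side: an intact frame passes.
print(verify(frame))                 # True

# Error in transit: E replaced by F, old checksum byte still attached.
corrupted = b"HFLLO" + frame[-1:]
print(hex(checksum8(b"HFLLO")))      # 0x75
print(verify(corrupted))             # False
```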
Key points
- Computation is trivial—just an 8-bit adder.
- Any single-byte change (and most multi-bit errors) alters the result, so errors are spotted.
- The noisier the channel, the larger and more sophisticated the checksum should be (e.g., a CRC instead of a simple sum).
Selecting a checksum in practice
| If your channel is… | And your error model is… | …choose |
|---|---|---|
| Low-speed I²C bus on PCB | Mostly single-bit flips | 8-bit parity or Fletcher-16 |
| Gigabit Ethernet link | Burst noise (bursts of up to 32 bits are always caught) | CRC-32 |
| Over-the-air (OTA) firmware update | Needs tamper evidence | SHA-256 with a digital signature |
| Real-time voice packets | Some loss acceptable; overhead must be tiny | UDP with optional 16-bit checksum |
Key take-aways
- A checksum enables fast, lightweight error detection by sending a short summary alongside the data.
- Simple sums or parities are cost-effective but weaker; CRCs provide mathematically proven detection of burst errors; cryptographic hashes add security against deliberate changes.
- Modern systems often layer these mechanisms: a CRC at the link layer, a modular checksum in the transport header, and a hash or MAC in the application payload.
- Choose the simplest algorithm that meets your error environment and threat model—and always verify it at the receiver before trusting the data.