A1.1.8 Describe the concept of compression.
• The differences between lossy compression methods and lossless compression methods
• Run-length encoding and transform coding
📚 You can find additional information in the course companion pages 28 to 31
The Big Idea
Compression is a fundamental technique in computer science that reduces the size of data to save storage space or decrease transmission time. Lossless methods work by removing redundancy, which is fully reversible; lossy methods also discard information judged less important, which is irreversible.
Compression is essential in areas such as multimedia (images, audio, video), data transmission, file storage, and archival.
1. What Is Compression?
Compression refers to encoding information using fewer bits than the original representation. The goal is to reduce:
- File size (e.g., MB to KB)
- Bandwidth usage during transmission
- Storage requirements
The output of compression is a compressed file or data stream, which can be decompressed later to reconstruct the original data (either exactly or approximately).
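The idea of compressing and later decompressing can be tried with Python's standard `zlib` module, which implements DEFLATE, a lossless method. This is only a minimal sketch: the sample text and the helper `compression_ratio` are illustrative assumptions, not part of the syllabus.

```python
import zlib

def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Return original size divided by compressed size (higher is better)."""
    return len(original) / len(compressed)

# Hypothetical sample: highly repetitive data compresses well.
original = b"compression " * 200
compressed = zlib.compress(original)     # encode using fewer bytes
restored = zlib.decompress(compressed)   # reconstruct the data

print(len(original), "bytes ->", len(compressed), "bytes")
print("ratio:", round(compression_ratio(original, compressed), 1))
print("exact reconstruction:", restored == original)
```

Because `zlib` is lossless, the last line prints `True`; a lossy method would only give an approximation of the original.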
2. Lossless vs. Lossy Compression
2.1 Lossless Compression
Lossless compression allows the original data to be reconstructed exactly after decompression. No information is lost.
Characteristics:
- Reversible
- Used when data integrity is critical (e.g., text files, executable programs, databases)
Common Methods:
- Run-Length Encoding (RLE)
- Huffman Coding
- Lempel-Ziv (LZ77 is combined with Huffman coding in DEFLATE, the algorithm used by ZIP and PNG)
Example:
Original: AAAAABBBCCDAA
Compressed with RLE: 5A3B2C1D2A
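The RLE example above can be reproduced with a short function. This is a minimal sketch, assuming the input contains no digits (otherwise the count-then-symbol format would be ambiguous); the name `rle_encode` is chosen here for illustration.

```python
from itertools import groupby

def rle_encode(text: str) -> str:
    """Encode each run as <count><symbol>, e.g. 'AAA' -> '3A'."""
    # groupby yields one group per run of consecutive equal characters.
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))

print(rle_encode("AAAAABBBCCDAA"))  # 5A3B2C1D2A
```

The 13 original characters become 10, and no information is lost because every run can be expanded back to its original length.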
2.2 Lossy Compression
Lossy compression removes some data permanently, usually detail that is considered unimportant or imperceptible to human senses. The decompressed result is only an approximation of the original.
Characteristics:
- Irreversible
- Achieves higher compression ratios
- Used in media files (audio, image, video) where perfect accuracy is not required
Common Methods:
- Transform Coding (e.g., DCT in JPEG)
- Quantization
- Psychoacoustic models in MP3
Example:
An image compressed with JPEG might discard color precision or fine details the human eye won’t notice.
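Discarding precision can be illustrated with simple quantization. The sketch below is not how JPEG itself works internally; it is a simplified assumption that stores each 8-bit value (0 to 255) as one of 16 levels, which fits in 4 bits. The sample values and function names are hypothetical.

```python
def quantize(value: int, step: int = 16) -> int:
    """Map an 8-bit value (0-255) to the index of its bucket."""
    return value // step

def dequantize(level: int, step: int = 16) -> int:
    """Reconstruct an approximate value from the middle of the bucket."""
    return level * step + step // 2

pixels = [12, 13, 200, 203, 255]         # hypothetical 8-bit samples
levels = [quantize(p) for p in pixels]   # each level fits in 4 bits
approx = [dequantize(q) for q in levels]

print(levels)  # [0, 0, 12, 12, 15]
print(approx)  # [8, 8, 200, 200, 248]
```

The values 12 and 13 become indistinguishable after quantization, so the original data cannot be recovered. This is what makes the method irreversible.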
3. Run-Length Encoding (RLE) – Lossless
Run-Length Encoding is a simple lossless compression method that is effective when data contains consecutive repeated values (runs).
How it works:
- Replace repeated elements with a single value and a count.
Example:
Original binary string: 00000011110000
Compressed: 6×0, 4×1, 4×0 → Represented as (0,6)(1,4)(0,4)
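The (value, count) representation above can be sketched as an encoder and a matching decoder. The function names `rle_pairs` and `rle_expand` are assumptions made for this illustration.

```python
from itertools import groupby

def rle_pairs(data: str) -> list[tuple[str, int]]:
    """Return one (value, run length) pair per run in the input."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_expand(pairs: list[tuple[str, int]]) -> str:
    """Rebuild the original string by repeating each value count times."""
    return "".join(symbol * count for symbol, count in pairs)

bits = "00000011110000"
pairs = rle_pairs(bits)

print(pairs)                      # [('0', 6), ('1', 4), ('0', 4)]
print(rle_expand(pairs) == bits)  # True
```

Decoding returns exactly the original string, which is what makes RLE a lossless method.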
Best for:
- Data with long repeated sequences
- Simple images (e.g., black-and-white bitmaps)
- Text with repeated characters
Limitations:
- Inefficient on data with high variability (e.g., random data or noisy signals); when runs are short the output can be larger than the input, e.g., ABCD becomes 1A1B1C1D
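The effect of run length on RLE can be checked by comparing two inputs of the same size. This is a minimal sketch using a count-then-symbol encoding; the two sample strings are hypothetical.

```python
from itertools import groupby

def rle_encode(text: str) -> str:
    """Encode each run as <count><symbol>."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))

for sample in ["AAAAAAAAAAAA", "ABCDEFGHIJKL"]:
    encoded = rle_encode(sample)
    print(sample, "->", encoded, f"({len(sample)} -> {len(encoded)} characters)")

# AAAAAAAAAAAA -> 12A (12 -> 3 characters)
# ABCDEFGHIJKL -> 1A1B1C1D1E1F1G1H1I1J1K1L (12 -> 24 characters)
```

The first string shrinks to a quarter of its size, while the second doubles because every run has length 1.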
4. Transform Coding – Lossy
Transform coding is the basis of most modern lossy compression algorithms, especially for images and audio.
How it works:
- Transform the data from the time/spatial domain to the frequency domain, e.g., by applying the Discrete Cosine Transform (DCT) in JPEG images.
- Quantize the frequency components (round them coarsely, so that small values become zero). This is the step where information is lost.
- Encode the quantized values using entropy coding (e.g., Huffman), which is itself lossless.
Example in JPEG:
- An image is divided into 8×8 blocks.
- Each block is transformed using DCT to get frequency coefficients.
- Coefficients representing fine details (high-frequency) are often quantized to zero.
- The result is high compression with acceptable loss of visual quality.
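The transform, quantize and reconstruct steps can be sketched in one dimension. This is a simplified illustration under stated assumptions: JPEG applies a two-dimensional DCT to 8×8 blocks and uses a quantization table, whereas the code below uses a single row of 8 hypothetical brightness values and one uniform step size.

```python
import math

N = 8  # number of samples in one row of a block

def scale(k: int) -> float:
    """Normalization factor that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct(samples: list[float]) -> list[float]:
    """1-D DCT-II: spatial samples -> frequency coefficients."""
    return [scale(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n, x in enumerate(samples))
            for k in range(N)]

def idct(coeffs: list[float]) -> list[float]:
    """Inverse transform: frequency coefficients -> spatial samples."""
    return [sum(scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k, c in enumerate(coeffs))
            for n in range(N)]

def quantize(coeffs: list[float], step: int) -> list[int]:
    """Round each coefficient to a multiple of step (the lossy part)."""
    return [round(c / step) for c in coeffs]

def dequantize(levels: list[int], step: int) -> list[int]:
    """Scale the stored levels back to approximate coefficients."""
    return [q * step for q in levels]

row = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical pixel brightness values
coeffs = dct(row)
levels = quantize(coeffs, step=10)
restored = [round(v) for v in idct(dequantize(levels, step=10))]

print("quantized levels:", levels)
print("original row:    ", row)
print("restored row:    ", restored)
```

Coefficients smaller than half the step size are rounded to zero, which is what makes the quantized data cheap to encode. The restored row is close to the original but usually not identical to it.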
Best for:
- Images (JPEG)
- Audio (MP3)
- Video (MPEG, H.264)
Trade-offs:
- Compression ratio vs. perceptual quality
- Loss of fine detail and possible visible artifacts
Summary Table
| Compression Type | Method | Reversible | Best For | Example Formats |
|---|---|---|---|---|
| Lossless | Run-Length Encoding | Yes | Data with long runs (e.g., simple bitmaps) | BMP (RLE mode), PCX, TIFF (PackBits) |
| Lossless | Huffman / LZ | Yes | General files | DEFLATE (ZIP), PNG |
| Lossy | Transform Coding | No | Images, audio, video | JPEG, MP3, MPEG |
Conclusion
Compression is an important tool in managing digital data efficiently. The distinction between lossless and lossy compression determines whether original data can be perfectly reconstructed. Run-Length Encoding offers a simple method for lossless scenarios, while transform coding underpins powerful lossy formats like JPEG and MP3. Understanding when and how to use each method enables better design of storage systems, transmission protocols, and multimedia applications.