A1.1.8 Describe the concept of compression.

• The differences between lossy compression methods and lossless compression methods

• Run-length encoding and transform coding

📚 You can find additional information in the course companion pages 28 to 31

The Big Idea

Compression is a fundamental technique in computer science that reduces the size of data to save storage space or decrease transmission time. Lossless methods do this by removing redundancy in a way that can be fully reversed; lossy methods go further and permanently discard information judged less important.

Compression is essential in areas such as multimedia (images, audio, video), data transmission, file storage, and archival.

1. What Is Compression?

Compression refers to encoding information using fewer bits than the original representation. The goal is to reduce:

File size (e.g., MB to KB)
Bandwidth usage during transmission
Storage requirements

The output of compression is a compressed file or data stream, which can be decompressed later to reconstruct the original data (either exactly or approximately).

2. Lossless vs. Lossy Compression

2.1 Lossless Compression

Lossless compression allows the original data to be reconstructed exactly after decompression. No information is lost.

Characteristics:

Reversible
Used when data integrity is critical (e.g., text files, executable programs, databases)

Common Methods:

Run-Length Encoding (RLE)
Huffman Coding
Lempel-Ziv (LZ77, used together with Huffman coding in DEFLATE for ZIP and PNG)

Example:

Original: AAAAABBBCCDAA
Compressed with RLE: 5A3B2C1D2A
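
The reversibility that defines lossless compression can be checked directly. The sketch below uses Python's standard `zlib` module, which implements DEFLATE (Lempel-Ziv plus Huffman coding); the sample data is a made-up, highly repetitive byte string chosen only for illustration.

```python
import zlib

# Hypothetical sample data: highly repetitive, so redundancy is easy to find.
original = b"AAAAABBBCCDAA" * 100

compressed = zlib.compress(original)    # DEFLATE: LZ77 + Huffman coding
restored = zlib.decompress(compressed)  # reverse the process

print(len(original), "bytes ->", len(compressed), "bytes")
print("Exact reconstruction:", restored == original)
```

The final comparison is the point of the example: a lossless method must give back every byte unchanged, no matter how small the compressed form is.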

2.2 Lossy Compression

Lossy compression removes some data permanently, usually data that is considered redundant or imperceptible to human senses.

Characteristics:

Irreversible
Achieves higher compression ratios
Used in media files (audio, image, video) where perfect accuracy is not required

Common Methods:

Transform Coding (e.g., DCT in JPEG)
Quantization
Psychoacoustic models in MP3

Example:

An image compressed with JPEG might discard color precision or fine details the human eye won’t notice.
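
Discarding precision can be illustrated with plain quantization. This is a minimal sketch, not how JPEG itself works: the sample values and the bucket width of 8 are assumptions chosen for illustration, and `quantize` is a hypothetical helper.

```python
def quantize(values, step):
    """Map each value to the midpoint of its bucket of width `step`."""
    return [(v // step) * step + step // 2 for v in values]

pixels = [200, 201, 203, 198, 64, 66, 65, 67]  # hypothetical 8-bit samples
lossy = quantize(pixels, step=8)

print(lossy)  # [204, 204, 204, 196, 68, 68, 68, 68]
print("distinct values before:", len(set(pixels)), "after:", len(set(lossy)))
```

Because 200, 201, and 203 all become 204, there is no way to tell afterwards which original value was which. That is what makes the method irreversible, and the smaller set of distinct values is what makes the result easier to compress.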

3. Run-Length Encoding (RLE) – Lossless

Run-Length Encoding is a simple lossless compression method used when data contains consecutive repeated values (runs).

How it works:

Replace repeated elements with a single value and a count.

Example:

Original binary string: 00000011110000
Compressed: 6×0, 4×1, 4×0 → Represented as (0,6)(1,4)(0,4)
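
The example above can be reproduced with a short sketch. The function names `rle_encode` and `rle_decode` are hypothetical, and storing runs as (value, count) pairs is just one of several possible representations.

```python
from itertools import groupby

def rle_encode(data):
    """Return a list of (symbol, run_length) pairs for a string."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_decode(pairs):
    """Rebuild the original string from (symbol, run_length) pairs."""
    return "".join(symbol * count for symbol, count in pairs)

bits = "00000011110000"
pairs = rle_encode(bits)

print(pairs)                      # [('0', 6), ('1', 4), ('0', 4)]
print(rle_decode(pairs) == bits)  # True
```

Decoding returns exactly the input string, which is what makes RLE a lossless method.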

Best for:

Data with long repeated sequences
Simple images (e.g., black-and-white bitmaps)
Text with repeated characters

Limitations:

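Inefficient on data with high variability (e.g., random data or noisy signals): every run of length 1 still costs a count plus a value, so the output can end up larger than the input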
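
The limitation is easy to see by comparing sizes. This sketch assumes a simple count-then-symbol text format (as in `5A3B2C1D2A`); `rle_text` is a hypothetical helper, and the format would be ambiguous for data that itself contains digits.

```python
from itertools import groupby

def rle_text(data):
    """Encode a string as count-then-symbol text, e.g. 'AAAB' -> '3A1B'."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(data))

for sample in ("AAAAAAAAAAAA", "ABCDEFGHIJKL"):
    encoded = rle_text(sample)
    print(f"{sample} -> {encoded} ({len(sample)} -> {len(encoded)} characters)")
```

The first sample shrinks from 12 characters to 3, while the second doubles from 12 to 24 because every symbol forms its own run.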

4. Transform Coding – Lossy

Transform coding is the basis of most modern lossy compression algorithms, especially for images and audio.

How it works:

Transform the data from the time/spatial domain to the frequency domain (e.g., apply the Discrete Cosine Transform (DCT) in JPEG images).
Quantize the frequency components (round or remove small values).
Encode the quantized values using entropy coding (e.g., Huffman).
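
The first two steps can be sketched on a one-dimensional signal. This is an illustration under stated assumptions rather than a real codec: the sample values and the single quantization step of 10 are made up, `dct` and `idct` are hypothetical helper names, and the final entropy-coding step is left out.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out

def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * coeffs[k]
                * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]

samples = [100, 102, 104, 106, 108, 110, 112, 114]  # smooth, hypothetical signal
step = 10                                           # hypothetical quantization step

coeffs = dct(samples)                           # 1. transform to frequency domain
quantized = [round(c / step) for c in coeffs]   # 2. quantize (the lossy step)
restored = idct([q * step for q in quantized])  # decoder: dequantize, then invert

print("quantized coefficients:", quantized)
print("reconstruction:", [round(v, 1) for v in restored])
```

For a smooth signal like this one, most quantized coefficients come out as zero, which is what makes the later entropy-coding step effective. The reconstruction is close to the input but not identical.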

Example in JPEG:

An image is divided into 8×8 blocks.
Each block is transformed using DCT to get frequency coefficients.
Coefficients representing fine details (high-frequency) are often quantized to zero.
The result is high compression with acceptable loss of visual quality.
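
The block-based steps above can be sketched for a single 8×8 block. This is a simplification, not the JPEG algorithm itself: the gradient block is invented, one uniform step of 10 stands in for JPEG's table of per-frequency quantization steps, and details such as level shifting and entropy coding are omitted. The names `dct_1d` and `dct_2d` are hypothetical.

```python
import math

N = 8  # JPEG block size

def dct_1d(x):
    """Orthonormal DCT-II of one row or column of length N."""
    return [math.sqrt((1 if k == 0 else 2) / N)
            * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * N))
                  for i in range(N))
            for k in range(N)]

def dct_2d(block):
    """2-D DCT: transform every row, then every column of the result."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][v] is vertical frequency v of column c; transpose back to [v][c].
    return [[cols[c][v] for c in range(N)] for v in range(N)]

# Hypothetical 8x8 block: a smooth left-to-right brightness gradient.
block = [[100 + 2 * col for col in range(N)] for _ in range(N)]

step = 10  # single hypothetical step; JPEG uses per-frequency steps
quantized = [[round(c / step) for c in row] for row in dct_2d(block)]

nonzero = sum(1 for row in quantized for q in row if q != 0)
print(f"{nonzero} of {N * N} coefficients are non-zero after quantization")
```

Only the few low-frequency coefficients survive for a smooth block, so the 64 pixel values can be described by a handful of numbers plus a long run of zeros.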

Best for:

Images (JPEG)
Audio (MP3)
Video (MPEG, H.264)

Trade-offs:

Compression ratio vs. perceptual quality
Loss of fine detail and possible visible artifacts

Summary Table

Compression Type	Method	Reversible	Best For	Example Formats
Lossless	Run-Length Encoding	Yes	Data with long runs, simple bitmaps	BMP (RLE mode), PCX, TIFF (PackBits)
Lossless	Huffman / LZ	Yes	General files	DEFLATE (ZIP), PNG
Lossy	Transform Coding	No	Images, audio, video	JPEG, MP3, MPEG

Conclusion

Compression is an important tool in managing digital data efficiently. The distinction between lossless and lossy compression determines whether original data can be perfectly reconstructed. Run-Length Encoding offers a simple method for lossless scenarios, while transform coding underpins powerful lossy formats like JPEG and MP3. Understanding when and how to use each method enables better design of storage systems, transmission protocols, and multimedia applications.