A1.2.2 Explain how binary is used to store data.

• The fundamentals of binary encoding and the impact on data storage and retrieval

• The mechanisms by which data such as integers, strings, characters, images, audio and video are stored in binary form

The big idea

Every datum in a computer—whether it looks to us like a number, a paragraph of text, or a high-definition movie—is ultimately just an ordered sequence of binary digits (bits). A bit can hold only two states (0 or 1), but by grouping bits into larger units (bytes, words, blocks) and applying well-defined encodings we can represent any information, store it on physical media, transmit it, and reconstruct it later. The choice of encoding directly determines how much space the data occupies, how fast it can be moved, and what fidelity is preserved when it is read back.

1 Fundamentals of binary encoding and its impact

Concept	What it means	Why it matters
Bit	Smallest unit of information; holds one of two states (0 or 1).	Everything else is built from bits.
Byte (8 bits)	Smallest conventional unit addressable by most hardware.	File sizes, memory addresses, and buses are byte-oriented.
Word	Natural unit of data the CPU handles in one operation (typically 32 or 64 bits).	Sets register width and native integer range, and drives data-alignment rules.
Encoding rule	Agreed mapping between patterns of bits and an abstraction (e.g. “01000001” ↔ A).	Without the rule, the bits are meaningless noise.
Metadata	Extra bits that describe how to interpret the payload (type, length, timestamps, checksums).	Guarantees correct decoding and integrity.
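To see why the encoding rule matters, the short sketch below reads the same four bytes under three different rules using Python's standard `struct` module. The byte values are arbitrary and chosen only for illustration; nothing in the bytes themselves says which reading is "correct".

```python
import struct

# Four raw bytes with no metadata attached.
raw = bytes([0x41, 0x42, 0x43, 0x44])

# Rule 1: each byte is an ASCII character.
as_text = raw.decode("ascii")

# Rule 2: the four bytes form one unsigned 32-bit big-endian integer.
as_uint32 = struct.unpack(">I", raw)[0]

# Rule 3: the same four bytes form a 32-bit IEEE 754 float (big-endian).
as_float32 = struct.unpack(">f", raw)[0]

print(as_text, as_uint32, as_float32)
```

The text reading gives "ABCD", the integer reading gives 1 094 861 636, and the float reading gives a value a little above 12: same bits, three unrelated meanings.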

Impact on storage & retrieval

Capacity: A higher-precision encoding multiplies the space required (e.g. 24-bit colour needs three times the space of 8-bit colour).
Throughput: Wider buses and parallelism move more bits per clock.
Latency: Complex encodings (e.g. compressed video) trade space efficiency for extra CPU time during decode.
Compatibility: Interoperability depends on all parties agreeing on the same binary layout (endianness, character set, container format).
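Endianness is a concrete example of the compatibility point: the same integer can be laid out with its most significant byte first (big-endian) or last (little-endian). The sketch below uses Python's `struct` byte-order prefixes `>` and `<` to show both layouts and what happens when a reader assumes the wrong one.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)      # most significant byte first
little = struct.pack("<I", value)   # least significant byte first

print(big.hex(" "), "|", little.hex(" "))   # 12 34 56 78 | 78 56 34 12

# Decoding big-endian bytes as if they were little-endian silently gives a different number.
wrong = struct.unpack("<I", big)[0]
print(hex(wrong))   # 0x78563412
```

No error is raised in the mismatched case, which is why file formats and network protocols state their byte order explicitly.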

2 How common data types are stored in binary

2.1 Integers

Representation

Unsigned: pure binary magnitude.
Example (8-bit): 0011 0101₂ = 53₁₀.
Signed (two’s-complement): high bit = sign; range −2ⁿ⁻¹ … 2ⁿ⁻¹−1.
Example (8-bit): 1111 0101₂ represents −11₁₀.

Impact: Bit-width fixes the numeric range; overflow on addition is a consequence of the finite pattern space.
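The two examples above, and the overflow point, can be checked with a minimal Python sketch. The helper names `to_twos_complement`, `from_twos_complement` and `add_8bit_unsigned` are hypothetical, written for this illustration; masking with `& 0xFF` mimics fixed-width hardware that keeps only the low 8 bits of a result.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the n-bit two's-complement bit pattern of value as a string."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {bits} signed bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos_complement(pattern: str) -> int:
    """Interpret a bit string as a two's-complement signed integer."""
    raw = int(pattern, 2)
    # If the high (sign) bit is set, subtract 2**bits to recover the negative value.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw

def add_8bit_unsigned(a: int, b: int) -> int:
    """Add like 8-bit hardware: any carry out of bit 7 is lost."""
    return (a + b) & 0xFF

print(int("00110101", 2))                 # 53
print(to_twos_complement(-11))            # 11110101
print(from_twos_complement("11110101"))   # -11
print(add_8bit_unsigned(200, 100))        # 44 (300 wraps around modulo 256)
```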

There is an additional article about signed and unsigned integers at this link.

2.2 Characters and strings

Encoding sets

ASCII (7 bits, usually stored one character per byte with the high bit 0): English letters, digits and symbols. 'A' = 0x41 = 0100 0001₂.
Unicode: assigns a code-point to every written symbol worldwide.
UTF-8 encodes those points in 1–4 bytes; ASCII bytes stay unchanged, keeping legacy text readable.
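A quick way to see UTF-8's variable length is to encode one character from each length class with Python's built-in `str.encode`. The four characters are arbitrary examples (written as escapes so the source stays ASCII).

```python
# 'A', 'é', '€' and an emoji: one character from each UTF-8 length class.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) -> {encoded.hex(' ')}")

# U+0041: 1 byte(s) -> 41
# U+00E9: 2 byte(s) -> c3 a9
# U+20AC: 3 byte(s) -> e2 82 ac
# U+1F600: 4 byte(s) -> f0 9f 98 80
```

Note that 'A' still encodes to the single ASCII byte 0x41, which is what keeps legacy ASCII text valid UTF-8.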

Storage layout

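Length-prefixed ([len][data…]) – the length is read directly without scanning, and the data may contain zero bytes.
Null-terminated ([data…][0]) – C-style strings; the length must be found by scanning for the terminator, and the data cannot contain a zero byte.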
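The two layouts can be sketched in a few lines of Python. The 2-byte big-endian length field is an assumption made for this example (real formats choose their own prefix width and byte order), and all function names are hypothetical.

```python
import struct

def length_prefixed(text: str) -> bytes:
    """[len][data...]: a 2-byte big-endian length (assumed width) then UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack(">H", len(data)) + data

def read_length_prefixed(buf: bytes) -> str:
    (n,) = struct.unpack_from(">H", buf, 0)   # length read directly, no scan needed
    return buf[2:2 + n].decode("utf-8")

def null_terminated(text: str) -> bytes:
    """[data...][0]: C-style layout; the payload must not contain a 0 byte."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("null-terminated strings cannot contain a 0 byte")
    return data + b"\x00"

def read_null_terminated(buf: bytes) -> str:
    end = buf.index(b"\x00")   # scans byte by byte until the terminator
    return buf[:end].decode("utf-8")

print(length_prefixed("Hi").hex(" "))   # 00 02 48 69
print(null_terminated("Hi").hex(" "))   # 48 69 00
```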

2.3 Images

Raw bitmap

Header (width, height, bit-depth) | Pixel 0 | Pixel 1 | …

Each pixel might be 24 bits: 8 bits each for red, green and blue.
Example: A 1920 × 1080 RGB image needs
1920 × 1080 × 24 = 49 766 400 bits = 6 220 800 bytes ≈ 5.93 MiB.
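The header-then-pixels layout and the size calculation can be sketched as follows. The 5-byte header (2-byte width, 2-byte height, 1-byte bit depth) is a made-up toy format for illustration, not BMP or any real file format; `make_bitmap` and `raw_image_bits` are hypothetical helpers.

```python
import struct

def make_bitmap(width: int, height: int, rgb: tuple[int, int, int]) -> bytes:
    """Toy raw bitmap: hypothetical 5-byte header, then 24-bit RGB pixels row by row."""
    header = struct.pack(">HHB", width, height, 24)   # width, height, bit depth
    pixel = bytes(rgb)                                  # 3 bytes: red, green, blue
    return header + pixel * (width * height)

def raw_image_bits(width: int, height: int, bit_depth: int) -> int:
    """Pixel data only; the header adds a few bytes on top."""
    return width * height * bit_depth

bits = raw_image_bits(1920, 1080, 24)
print(bits, bits // 8, round(bits / 8 / 2**20, 2))   # 49766400 6220800 5.93

tiny = make_bitmap(2, 2, (255, 128, 0))
print(len(tiny))   # 5 header bytes + 4 pixels x 3 bytes = 17
```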

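Compressed formats shrink this dramatically at the cost of CPU time. PNG is lossless: it applies per-row prediction filters and then DEFLATE compression. JPEG is lossy: it converts 8 × 8 blocks of pixels into frequency coefficients (a discrete cosine transform), quantises them, and entropy-codes the result, giving up some quality.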
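PNG's compression stage is DEFLATE, which Python exposes through its standard `zlib` module. The sketch below skips PNG's row filters and file structure, so it only illustrates lossless compression of raw pixel bytes. The single-colour test image is deliberately extreme; a real photograph compresses far less.

```python
import zlib

# A raw 1920 x 1080, 24-bit image that is entirely one colour: highly redundant data.
width, height = 1920, 1080
raw = bytes((30, 144, 255)) * (width * height)

packed = zlib.compress(raw, level=9)
restored = zlib.decompress(packed)

print(len(raw), len(packed), f"{len(raw) / len(packed):.0f}x smaller")
assert restored == raw   # lossless: every bit comes back exactly
```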

2.4 Audio

Pulse-code modulation (PCM)
Parameters: sampling rate fₛ, bit-depth b, channels c.
For CD audio fₛ = 44 100 Hz, b = 16, c = 2:

bits per second = fₛ × b × c
                = 44 100 × 16 × 2
                = 1 411 200 bit/s = 176 400 bytes/s ≈ 172 KiB/s

Samples are stored interleaved: L₀, R₀, L₁, R₁, …
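The rate formula and the interleaved layout can be sketched together. The little-endian 16-bit signed sample format matches what WAV files use for 16-bit PCM; the function names are hypothetical.

```python
import struct

def pcm_bit_rate(fs: int, bit_depth: int, channels: int) -> int:
    """bits per second = fs x b x c"""
    return fs * bit_depth * channels

def interleave_stereo(left: list[int], right: list[int]) -> bytes:
    """Pack 16-bit signed samples as L0, R0, L1, R1, ... in little-endian order."""
    if len(left) != len(right):
        raise ValueError("both channels need the same number of samples")
    return b"".join(struct.pack("<hh", l_s, r_s) for l_s, r_s in zip(left, right))

rate = pcm_bit_rate(44_100, 16, 2)
print(rate, rate // 8, round(rate / 8 / 1024, 1))     # 1411200 176400 172.3
print(interleave_stereo([1, 2], [-1, -2]).hex(" "))   # 01 00 ff ff 02 00 fe ff
```

Each left/right pair forms one 4-byte frame, so a reader can jump to sample i of both channels at byte offset 4 × i.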

Compressed codecs (MP3, AAC, Opus) transform blocks of samples to frequency space and quantise them psycho-acoustically, typically saving around 90 % of the space (e.g. a 128 kbit/s MP3 versus 1 411 kbit/s PCM).

Pulse-Code Modulation (PCM)

Uncompressed digital audio format where an analog waveform is discretised by sampling and then quantised into fixed-width binary integers.

Sampling Rate (fₛ)

Number of samples taken per second (Hz). Determines the temporal resolution of the signal.
CD audio: 44,100 samples/s.

Bit Depth (b)

Number of bits per sample. Determines quantisation resolution and signal-to-quantisation-noise ratio.
CD audio: 16 bits/sample.
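Sampling and quantisation can be sketched by measuring a pure sine wave at regular intervals and rounding each measurement to a b-bit signed integer. The 1 kHz test tone and the helper name `sample_and_quantise` are assumptions for illustration.

```python
import math

def sample_and_quantise(freq_hz: float, fs: int, bit_depth: int, n: int) -> list[int]:
    """Sample a unit-amplitude sine n times at rate fs, then round each sample
    to a signed integer of bit_depth bits (full scale = 2**(b-1) - 1)."""
    full_scale = 2 ** (bit_depth - 1) - 1
    return [round(full_scale * math.sin(2 * math.pi * freq_hz * i / fs)) for i in range(n)]

# 1 kHz tone at CD settings: 16-bit samples.
print(sample_and_quantise(1000, 44_100, 16, 5))

# Same tone at 3 bits: only seven levels (-3..3) survive, so the error is large.
print(sample_and_quantise(1000, 44_100, 3, 5))
```

Each extra bit doubles the number of available levels, which is why bit depth sets the signal-to-quantisation-noise ratio (roughly 6 dB per bit).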

Channels (c)

Independent audio streams stored in parallel (e.g., mono = 1, stereo = 2).
CD audio: 2 channels.

Bits per Second (bit/s)

Total binary storage rate: bits per second = fₛ × b × c.

CD example: 44 100 × 16 × 2 = 1 411 200 bit/s.

Interleaving (L₀, R₀, L₁, R₁, …)

PCM stores multi-channel samples alternating by channel to maintain sample-accurate alignment and simplify buffer reads.

Compressed Codecs (MP3, AAC, Opus)

Lossy encoders that take PCM blocks, apply a frequency-domain transform, and quantise according to a psychoacoustic model to remove masked or inaudible components. They typically achieve about a 90 % storage reduction relative to raw PCM.

“Transform to Frequency Space”

Application of transforms (e.g. the MDCT, modified discrete cosine transform) to convert time-domain PCM samples into spectral coefficients that expose frequency-localised structure suitable for perceptual analysis and selective quantisation.
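Real codecs use the MDCT with overlapping windows; as a simpler stand-in, the sketch below implements a plain, unnormalised DCT-II, a closely related transform, to show the key idea: a block that contains a single frequency turns into one large coefficient and near-zero values elsewhere.

```python
import math

def dct_ii(block: list[float]) -> list[float]:
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k)."""
    n_len = len(block)
    return [
        sum(x * math.cos(math.pi / n_len * (n + 0.5) * k) for n, x in enumerate(block))
        for k in range(n_len)
    ]

# An 8-sample block that is exactly the k = 3 cosine basis function.
N = 8
block = [math.cos(math.pi / N * (n + 0.5) * 3) for n in range(N)]
coeffs = dct_ii(block)
print([round(c, 3) for c in coeffs])   # energy concentrated in coefficient 3
```

Once energy is concentrated like this, an encoder can spend bits on the few large coefficients and quantise the rest coarsely or drop them.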

“Psycho-acoustically Quantise”

Non-uniform, perceptually weighted quantisation guided by auditory masking thresholds. Precision is reduced where the ear cannot detect error (masked frequencies, low-sensitivity bands), enabling large bit-rate reductions without proportional perceptual loss.

2.5 Video

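A video file = container + stream(s).

Frames: each frame is conceptually a bitmap.
Frame types: I-frames are self-contained; P- and B-frames store only differences (P-frames predict from earlier frames, B-frames from both earlier and later frames).
Codec (H.264, AV1): orchestrates block transforms, motion vectors and entropy coding.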

The binary stream is timestamped so the playback engine reconstructs each frame in order and at the correct rate (e.g. 30 fps ≈ one frame every 33 ms).
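The "store only differences" idea and the timestamp arithmetic can be illustrated with a toy model. Real codecs predict whole blocks using motion vectors and encode the residual; the per-pixel diff list below is a deliberate simplification, and all names are hypothetical.

```python
Frame = list[int]   # toy greyscale frame: one byte value per pixel

def encode_p_frame(prev: Frame, cur: Frame) -> list[tuple[int, int]]:
    """Keep only the pixels that changed, as (index, new_value) pairs."""
    return [(i, v) for i, (p, v) in enumerate(zip(prev, cur)) if p != v]

def decode_p_frame(prev: Frame, diffs: list[tuple[int, int]]) -> Frame:
    """Rebuild a frame by applying the stored differences to the previous one."""
    frame = prev.copy()
    for i, v in diffs:
        frame[i] = v
    return frame

def timestamp_ms(frame_number: int, fps: float = 30.0) -> float:
    """Presentation time of a frame at a constant frame rate."""
    return frame_number * 1000.0 / fps

i_frame = [10] * 16            # I-frame: all 16 pixels stored in full
next_frame = i_frame.copy()
next_frame[5] = 200            # only one pixel changes in the next frame

diffs = encode_p_frame(i_frame, next_frame)
print(diffs)                                          # [(5, 200)]
print(decode_p_frame(i_frame, diffs) == next_frame)   # True
print(round(timestamp_ms(1), 1))                      # 33.3
```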

3 Worked mini-examples

Data item	Binary view	Size implication
Integer 300₁₀	0000 0001 0010 1100₂ (16-bit unsigned)	Needs 2 bytes; 8-bit would overflow.
Word “Hi” (UTF-8)	0x48 0x69	2 bytes; UTF-16 would use 4 bytes (0x00 0x48 0x00 0x69, big-endian).
One pixel, RGBA (255, 128, 0, 255)	1111 1111 1000 0000 0000 0000 1111 1111₂	32 bits.
0.01 s of CD audio	441 samples × 16 bits × 2 channels = 14 112 bits	1 764 bytes ≈ 1.72 KiB.
Single 1080p I-frame, 4:2:0 JPEG-like	~150 kB compressed vs ≈ 5.93 MiB raw (24-bit RGB)	Roughly 40× smaller.
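The first four rows of the table can be recomputed with a few lines of standard-library Python, which is a handy way to check hand calculations like these.

```python
import struct

# Row 1: 300 as a 16-bit unsigned integer (big-endian so bytes read left to right).
print(format(300, "016b"), struct.pack(">H", 300).hex(" "))   # 0000000100101100 01 2c

# Row 2: "Hi" in UTF-8 and in big-endian UTF-16 (no byte-order mark).
print("Hi".encode("utf-8").hex(" "), "|", "Hi".encode("utf-16-be").hex(" "))   # 48 69 | 00 48 00 69

# Row 3: one RGBA pixel packed into 32 bits.
rgba = bytes([255, 128, 0, 255])
print(format(int.from_bytes(rgba, "big"), "032b"))   # 11111111100000000000000011111111

# Row 4: 0.01 s of CD audio.
samples_per_channel = round(44_100 * 0.01)
bits = samples_per_channel * 16 * 2
print(samples_per_channel, bits, bits // 8)          # 441 14112 1764
```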

Take-away

Bits are the universal currency of digital information.
Carefully chosen binary encodings trade off space, speed, and fidelity.
Interpreting stored data correctly demands that software consult metadata and apply the exact inverse transformation of whatever encoding was used to write it.

Grasp these binary mechanisms now, and later topics—file systems, compilers, multimedia processing, networking protocols—will make much more sense.