B2.5.1 Construct code to perform file-processing operations.

B2.5.1 Construct code to perform file-processing operations. 
• Programs that manipulate text files 
• Opening a sequential file in various modes (read, write, append) 
• How to read from and write to files, append data to an existing file, and close a file once operations are completed 
• Classes for Java users may include Scanner, FileWriter, BufferedReader. 
• Functions for Python users may include open(), read(), readline(), write(), close()

 

The Big Idea

File processing is the mechanism by which programs persist data across executions by reading from and writing to files stored on disk. Unlike variables, which disappear when a program ends, file data remains. This makes file processing critical in real-world applications such as log files, configuration storage, databases, and document systems.

The command term here is “Construct.” This means students must not only describe but also write working code that opens, reads, writes, and closes files.

  • A weak answer would say “.read() gets the contents of a file.”
  • A strong answer would construct a Python program that demonstrates .read(), shows how the file pointer moves, and handles EOF correctly.

Sequential Access and File Pointers

When you open a file, the operating system provides a file pointer.

  • This pointer marks your current position in the file.
  • Every read or write advances the pointer sequentially unless you explicitly move it (using seek()).
  • Reading always starts at the beginning unless the pointer is moved.

For example, after calling .readline() twice, the pointer is now positioned at the start of the third line. If you call .read(), it continues from that position, not from the top.


End-of-File (EOF)

The special condition EOF (end-of-file) signals when no more data can be read.

  • Functions like .readline() return an empty string ("") at EOF.
  • EOF is not a character in the file but a logical state in the operating system.
  • Good code checks for EOF to avoid infinite loops.

Bytes and Buffer Size

Reading and writing are measured in bytes, the smallest unit of storage.

  • A character like "A" typically occupies 1 byte (in ASCII), while "ą" may take multiple bytes (in UTF-8).
  • Methods like .read(n) allow you to specify at most n bytes to read in one call.
  • The system buffer determines how much data can be moved at once. Reading very large files in one go can consume a lot of memory, so chunked reads are often more efficient.
  • Please carefully read this article to understand buffers and systems buffers.

Example:

with open("largefile.txt", "rb") as file:  # binary mode
    chunk = file.read(1024)  # read up to 1024 bytes
    while chunk:
        process(chunk)
        chunk = file.read(1024)

Here the file is read in sequential 1024-byte chunks until EOF is reached.


File Modes in Python

ModeDescription
'r'Read (file must exist)
'w'Write (overwrite if exists, else create new)
'a'Append (write to end, create if missing)
'rb', 'wb'Binary mode (raw bytes, not text)

Examples

Example 1: Writing to a File (w mode)

file = open("names.txt", "w")
file.write("Alice\n")
file.write("Bob\n")
file.write("Charlie\n")
file.close()

Example 2: Reading Line by Line

file = open("names.txt", "r")
for line in file:
    print(line.strip())
file.close()
  • File pointer starts at the top.
  • Each iteration reads sequentially until EOF.

Example 3: Appending to a File (a mode)

file = open("names.txt", "a")
file.write("Diana\n")
file.write("Eli\n")
file.close()
  • Pointer moves directly to the end.
  • Existing content is preserved.

Example 4: Reading with .readline() and .read()

file = open("names.txt", "r")

first = file.readline()  # reads up to newline
print("First line:", first.strip())

rest = file.read()  # reads remaining bytes until EOF
print("Rest of file:\n", rest.strip())

file.close()

Best Practices

  • Always close files using .close() or use the safer with open(...) as f: syntax, which ensures the file closes even if an error occurs.
  • Be mindful of memory: avoid .read() on very large files—use .read(n) instead.
  • Understand that all reads/writes are sequential unless you reposition the pointer with seek().

Summary

  • Pointers track your current position in the file.
  • Sequential access means reading/writing progresses linearly unless you move the pointer.
  • EOF signals the logical end of data.
  • Bytes and buffer sizes define how much data can be read in a single operation.
  • Strong HL answers will construct working code that demonstrates these principles under exam conditions.