The debugging process

Big Idea

Debugging is not guessing. It is a process of forming a hypothesis about where the bug is, gathering evidence, and narrowing the search until the cause is certain. Programmers who guess — changing things at random until the program stops crashing — take longer, introduce new bugs, and do not learn anything. Programmers who follow a systematic process find bugs faster, even in code they have never seen before.

This article is part of the Debugging section. It assumes you can read Python error messages and know how to use print() statements and the VS Code debugger as investigation tools. This article is about the process that tells you where to point those tools.

| Level | What it covers | When to read it |
| --- | --- | --- |
| Basic idea | The core debugging mindset; the difference between a symptom and a cause; the three-step process | Right now. |
| At a deeper level | Specific strategies: split-half search, isolating components, minimal reproduction, rubber duck debugging | Once you have tried the three-step process on your own program. |
| At the deepest level | How professional programmers think about bugs; scientific method applied to code; when to stop and rewrite | When the strategies feel familiar and you want to think more deeply about what debugging is. |

1   Symptoms vs Causes

Basic Idea

Every bug has a symptom and a cause. The symptom is what you observe — the wrong output, the crash, the missing data. The cause is the specific line or decision in your code that produced the symptom.

The symptom and the cause are almost never the same thing. A program that crashes on line 42 may have a cause on line 7, where a variable was given a bad value that only became a problem 35 lines later. Fixing line 42 without understanding why it crashed does not fix the bug — it just moves the symptom.

The entire debugging process is about moving from the symptom you can see to the cause you cannot see yet.

At a Deeper Level

Here are some concrete examples of how far apart a symptom and its cause can be:

| Symptom | Where it appears | Likely cause | Where the cause is |
| --- | --- | --- | --- |
| TypeError adding values in a loop | Inside the loop | input() result not converted to int() | The line that read user input, before the loop |
| Final total is wrong by exactly one item | The print() at the end | The loop starts at index 1 instead of index 0 | The loop header |
| KeyError reading from a dictionary | The dictionary read | The key was never added, or was added with a different spelling | Earlier in the program where the dictionary was built |
| Function returns None instead of a value | Where the return value is used | The function uses print() instead of return | Inside the function definition |
| List is empty when it should have items | Where the list is read | Items were appended to a different variable, or the list was reset inside the loop | Where the list was being built |

Never fix the symptom — fix the cause

A symptom-fix feels fast: the error message disappears, the program runs. But if you have not found the cause, you have only hidden the problem. It will reappear, often in a different form, when the program is used with different data or in a different situation. The test of a good fix is that you can explain exactly what was wrong and why your change addresses it.
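As a minimal sketch of the first row of the table, the example below separates the line where the error appears from the line where it originates. The function names read_quantity, read_quantity_fixed, and total_quantity are hypothetical, and plain strings stand in for what input() would return.

```python
def read_quantity(raw_text):
    # Cause: the text is returned unchanged, so callers receive a str, not an int.
    return raw_text.strip()


def read_quantity_fixed(raw_text):
    # Fix at the cause: convert the text to an int where it enters the program.
    return int(raw_text.strip())


def total_quantity(quantities):
    total = 0
    for q in quantities:
        total = total + q  # Symptom: the TypeError is raised on this line
    return total


# "3" and "4" stand in for values typed by a user.
try:
    total_quantity([read_quantity("3"), read_quantity("4")])
except TypeError as error:
    print("Symptom seen in the loop:", error)

print(total_quantity([read_quantity_fixed("3"), read_quantity_fixed("4")]))  # 7
```

Wrapping the addition in int(q) inside the loop would also silence the error, but the bad values would still be in the list for any other code that uses them.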

At the Deepest Level

The distinction between symptom and cause reflects a deeper idea in systems thinking: the point where a failure becomes visible is not necessarily the point where the failure originated. Complex systems — and programs with multiple functions, data structures, and control flows qualify as complex systems — can propagate bad state over many steps before anything observable goes wrong.

This is why large software projects use assertions: statements that check that a condition is true at a specific point in the program, and stop the program immediately with a clear message if it is not. Assertions turn latent bad state into immediate, local errors. When bad data is caught at the point where it is created rather than ten steps later where it causes a problem, the symptom and the cause are much closer together, and debugging becomes much easier.
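A short sketch of the idea, using Python's built-in assert statement. The function add_score and the rule that scores are whole numbers from 0 to 100 are assumptions made for illustration.

```python
def add_score(scores, score):
    # Check the data at the point where it enters the list,
    # not later where a bad value would cause a confusing error.
    assert isinstance(score, int), f"score must be an int, got {type(score).__name__}"
    assert 0 <= score <= 100, f"score out of range: {score}"
    scores.append(score)


scores = []
add_score(scores, 85)

try:
    add_score(scores, "90")  # a string that was never converted
except AssertionError as error:
    print("Caught at the source:", error)

print(scores)  # [85]
```

Note that Python skips assert statements when it is run with the -O option, so assertions are a debugging aid rather than a replacement for validating user input.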

2   The Three-Step Process

Basic Idea

Every debugging session follows the same three steps. The specific tools change — error messages, print statements, the debugger, trace tables — but the process does not.

  1. Reproduce the bug reliably. Make the bug happen on purpose, every time. If you cannot reproduce it, you cannot debug it. Note exactly what inputs or actions cause it.
  2. Locate the cause. Find the specific line or decision where the program first does the wrong thing. Not where it crashes — where the bad state originated. Use evidence to narrow the search.
  3. Fix and verify. Make one change. Explain why that change addresses the cause. Confirm the bug is gone. Confirm you have not introduced a new bug.

Steps 1 and 3 are often skipped by beginners. Step 1 is skipped because the bug seems obvious. Step 3 is skipped because the program stopped crashing. Both skips cause problems.

At a Deeper Level

Why step 1 matters. A bug you cannot reproduce consistently is a bug you cannot trust that you have fixed. Before doing anything else, find a specific input — a particular value, a particular sequence of actions — that causes the bug every single time. Write it down. Use it as your test case throughout the debugging session. If your fix makes the bug disappear with one input but the bug was never reliably caused by that input in the first place, you do not know whether you fixed anything.

Why step 3 matters. Once a fix is in place, test it with the input that originally caused the bug. Then test it with other inputs — including edge cases and inputs you did not try before. A fix that works for one test case but breaks another is not a fix; it is a trade. Sometimes fixes also introduce regressions — new bugs in parts of the program that previously worked. Running a few basic checks after any fix catches regressions before they compound.

Write down your test case before you start debugging

Before touching any code, write down: the exact input that causes the bug, what you expected the program to do, and what it actually did. Keep this written down. It is easy to lose track of these details once you are deep in the code. A written test case also means you can check Step 3 rigorously rather than relying on the feeling that it seems to work now.
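One possible way to keep the written test case next to the code is to record it as constants and re-run it after every change. This is only a sketch: total_of is a hypothetical buggy function (it reuses the off-by-one loop from the symptom table), and the names TEST_INPUT, EXPECTED, and bug_is_present are made up for illustration.

```python
def total_of(items):
    total = 0
    for i in range(1, len(items)):  # bug under investigation: index 0 is skipped
        total = total + items[i]
    return total


# The written test case: exact input and expected result.
TEST_INPUT = [10, 20, 30]
EXPECTED = 60


def bug_is_present():
    # True while the program still gives the wrong answer for the test case.
    return total_of(TEST_INPUT) != EXPECTED


print("Expected:", EXPECTED)
print("Actual:", total_of(TEST_INPUT))
print("Bug reproduced:", bug_is_present())
```

After a fix, the same three lines at the bottom can be run again; Step 3 is complete for this input only when the last line prints False.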

At the Deepest Level

The three-step process is a simplified version of the scientific method applied to software. The inputs that cause the bug are the experimental conditions. Your hypothesis about the cause is the theory. The fix is the intervention. The verification is the experiment. If the bug is gone after the fix, and stays gone for your other test inputs, the evidence supports the hypothesis. If not, the hypothesis was wrong and you form a new one.

Professional software engineers who are unusually good at debugging are often described as thinking like scientists: they do not trust intuition alone, they treat the program as a system to be observed, and they form and test explicit hypotheses. The strategies in the rest of this article are tools for forming better hypotheses faster.

3   Strategy: Split-Half Search

Basic Idea

When you do not know where a bug is, the split-half strategy finds it in the fewest possible steps. The idea is simple: check the midpoint of your program. If the data is correct there, the bug is in the second half. If the data is wrong there, the bug is in the first half. Repeat on the half that contains the bug. Each check cuts the search space in half.

In practice: add a print() statement or set a breakpoint in the middle of your program. Check whether the relevant variables look correct at that point. If they do, the bug is below. If they do not, the bug is above. Move your check to the middle of the remaining half and repeat.

A 100-line program requires at most 7 checks to locate the bug to within a single line: 100 → 50 → 25 → 13 → 7 → 4 → 2 → 1.
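The halving sequence above can be reproduced with a few lines of code. This is a small sketch; checks_needed is a hypothetical helper name.

```python
def checks_needed(line_count):
    # Count how many times the search range must be halved
    # (rounding up) before only one line remains.
    checks = 0
    remaining = line_count
    while remaining > 1:
        remaining = (remaining + 1) // 2  # half, rounded up
        checks += 1
    return checks


print(checks_needed(100))   # 7
print(checks_needed(1000))  # 10
```

Multiplying the program's length by ten adds only three more checks, which is why the strategy scales well to long programs.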

At a Deeper Level

Here is the strategy applied to a concrete example. The program below is supposed to take a list of scores, drop any score below 40, double any remaining score below 50, sort the results, and print the adjusted list. The final output is wrong but you do not know where the problem starts.

scores = [80, 45, 90, 30, 70, 20, 65]

# ---- STEP 1: filter ----
passing = []
for s in scores:
    if s >= 40:
        passing.append(s)

# ---- STEP 2: double low scores ----
adjusted = []
for s in passing:
    if s < 50:
        adjusted.append(s * 2)
    else:
        adjusted.append(s)

# ---- STEP 3: sort ----
adjusted.sort()

print(adjusted)

The output is wrong. You do not know whether the problem is in the filter, the doubling, or the sort. Apply split-half:

Check 1 — midpoint, after the filter:

print("After filter:", passing)

Output: After filter: [80, 45, 90, 70, 65]. The scores 30 and 20 are gone. Both are below 40, so removing them is correct. The filter looks right. The bug is in the second half: either the doubling or the sort.

Check 2 — midpoint of remaining half, after the doubling:

print("After doubling:", adjusted)

Output: After doubling: [80, 90, 90, 70, 65]. The score 45 is below 50, so it was doubled to 90. That is why 90 appears twice: once from the doubled 45 and once from the original 90. The other scores are 50 or above and are unchanged. The doubling does exactly what the code says.

After adjusted.sort() the list becomes [65, 70, 80, 90, 90], which is what the program prints. Every stage does what its code says, so if this output is wrong, the error must be in what the programmer intended. Perhaps they wanted to keep 45 as-is and add the doubled value as a bonus, not replace it. The bug is a logic error in the specification of Step 2, not a Python error.

Split-half located the problem in Step 2 in two checks, without reading all the code in detail.

Split-half works on any linear sequence of steps

The strategy applies whenever your program processes data through a sequence of stages. Each stage either transforms the data correctly or introduces the error. Check the midpoint, eliminate half, repeat. It works equally well with print statements, the debugger, or even just reading the code carefully at the midpoint. The halving is the technique, not the tool.
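The halving can also be sketched in code for a pipeline of stages. Everything here is an illustration under stated assumptions: the stage functions and first_bad_stage are hypothetical names, the bug in double_low (a threshold of 70 instead of 50) is planted deliberately, and the expected value after each stage was worked out by hand. The search also assumes that the final output is known to be wrong and that once the data goes wrong it stays wrong at every later checkpoint.

```python
def drop_below_40(scores):
    return [s for s in scores if s >= 40]


def double_low(scores):
    # Deliberate bug for this sketch: the threshold should be 50, not 70.
    return [s * 2 if s < 70 else s for s in scores]


def sort_scores(scores):
    return sorted(scores)


def first_bad_stage(stages, data, expected_after):
    """Return (index of the first stage whose output is wrong, checks used)."""
    # Record the output after every stage.
    outputs = []
    current = data
    for stage in stages:
        current = stage(current)
        outputs.append(current)

    low, high = 0, len(stages) - 1
    checks = 0
    while low < high:
        mid = (low + high) // 2
        checks += 1
        if outputs[mid] == expected_after[mid]:
            low = mid + 1  # data still correct here: the bug is later
        else:
            high = mid     # data already wrong: the bug is here or earlier
    return low, checks


stages = [drop_below_40, double_low, sort_scores]
expected_after = [
    [80, 45, 90, 70, 65],  # after the filter
    [80, 90, 90, 70, 65],  # after the doubling
    [65, 70, 80, 90, 90],  # after the sort
]

index, checks = first_bad_stage(stages, [80, 45, 90, 30, 70, 20, 65], expected_after)
print(stages[index].__name__, "after", checks, "checks")  # double_low after 2 checks
```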

At the Deepest Level

Split-half search is a direct application of binary search — one of the most fundamental algorithms in computer science, typically introduced as a way to find a value in a sorted list. The same O(log n) efficiency that makes binary search fast for finding values makes split-half debugging fast for finding bugs: the number of checks required grows only with the logarithm of the program's length, not linearly.

Version control systems make a formal version of this available: git bisect automates binary search through a project's commit history to find the exact commit that introduced a bug. You mark a known-good commit and a known-bad commit, and git bisect checks out the midpoint commit. You test that commit and mark it with git bisect good or git bisect bad, and git halves the range. It continues until it identifies the single commit where the bug was introduced. This is split-half debugging at the level of the project's history rather than a single file.

4   Strategy: Isolate the Smallest Failing Case

Basic Idea

When a bug appears in a large, complex program, it can be hard to see what is causing it because there is too much happening at once. The isolation strategy is to strip away everything that is not necessary to reproduce the bug. What is the smallest, simplest version of the program that still shows the wrong behaviour?

Remove features. Simplify inputs. Replace real data with tiny made-up data. Comment out everything that is not directly involved in the bug. Keep doing this until you have a short program — ideally five to fifteen lines — that still produces the bug. At that scale, the cause is usually obvious.

At a Deeper Level

Here is an example of isolation. A student has a 60-line program that calculates student grades and generates a report. The report totals are wrong. Instead of debugging the entire program, they isolate the calculation:

# Full program (60 lines) - too much to reason about at once

# Isolated version (12 lines) - just the calculation
scores = [85, 90, 78]
weights = [0.3, 0.4, 0.3]

total = 0
for i in range(len(scores)):
    total = total + scores[i] * weights[i]

print("Weighted total:", total)
print("Expected:", 85*0.3 + 90*0.4 + 78*0.3)

Running this small version with known inputs makes the calculation visible in isolation. Either the output matches the expected value — in which case the bug is elsewhere in the 60-line program — or it does not, in which case the bug is in this calculation and can now be diagnosed with only 12 lines to read.

Isolation also has a secondary benefit: it forces you to think clearly about what the code is supposed to do. Writing the expected value explicitly — 85*0.3 + 90*0.4 + 78*0.3 — makes it possible to compare intended behaviour with actual behaviour directly.
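One caution when comparing the two printed values: the weights are decimal fractions, and floating-point arithmetic can differ in the last few digits. A hedged variation of the isolated test, assuming a hand-calculated expected value of 84.9, uses math.isclose from the standard library instead of comparing by eye.

```python
import math

scores = [85, 90, 78]
weights = [0.3, 0.4, 0.3]

total = 0
for score, weight in zip(scores, weights):
    total = total + score * weight

expected = 84.9  # worked out by hand: 25.5 + 36.0 + 23.4

# isclose tolerates tiny floating-point rounding differences.
print("Weighted total:", total)
print("Matches expected:", math.isclose(total, expected))
```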

A new file is useful for isolation

When isolating a bug, it sometimes helps to create a new file — call it test_calc.py or similar — rather than commenting out parts of your main file. Copy only the relevant section, add test inputs at the top, add print statements to check the output, and run it independently. You can delete the file when the bug is found. This approach keeps your main file intact while you experiment freely.

At the Deepest Level

The isolation process is the manual equivalent of what professional developers call a minimal reproducible example (often abbreviated MRE, or MCVE for Minimal, Complete, and Verifiable Example). When filing a bug report against a library or framework, developers are expected to produce the shortest possible program that demonstrates the bug, stripping out all business logic, real data, and unrelated features. This is required not just to help others understand the bug, but because the act of reducing to a minimal case often reveals the cause before anyone else even looks at it.

Stack Overflow, the most widely used programmer question-and-answer site, asks for a minimal reproducible example in debugging questions. Questions that include sixty lines of context are much less likely to get a useful answer than questions that isolate the problem to ten lines. The ability to isolate and simplify is valued as much as the ability to code.

5   Strategy: Check Your Assumptions

Basic Idea

Most bugs live in the gap between what you assume is true and what is actually true. You assume a variable contains a number — it contains a string. You assume a list has three items — it is empty. You assume a function returns a value — it returns None.

When a bug is resisting your other strategies, stop and list your assumptions explicitly. Write them down. Then check every single one against the evidence. The bug is almost always in an assumption you considered too obvious to question.

At a Deeper Level

Here is a process for checking assumptions systematically. Before looking at the code, write down your assumptions about the state of the program at the moment of the bug:

  1. What type is each variable involved? (Integer? String? List? None?)
  2. What value does each variable contain? (The exact value — not "a number around 10" but the specific number.)
  3. Has the function I am calling been tested on its own? Does it actually work?
  4. Is the condition in the if statement what I think it is? Have I evaluated it manually?
  5. Have I confirmed the data coming in — from user input, from a list, from a function call — is what I expect?

Then check each assumption using a print statement, the debugger, or a trace. Any assumption you cannot confirm from evidence is a potential location for the bug.
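The first two questions in the list can be checked with one small helper. This is a sketch: describe is a hypothetical function name, and the string "17" stands in for a value returned by input().

```python
def describe(name, value):
    # Show the type and the exact value; repr() makes quotes and spaces visible.
    return f"{name}: type={type(value).__name__}, value={value!r}"


raw_age = "17"  # stands in for input("Age: ")
print(describe("raw_age", raw_age))  # raw_age: type=str, value='17'

age = int(raw_age)
print(describe("age", age))          # age: type=int, value=17
```

Printing with repr() rather than plain print() matters here: print("17") and print(17) look identical on screen, which is exactly how a wrong assumption about a type stays hidden.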

The most dangerous assumptions are the ones that seem most obvious

Programmers routinely spend an hour debugging a program before discovering that a variable they were certain contained an integer actually contained a string — because they forgot that input() always returns strings. The more obvious an assumption seems, the less likely you are to check it, and the longer it takes to find if it is wrong. When you are stuck, systematically verify your most basic assumptions first.

At the Deepest Level

Checking assumptions is the rationale behind type annotations and static type checkers like mypy in Python. Type annotations let you declare what type a variable or function parameter is expected to be, and a type checker can then verify these declarations without running the code. Many assumption-based bugs — passing a string where an integer was expected, calling a method that does not exist on a particular type — are caught by the type checker before the program ever runs.
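A minimal sketch of what annotations look like. The function average and the variable user_text are made up for illustration, and the list[int] syntax assumes Python 3.9 or later. Python itself does not enforce the annotations when the program runs; a separate tool such as mypy reads them.

```python
def average(scores: list[int]) -> float:
    # The annotations state the assumption: a list of ints in, a float out.
    return sum(scores) / len(scores)


user_text: str = "85"  # stands in for a value returned by input()

print(average([80, 90]))  # 85.0

# A type checker is expected to flag the next line without running it,
# because a list containing a str is not a list[int]:
# average([user_text])
```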

This is a preview of a larger idea in software engineering: the earlier in the process a bug is caught, the cheaper it is to fix. Catching a type mismatch with a type checker before running is cheaper than catching it with a test case. Catching it with a test case is cheaper than catching it in production when real users are affected.

6   Strategy: Test One Thing at a Time

Basic Idea

When debugging, change one thing at a time and test after each change. This sounds obvious. It is routinely ignored under pressure.

If you change three things simultaneously and the bug disappears, you do not know which change fixed it — or whether the bug is actually gone or just hidden. If the bug is still present, you do not know which of your three changes was irrelevant, which made things worse, and which was on the right track.

One change. Run. Observe. One change. Run. Observe.

At a Deeper Level

This principle extends to testing functions: before combining multiple functions into a larger program, verify that each function works correctly on its own with simple, known inputs. If you cannot test a function independently, write a short test at the bottom of your file:

def celsius_to_fahrenheit(c):
    return (c * 9/5) + 32

# Quick test — remove before submission
print(celsius_to_fahrenheit(0))    # Expected: 32.0
print(celsius_to_fahrenheit(100))  # Expected: 212.0
print(celsius_to_fahrenheit(-40))  # Expected: -40.0

If each function works in isolation, and the program still produces wrong output when they are combined, the bug is in how the functions interact — how values are passed between them, or how their results are combined. This is a much smaller space to search than the entire program.

Testing edge cases is particularly important. An edge case is an input at the boundary of what the function is designed to handle: zero, an empty list, the minimum or maximum valid value, a string with spaces, a negative number. Many bugs only appear at the edges, and many programmers only test with comfortable middle-of-the-range values.
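As a sketch of edge-case testing, here is an averaging function checked with one obvious input and several boundary inputs. Returning None for an empty list is an assumption made for this example; a real program might choose a different behaviour.

```python
def average(values):
    # Assumption for this sketch: an empty list has no average, so return None.
    if len(values) == 0:
        return None
    return sum(values) / len(values)


print(average([80, 90, 70]))  # Obvious test. Expected: 80.0
print(average([]))            # Empty list. Expected: None
print(average([50]))          # One-item list. Expected: 50.0
print(average([70, 70, 70]))  # All the same value. Expected: 70.0
print(average([0, 50, 100]))  # List with a zero. Expected: 50.0
```

Without the length check, the empty list would raise ZeroDivisionError, a bug that the obvious test alone would never reveal.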

| Function type | Obvious test | Edge cases to test |
| --- | --- | --- |
| Calculates an average | [80, 90, 70] | Empty list; one-item list; all the same value; list with a zero |
| Finds the largest value | [3, 9, 2, 7] | One-item list; all items equal; negative numbers; largest at index 0; largest at last index |
| Counts words in a string | "hello world" | Empty string; one word; multiple spaces between words; leading or trailing spaces |
| Converts user input to integer | "42" | Zero; negative number as string; decimal as string; non-numeric string; empty string |

At the Deepest Level

The practice of testing functions in isolation is the foundation of unit testing — a formal approach to software quality in which every function (unit) has a set of automated tests that verify its behaviour. Python's built-in unittest module and the popular third-party library pytest provide frameworks for writing and running unit tests systematically. The quick manual tests shown above are an informal version of the same idea.
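For comparison, here are the same three temperature checks written with the unittest module. This is one possible layout, not the only one; the class and method names are made up for illustration. The tests are loaded and run explicitly here so the example works when pasted into any file; many projects instead call unittest.main() or use the python -m unittest command.

```python
import unittest


def celsius_to_fahrenheit(c):
    return (c * 9/5) + 32


class TestCelsiusToFahrenheit(unittest.TestCase):
    def test_freezing_point(self):
        self.assertEqual(celsius_to_fahrenheit(0), 32.0)

    def test_boiling_point(self):
        self.assertEqual(celsius_to_fahrenheit(100), 212.0)

    def test_scales_cross_at_minus_forty(self):
        self.assertEqual(celsius_to_fahrenheit(-40), -40.0)


# Load and run the three tests, then report whether all of them passed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCelsiusToFahrenheit)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("All tests passed:", result.wasSuccessful())
```

Unlike the print-based checks, these tests do not need to be removed before submission and can be re-run after every change.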

In Test-Driven Development (TDD), tests are written before the function — you define what correct behaviour looks like first, then write the function to pass the tests. This is not a requirement in Grade 9, but the habit of thinking about test cases before or alongside writing code produces better functions with fewer edge-case bugs.

7   Strategy: Rubber Duck Debugging

Basic Idea

Rubber duck debugging is the practice of explaining your code out loud, line by line, to an imaginary listener — traditionally, a rubber duck on your desk. The duck does not respond. It does not matter. The act of explaining forces you to make your assumptions explicit and often reveals the bug before you finish.

The technique works because explaining something requires you to slow down and say out loud what you believe each line does. When you say something out loud that is wrong, you often hear it — in a way you would not if you were only reading silently.

Your listener does not need to be a rubber duck. A classmate, a piece of paper, or an AI assistant in Mode 2 all work. What matters is that you explain — line by line, out loud, in plain language — what you think the code does.

At a Deeper Level

The rubber duck technique is most effective when you follow a structure:

  1. Describe the intended behaviour. "This function is supposed to take a list of numbers and return the average."
  2. Describe the actual behaviour. "Instead, it is returning zero every time."
  3. Walk through the code line by line. "Line 1 initialises total to zero. Line 2 starts a loop over the list. Line 3 adds each item to total..." At the point where you say something and realise it is not what you intended, stop. That is the bug.
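The walkthrough above could apply to a function like the one sketched below. The code is hypothetical, written to match the example sentences: the comments record what the programmer says out loud, and the third comment is the moment the explanation stops matching the code.

```python
def average_buggy(numbers):
    total = 0                    # "Line 1 initialises total to zero."
    for n in numbers:            # "Line 2 starts a loop over the list."
        total + n                # "Line 3 adds each item to total..." but the sum is never stored
    return total / len(numbers)  # so this always divides zero by the length


def average_fixed(numbers):
    total = 0
    for n in numbers:
        total = total + n        # the sum is now stored back in total
    return total / len(numbers)


print(average_buggy([80, 90, 70]))  # 0.0
print(average_fixed([80, 90, 70]))  # 80.0
```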

In a course context, this is exactly the skill tested in oral desk checks and project defenses. The ability to explain your code line by line — and notice when your explanation does not match the code — is the same skill as rubber duck debugging. Practicing it while debugging trains the same muscle that oral explanations require.

Explaining to an AI assistant counts — with conditions

In Mode 2, you can use an AI assistant as your rubber duck. Paste the relevant section of code and explain what you think it does and what is going wrong. The AI may identify the bug. Before changing any code, explain in your own words what the AI found and why. If you cannot explain it, you do not understand the fix. Copying a fix you do not understand is not debugging — it is hoping the problem goes away.

At the Deepest Level

The rubber duck technique is effective for a cognitive reason: reading and explaining are processed differently by the brain. When you read silently, your brain applies pattern-matching and often fills in what you expect to see rather than what is actually there. This is why you can read a piece of code ten times without seeing an obvious error. When you explain out loud, you are forced to generate language from what is actually on the page — a different cognitive process that is less susceptible to the same blind spots.

This is also why code review — having another person read your code — is so effective in professional software development. The reviewer has no prior expectation of what the code does, so they read what is actually there. Fresh eyes catch bugs that the author has become blind to. When a classmate is not available, the rubber duck is the next best option — it forces you to adopt a more external perspective on your own code.

8   When You Are Completely Stuck

Basic Idea

Sometimes you have tried everything and the bug is still there. Before asking for help, run through this checklist. Most bugs that survive more than fifteen minutes of debugging are hiding behind one of these:

  • You are looking at the wrong file. VS Code has the file open, but you are running a different version. Check that the file you are editing is the one Python is running.
  • You made a change but did not save. VS Code shows unsaved changes with a dot on the tab. The debugger runs the saved version on disk. Save before running.
  • The variable name has a typo. Score and score are two different variables in Python. Case matters everywhere.
  • You are testing with the wrong input. The fix works for the input you tested but not the input the program originally broke on.
  • The bug is not where you think it is. You have been looking at the same section for twenty minutes. Apply split-half from scratch, as if you had never looked at the code before.
  • There are two bugs. You fixed one, but the program is still wrong because of a second, separate problem. Start the three-step process again from the beginning.
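The variable-name item on the checklist is easy to demonstrate. In this small sketch the variable names are made up, and the capital letter on the second line is the deliberate mistake.

```python
score = 10
Score = score + 5  # capital S: this creates a second variable instead of updating score

print(score)  # 10, not 15
print(Score)  # 15
```

Python raises no error here because assigning to a new name is always allowed, so this kind of mistake shows up only as a wrong value later in the program.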

At a Deeper Level

If the checklist does not help, step away. This is not avoidance — it is a deliberate technique. Extended focus on a problem creates tunnel vision: you start to see what you expect rather than what is there, and your mental model becomes increasingly resistant to revision. A five-minute break, followed by reading the code from the top as if for the first time, frequently surfaces bugs that twenty minutes of intense focus missed.

When asking a teacher or classmate for help, do not say "my code is broken." Explain: the input that causes the bug, the expected output, the actual output, the section of code you believe is responsible, what you have already tried, and what you currently think the cause might be. This preparation forces you to organise your thinking — and often surfaces the answer before you finish explaining.

Preparing to ask for help often solves the problem

Experienced programmers know this well: the act of writing up a clear question — the precise inputs, the actual and expected outputs, the code you have tried — frequently reveals the bug before anyone else sees the question. This is sometimes called the help-desk effect. The discipline required to write a good question is the same discipline required to find the bug. If you are stuck, try writing out the question in full even if you do not intend to send it.

At the Deepest Level

The psychological phenomenon behind tunnel vision in debugging is called confirmation bias — the tendency to search for and interpret evidence in ways that confirm your existing hypothesis. Once you have decided the bug is in a particular function, you read that function's code and see what you expect to see. Evidence that contradicts the hypothesis is subconsciously discounted.

The antidote is to periodically challenge your hypothesis explicitly. Ask: what evidence would I expect to see if my hypothesis is wrong? Go look for that evidence. If you find it, your hypothesis is wrong and you need a new one. Professional debuggers who are exceptionally good at their job tend to be unusually willing to abandon a hypothesis that is not supported by evidence, even when they have invested significant time in it. The time invested in a wrong hypothesis is a sunk cost — pursuing it further does not recover that time, it wastes more.

9   Putting It Together — A Debugging Session

Basic Idea

Here is how a systematic debugging session looks from start to finish. The program below is supposed to take a list of student names and scores, filter out anyone below 60, and print the remaining names in alphabetical order. It contains a bug, but this particular data does not reveal it.

students = [
    ("Alice", 85),
    ("Bob", 45),
    ("Charlie", 72),
    ("Diana", 58),
    ("Eve", 91)
]

passing = []
for name, score in students:
    if score > 60:
        passing.append(name)

passing.sort()
print(passing)

Expected output: ['Alice', 'Charlie', 'Eve']. Actual output: ['Alice', 'Charlie', 'Eve']. For this data the two match: Diana has a score of 58 and Bob has 45, both below 60 and both correctly excluded. So what is wrong?

The bug: the condition should be score >= 60, not score > 60. A score of exactly 60 would be excluded by the current condition even though 60 is a passing score. This is a logic error — the program does not crash and produces plausible-looking output. The only way to find it is to test with a score of exactly 60.

At a Deeper Level

Here is the same session written out as a process:

Step 1 — Reproduce reliably. The output looks correct for the given input. Add a student with a score of exactly 60 to confirm the bug: ("Frank", 60). Expected: Frank appears in the output. Actual: Frank does not appear. Bug confirmed and reproducible.

Step 2 — Locate the cause. The program is short enough that split-half and isolation point immediately to the filter loop. Check the assumption: does the condition score > 60 include 60? Evaluate manually: 60 > 60 is False. So 60 is excluded. The cause is confirmed: the condition uses strict greater-than when it should use greater-than-or-equal.

Step 3 — Fix and verify. Change score > 60 to score >= 60. Run with Frank: he now appears in the output. Run with the original data: output unchanged (no original student had exactly 60). Run with Bob (45): still excluded. Run with a score of 59: excluded. Run with a score of 61: included. The fix is correct and has not broken anything.
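The verification in Step 3 can be written down as a short script. This is a sketch: wrapping the filter in a function named passing_names, and the extra students Grace and Heidi used for the 59 and 61 checks, are additions made here for illustration.

```python
def passing_names(students, pass_mark=60):
    passing = []
    for name, score in students:
        if score >= pass_mark:  # fixed: was score > 60
            passing.append(name)
    passing.sort()
    return passing


original = [("Alice", 85), ("Bob", 45), ("Charlie", 72), ("Diana", 58), ("Eve", 91)]

# The test case that reproduced the bug: a score of exactly 60.
print(passing_names(original + [("Frank", 60)]))  # ['Alice', 'Charlie', 'Eve', 'Frank']

# Regression check: the original data gives the same output as before.
print(passing_names(original))                    # ['Alice', 'Charlie', 'Eve']

# Boundary values on either side of the pass mark.
print(passing_names([("Grace", 59)]))             # []
print(passing_names([("Heidi", 61)]))             # ['Heidi']
```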

The whole session took three steps and required checking one assumption. Without the test case of exactly 60, the bug would never have surfaced with the original data — which is precisely why testing edge cases and boundary values matters.

Check Your Understanding
  1. Explain the difference between a symptom and a cause. Why does fixing the symptom without finding the cause create problems?
  2. Describe the three-step debugging process. For each step, explain what goes wrong when it is skipped.
  3. A program processes data through five stages. The final output is wrong. Describe how you would use split-half search to find which stage is responsible. How many checks would you need in the worst case?
  4. A student has an 80-line program with a bug. They cannot work out where it is. Describe the isolation strategy and explain how they would apply it here.
  5. List five assumptions a programmer might make about a variable that could be wrong. For each one, describe how you would check it.
  6. A student changes four things in their code simultaneously and the bug disappears. Why is this a problem, even though the program now works?
  7. Explain how rubber duck debugging works and why it is effective. Could you use it on a program you did not write? How?
  8. Look at this program. Without running it, identify the assumption a programmer would most likely make that is actually wrong, and explain what test case would expose the bug.

    def divide_total(values):
        total = 0
        for v in values:
            total = total + v
        return total / len(values)
    
    print(divide_total([10, 20, 30]))
  9. You have been staring at the same bug for twenty minutes and cannot find it. Walk through the "completely stuck" checklist from Section 8. For each item on the list, describe what you would actually do to check it.