Wednesday, July 8, 2009

Why did it do THAT?

In keeping with this blog's theme of decomposition (insert Beethoven joke here), bug hunting can be divided into two parts: identifying the bug and determining its cause. Repair takes us back to the "How do you do this?" question.

Here I will focus on tracking down the cause. Identification is more about testing than code writing.

Finding hard bugs is a subspecies of the scientific method. Create a theory of why the bug occurs and prove it wrong. Repeat. Eventually you find a correct theory.

Being able to easily construct mental models of what is going on is critical to this process. These models come in two flavours, a model of how things are supposed to work and models that explain a given (erroneous) behaviour.

A good model of how things were designed will lead you to the software component that is generating the error. At that point an inspection of the local code and its inputs can start you on the trail of the bug. Breakpoints, tracing, and debugging output come into play here.
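
As a rough sketch of what tracing and debugging output can look like, here is a small Python example. The three steps (parse, scale, total) and the planted bug are invented purely for illustration and are not taken from any real system. Each step is wrapped so that its input and output are logged, which makes it easy to see which component first produces a wrong value.

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("trace")

def traced(step):
    """Wrap a step so its input and output are written to the debug log."""
    @functools.wraps(step)
    def wrapper(value):
        log.debug("%s: in=%r", step.__name__, value)
        result = step(value)
        log.debug("%s: out=%r", step.__name__, result)
        return result
    return wrapper

# Hypothetical three-step pipeline, invented for this example.
@traced
def parse(text):
    return [int(token) for token in text.split(",")]

@traced
def scale(values):
    # Planted bug: integer division silently drops the fractional part.
    return [v // 2 for v in values]

@traced
def total(values):
    return sum(values)

def run(text):
    return total(scale(parse(text)))

if __name__ == "__main__":
    run("1,2,3")
```

If the intent was to halve each value, the logged output of scale is where the numbers first stop matching the model of how things are supposed to work, so that is the component to inspect (or to set a breakpoint in).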

Lacking such a model (the joys of legacy code), you need to start constructing possible models of what could be going on. At this point the inventiveness referred to in the design post becomes very useful.

1 comment:

  1. Hi Jim,
    You don't seem to have a lot of comments, so I thought I'd bug you (well, that and you're wrong :-). Anyway, you missed an important step from both the scientific method and from bug hunting. First gather evidence. Understand what is happening. When you have evidence, then you can formulate a theory, which you can test with experiments (which may themselves generate more evidence, allowing another cycle to commence).

    Bug hunting is the same. Don't try to fix a problem until you know what is actually happening. Follow the data through the system and measure its value before and after each transformative step. See where the control flows, either by printing debug statements or by tracking with a debugger. Capture the state of the whole system when the bug occurs (heck, even finding out exactly when/where a bug occurs can be a major challenge and a vital piece of information).

    It is amazing how often a complete understanding of *what* is happening makes finding a solution a trivial (and, above all, safe and unintended-side-effect free) activity.

    The alternative seems to be a guess-try-guess-again approach that plasters the code with random patches until the observed problem goes away. It rarely, if ever, actually fixes the root problem, but rather hides it until it emerges as a new, usually harder to find, and almost always nastier problem.

    You must have seen developers like this at DJ (I know they had some there - I saw them with my own two eyes).

    Above all, good developers fix the root problem, not the symptoms.
