Classification Used in This Book
In the existing writing on programming, many articles attempt to classify bugs into types. In fact, there have been roughly as many classifications as articles. One notable feature of the bug classification literature is that everyone feels the need to devise his or her own system.
The categorization in this book is based on one devised by Donald Knuth, who is the author of the typesetting package known as TeX (although as befits the field, I'll make some changes to his categories).
Knuth is an expert programmer who kept a detailed log of all the code changes he made during the development of TeX-both bugs and enhancements. He later wrote a paper called "The Errors of TeX" in which he grouped the changes into 15 categories, 9 for bugs and 6 for enhancements, each assigned a letter of the alphabet for reference. We'll ignore the enhancement categories and focus on the 9 bug categories, which are as follows:
Algorithmic Anomalies.
The code correctly follows the intent of the programmer, but the intent was wrong.
Blunders.
As Knuth put it, "thinking the right thing, but writing it wrong." The algorithm was correct, but the code did not implement it correctly.
Data Disasters.
Data was incorrectly modified in some way such that the result did not reflect the programmer's intent.
Forgetfulness.
A simple error of omission; leaving out some code so that the program did not do all that it was supposed to do.
Language Lossage.
Errors related to misunderstanding or not considering the specific features of the syntax, such as the precedence of operators.
Mismatches.
Calling a subroutine with incorrect parameters in a way that the compiler won't report an error.
Robustness.
Crashing on bad input data, reporting uninformative error messages, and the like.
Surprises.
Unforeseen interaction between different sections of the program.
Typographic Trivia.
Simple errors when typing in the program.
This book's classification merges some of these categories and ignores some others. The categories used are as follows:
A-Algorithm.
This category combines Knuth's Algorithmic Anomalies and Surprises. The distinction between the two is somewhat difficult anyway (this was pointed out by Marc Eisenstadt in his paper, "Tales of Debugging from the Front Lines"), and seems to be primarily related to the distance in the code between the error and the fault, or the fault and the failure. All bugs-except perhaps malicious ones introduced on purpose-are "surprises" to the code's author. So I will merge these two categories.
D-Data.
This category is Knuth's Data Disasters. Although you could argue that data winds up incorrect because of an A-Algorithm bug, the D-Data category refers more specifically to cases where the code reads or writes incorrect data, or accesses the wrong storage location.
F-Forgotten.
This category is Knuth's Forgetfulness, but generalized to include all control flow errors that occur because the statements in the program are not executed in the order that the programmer intended-the most common reason being that the statement is not present in the program, as Knuth defined it, but also including statements in the wrong location.
B-Blunder.
This category includes Knuth's Blunders and Typographic Trivia. The latter category exists because Knuth wrote his code on paper and then typed it, so he made occasional transcription errors in the process. Because many programmers type code directly into the computer, they would never encounter that problem. (Marc Eisenstadt also pointed out the difficulty of distinguishing between those two categories.) Some A-Algorithm bugs can manifest themselves as small mistakes in the code, but in most cases, it can be ascertained if the programmer intended to type it that way or not.
The book doesn't use Knuth's three other categories: Mismatches, Robustness, and Language Lossage.
In the real world, mismatches do occur, but usually only in large programs, especially when calling functions written by someone else. To the extent that we see these in the small programs in this book, they will go under B-Blunder. TeX is written specifically to process input and produce output, as a compiler would, so handling incorrect input and producing good error messages for the user is important. I assume this is why Knuth gave Robustness its own category. In this book they would be put under A-Algorithm. There is another kind of robustness, writing functions so they can handle incorrect parameters passed to them; these won't get their own category, either. Finally, because this book uses several languages for its examples, and I did not want to require in-depth knowledge of them, there are no bugs that would fall under Language Lossage, although as with Mismatches, in the real world, these can be common. (Knuth includes floating-point rounding errors under Language Lossage, although he says it was a "close call" over Algorithmic Anomalies. I would classify such errors under D-Data.)
I will not make any attempt to order the different types by frequency or difficulty. Any of them can produce bugs that are easy or difficult to find, depending on the way they manifest themselves, the techniques used to find them, and pure luck. The only thing that can be said is that the ease of fixing bugs after they are found can vary with the type: A B-Blunder can usually be patched up with a minor change, whereas the solution to an A-Algorithm might involve fundamental changes to the entire program.
The following sections break each of the four main categories into subcategories with discussion and examples. The subcategories are identified with the notation C.subcategory, where C is the initial of one of the main categories (A, D, F, or B) and subcategory is a descriptive name.
|