Preface

If you're a programmer, you've probably spent a fair bit of time looking through source code trying to find a bug. You know the routine: You've narrowed the bug down to a small code section, you know it's lurking there somewhere, but you can't see where it is. You are confident that, eventually, you will have the satisfaction of finding and fixing the bug, but for the moment all you feel is frustration.

So why, you might ask, would you voluntarily read a book that requires you to do exactly that . . . repeatedly?

The answer is that finding bugs by looking through source code is really, in the end, the only way to fix bugs. You can run your tests, gather your data, wade through a debugger session, print all the verbose text you want, but it eventually comes down to seeing where the code has to be changed. Sometimes, the bug can be glaringly obvious, but oftentimes, it is not. Given the theory that practicing something makes you better at it, it seems logical to practice a skill that makes finding bugs easier, faster, and less frustrating.

Furthermore, the more time you spend reading source code looking for bugs, the better you become at catching them when you first review your code, while fixing them is still cheap. The alternative is finding them later, when the program has already been through a series of tests that must be re-run if any code changes, or later still, when the software has shipped and the bug is found by a disgruntled end user who wants a fix right now.

This book lists the source code to 50 programs. Each program has exactly one bug in it (unless I missed one).

The 50 programs consist of five chapters of ten programs each, with each chapter's programs written in one of five different languages. Don't be concerned if you are unfamiliar with some of the languages; each chapter includes a description of the relevant syntactic features of its language. The goal of these descriptions is not to present a complete tutorial, but to provide enough information to allow you to extract the logic from the code, and from there, find the flaw in the logic. If you're a programmer familiar with any language, you will be able to follow along. The specific language really doesn't matter here; the required skills are relevant to all programming languages.

For each program, I explain what it is trying to do and point out any unusual features of the language, after which comes the source code. Ideally, you will be able to find the bug by looking at the source. If you have trouble, I offer suggestions on how to approach analyzing the program, followed by hints on specific inputs to use when walking through the code. Finally, I give an explanation of the bug, and discuss how it would manifest itself (something I encourage you to come up with on your own, because it improves your understanding of the code).

The kinds of bugs vary: improperly calculated arithmetic expressions, bad algorithms, incorrect assignments, returning the wrong variable, and so on. There are no subtle tricks that are apparent only to those who are experts in a language. All the code goes through a compiler or interpreter without errors.

The inspiration for this book came from my years working as a Microsoft programmer. One of the duties programmers had was interviewing candidates for programming jobs. During those interviews, employees almost always asked the candidates to write some code on the office whiteboard. The problems were not especially complicated, just simple algorithms such as sorts, linked list operations, and so on: the kind you could write, debug, and discuss in half an hour.

The code could be written in any language that the candidate felt comfortable with (as long as he or she could explain it to the interviewer). The goal was not to see if the candidate knew the precise syntax of a language, but to see if he or she could come up with something that was logically correct, and then offer a reasonable proof of it.

These coding questions were designed as a challenge for the candidates, but they also wound up being a challenge for the interviewer. Evaluating a candidate meant evaluating the code, which meant quickly understanding and analyzing whether the logic was correct so you could discuss what the candidate had written, ask about optimizations, and project an air of benevolent omnipotence. Because candidates often came up with somewhat "unique" logic, you had to do a quick job of emulating a computer and "executing" their code to see if it worked. You weren't interested in what the candidate thought the code was going to do, or what he or she was busy telling you it was going to do, or what the code looked like it was going to do (and I never saw a single candidate include comments in the whiteboard source code). You cared about what it actually did.

Emulating the computer and seeing past the surface of the code to its internal logic can be tricky. Just because someone states, "This code sorts an array," does not mean it necessarily sorts an array. Just because a variable is named distance_from_center does not mean it necessarily has the properly calculated distance from the center. Just because a for loop appears in the code does not mean that it actually loops the correct number of times.
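As a small illustration of that last point (written for this preface, not taken from the book's 50 programs), consider a pair of C functions with hypothetical names. The buggy one has a name and a loop that both suggest it adds up every element of an array. Read quickly, it looks correct; executed the way a computer executes it, it skips the last element.

```c
/* Illustration only (hypothetical names, not one of the book's programs):
   the name and the for loop both suggest "sum every element", but the
   loop condition stops one element early, so values[n - 1] is never added. */
int sum_array_buggy(const int values[], int n)
{
    int total = 0;
    for (int i = 0; i < n - 1; i++)   /* visits indices 0 .. n-2 only */
        total += values[i];
    return total;
}

/* Corrected version: the condition i < n visits every index 0 .. n-1. */
int sum_array_fixed(const int values[], int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += values[i];
    return total;
}
```

Tracing the buggy version by hand with the array {1, 2, 3, 4} and n = 4, i takes only the values 0, 1, and 2, so it returns 6 instead of 10. With n = 1 the loop body never runs at all, and it returns 0.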

In fact, knowing what a program is supposed to do can blind you to what the code actually does. It's hard to focus on every line of code, every assignment, every loop, every comparison, and really think about what the code actually does. Yet you have to be able to do this because that's what the computer does.

Beyond helping you debug your own programs, this practice can also help you review other people's code. Code reviews are increasingly part of a programmer's job description, and not just informal ones covering formatting and variable-naming conventions. Code reviewers are now asked to vouch for code quality, almost to the same extent as the code's original author.

Reviewing code that someone else has written (or code that you wrote long enough ago to forget the details) is an acquired skill. It has been compared to proofreading, but there is a key difference. The goal of writing is to pass information to someone who does not have it. Problems with writing, in general, often involve an imperfect simulation of the intended audience: Because the author knows the material so well, it is difficult to imagine how the writing comes across to someone lacking that knowledge. Thus, putting your writing away for a couple of weeks and then proofreading it (or reviewing someone else's writing) makes you more like the intended audience, so you can judge how readers will react better than you could immediately after writing it.

With code, your "audience" is an infallible computer that interprets the code exactly as it is written, and in doing so, unfailingly extracts the logic contained in the code. For a person to do the same requires some careful study of the code. If code is unfamiliar, you probably don't understand the details, and thus are less like your intended audience. This is why code reviews are so difficult: it is hard for people to simulate the dispassionate, perfect way in which computers execute software, and easy for them to unintentionally skip mistakes.

Back when I was a candidate for a programming job at Microsoft myself, I got into an argument with one of the employees who interviewed me. He asked me a typical question: Write a program to recursively reverse a sentence. I produced some code and declared it correct. He disputed my assertion and pointed out what he claimed was a bug. I responded by showing how it would work successfully on some particular sample input. He continued to insist that there was a bug. Eventually, we decided to type the program into a computer to see who was right. Unfortunately, we couldn't get it to compile for some reason, so we wound up debating the issue with only the source code as evidence, each of us simulating the computer in our minds. In the end, I convinced him I was right (I think). Well, I did get hired.

In this book, I present to you 50 programs, each of the type that was asked in Microsoft interviews (including recursive sentence reversal), although some of them are slightly longer than what would fit on a whiteboard. In the tradition of the rule that interview code could be written in any language, each chapter's 10 programs are written in a different language:

C. A general-purpose language that, for years, was the language of choice for complicated systems and application development, and the language in which I wrote almost all my code for Microsoft. C was originally designed by Dennis Ritchie at Bell Laboratories.
Python. An object-oriented scripting language. It's powerful, but also useful for quickly writing small pieces of code. Python was developed by Guido van Rossum.
Java. An object-oriented programming language designed to allow programs to be downloaded from a network and executed on any platform. Java was invented by a team at Sun Microsystems.
Perl. A scripting language that's especially optimized for processing text, and often used to write Common Gateway Interface (CGI) programs to run on web servers. Perl is the brainchild of Larry Wall.
x86 Assembly Language. The native language of the x86 family of microprocessors. It's difficult to understand and rarely written by hand nowadays, but programmers analyzing code in a debugger often need to read and understand it. Intel Corporation designed this language.

If you know one of these languages well, you might be tempted to start with that chapter. This is fine, but I encourage you to also try unfamiliar languages. As previously mentioned, the summary of the language at the beginning of each chapter is enough to get you going.

The bugs in each program, I should mention, were not found "in the wild" (in code that someone else wrote). The programs were written by me. A few programs are written in a non-intuitive way (non-intuitive to some people, anyway) to showcase a feature of a particular computer language or allow a certain type of bug to be hidden. In many cases, the bugs were artificially injected; for the rest, I simply left in one of the bugs that I found when debugging the code. I usually had plenty from which to choose.

Before you get to the bugs, Chapter 2 gives you some tips on how to walk through code. If you are confident of your skills, you can skip this chapter.

In each chapter, the programs are arranged in roughly increasing order of difficulty (emphasis on "roughly" because different bugs baffle different people). The programs are mostly unrelated; you can tackle them at your leisure in any order. In a few places, programs build on previous ones to solve a larger problem.

The bugs are classified according to a scheme that is outlined briefly in Chapter 1 and explained in full in Appendix A. Appendix B, "Index of Bugs by Type," lists the bugs by classification, which is useful if you want to focus on a particular type of bug.

What is the goal of this book? First and foremost, it's a chance to improve your code reviewing and debugging skills. It's also a way to challenge yourself to solve the logic puzzle that each program represents, both in figuring out how it works and finding the bug. You might be able to gain some understanding of a programming language with which you are unfamiliar. If you're curious, it offers a glimpse into what a programming job interview at Microsoft is like. And if you want to use the programs (after you fix the bug) for your own purposes, feel free to do so.

Please check the web site, www.findthebug.com, for updates or, if necessary, corrections to the programs. Have at them, and good luck.

Adam Barr
June 2004

Table of Contents