Split the Code into Sections with Goals
The first step to understanding the code is to split it into sections and identify the goals of each section.
A section is a snippet of code that accomplishes a specific task. No specific number of lines constitutes a section; it depends entirely on the code. A section can be one statement or function call, or it can be a loop with 30 lines of code in it. We can loosely define a section to be any sequence of program statements that accomplishes enough that you should take the time to define goals for it.
The "goal" of a section of code is the set of changes that the code is intended to make to the data structures used by the program. If a section is an entire function, the name of the function usually gives a general indication of what the section is trying to accomplish, but not in enough detail to help debug it. It's more of a starting point to help you think about the goals of the entire function.
Identify the Sections in the Code
If you are familiar with the code that you are looking at, it might be easy for you to divide the code into sections because you know which parts of the code correspond to different parts of the algorithm it's implementing. If you aren't familiar with the code, either because someone else wrote it or because you wrote it so long ago that you forgot what you were thinking at the time, you need to spend some time thinking about how to split up the code.
The most basic step is to locate the main part of the algorithm. Most functions begin with introductory code to handle special cases, deal with errors, and so on, and end with code that cleans up and possibly returns values to the calling function. In between these is the code that implements the main algorithm.
The main algorithm is the part that you would talk about if you were telling someone what the code did. You might say, "The function looks up a key in a dictionary," without mentioning that it first checks whether the dictionary is valid, and later frees a temporary buffer that it allocated.
Of course, the introductory and cleanup code can still harbor bugs and need to be checked as carefully as any other piece of code. However, it is true that the introductory and cleanup code usually execute on any input, so they're tested all the time. Tricky input-specific bugs might hide in the main algorithm; this is the part that actually corresponds to the mathematical algorithm that the code implements.
Therefore, it is useful to note where the introductory code ends and where the cleanup code begins. Mark the area between those as the location of the main algorithm. Consider the following code:
    int find_largest_hash(String[] s) {
        if (s.length == 0) {
            throw new InvalidParameterException();
        }
        HashCalculator hb = new HashCalculator();
        int largesthash = hb.hash(s[0]);
        int newhash;
        for (int j = 1; j < s.length; j++) {
            newhash = hb.hash(s[j]);
            if (newhash > largesthash) {
                largesthash = newhash;
            }
        }
        hb.flush();
        return largesthash;
    }
In this example, the check for s.length == 0 and the three lines that follow it, which define hb, largesthash, and newhash, make up the introductory code. The call to hb.flush() and the return statement are the cleanup code. The for loop in between is the main algorithm.
This example also shows that you don't necessarily need to know everything about the code to figure out where the sections are. Although no information was provided for the HashCalculator class, it is still readily apparent where it is initialized, used in the main algorithm, and cleaned up.
If the main algorithm consists of more than just a few lines of code, it needs to be split into smaller sections. Again, consider how you would describe the algorithm to someone else: Each part of that description is probably one section. If you would describe an algorithm as "first read in the data, then organize it by key, then output it," you would try to separate the code into those three sections.
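As a rough illustration of that three-part description, the following sketch marks each section and its goal with a comment. The class name KeyReport, the method report, and the "key=value" input format are all hypothetical and chosen only for this example; the code also assumes every input line contains an "=" character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example: names and input format are assumptions for illustration.
class KeyReport {
    static String report(List<String> lines) {
        // Section 1: read in the data.
        // Goal: pairs holds one {key, value} entry per input line, in input order.
        // Assumes each line has the form "key=value".
        List<String[]> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(line.split("=", 2));
        }

        // Section 2: organize the data by key.
        // Goal: byKey maps each key to its value (the last value wins for a
        // repeated key) and iterates in ascending key order.
        Map<String, String> byKey = new TreeMap<>();
        for (String[] pair : pairs) {
            byKey.put(pair[0], pair[1]);
        }

        // Section 3: output the data.
        // Goal: out holds one "key: value" line per key, in key order.
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : byKey.entrySet()) {
            out.append(entry.getKey()).append(": ")
               .append(entry.getValue()).append('\n');
        }
        return out.toString();
    }
}
```

Each comment states what should be true when its section finishes, which is the kind of goal discussed next.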
Identify Goals for Each Section
After you split the code into sections, identify the goals of each section. At the end of the section, what variables should be modified and how? What invariant conditions should be true? How should the data structures be set up?
When you have finished mentally dividing the code into sections with goals, check that each goal is well contained. Code that starts working on the next goal before it logically finishes the previous one can be prone to bugs. Some languages allow assert statements, which are logical expressions (usually only tested in debug versions of the code) that cause the program to halt if they are false. The gaps between sections are often a good place to put assert statements that verify that the goal of a section was properly achieved, as shown in the following code:
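    public class MyArray {
        private int[] data;    // the values held by this array

        public MyArray(int[] data) {
            this.data = data;
        }

        // Stand-in for the real sort routine, which the original listing omits.
        public void sort() {
            java.util.Arrays.sort(data);
        }

        // Returns true if data is in ascending order.
        public boolean isSorted() {
            for (int j = 0; j < data.length - 1; j++) {
                if (data[j] > data[j + 1]) {
                    return false;
                }
            }
            return true;
        }
    }

    MyArray ma = new MyArray(new int[] {3, 1, 2});
    // Now sort the array
    ma.sort();
    // Goal of the previous section: the array is in ascending order
    assert ma.isSorted();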
If a section of code is a loop, you need to determine the overall goal of the loop. However, you should also try to determine the goal of the loop after one iteration. For example, for a loop that sorts an array, the goal after the first iteration of the loop might be "the first element in the array holds the smallest value."
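To make the per-iteration goal concrete, here is a minimal sketch that uses selection sort, chosen only because its goal after each pass is easy to state. The class and helper names are hypothetical. Note that Java assert statements are checked only when the JVM is started with assertions enabled (the -ea option).

```java
// Hypothetical example: selection sort with the per-iteration goal asserted.
class SelectionSortGoals {
    // Overall goal of the loop: data is in ascending order.
    static void sort(int[] data) {
        for (int i = 0; i < data.length - 1; i++) {
            // Find the index of the smallest value in data[i..end].
            int min = i;
            for (int j = i + 1; j < data.length; j++) {
                if (data[j] < data[min]) {
                    min = j;
                }
            }
            // Swap that value into position i.
            int tmp = data[i];
            data[i] = data[min];
            data[min] = tmp;

            // Goal after this iteration: data[i] holds the smallest value
            // of data[i..end]. For i == 0, the first element holds the
            // smallest value in the whole array.
            assert isSmallestFrom(data, i);
        }
    }

    // Returns true if data[start] is no larger than any later element.
    static boolean isSmallestFrom(int[] data, int start) {
        for (int k = start + 1; k < data.length; k++) {
            if (data[k] < data[start]) {
                return false;
            }
        }
        return true;
    }
}
```

If the assertion ever fails, the bug is in the body of that one iteration, which narrows the search considerably.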
For if statements, try to state the goal of the if condition itself, as in "The if() block will execute if the user has not been validated yet."
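A small sketch of that idea follows. The LoginSession class and its fields are hypothetical and exist only to show one way of writing down the goal of the condition and the goal of the section around it.

```java
// Hypothetical example: stating the goal of an if condition.
class LoginSession {
    private boolean validated = false;
    private int validationCount = 0;   // how many times validation actually ran

    void ensureValidated() {
        // Goal of the condition: the block executes only if the user
        // has not been validated yet.
        if (!validated) {
            validationCount++;
            validated = true;
        }
        // Goal of the section: the user is validated no matter which
        // path was taken through the if statement.
        assert validated;
    }

    boolean isValidated() {
        return validated;
    }

    int getValidationCount() {
        return validationCount;
    }
}
```

Calling ensureValidated() a second time should skip the block, so under this sketch the count stays at 1.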
Comments
Comments are an important part of determining the goal of a piece of code. They represent the only chance a programmer has to communicate his or her ideas in plain language.
Many programmers write comments as hints for when they come back to look at the code. In many cases, comments, particularly long ones, indicate areas that the original programmer felt were tricky, unclear, or in some other way unlikely to be obvious upon later viewing. The presence of such comments usually indicates the location of the key parts of the algorithm.
Comments can also help identify useful sections within the code. A multiline explanatory comment often precedes a block of code worth grouping into one section, and the comment tries to explain the goal of that code.
However, it is important not to let comments mislead you. The compiler or interpreter ignores comments, and at times, so should you. Comments can be out of sync with more recent changes to the code, or they might have been wrong to begin with. Although they represent a starting point for understanding code, they need to be verified against the actual code to ensure their accuracy.
Some comments are done by rote, in the apparent belief that mundane operations need a comment, such as the following:
// add this price to the total
total += this_price;
These types of comments are unlikely to highlight buggy areas. On the other hand, a simple comment like the following, which is obviously wrong, is a sign that significant changes might have been made to the code since it was originally written:
// update the x coordinate
y_coord += delta;
Someone likely changed this code in a hurry, perhaps pasting it in from elsewhere and then renaming variables with the automated search-and-replace functions in an editor. The semantics and goals might have been broken in the process.