Identify the Meaning of Each Variable
After you identify the goal of each section, look at the variables used in the code and identify the "meaning" of each one.
The meaning of a variable refers to what value, conceptually, it is supposed to contain.
Variable Names
Variable names, like comments, can be both useful and misleading.
Unlike sections of code, all variables have names, which can usually be counted on to provide some hint of the variable's meaning. A variable's name is like a miniature comment from the programmer that appears every place the variable is used. As with comments, however, you have to make sure that the variable really is used the way the name indicates. Furthermore, some variables, even important ones, have single-letter or other uninformative names:
float average_balance; // good
string name; // OK, but name of what?
int k; // unclear; could be anything
Unlike comments, variable names are not completely ignored by a compiler or interpreter, because a variable name refers to a specific piece of storage. But the compiler or interpreter doesn't care about the actual name. Naming a variable a, total, or wxyz won't affect how the compiler treats it. What matters is that a variable is properly declared, defined, and used throughout the program.
If a variable has an unclear name or a name that does not match its real meaning, you should try to come up with a new name, or at least a verbal definition of the meaning. For example, for a variable named i, you might make a note that it is only used as a loop counter, or that it stores the current user ID, or that it holds a pointer to the next line of input.
Look at the Usage of Each Variable
For each variable used in the function or block of code, see where it is used. The first step is to distinguish where the variable is used in an expression (and therefore does not change) from where it is modified to hold a new value. This is not always obvious; some variables, especially data structures, can be modified inside of the functions they are passed to as arguments. Some languages have ways to indicate that a variable will not be changed inside a function (such as the const qualifier in C and C++), but these are not always used:
tot += data[j]; // tot is modified, data and j are used
print(counter); // counter is used
update(mystruct); // mystruct may be modified
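The difference between a call that only uses its argument and one that modifies it can be sketched in Python, where lists are passed by reference. The function names total_of and append_total are hypothetical, chosen only for this illustration:

```python
def total_of(values):
    """Only reads its argument: the caller's list is unchanged."""
    return sum(values)


def append_total(values):
    """Modifies its argument: the caller's list gains one element."""
    values.append(sum(values))


data = [1, 2, 3]
t = total_of(data)     # data is used, not modified
append_total(data)     # data is modified inside the call
```

Nothing at either call site shows which function changes data; you have to read the function bodies (or their documentation) to find out.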
After you determine where a variable is modified, you can start to understand how the variable is used. Is it constant for the entire length of the function? Is it constant in one section of code? Is it used only in one part of the code, or everywhere? If it is used in more than one part, is it merely being reused to save declaring an extra variable (loop counters are often used this way), or does its value at the end of one section remain important at the start of the next section?
When looking at loops, think about the state of each variable at the end of the loop. Separate the variables into those that were invariant during the loop, those that were used only during the loop (such as variables used to hold temporary values), and those that will be used after the loop code with an expectation about their value (based on what happened during the loop). A loop counter can fall into either of those last two categories: Often, it is only used to control the loop, but sometimes, it is used after the loop is done to help determine what happened in the loop (in particular, if it terminated early):
for (j = 0; j < total_records; j++) {
    if (end_of_file) {
        break;
    }
}
if (j == total_records) {
    // loop did not terminate due to end_of_file
}
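The same idea can be sketched in Python. After a Python for loop over range() completes normally, the counter holds the last index rather than one past it, so this sketch uses a while loop to keep the counter's meaning identical to the C version. The function read_records and its "ran out of records" test, which stands in for end_of_file, are assumptions made for illustration:

```python
def read_records(records, total_records):
    """Return (number processed, whether the loop terminated early)."""
    j = 0
    while j < total_records:
        if j >= len(records):   # stands in for end_of_file
            break
        j += 1
    # j is used after the loop to determine what happened inside it
    return j, j != total_records
```

Here j belongs to the category of variables that are used after the loop with an expectation about their value.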
Because the return value of a function is important, note whether a variable is used temporarily inside a function, or if it is actually going to be part of the data returned to the caller of the function:
def sum_array(arr):
    tot = 0
    for j in arr:
        tot = tot + j
    return tot
arr is used inside the function, but it is not modified; j is modified, but it is discarded at the end of the function; and tot is modified and then returned to the caller.
Make sure that all variables are initialized before they are used (some compilers and interpreters warn you if this is not the case). Many variables are not given an initial value when they are defined, so it is important that those variables are assigned a value, in all possible code paths, before they are used in an expression.
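As a minimal sketch of a variable that is not assigned on every code path, consider the following Python functions. The names describe_unsafe and describe are hypothetical:

```python
def describe_unsafe(balance):
    if balance < 0:
        label = "overdrawn"
    elif balance > 0:
        label = "in credit"
    # balance == 0 falls through without ever assigning label
    return label


def describe(balance):
    label = "zero"   # initialized before any branch is taken
    if balance < 0:
        label = "overdrawn"
    elif balance > 0:
        label = "in credit"
    return label
```

In Python, describe_unsafe(0) raises UnboundLocalError at the return statement. In a language such as C, the equivalent code would instead return whatever value happened to be in the uninitialized storage.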
Restricted Variables
Restricted variables can only hold a particular subset of the values that they would normally be allowed to hold based on their type. For example, when writing a simulation of a racetrack with eight lanes, you might define an integer variable named lane. Normally, an integer could hold a large range of values, but in this case, you are restricting lane to holding values between 1 and 8, or perhaps between 0 and 7. Consider this part of the variable's meaning.
Some languages allow such variables to have their restrictions explicitly stated, but often, programmers don't take advantage of this even where it is available. For example, a programmer could define a set of enumerated constants, ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, and EIGHT, and then state that lane can only hold one of those specific values. However, there is often a tradeoff between strict type-checking (the compiler or interpreter ensuring that lane is only ever assigned one of those eight enumerated values) and ease of programming (allowing the code to do arithmetic operations on lane, such as adding one).
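This tradeoff can be illustrated with Python's standard enum module, as a sketch only. The type names StrictLane and LooseLane are assumptions made for this example:

```python
from enum import Enum, IntEnum

NAMES = "ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT"
StrictLane = Enum("StrictLane", NAMES)    # values 1..8, no arithmetic allowed
LooseLane = IntEnum("LooseLane", NAMES)   # values 1..8, behaves like an int


def try_strict_add():
    """Arithmetic on the strict type is rejected with a TypeError."""
    try:
        return StrictLane.EIGHT + 1
    except TypeError:
        return None


# Arithmetic on the loose type works, but the result is a plain int
# that has escaped the restriction.
next_lane = LooseLane.EIGHT + 1
```

The strict version keeps lane within its eight values but forces the code to convert explicitly whenever it wants to move to an adjacent lane; the loose version makes arithmetic easy but lets a value such as 9 appear without complaint.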
Ideally, any restricted variable would be identified as such, at least in a comment when it is defined and possibly in the name of the variable itself. Restricted variables are often used in ways that cause errors if the variable ever contains a value outside of its intended set. So, it is important to determine if and how a variable is restricted:
char * get_lane_name(int lane) {
    static char * lane_names[] = { "one", "two", "three",
                                   "four", "five", "six",
                                   "seven", "eight" };
    return lane_names[lane];
}
The preceding code reads outside the bounds of the array, and is likely to crash or return garbage, if the lane parameter to get_lane_name() is not between 0 and 7.
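One way to make the restriction explicit is to check it at the point of use. The following is a Python sketch of the same lookup, not a translation the original code requires. Note that Python would otherwise accept a negative index such as -1 silently and return the last element, so the lower bound is checked explicitly:

```python
LANE_NAMES = ["one", "two", "three", "four",
              "five", "six", "seven", "eight"]


def get_lane_name(lane):
    """Return the name for a zero-based lane number in the range 0..7."""
    if not 0 <= lane < len(LANE_NAMES):
        raise ValueError(
            f"lane must be between 0 and {len(LANE_NAMES) - 1}, got {lane}"
        )
    return LANE_NAMES[lane]
```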
An array index is a form of restricted variable because the proper values are defined by the size of the array. Some languages check array access at runtime and generate an error; other languages silently access whatever memory the index winds up indicating. The runtime error is preferable because it makes it apparent that something is wrong, but both errors can occur for the same reasons.
Unfortunately, the size of an array can itself be dynamic and difficult to determine at a given point in the code. Furthermore, an array can be indexed using a complicated expression. Take the example shown here:
int array[100];
y = array[x];
It is readily apparent that at this point x is restricted to values between 0 and 99, inclusive (assuming a language with zero-based indexing). Instead, if the array access appears as
y = array[x-2];
then x is restricted to values between 2 and 101. In a statement such as this
y = array[somefunction(x) / 3];
it can be difficult to determine what the proper values for x are, especially if the number of elements in array[] was determined at runtime.
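When the restriction on x is hard to derive by inspection, one rough approach is to test candidate values against the index expression. The helper valid_inputs below is hypothetical, and the squaring expression merely stands in for somefunction(x) / 3, whose real definition is unknown:

```python
def valid_inputs(index_expr, length, candidates):
    """Return the candidate x values whose computed index is in bounds."""
    return [x for x in candidates if 0 <= index_expr(x) < length]


candidates = range(-10, 200)
direct = valid_inputs(lambda x: x, 100, candidates)             # array[x]
shifted = valid_inputs(lambda x: x - 2, 100, candidates)        # array[x-2]
squared = valid_inputs(lambda x: (x * x) // 3, 100, candidates) # stand-in
```

For the first two expressions the result matches the ranges given above (0 through 99 and 2 through 101). For the stand-in expression, the valid candidates run from -10 (the lowest value tried) through 17, a range that is much less obvious from reading the code.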
Invariant Conditions
Invariant conditions are a more general form of a restricted variable. An invariant condition is an expression, involving one or more variables, that is supposed to be true at any point during the execution of the program, except for brief moments when related variables are being updated. An invariant condition is usually a convention established by the programmer based on how he or she wants to manage the data structures used by the program.
When considering a variable that is a nontrivial data structure, try to think of any invariant conditions that will be true if the data structure is in a consistent state. (For example, a data structure that holds a string and a length might require that the length field always equal the number of characters currently in the string.) Make sure that all the relevant parts of the data structure are initialized if needed. When the data structure is changed, ensure that the invariant conditions are still satisfied.
In the earlier racetrack example, assuming lanes are numbered from 1, the invariant condition for lane might be stated as follows:
(lane >= 1) && (lane <= 8)
Another example, with a linked list, might be
if ((list_head != NULL) && (list_head->next != NULL))
    (list_head->next->previous == list_head)
You need to note invariant conditions where they exist because they constitute an implicit goal before and after every block of code in the program. Because goals are a theoretical idea that the compiler or interpreter is not actively concerned about, invariant conditions are also good candidates for including in assert statements for languages that support them.
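As a sketch of placing an invariant condition in an assert statement, the following Python class models a data structure that holds a string together with its length. The class name CountedText and its methods are assumptions made for illustration:

```python
class CountedText:
    """Holds a string and its length; invariant: length == len(text)."""

    def __init__(self, text=""):
        self.text = text
        self.length = len(text)
        self._check()

    def _check(self):
        # Python skips assert statements when run with the -O option.
        assert self.length == len(self.text), "length out of sync with text"

    def append(self, more):
        self.text += more          # invariant is briefly false here
        self.length += len(more)   # invariant is restored
        self._check()
```

Calling _check() at the end of every method that modifies the structure turns the implicit goal into something the interpreter actually verifies.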
The previous statement about invariant conditions being true "except for brief moments when related variables are being updated" is important. For multithreaded programs, take care that those "brief moments" are synchronized, so that another thread won't find the variables in a state where the invariant condition is false.
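A minimal sketch of such synchronization, using Python's standard threading module, follows. The WorkTracker class and its invariant (pending + done == total) are hypothetical examples, not taken from any particular program:

```python
import threading


class WorkTracker:
    """Invariant: pending + done == total whenever the lock is not held."""

    def __init__(self, total):
        self.total = total
        self.pending = total
        self.done = 0
        self._lock = threading.Lock()

    def finish_one(self):
        """Move one item from pending to done; return False if none remain."""
        with self._lock:
            if self.pending == 0:
                return False
            self.pending -= 1   # invariant is false here...
            self.done += 1      # ...and true again here
            return True

    def snapshot(self):
        """Read both counters under the lock so they are consistent."""
        with self._lock:
            return self.pending, self.done


violations = []


def worker(tracker):
    while tracker.finish_one():
        pending, done = tracker.snapshot()
        if pending + done != tracker.total:
            violations.append((pending, done))


tracker = WorkTracker(1000)
threads = [threading.Thread(target=worker, args=(tracker,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every read and every update of the two counters happens while holding the same lock, no thread can observe the moment between the decrement and the increment.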
Track Changes to Restricted Variables
As previously mentioned, some variables are restricted in that they should only contain a subset of the possible values they could contain. For example, an integer being used as a Boolean value might be restricted to the values 0 or 1. Because these restrictions are usually logical rather than enforced by the compiler or interpreter, it is important to check modifications to the variable to make sure that the value remains properly restricted.
Modification of restricted variables can be checked with an inductive process. That is, before a variable is modified, if you assume that the current value is properly restricted, it is possible to prove that the value after modification is properly restricted. If you can show that the variable is initialized with a proper value, and that every modification keeps the variable properly restricted as long as it is properly restricted beforehand, you prove that the variable is always properly restricted.
For example, if a variable grade is supposed to contain a value between 1 and 4, the following statement
grade = 3;
always keeps grade within the restricted range. However, with a statement such as this
grade = 5 - grade;
it is unclear at first glance whether grade will still be properly restricted. But if you assume that grade is properly restricted to 1 through 4 beforehand, you can show that the expression 5 - grade keeps grade in the proper range: the values 1, 2, 3, and 4 become 4, 3, 2, and 1, respectively.
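When the restricted set is small, the inductive step can be checked exhaustively. The following Python sketch does so; the helper name preserves_restriction is an assumption made for this example:

```python
VALID_GRADES = range(1, 5)   # the values 1 through 4


def preserves_restriction(update, valid):
    """True if update maps every properly restricted value to another one."""
    return all(update(v) in valid for v in valid)


flip_ok = preserves_restriction(lambda grade: 5 - grade, VALID_GRADES)
bump_ok = preserves_restriction(lambda grade: grade + 1, VALID_GRADES)
```

Here flip_ok is True, confirming the reasoning above, while bump_ok is False because a grade of 4 would become 5.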