0x210 What Is Programming?

Programming is a very natural and intuitive concept. A program is nothing more than a series of statements written in a specific language. Programs are everywhere, and even the technophobes of the world use programs every day. Driving directions, cooking recipes, football plays, and DNA are all programs that exist in the lives and even the cellular makeup of people everywhere.

A typical "program" for driving directions might look something like this:

Start out down Main Street headed east. Continue on Main until you see a church on your right. If the street is blocked because of construction, turn right there at 15th street, turn left on Pine Street, and then turn right on 16th street. Otherwise, you can just continue and make a right on 16th street. Continue on 16th street and turn left onto Destination Road. Drive straight down Destination Road for 5 miles and then the house is on the right. The address is 743 Destination Road.

Anyone who knows English can understand and follow these driving directions; they're written in English. Granted, they're not eloquent, but each instruction is clear and easy to understand, at least for someone who reads English.

But a computer doesn't natively understand English; it only understands machine language. To instruct a computer to do something, the instructions must be written in its language. However, machine language is arcane and difficult to work with. Machine language consists of raw bits and bytes, and it differs from architecture to architecture. So to write a program in machine language for an Intel x86 processor, one would have to figure out the value associated with each instruction, how each instruction interacts, and a myriad of other low-level details. Programming like this is painstaking and cumbersome, and it is certainly not intuitive.

What's needed to overcome the complication of writing machine language is a translator. An assembler is one form of machine-language translator: It is a program that translates assembly language into machine-readable code. Assembly language is less cryptic than machine language, because it uses names for the different instructions and variables, instead of just using numbers. However assembly language is still far from intuitive. The instruction names are very esoteric and the language is still architecture-specific. This means that just as machine language for Intel x86 processors is different from machine language for Sparc processors, x86 assembly language is different from Sparc assembly language. Any program written using assembly language for one processor's architecture will not work in another processor's architecture. If a program is written in x86 assembly language, it must be rewritten to run on Sparc architecture. In addition, to write an effective program in assembly language, one must still know many low-level details of that processor's architecture.

These problems can be mitigated by yet another form of translator called a compiler. A compiler converts a high-level language into machine language. High-level languages are much more intuitive than assembly language and can be converted into many different types of machine language for different processor architectures. This means that if a program is written in a high-level language, the program only needs to be written once, and the same piece of program code can be compiled by a compiler into machine language for various specific architectures. C, C++, and FORTRAN are all examples of high-level languages.

A program written in a high-level language is much more readable and English-like than assembly language or machine language, but it still must follow very strict rules about how the instructions are worded or the compiler won't be able to understand it.

Programmers have yet another form of programming language called pseudo-code. Pseudo-code is simply English arranged with a general structure similar to a high-level language. It isn't understood by compilers, assemblers, or any computers, but it is a useful way for a programmer to arrange instructions. Pseudo-code isn't well defined. In fact, many people write pseudo-code slightly differently. It's sort of the nebulous missing link between natural languages, such as English, and high-level programming languages, such as C. The driving directions from before, converted into pseudo-code, might look something like this:

Begin going east on Main street;
Until (there is a church on the right)
{
   Drive down Main;
}
If (street is blocked)
{
   Turn(right, 15th street);
   Turn(left, Pine street);
   Turn(right, 16th street);
}
else
{
   Turn(right, 16th street);
}
Turn(left, Destination Road);
For (5 iterations)
{
   Drive straight for 1 mile;
}
Stop at 743 Destination Road;

Each instruction is broken down into its own line, and the control logic of the directions has been broken down into control structures. Without control structures, a program would just be a series of instructions executed in sequential order. But our driving directions weren't that simple. They included statements like, "Continue on Main until you see a church on your right" and "If the street is blocked because of construction …." These are known as control structures, and they change the flow of the program's execution from a simple sequential order to a more complex and more useful flow.

In addition, the instructions to turn the car are much more complicated than just "Turn right on 16th street." Turning the car might involve locating the correct street, slowing down, turning on the blinker, turning the steering wheel, and finally speeding back up to the speed of traffic on the new street. Because many of these actions are the same for any street, they can be put into a function. A function takes in a set of arguments as input, processes its own set of instructions based on the input, and then returns back to where it was originally called. A turning function in pseudo-code might look something like this:

Function Turn(the_direction, the_street)
{
   locate the_street;
   slow down;
   if(the_direction == right)
   {
      turn on the right blinker;
      turn the steering wheel to the right;
   }
   else
   {
      turn on the left blinker;
      turn the steering wheel to the left;
   }
   speed back up
}

By using this function repeatedly, the car can be turned on any street, in any direction, without having to write out every little instruction each time. The important thing to remember about functions is that when they are called the program execution actually jumps over to a different place to execute the function and then returns back to where it left off after the function finishes executing.

One final important point about functions is that each function has its own context. This means that the local variables found within each function are unique to that function. Each function has its own context, or environment, which it executes within. The core of the program is a function, itself, with its own context, and as each function is called from this main function, a new context for the called function is created within the main function. If the called function calls another function, a new context for that function is created within the previous function's context, and so on. This layering of functional contexts allows each function to be somewhat atomic.

The control structures and functional concepts found in pseudo-code are also found in many different programming languages. Pseudo-code can look like anything, but the preceding pseudo-code was written to resemble the C programming language. This resemblance is useful, because C is a very common programming language. In fact, the majority of Linux and other modern implementations of Unix operating systems are written in C. Because Linux is an open source operating system with easy access to compilers, assemblers, and debuggers, this makes it an excellent platform to learn from. For the purposes of this book, the assumption will be made that all operations are occurring on an x86-based processor running Linux.