Brief Summary of x86 Assembly Language

Assembly language is converted into executable code by a program called an assembler. There is no fundamental difference between an assembler and a compiler: both take source code as input and produce machine code that the computer can execute. However, a single line of assembly language generally produces a single machine instruction, whereas a single line of code in a higher-level language can produce many machine instructions.

As a result of this, assembly language is more primitive than higher-level languages. Constructs such as loops, functions, and strings must be supported with only the barest of help from the language.

Another result of the close relationship between assembly language and machine language is that different families of microprocessors support different assembly languages. This chapter uses x86 assembly language, which originated with the Intel line of chips that includes the 80386, 486, and Pentium microprocessors.

Rules about how statements are separated and the like are up to the individual assembler. The programs in this chapter follow the common convention that each statement is on a line by itself, and a comment begins with a semicolon (after which everything else on the line is ignored).

Data Types and Variables

A microprocessor can access data either in memory or in registers. Registers are storage locations, but they are located on the microprocessor itself. It is faster to perform operations on registers than on memory. In addition, some instructions are defined to only work on registers.

The x86 microprocessors have four general-purpose registers named eax, ebx, ecx, and edx. These are 32 bits each. The low 16 bits can be addressed as ax, bx, and so on; you can't address the high 16 bits directly. Within the 16 bits of ax, the low 8 bits (1 byte) are addressed as al and the high 8 bits are addressed as ah, and similarly for the other registers. An 8-bit quantity is known as a byte, a 16-bit quantity is known as a word, and a 32-bit quantity is known as a doubleword (dword).

x86 assembly language instructions are generally in this form (an immediate value is a constant, such as a number):
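A minimal sketch of the general shapes, assuming the destination-first operand order used throughout this chapter. The words opcode, operand, destination, and source are placeholders, not real mnemonics:

```asm
; General shapes (placeholders, not real mnemonics):
;   opcode
;   opcode operand
;   opcode destination, source
; An operand can be a register, a memory location, or an immediate value.
mov eax, 5      ; opcode mov, destination register eax, immediate source 5
```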

For example, one of the most basic instructions moves a value into a register or memory location. The value can come from another register, a memory location, or an immediate value. (But you can't move directly from memory location to memory location. It is a general rule in x86 assembly language that both operands can't be memory locations.) The opcode is mov and what data is moved is specified by the operands that come after the opcode. The destination of the move comes first after the opcode, as shown in the following:
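A minimal sketch of two such moves, assuming NASM-style syntax; the value 100 is an arbitrary illustration:

```asm
mov ax, bx        ; copy the 16-bit value in bx into ax
mov ecx, 100      ; load the immediate value 100 into the 32-bit register ecx
; mov eax, bx     ; would be rejected: eax is 32 bits, bx is 16 bits
```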

The number of bits moved depends on the operands. Because ax and bx are both 16 bits, the first instruction moves 16 bits. ecx is a dword, so the second instruction moves 32 bits. It is illegal for mov to have operands of different sizes.

Immediate values can be decimal numbers. They can also be hexadecimal numbers if they begin with 0x, or characters in single quotes, which are converted to their ASCII equivalent. (Some assemblers require that decimal numbers be followed by a d and hexadecimal numbers be followed by an h, as in 100d or 0f5h; in that notation a hexadecimal number must begin with a digit, hence the leading 0.)

The x86 microprocessors support a stack, which is stored in memory. The stack grows downward (toward lower memory addresses). The two main instructions for accessing the stack are push and pop, which affect the lowest location on the stack:
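A short sketch of saving and restoring two registers, assuming 32-bit operands (so each push or pop moves esp by 4):

```asm
push eax        ; esp = esp - 4, then eax is stored at [esp]
push ebx        ; esp = esp - 4, then ebx is stored at [esp]
; ... code that is free to overwrite eax and ebx ...
pop ebx         ; ebx = value at [esp], then esp = esp + 4
pop eax         ; eax = value at [esp], then esp = esp + 4
```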

The push and pop opcodes can be used to temporarily store data on the stack if you don't have a spare register to use. Just be sure to push and pop in the proper order (which means that the last value pushed is the first one popped).

The address of the current location of the stack is stored in another register, esp, which can be directly read or written. Because esp is the current location, push first subtracts from esp, then stores the pushed value at the current location of esp. pop reads the value at the current location of esp and then adds to esp.

Square brackets around a register indicate indirect addressing: Treat the contents of the register as the memory address of a value. This is equivalent to a pointer in some other languages. Because esp is a register, you can access values on the stack without using pop:
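For instance, a one-line sketch that reads the top of the stack without popping it:

```asm
mov eax, [esp]    ; copy the dword at the top of the stack into eax; esp is unchanged
```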

The number of bits to move is usually implied by the other operand (for example, eax in the previous case implies 32 bits), but in the case of instructions that don't have another operand (or have a constant operand whose size is not known), you can use the following syntax to specify it:
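A sketch of explicit size specifiers, assuming NASM-style syntax and that ebx holds the address of writable memory. Some assemblers, such as MASM, spell these as byte ptr, word ptr, and dword ptr instead:

```asm
mov dword [ebx], 0      ; store a 32-bit zero at the address in ebx
mov byte [ebx], 'A'     ; store a single byte (the ASCII code for 'A')
push dword [ebx]        ; push the 32-bit value found at the address in ebx
```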

When writing in x86 assembly language, it can be important to know that the x86 family is "little-endian." This means that the least significant byte of a number is stored first. Understand the following sequence:
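A sketch of the effect, assuming NASM-style syntax and that ebx holds the address of four writable bytes; the value 0x12345678 is arbitrary:

```asm
mov dword [ebx], 0x12345678   ; bytes at ebx, ebx+1, ebx+2, ebx+3 are now 78 56 34 12
mov al, [ebx]                 ; al = 0x78, the least significant byte
mov cx, [ebx]                 ; cx = 0x5678, the low word
```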

Indirect addressing also allows you to specify a displacement from the register, as shown in the following:
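Two illustrative forms, one with a positive and one with a negative displacement (the particular offsets are arbitrary):

```asm
mov eax, [esp+4]    ; read the dword 4 bytes above the top of the stack
mov [ebp-8], ecx    ; store ecx at the address 8 bytes below ebp
```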

You can modify esp the same as any other register, so the following is equivalent to pop edx:
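A sketch of the equivalence, together with the matching expansion of push edx. One difference worth noting: add and sub modify the status flags, whereas pop and push do not.

```asm
; same effect on edx and esp as: pop edx
mov edx, [esp]      ; read the value at the top of the stack
add esp, 4          ; discard it

; same effect on memory and esp as: push edx
sub esp, 4          ; make room for one dword
mov [esp], edx      ; store edx in the new top-of-stack slot
```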

This also shows the add and sub opcodes, which do addition and subtraction. The destination (where the result goes) is the first operand. These are discussed more in the next section.

The lea (load effective address) instruction loads the address of a memory variable, as shown in the following:
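A minimal sketch, assuming a dword variable at the hypothetical label myvar:

```asm
lea eax, [myvar]    ; eax = address of myvar (no memory is read)
mov ebx, [eax]      ; ebx = the value stored at myvar
```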

lea can also be used to assign an offset from another register in a single instruction:
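For example, with an arbitrary offset of 8:

```asm
lea eax, [ebx+8]    ; eax = ebx + 8; ebx is unchanged and no memory is read
```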

Other registers on the x86 microprocessors include esi, edi, and ebp. These have rough meanings assigned to them, which can manifest themselves as implied parameters to an instruction (esi and edi are always used to specify the source and destination in certain string operations) or in more efficient execution. (ebp is often used as the stack-frame base pointer, and instructions that access offsets in the stack frame using ebp as the base can be encoded more efficiently.) However, you can use the registers for simple operations without adhering to those meanings. The examples explain situations where an instruction either requires or assumes particular operands. The esi, edi, ebp, and esp registers all have separate names for the low 16 bits (si, di, bp, and sp), but do not have the equivalent of al and ah for directly accessing the low bytes within those registers.

The x86 microprocessors have six segment registers that give programmers more flexibility in addressing memory. The examples ignore the segment registers. The final two registers to worry about are the flags register, which is discussed next, and eip, which holds the instruction pointer (the location at which the processor is executing code). There are also control, debug, and test registers, which are used for things such as setting watchpoints (which trigger when a certain address is accessed). This book doesn't get into those.

Arithmetic Operations

Previous examples used the add and sub instructions, which do addition and subtraction. The result goes in the first operand:
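A few illustrative forms (register choices and constants are arbitrary):

```asm
add ebp, eax      ; ebp = ebp + eax
sub ecx, 4        ; ecx = ecx - 4
add eax, [ebx]    ; eax = eax + the dword at the address in ebx
```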

To remember that the result goes in the first operand, it might be helpful to think of these as the equivalent of the binary assignment operators (such as +=) in some languages. In other words, add ebp, eax is the same as ebp += eax.

x86 assembly language also supports opcodes for multiplication and division, but they are not as generic as addition and subtraction. The mul instruction, which performs an unsigned multiply, works only on al, ax, or eax, and the result goes in a specific location:
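The three operand sizes can be sketched as follows, using bl, bx, and ebx as arbitrary multipliers:

```asm
mul bl      ; ax = al * bl
mul bx      ; dx:ax = ax * bx
mul ebx     ; edx:eax = eax * ebx
```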

In the comments, the notation dx:ax (or edx:eax) indicates that the high 16 (or 32) bits of the result are stored in dx (or edx), and the low 16 (or 32) bits of the result are stored in ax (or eax).

Similarly, div performs an unsigned divide with the dividend specified as one of the same combinations of eax and edx, and the quotient and remainder being stored in the same place. (For example, div ebx divides edx:eax by ebx, and stores the quotient in eax and the remainder in edx. You can figure out the equivalent with 8- and 16-bit divisors by working backward from how mul works.) An imul instruction does signed multiplication and is more generic about which operands are allowed: besides a one-operand form that works like mul, it has two- and three-operand forms that accept other registers, but those forms keep only the low half of the product (32-bit × 32-bit = 32-bit). (idiv, which does signed division, takes the same operands as div.)
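A small worked sketch of a 32-bit unsigned divide; the values 17 and 5 are arbitrary. The high half of the dividend (edx) must be set before the divide:

```asm
mov edx, 0      ; high 32 bits of the dividend
mov eax, 17     ; low 32 bits of the dividend
mov ebx, 5      ; divisor
div ebx         ; eax = 3 (quotient), edx = 2 (remainder)
```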

x86 assembly language allows shifting to the left and right by a specified number of bits. Shifting a number 1 bit to the left multiplies it by 2, and shifting it 1 bit to the right divides it by 2:
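A sketch using the shl (shift left) and shr (unsigned shift right) opcodes on an arbitrary starting value:

```asm
mov eax, 5
shl eax, 1      ; eax = 10 (multiplied by 2)
shl eax, 2      ; eax = 40 (multiplied by 4)
shr eax, 3      ; eax = 5  (divided by 8)
```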

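If you xor a number with itself, the result is zero. A statement of the form xor eax, eax is therefore a common idiom for clearing a register: it has a shorter encoding than mov eax, 0, and on some microprocessors in the x86 family it also executes faster.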

Flags, Conditionals, and Jumps

There is a special register on the x86 microprocessors that contains flags. A subset of the flags are known as status flags, and most status flags are set after arithmetic operations, depending on whether it makes sense for a particular operation. The status flags that are important in this book are as follows:
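As an illustration, this sketch shows how a few arithmetic results set the zero (ZF), sign (SF), carry (CF), and overflow (OF) flags. Choosing these four flags is an assumption based on common usage, and the register values are arbitrary:

```asm
mov eax, 1
sub eax, 1            ; result 0:          ZF=1 SF=0 CF=0 OF=0
sub eax, 1            ; result 0xFFFFFFFF: ZF=0 SF=1 CF=1 OF=0 (borrow)
mov eax, 0x7FFFFFFF
add eax, 1            ; result 0x80000000: ZF=0 SF=1 CF=0 OF=1 (signed overflow)
```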

The flags will not be set after an instruction such as mov. Because it is often desirable to set the flags without actually performing an operation, the cmp instruction does this. (It sets the flags the same as if a sub had been performed, without actually doing the subtraction operation.) There is also the test instruction, which performs a logical and, then sets the flags (again, without actually modifying the operands).

The flags are paired with conditional jump instructions that transfer control to any point in the program. For example

jumps to the instruction labeled with mylabel if the result of sub ecx, 1 is zero. As another example
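The sequence just described can be sketched as follows, along with a second illustrative sequence. The second example, its label carried, and its use of the carry flag are assumptions chosen for illustration:

```asm
    sub ecx, 1        ; sets the zero flag if ecx is now 0
    jz mylabel        ; taken only when the zero flag is set
    ; ... reached when ecx is still nonzero ...
mylabel:

    add eax, ebx      ; sets the carry flag if the unsigned sum does not fit in 32 bits
    jc carried        ; hypothetical label, defined elsewhere
```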

The jz instruction is one of the conditional jumps supported. It performs the jump if the zero flag (ZF) was set. This matches the likely thought process used in the previous code, where you subtract one from ecx and then test if it is zero. However, if you instead do a cmp instruction,

the zero flag is set if the result of a sub would have been zero. This does not mean that eax or ebx are necessarily zero, just that they are equal. For this situation, x86 assembly language also has the je conditional jump, which matches up more logically with the intent of the programmer, but turns out to be the same as jz; it jumps if the zero flag is set.
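A minimal sketch of that comparison, with mylabel standing in for any label in the program:

```asm
cmp eax, ebx      ; sets flags as if computing eax - ebx; neither register changes
je mylabel        ; taken when eax equals ebx; assembles to the same instruction as jz mylabel
```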

For the comparisons, the two forms simply mean the same thing: "greater" is the same as not less or equal.
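A sketch of such paired names, assuming the standard x86 conditional jump mnemonics; the label bigger is hypothetical. The "greater/less" names are for signed comparisons and the "above/below" names are for unsigned comparisons:

```asm
cmp eax, ebx
jg   bigger     ; signed: taken if eax > ebx
jnle bigger     ; same instruction as jg ("not less or equal")
ja   bigger     ; unsigned: taken if eax > ebx
jnbe bigger     ; same instruction as ja ("not below or equal")
```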

The x86 stores numbers in two's complement format, which is covered in more detail in Appendix A, "Classification of Bugs." The key takeaway about two's complement numbers is that negative numbers have the high bit turned on. It can be amusing, or occasionally challenging, to sit down and work out why exactly "signed less" corresponds to the sign flag being different from the overflow flag after a subtraction (one example of non-obvious status flag values), but it isn't really necessary. It is enough to know that in a comparison/jump sequence such as the following

you can read the meaning of the code by placing the jump condition between the first and second operand, as in "jump if ecx is not greater than edx."
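The sequence being read that way would look like this sketch (mylabel is a placeholder):

```asm
cmp ecx, edx
jng mylabel     ; jump if ecx is not greater than edx (signed comparison)
```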

Finally, an unconditional jump, opcode jmp, always jumps. This often follows a conditional jump and corresponds to the else case, if you think of the conditional jump as corresponding to the if case:
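A sketch of an if/else built this way; the labels elsecase and done and the values stored in ecx are illustrative:

```asm
    cmp eax, ebx
    jne elsecase      ; if (eax == ebx) {
    mov ecx, 1        ;     ecx = 1;
    jmp done          ; }
elsecase:             ; else {
    mov ecx, 2        ;     ecx = 2;
done:                 ; }
```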

Loops

There is no direct support for loops as you think of them in other languages. You must construct them on your own:
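A minimal sketch of a hand-built counted loop, assuming ecx as the counter and an arbitrary count of 10:

```asm
    mov ecx, 10
looptop:
    ; ... loop body, executed 10 times; it must leave ecx alone ...
    sub ecx, 1
    jnz looptop       ; repeat until ecx reaches zero
```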

However, a loop opcode exists that assumes ecx is being used as the loop counter. In one operation, it decrements ecx and jumps to a label if the result is nonzero. So the previous loop could be rewritten as follows:
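A sketch of the same counted loop using the loop opcode (the count of 10 is arbitrary):

```asm
    mov ecx, 10
looptop:
    ; ... loop body, executed 10 times; it must leave ecx alone ...
    loop looptop      ; ecx = ecx - 1, then jump to looptop if ecx is not zero
```

Because the decrement happens before the test, entering such a loop with ecx equal to zero does not skip the body; the counter wraps around instead.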

Note

Other forms of loop also check the zero flag before looping, but they aren't used in the book. (The decrement of ecx performed by loop does not modify the flags, so the zero flag that is checked is the one left by the preceding instructions.)

The prefix rep can be used to repeat string instructions. The string instructions used in the book are cmps (compare), movs (move), scas (compare a string to a value), and stos (store a value in a string).

Those four string instructions can be used without the rep prefix: cmps compares [esi] to [edi], movs moves [esi] to [edi], scas compares eax to [edi], and stos stores eax in [edi]. The opcodes are usually written with a b, w, or d tacked at the end to specify if the operation works on bytes, words, or dwords. In the case of byte or word operations, this implies that a subset of the eax register is used. For example:
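A sketch of how the size suffix selects the part of eax that is used:

```asm
stosb     ; store al at [edi]
stosw     ; store ax at [edi]
stosd     ; store eax at [edi]
scasb     ; compare al with the byte at [edi]
```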

The key to the string instructions is what happens at the end; they increment edi (and esi in the case of cmps and movs) at the end of the instruction. This is most useful when combined with the rep instruction prefix, which repeats a string operation as long as ecx is non-zero, decrementing it each time. For example, the following code moves 10 dwords (40 bytes) from [esi] to [edi]:
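A minimal sketch, assuming esi and edi already hold the source and destination addresses and that the direction flag is clear:

```asm
mov ecx, 10     ; number of dwords to copy
rep movsd       ; copy 10 dwords; afterward ecx = 0 and esi and edi have each advanced by 40
```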

At this point, it's worth mentioning that there is also a lods instruction that is the opposite of stos (stos stores at [edi], while lods reads from [esi]). It can be used with a rep prefix, but it makes more sense to use it with loop because you usually want to do some processing on each value as it is loaded into al/ax/eax. For example, you could xor together 10 dwords starting at [esi]:
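A sketch of that loop, assuming esi already points at the data and that ebx is free to hold the running result:

```asm
    mov ecx, 10       ; number of dwords to process
    xor ebx, ebx      ; running result starts at zero
looptop:
    lodsd             ; eax = dword at [esi], then esi = esi + 4
    xor ebx, eax      ; fold the value into the result
    loop looptop      ; repeat until ecx reaches zero
```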

For the cmps and scas instructions, there are two other forms of rep: repe (repeat while equal) and repne (repeat while not equal). In addition to exiting the rep loop when ecx reaches zero, these also check after each cmps or scas instruction and exit the loop if the zero flag is 0 (in the case of repe), or if the zero flag is 1 (in the case of repne). If you have trouble grasping that, remember that "equality" implies the result of the comparison is zero, which means that the zero flag is 1. If that doesn't help, just know that they work "as expected" in code such as the following, which searches for the character 'A' in a 5-byte string:
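A sketch of such a search, assuming NASM-style syntax; mystring and notfound are hypothetical labels defined elsewhere:

```asm
    lea edi, [mystring]   ; edi = address of the 5-byte string
    mov al, 'A'           ; the value to search for
    mov ecx, 5            ; maximum number of bytes to examine
    repne scasb           ; stop at the first byte equal to al, or when ecx reaches 0
    jne notfound          ; zero flag clear: no byte matched
    ; match: edi now points one byte past the 'A' that was found
```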

Of course, when it comes to debugging the programs in this chapter, don't assume they work as expected.

When applied to scas or cmps, the rep prefix is the same as repe (which means it exits if the zero flag is 0 after the primitive instruction). Also, the language provides repz and repnz as aliases for repe and repne, although these are somewhat superfluous: with rep scas and rep cmps, you usually think about equality, not "zeroness."

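There is a special conditional jump instruction, jecxz, which jumps if ecx is zero. After a rep/repe/repne, ecx being zero means the count ran out, which is normally the "didn't match" result: the instruction terminated naturally rather than stopping early. (Be aware that ecx is also zero when the terminating condition is met on the very last element, so testing the zero flag is the more reliable check.) For example, a conditional jump on the zero flag that follows such a search could be replaced by a jecxz to the same label.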
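A sketch of that substitution, assuming the search ended with a jump on the zero flag to a hypothetical label notfound:

```asm
    repne scasb
    jne notfound      ; tests the zero flag left by the last comparison

    ; alternative form of the same check
    repne scasb
    jecxz notfound    ; tests ecx; also taken if the match was on the final element
```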

For completeness, it's necessary to mention that the direction of the string operations is actually controlled by the direction flag, a control flag in the flags register. The direction flag can be cleared with the cld instruction and set with the std instruction. It is normally cleared, and the examples will assume it is. If it is set, the string operations go in reverse, which means that edi (and esi) are decremented (rather than incremented, as assumed in the previous examples) by 1, 2, or 4 after each string operation. This is useful for certain overlapping memory moves, to compare strings starting at the end, and so on.
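A sketch of one overlapping move: shifting 10 bytes up by one position within a hypothetical buffer of at least 11 bytes. Copying from the high end first avoids overwriting source bytes before they are read:

```asm
    lea esi, [buffer+9]     ; last byte of the 10-byte source range
    lea edi, [buffer+10]    ; last byte of the destination range
    mov ecx, 10
    std                     ; set the direction flag: esi and edi count down
    rep movsb               ; copy 10 bytes, highest address first
    cld                     ; restore the usual forward direction
```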

Procedures

Procedures (also known as functions or subroutines) can be called using the call instruction, which takes the address of the procedure:
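For instance, with myfunction as a label defined elsewhere in the program:

```asm
call myfunction     ; push the return address, then jump to myfunction
; execution resumes here when myfunction returns
```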

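The only thing call does is push the return address on the stack (the value of eip, which at that point is the address of the instruction following the call) and then jump to myfunction. To return from a procedure, use the ret instruction all by itself: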
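A minimal procedure skeleton as a sketch:

```asm
myfunction:
    ; ... body of the procedure ...
    ret               ; pop the return address and jump to it
```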

ret pops the top value off the stack and jumps to that address. Because it assumes the value on the stack is correct, this usually results in a crash if the stack is incorrect:
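A deliberately broken sketch showing how an unbalanced stack misdirects ret:

```asm
myfunction:
    push eax          ; saved value is now on top of the return address
    ; ... body that never pops eax ...
    ret               ; BUG: pops the saved eax and jumps to it as if it were an address
```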

Note

More dangerously, modifying the stack so that ret jumps to an unexpected instruction is a key technique used by exploits, malicious code that tries to gain control of a machine by pointing eip to externally injected instructions.

Beyond call and ret, constructs such as parameters and return values are up to the author of the code.

Higher-level languages have standards on how they pass parameters to procedures. They are passed on the stack, in registers, or a combination of both. The key is that the caller of the procedure follows the same conventions as the procedure itself. For parameters passed on the stack, the two important questions are whether the parameters are pushed left-to-right or right-to-left, and whether the caller or the procedure cleans up the stack at the end.

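For example, in the C language calling convention known as stdcall, parameters are pushed on the stack from right to left, which means that for a hypothetical call such as myfunction(a, b, c), c is pushed first and a is pushed last, immediately before the call instruction.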

stdcall also specifies that the procedure cleans up the stack, which means that the procedure must pop those three values off the stack before it returns. (The ret instruction can take an optional argument of the number of bytes to pop to make this easier.)
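A sketch of a stdcall-style call with three dword parameters. The name myfunction and the choice of eax, ebx, and ecx to hold a, b, and c are assumptions made for illustration:

```asm
    ; caller: myfunction(a, b, c)
    push ecx            ; c is pushed first
    push ebx            ; b
    push eax            ; a is pushed last
    call myfunction
    ; esp is already back to its value from before the pushes

myfunction:
    ; on entry: [esp+4] = a, [esp+8] = b, [esp+12] = c
    ret 12              ; pop the return address, then discard 12 bytes of parameters
```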

Meanwhile, in the cdecl calling convention, arguments are still pushed right to left, but the calling code is responsible for cleaning up the stack.
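The cdecl version of a three-parameter call can be sketched as follows, again with hypothetical register assignments for the parameters:

```asm
    push ecx            ; c
    push ebx            ; b
    push eax            ; a
    call myfunction     ; myfunction returns with a plain ret
    add esp, 12         ; the caller discards the three parameters
```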

In both cases, return values are usually passed back in eax, if they are small enough to fit.

Because parameters are pushed on the stack, procedures index off of esp to obtain the parameters. Because call pushes the return address on the stack, it is at the current stack location when the procedure begins. The parameters pushed on the stack before the call are just above the return address, starting at [esp+4]. For example, if the calling code pushes two dword parameters and then calls a procedure, then on entry the parameter pushed last is at [esp+4] and the parameter pushed first is at [esp+8].
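A sketch with two parameters. The procedure name myproc, the values 1 and 2, and the choice to have the procedure clean up the stack are all assumptions made for illustration:

```asm
    push 2                ; second parameter, pushed first
    push 1                ; first parameter, pushed last
    call myproc
    ; eax = 3 here

myproc:
    mov eax, [esp+4]      ; eax = 1 (the parameter pushed last)
    add eax, [esp+8]      ; eax = 1 + 2 = 3
    ret 8                 ; return and discard both parameters
```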

If a procedure wants room for local variables, it can decrease the stack pointer and then index off of it. For example, it could begin with the following code

and then have room for two dwords, which would be addressed (assuming esp did not change) as [esp] and [esp+4]. Procedures must be careful to put esp back before they call ret, so that the return address is at the top of the stack. Also, such variables initially contain whatever value happened to be at that location on the stack from the execution of previous code.
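A sketch of a procedure that reserves, initializes, and releases two dword locals; the name myproc is hypothetical:

```asm
myproc:
    sub esp, 8                ; reserve room for two dword local variables
    mov dword [esp], 0        ; initialize the first local
    mov dword [esp+4], 0      ; initialize the second local
    ; ... body that uses [esp] and [esp+4] ...
    add esp, 8                ; put esp back so the return address is on top
    ret
```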

It is a general rule that a procedure that uses registers will save the old values and restore them, generally by pushing them on the stack at the beginning, and popping them off at the end. As a result, procedures often start with code such as the following:
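A sketch of such an opening, with the matching cleanup at the end. Saving ebx is an arbitrary example of preserving a register the body uses:

```asm
myproc:
    push ebp            ; save the caller's ebp
    mov ebp, esp        ; ebp now marks this procedure's stack frame
    push ebx            ; save a register the body will modify
    ; ... body ...
    pop ebx             ; restore in the reverse order of the pushes
    pop ebp
    ret
```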

The reason for saving esp in ebp is that parameters on the stack can then be accessed by indexing from ebp, which means that you don't have to worry about the offset of the parameters changing (as it would if you indexed directly off of esp and happened to push or pop during the procedure). In this case, ebp would have captured esp just after the old ebp was pushed. This means that [ebp] holds the old ebp, [ebp+4] has the return address, and parameters start at [ebp+8]. Meanwhile, local variables (if room was allocated for them by subtracting from esp after it was saved in ebp) would be accessed with negative indices from ebp.

But, these are all just conventions. If you are writing your own assembly-language code and do not have to interoperate with any other code in any other language, you can handle parameters, stack cleanup, register preservation, return values, and all that in whatever way you want.

Output

There is no code in the book that accepts input from the keyboard or produces output to the screen. A Perl function such as print() actually hides a lot of operating-system-specific code that is required underneath to produce a character on the screen. To keep our assembly language somewhat generic, the examples are restricted to procedures or blocks of code that accept parameters and return values in specified ways. It is certainly possible to call operating system input/output routines from assembler as long as the calling conventions are respected.