I am very excited about this topic, because I think that exploiting a buffer overflow vulnerability is a very creative process, and also a bit difficult to understand because of all the different knowledge required to pull off this type of attack. I want to approach it by splitting the post in two parts. In the first part I am going to review how the CPU and memory of a computer work, how a buffer overflow works, the problems you face when exploiting it and how to solve those problems. I am leaving for the second part a more practical post where we are going to exploit a simple application to illustrate how a buffer overflow attack is performed; it will involve a little bit of programming in Python, and a little bit of playing with the Immunity debugger.
Now that it is clear how this post will be divided, let’s begin with the first part.
How is the Memory organized?
The memory of a process running on an x86 processor is divided into segments, which the figure shows ordered from the higher addresses down to the lower addresses.
We are not going to go into the details of the purpose of each segment; it is enough to know that the instructions of the program are at the lower end of the memory and the stack is at the higher end. In the next sections we will talk about the stack and why it is important.
What about the Registers?
The second thing that we are going to review are the registers: what they are and what they are used for. It is important to understand that each processor architecture is different, and the registers of x86 processors are different from those of, for example, the Motorola 6800 or the 8080. Even among x86 processors the size of the registers changes, which is why we have 16, 32 and 64 bit registers depending on the architecture that you have.
On the x86-32bit architecture there are 8 general purpose registers that are used to store data and addresses that point to other positions in memory. These registers are EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP.
From this list the most important registers for us are ESP and EBP. There is also a very specialized register called EIP that is going to be very important, so let’s talk about these three registers:
ESP is the Extended Stack Pointer, and its purpose is to let you know where on the stack you are: ESP always marks the top of the stack.
EBP is the Extended Base Pointer, and its purpose is to point to the base of the current stack frame, that is, the part of the stack that belongs to the function that is currently running.
EIP is the Extended Instruction Pointer. It contains the address of the next instruction to be executed. It cannot be written directly with an instruction like MOV; it only changes through instructions such as JMP, CALL and RET. Normally it points into the “Program Code” memory segment.
And the Stack?
The stack is very important in assembly language. It is a part of memory organized as a LIFO (Last In, First Out) data structure and used as a temporary storage area for data that the program needs to access quickly. Every function call gets its own region of the stack (a stack frame), so the location of the data changes all the time, and that is why the EBP register is used: it points to the base of the current frame. It is important to mention that the stack grows from high memory addresses to low memory addresses: it grows each time we “push” some data onto it and shrinks each time we “pop” something from it. The top boundary is always pointed to by ESP, which is constantly changing its value.
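To make the push/pop behaviour concrete, here is a toy Python model of a 32-bit stack. It is only a sketch, not how a CPU works internally: ESP is a plain integer, every slot is 4 bytes, and the starting address 0x0012FF80 is an arbitrary value chosen for illustration.

```python
class Stack32:
    """Toy model of an x86-32 stack: 4-byte slots, growing toward lower addresses."""

    def __init__(self, base: int = 0x0012FF80):  # arbitrary example address
        self.esp = base       # ESP starts at the base (nothing pushed yet)
        self.memory = {}      # address -> 32-bit value

    def push(self, value: int) -> None:
        self.esp -= 4                                # the stack grows down: ESP decreases...
        self.memory[self.esp] = value & 0xFFFFFFFF   # ...and the value is stored at the new top

    def pop(self) -> int:
        value = self.memory.pop(self.esp)  # read the value at the top
        self.esp += 4                      # the stack shrinks: ESP increases
        return value


stack = Stack32(0x0012FF80)
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))    # 0x12ff78: two pushes moved ESP down by 8 bytes
print(hex(stack.pop()))  # 0x22222222: last in, first out
print(hex(stack.esp))    # 0x12ff7c
```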
Why is EBP so important?
EBP is important because it provides an anchor point in memory from which many things can be referenced. When we call a function inside a program and pass it some parameters, their positions in memory are referenced relative to EBP (at positive offsets such as EBP+8), and so are the local variables (at negative offsets such as EBP-4), as you can see in the figure.
Putting It All Together
Now that we “know” a little bit about how some things work inside the CPU and memory of a computer when it runs code, we can understand what a buffer overflow is.
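At a very high level, when you call a function inside a program (assuming the usual 32-bit C calling convention, cdecl), the following happens:
- The function’s stack frame is created: the caller pushes the parameters, CALL pushes the return address, and the function pushes the caller’s EBP and copies ESP into EBP to set the anchor
- The parameters can then be found relative to that anchor at EBP+8, EBP+12, etc., while local variables such as buffers are reserved below it, at EBP-4, EBP-8, etc.
- When the function finishes, it restores the old EBP, and RET pops the saved return address stored at EBP+4 into EIP, so execution continues in the caller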
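The frame layout described above can be summarized with a small Python helper that lists a frame from higher to lower addresses. It is a sketch under simplifying assumptions: 32-bit cdecl, every parameter 4 bytes wide, a single local buffer placed directly below the saved EBP, and no alignment padding or stack canary (real compilers often add both). The name frame_layout is hypothetical.

```python
def frame_layout(num_params: int, buffer_size: int) -> list:
    """Return (offset from EBP, description) pairs, from higher to lower addresses.

    Assumes 4-byte parameters and one local buffer right below the saved EBP.
    """
    layout = []
    # Parameters were pushed right to left, so PARAM1 ends up closest to EBP.
    for i in reversed(range(num_params)):
        layout.append((8 + 4 * i, f"PARAM{i + 1}"))
    layout.append((4, "saved return address (RET)"))
    layout.append((0, "saved EBP of the caller"))
    layout.append((-buffer_size, f"start of local buffer[{buffer_size}]"))
    return layout


for offset, what in frame_layout(num_params=2, buffer_size=12):
    print(f"EBP{offset:+d}: {what}")
```

For two parameters and a 12-byte buffer this prints EBP+12: PARAM2, EBP+8: PARAM1, EBP+4: saved return address (RET), EBP+0: saved EBP of the caller and EBP-12: start of local buffer[12]. Note that the buffer starts at the lowest address, so filling it moves toward the saved EBP and RET.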
Let’s focus now on step number 2 and say that we send a string formed by 12 A’s; the memory looks like the following figure:
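Analyzing the figure a little, we see that PARAM1 points to the address where the data is saved in the stack, and the string is written starting at ADDR1 toward higher memory addresses. This happens because a copy routine like strcpy fills a buffer in increasing address order (buffer[0], buffer[1], and so on), while the stack grows toward lower addresses. As a result, what was pushed before the local buffer, the saved EBP and the return address, sits at higher addresses, right in the path of the copy.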
If the function does not check the length of the data before writing it on the stack and we send a large number of A’s, we could end up with a case like the next figure:
If the A’s overwrite the saved return address, you have altered the address where execution continues when the function returns, because the RET instruction loads that value into EIP. Obviously, if EIP is overwritten with “noise” (here 0x41414141, the hex value of “AAAA”), an exception is raised and the program stops.
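The overwrite can be simulated in Python with a bytearray standing in for the memory above the buffer. This sketch assumes the 12-byte buffer from the example sits directly below the saved EBP (no padding, no canary), and imitates strcpy by copying byte by byte with no length check, including the terminating NULL byte; the addresses are made up.

```python
import struct

BUFFER_SIZE = 12  # assumption: char buffer[12] sits directly below the saved EBP


def build_frame(saved_ebp: int, ret_addr: int) -> bytearray:
    """Memory starting at &buffer[0] and going toward higher addresses."""
    frame = bytearray(BUFFER_SIZE)           # the local buffer
    frame += struct.pack("<I", saved_ebp)    # EBP+0: caller's EBP (x86 is little-endian)
    frame += struct.pack("<I", ret_addr)     # EBP+4: saved return address
    frame += bytearray(64)                   # parameters and the caller's frame above
    return frame


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Imitate strcpy: write buffer[0], buffer[1], ... plus the final NULL, never checking size."""
    for i, byte in enumerate(data + b"\x00"):
        frame[i] = byte


def saved_return_address(frame: bytearray) -> int:
    """Read the 4 bytes at EBP+4, the value RET will load into EIP."""
    return struct.unpack_from("<I", frame, BUFFER_SIZE + 4)[0]


frame = build_frame(0x0012FF80, 0x00401234)
unchecked_copy(frame, b"A" * 12)
print(hex(saved_return_address(frame)))  # 0x401234: return address intact

frame = build_frame(0x0012FF80, 0x00401234)
unchecked_copy(frame, b"A" * 20)
print(hex(saved_return_address(frame)))  # 0x41414141: EIP will become "AAAA"
```

Even the 12 A’s case is not harmless in this model: the terminating NULL lands on the lowest byte of the saved EBP.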
Exploiting the Buffer Overflow
As I said in the previous section, if EIP is overwritten with “noise” the program will stop with an error. But what happens if the string is crafted so that EIP is overwritten with an address that “makes sense”? And what if that address turns out to be the beginning of code that does something? If that happens, you have successfully exploited the buffer overflow.
Unfortunately there are some things standing between you and a successful buffer overflow attack:
- You don’t really know which part of your string ends up in EIP; without that offset you cannot craft the string to overwrite the return address with an address of your choice.
- The second problem is that the address that “makes sense” is the one in ESP, because after the function returns, ESP points to the part of the string that comes right after the overwritten return address. As I said earlier, this value changes constantly, which means that you should capture the ESP value when the buffer overflow exception occurs.
- The third problem is that, depending on how a function processes its input, some byte values cannot be used. These values translate to characters that cause a problem and stop the function from processing the rest of the input; examples are 0x00 (NULL, which ends a C string), 0x0a (Line Feed, LF), 0x0d (Carriage Return, CR), etc. So if the ESP address contains one of these characters, the exploit will not be successful.
How to resolve each of these problems:
Resolving the First Problem (Locate the EIP Address offset)
The first problem is easily resolved: all you have to do is send a unique pattern of characters as the string that provokes the buffer overflow. Once the string is sent, EIP will be overwritten by a unique 4-byte sequence that can be searched for in the original string to find its offset (keeping in mind that x86 is little-endian, so the bytes appear in reverse order in the register value). Then you can replace that part of the string with the ESP address, and you should also add the payload that will be used to successfully exploit the buffer overflow.
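A minimal Python sketch of the unique-pattern idea follows. It builds the same style of pattern as Metasploit’s pattern_create tool (uppercase, lowercase and digit triplets: Aa0Aa1Aa2…), but it is an independent illustration, not that tool’s code, and the names create_pattern and pattern_offset are hypothetical. pattern_offset packs the EIP value little-endian before searching, because the first pattern byte that landed on the return address is the lowest byte of EIP.

```python
import string
import struct


def create_pattern(length: int) -> bytes:
    """Build a pattern like 'Aa0Aa1Aa2...' in which every 4-byte window is unique."""
    out = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out.append(upper + lower + digit)
                if len(out) * 3 >= length:
                    return "".join(out)[:length].encode()
    raise ValueError("pattern too long: at most 26 * 26 * 10 * 3 = 20280 bytes")


def pattern_offset(eip_value: int, length: int) -> int:
    """Offset in the pattern of the 4 bytes that ended up in EIP (-1 if absent)."""
    needle = struct.pack("<I", eip_value)  # little-endian: lowest byte came first
    return create_pattern(length).find(needle)


print(create_pattern(20))                # b'Aa0Aa1Aa2Aa3Aa4Aa5Aa'
print(pattern_offset(0x61413161, 100))   # 4
```

For example, if after sending create_pattern(100) the debugger showed EIP = 0x61413161 (a value picked here for illustration), the bytes “a1Aa” start at position 4, so the 4 bytes at offset 4 of the string are the ones to replace with the chosen address.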
Resolving the Second Problem (Finding ESP value)
This problem can be resolved in two ways. The first one is to use any disassembler/debugger (OllyDbg, IDA Pro, Immunity, etc.), manually attach it to the process and analyze the registers at the moment of the exception. The other is to use a programming language like Python, either to write a PyCommand for Immunity (this solution still involves a debugger, Immunity in this case) or with pydbg (a Python debugger library), to catch the exception and print the register values.
Resolving the Third problem (Finding bad characters)
As I explained previously, there are some characters that make the function stop working when passed as parameters. Finding out which characters are bad characters for a function is an easy but tedious process. The main idea behind it is to send a string formed by all the byte values (from 0x00 to 0xFF) and monitor the result manually with a disassembler/debugger, or programmatically using PyCommands for Immunity or pydbg for Python. When a bad character is sent, the string in memory is truncated just before it (or the byte arrives changed), revealing one of the bad characters. Then you need to repeat the process, removing each bad character found, until the whole string gets through intact.
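The iterative process can be sketched in Python as follows. The hard part, sending the string and reading the bytes back from the target’s memory, is left to a hypothetical read_back callable that you would implement with Immunity or pydbg; here it is replaced by a fake target that truncates at 0x0a and 0x0d, purely for illustration. 0x00 is treated as bad from the start, since it terminates C strings.

```python
from typing import Callable, Optional, Set


def badchar_candidates(bad: Set[int]) -> bytes:
    """Every byte value from 0x00 to 0xFF except the ones already known to be bad."""
    return bytes(b for b in range(0x100) if b not in bad)


def first_mismatch(sent: bytes, in_memory: bytes) -> Optional[int]:
    """Return the first byte value that did not arrive intact, or None if all did."""
    for i, expected in enumerate(sent):
        if i >= len(in_memory) or in_memory[i] != expected:
            return expected
    return None


def find_bad_chars(read_back: Callable[[bytes], bytes]) -> Set[int]:
    """Repeat: send the candidates, find the first broken byte, drop it, try again."""
    bad = {0x00}  # assumed bad from the start: it ends a C string
    while True:
        sent = badchar_candidates(bad)
        culprit = first_mismatch(sent, read_back(sent))
        if culprit is None:
            return bad
        bad.add(culprit)


def fake_target(data: bytes) -> bytes:
    """Stand-in for the real application: the copy stops at LF or CR."""
    for i, b in enumerate(data):
        if b in (0x0A, 0x0D):
            return data[:i]
    return data


print([hex(b) for b in sorted(find_bad_chars(fake_target))])  # ['0x0', '0xa', '0xd']
```

Each round removes exactly one new byte, so the loop always ends after at most 256 rounds.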
Now that we have identified all the bad characters, we can check whether the ESP address contains any of them. If it does, we need to work around it:
- In assembly there is a mnemonic (instruction) called JMP that “jumps” to the address specified after it, so JMP ESP jumps to the address held in the ESP register. Now we only have to find, in the memory where the program code or one of its DLLs is loaded, an address that contains this instruction and has no bad characters, and use it as the value that overwrites EIP. You can do this with a PyCommand or using pydbg.
- Unfortunately, the shellcode used as the payload must not contain any bad character either; this can be handled by using an encoder that excludes the bad characters.
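Searching for a usable JMP ESP can be sketched in Python as a plain byte search. The assumptions: JMP ESP is encoded as the two bytes FF E4; code holds the bytes of a module as it is mapped in memory (for example dumped with the debugger, since offsets in the file on disk do not match in-memory addresses); base_address is where that module is loaded; and find_jmp_esp is a hypothetical name. A hit is kept only if its address, packed little-endian as it will appear in the string, contains no bad character.

```python
import struct
from typing import List, Set

JMP_ESP = b"\xff\xe4"  # x86 machine code for JMP ESP


def find_jmp_esp(code: bytes, base_address: int, bad_chars: Set[int]) -> List[int]:
    """Addresses of JMP ESP inside `code` whose 4 address bytes avoid every bad char."""
    results = []
    pos = code.find(JMP_ESP)
    while pos != -1:
        addr = base_address + pos
        # The address goes into the string little-endian, so check those exact bytes.
        if not any(b in bad_chars for b in struct.pack("<I", addr)):
            results.append(addr)
        pos = code.find(JMP_ESP, pos + 1)
    return results


module = b"\x90" * 4 + JMP_ESP + b"\x90" * 2 + JMP_ESP  # made-up module bytes
print([hex(a) for a in find_jmp_esp(module, 0x11111110, {0x00, 0x18})])  # ['0x11111114']
```

The second hit, 0x11111118, is discarded because its lowest byte is 0x18, which the example treats as a bad character.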
Let’s return to the basics
OK, we know about memory, registers, the stack, what a buffer overflow is and the theory of how to exploit it. This is very useful information, but… how do I know that an application has a buffer overflow vulnerability?
There is no exact procedure to know whether an application is vulnerable or not. Normally you can disassemble the executable and check whether it calls functions from DLLs that are known to be prone to this vulnerability, for example C library functions like strcpy, gets, scanf and others. If these functions are present, there is a good chance that the programmer did not add some kind of boundary check, and that mistake could lead to a vulnerability.
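As a first, crude triage you can simply search an executable for the names of these functions, because functions imported from DLLs are normally listed by name as NULL-terminated ASCII strings in the import table. The sketch below does only that: it is not a disassembler, it misses statically linked or obfuscated code, and the list of functions is a non-exhaustive example. The name find_risky_imports is hypothetical, and on a real file you would pass it the bytes read from the executable.

```python
import re
from typing import List

# Commonly flagged C library functions that perform no bounds checking (not exhaustive).
RISKY_FUNCTIONS = ["strcpy", "strcat", "gets", "sprintf", "scanf", "vsprintf"]


def find_risky_imports(binary: bytes) -> List[str]:
    """Return the risky function names found as NULL-terminated strings in `binary`."""
    found = []
    for name in RISKY_FUNCTIONS:
        # The lookbehind stops 'gets' from matching inside 'fgets', 'sprintf' inside 'vsprintf'.
        pattern = rb"(?<![A-Za-z0-9_])" + re.escape(name.encode()) + rb"\x00"
        if re.search(pattern, binary):
            found.append(name)
    return found


sample = b"\x00KERNEL32.dll\x00strcpy\x00fgets\x00vsprintf\x00"  # made-up bytes
print(find_risky_imports(sample))  # ['strcpy', 'vsprintf']
```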
You can also poke at every point of user interaction within the application using scripts or applications called fuzzers. Fuzzers are programs that send noise to the user inputs to see whether the application crashes; then you analyze why the crash happened to identify which type of vulnerability you have on your hands. The user interaction could be anything: text input fields, configuration files, files uploaded to or processed by the application, etc.
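The simplest fuzzer only grows the length of one input until something breaks. The sketch below assumes a hypothetical send callable that delivers the data to the application (over a socket, into a file, etc.) and returns False when the application crashed or stopped responding; the lambda in the example only simulates a target that dies on inputs of 1050 bytes or more. Real fuzzers also mutate the content and structure of the input, not just its length.

```python
from typing import Callable, Optional


def fuzz_length(send: Callable[[bytes], bool], start: int = 100,
                stop: int = 5000, step: int = 100) -> Optional[int]:
    """Send b"A" * n for growing n and return the first length that crashed the target."""
    for n in range(start, stop + 1, step):
        if not send(b"A" * n):
            return n
    return None


# Simulated target: pretend anything of 1050 bytes or more overflows its buffer.
print(fuzz_length(lambda data: len(data) < 1050))  # 1100
```

The crashing length only tells you where to start looking; the crash still has to be analyzed in a debugger to confirm that it really overwrites the return address.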