Basic reverse engineering
=========================

Lab objective: In week 1, you will learn how to dissect a binary program to understand its internal logic. To this end, we will use the following tools: [ltrace] and [gdb], and we will cover level0 in this tutorial.

After finishing this tutorial and the entire lab for week 1, you are expected to:

1. Know how to run and debug a program using gdb;
2. Understand basic Intel x86 (32-bit) assembly language;
3. Know how a program uses the stack to make function calls;
4. Know how to examine the arguments of a function call in gdb;
5. Know how to detect loops (for, while) and branch (if) conditions in disassembly; and
6. Know how to control the program to achieve what you want to do -> read the flag file!

Preparing for level0
--------------------

Let's start with week1/level0. Please connect to asi-mon.utdallas.edu via SSH (the host is on the UTD VPN), e.g., by running `ssh your_id@asi-mon.utdallas.edu` in your shell.

```shell=
kxj190011@vm-ctf1 : ~ $ ls
./ ../ .bash_logout .bashrc bin/ .gdbinit .peda/ .profile .ssh/ .vim/ .vimrc
```

```shell=
kxj190011@vm-ctf1 : ~ $ pwd
/home/users/kxj190011
```

You are now in your home directory. Let's move to the challenge directory (/home/labs/week1).

```shell=
kxj190011@vm-ctf1 : ~ $ cd /home/labs/week1/
```

```shell=
kxj190011@vm-ctf1 : /home/labs/week1 $ ls
./ ../ level0/ level1/ level2/ level3/ level4/ level5/
```

You can see the currently available challenges there. Let's move on to level0.

```shell=
kxj190011@vm-ctf1 : /home/labs/week1 $ cd level0
```

```shell=
kxj190011@vm-ctf1 : /home/labs/week1/level0 $ ls
./ ../ flag level0*
```

There is a level0 program (this is the challenge program) and the flag file. Your job is to read the flag and submit it to the submission website (ctf.unexploitable.systems). Let's try to read it.

```shell=
kxj190011@vm-ctf1 : /home/labs/week1/level0 $ cat flag
cat: flag: Permission denied
```

Ugh, it does not work.
This is because we do not have permission to read the flag (WHAT??).

```shell=
kxj190011@vm-ctf1 : /home/labs/week1/level0 $ ls -als
total 20
4 drwxr-xr-x 2 root root            4096 Sep 20 12:21 ./
4 drwxr-xr-x 8 root root            4096 Sep 20 12:17 ../
4 -r--r----- 1 root week1-level0-ok   29 Sep 20 12:20 flag     <-- THIS FILE
8 -rwxr-sr-x 1 root week1-level0-ok 7340 Sep 20 12:20 level0*
```

The flag file can only be read by the user 'root' and the group 'week1-level0-ok' because its permission flag is -r--r----- (r-- for user root, r-- for group week1-level0-ok, --- for everyone else, including you). This means that your account cannot read it. However, don't give up at this point. The target program, level0, is marked as r-s for its group permission (the setgid bit), which means that while executing the program 'level0', you will inherit the group id of week1-level0-ok, and this will let you read the flag. In other words, you need to control the program 'level0' to read the flag file.

Executing level0
----------------

Then, let's run level0!

```shell=
kxj190011@vm-ctf1 : /home/labs/week1/level0 $ ./level0
What's the password?
asdfasdf
Wrong password!!
```

It asks for the password and lets us type one in. It seems we need to know the password to move forward. To find it, let's use [gdb] to debug the program.

Debugging level0
----------------

```shell=
kxj190011@vm-ctf1 : /home/labs/week1/level0 $ gdb ./level0
pwndbg: loaded 181 commands. Type pwndbg [filter] for a list.
pwndbg: created $rebase, $ida gdb functions (can be used with print/break)
Reading symbols from ./level0...(no debugging symbols found)...done.
pwndbg>
```

'gdb' is a debugger that lets you read the assembly code of any program and read/write memory contents while the program is running. Let's check what the program does by using the 'disassemble' command.
```shell=
pwndbg> disassemble main
Dump of assembler code for function main:
   0x08048578 <+0>:     lea    0x4(%esp),%ecx
   0x0804857c <+4>:     and    $0xfffffff0,%esp
   0x0804857f <+7>:     pushl  -0x4(%ecx)
   0x08048582 <+10>:    push   %ebp
   0x08048583 <+11>:    mov    %esp,%ebp
   0x08048585 <+13>:    push   %ecx
   0x08048586 <+14>:    sub    $0x204,%esp
   0x0804858c <+20>:    sub    $0xc,%esp
   0x0804858f <+23>:    push   $0x80486bb
   0x08048594 <+28>:    call   0x80483d0
   0x08048599 <+33>:    add    $0x10,%esp
   0x0804859c <+36>:    sub    $0x8,%esp
   0x0804859f <+39>:    lea    -0x208(%ebp),%eax
   0x080485a5 <+45>:    push   %eax
   0x080485a6 <+46>:    push   $0x80486d0
   0x080485ab <+51>:    call   0x8048410 <__isoc99_scanf@plt>
   0x080485b0 <+56>:    add    $0x10,%esp
   0x080485b3 <+59>:    sub    $0x8,%esp
   0x080485b6 <+62>:    push   $0x80486d3
   0x080485bb <+67>:    lea    -0x208(%ebp),%eax
   0x080485c1 <+73>:    push   %eax
   0x080485c2 <+74>:    call   0x80483b0
   0x080485c7 <+79>:    add    $0x10,%esp
   0x080485ca <+82>:    test   %eax,%eax
   0x080485cc <+84>:    jne    0x80485e5
   0x080485ce <+86>:    sub    $0xc,%esp
   0x080485d1 <+89>:    push   $0x80486dc
   0x080485d6 <+94>:    call   0x80483d0
   0x080485db <+99>:    add    $0x10,%esp
   0x080485de <+102>:   call   0x804852b
   0x080485e3 <+107>:   jmp    0x80485f5
   0x080485e5 <+109>:   sub    $0xc,%esp
   0x080485e8 <+112>:   push   $0x80486e5
   0x080485ed <+117>:   call   0x80483d0
   0x080485f2 <+122>:   add    $0x10,%esp
   0x080485f5 <+125>:   mov    $0x0,%eax
   0x080485fa <+130>:   mov    -0x4(%ebp),%ecx
   0x080485fd <+133>:   leave
   0x080485fe <+134>:   lea    -0x4(%ecx),%esp
   0x08048601 <+137>:   ret
End of assembler dump.
pwndbg>
```

This is the disassembly of the 'main' function. Please do not get overwhelmed by this long listing. You may think of it as the source code of the program. Although you do not have the real source code (written in C), you can almost always look at this kind of disassembly to understand the program.
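The `<+N>` label on each line of the listing is the byte offset of that instruction from the start of main (0x08048578 in this listing), so an absolute target such as the one in `jne 0x80485e5` can be mapped back to a line of the listing. A minimal C sketch of that arithmetic, using only addresses from the listing above; the helper names `offset_in_main` and `print_offsets` are hypothetical:

```c
#include <stdint.h>
#include <stdio.h>

/* Start address of main, taken from the first line of the listing. */
#define MAIN_START 0x08048578u

/* Hypothetical helper: byte offset of an instruction from the start of main. */
uint32_t offset_in_main(uint32_t addr) {
    return addr - MAIN_START;
}

/* Prints the <+N> label for the first call, the jne target, and the jmp target. */
void print_offsets(void) {
    printf("0x08048594 -> main+%u\n", (unsigned)offset_in_main(0x08048594u)); /* main+28  */
    printf("0x080485e5 -> main+%u\n", (unsigned)offset_in_main(0x080485e5u)); /* main+109 */
    printf("0x080485f5 -> main+%u\n", (unsigned)offset_in_main(0x080485f5u)); /* main+125 */
}
```

Calling `print_offsets()` from a `main` function shows that the `jne` at `<+84>` lands on the instruction at `<+109>`, and the `jmp` at `<+107>` lands on `<+125>`.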
Before getting into the details of the assembly, let's check all the 'call' instructions in main, each of which invokes a function (the idea is to learn what kinds of functions the program's main function calls). We have six calls.

```shell=
0x08048594 <+28>:  call 0x80483d0
0x080485ab <+51>:  call 0x8048410 <__isoc99_scanf@plt>
0x080485c2 <+74>:  call 0x80483b0
0x080485d6 <+94>:  call 0x80483d0
0x080485de <+102>: call 0x804852b    <-- not a library call
0x080485ed <+117>: call 0x80483d0
```

(pwndbg's [ DISASM ] view, shown later, labels 0x80483d0 as puts@plt.) So maybe the main function of level0 looks something like the following:

```clike=
int main() {
    // we do not know the arguments of each function call yet.
    puts("something1");
    scanf("some", variable);
    strcmp(string1, string2);
    puts("something2");
    get_a_shell("don't know args");
    puts("something3");
}
```

And from the execution that we had:

```shell=
kxj190011@vm-ctf1 : /home/labs/week1/level0 $ ./level0
What's the password?    // ask for the password
asdfasdf                // gets my input
Wrong password!!        // puts this...
```

We may guess the code as:

```clike
int main() {
    char input[SIZE];
    puts("What's the password?");        // ask for the password
    scanf("%s", input);                  // gets my input
    if(strcmp(password, input) == 0) {   // compare, returns 0 if both match
        puts("maybe correct password?!"); // don't know yet..
        get_a_shell("don't know args");   // don't know yet..
    }
    else {
        puts("Wrong password!!");        // wrong..
    }
}
```

But to be more precise, let's dive into the details. The assembly code for the first puts is:

```
0x0804858f <+23>: push $0x80486bb
0x08048594 <+28>: call 0x80483d0
```

The first instruction,

```
push $0x80486bb
```

stores the 32-bit integer value 0x80486bb on the stack. After storing the value on the stack, the program calls puts@plt. This calls the library function 'puts', which prints a string followed by a newline.
The suffix @plt refers to the procedure linkage table (PLT); for now, you can read 'function_name@plt' as a call to the library function 'function_name' (we will learn more about the PLT in week 4). Let's check what is going on here by executing the program up to this point. To do this, set a breakpoint at main() as follows:

```
pwndbg> b main
Breakpoint 1 at 0x8048586
pwndbg> r
Starting program: /home/labs/week1/level0/level0
```

A breakpoint means that the debugger, gdb, will execute the program but STOP at the breakpoint for debugging. After hitting the breakpoint, we can examine values in memory and registers, and we can also change those values to control the program's execution. So the execution will stop at main() like the following:

    Breakpoint 1, 0x08048586 in main ()

In addition to stopping at main(), your gdb, which is pre-configured with the pwndbg script, will show the following dashboard:

```
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
──────────────────────────────────────[ REGISTERS ]──────────────────────────────────────
 EAX  0xf7fb3dbc (environ) —▸ 0xffffd67c —▸ 0xffffd7e8 ◂— 'TERM=xterm'
 EBX  0x0
 ECX  0xffffd5e0 ◂— 0x1
 EDX  0xffffd604 ◂— 0x0
 EDI  0xf7fb2000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1b1db0
 ESI  0xf7fb2000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1b1db0
 EBP  0xffffd5c8 ◂— 0x0
 ESP  0xffffd5c4 —▸ 0xffffd5e0 ◂— 0x1
 EIP  0x8048586 (main+14) ◂— subl $0x204, %esp
```

The [ REGISTERS ] dashboard shows the values of the CPU registers along with some related information about each value. For example, at the breakpoint, the EAX register stores 0xf7fb3dbc, and from the color of the value (purple), we can see that the value is an address in the DATA area. By following the arrows in the dashboard, we can see that this address points to 0xffffd67c in STACK (yellow), which points to 0xffffd7e8 in STACK, which finally points to the string 'TERM=xterm' in STACK.
The ECX register points to 0xffffd5e0 in STACK, and that memory stores the value 0x1. Likewise, the EIP register (the program counter, which points to the code address that the CPU is executing right now) stores 0x8048586, which is the breakpoint that we set:

    REMEMBER: Breakpoint 1, 0x08048586 in main ()

And the color RED indicates that EIP points to the CODE area.

```
───────────────────────────────────────[ DISASM ]────────────────────────────────────────
 ► 0x8048586  subl $0x204, %esp
   0x804858c  subl $0xc, %esp
   0x804858f  pushl $0x80486bb
   0x8048594  calll puts@plt <0x80483d0>
   0x8048599  addl $0x10, %esp
   0x804859c  subl $8, %esp
   0x804859f  leal -0x208(%ebp), %eax
   0x80485a5  pushl %eax
   0x80485a6  pushl $0x80486d0
   0x80485ab  calll __isoc99_scanf@plt <0x8048410>
   0x80485b0  addl $0x10, %esp
```

Next, the [ DISASM ] dashboard shows the current instruction and the next few instructions, i.e., the code that the CPU will run if we continue the program execution. The indicator ►, colored GREEN, marks the current instruction (which has not been executed yet).

```
────────────────────────────────────────[ STACK ]────────────────────────────────────────
00:0000│ esp 0xffffd5c4 —▸ 0xffffd5e0 ◂— 0x1
01:0004│ ebp 0xffffd5c8 ◂— 0x0
02:0008│     0xffffd5cc —▸ 0xf7e18637 (__libc_start_main+247) ◂— addl $0x10, %esp
03:000c│     0xffffd5d0 —▸ 0xf7fb2000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1b1db0
... ↓
05:0014│     0xffffd5d8 ◂— 0x0
06:0018│     0xffffd5dc —▸ 0xf7e18637 (__libc_start_main+247) ◂— addl $0x10, %esp
07:001c│ ecx 0xffffd5e0 ◂— 0x1
```

The [ STACK ] dashboard shows the current state of the stack, starting from the address stored in the esp register (esp points to the top of the stack; ebp points to the base of the current stack frame). The same notation applies here: the color indicates the memory area, and an arrow shows the value stored at an address (if that value is itself a memory address).
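To make the role of esp concrete: on 32-bit x86, `push` first decrements esp by 4 and then stores the value at the new top of the stack, so the value pushed last is the one found at (%esp). The sketch below models this with a small array instead of real memory. The `Stack` type and all helper names are hypothetical; the constants come from the two pushes before the scanf call in the disassembly.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Toy model of a 32-bit stack that grows toward lower addresses. */
typedef struct {
    uint32_t mem[STACK_WORDS]; /* backing storage, one slot per 4 bytes  */
    uint32_t base;             /* address just past the highest slot     */
    uint32_t esp;              /* simulated stack pointer (byte address) */
} Stack;

void stack_init(Stack *s, uint32_t esp) {
    s->base = esp;
    s->esp = esp;
}

/* Translate a simulated address below base into an index of mem[]. */
static uint32_t slot(const Stack *s, uint32_t addr) {
    return STACK_WORDS - (s->base - addr) / 4;
}

/* push: decrement esp by 4, then store the value at the new top. */
void push32(Stack *s, uint32_t value) {
    s->esp -= 4;
    s->mem[slot(s, s->esp)] = value;
}

/* Read the 32-bit value at esp + offset, i.e. offset(%esp). */
uint32_t at_esp(const Stack *s, uint32_t offset) {
    return s->mem[slot(s, s->esp + offset)];
}

/* Replays the two pushes before the scanf call: buffer address first,
 * then the address of the format string. */
void push_scanf_args(Stack *s, uint32_t ebp) {
    push32(s, ebp - 0x208);   /* lea -0x208(%ebp),%eax ; push %eax */
    push32(s, 0x80486d0);     /* push $0x80486d0                   */
}
```

After `push_scanf_args`, `at_esp(&s, 0)` holds 0x80486d0 (the 1st argument) and `at_esp(&s, 4)` holds the buffer address ebp-0x208 (the 2nd argument), which is why arguments are pushed in reverse order.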
```
──────────────────────────────────────[ BACKTRACE ]──────────────────────────────────────
 ► f 0  8048586 main+14
   f 1 f7e18637 __libc_start_main+247
─────────────────────────────────────────────────────────────────────────────────────────
```

Finally, in the [ BACKTRACE ] dashboard, you can see the call stack of the execution (i.e., which nested function calls led to the current execution point). For now, it says the program started with __libc_start_main(), which internally calls main().

Let's get to the point where the program makes the first function call. Note that the listings from here on show printf calls, a gdb-peda prompt, and different addresses, so they appear to come from a different build of level0 than the disassembly above; the analysis steps are the same. As the result of executing the instruction that loads the string address (the same lea/mov pattern shown for the later printf calls), the register %eax will contain the value 0x80486e7. Then, the program moves zero to %ebp-4 (not related to printf). Then, it moves %eax to (%esp), which is the first argument of the printf function.

Hint: (%esp) before a call indicates the 1st argument. 0x4(%esp) before a call indicates the 2nd argument. 0x8(%esp) before a call indicates the 3rd argument. Then what does 0x10(%esp) mean?

So it calls printf(0x80486e7). We are curious what 0x80486e7 means, and from our domain knowledge, we already know that the 1st argument of printf is a string (from printf(char*, ...)). Let's check the value. In gdb, you can check a string at an address via x/s.

```
gdb-peda$ x/s 0x80486e7
0x80486e7: "What's the password?\n"
```

Oh, yes. The 1st printf is printf("What's the password?\n"), as we saw in the result of the execution.

```
kxj190011@vm-ctf1 : /home/labs/week1/level0 $ ./level0
What's the password?    <-- this one.
asdfasdf
Wrong password!!
```

Let's do the same thing for all printfs.
```
0x08048601 <+97>:  lea 0x8048709,%eax
0x08048607 <+103>: mov %eax,(%esp)
0x0804860a <+106>: call 0x80483a0

gdb-peda$ x/s 0x8048709
0x8048709: "Correct!\n"

0x0804861f <+127>: lea 0x8048713,%eax
0x08048625 <+133>: mov %eax,(%esp)
0x08048628 <+136>: call 0x80483a0

gdb-peda$ x/s 0x8048713
0x8048713: "Wrong password!!\n"
```

Now the code looks like:

```
main() {
    printf("What's the password?\n");
    scanf("some", variable);
    strcmp(string1, string2);
    printf("Correct!\n");
    printf("Wrong password!!\n");
}
```

Next, let's move on to scanf().

```
0x080485be <+30>: lea 0x80486fd,%ecx
0x080485c4 <+36>: lea -0x204(%ebp),%edx
0x080485ca <+42>: mov %ecx,(%esp)
0x080485cd <+45>: mov %edx,0x4(%esp)
0x080485d1 <+49>: mov %eax,-0x208(%ebp)
0x080485d7 <+55>: call 0x80483f0 <__isoc99_scanf@plt>
```

The 1st argument, (%esp), comes from %ecx (line +42), and %ecx is 0x80486fd. scanf(char*, ...) also takes a string as its 1st argument, so let's check.

```
gdb-peda$ x/s 0x80486fd
0x80486fd: "%s"
```

Yes, it's scanf("%s", something). Another thing to know is that in x86 assembly, local variables are accessed via the %ebp register. The 2nd argument, 0x4(%esp), comes from %edx (line +45), and %edx is set by lea -0x204(%ebp),%edx. So %edx stores the address of a local variable at %ebp-0x204. Let's update our pseudo-code as follows:

```
main() {
    printf("What's the password?\n");
    scanf("%s", ebp_204);
    strcmp(string1, string2);
    printf("Correct!\n");
    printf("Wrong password!!\n");
}
```

Then, let's check [strcmp].

```
0x080485dc <+60>: lea -0x204(%ebp),%ecx
0x080485e2 <+66>: mov %esp,%edx
0x080485e4 <+68>: mov %ecx,(%edx)
0x080485e6 <+70>: movl $0x8048700,0x4(%edx)
0x080485ed <+77>: mov %eax,-0x20c(%ebp)
0x080485f3 <+83>: call 0x8048390
```

strcmp(char*, char*) takes 2 arguments, both strings. The assembly here is a bit unusual: line +66 copies %esp to %edx (so %edx now holds the same address as %esp).
So the 1st argument comes from %ecx (line +68 stores %ecx at (%edx)), which is -0x204(%ebp), and the 2nd argument is 0x8048700 (stored at 0x4(%edx), which is 0x4(%esp)). Note that the 1st argument, ebp_204, is the string that we got from scanf() (yes, it's our input). The 2nd argument is:

```
gdb-peda$ x/s 0x8048700
0x8048700: "p4sSw0Rd"
```

Then, our code would look like:

```
main() {
    printf("What's the password?\n");
    scanf("%s", ebp_204);
    strcmp(ebp_204, "p4sSw0Rd");
    printf("Correct!\n");
    printf("Wrong password!!\n");
}
```

Next, let's see what happens after calling strcmp.

```
0x080485f8 <+88>:  cmp $0x0,%eax
0x080485fb <+91>:  jne 0x804861f
0x08048601 <+97>:  lea 0x8048709,%eax
0x08048607 <+103>: mov %eax,(%esp)
0x0804860a <+106>: call 0x80483a0
0x0804860f <+111>: mov %eax,-0x210(%ebp)
0x08048615 <+117>: call 0x8048530
0x0804861a <+122>: jmp 0x8048633
0x0804861f <+127>: lea 0x8048713,%eax
0x08048625 <+133>: mov %eax,(%esp)
0x08048628 <+136>: call 0x80483a0
```

It first compares the return value of strcmp (stored in %eax) to 0, and the next instruction, jne, is 'Jump if Not Equal'. So you can think of it as:

```clike=
if(ret == 0) {
    not jump;
}
else {
    jump;
}

if(ret == 0) {
    0x08048601 <+97>:  lea 0x8048709,%eax
    0x08048607 <+103>: mov %eax,(%esp)
    0x0804860a <+106>: call 0x80483a0
    0x0804860f <+111>: mov %eax,-0x210(%ebp)
    0x08048615 <+117>: call 0x8048530
    0x0804861a <+122>: jmp 0x8048633
}
else {
    0x0804861f <+127>: lea 0x8048713,%eax
    0x08048625 <+133>: mov %eax,(%esp)
    0x08048628 <+136>: call 0x80483a0
}
```

We have already interpreted the two printf calls, so we get the following code:

```clike=
main() {
    printf("What's the password?\n");
    scanf("%s", ebp_204);
    ret = strcmp(ebp_204, "p4sSw0Rd");
    if(ret == 0) {
        printf("Correct!\n");
        get_a_shell();
    }
    else {
        printf("Wrong password!!\n");
    }
}
```

Now you have mastered how to clear level0.
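Putting the pieces together, the recovered logic can be written as compilable C. This is a sketch reconstructed from the disassembly, not the original source: `get_a_shell` is stubbed out because its body was not examined in this tutorial, the helper names `password_matches` and `run_level0` are hypothetical, and the buffer size and the bounded `%511s` conversion are assumptions (the traced program uses a plain `%s`).

```c
#include <stdio.h>
#include <string.h>

/* String found at 0x8048700 with x/s. */
static const char PASSWORD[] = "p4sSw0Rd";

/* Stub: the real get_a_shell() was not analyzed in this tutorial. */
void get_a_shell(void) {
    puts("[stub] get_a_shell() would run here");
}

/* Mirrors `cmp $0x0,%eax` / `jne`: returns 1 when strcmp() returned 0. */
int password_matches(const char *input) {
    return strcmp(input, PASSWORD) == 0;
}

/* Reconstructed body of main(); reads one word from `in`.
 * Returns 1 if the password was accepted, 0 otherwise. */
int run_level0(FILE *in) {
    char input[512];                        /* assumed size */

    printf("What's the password?\n");
    if (fscanf(in, "%511s", input) != 1)    /* bounded, unlike the traced "%s" */
        return 0;

    if (password_matches(input)) {          /* jne not taken */
        printf("Correct!\n");
        get_a_shell();
        return 1;
    }
    printf("Wrong password!!\n");           /* jne taken */
    return 0;
}
```

Calling `run_level0(stdin)` from a `main` function reproduces the session shown earlier: any input other than the recovered string takes the `else` branch.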
You can also recover the password using [ltrace] (which prints a trace of all library calls) as follows:

```
kxj190011@vm-ctf1 : /home/labs/week1/level0 $ ltrace ./level0
__libc_start_main(0x80485a0, 1, 0xffffd6a4, 0x8048640
printf("What's the password?\n"What's the password?
) = 21
__isoc99_scanf(0x80486fd, 0xffffd404, 0xf7ffd000, 0xf7fefff9asdf
) = 1
strcmp("asdf", "p4sSw0Rd") = -1
printf("Wrong password!!\n"Wrong password!!
) = 17
+++ exited (status 0) +++
```

The strcmp line shows both arguments: our input ("asdf") and the string it is compared against.

------

[ltrace]:http://todo
[gdb]:http://todo
[strcmp]:http://todo