New in Python 3.10: Clever Uses of Structural Pattern Matching in VM Reverse Engineering

February 13, 2024, 21:12:04

1
Preface

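This technique was first used in the writeup for 2022 GoogleCTF eldar by hgarrereyn of the DiceGang team: https://ctf.harrisongreen.me/2022/googlectf/eldar/. There it was also used to parse a virtual machine, though that machine was an ELF metadata-driven Turing-complete weird machine.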

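Later, the deeprev challenge in China's 2022 Qiangwang Cup (强网杯) reused this ELF metadata-driven weird machine. I used the same technique to write a parser for that relocation-based machine, and it worked remarkably well; it is no exaggeration to call it magic.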

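At the time I wrote on my to-do list that using structural pattern matching to parse an ordinary VM would surely be easy. Then work took over and I never got to it, and the item gathered dust on the list for almost a year. That whole year I was pushed along by work, executing each day, like a robot, the instructions I had written the day before. My memory seemed to get worse and I often forgot things. Only after some projects were delivered at the end of the year did I have time for my own work. Running a startup is really hard.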

Back to the topic: later, in the writeup for the dicectf-2022 breach challenge (https://github.com/reductor/dice-ctf-2022-breach-writeup), the technique was applied to parsing a conventional VM.

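Only now have I come back to write it up, even though I have already published quite a few posts on VM parsing. In short, this approach is an upgraded disassembler, far better than the disassemblers I published before. Is it better than a decompiler? I can't say for sure, since a decompiler takes a different approach: lifting the code into a high-level language.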

2
Python 3.10 Structural Pattern Matching

Learn Structural Pattern Matching

Introduction to Structural Pattern Matching

PEP 634 – Structural Pattern Matching: Specification (https://peps.python.org/pep-0634/): describes the match syntax and the supported patterns

PEP 635 – Structural Pattern Matching: Motivation and Rationale (https://peps.python.org/pep-0635/): explains why the syntax was designed this way

PEP 636 – Structural Pattern Matching: Tutorial (https://peps.python.org/pep-0636/): a tutorial introducing the concepts, syntax, and semantics

Match patterns:

Mapping patterns: match mapping structures like dictionaries.
Sequence patterns: match sequence structures like tuples and lists.
Capture patterns: bind values to names.
AS patterns: bind the value of subpatterns to names.
OR patterns: match one of several different subpatterns.
Wildcard patterns: match anything.
Class patterns: match class structures.
Value patterns: match values stored in attributes.
Literal patterns: match literal values.

Capture patterns

Match a pattern and bind the matched value to a name:

def sum_list(numbers):
    match numbers:
        case []:  # matches an empty list
            return 0
        case [first, *rest]:  # a sequence pattern made of two capture patterns; *rest matches all remaining elements
            return first + sum_list(rest)

def average(*args):
    match args:
        case [x, y]:           # captures the two elements of a sequence
            return (x + y) / 2
        case [x]:              # captures the only element of a sequence
            return x
        case []:
            return 0
        case a:                # captures the entire sequence
            return sum(a) / len(a)

Guards (adding conditions to patterns)

A guard further restricts when a pattern matches, for example:

# Sort in ascending order
def sort(seq):
    match seq:
        case [] | [_]:   # an empty sequence [] or a single-element sequence [_]
            return seq
        case [x, y] if x <= y:
            return seq
        case [x, y]:
            return [y, x]
        case [x, y, z] if x <= y <= z:
            return seq
        case [x, y, z] if x >= y >= z:
            return [z, y, x]
        case [p, *rest]:
            a = sort([x for x in rest if x <= p])     # sort the elements <= p
            b = sort([x for x in rest if p < x])      # sort the elements > p
            return a + [p] + b

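AS Patterns

An AS pattern binds the value matched by a subpattern (such as a type check) to a name, so constraints and name binding work together.

Subpatterns can be combined flexibly in match syntax.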

In : def as_pattern(obj):
...:     match obj:
...:         case str() as s:
...:             print(f'Got str: {s=}')
...:         case [0, int() as i]:
...:             print(f'Got int: {i=}')
...:         case [tuple() as tu]:
...:             print(f'Got tuple: {tu=}')
...:         case list() | set() | dict() as iterable:
...:             print(f'Got iterable: {iterable=}')
...:
...:

In : as_pattern('sss')
Got str: s='sss'

In : as_pattern([0, 1])
Got int: i=1

In : as_pattern([(1,)])
Got tuple: tu=(1,)

In : as_pattern([1, 2, 3])
Got iterable: iterable=[1, 2, 3]

In : as_pattern({'a': 1})
Got iterable: iterable={'a': 1}

def simplify_expr(tokens):
    match tokens:
        case [('('|'[') as l, *expr, (')'|']') as r] if (l+r) in ('()', '[]'):
            return simplify_expr(expr)
        case [0, ('+'|'-') as op, right]:
            return UnaryOp(op, right)
        case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]:
            return Num(left + right)
        case [(int() | float()) as value]:
            return Num(value)

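OR Patterns

Alternatives are separated with |:

case 401 | 403 | 404:
    print("Some HTTP error")

Alternatives can be grouped in parentheses:

case ("a"|"b"|"c"):

Combined with an AS pattern, this also captures which alternative matched:

case ("a"|"b"|"c") as letter:

Some forms look similar but are not OR patterns:

case 401, 403, 404:    # a sequence pattern: matches only the sequence (401, 403, 404)

case 401:              # C-style stacked cases are a SyntaxError: every case needs
case 403:              # its own body, and Python has no fall-through
case 404:
    print("Some HTTP error")

case in 401, 403, 404:    # not valid syntax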

Literal Patterns

Literal patterns match Python's built-in literal values, such as strings, numbers, booleans, and None.

match number:
    case 0:
        print('zero')
    case 1:
        print('one')
    case 2:
        print('two')

def simplify(expr):
    match expr:
        case ('+', 0, x):   # x + 0
            return x
        case ('+' | '-', x, 0):  # x +- 0
            return x
        case ('and', True, x):   # True and x
            return x
        case ('and', False, x):
            return False
        case ('or', False, x):
            return x
        case ('or', True, x):
            return True
        case ('not', ('not', x)):
            return x
    return expr

Wildcard Pattern

The wildcard pattern `_` behaves like a special capture pattern: it accepts any value but does not bind it to any name (in effect, it ignores positions you don't care about).

def is_closed(sequence):
    match sequence:
        case [_]:               # any sequence with a single element
            return True
        case [start, *_, end]:  # a sequence with at least two elements
            return start == end
        case _:                 # anything
            return False

Value Patterns

This pattern mainly matches named constants or members of an enum from the enum module:

In : from enum import Enum

In : class Color(Enum):
...:     RED = 1
...:     GREEN = 2
...:     BLUE = 3
...:

In : class NewColor:
...:     YELLOW = 4
...:

In : def constant_value(color):
...:     match color:
...:         case Color.RED:
...:             print('Red')
...:         case NewColor.YELLOW:
...:             print('Yellow')
...:         case new_color:
...:             print(new_color)
...:

In : constant_value(Color.RED)  # matches the first case
Red

In : constant_value(NewColor.YELLOW)  # matches the second case
Yellow

In : constant_value(Color.GREEN)  # falls through to the third (capture) case
Color.GREEN

In : constant_value(4)  # 4 == NewColor.YELLOW, so the second case matches
Yellow

In : constant_value(10)  # any other value
10

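Note that because a bare name after case binds (captures) the subject, you cannot use a plain constant such as YELLOW directly, as in:
YELLOW = 4

def constant_value(color):
    match color:
        case YELLOW:
            print('Yellow')
# This does not compare color with 4: YELLOW is a capture pattern that matches any value and rebinds YELLOW.
# If another case followed it, compilation would fail with
# "SyntaxError: name capture 'YELLOW' makes remaining patterns unreachable".

So when a pattern needs the value of another variable, how is it distinguished from a capture pattern's binding name? With a dot: a dotted name is treated as a value pattern.

For now, only dotted names can be used as value patterns.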

class Codes:
    SUCCESS = 200
    NOT_FOUND = 404

def handle(retcode):
    match retcode:
        case Codes.SUCCESS:
            print('success')
        case Codes.NOT_FOUND:
            print('not found')
        case _:
            print('unknown')

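Sequence Patterns

Lists or tuples can be used as patterns in a match statement.

[a, b, c], (a, b, c) and a, b, c are equivalent patterns, and none of them checks the concrete type. To require a list specifically, use a class pattern such as list([a, b, c]).

A starred subpattern matches any number of elements; for example, [first, *_, 3] matches any sequence that ends with 3. Only one starred subpattern is allowed per sequence pattern, so "contains a 3 anywhere" cannot be written as (*_, 3, *_) and needs a guard instead.
Matching does not iterate over the subject: elements are accessed by index and slicing, so the subject must be a sequence (iterators never match, and str, bytes and bytearray are deliberately excluded).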

In : def sequence(collection):
...:     match collection:
...:         case 1, [x, *others]:
...:             print(f"Got 1 and a nested sequence: {x=}, {others=}")
...:         case (1, x):
...:             print(f"Got 1 and {x}")
...:         case [x, y, z]:
...:             print(f"{x=}, {y=}, {z=}")
...:

In : sequence([1])

In : sequence([1, 2])
Got 1 and 2

In : sequence([1, 2, 3])
x=1, y=2, z=3

In : sequence([1, [2, 3]])
Got 1 and a nested sequence: x=2, others=[3]

In : sequence([1, [2, 3, 4]])
Got 1 and a nested sequence: x=2, others=[3, 4]

In : sequence([2, 3])

In : sequence((1, 2))
Got 1 and 2

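Mapping Patterns

For efficiency, keys must be constants (literals or value patterns).

In short, case can match against dictionaries. A mapping pattern only checks the keys it lists; extra keys in the subject are ignored unless collected with **rest.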

In : def mapping(config):
...:     match config:
...:         case {'sub': sub_config, **rest}:
...:             print(f'Sub: {sub_config}')
...:             print(f'OTHERS: {rest}')
...:         case {'route': route}:
...:             print(f'ROUTE: {route}')
...:

In : mapping({})

In : mapping({'route': '/auth/login'})
ROUTE: /auth/login

# Matches a dict with a 'sub' key: its value is bound to sub_config and the remaining items to rest
In : mapping({'route': '/auth/login', 'sub': {'a': 1}})
Sub: {'a': 1}
OTHERS: {'route': '/auth/login'}

def change_red_to_blue(json_obj):
    match json_obj:
        case { 'color': ('red' | '#FF0000') }:
            json_obj['color'] = 'blue'
        case { 'children': children }:
            for child in children:
                change_red_to_blue(child)

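Class Patterns

Class patterns serve two purposes: checking that an object is an instance of a class, and extracting data from specific attributes of the object.

# Any object can be matched after case. First, an example that fails: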

In : class Point:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:

In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:

In : class_pattern(Point(1, 2))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [], in <cell line: 1>()
----> 1 class_pattern(Point(1, 2))

Input In [], in class_pattern(obj)
      1 def class_pattern(obj):
      2     match obj:
----> 3         case Point(x, y):
      4             print(f'Point({x=},{y=})')

TypeError: Point() accepts 0 positional sub-patterns (2 given)

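# This fails because positional sub-patterns need a defined attribute order, which a plain class does not declare. Keyword sub-patterns work instead: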

In : def class_pattern(obj):
...:     match obj:
...:         case Point(x=1, y=2):
...:             print(f'match')
...:

In : class_pattern(Point(1, 2))
match

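# Another way to let a custom class accept positional sub-patterns is to define __match_args__,
# a tuple naming the attributes that positional sub-patterns map to, like this: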
In : class Point:
...:     __match_args__ = ('x', 'y')
...:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:

# Or use a dataclass: Point2 below uses the standard-library dataclasses.dataclass decorator,
# which generates __match_args__, so positional sub-patterns work directly
In : from dataclasses import dataclass

In : @dataclass
...: class Point2:
...:     x: int
...:     y: int
...:

In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:         case Point2(x, y):
...:             print(f'Point2({x=},{y=})')
...:

In : class_pattern(Point(1, 2))
Point(x=1,y=2)

In : class_pattern(Point2(1, 2))
Point2(x=1,y=2)

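from dataclasses import dataclass

# Minimal AST node types (assumed here; the original example leaves them undefined).
# As dataclasses they get __match_args__, so positional sub-patterns work.
@dataclass
class BinaryOp:
    op: str
    left: object
    right: object

@dataclass
class UnaryOp:
    op: str
    arg: object

@dataclass
class VarExpr:
    name: str

def eval_expr(expr):
    """Evaluate an expression and return the result."""
    match expr:
        case BinaryOp('+', left, right):
            return eval_expr(left) + eval_expr(right)
        case BinaryOp('-', left, right):
            return eval_expr(left) - eval_expr(right)
        case BinaryOp('*', left, right):
            return eval_expr(left) * eval_expr(right)
        case BinaryOp('/', left, right):
            return eval_expr(left) / eval_expr(right)
        case UnaryOp('+', arg):
            return eval_expr(arg)
        case UnaryOp('-', arg):
            return -eval_expr(arg)
        case VarExpr(name):
            raise ValueError(f"Unknown value of: {name}")
        case float() | int():
            return expr
        case _:
            raise ValueError(f"Invalid expression value: {repr(expr)}")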

Another example:

match media_object:
    case Image(type="jpg"):
        return media_object
    case Image(type="png") | Image(type="gif"):
        return render_as(media_object, "jpg")
    case Video():
        raise ValueError("Can't extract frames from video yet")
    case other_type:
        raise Exception(f"Media type {media_object} can't be handled yet")

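A namedtuple example, which is also a class pattern (since Python 3.10, namedtuple classes define __match_args__ from their field names):

from collections import namedtuple

Mov = namedtuple('mov', ['dst', 'src', 'sz', 'ridx'])
op = Mov('eax', 'ebx', 8, 0)   # example instruction

match op:
    case Mov(dst, src, 8, ridx):   # positional sub-patterns follow the field order; sz must equal 8
        print(f"mov {dst}, {src} (8 bytes, ridx={ridx})")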

Type Unions, Aliases, and Guards

Here numbers is annotated as a list whose elements can be float or int.

def mean(numbers: list[float | int]) -> float:
    return sum(numbers) / len(numbers)

You can define type aliases, which both type checkers and programmers can recognize:

from typing import TypeAlias

Card: TypeAlias = tuple[str, str]          # ('', '')
Deck: TypeAlias = list[Card]               # [('', '')]

Type guards are used to narrow a type union.

3
A new disassembler for 2020 GKCTF EzMachine

A disassembler like this is usually refined step by step, until its output can finally be passed to pwntools' make_elf_from_assembly (https://docs.pwntools.com/en/stable/asm.html#pwnlib.asm.make_elf_from_assembly) and assembled directly into an ELF.

1: Define the instruction types and write parse

◆Ezmachine-disassembler-parsefunc.py

from collections import namedtuple
from dataclasses import dataclass


@dataclass
class Regs(object):
    idx: int

    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)


Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg
Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
InitMem = namedtuple(
    "InitMem", ["addr", "mem_addr", "sz"])  # case 18: memset(mem_addr, 0, sz)
MovRegStack = namedtuple(
    "MovRegStack", ["addr", "dst", "src"])  # case 19: mov reg, [ebp-src]
MovRegMem = namedtuple(
    "MovRegMem", ["addr", "dst", "src"])  # case 20: mov reg, mem[src]
Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)


def parse(buffer):
    instructions = []
    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]
        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegStack(pc, Regs(dst), Regs(src)))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), Regs(src)))
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
    return instructions


if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    for ins in instructions:
        print(ins)

◆Ezmachine-disassembler-parsefunc.out

MovReg(addr=0, dst=edx, imm=3)
PrintStr(addr=3)
InputStr(addr=6)
MovReg(addr=9, dst=ebx, imm=17)
Cmp(addr=12, dst=eax, src=ebx)
Jz(addr=15, target=27)
MovReg(addr=18, dst=edx, imm=1)
PrintStr(addr=21)
Exit(addr=24)
MovReg(addr=27, dst=ecx, imm=0)
MovReg(addr=30, dst=eax, imm=17)
Cmp(addr=33, dst=eax, src=ecx)
Jz(addr=36, target=126)
MovRegMem(addr=39, dst=eax, src=ecx)
MovReg(addr=42, dst=ebx, imm=97)
Cmp(addr=45, dst=eax, src=ebx)
Jl(addr=48, target=75)
MovReg(addr=51, dst=ebx, imm=122)
Cmp(addr=54, dst=eax, src=ebx)
Jg(addr=57, target=75)
MovReg(addr=60, dst=ebx, imm=71)
XorReg(addr=63, dst=eax, src=ebx)
MovReg(addr=66, dst=ebx, imm=1)
AddReg(addr=69, dst=eax, src=ebx)
Jmp(addr=72, target=105)
MovReg(addr=75, dst=ebx, imm=65)
Cmp(addr=78, dst=eax, src=ebx)
Jl(addr=81, target=105)
MovReg(addr=84, dst=ebx, imm=90)
Cmp(addr=87, dst=eax, src=ebx)
Jg(addr=90, target=105)
MovReg(addr=93, dst=ebx, imm=75)
XorReg(addr=96, dst=eax, src=ebx)
MovReg(addr=99, dst=ebx, imm=1)
SubReg(addr=102, dst=eax, src=ebx)
MovReg(addr=105, dst=ebx, imm=16)
DivReg(addr=108, dst=eax, src=ebx)
PushReg(addr=111, reg=ebx)
PushReg(addr=114, reg=eax)
MovReg(addr=117, dst=ebx, imm=1)
AddReg(addr=120, dst=ecx, src=ebx)
Jmp(addr=123, target=30)
PushImm(addr=126, imm=7)
PushImm(addr=129, imm=13)
PushImm(addr=132, imm=0)
PushImm(addr=135, imm=5)
PushImm(addr=138, imm=1)
PushImm(addr=141, imm=12)
PushImm(addr=144, imm=1)
PushImm(addr=147, imm=0)
PushImm(addr=150, imm=0)
PushImm(addr=153, imm=13)
PushImm(addr=156, imm=5)
PushImm(addr=159, imm=15)
PushImm(addr=162, imm=0)
PushImm(addr=165, imm=9)
PushImm(addr=168, imm=5)
PushImm(addr=171, imm=15)
PushImm(addr=174, imm=3)
PushImm(addr=177, imm=0)
PushImm(addr=180, imm=2)
PushImm(addr=183, imm=5)
PushImm(addr=186, imm=3)
PushImm(addr=189, imm=3)
PushImm(addr=192, imm=1)
PushImm(addr=195, imm=7)
PushImm(addr=198, imm=7)
PushImm(addr=201, imm=11)
PushImm(addr=204, imm=2)
PushImm(addr=207, imm=1)
PushImm(addr=210, imm=2)
PushImm(addr=213, imm=7)
PushImm(addr=216, imm=2)
PushImm(addr=219, imm=12)
PushImm(addr=222, imm=2)
PushImm(addr=225, imm=2)
MovReg(addr=228, dst=ecx, imm=1)
MovRegStack(addr=231, dst=ebx, src=ecx)
PopReg(addr=234, reg=eax)
Cmp(addr=237, dst=eax, src=ebx)
Jnz(addr=240, target=270)
MovReg(addr=243, dst=ebx, imm=34)
Cmp(addr=246, dst=ecx, src=ebx)
Jz(addr=249, target=264)
MovReg(addr=252, dst=ebx, imm=1)
AddReg(addr=255, dst=ecx, src=ebx)
Jmp(addr=258, target=231)
MovReg(addr=261, dst=edx, imm=0)
PrintStr(addr=264)
Exit(addr=267)
MovReg(addr=270, dst=edx, imm=1)
PrintStr(addr=273)
Exit(addr=276)
Nop(addr=279)

The point of producing parsefunc.out is to check that parse and the instruction type definitions are sound.

2: Write an initial dump

◆Ezmachine-disassembler-version0.py

from collections import namedtuple
from dataclasses import dataclass


@dataclass
class Regs(object):
    idx: int

    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)


Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg
Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
InitMem = namedtuple(
    "InitMem", ["addr", "mem_addr", "sz"])  # case 18: memset(mem_addr, 0, sz)
MovRegStack = namedtuple(
    "MovRegStack", ["addr", "dst", "src"])  # case 19: mov reg, [ebp-src]
MovRegMem = namedtuple(
    "MovRegMem", ["addr", "dst", "src"])  # case 20: mov reg, mem[src]
Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)


def parse(buffer):
    instructions = []
    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]
        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegStack(pc, Regs(dst), src))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), src))
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
    return instructions


def dump(instructions):
    for ins in instructions:
        match ins:
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")


if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    dump(instructions)

◆Ezmachine-disassembler-dumpfunc-version0.out

_0x0000: mov edx, 0x03
_0x0003: print_str
_0x0006: input_str
_0x0009: mov ebx, 0x11
_0x000c: cmp eax, ebx
_0x000f: jz _0x001b
_0x0012: mov edx, 0x01
_0x0015: print_str
_0x0018: exit(0)
_0x001b: mov ecx, 0x00
_0x001e: mov eax, 0x11
_0x0021: cmp eax, ecx
_0x0024: jz _0x007e
_0x0027: mov eax, mem[2]
_0x002a: mov ebx, 0x61
_0x002d: cmp eax, ebx
_0x0030: jl _0x004b
_0x0033: mov ebx, 0x7a
_0x0036: cmp eax, ebx
_0x0039: jg _0x004b
_0x003c: mov ebx, 0x47
_0x003f: xor eax, ebx
_0x0042: mov ebx, 0x01
_0x0045: add eax, ebx
_0x0048: jmp _0x0069
_0x004b: mov ebx, 0x41
_0x004e: cmp eax, ebx
_0x0051: jl _0x0069
_0x0054: mov ebx, 0x5a
_0x0057: cmp eax, ebx
_0x005a: jg _0x0069
_0x005d: mov ebx, 0x4b
_0x0060: xor eax, ebx
_0x0063: mov ebx, 0x01
_0x0066: sub eax, ebx
_0x0069: mov ebx, 0x10
_0x006c: div eax, ebx
_0x006f: push ebx
_0x0072: push eax
_0x0075: mov ebx, 0x01
_0x0078: add ecx, ebx
_0x007b: jmp _0x001e
_0x007e: push 0x07
_0x0081: push 0x0d
_0x0084: push 0x00
_0x0087: push 0x05
_0x008a: push 0x01
_0x008d: push 0x0c
_0x0090: push 0x01
_0x0093: push 0x00
_0x0096: push 0x00
_0x0099: push 0x0d
_0x009c: push 0x05
_0x009f: push 0x0f
_0x00a2: push 0x00
_0x00a5: push 0x09
_0x00a8: push 0x05
_0x00ab: push 0x0f
_0x00ae: push 0x03
_0x00b1: push 0x00
_0x00b4: push 0x02
_0x00b7: push 0x05
_0x00ba: push 0x03
_0x00bd: push 0x03
_0x00c0: push 0x01
_0x00c3: push 0x07
_0x00c6: push 0x07
_0x00c9: push 0x0b
_0x00cc: push 0x02
_0x00cf: push 0x01
_0x00d2: push 0x02
_0x00d5: push 0x07
_0x00d8: push 0x02
_0x00db: push 0x0c
_0x00de: push 0x02
_0x00e1: push 0x02
_0x00e4: mov ecx, 0x01
_0x00e7: mov ebx, [ebp-2]
_0x00ea: pop eax
_0x00ed: cmp eax, ebx
_0x00f0: jnz _0x010e
_0x00f3: mov ebx, 0x22
_0x00f6: cmp ecx, ebx
_0x00f9: jz _0x0108
_0x00fc: mov ebx, 0x01
_0x00ff: add ecx, ebx
_0x0102: jmp _0x00e7
_0x0105: mov edx, 0x00
_0x0108: print_str
_0x010b: exit(0)
_0x010e: mov edx, 0x01
_0x0111: print_str
_0x0114: exit(0)
_0x0117: nop

The Ezmachine-disassembler-dumpfunc-version0.out we get here is essentially the same as what our earlier disassemblers produced.

We keep this dumpfunc-version0.out as a baseline to guide the optimizations that follow.

3: Optimization

- (1) Add a function prologue and epilogue

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_QUTPQFYHWRH7AJ4.webp)

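The program starts and ends with bare instructions and never sets up a stack frame, so we add a prologue and an epilogue around it: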

```python
from collections import namedtuple
from dataclasses import dataclass

......

# Optimization (1): add a prologue and epilogue for main
prologue = namedtuple("prologue", [])
epilogue = namedtuple("epilogue", [])
def add_main_prologue_epilogue(instructions):
    instructions.insert(0, prologue())
    instructions.append(epilogue())
    return instructions

def dump(instructions):
    for ins in instructions:
        match ins:
            case prologue():
                print(f"push ebp")
                print(f"mov ebp, esp")
            case epilogue():
                print(f"mov esp, ebp")
                print(f"pop ebp")
                print(f"ret")
            ......
            case _:
                raise Exception(f"unknown instruction: {ins}")

if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    dump(instructions)
```

- (2) Handle the VM's mem buffer and strings

```python
.....

# Memory used by the VM
def dump_data():
    print("\n")
    print("""right:\n    .asciz "right" """)
    print("""wrong:\n    .asciz "wrong" """)
    print("""plz_input:\n    .asciz "plz input:" """)
    print("""hacker:\n    .asciz "hacker" """)
    print("""mem:\n    .space 0x100 """)

if __name__ == '__main__':
    opcode = [...]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    dump(instructions)
    dump_data()
```
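A hedged, table-driven alternative to the hand-written print calls: the labels and contents live in one dict, and the function returns the text so it can be checked or reused. `DATA_STRINGS`, `MEM_SIZE` and `emit_data` are illustrative names, not part of the original script; the strings themselves are the ones the VM uses.

```python
# Label -> string contents for the VM's constant strings
DATA_STRINGS = {
    "right": "right",
    "wrong": "wrong",
    "plz_input": "plz input:",
    "hacker": "hacker",
}
MEM_SIZE = 0x100  # size of the VM's mem buffer


def emit_data(strings=DATA_STRINGS, mem_size=MEM_SIZE):
    """Return the GAS data section as text: one label per string plus the mem buffer."""
    out = [""]
    for label, text in strings.items():
        out.append(f"{label}:")
        out.append(f'    .asciz "{text}"')  # .asciz appends the terminating NUL
    out.append("mem:")
    out.append(f"    .space 0x{mem_size:x}")  # zero-filled buffer
    return "\n".join(out) + "\n"


print(emit_data())
```

Adding a new string then means adding one dict entry instead of another print call.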

- (3) Handle print_str

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_4U5PDW3GE26ETHF.webp)

The generated assembly contains instructions like this:

```python
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
```

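print_str simply prints a different string depending on the value of edx.

This unavoidably requires a system call. We can borrow pwntools' shellcraft to generate the code: https://docs.pwntools.com/en/stable/shellcraft/i386.html#module-pwnlib.shellcraft.i386.linux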
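The four write sequences that the `dump` code in this step prints differ only in the data label and the length. A sketch of a table-driven emitter, assuming the same shellcraft-style `write(1, buf, n)` sequence; `PRINT_STRINGS` and `emit_write` are hypothetical names, and the length is derived from the string so it cannot drift out of sync. It loads the label with `offset` because in GAS Intel syntax (which pwntools uses for i386) a bare `mov ecx, label` reads memory at the label instead of loading its address.

```python
# edx value -> (data label, string), from the VM's print_str semantics
PRINT_STRINGS = {
    0: ("right", "right"),
    1: ("wrong", "wrong"),
    3: ("plz_input", "plz input:"),
    4: ("hacker", "hacker"),
}


def emit_write(addr, str_idx):
    """Emit i386 Linux write(1, label, len) wrapped in pushad/popad."""
    label, text = PRINT_STRINGS[str_idx]
    n = len(text)
    return "\n".join([
        f"/* write(fd=1, buf={text!r}, n={n}) */",
        f"_0x{addr:04x}: pushad",
        "    push 1",
        "    pop ebx",                      # fd = stdout
        f"    mov ecx, offset {label}",     # address of the string, not its contents
        f"    push {n}",
        "    pop edx",                      # byte count
        "    push SYS_write  /* 4 */",
        "    pop eax",
        "    int 0x80",
        "    popad",
    ])


print(emit_write(0x0003, 3))
```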

```python
from collections import namedtuple
from dataclasses import dataclass

.....
write_func_call = namedtuple("write_func_call", ["addr", "str_idx"])
# Optimization (3): handle print_str
def handle_print_str(instructions):
    """
    _0x0000: mov edx, 0x03
    _0x0003: print_str

    _0x0012: mov edx, 0x01
    _0x0015: print_str

    _0x0105: mov edx, 0x00
    _0x0108: print_str

    _0x010e: mov edx, 0x01
    _0x0111: print_str
    """
    idx = 0
    while idx < len(instructions):
        match instructions[idx: idx+2]:
            case [
                MovReg(addr1, Regs(3), imm),
                PrintStr(addr2)
            ] if imm in (0x00, 0x01, 0x03, 0x04):
                instructions[idx: idx+2] = [write_func_call(addr2, imm)]
        idx += 1

def dump(instructions):
    for ins in instructions:
        match ins:
            ......
            case write_func_call(addr, str_idx):
                if str_idx == 0:
                    print_right = f"""/* write(fd=1, buf='right', n=5) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, offset right
    push 5
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_right)
                elif str_idx == 1:
                    print_wrong = f"""/* write(fd=1, buf='wrong', n=5) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, offset wrong
    push 5
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_wrong)
                elif str_idx == 3:
                    print_plz_input = f"""/* write(fd=1, buf='plz input:', n=10) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, offset plz_input
    push 10
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_plz_input)
                elif str_idx == 4:
                    print_hacker = f"""/* write(fd=1, buf='hacker', n=6) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, offset hacker
    push 6
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_hacker)
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")

......
```

- (4) Handle input_str

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_X3P96QF84NKZ4PR.webp)

```python
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
```

 ![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_JAXQMQ975NMVWXV.webp)

```python
from collections import namedtuple
from dataclasses import dataclass

......

read_strlen_func_call = namedtuple("read_strlen_func_call", ["addr"])
# Optimization (4): handle input_str
def handle_input_str(instructions):
    """
    _0x0006: input_str
    """
    idx = 0
    while idx < len(instructions):
        match instructions[idx: idx+1]:
            case [
                InputStr(addr)
            ]:
                instructions[idx: idx+1] = [read_strlen_func_call(addr)]
        idx += 1

def dump(instructions):
    for ins in instructions:
        match ins:
            ......
            case read_strlen_func_call(addr):
                print_read_strlen = f"""/* read(fd=0, buf=mem, n=0x100) */
_0x{addr:04x}: push eax
    push ebx
    push ecx
    push edx
    xor ebx, ebx
    mov ecx, offset mem
    push 0x100
    pop edx
    push SYS_read  /* 3 */
    pop eax
    int 0x80

    /* strlen(mem) */
    mov edi, offset mem
    xor eax, eax
    push -1
    pop ecx
    repnz scas al, BYTE PTR [edi]
    inc ecx
    inc ecx
    neg ecx
    /* moving ecx into ecx, but this is a no-op */
    mov edi, ecx
    pop edx
    pop ecx
    pop ebx
    pop eax
    mov eax, edi
"""
                print(print_read_strlen)
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")

# Optimization (2): memory used by the VM
def dump_data():
    print("\n")
    print("""right:\n    .asciz "right" """)
    print("""wrong:\n    .asciz "wrong" """)
    print("""plz_input:\n    .asciz "plz input:" """)
    print("""hacker:\n    .asciz "hacker" """)
    print("""mem:\n    .space 0x100 """)

if __name__ == '__main__':
    opcode = [.....]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    handle_print_str(instructions)
    handle_input_str(instructions)
    dump(instructions)
    dump_data()
```
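The strlen half of the emitted read_strlen code relies on the counter arithmetic of `repnz scasb`: ecx starts at -1 and is decremented once per byte scanned, including the terminating NUL, so it ends at -(len+2). A small Python model (an illustration only; 32-bit wrap-around is modeled with `& 0xFFFFFFFF`) confirms that `inc ecx; inc ecx; neg ecx` leaves exactly the string length.

```python
MASK = 0xFFFFFFFF  # model 32-bit registers


def repnz_scasb_strlen(buf: bytes) -> int:
    """Simulate: xor eax, eax; ecx = -1; repnz scasb; inc ecx; inc ecx; neg ecx."""
    ecx = MASK  # push -1; pop ecx
    edi = 0
    while ecx != 0:
        ecx = (ecx - 1) & MASK  # one decrement per byte scanned
        found = buf[edi] == 0   # scasb compares al (0) with the byte at [edi]
        edi += 1
        if found:               # repnz stops once the bytes are equal (ZF set)
            break
    ecx = (ecx + 2) & MASK      # inc ecx; inc ecx
    ecx = (-ecx) & MASK         # neg ecx
    return ecx


for s in (b"", b"abc", b"flag{test}"):
    print(s, repnz_scasb_strlen(s + b"\x00"))
```

One behavioral difference worth keeping in mind: unlike gets, read keeps the trailing newline, so the length computed here would include it when the ELF is actually run. For static analysis of the compiled ELF this does not matter.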

- (5) Handle exit(0)

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_XYKFP696Y5UC9UJ.webp)
```python
case Exit(addr):
    print(f"""/* exit(status=0) */
_0x{addr:04x}: xor ebx, ebx
    push SYS_exit  /* 1 */
    pop eax
    int 0x80
""")
```

- (6) Rewrite mov ebx, [ebp-ecx]

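The assembler rejects this form: an x86 memory operand is computed as base + index*scale + displacement, so a register can be added to ebp but not subtracted from it.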

 ![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_MVYBK4BXJK84HFF.webp)

Replace it with the following:

 ![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_9RHVZ9B8HUX8NJE.webp)

 ![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_4BK2G6J8HPXACSM.webp)

```python
case MovRegStack(addr, dst, src):
    # print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
    print(f"_0x{addr:04x}: mov {dst}, ebp")
    print(f"    sub {dst}, {src}")
    print(f"    mov {dst}, [{dst}]")
```
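One caveat about the mov/sub/mov rewrite above: if dst and src are the same register (for example `mov ecx, [ebp-ecx]`), the first `mov` overwrites the index before `sub` uses it. A hedged alternative that negates the index and uses an addition-based address, which x86 does support. `lower_mov_reg_stack` is a hypothetical helper, and it keeps the article's byte-offset meaning of [ebp-src].

```python
def lower_mov_reg_stack(addr: int, dst: str, src: str) -> list[str]:
    """Lower VM `mov dst, [ebp-src]` into assemblable i386 Intel syntax."""
    lines = [
        f"_0x{addr:04x}: neg {src}",    # src = -src
        f"    mov {dst}, [ebp+{src}]",   # ebp + (-src) == ebp - src
    ]
    if dst != src:
        lines.append(f"    neg {src}")  # restore src; unnecessary when dst overwrote it
    return lines


print("\n".join(lower_mov_reg_stack(0x00E7, "ebx", "ecx")))
```

Like the sub-based version, this clobbers the flags; in this program the next flag-dependent instruction is a fresh `cmp`, so that is harmless.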

- (7) Rewrite _0x006c: div eax, ebx

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_2T8PCPWA695R3CV.webp)

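With the native x86 div ebx, the 64-bit value edx:eax is divided by ebx; the quotient goes into eax and the remainder into edx.

This VM's div is different: the quotient stays in eax and the remainder goes into ebx.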

 ![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_P25BCYVPP4BEK4M.webp)

So after div eax, ebx we also need to add mov ebx, edx.

 ![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_PV5TZ5K9T466DWK.webp)
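A sketch of the div lowering together with a reference model of the VM instruction for checking it. Clearing edx before `div` is our addition, not taken from the article's screenshots: since x86 divides edx:eax, a stale edx would change the result or raise a divide error. The helper assumes dst is always eax, as in this program; `lower_div` and `vm_div` are hypothetical names.

```python
def lower_div(addr: int, dst: str, src: str) -> list[str]:
    """Lower VM `div dst, src` (quotient -> dst, remainder -> src) to x86 div."""
    if dst != "eax" or src in ("eax", "edx"):
        raise ValueError("sketch only handles div eax, <reg other than eax/edx>")
    return [
        f"_0x{addr:04x}: xor edx, edx",  # dividend is edx:eax, so zero the high half
        f"    div {src}",               # eax = quotient, edx = remainder
        f"    mov {src}, edx",          # the VM keeps the remainder in src
    ]


def vm_div(eax: int, ebx: int) -> tuple[int, int]:
    """Reference model of the VM instruction: returns (new eax, new ebx)."""
    return divmod(eax, ebx)


print("\n".join(lower_div(0x006C, "eax", "ebx")))
print(vm_div(0x31, 0x10))
```

In this program the divisor is 0x10, so the pair (eax, ebx) is just the high and low hex digit of the transformed input byte, which the loop then pushes.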

Ezmachine-disassembler.py (https://prod-files-secure.s3.us-west-2.amazonaws.com/461378ca-73a1-498e-b83e-fbb0244aa01b/0c2d246f-a2d4-484c-8671-4d65e9ac8fa1/Ezmachine-disassembler.py)

Ezmachine-disassembler-out.asm (https://prod-files-secure.s3.us-west-2.amazonaws.com/461378ca-73a1-498e-b83e-fbb0244aa01b/1edf8fac-54c2-42ba-84ac-1e46937eaf1e/Ezmachine-disassembler-out.asm)

4: Build an ELF with pwntools (make_elf_from_assembly)

Ezmachine-asm_compile.py (https://prod-files-secure.s3.us-west-2.amazonaws.com/461378ca-73a1-498e-b83e-fbb0244aa01b/a6df480c-d615-4ab5-a2bd-dee5da074416/Ezmachine-asm_compile.py)

from pwn import *

code = """
push ebp
mov ebp, esp
.....
ret

right:
    .asciz "right" 
wrong:
    .asciz "wrong" 
plz_input:
    .asciz "plz input:" 
hacker:
    .asciz "hacker" 
mem:
    .space 0x100 
"""

elf = make_elf_from_assembly(code)
print(elf)
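Instead of pasting the printed assembly into the `code` string by hand, the output of dump() and dump_data() can be captured with the standard library and passed straight to pwntools. A sketch: `capture_output` and `build_elf` are hypothetical helpers; `make_elf_from_assembly` is the same pwntools call used above, and pwntools is imported lazily so the capture part works without it installed.

```python
import contextlib
import io


def capture_output(*calls):
    """Run each (func, *args) tuple and return everything it printed as one string."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        for func, *args in calls:
            func(*args)
    return buf.getvalue()


def build_elf(asm_text: str):
    """Assemble i386 Linux code into an ELF with pwntools."""
    from pwn import context, make_elf_from_assembly  # requires pwntools
    context.arch = "i386"
    context.os = "linux"
    return make_elf_from_assembly(asm_text)


# Usage with the article's pipeline (parse, dump, dump_data defined earlier):
# code = capture_output((dump, instructions), (dump_data,))
# print(build_elf(code))
```

Keeping generation and assembly in one script means every tweak to `dump` is reflected in the ELF without a manual copy step.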

Result: