The Power of Execution Graphs Part 1/3

Joe Security's Blog

Published on: 22.04.2015

Introduction

We have been quite busy and will soon release Joe Sandbox 12. It is so far one of the biggest releases we have made and includes several new features such as:

Execution graphs
Yara rule generator (see http://www.yara-generator.net/)
MITM SSL proxy to inspect HTTPS (credits to Daniel Roethlisberger)

63 behavior signatures

Behavior signatures to detect unpacked / dynamic code

More than 10 behavior signatures to detect evasive behavior

Score algorithm with lower FP and FN

System event logging

Slim PCAPs

Per process memory and CPU stats

In this and two follow-up blog posts we are going to outline a new feature called Execution Graphs.

Evading sandboxes is a key feature of today’s advanced threats. To do so malware uses various tricks for checking whether it is running on an analysis system, such as trying to detect if the current system is a virtual / emulated machine or checking whether it is being debugged or analyzed. In such cases, the malware will keep a low profile and avoid exhibiting its actual malicious behavior, potentially evading detection by the malware analysis system. Latest threats also implement generic evasion such as validating user behavior or time and sleep tricks (see blog post http://bit.ly/1uZBmN2 and http://bit.ly/1qNT3Bu).

Since version 7, released in 2012 Joe Sandbox implements a variety of techniques to prevent or detect evasive malware. This includes execution on native systems, analysis of non-executed functions through Hybrid Code Analysis (HCA), specific signatures for identifying evasive patterns as well as cookbooks.

In the last months we have seen a strong increase of more sophisticated evasion techniques in malware which are harder to find. Therefore we have decided to make this topic a key for Joe Security’s research roadmap.

Execution Graphs

One of the new features we added to Joe Sandbox 12 are Execution Graphs. Execution Graphs have been designed to automatically spot evasions but also to help to quickly understand how the malware implements the evasion.

In general an Execution Graph is a highly condensed control flow graph with a focus on API-rich paths. Since it is highly compressed it is easier to understand than a full control flow graph. The graph is composed of nodes representing sections of code and edges correspond to the control-flow (call, jmps etc) of the malware. Each node is labeled with the set of API calls it executes. Nodes are colored to highlight additional properties:

Yellow: the node is a program / thread entry point or a top level function
Orange: the code has been triggered during execution
Red: the code has been unpacked and executed
Grey / blackish: the code has not been executed

Different shapes are used for highlighting graph locations. The diamond-shaped nodes correspond to so-called key decision nodes, in the sense that the process decides at this node to avoid execution of a branch which could lead to interesting key behavior. Thus key decision nodes are especially relevant when browsing the execution graph for evasive behavior. Note that determining whether a decision node is key depends on the execution status of the nodes reachable through its branches (one branch should lead to executed APIs, the other to different non-executed APIs), thus different executions may lead to different key decision nodes.

The following figure shows the initial part of the execution graph for our demo sample (MD5: 0af4ef5069f47a371a0caf22ae2006a6).

Notice how the first few nodes after the entry point (colored yellow) have an orange/red color, while the other nodes are grey/black? Recall that red coloring indicates that the corresponding code has been executed, while black is used for non-executed code.

When zooming in the graph entry node, the following control-flow pattern appears:

The sample execution graph clearly exhibits a very straightforward evasive behavior: there is a key decision point where the GetSystemTime API is called, followed by another key decision and a call to the ExitProcess API. All these nodes are colored in red and thus are executed: the part of the graph starting at GetVersionExA is not executed (grey and black color): the full execution graph includes a lot of non-executed malicious behavior not shown here. The green edges represent so-called rich paths, which allow the analyst to track the most API intensive paths of the execution graph, independently from their actual execution status.A path is considered to be "intensive" if a lot of APIs are executed which appear in malicious codes. Here the rich path leads to some non-executed part of the graph:

The blue edges represent thread creations, and the yellow nodes are thread entry points. In the given sample each created thread has its own malicious payload:

Thread 4098a0: its task is to terminate debugging tools and Antivirus. Function 4095e0 is registered as a callback using the EnumWindows API: it enumerates all top-level windows and checks their title against strings such as "avast", "avira" or "kaspersky" among many others. If the title matches the processes is killed instantly.

Thread 407230 is in charge of persistence and installation behavior.
Thread 407180 spreads its main executable to external drives, since it checks for available system drives and uses API call chains often found in USB drive infection routines (GetDriveType, CopyFile, SetFileAttributes).

Thread 407a80: parses remote commands. It is the main payload thread which acts as a broker.

The structure of the graph as well as all additional properties such as execution coverages or decision nodes are directly passed to the signature interface of Joe Sandbox. This enables to write behavior rules which detect evasive behavior.

We may navigate between the execution graphs and the corresponding assembly code. In the case of sample MD5 0af4ef5069f47a371a0caf22ae2006a6, we can determine that the current system time returned by GetSystemTime is checked in the code associated with the key decision nodes, and depending on its value the sample decides to exit the process or continue with execution:

Same for the command handler found in thread 407a80:

Conclusion

Execution graphs are a powerful tool for detecting and understanding evasive behavior. Due to its form, coloring and node shapes we can spot evasion pattern very efficiently. Since the graph is reduced and simplified this also works with very complex and extensive codes. The structure of the graph and all attributes are fed to the Joe Sandbox signature interface. Therefore we can easily rate and classify evasive behavior within seconds. Since the graph describes the complete behavior and not just the executed path, any behavior can be rated and classified.

During development execution graphs already have proven to be very useful. Therefore we will present some of our detection of more complex behaviors / evasion in two additional blog posts. Stay tuned!

Example Reports for the sample used in the post:

Deep Malware Analysis

The Power of Execution Graphs Part 1/3

Joe Security's Blog

Introduction

Execution Graphs

Conclusion