As you may know, Joe Sandbox is a very extensive analysis system. Hybrid Code Analysis (HCA), a key technology in Joe Sandbox, makes it possible to analyze both executed and non-executed code paths, including code that malware would run only when a specific event triggers it. Rather than covering just the installation behavior, which is often a tiny part of what the malware does, HCA provides a complete overview of its behavior. In addition, it enriches the disassembly with dynamic information such as API calls, arguments, strings and constants, which makes the code much easier to understand. Finally, HCA links dynamic data, such as a "File Create" event, to the code function behind it. This makes it possible to understand the context of and the relations between different events. All in all, HCA is a very powerful and unique technology that helps to understand today's complex threats more efficiently. Below you see an example of the HCA output:
During extensive field tests with HCA, and thanks to great customer feedback over the last years, we realized that in most cases HCA generates very rich output that describes the malware behavior nearly completely. We therefore started research into using HCA output for malware similarity analysis and clustering. Today I am happy and excited to release a new Joe Security product based on this research: Joe Sandbox Class.
Joe Sandbox Class is a plugin for Joe Sandbox Desktop, Complete and Light. On a high level, Joe Sandbox Class works as outlined in the following picture:
As a first step, Joe Sandbox Class selects features from the analysis data generated by Joe Sandbox. Because of their quality, mainly HCA code functions are used. The analyzed code functions always belong to the potentially malicious data, e.g. the sample under analysis. They do not include Windows or third-party library functions. In a second step, the features are generalized so that variations (different code functions with the same behavior) still match. Next, Joe Sandbox Class reduces noise. Noise consists of functions that occur in many malicious samples but only stem from setup code introduced by compilers and runtime environments. Right after that, Joe Sandbox Class searches its database for similar features. Finally, all features are stored in the database and a detailed report is generated in HTML, XML and JSON. Joe Sandbox Class also compiles a graph outlining the different malware groups (clusters).
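To make the steps above more concrete, here is a minimal Python sketch of such a pipeline. It is not the Joe Sandbox Class implementation: the function names, the masking of hex constants as the generalization step, the Jaccard similarity measure and all thresholds (max_share, min_samples, threshold) are assumptions chosen purely for illustration.

```python
import hashlib
import re

# Hex immediates and 8-digit addresses, e.g. "0x10" or "00403329".
HEX = re.compile(r"\b(?:0x[0-9a-f]+|[0-9a-f]{8})\b", re.IGNORECASE)


def generalize(function_text):
    """Mask addresses/constants, normalize case and whitespace, then hash."""
    masked = HEX.sub("IMM", function_text)
    normalized = " ".join(masked.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()


def extract_features(functions):
    """One generalized feature per code function of the sample."""
    return {generalize(text) for text in functions}


def reduce_noise(features, database, max_share=0.5, min_samples=10):
    """Drop features seen in more than max_share of all stored samples.

    With fewer than min_samples stored samples nothing is dropped,
    because the statistics would not be meaningful yet.
    """
    if len(database) < min_samples:
        return set(features)
    limit = max_share * len(database)
    return {
        f for f in features
        if sum(f in stored for stored in database.values()) <= limit
    }


def jaccard(a, b):
    """Share of common features relative to all features of both samples."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_similar(features, database, threshold=0.3):
    """Return (sample name, similarity) pairs, most similar first."""
    hits = [(name, jaccard(features, stored)) for name, stored in database.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


def classify(name, functions, database):
    """Run all steps for one sample and store its features afterwards."""
    features = reduce_noise(extract_features(functions), database)
    similar = find_similar(features, database)
    database[name] = features
    return similar
```

In this sketch, two functions that differ only in addresses or constants produce the same feature, which is one simple way to match variations of the same code. A real system would need a far more robust generalization than masking hex values.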
Finding Similar Samples
There are several cool use-cases for Joe Sandbox Class. Let's have a look at some of the most important ones:
Assume that we have the sample install.exe (MD5: 1c95dc2d7af36838341996e6ed50fef1) and want to see whether our database contains a similar sample:
Joe Sandbox Class has found two similar samples. One of the functions shared by the two samples is function 00403329. This function is very large and has not been executed:
Looking at the behavior analysis reports for the two similar samples, it becomes obvious that they are the same malware:
Since we know that it is the same malware, we can get a more complete picture of its behavior:
We can also find the samples within the cluster graph generated by Joe Sandbox Class:
Another very cool use-case is enhancing detection. Imagine that we have a sample which includes a new evasion technique (e.g. Sandbox Overloading with GetSystemTimeAdjustment) and therefore could not be detected as malicious by Joe Sandbox. With the help of Joe Sandbox Class, however, we can find similar samples which may have been detected as malicious by Joe Sandbox. Since they share common behavior, we can automatically classify the sample with the evasion technique as malicious. Finding samples in the cluster graph which have been classified as benign is very simple, since their nodes are colored green:
As the graph shows, Lloyds Message Service_....exe has been classified as benign. However, it shares several code functions with malicious samples:
Looking at the behavior report of Lloyds Message Service_...d.exe, we found that it tried to download additional files which were no longer available.
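The detection enhancement described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function enhance_verdict, the verdict strings, the layout of the database and the min_shared threshold are all hypothetical and do not describe how Joe Sandbox Class actually decides.

```python
def enhance_verdict(sample_features, own_verdict, database, min_shared=3):
    """Upgrade a non-malicious verdict if enough code functions are shared
    with samples that were already detected as malicious.

    database maps a sample name to a (verdict, feature set) pair.
    Returns the final verdict and the supporting evidence as a list of
    (sample name, number of shared features) pairs.
    """
    if own_verdict == "malicious":
        return own_verdict, []

    evidence = []
    for name, (verdict, features) in database.items():
        shared = len(sample_features & features)
        if verdict == "malicious" and shared >= min_shared:
            evidence.append((name, shared))

    # Strongest overlap first, then by name for a stable order.
    evidence.sort(key=lambda e: (-e[1], e[0]))
    return ("malicious" if evidence else own_verdict), evidence
```

Returning the evidence alongside the verdict keeps the decision explainable: an analyst can see which malicious samples caused the upgrade and how many functions they share.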
As we have outlined, there are many interesting use-cases. You can analyze all the interesting clusters, detect campaigns and get an understanding of the threat landscape:
You can also search for new malware, i.e. things you have not seen before. However, this requires a very large feature database. We are currently building such a database and will demonstrate this use-case in a future blog post.
Would you like to test Joe Sandbox Class? We have set up a free analyzer service, which can be found at www.class-analyzer.net