Joe Security's Blog

Introducing the Yara Rule Generator

Published on: 12.02.2015

A couple of months ago we started work on a new feature for Joe Sandbox that we call Yara Rule Generator. Yara is a well-known pattern-matching engine built for writing simple malware detection rules.
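To give a feeling for what such a rule looks like, here is a minimal Python sketch that renders a plain-text Yara rule from a list of strings. The function name `render_rule`, the rule name and the example strings are made up for illustration; they are not output of the Yara Rule Generator.

```python
def render_rule(name, strings, min_matches=None):
    """Render a minimal Yara rule that matches on plain text strings.

    name, strings and min_matches are illustrative inputs, not generator output.
    """
    lines = [f"rule {name}", "{", "    strings:"]
    for i, s in enumerate(strings):
        # Escape backslashes and double quotes for a Yara text string.
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'        $s{i} = "{escaped}"')
    lines.append("    condition:")
    if min_matches is None:
        lines.append("        all of them")
    else:
        lines.append(f"        {min_matches} of them")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical strings, chosen only to show the rule layout.
    print(render_rule("Example_Sample", ["cmd.exe /c del", "Mutex_Example_01"]))
```

A rule consists of a name, a `strings` section that declares the patterns, and a `condition` that says how many of them must be present for a match.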


Yara's main use is to detect APTs and advanced threats that AV does not detect quickly. A large share of Joe Security's customers use Yara on a daily basis. Because of that, we received many requests to add a feature to Joe Sandbox that automatically generates Yara rules, and we finally decided to take up that challenge.

Today we release a new free service, Yara Rule Generator, which creates Yara rules automatically from behavior data such as files and memory captured by Joe Sandbox.

How does the Joe Sandbox Yara Rule Generator work, and what kind of rules does it generate? The generator creates three different kinds of rules per submitted sample:

File rules let you search for the submitted sample. Dropped rules are generated from files that the initial sample created or downloaded during dynamic analysis. Memory opcode rules, finally, are generated from memory dumps. File and dropped rules let you scan for the particular sample on the file system. Memory opcode rules, on the other hand, let you find malware in the process memory of a target system (you can specify a process ID as the target when you launch Yara, or use our batch file to scan all processes).
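The two scan targets can be sketched in Python as follows. This assumes the `yara` command-line tool is on the PATH and is called as `yara RULES_FILE TARGET`, where the target is a file, a directory or a process ID. The rule file names and the process ID are hypothetical, and the helper is not the batch file mentioned above.

```python
import subprocess


def yara_command(rules_path, target, yara_bin="yara"):
    """Build the argument list for: yara RULES_FILE TARGET.

    TARGET may be a file path, a directory or a process ID.
    """
    return [yara_bin, str(rules_path), str(target)]


def scan(rules_path, target):
    """Run yara and return the names of the rules that matched."""
    result = subprocess.run(
        yara_command(rules_path, target),
        capture_output=True,
        text=True,
        check=False,
    )
    # By default yara prints one "RULE_NAME TARGET" line per match.
    return [line.split(" ", 1)[0] for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # File and dropped rules are pointed at the file system (hypothetical paths).
    print(yara_command("file_rules.yar", "C:\\Users"))
    # Memory opcode rules are pointed at a process ID (hypothetical PID).
    print(yara_command("memory_rules.yar", 1234))
```

Scanning another process's memory usually requires sufficient privileges on the target system.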

Furthermore, a rule can be a simple rule or a super rule. Simple rules are specific to the submitted sample and its behavior, so they do not match variants of the same malware. Super rules are generic and are built over a set of uploaded samples and their behavior. Since they capture only common behavior, they often find malware variants.
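The difference between the two can be illustrated with a small Python sketch. It treats each sample as a set of extracted artifacts and models a super rule as the artifacts shared by all samples. This is a simplification for illustration only; the function names and artifact strings are invented.

```python
def simple_rule_artifacts(sample_artifacts):
    """A simple rule may use every artifact of the one submitted sample."""
    return set(sample_artifacts)


def super_rule_artifacts(samples):
    """A super rule keeps only artifacts that all samples have in common."""
    sets = [set(s) for s in samples]
    if not sets:
        return set()
    return set.intersection(*sets)


if __name__ == "__main__":
    # Hypothetical artifacts of three variants of one family.
    variant_a = {"Mutex_Example_01", "build-1001", "POST /gate.php"}
    variant_b = {"Mutex_Example_01", "build-1002", "POST /gate.php"}
    variant_c = {"Mutex_Example_01", "build-1003", "POST /gate.php"}

    print(sorted(simple_rule_artifacts(variant_a)))
    print(sorted(super_rule_artifacts([variant_a, variant_b, variant_c])))
```

The simple rule still contains the build-specific string and therefore misses the other variants, while the shared artifacts remain valid for all three.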

To generate rules, the Joe Sandbox Yara Rule Generator extracts different kinds of behavior data, such as:

  • PE structure data (e.g. section names)
  • Strings (Unicode and ASCII)
  • Code sequences (e.g. entrypoint)
  • Opcode sequences from HCA (Hybrid Code Analysis)

All the extracted artifacts are then rated based on knowledge, entropy and location information. After artifact selection, a test rule is generated and its false positive rate is measured against a reference goodware set. Finally, the rule is accepted if the false positive rate is acceptable.
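The rate, select and test steps can be sketched in Python as below. The scoring is an assumption made for illustration: artifacts on a known-benign list score zero (standing in for "knowledge"), all others score by Shannon entropy, and location information is not modeled. The function names, the threshold and the sample data are hypothetical and do not describe the actual rating used by Joe Sandbox.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of a string in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def select_artifacts(candidates, known_benign, top_n=3):
    """Drop known-benign and zero-entropy strings, keep the top_n by entropy."""
    scored = [(0.0 if c in known_benign else shannon_entropy(c), c) for c in candidates]
    ranked = sorted((pair for pair in scored if pair[0] > 0), reverse=True)
    return [c for _, c in ranked[:top_n]]


def false_positive_rate(artifacts, goodware_blobs):
    """Fraction of goodware files in which all selected artifacts occur."""
    if not goodware_blobs:
        return 0.0
    needles = [a.encode("ascii", "ignore") for a in artifacts]
    hits = sum(1 for blob in goodware_blobs if all(n in blob for n in needles))
    return hits / len(goodware_blobs)


if __name__ == "__main__":
    candidates = ["kernel32.dll", "xK9#qLm2$vZ", "aaaaaaaa"]  # hypothetical strings
    goodware = [b"MZ...kernel32.dll...", b"hello world"]      # stand-in goodware set
    picked = select_artifacts(candidates, known_benign={"kernel32.dll"}, top_n=1)
    rate = false_positive_rate(picked, goodware)
    print(picked, rate, "accepted" if rate <= 0.01 else "rejected")
```

An empty artifact list matches every goodware file in this sketch, so a rule without any selected artifacts is always rejected.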

For super rules, the Joe Sandbox Yara Rule Generator uses an efficient clustering algorithm to find common opcode sequences.
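The idea of common opcode sequences can be illustrated with a much simpler technique than clustering: counting fixed-length byte n-grams and keeping those that occur in enough samples. The Python sketch below is only that simplified stand-in, not the algorithm used by the generator; the function names and the byte sequences are invented.

```python
from collections import Counter


def ngrams(data, n):
    """Set of all distinct n-byte windows in a byte string."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def common_sequences(samples, n=4, min_support=None):
    """Return the n-grams that occur in at least min_support samples.

    By default an n-gram has to occur in every sample.
    """
    if min_support is None:
        min_support = len(samples)
    support = Counter()
    for data in samples:
        # Each sample contributes at most one count per distinct n-gram.
        support.update(ngrams(data, n))
    return {gram for gram, count in support.items() if count >= min_support}


if __name__ == "__main__":
    # Hypothetical opcode bytes from three samples of one family.
    samples = [
        bytes.fromhex("558bec83ec10aa"),
        bytes.fromhex("90558bec83ec10"),
        bytes.fromhex("558bec83ec10bb"),
    ]
    for gram in sorted(common_sequences(samples)):
        print(gram.hex())
```

Lowering `min_support` below the number of samples tolerates variants that lack a sequence, at the cost of less specific results.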

The results look very promising. To test super rules, we generated rules from sets of samples belonging to the same malware family. We took three samples out of a set and generated super rules from them. We then infected a test system with a fourth sample of the same family and searched for it with our rules.

Of course, the file and dropped rules also work well.

However, please note that the Yara Rule Generator is no silver bullet. Creating simple and super rules is tricky and far from perfect. During the development of version 1.0.0 we spotted many areas for improvement. All the rules are well commented and documented, so it is simple to extend or change them.

The Yara Rule Generator has already been deeply integrated into the Joe Sandbox platform and will be shipped with the next major release.

Happy Rule Creation!

Update 1:

We were inspired by yarGen from Florian Roth as well as