Deep URL Analysis is the core component of Joe Sandbox for phishing analysis and detection. In this blog post we take a look at how Joe Sandbox performs Deep URL Analysis, which techniques, technologies and tricks it uses, and how we overcome new challenges introduced by adversaries.
Joe Sandbox performs Deep URL Analysis by dynamically executing a URL, or a document containing a URL, in a real environment:
For the example above, a URL is launched in a real Chrome browser on a real Windows 10 x64 system. Because a real browser and operating system are used, Joe Sandbox has access to a wide range of behavioral data, including the Document Object Model (DOM) tree, images, and all HTML and JavaScript. In addition, screenshots of the desktop and the full network traffic are captured. This wealth of data is the input for a range of detection techniques such as template matching, partial hashing, OCR and YARA rules.
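To make template matching a little more concrete, here is a minimal sketch that locates a known logo inside a screenshot. It assumes both are grayscale pixel grids and scores every position with a plain sum of squared differences; this is a textbook variant chosen for illustration, not necessarily the algorithm Joe Sandbox uses, and `match_template`, `ssd_at` and the `max_ssd` threshold are hypothetical names.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: grayscale pixels (0-255), stored row by row.
Image = List[List[int]]


def ssd_at(screen: Image, template: Image, top: int, left: int) -> int:
    """Sum of squared differences between the template and one window of the screen."""
    total = 0
    for dy, row in enumerate(template):
        for dx, value in enumerate(row):
            diff = screen[top + dy][left + dx] - value
            total += diff * diff
    return total


def match_template(screen: Image, template: Image, max_ssd: int) -> Optional[Tuple[int, int]]:
    """Slide the template over every position of the screen.

    Returns (row, col) of the best-scoring window if its SSD is <= max_ssd,
    otherwise None (no sufficiently close match, e.g. the logo is absent).
    """
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for top in range(sh - th + 1):
        for left in range(sw - tw + 1):
            score = ssd_at(screen, template, top, left)
            # Keep the lowest score seen so far (lower SSD = closer match).
            if best is None or score < best[0]:
                best = (score, (top, left))
    if best is not None and best[0] <= max_ssd:
        return best[1]
    return None
```

Production matchers usually rely on optimized library routines and normalized scores so that a logo is still found after brightness changes or rescaling; the brute-force loop only shows the principle.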
Some phishing pages do not contain any brand images at all, only text, and even that text may be rendered as an image. For such cases, Joe Sandbox applies optical character recognition (OCR) to the captured screenshots:
Full Analysis: https://www.joesandbox.com/analysis/509223/0/html
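OCR only turns pixels into text; the text still has to be judged. The sketch below assumes an OCR engine has already returned the recognized text of a screenshot and shows one simple, hypothetical way to flag it: look for a brand name together with a credential prompt. The keyword lists and the `phishing_indicators` function are illustrative assumptions, not Joe Sandbox rules.

```python
import re
from typing import Dict, List

# Hypothetical keyword lists; a real rule set would be far larger and curated.
BRAND_KEYWORDS: Dict[str, List[str]] = {
    "Microsoft": ["microsoft", "office 365", "outlook"],
    "PayPal": ["paypal"],
}
CREDENTIAL_PROMPTS = ["password", "sign in", "verify your account"]


def normalize(text: str) -> str:
    """Lowercase OCR output and collapse whitespace so line breaks do not split phrases."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def phishing_indicators(ocr_text: str) -> Dict[str, object]:
    """Report which brands the text mentions and whether it asks for credentials."""
    text = normalize(ocr_text)
    brands = [brand for brand, words in BRAND_KEYWORDS.items()
              if any(word in text for word in words)]
    asks_credentials = any(prompt in text for prompt in CREDENTIAL_PROMPTS)
    return {
        "brands": brands,
        "asks_credentials": asks_credentials,
        # Flag only the combination: a brand name plus a credential prompt.
        "suspicious": bool(brands) and asks_credentials,
    }
```

Requiring both signals keeps ordinary pages that merely mention a brand from being flagged.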
Phishing detection is a constant battle, and the bad guys keep finding new tricks to bypass it. Here are some of the tricks we have seen recently.
One trick is to hide the final phishing page behind several steps of user interaction, such as opening or downloading a document and then clicking a link inside it. To reach that page, the analysis system (and likewise any secure e-mail gateway) needs to automate all of this user behavior: opening and downloading documents, clicking and following links, etc. If any step fails, the final phishing page is never reached and no detection is possible. Joe Sandbox includes an extensive user behavior simulation engine that handles such attacks automatically.
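The reason a single failed step breaks detection can be shown with a small model. In this hypothetical sketch the attack chain is a graph in which each node is a page or document and each edge is one simulated user action; `find_final_page` searches that graph breadth-first within an action budget. A real simulation engine drives an actual browser and desktop, so this only illustrates the search logic.

```python
from collections import deque
from typing import Callable, Dict, List, Optional

# Hypothetical model: each key is a page or document, and its list holds whatever
# the user can reach from it with one action (open, download, click a link, ...).
PageGraph = Dict[str, List[str]]


def find_final_page(
    graph: PageGraph,
    start: str,
    is_phishing: Callable[[str], bool],
    max_steps: int = 5,
) -> Optional[List[str]]:
    """Breadth-first search over simulated user actions.

    Returns the chain of pages leading to the first page flagged by is_phishing,
    or None if no such page is reached within max_steps actions.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        page = path[-1]
        if is_phishing(page):
            return path
        if len(path) - 1 >= max_steps:
            continue  # action budget exhausted on this branch
        for nxt in graph.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

If an edge is missing, because a button was not clicked or a download was not opened, the search returns `None`, which corresponds to the phishing page never being reached.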
Why not add a CAPTCHA in front of the real phishing page? From the attacker's point of view, that is a brilliant way to stop a machine from reaching the final page. To counter this, Joe Sandbox attempts to detect CAPTCHA-protected pages using template matching and handcrafted signatures:
Full Analysis: https://www.joesandbox.com/analysis/437966/0/html#yara
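A handcrafted CAPTCHA signature can be as simple as a few strings matched against the page source. The sketch below is written in Python rather than YARA and uses the publicly documented embed markers of reCAPTCHA (`g-recaptcha`, `google.com/recaptcha/api.js`) and hCaptcha (`h-captcha`, `hcaptcha.com/1/api.js`); the signature list and `detect_captcha` are hypothetical and not taken from Joe Sandbox.

```python
import re
from typing import List

# Hypothetical handcrafted signatures, similar in spirit to a YARA string rule:
# each entry is (name, regex) matched against the page's HTML.
CAPTCHA_SIGNATURES = [
    ("reCAPTCHA", re.compile(r'class="[^"]*\bg-recaptcha\b|google\.com/recaptcha/api\.js', re.I)),
    ("hCaptcha", re.compile(r'class="[^"]*\bh-captcha\b|hcaptcha\.com/1/api\.js', re.I)),
]


def detect_captcha(html: str) -> List[str]:
    """Return the names of all CAPTCHA signatures that match the HTML."""
    return [name for name, pattern in CAPTCHA_SIGNATURES if pattern.search(html)]
```

Markup checks like these miss CAPTCHAs that are drawn as images or custom widgets, which is where template matching on the screenshot complements them.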
To evade detection, some phishing pages use geo-blocking: the page is only accessible to visitors from a specific country. The page determines the visitor's country by looking up the visitor's IP address with a geolocation service.
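The blocking logic on the attacker's side is usually simple. This hypothetical sketch maps the visitor IP to a country with a tiny hard-coded table (RFC 5737 documentation ranges standing in for a real geolocation database) and serves the phishing page only to targeted countries; `GEO_TABLE`, `TARGET_COUNTRIES` and `serve_page` are invented for illustration.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical geolocation table using RFC 5737 documentation ranges; a real
# phishing page would query a geo-IP database or lookup service instead.
GEO_TABLE: Dict[str, str] = {
    "192.0.2.0/24": "DE",
    "198.51.100.0/24": "US",
}
TARGET_COUNTRIES = {"DE"}  # hypothetical: the campaign only targets German visitors


def lookup_country(ip: str) -> Optional[str]:
    """Return the country code for an IP address, or None if it is unknown."""
    addr = ipaddress.ip_address(ip)
    for prefix, country in GEO_TABLE.items():
        if addr in ipaddress.ip_network(prefix):
            return country
    return None


def serve_page(visitor_ip: str) -> str:
    """Serve the phishing page only to visitors from a targeted country."""
    if lookup_country(visitor_ip) in TARGET_COUNTRIES:
        return "phishing_login.html"
    return "harmless_404.html"
```

A sandbox whose traffic leaves the network from a non-targeted country therefore only ever sees the harmless page.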
Joe Sandbox overcomes this blocking with its built-in localized Internet anonymization feature, which lets analysts select a specific country before the analysis starts. During the analysis, all traffic is then routed through that country:
As outlined in this blog post, Joe Sandbox analyzes URLs in a real browser on a real operating system in order to extract as much runtime data as possible. Phishing is then detected using a range of techniques such as template matching, partial hashing, OCR and handcrafted rules. New evasion tricks are handled through automation, additional features and third-party integrations.
Overall, Joe Sandbox features one of the most extensive, complete and evasion-resistant phishing detection technologies on the market.
Interested in testing Joe Sandbox? Register for free at Joe Sandbox Cloud Basic or contact us for an in-depth technical demo!