Using VBA Emulation to Analyze Obfuscated Macros

ViperMonkey is an experimental toolkit that I have been developing since early 2015 to parse VBA macros and emulate their execution. This article shows how it can be used to analyze obfuscated macros and extract hidden strings/IOCs.

ViperMonkey

ViperMonkey is a Python project that includes a VBA macro parser, a VBA emulation engine and a set of tools for malicious macro analysis. I have mentioned it several times since early 2015 [SSTIC, MISC], but only recently has it reached the point where it can be used in practice. Emulating the execution of VBA macros, along with all the MS Office features, ActiveX objects and DLLs that malicious macros rely on, is extremely complex. ViperMonkey is still very, very far from implementing all of those features, and it cannot yet handle most real-life macros. In some cases, however, it can be a great help for deobfuscating macros automatically during malware analysis. I decided to release it publicly on GitHub so that malware analysts can start using it and contribute to its development.

This is how it works:

  1. VBA macro source code is extracted from MS Office files using olevba.
  2. The code is parsed using a VBA grammar defined with pyparsing, inspired by the official specification MS-VBAL.
  3. The parser transforms the VBA code into a structured object model (Python objects).
  4. A custom VBA engine emulates the execution of the code, simulating the MS Office features, ActiveX objects and DLLs used by most malicious macros.
  5. Specific actions of interest, such as calling DLLs, running code, downloading or writing files, are captured and recorded.
  6. Malware analysts can use the results to better understand the behaviour of the macro and to extract obfuscated strings/IOCs.
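To make steps 4 and 5 more concrete, here is a deliberately tiny Python sketch of the emulate-and-record idea. It is not ViperMonkey's code: it uses regular expressions instead of a pyparsing grammar, understands only string assignments joined with & plus a Shell statement, and the names Action and emulate are hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class Action:
    """An "action of interest" recorded during emulation (hypothetical type)."""
    name: str
    params: str

# Toy subset of VBA: only these two statement forms are recognised.
SHELL = re.compile(r'^Shell\s+(.+)$', re.IGNORECASE)   # Shell <expr>
ASSIGN = re.compile(r'^(\w+)\s*=\s*(.+)$')             # var = <expr>
TERM = re.compile(r'"([^"]*)"|(\w+)')                  # "literal" or variable

def eval_expr(expr, env):
    """Evaluate string literals and variables joined with &.
    Naive: a literal containing & would be split incorrectly."""
    parts = []
    for term in expr.split("&"):
        m = TERM.fullmatch(term.strip())
        if m is None:
            raise ValueError(f"unsupported expression: {term!r}")
        literal, name = m.groups()
        # VBA names are case-insensitive; unset variables read as "".
        parts.append(literal if literal is not None else env.get(name.lower(), ""))
    return "".join(parts)

def emulate(source):
    """Run the statements in order and return the recorded actions."""
    env, actions = {}, []
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith("'"):
            continue  # skip blank lines and comments
        if m := SHELL.match(line):
            # Nothing is executed: the command line is only recorded.
            actions.append(Action("Shell", eval_expr(m.group(1), env)))
        elif m := ASSIGN.match(line):
            env[m.group(1).lower()] = eval_expr(m.group(2), env)
        # Any other statement is silently ignored in this sketch.
    return actions

macro = '''
a = "cmd /c "
b = a & "calc" & ".exe"
Shell b
'''
print(emulate(macro))  # [Action(name='Shell', params='cmd /c calc.exe')]
```

The key design point is that the "dangerous" statement is intercepted and logged with its fully resolved argument, which is where deobfuscated strings become visible.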

Let's look at several real-life examples.

Sample 1

DIAN_caso-5415.doc is a fairly old malware sample from 2009. Its VBA macro code is not heavily obfuscated, but it is a good example of a simple downloader and dropper.

Let's run vmonkey on that sample, which is stored in a password-protected zip archive (the -z option provides the password):

python vmonkey.py DIAN_caso-5415.doc.zip -z infected

First, vmonkey displays the VBA source code extracted from the file:

Then vmonkey parses the VBA code, and displays the list of procedures found (Subs, Functions, DLL functions):

vmonkey uses its VBA engine to emulate the macro execution, starting from specific entry points such as "Auto_Open":

After execution has completed, vmonkey shows the actions of interest that were recorded, with their parameters:
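The procedure listing and entry-point selection can be approximated with a few regular expressions. The following Python sketch is an assumption, not ViperMonkey's implementation: it lists Sub/Function declarations and DLL declarations (Declare ... Lib), then keeps the procedures whose names match a small, non-exhaustive set of auto-executing names such as Auto_Open. All function names here are hypothetical.

```python
import re

# Non-exhaustive set of well-known auto-executing procedure names
# (assumption: the list ViperMonkey actually uses may differ).
AUTO_EXEC = {"autoopen", "auto_open", "document_open", "workbook_open",
             "autoexec", "autoclose", "auto_close", "document_close"}

# "Sub Name(" or "Private Function Name(" at the start of a line.
PROC_HEADER = re.compile(
    r'^\s*(?:(?:Public|Private|Static)\s+)*(Sub|Function)\s+(\w+)\s*\(',
    re.IGNORECASE | re.MULTILINE)

# 'Private Declare [PtrSafe] Function Name Lib "dll"' (DLL imports).
DECLARE = re.compile(
    r'^\s*(?:(?:Public|Private)\s+)?Declare\s+(?:PtrSafe\s+)?'
    r'(?:Sub|Function)\s+(\w+)\s+Lib\s+"([^"]+)"',
    re.IGNORECASE | re.MULTILINE)

def find_procedures(source):
    """Return (kind, name) for every Sub/Function defined in the source."""
    return [(m.group(1), m.group(2)) for m in PROC_HEADER.finditer(source)]

def find_dll_declarations(source):
    """Return (name, library) for every Declare statement."""
    return DECLARE.findall(source)

def find_entry_points(source):
    """Return the procedures that Office would run automatically."""
    return [name for _, name in find_procedures(source)
            if name.lower() in AUTO_EXEC]

src = '''
Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal a As Long) As Long
Sub AutoOpen()
    Helper
End Sub
Private Sub Helper()
End Sub
'''
print(find_procedures(src))        # [('Sub', 'AutoOpen'), ('Sub', 'Helper')]
print(find_dll_declarations(src))  # [('URLDownloadToFileA', 'urlmon')]
print(find_entry_points(src))      # ['AutoOpen']
```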

Sample 2

Let's now look at a more recent sample from August 2016, SHA256 a5e14eecf6beb956732790b05df001ce4fe0f001022f75dd1952d529d2eb9c11.

This code is much more obfuscated. A closer look reveals many calls to a small function called JTCKC(), which decodes obfuscated strings.

The algorithm is simple yet effective: it takes two characters and skips the third. The two characters form a hexadecimal code, which is prefixed with "&H" and passed to Chr() to obtain a decoded character. For example, Chr("&H64") returns "d". The same step is then applied to each following group of three characters in the string.
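Here is a minimal Python re-implementation of the decoding step as described above. The names decode_jtckc and encode_jtckc are hypothetical, and the encoder is only a helper to build test input; the macro's actual VBA code may differ in details such as how it handles a trailing partial group. Because each layer is just another call, a string encoded twice is decoded by applying the function twice.

```python
def decode_jtckc(encoded):
    """Decode one layer: two hex digits form a character code, and the
    third character of each group is junk that is skipped."""
    out = []
    # Step through the string three characters at a time; a trailing
    # fragment shorter than two characters is ignored.
    for i in range(0, len(encoded) - 1, 3):
        pair = encoded[i:i + 2]
        out.append(chr(int(pair, 16)))  # same idea as Chr("&H" & pair)
    return "".join(out)

def encode_jtckc(plain, junk="Z"):
    """Hypothetical inverse, only used here to build test input (ASCII only)."""
    return "".join(f"{ord(c):02X}{junk}" for c in plain)

print(decode_jtckc("64Q6FZ"))                     # do
layered = encode_jtckc(encode_jtckc("calc.exe"))  # two layers of encoding
print(decode_jtckc(decode_jtckc(layered)))        # calc.exe
```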

Interestingly, this use of Chr("&Hxx") is not documented in the official VBA specification MS-VBAL: the parameter of the Chr() function is expected to be an integer, not a string. I therefore had to modify the ViperMonkey engine to match the actual behaviour of MS Office.
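The string-to-number coercion that makes Chr("&H64") work can be sketched in Python as follows. This is an approximation written for illustration, not ViperMonkey's code: vba_to_long and vba_chr are hypothetical names, only decimal and &H hexadecimal strings are handled, and codes 128 to 255 are mapped to Unicode code points rather than the system ANSI code page that VBA actually uses.

```python
def vba_to_long(value):
    """Approximate VBA's coercion of a value to a whole number.
    Handles ints, decimal strings and "&H" hexadecimal strings only."""
    if isinstance(value, int):
        return value
    text = str(value).strip()
    if text[:2].upper() == "&H":
        return int(text[2:], 16)   # "&H64" -> 100
    return round(float(text))      # "100" -> 100 (banker's rounding, like CLng)

def vba_chr(value):
    """Emulate Chr(), also accepting the string forms seen in obfuscated macros."""
    code = vba_to_long(value)
    if not 0 <= code <= 255:
        # VBA raises run-time error 5 for such codes on non-DBCS systems.
        raise ValueError("Invalid procedure call or argument")
    return chr(code)  # exact for ASCII; 128-255 only approximates the ANSI code page

print(vba_chr("&H64"), vba_chr(100), vba_chr("100"))  # d d d
```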

Now that we know the decoding algorithm, we could apply it manually to all the obfuscated strings in the code. But that would be tedious, especially since the algorithm appears to be applied twice to reach the final payload.

Fortunately, vmonkey can do it automatically for us. The emulation engine is extremely slow, but it eventually produces useful results:

Et voilà: the real payload turns out to be a piece of JScript code that downloads an executable file from hxxp://216.170.126.3/wfil/file.exe and runs it.

A closer look at the VBA code shows that the JScript code is never written to a file on disk: instead, the macro uses an ActiveX object called "ScriptControl" to run it directly from memory.
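An emulator does not need to run the JScript to learn what it does: it can substitute a stub for the ActiveX object and record what the macro passes to it. A minimal Python sketch of that idea follows. ScriptControlStub and create_object are hypothetical names; the members Language, AddCode, ExecuteStatement and Eval mirror those of the real MSScriptControl.ScriptControl object, but the stub only logs its arguments.

```python
class ScriptControlStub:
    """Hypothetical stand-in for MSScriptControl.ScriptControl that records
    the code it receives instead of executing it."""

    def __init__(self, log):
        self.Language = ""
        self._log = log

    def _record(self, method, code):
        self._log.append((f"ScriptControl.{method}", self.Language, code))

    def AddCode(self, code):
        self._record("AddCode", code)

    def ExecuteStatement(self, statement):
        self._record("ExecuteStatement", statement)

    def Eval(self, expression):
        self._record("Eval", expression)
        return None  # nothing is evaluated, so there is no real result

def create_object(prog_id, log):
    """Hypothetical CreateObject() hook: return a stub for known ProgIDs."""
    if prog_id.lower() == "msscriptcontrol.scriptcontrol":
        return ScriptControlStub(log)
    raise NotImplementedError(f"no stub for {prog_id}")

log = []
sc = create_object("MSScriptControl.ScriptControl", log)
sc.Language = "JScript"
sc.AddCode('var answer = 42;')
print(log)  # [('ScriptControl.AddCode', 'JScript', 'var answer = 42;')]
```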

Sample 3

Here is another recent malware sample from September 2016: scan_092016_9534905854.docm (SHA256 9eebae3cce1e63f01eaa6867ae8537cf19162527bb7e7d752282f0e20dc03a66)

This one is also quite obfuscated, but in a completely different way: there are dozens of very small functions, each of which seems to return a small chunk of a string:

Other functions then concatenate all these chunks to build the payload, and run it:

Here again, the payload could be reconstructed manually by following the code step by step, but that would take a lot of effort.

vmonkey can execute the code for us and show the end result:

Like the previous sample, it uses a ScriptControl object to run JScript code. The payload simply downloads an executable file from hxxp://northsidecollisiona2.com/load5.exe and runs it using cmd.exe.
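Once the payload has been recovered, pulling out the IOCs is straightforward. The Python sketch below extracts http(s) URLs with a simple regular expression and defangs them in the hxxp style used in this article. The pattern is an assumption that will miss obfuscated or unusual URLs, the names extract_urls and defang are hypothetical, and the sample payload string is made up.

```python
import re

# Deliberately simple pattern: a URL ends at whitespace, quotes,
# angle brackets or a closing parenthesis.
URL_RE = re.compile(r'https?://[^\s"\'<>)]+', re.IGNORECASE)

def extract_urls(text):
    """Return every http(s) URL found in the recovered payload text."""
    return URL_RE.findall(text)

def defang(url):
    """Rewrite the leading "http" as "hxxp" so the URL is not clickable."""
    return re.sub(r'^http', 'hxxp', url, flags=re.IGNORECASE)

payload = 'x.Open("GET", "http://example.com/load.exe", false); // then run it'
print([defang(u) for u in extract_urls(payload)])  # ['hxxp://example.com/load.exe']
```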

Using ViperMonkey

Let's be honest: ViperMonkey works great on those three samples, but its parser and engine are still *very* incomplete. Expect it to break on most real-life samples.

This is a personal open-source project, developed only in my scarce spare time. And the task is huge, considering the complexity of the VBA language, MS Office features and all the DLLs and ActiveX objects that malware can use.

So if you find this project useful and would like to help, please do. You can submit pull requests, issues and ideas on the ViperMonkey GitHub repository, or contact me directly if you have specific questions.