When developing tools related to MS Office files such as olefile and oletools, it is often necessary to test them on many different samples of various types and sizes. It is quite easy to find malicious samples using malwr.com, hybrid-analysis.com and VirusTotal, just to name a few (see my previous post about that topic). However, finding and downloading a large number of legitimate files is a different challenge. Here are some tips to do it:
"Would it be possible to add a method to olefile that returns bytes that are appended to an OLE file? I have a sample that has encoded EXE appended."
When Didier Stevens asked me that question some time ago, I thought it would be easy, a matter of minutes. Indeed, the OLE format (aka Microsoft Compound File Binary Format) is structured and well specified in MS-CFB.
This article presents several tools that can be used to extract VBA Macros source code from MS Office Documents, for malware analysis and forensics. It also provides an overview of how VBA Macros are stored.
olefile (formerly OleFileIO_PL) is a Python package to parse, read and write Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office 97-2003 documents, vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix files, Outlook MSG files, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.
python-oletools is a package of python tools to analyze Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging. It is based on my olefile parser.
mraptor is a simple tool designed to detect malicious VBA macros in MS Office files, based on characteristics of the VBA code. This article explains how it works, and how it can be used in practice.
Since 2014, malicious macros are coming back. And their success in recent campaigns demonstrates that it is still an effective way to deliver malware, sixteen years after Melissa.
This is a presentation that I gave to the SSTIC symposium in June 2015, translated to English. It explains what malicious macros can do, how their code can be obfuscated, and some of the anti-analysis tricks observed in recent cases. Then it shows several tools that can be used to analyze macros, including oledump and olevba.
From time to time, people report strange malicious documents which are not successfully analyzed by malware analysis tools nor by sandboxes. Let's investigate. (this is a follow-up to the post "Malfunctioning Malware" by Didier Stevens)
olevba is a script to parse OLE and OpenXML files such as MS Office documents (e.g. Word, Excel), to detect VBA Macros, extract their source code in clear text, decode malware obfuscation (Hex/Base64/StrReverse/Dridex) and detect security-related patterns such as auto-executable macros, suspicious VBA keywords used by malware, and potential IOCs (IP addresses, URLs, executable filenames, etc). It is part of the python-oletools package.
This article describes the Microsoft Office 97-2003 legacy/binary file formats (doc, xls, ppt), related security issues and useful resources.