olefile - a Python module to read/write MS OLE2 files

olefile (formerly OleFileIO_PL) is a Python package to parse, read and write Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office 97-2003 documents, vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix files, Outlook MSG files, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.

Quick links: Download/Install - Documentation - Report Issues/Suggestions/Questions - Contact the author - Repository - Updates on Twitter

News

  • 2017-01-04: moved the documentation to ReadTheDocs
  • 2016-05-20: moved olefile repository to GitHub
  • 2016-02-02 v0.43: fixed issues #26 and #27, better handling of malformed files, use python logging.
  • 2015-01-25 v0.42: improved handling of special characters in stream/storage names on Python 2.x (using UTF-8 instead of Latin-1), fixed bug in listdir with empty storages.
  • 2014-11-25 v0.41: OleFileIO.open and isOleFile now support OLE files stored in byte strings, fixed installer for python 3, added support for Jython (Niko Ehrenfeuchter)
  • 2014-10-01 v0.40: renamed OleFileIO_PL to olefile, added initial write support for streams >4K, updated doc and license, improved the setup script.

Download and Install

If you have pip or setuptools installed (pip is included in Python 2.7.9+), you may simply run pip install olefile or easy_install olefile for the first installation.

To update olefile, run pip install -U olefile.

Otherwise, see http://olefile.readthedocs.io/en/latest/Install.html

Features

  • Parse/read/write any OLE file such as Microsoft Office 97-2003 legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt, Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook messages, StickyNotes, Zeiss AxioVision ZVI files, ...
  • List all the streams and storages contained in an OLE file
  • Open streams as files
  • Parse and read property streams, containing metadata of the file

olefile can be used as an independent module or with PIL/Pillow.

olefile is mostly meant for developers. If you are looking for tools to analyze OLE files or to extract data (especially for security purposes such as malware analysis and forensics), then please also check my python-oletools, which are built upon olefile and provide a higher-level interface.

 

Documentation

Please see the online documentation for more information.

License

See http://olefile.readthedocs.io/en/latest/License.html

 

Other projects using olefile / OleFileIO_PL

  • python-oletools: a package of python tools to analyze OLE files and MS Office documents, mainly for malware analysis and debugging. It includes olebrowse, a graphical tool to browse and extract OLE streams, oleid to quickly identify characteristics of malicious documents, olevba to detect/extract/analyze VBA macros, and pyxswf to extract Flash objects (SWF) from OLE files.
  • oledump: a tool to analyze malicious MS Office documents and extract VBA macros
  • ExeFilter: to scan and clean active content in file formats (e.g. MS Office VBA macros)
  • py-office-tools:  to display records inside Excel and PowerPoint files
  • pyew: a malware analysis tool
  • pyOLEscanner: a malware analysis tool
  • PPTExtractor: to extract images from PowerPoint presentations
  • msg-extractor: to parse MS Outlook MSG files
  • pyhwp: hwp file format python parser
  • RC4-40-brute-office: a tool to crack MS Office files using RC4 40-bit encryption
  • punbup: a tool to extract files from McAfee antivirus quarantine files (.bup)
  • Viper: a framework to store, classify and investigate binary files of any sort for malware analysis (also includes code from oleid)
  • Pillow: the friendly fork of PIL, the Python Image Library
  • Ghiro: a digital image forensics tool
  • Nightmare: A distributed fuzzing testing suite, using olefile to fuzz OLE streams and write them back to OLE files.

 

Comments

I did managed to extract embedded using OleFileIO_PL alone

def extract_embedded_ole()
ole = OleFileIO_PL.OleFileIO( fname )
i = 0
for stream in ole.listdir():
for s in stream:
if type( stream ) == type( [] ) and len( stream ) > 1:
i += 1
if ole.get_type( stream ) == 2 and s in ['Workbook', 'WordDocument', 'Package', 'WordDocument','VisioDocument' ,'PowerPoint Document', "Book", "CONTENTS"]:
ole_stream = ole.openstream( stream )
ole_props = ole.getproperties( ['\x05SummaryInformation'] )
out_dir = fname + ".embeddings/" + "/".join( stream[:-1] )
try:
os.makedirs( out_dir )
except OSError:
pass

#Write out Streams
out_name = out_dir + "/" + os.path.split( fname )[1] + "-emb-" + s + "-" + str( i ) + ".ole"
out_file = open( out_name, 'w+b' )
out_file.write( ole_stream.read() )
out_file.close()