Reply to comment

OleFileIO_PL - a Python module to read MS OLE2 files

OleFileIO_PL is a Python module to read Microsoft OLE2 files (also called Structured Storage or Compound Document File Format), such as Microsoft Office documents, Image Composer and FlashPix files, Outlook messages, ... This is an improved version of the OleFileIO module from PIL, the excellent Python Imaging Library v1.1.6 (See: http://www.pythonware.com/products/pil/index.htm), created and maintained by Fredrik Lundh.

As far as I know, this module is now the most complete and robust Python implementation to read MS OLE2 files, portable on several OSes. (please tell me if you know other similar Python modules)

WARNING: THIS IS (STILL) WORK IN PROGRESS.

Main improvements over PIL version:

  • Better compatibility with Python 2.4, 2.5 and 2.6
  • Support for files larger than 6.8MB
  • Robust: many checks to detect malformed files
  • Improved API
  • Added setup.py and install.bat to ease installation

News

  • 2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC Macs.
  • 2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not plain str.
  • 2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben G. and Martijn for reporting the bug)
  • see changelog in source code for more info.

License

OleFileIO_PL changes are Copyright (c) 2005-2010 by Philippe Lagadec.

The Python Imaging Library (PIL) is

- Copyright (c) 1997-2005 by Secret Labs AB

- Copyright (c) 1995-2005 by Fredrik Lundh

By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions:

Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.

SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Download:

Pick zip file below on this page.

How to use this module:

See sample code at the end of the module, and also docstrings.

Here are a few examples:

import OleFileIO_PL

# Test if a file is an OLE container:
assert OleFileIO_PL.isOleFile('myfile.doc')

# Open OLE file:
ole = OleFileIO_PL.OleFileIO('myfile.doc')

# Get list of streams:
print ole.listdir()

# Test if known streams/storages exist:
if ole.exists('worddocument'):
    print "This is a Word document."
    print "size :", ole.get_size('worddocument')
    if ole.exists('macros/vba'):
         print "This document seems to contain VBA macros."
# Extract the "Pictures" stream from a PPT file:
if ole.exists('Pictures'):
    pics = ole.openstream('Pictures')
    data = pics.read()
    f = open('Pictures.bin', 'w')
    f.write(data)
    f.close()

It can also be used as a script from the command-line to display the structure of an OLE file, for example:

OleFileIO_PL.py myfile.doc

How to contribute:

If you would like to help us improve this module, or simply provide feedback, please send an e-mail to decalage(at)laposte.net. You can also help in many ways:

  • test this module on different platforms / Python versions
  • find and report bugs
  • improve documentation, code samples, docstrings
  • write unittest test cases
  • provide tricky malformed files

To report a bug, for example a normal file which is not parsed correctly, please send an e-mail with an attachment containing the debugging output of OleFileIO_PL.

For this, launch the following command :

OleFileIO_PL.py -d -c file >debug.txt 
AttachmentSize
OleFileIO_PL-0.18.zip23.04 KB
OleFileIO_PL-0.19.zip23.22 KB
OleFileIO_PL-0.20.zip23.61 KB
OleFileIO_PL-0.21.zip23.9 KB

Reply

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <b> <address> <blockquote> <br> <caption> <center> <code> <dd> <del> <div> <dl> <dt> <em> <font> <h2> <h3> <h4> <h5> <h6> <hr> <i> <img> <li> <ol> <p> <pre> <span> <strong> <sub> <sup> <table> <tbody> <td> <tfoot> <th> <thead> <tr> <u> <ul> <tr>
  • Lines and paragraphs break automatically.
  • You can enable syntax highlighting of source code with the following tags: <code>, <blockcode>. Beside the tag style "<foo>" it is also possible to use "[foo]".
  • Use [toc list: ol; title: Table of Contents; minlevel: 2; maxlevel: 3; attachments: yes;] to insert a mediawiki style collapsible table of contents. All the arguments are optional.

More information about formatting options