MS Office legacy/binary formats security (doc, xls, ppt, ...)

This article describes the Microsoft Office legacy/binary file formats (doc, xls, ppt), related security issues and useful resources. [WORK IN PROGRESS]

The original location of this page is http://www.decalage.info/file_formats_security/office.

File format description

MS Office binary formats are widely used:

  • Word documents (.doc) for texts.
  • Excel workbooks (.xls) for worksheets with numeric values.
  • PowerPoint (.ppt) for presentations.

Except for very old MS Office versions, all these formats share the same basic container structure, variously called OLE2, structured storage or compound document.

MS Office also contains other applications, such as MS Access, which use different file formats not based on the OLE2 format.

Since MS Office 2007, new file formats based on XML (docx, xlsx, pptx) are used by default. These formats will be covered in a future article.
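
Because the file extension can be misleading, a filter usually needs to check which container a file really uses. The following is a minimal sketch in Python, not a complete detector. It relies on two values that are not given in this article: the 8-byte signature that starts an OLE2 file, and the ZIP local file header signature (the newer XML-based formats are stored in ZIP archives). The function names are hypothetical. A match only identifies the container: an OLE2 file is not necessarily an MS Office document, and a ZIP file is not necessarily a docx, xlsx or pptx.

```python
# Signature at the start of an OLE2 / compound document file
# (assumption: value taken from the format specification, not from this article).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature found at the start of a typical ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(data: bytes) -> str:
    """Guess the container type from the leading bytes of a file."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 8 bytes of a file and classify its container."""
    with open(path, "rb") as f:
        return sniff_container(f.read(8))
```

For example, sniff_file("report.doc") would return "ole2" for a legacy Word document, whatever its extension says.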

Main client applications

The main applications used to open MS Office files are part of the MS Office suite:

  • MS Word for documents
  • MS Excel for workbooks
  • MS PowerPoint for presentations

Many alternative applications are also able to open MS Office files, such as OpenOffice, StarOffice, GNOME Office and KOffice.

Main security issues

  • VBA macros
  • OLE objects (particularly Package objects)

Format specifications and technical information

Publications about MS Office formats security issues

  • Malware and File formats, P. Lagadec, 2003: SSTIC03

Examples of known vulnerabilities and exploits


Useful analysis tools

Parsing tools and libraries

  • OleFileIO_PL: a Python module to parse and read MS OLE2 files.
  • xlrd/xlwt: Python modules to read and create (not modify) MS Excel files.
  • POIFS: a Java library to read and write MS OLE2 files.
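
As an illustration of what such a parsing library provides, here is a minimal sketch that lists the streams stored in an OLE2 file with OleFileIO_PL. It assumes the module is installed, either under the name used in this article or under its later name, olefile. The helper names stream_paths and list_streams are hypothetical. Only the isOleFile, OleFileIO, listdir and close calls come from the library.

```python
def stream_paths(entries):
    """Join the path components returned by listdir() into 'a/b/c' strings."""
    return ["/".join(parts) for parts in entries]


def list_streams(path):
    """Return the stream paths of an OLE2 file, or None if it is not OLE2."""
    # Imported lazily so this sketch loads even when the module is absent.
    try:
        import olefile as ole_module       # later name of the same project
    except ImportError:
        import OleFileIO_PL as ole_module  # name used in this article
    if not ole_module.isOleFile(path):
        return None
    ole = ole_module.OleFileIO(path)
    try:
        # listdir() returns each stream as a list of storage/stream names.
        return stream_paths(ole.listdir())
    finally:
        ole.close()
```

Listing the streams is often a first step before looking for macros or embedded objects, since both are stored inside the container as storages and streams.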

Filtering tools and libraries

  • ExeFilter: to sanitize MS Office files by removing macros and OLE Package objects.

 
