PDFiD - a Python module to analyze and sanitize PDF files

PDF files can be used to deliver malicious active content, as described here. PDFiD is a Python tool written by Didier Stevens to analyze and sanitize PDF files. PDFiD_PL is a slightly modified version that can be imported as a module in Python applications (it was originally made for ExeFilter).

Modifications

The modified version is named pdfid_PL.py. The main differences from the original tool are in the signature of the PDFiD function:

def PDFiD(file, allNames=False, extraData=False, disarm=False, force=False,
    output_file=None, raise_exceptions=False, return_cleaned=False,
    active_keywords=ACTIVE_KEYWORDS):

The following parameters have been added:

  • output_file: path of the output file to be created.
  • raise_exceptions: raise an exception when a parsing error occurs, instead of ignoring it.
  • return_cleaned: return a tuple (xmlDoc, cleaned), where cleaned is True if the PDF contained active content that has been cleaned.
  • active_keywords: sequence of PDF keywords to be disabled. Default value: ('/JS', '/JavaScript', '/AA', '/OpenAction', '/JBIG2Decode', '/RichMedia', '/Launch')

All these parameters are optional, so that pdfid_PL.py runs exactly like the original pdfid.py when they are not set.
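
As a rough illustration of how the added parameters combine, the sketch below wraps the call in a small helper. The names sanitize_pdf, pdfid_func and DEFAULT_ACTIVE_KEYWORDS are hypothetical and not part of pdfid_PL; the PDFiD function is passed in as an argument so that the helper does not depend on where the module is installed. The keyword tuple is copied from the default value documented above.

```python
# Copy of the documented default for active_keywords (hypothetical constant name).
DEFAULT_ACTIVE_KEYWORDS = ('/JS', '/JavaScript', '/AA', '/OpenAction',
                           '/JBIG2Decode', '/RichMedia', '/Launch')


def sanitize_pdf(pdfid_func, src, dst, active_keywords=DEFAULT_ACTIVE_KEYWORDS):
    """Disarm src into dst; return True if active content was cleaned.

    pdfid_func is expected to be pdfid_PL.PDFiD. It is passed in rather
    than imported here (an assumption made for this sketch).
    """
    xmldoc, cleaned = pdfid_func(src, disarm=True, output_file=dst,
                                 raise_exceptions=True, return_cleaned=True,
                                 active_keywords=tuple(active_keywords))
    return cleaned
```

With pdfid_PL importable, a call could look like sanitize_pdf(pdfid_PL.PDFiD, 'file.pdf', 'cleaned.pdf', active_keywords=('/JS', '/JavaScript')) to disable only the JavaScript keywords.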

Changelog

pdfid_PL is updated each time Didier Stevens modifies pdfid:

  • 2010-06-15 v0.0.11b: fixed a bug that happened when using return_cleaned
  • 2010-05-02 v0.0.11: added /Launch to the list of keywords to be disabled
  • 2010-01-11 v0.0.10: relaxed %PDF header checking
  • 2009-10-19 v0.0.9: updated from pdfid v0.0.9
  • 2009-10-09 v0.0.7: initial version based on pdfid v0.0.7

Download

Download one of the attached files below.

Alternatively, you may get the latest version from the ExeFilter SVN.

Sample usage

import pdfid_PL as pdfid

# Disarm active content and write the result to cleaned.pdf.
xmldoc, cleaned = pdfid.PDFiD('file.pdf', disarm=True, output_file='cleaned.pdf',
                              raise_exceptions=True, return_cleaned=True)
if cleaned:
    print('PDF has been cleaned.')
else:
    print('PDF is clean.')
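
The xmldoc value returned by the sample call can also be inspected. The following is a minimal sketch, assuming the report is an xml.dom.minidom document whose Keyword elements carry Name and Count attributes, as in the original pdfid.py; this layout is not documented on this page, so check it against your version. keyword_counts and SAMPLE_REPORT are hypothetical names, and the sample XML is hand-written rather than real PDFiD output.

```python
from xml.dom import minidom


def keyword_counts(xmldoc):
    """Return {keyword name: count} from a PDFiD-style XML report.

    Assumes <Keyword Name="..." Count="..."/> elements (not confirmed here).
    """
    counts = {}
    for node in xmldoc.getElementsByTagName('Keyword'):
        counts[node.getAttribute('Name')] = int(node.getAttribute('Count'))
    return counts


# Hand-written stand-in for a report; a real one would come from PDFiD().
SAMPLE_REPORT = minidom.parseString(
    '<PDFiD><Keywords>'
    '<Keyword Name="/JS" Count="1"/>'
    '<Keyword Name="/OpenAction" Count="0"/>'
    '</Keywords></PDFiD>')

print(keyword_counts(SAMPLE_REPORT))
```

A caller could use such a mapping to log which keywords were present before deciding whether to keep the cleaned file.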

Alternatives

 

Attachment               Size
pdfid_PL_0.0.9.zip       5.75 KB
pdfid_PL_0.0.10.zip      5.88 KB
pdfid_PL_0.0.11.zip      5.93 KB
pdfid_PL-0.0.11b.zip     8.3 KB
