PDFiD - a Python module to analyze and sanitize PDF files

PDF files can carry active content that may be used maliciously, as described here. PDFiD is a Python tool written by Didier Stevens to analyze and sanitize PDF files. PDFiD_PL is a version that I have slightly modified so that it can be imported as a module in Python applications (originally for ExeFilter).

Modifications

The modified version is named pdfid_PL.py. The main differences from the original tool are in the signature of the PDFiD function:

def PDFiD(file, allNames=False, extraData=False, disarm=False, force=False,
    output_file=None, raise_exceptions=False, return_cleaned=False,
    active_keywords=ACTIVE_KEYWORDS):

The following parameters have been added:

  • output_file: path of output file to be created.
  • raise_exceptions: raise an exception when a parsing error happens, instead of ignoring it.
  • return_cleaned: return a tuple (xmlDoc, cleaned), where cleaned=True if the PDF contained active content which has been cleaned.
  • active_keywords: list of PDF tags to be disabled. Default value: ('/JS', '/JavaScript', '/AA', '/OpenAction', '/JBIG2Decode', '/RichMedia', '/Launch')

All these parameters are optional, so that pdfid_PL.py runs exactly like the original pdfid.py when they are not set.
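
As a rough illustration of how the added parameters fit together, here is a minimal sketch that restricts disarming to a chosen set of keywords. The helper name disarm_with_keywords is hypothetical and not part of pdfid_PL. The pdfid_func argument stands for pdfid_PL.PDFiD and is passed in explicitly, so the sketch can be read and run without the module installed. It assumes the function accepts the keyword arguments shown in the signature above.

```python
# Default list as documented above; in real use, pdfid_PL.ACTIVE_KEYWORDS
# (the name used in the signature) is the authoritative value.
DEFAULT_KEYWORDS = ('/JS', '/JavaScript', '/AA', '/OpenAction',
                    '/JBIG2Decode', '/RichMedia', '/Launch')


def disarm_with_keywords(pdfid_func, src, dst, keywords=DEFAULT_KEYWORDS):
    """Disarm src into dst, disabling only the given keywords.

    pdfid_func is expected to behave like pdfid_PL.PDFiD (assumption).
    Returns True if active content was found and cleaned.
    """
    xmldoc, cleaned = pdfid_func(
        src,
        disarm=True,
        output_file=dst,
        return_cleaned=True,              # ask for the (xmlDoc, cleaned) tuple
        active_keywords=tuple(keywords),  # custom list of tags to disable
    )
    return cleaned
```

With the module installed, a call might look like disarm_with_keywords(pdfid_PL.PDFiD, 'file.pdf', 'cleaned.pdf', keywords=('/JS', '/JavaScript')).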

Changelog

pdfid_PL is updated each time Didier Stevens modifies pdfid:

  • 2010-06-15 v0.0.11b: fixed a bug that happened when using return_cleaned
  • 2010-05-02 v0.0.11: added /Launch to list of keywords to be disabled.
  • 2010-01-11 v0.0.10: relaxed %PDF header checking
  • 2009-10-19 v0.0.9: updated from pdfid v0.0.9
  • 2009-10-09 v0.0.7: initial version based on pdfid v0.0.7.

Download

Download one of the attached files listed below.

Alternatively, you may get the latest version from the ExeFilter SVN.

Sample usage

import pdfid_PL as pdfid

# Disarm file.pdf into cleaned.pdf; raise on parsing errors instead of ignoring them.
xmldoc, cleaned = pdfid.PDFiD('file.pdf', disarm=True, output_file='cleaned.pdf',
                              raise_exceptions=True, return_cleaned=True)
if cleaned:
    print('PDF has been cleaned.')
else:
    print('PDF is clean.')
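
When several files have to be processed, the same call can be wrapped in a loop that turns parsing errors into a per-file status. The sketch below is only one possible arrangement: sanitize_many and the output suffix are hypothetical names, pdfid_func stands for pdfid_PL.PDFiD, and because the exception types raised with raise_exceptions=True are not documented on this page, it catches Exception broadly.

```python
def sanitize_many(pdfid_func, paths, suffix='.cleaned.pdf'):
    """Disarm each path and return {path: status}.

    status is 'cleaned', 'clean', or 'error: <message>'.
    pdfid_func is expected to behave like pdfid_PL.PDFiD (assumption).
    """
    results = {}
    for path in paths:
        try:
            _xmldoc, cleaned = pdfid_func(
                path,
                disarm=True,
                output_file=path + suffix,  # hypothetical naming scheme
                raise_exceptions=True,      # surface parsing errors
                return_cleaned=True,
            )
        except Exception as exc:  # exact exception types are an unknown here
            results[path] = 'error: %s' % exc
        else:
            results[path] = 'cleaned' if cleaned else 'clean'
    return results
```

With the module installed, a call might look like sanitize_many(pdfid_PL.PDFiD, ['a.pdf', 'b.pdf']).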

Alternatives

 

Attached file            Size
pdfid_PL_0.0.9.zip       5.75 KB
pdfid_PL_0.0.10.zip      5.88 KB
pdfid_PL_0.0.11.zip      5.93 KB
pdfid_PL-0.0.11b.zip     8.3 KB
