Origapy - a Python module to sanitize PDF files

Origapy is a Python interface to Origami, a PDF parser written in Ruby. It provides access to pdfclean.rb, which sanitizes PDF files by disabling all active content (JavaScript, launch actions, embedded files, etc.). Because Origami is a full PDF parser, it is much more effective than PDFiD at sanitizing/disarming PDF files, but also considerably slower.

Origapy uses a simple Python/Ruby bridge based on pipes, as described on this page.
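As an illustration of the general idea of a pipe-based bridge, here is a minimal sketch. It is not Origapy's actual code: the function name run_through_pipe is hypothetical, and the child process is a stand-in Python one-liner so the example runs without Ruby. In a real bridge the command would launch the Ruby interpreter with a script instead.

```python
import subprocess
import sys


def run_through_pipe(command, data):
    """Send `data` (bytes) to the child's stdin; return (exit code, stdout, stderr)."""
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() writes the input, closes stdin, and reads both output
    # pipes until the child exits, which avoids pipe deadlocks.
    out, err = proc.communicate(data)
    return proc.returncode, out, err


# Stand-in child process (assumption for the demo): upper-cases its input.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
code, out, err = run_through_pipe(child, b"hello")
```

The parent only sees the child's exit code and its two output streams, so any error reporting from the other side has to be encoded in one of those three channels.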

WARNING: this is still a work in progress. The current version of the Origami parser may trigger errors on some PDF files.

Changelog

  • 2010-09-12 v0.09: updated Origami engine to v1.0.0-beta3
  • 2009-10-02 v0.08: updated Origami engine to v1.0.0-beta1
  • 2009-09-30 v0.07: detects whether a file was already clean or has been cleaned, and raises an exception when an error occurs

License

Origapy and Origami are open-source, published under GPL v3.

Download

Pick the attached file below.

Requirements

  • Python 2.x
  • Ruby 1.8.x

Install

Unzip and run install.bat on Windows, or "python setup.py install" on other platforms.

Usage

import origapy
# Create a cleaner object
pc = origapy.PDF_Cleaner()
# Sanitize file.pdf and write the result to cleaned.pdf
pc.clean('file.pdf', 'cleaned.pdf')
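
Since clean() raises an exception when an error occurs (and the parser may fail on some PDF files), batch processing probably wants to catch errors per file. The following is a rough sketch only: clean_many is a hypothetical helper, not part of Origapy, and the exact exception class raised by Origapy is not documented here, so it catches the generic Exception. The _FakeCleaner class is a stand-in so the example runs without Origapy installed.

```python
def clean_many(cleaner, pairs):
    """Call cleaner.clean(src, dst) for each pair; map src -> None or the exception."""
    results = {}
    for src, dst in pairs:
        try:
            cleaner.clean(src, dst)
            results[src] = None
        except Exception as exc:  # exact exception type is an assumption
            results[src] = exc
    return results


class _FakeCleaner(object):
    """Stand-in for origapy.PDF_Cleaner (demo only): fails on names containing 'bad'."""

    def clean(self, src, dst):
        if "bad" in src:
            raise ValueError("cannot parse %s" % src)


results = clean_many(_FakeCleaner(), [("a.pdf", "a_clean.pdf"),
                                      ("bad.pdf", "bad_clean.pdf")])
failed = sorted(name for name, err in results.items() if err is not None)
```

With Origapy installed, the same helper could presumably be called as clean_many(origapy.PDF_Cleaner(), pairs).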

 Alternatives

Attachment          Size
origapy-0.09.zip    141.02 KB
