Origapy is a Python interface to Origami, a PDF parser written in Ruby. It provides access to pdfclean.rb, which sanitizes PDF files by disabling all active content (JavaScript, launch actions, embedded files, etc.). Because Origami is a full PDF parser, it is much more effective than PDFiD at sanitizing/disarming PDF files, but also considerably slower.
Origapy uses a simple Python/Ruby bridge based on pipes, as described on this page.
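To illustrate the general idea of a pipe-based bridge, here is a minimal sketch using Python's standard subprocess module. It is not Origapy's actual code: the function name run_through_pipe and the CHILD command are hypothetical, and a small Python one-liner stands in for the Ruby script so that the example runs without Ruby installed. In a real bridge the command would presumably launch the Ruby interpreter on a script instead.

```python
import subprocess
import sys


def run_through_pipe(command, text):
    """Start `command`, send `text` to its stdin, return (exit_code, stdout, stderr)."""
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # exchange text rather than bytes
    )
    # communicate() writes the input, closes stdin, and reads both
    # output pipes until the child exits, which avoids deadlocks.
    out, err = proc.communicate(text)
    return proc.returncode, out, err


# Hypothetical stand-in for the Ruby side: a child process that
# upper-cases whatever it receives on stdin.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]

if __name__ == "__main__":
    code, out, err = run_through_pipe(CHILD, "file.pdf\n")
    print(code, repr(out))
```

The parent never links against the child's runtime; the two sides only exchange text over stdin/stdout, which keeps such a bridge simple at the cost of starting a new process for each call.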
WARNING: this is still a work in progress. The current version of the Origami parser may trigger errors on some PDF files.
Origapy and Origami are open-source, published under GPL v3.
Download the attached file below.
Unzip and run install.bat on Windows, or "python setup.py install" on other platforms.
import origapy
# Create a cleaner, then write a sanitized copy of file.pdf to cleaned.pdf
pc = origapy.PDF_Cleaner()
pc.clean('file.pdf', 'cleaned.pdf')
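Since the parser may fail on some PDF files, it can be useful to clean several files in one pass and keep track of the ones that fail. The sketch below is only an illustration: the helper clean_many and the "cleaned_" output prefix are hypothetical, and it assumes that a failed clean() call raises a Python exception, which is not confirmed here. It accepts any object with a clean(source, destination) method, such as origapy.PDF_Cleaner().

```python
import os


def clean_many(cleaner, pdf_paths, out_dir):
    """Clean each PDF into out_dir; return (cleaned, failed) lists."""
    cleaned, failed = [], []
    for src in pdf_paths:
        dst = os.path.join(out_dir, "cleaned_" + os.path.basename(src))
        try:
            cleaner.clean(src, dst)
        except Exception as exc:
            # Assumption: errors surface as exceptions. The exact type
            # is not known here, so everything is caught and recorded.
            failed.append((src, str(exc)))
        else:
            cleaned.append(dst)
    return cleaned, failed
```

With Origapy installed, a call might look like clean_many(origapy.PDF_Cleaner(), ['a.pdf', 'b.pdf'], 'out'), where the output directory 'out' already exists.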
| Attached file | Size |
|---|---|
| origapy-0.09.zip | 141.02 KB |