Weaponized PDF - Payload Delivery Format

This article describes the PDF file format, related security issues and useful resources. [WORK IN PROGRESS]

The original location of this article is http://www.decalage.info/file_formats_security/pdf

Last update: 2017-11-10 (created 2010-02-13)

Table of Contents

File format description
Main client applications
Main security issues
Potential Solutions
Format specifications and technical information
Publications about PDF security issues
Examples of known vulnerabilities and exploits
Obfuscation techniques
Analysis techniques
Useful analysis tools
Parsing tools and libraries
Filtering tools and libraries

File format description

PDF (Portable Document Format) is a file format designed by Adobe. It is mainly used to publish final versions of documents on the Internet, by e-mail or on CD-ROMs. Its main purpose is to display or print documents with a fixed layout. The PDF format may also be used to create electronic forms.

More info: http://en.wikipedia.org/wiki/Portable_Document_Format

Main client applications

The main application used to open PDF files for display is Adobe Reader. Many alternative applications are also able to display PDF files, such as Preview on Mac OS X and Foxit Reader on Windows.

Adobe Acrobat is one of the applications which can create and edit PDF documents.

Main security issues

PDF is usually considered a static and safe format for document exchange. This perception is wrong.

The PDF format is in fact very complex, and contains several features which may lead to security issues:

JavaScript: Adobe Reader (and possibly other readers) contains a JavaScript engine similar to the ones used by web browsers, but with a slightly different API for manipulating PDF content dynamically and controlling some viewer features. Potentially dangerous features are restricted for security reasons. Even so, PDF documents are not purely static: some actions may be used to fool a user (popups) or to send e-mails and HTTP requests automatically. Furthermore, experience shows that many recent vulnerabilities have been exploited using JavaScript in PDF.
Launch actions: a PDF file may launch any command on the operating system, after user confirmation (popup message). Different command lines may be specified for Windows, Unix and Mac. On Windows only, parameters can be provided for the command. Until Adobe Reader 9.3.2, the CVE-2010-1240 vulnerability made it possible to fool users by modifying the text of the popup message. Since Adobe Reader 9.3.3, a blacklist restricts the file formats that can be opened, blocking executable files by default (a way to bypass it was found, and fixed in v9.3.4).
Embedded files: a PDF file may contain attached files, which can be extracted and opened from the reader. This trick may be used to hide malicious executables in order to bypass some antivirus and content analysis engines. Fortunately, Adobe Reader refuses to open embedded files if their extension is part of a blacklist, such as EXE, BAT, CMD, etc. However, this blacklist is not perfect and formats such as HTML or Python scripts may be embedded in PDF and launched from Adobe Reader.
GoToE actions: a PDF file may be embedded inside another PDF file, and a GoToE action may be used so that Adobe Reader opens the embedded PDF file automatically without notifying the user. This feature may be used to hide a malicious PDF file within a normal PDF file, to fool many antivirus engines.
Embedded Flash applications: a PDF file may contain Flash applications (stored as embedded SWF files), which bring their own security issues, such as ActionScript content and Adobe Flash Player vulnerabilities. Adobe Reader contains its own Flash Player, independent from the one installed in web browsers. For example, the CVE-2010-1297 vulnerability was first patched in the Flash Player on 10 June 2010, whereas the Flash Player shipped with Adobe Reader was only patched on 29 June 2010.
Encryption: a PDF file may be encrypted with a password. However, if an empty password is used, Adobe Reader will open it directly without asking the user. This trick may be used to fool many antivirus and analysis engines that do not support decryption.
Parser "flexibility": PDF specifications, Adobe Reader and possibly other applications are very flexible about the structure of PDF files.
- For example, most people think that PDF files have to start with the "%PDF" magic number, whereas the specifications only say this header has to be in the first 1024 bytes. See the Adobe PDF 1.7 Reference, Appendix H.3, page 1102: "Acrobat viewers require only that the header appear somewhere within the first 1024 bytes of the file". It is therefore possible to insert around 1000 arbitrary bytes at the beginning of a PDF file. This trick may be used to bypass overly strict antivirus or content analysis engines, because a fake file header (for example JPEG or HTML) can be inserted.
- Another example is that the cross-reference table at the end of the file need not point exactly to each object: Adobe Reader is able to reconstruct malformed files even if some content has been inserted within or between PDF objects.
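
The header rule above can be illustrated with a minimal Python sketch. It only looks for the "%PDF-" marker within the first 1024 bytes and reports how many bytes precede it. The function names and the sample data are hypothetical, chosen for illustration. This is not how any particular antivirus engine or PDF reader implements the check.

```python
from typing import Optional

HEADER_MAGIC = b"%PDF-"
HEADER_WINDOW = 1024  # window quoted from the PDF 1.7 Reference, Appendix H.3


def find_pdf_header(data: bytes) -> Optional[int]:
    """Return the offset of '%PDF-' within the first 1024 bytes, or None."""
    offset = data[:HEADER_WINDOW].find(HEADER_MAGIC)
    return offset if offset != -1 else None


def describe_header(data: bytes) -> str:
    """Summarize where the header sits, flagging any leading bytes."""
    offset = find_pdf_header(data)
    if offset is None:
        return "no PDF header in the first 1024 bytes"
    if offset == 0:
        return "PDF header at offset 0"
    return f"PDF header at offset {offset}: {offset} leading bytes precede it"


if __name__ == "__main__":
    # Hypothetical sample: JPEG-like magic bytes placed before the PDF header.
    disguised = b"\xff\xd8\xff\xe0" + b"A" * 100 + b"%PDF-1.4\n"
    print(describe_header(disguised))
```

A scanner that only tests whether a file starts with "%PDF" would treat the sample above as a non-PDF file, which is exactly the weakness described in this section.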

Potential Solutions

Disable JavaScript and Launch features on each client: protects against most current malware, but limits functionality on the client.
Convert all incoming PDF files to PDF/A (a subset of PDF without JavaScript, encryption, audio/video, external links, etc.): an interesting solution, but PDF/A requires that all fonts are embedded. Links to potential tools: pdfa.org, 3-Heights, gDoc, 7-pdf. However, most PDF/A tools have not been designed for security purposes.
Sanitize all incoming PDF files with a tool such as ExeFilter: covers most issues by disabling JavaScript, Launch actions, embedded files, etc. in incoming PDF files.
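
To give an idea of what "disabling" a feature can mean at the byte level, here is a deliberately naive Python sketch. It swaps the letter case of a few PDF names (/JavaScript, /JS, /Launch) so that a reader no longer recognizes them, while keeping the file length unchanged so that object offsets stay valid. The list of names and the function name are assumptions made for this example. It is not a description of how ExeFilter or any other tool listed in this article works. It also misses names written with "#xx" hex escapes and anything stored inside compressed or encrypted streams, so it is not sufficient protection on its own.

```python
import re
from typing import Tuple

# Names to neutralize (an assumption for this sketch, not an exhaustive list).
RISKY_NAMES = (b"JavaScript", b"JS", b"Launch")

# A PDF name ends at whitespace or at one of the delimiter characters.
_DELIM = rb"[\x00\s/<>\[\](){}%]"
_PATTERN = re.compile(
    rb"/(?:" + b"|".join(RISKY_NAMES) + rb")(?=" + _DELIM + rb"|\Z)"
)


def neutralize(data: bytes) -> Tuple[bytes, int]:
    """Swap the case of risky names; return (new_data, number_of_changes)."""
    return _PATTERN.subn(lambda m: m.group(0).swapcase(), data)


if __name__ == "__main__":
    sample = b"<< /S /JavaScript /JS (app.alert(1)) >> << /S /Launch /F (x) >>"
    cleaned, changes = neutralize(sample)
    print(changes, cleaned)
```

Keeping the replacement the same length as the original name is a design choice: it avoids having to rebuild the cross-reference table after the edit.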

Format specifications and technical information

PDF 2.0 ISO 32000-2:2017 specifications (ISO standard, July 2017)
PDF 1.7 ISO 32000-1:2008 specifications (ISO standard, 2008)
Adobe PDF specifications and archives (Adobe)
General information about PDF and links (Wikipedia)
Short PDF structure description (Didier Stevens)
JavaScript for Acrobat documentation (Adobe)
Application Security for the Acrobat Family of Products v9.x (Adobe)
Application Security Library for Acrobat and Adobe Reader (Adobe)

Publications about PDF security issues

2000: Adobe Acrobat Security Issues : The Open File Action and File Attachment Annotations, Carl Orthlieb, Adobe
2001: Adobe PDF files can be used as virus carriers, Richard M. Smith, Bugtraq mailing-list
2003: Malware and File formats, Philippe Lagadec, SSTIC03
2008: New Viral Threats of PDF Language, Eric Filiol, Black Hat Europe 2008
2008: Malicious Origami in PDF, Frédéric Raynal, Guillaume Delugré, PacSec08
2008-2011: Didier Stevens' blog
2009: PDF: A Vector for Badness Incognito, Jeremy Conway, ISSA
2009: A look at Portable Document Format vulnerabilities, Sami Rautiainen
2009: Malicious PDF origamis strike back, Frédéric Raynal, Guillaume Delugré, Damien Aumaitre, Hack.lu 2009
2009: Penetration Document Format, Didier Stevens, Hack.lu 2009
2010: Surrounded by Malicious PDFs, François Paget, McAfee Labs Blog
2010: Fighting PDF malware with ExeFilter, Philippe Lagadec, EUSecWest 2010
2010-07-19: PDF Malware Overview, Joel Yonts, SANS
2010: Finding rules for heuristic detection of malicious PDFs: with analysis of embedded exploit code, Paul Baccas, VB2010
2010: The Rise of PDF Malware, Karthik Selvaraj and Nino Fred Gutierrez, Symantec whitepaper
2010-12-30: OMG-WTF-PDF, Julia Wolf, 27th Chaos Communication Congress (if slides are not available, try this and click on "quick view", or look at the video)
2016-03-25: Caradoc: a pragmatic approach to PDF parsing and validation, Guillaume Endignoux, Olivier Levillain, Jean-Yves Migeon
2016-11-02: How secure is PDF encryption?, Guillaume Endignoux

Examples of known vulnerabilities and exploits

CVE-2010-1240: "Escape from PDF", revealed by Didier Stevens on March 29 2010. It has been known since 2000 (from Adobe itself) that the launch action feature in PDF is a security issue. What was new is that Didier Stevens showed that this feature may be used to launch an executable file stored in the PDF document itself (without providing details for now). He also discovered that Foxit Reader before version 3.2.0.0303 did not ask for any confirmation before launching the executable. Finally, he showed that Adobe Reader 9.3.1 has a bug which makes it possible to tweak the warning message and fool users into clicking "Open" (the actual CVE-2010-1240). Foxit Reader was patched a few days later, and Adobe suggested a workaround on April 6. Jeremy Conway showed that launch actions can be combined with incremental updates to create a PDF virus, and Sophos reported malicious usage of launch actions in the wild on April 12. Adobe Reader 9.3.3 was released on June 29 with a fix for CVE-2010-1240 and a new blacklist system that prevents launching some file formats such as executable files (a way to bypass it was found, then fixed in v9.3.4).
CVE-2009-4324: JavaScript Doc.media.newPlayer vulnerability in Adobe Reader up to v9.2. Public exploits: on SecurityFocus, Metasploit.
CVE-2009-0927: JavaScript Collab.getIcon buffer overflow in Adobe Reader. Public exploits: on SecurityFocus.
List of Adobe Reader vulnerabilities:

Obfuscation techniques

Before analyzing malicious documents, it's good to know your enemy. Here are a few hand-picked blog posts and articles that explain known obfuscation and anti-analysis techniques:

2008-04-29: PDF, Let Me Count the Ways…, Didier Stevens
2009-02-06: Complex obfuscated PDF exploit, Hermes (Lei) Li
2009-05-11: PDF Filter Abbreviations, Didier Stevens
2009-05-14: Malformed PDF Documents, Didier Stevens
2009-06-19: Streams and filters in PDF with origami, Frédéric Raynal
2009-06-19: Virus total with origami?, Frédéric Raynal
2009-06-26: (At least) 4 ways to die opening a PDF, Frédéric Raynal
2009-11-03: Making malicious PDF undetectable, Andrzej Derezowski
2010-01-04: Sophisticated, targeted malicious PDF documents exploiting CVE-2009-4324, Bojan Zdrnja
2010-01-09: Yet another interesting PDF obfuscation, Andrzej Derezowski
2010-01-13: Generic PDF exploit hider - embedPDF.py and goodbye AV detection, Felipe Andres Manzano
2010-01-14: PDF Obfuscation using getAnnots(), Julia Wolf
2010-01-14: PDF Babushka, Bojan Zdrnja
2010-04-08: JavaScript obfuscation in PDF: Sky is the limit, Bojan Zdrnja
2010-05-18: More Malformed PDFs, Didier Stevens
2010-06-21: World's Smallest PDF, Julia Wolf
2010-06-25: Solving the Win7 Puzzle (a zip bomb in PDF), Didier Stevens
2010-07-13: How to really obfuscate your PDF malware, Sebastian Porst
2010-07-20: CSI:Internet - PDF time bomb (an excellent description of obfuscated PDF malware), Thorsten Holz
2010-08-19: Anatomy of a PDF Exploit, Niels Provos
2010-08-30: Getting Owned By Malicious PDF - Analysis, Mahmud Ab Rahman
2010-09-01: An approach to PDF shielding, Guillaume Delugré
2010-09-21: The Rise of PDF Malware, Karthik Selvaraj and Nino Fred Gutierrez
2010-09-26: Malicious PDF Analysis E-book, Didier Stevens
2010-09-29: Finding rules for heuristic detection of malicious PDFs: with analysis of embedded exploit code, Paul Baccas
2010-11-03: No endstream, no endobj, no worries, Lebahnet
2010-12-30: OMG-WTF-PDF, Julia Wolf (if slides are not available, try this and click on "quick view", or look at the video)
2011-01-05: Portable Document Format Malware, Kazumasa Itabashi (whitepaper)
2011-05-06: Obfuscation and (non-)detection of malicious PDF files, Jose Miguel Esparza, CARO 2011
2011-07-14: a summary of PDF tricks - encodings, structures, javascript..., corkami
2011-09-14: The undocumented password validation algorithm of Adobe Reader X, Guillaume Delugré
2013-11-05: Malicious PDF Analysis Evasion Techniques, Michael Du

Analysis techniques

2009-07-06: Is this PDF malicious?, Frédéric Raynal
2009: Analyzing Malicious Documents Cheat Sheet, Lenny Zeltser
2010-01-07: Static analysis of malicious PDFs and Static analysis of malicious PDFs part #2, Daniel Wesemann
2010-04-05: Matt's Primer for PDF Analysis, Sourcefire VRT
2010-08-30: Getting Owned By Malicious PDF - Analysis, Mahmud Ab Rahman
2010-09-26: Malicious PDF Analysis E-book, Didier Stevens
2011-05-04: How to Extract Flash Objects from Malicious PDF Files, Lenny Zeltser
2011-05-25: Malicious PDF Analysis Workshop Screencasts (HITB Amsterdam), Didier Stevens
2016-03-25: Caradoc: a pragmatic approach to PDF parsing and validation, Guillaume Endignoux, Olivier Levillain, Jean-Yves Migeon

Useful analysis tools

(listed in no particular order)

Command-line

pdfid: PDF analysis tool written in Python (basic parsing, useful to detect malware).
pdf-parser: PDF analysis tool written in Python (more complete parser).
Origami: PDF analysis framework written in Ruby (full parser/builder, includes many scripts and a GUI).
opaf: PDF analysis framework written in Python (full parser) - see also this blog post
pdf structazer (documentation)
pdftk: PDF manipulation tool, useful to analyze obfuscated PDFs
QPDF: another PDF manipulation tool to remove encryption, linearization or object streams
jsunpack-n: to extract JavaScript from various formats including PDF - an online version is also available.
pyew: a malware analysis tool with PDF analysis features
peepdf: malicious PDF analysis tool written in Python
caradoc: a parser and validator of PDF files written in OCaml
veraPDF: an open source PDF/A validator supported by the PDF industry and funded by the European Union’s PREFORMA project
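
The keyword-level triage that basic command-line tools perform can be sketched in a few lines of Python. The example below counts occurrences of PDF names tied to the features described in "Main security issues", after decoding "#xx" hex escapes in names. It is an illustrative sketch only: the watch list (including /RichMedia for Flash content) and the function names are assumptions made here, and this is not the implementation of pdfid or of any other tool in this list. It does not decompress streams, so names hidden in compressed objects are not counted.

```python
import re
from collections import Counter

# Names worth flagging (assumed watch list for this sketch).
WATCHED = ("JavaScript", "JS", "Launch", "EmbeddedFile", "GoToE",
           "RichMedia", "Encrypt")

# A name is '/' followed by characters up to whitespace or a delimiter.
_NAME = re.compile(rb"/([^\x00\s/<>\[\](){}%]+)")
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> str:
    """Resolve '#xx' escapes, e.g. b'J#61vaScript' becomes 'JavaScript'."""
    decoded = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_watched_names(data: bytes) -> Counter:
    """Count watched names found anywhere in the raw file bytes."""
    counts = Counter()
    for match in _NAME.finditer(data):
        name = decode_name(match.group(1))
        if name in WATCHED:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = b"1 0 obj << /S /J#61vaScript /JS (x) >> endobj"
    print(count_watched_names(sample))
```

A non-zero count is only a hint that a file deserves a closer look with the dedicated tools listed here. Many legitimate documents use these features too.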

GUI

Origami: PDF analysis framework written in Ruby (full parser/builder, includes many scripts and a GUI).
PDF Dissector: a commercial tool to analyze malicious PDF files
pdfubar: an open-source GUI written in Python using pdf-parser, Yara and jsunpack-n to analyze PDF files (pretty basic for now but promising)
PDF Stream Dumper: malicious PDF analysis tool written in VB with a GUI

Linux distributions

REMnux and Mercury: Linux distributions with many malware analysis tools ready to use (including Origami, pdfid, pdf-parser, jsunpack-n, etc.)

Online

jsunpack-n: to extract JavaScript from various formats including PDF - an online version is also available.
wepawet: online malware analysis supporting PDF
joedoc: online PDF exploit detection based on sandboxing and tracing
PDF Examiner: online PDF analysis tool
Gallus: online PDF analysis tool

Parsing tools and libraries

pdf-parser: PDF parser for Python
Origami: PDF parser and builder for Ruby
jsunpack-n: includes a PDF parser in Python
opaf: PDF analysis framework written in Python (full parser)
PDFbox: PDF parser and builder for Java
pyPdf: Python module to read and write PDF files
PDFMiner: PDF parser and analyzer written in Python
caradoc: a parser and validator of PDF files written in OCaml

Filtering tools and libraries

ExeFilter: to sanitize PDF files by disabling all JavaScript, launch actions, embedded files, etc.
pdfid / pdfid_PL
Origami / Origapy
