This article describes the PDF file format, related security issues and useful resources. [WORK IN PROGRESS]
The original location of this article is http://www.decalage.info/file_formats_security/pdf
Last update: 2017-11-10 (created 2010-02-13)
PDF (Portable Document Format) is a file format designed by Adobe. It is mainly used to publish final versions of documents on the Internet, by e-mail or on CD-ROMs. Its main purpose is to display or print documents with a fixed layout. The PDF format may also be used to create electronic forms.
More info: http://en.wikipedia.org/wiki/Portable_Document_Format
Main client applications
The main application used to open PDF files for display is Adobe Reader. Many alternative applications can also display PDF files, such as Preview on Mac OS X and Foxit Reader on Windows.
Adobe Acrobat is one of the applications which can create and edit PDF documents.
Main security issues
PDF is usually considered a static and safe format for document exchange. This perception is wrong.
The PDF format is in fact very complex, and contains several features which may lead to security issues:
- JavaScript: Adobe Reader (and possibly other readers) contains a JavaScript engine similar to the ones used by web browsers, but with a slightly different API to manipulate PDF content dynamically or to control some viewer features. Potentially dangerous features are restricted for obvious security reasons. Even so, PDF documents are not purely static: some actions may be used to fool a user (popups) or to send e-mails and HTTP requests automatically. Furthermore, experience shows that many recent vulnerabilities have been exploited using JavaScript in PDF.
- Launch actions: a PDF file may launch any command on the operating system, after user confirmation (popup message). Different command lines may be specified for Windows, Unix and Mac. On Windows only, parameters can be provided for the command. Until Adobe Reader 9.3.2, the CVE-2010-1240 vulnerability made it possible to fool users by modifying the text of the popup message. Since Adobe Reader 9.3.3, a blacklist restricts file formats that can be opened, blocking executable files by default (but a way to bypass it has been found, and finally fixed in v9.3.4).
- Embedded files: a PDF file may contain attached files, which can be extracted and opened from the reader. This trick may be used to hide malicious executables in order to bypass some antivirus and content analysis engines. Fortunately, Adobe Reader refuses to open embedded files if their extension is part of a blacklist, such as EXE, BAT, CMD, etc. However, this blacklist is not perfect and formats such as HTML or Python scripts may be embedded in PDF and launched from Adobe Reader.
- GoToE actions: a PDF file may be embedded inside another PDF file, and a GoToE action may be used so that Adobe Reader opens the embedded PDF file automatically without notifying the user. This feature may be used to hide a malicious PDF file within a normal PDF file, to fool many antivirus engines.
- Embedded Flash applications: a PDF file may contain Flash applications (stored as embedded SWF files), which bring their own security issues, such as ActionScript content and Adobe Flash Player vulnerabilities. Adobe Reader contains its own Flash Player, independent from the one installed in web browsers. For example, the CVE-2010-1297 vulnerability was first patched in the Flash Player on 10 June 2010, whereas the Flash Player shipped with Adobe Reader was only patched on 29 June 2010.
- Encryption: a PDF file may be encrypted with a password. However, if an empty password is used, Adobe Reader will open it directly without asking the user. This trick may be used to fool many antivirus and analysis engines that do not support decryption.
- Parser "flexibility": PDF specifications, Adobe Reader and possibly other applications are very flexible about the structure of PDF files.
  - For example, most people think that PDF files have to start with the "%PDF" magic number, whereas the specifications only say this header has to be in the first 1024 bytes. See the Adobe PDF 1.7 Reference, Appendix H.3, page 1102: "Acrobat viewers require only that the header appear somewhere within the first 1024 bytes of the file". It is therefore possible to insert around 1000 arbitrary bytes at the beginning of a PDF file. This trick may be used to bypass antivirus or content analysis engines that identify PDF files too strictly, because a fake file header (for example JPEG or HTML) can be inserted.
  - Another example is that the cross-reference table at the end of the PDF structure does not have to point exactly to each object: Adobe Reader is able to reconstruct malformed files even if some content has been inserted within or between PDF objects.
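As a rough illustration of the features listed above, the following Python sketch locates the "%PDF" header within the first 1024 bytes and counts a few name tokens related to these features in the raw bytes of a file. The function names and the keyword list are assumptions made for this example (in particular, /RichMedia is used here as the marker for embedded Flash content); this is not how any specific tool works, and it is far from a complete detector.

```python
import re

# Illustrative keyword list (an assumption of this sketch, not exhaustive).
RISKY_NAMES = [b"/JavaScript", b"/JS", b"/Launch", b"/EmbeddedFile",
               b"/GoToE", b"/RichMedia", b"/Encrypt"]


def header_offset(data, window=1024):
    """Return the offset of the %PDF marker within the first `window` bytes, or -1."""
    return data[:window].find(b"%PDF")


def count_risky_names(data):
    """Count occurrences of each keyword in the raw bytes of the file."""
    counts = {}
    for name in RISKY_NAMES:
        # The lookahead stops /JS from matching a longer name such as /JSFoo.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts


# Small in-memory sample: a fake JPEG marker and padding placed before the PDF header.
sample = (b"\xff\xd8\xff\xe0" + b"\x00" * 40 + b"%PDF-1.4\n"
          b"1 0 obj << /Type /Action /S /JavaScript /JS (app.alert(1)) >> endobj\n"
          b"2 0 obj << /Type /Action /S /Launch /F (cmd.exe) >> endobj\n")

print("header offset:", header_offset(sample))
print("keyword counts:", count_risky_names(sample))
```

A nonzero header offset or a nonzero keyword count is only a hint that a file deserves a closer look. A raw byte scan like this misses names written with hexadecimal escapes, content stored in compressed streams, and encrypted files.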
Potential Solutions
- Disable JavaScript and Launch features on each client: protects against most current malware, but limits functionality on the client.
- Convert all incoming PDF files to PDF/A (a subset of PDF without JavaScript, encryption, audio/video, external links, etc): an interesting solution, but PDF/A requires that all fonts are embedded. Links to potential tools: pdfa.org, 3-Heights, gDoc, 7-pdf. However, most of the PDF/A tools have not been designed for security purposes.
- Sanitize all incoming PDF files with a tool such as ExeFilter: covers most issues by disabling JavaScript, Launch actions, embedded files, etc in incoming PDF files.
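To give an idea of what disabling active content can look like, here is a minimal Python sketch of one simple approach: overwriting risky name tokens with a replacement of the same length, so that byte offsets in the file do not change. The function name and keyword list are hypothetical, and this is not presented as the method used by ExeFilter or any other tool.

```python
import re

# Hypothetical keyword list; a real filter would need a broader policy.
ACTIVE_NAMES = [b"/JavaScript", b"/JS", b"/Launch", b"/EmbeddedFile", b"/GoToE"]


def neutralize(data):
    """Replace the last character of each listed name with '_'.

    The replacement has the same length as the original name, so the
    positions of all objects in the file stay unchanged.
    """
    for name in ACTIVE_NAMES:
        # The lookahead avoids touching longer names such as /EmbeddedFiles.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        replacement = name[:-1] + b"_"
        data = re.sub(pattern, lambda m, r=replacement: r, data)
    return data


original = b"1 0 obj << /Type /Action /S /JavaScript /JS (app.alert(1)) >> endobj\n"
cleaned = neutralize(original)
print(cleaned)
```

Keeping the length unchanged is the main design choice here, since it avoids having to rebuild the structure of the file. The same limits apply as for any raw byte scan: names written with hexadecimal escapes, objects inside compressed streams, and encrypted files are not covered.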
Publications about PDF security issues
- 2000: Adobe Acrobat Security Issues : The Open File Action and File Attachment Annotations, Carl Orthlieb, Adobe
- 2001: Adobe PDF files can be used as virus carriers, Richard M. Smith, Bugtraq mailing-list
- 2003: Malware and File formats, Philippe Lagadec, SSTIC03
- 2008: New Viral Threats of PDF Language, Eric Filiol, Black Hat Europe 2008
- 2008: Malicious Origami in PDF, Frédéric Raynal, Guillaume Delugré, PacSec08
- 2008-2011: Didier Stevens' blog
- 2009: PDF: A Vector for Badness Incognito, Jeremy Conway, ISSA
- 2009: A look at Portable Document Format vulnerabilities, Sami Rautiainen
- 2009: Malicious PDF origamis strike back, Frédéric Raynal, Guillaume Delugré, Damien Aumaitre, Hack.lu 2009
- 2009: Penetration Document Format, Didier Stevens, Hack.lu 2009
- 2010: Surrounded by Malicious PDFs, François Paget, McAfee Labs Blog
- 2010: Fighting PDF malware with ExeFilter, Philippe Lagadec, EUSecWest 2010
- 2010-07-19: PDF Malware Overview, Joel Yonts, SANS
- 2010: Finding rules for heuristic detection of malicious PDFs: with analysis of embedded exploit code, Paul Baccas, VB2010
- 2010: The Rise of PDF Malware, Karthik Selvaraj and Nino Fred Gutierrez, Symantec whitepaper
- 2010-12-30: OMG-WTF-PDF, Julia Wolf, 27th Chaos Communication Congress (if slides are not available, try this and click on "quick view", or look at the video)
- 2016-03-25: Caradoc: a pragmatic approach to PDF parsing and validation, Guillaume Endignoux, Olivier Levillain, Jean-Yves Migeon
- 2016-11-02: How secure is PDF encryption?, Guillaume Endignoux
Examples of known vulnerabilities and exploits
- CVE-2010-1240: "Escape from PDF", revealed by Didier Stevens on March 29 2010: It has been known since 2000 (from Adobe itself) that the launch action feature in PDF is a security issue. What is new is that Didier Stevens has shown that this feature may be used to launch an executable file stored in the PDF document itself (without providing details for now). He also discovered that Foxit Reader before version 3.2.0.0303 did not ask for any confirmation before launching the executable. Finally, he showed that Adobe Reader 9.3.1 has a bug which makes it possible to tweak the warning message and fool users so that they click on "Open" (the actual CVE-2010-1240). Foxit Reader was patched a few days later, and Adobe suggested a workaround on April 6. Jeremy Conway showed it is possible to combine launch actions with incremental updates to create a PDF virus, and Sophos reported malicious usage of launch actions in the wild on April 12. Adobe Reader 9.3.3 was released on June 29 with a fix for CVE-2010-1240, and a new blacklist system to avoid launching some file formats such as executable files (a way to bypass it was later found, then fixed in v9.3.4).
- CVE-2009-4324: JavaScript Doc.media.newPlayer vulnerability in Adobe Reader up to v9.2. Public exploits: on Securityfocus, Metasploit.
- CVE-2009-0927: JavaScript Collab.getIcon buffer overflow in Adobe Reader. Public exploits: on Securityfocus.
- List of Adobe Reader vulnerabilities:
Obfuscation techniques
Before analyzing malicious documents, it's good to know your enemy. Here are a few hand-picked blog posts and articles that explain known obfuscation and anti-analysis techniques:
- 2008-04-29: PDF, Let Me Count the Ways…, Didier Stevens
- 2009-02-06: Complex obfuscated PDF exploit, Hermes (Lei) Li
- 2009-05-11: PDF Filter Abbreviations, Didier Stevens
- 2009-05-14: Malformed PDF Documents, Didier Stevens
- 2009-06-19: Streams and filters in PDF with origami, Frédéric Raynal
- 2009-06-19: Virus total with origami?, Frédéric Raynal
- 2009-06-26: (At least) 4 ways to die opening a PDF, Frédéric Raynal
- 2009-11-03: Making malicious PDF undetectable, Andrzej Derezowski
- 2010-01-04: Sophisticated, targeted malicious PDF documents exploiting CVE-2009-4324, Bojan Zdrnja
- 2010-01-09: Yet another interesting PDF obfuscation, Andrzej Derezowski
- 2010-01-13: Generic PDF exploit hider - embedPDF.py and goodbye AV detection, Felipe Andres Manzano
- 2010-01-14: PDF Obfuscation using getAnnots(), Julia Wolf
- 2010-01-14: PDF Babushka, Bojan Zdrnja
- 2010-04-08: JavaScript obfuscation in PDF: Sky is the limit, Bojan Zdrnja
- 2010-05-18: More Malformed PDFs, Didier Stevens
- 2010-06-21: World's Smallest PDF, Julia Wolf
- 2010-06-25: Solving the Win7 Puzzle (a zip bomb in PDF), Didier Stevens
- 2010-07-13: How to really obfuscate your PDF malware, Sebastian Porst
- 2010-07-20: CSI:Internet - PDF time bomb (an excellent description of obfuscated PDF malware), Thorsten Holz
- 2010-08-19: Anatomy of a PDF Exploit, Niels Provos
- 2010-08-30: Getting Owned By Malicious PDF - Analysis, Mahmud Ab Rahman
- 2010-09-01: An approach to PDF shielding, Guillaume Delugré
- 2010-09-21: The Rise of PDF Malware, Karthik Selvaraj and Nino Fred Gutierrez
- 2010-09-26: Malicious PDF Analysis E-book, Didier Stevens
- 2010-09-29: Finding rules for heuristic detection of malicious PDFs: with analysis of embedded exploit code, Paul Baccas
- 2010-11-03: No endstream, no endobj, no worries, Lebahnet
- 2010-12-30: OMG-WTF-PDF, Julia Wolf (if slides are not available, try this and click on "quick view", or look at the video)
- 2011-01-05: Portable Document Format Malware, Kazumasa Itabashi (whitepaper)
- 2011-05-06: Obfuscation and (non-)detection of malicious PDF files, Jose Miguel Esparza, CARO 2011
- 2011-07-14: a summary of PDF tricks - encodings, structures, javascript..., corkami
- 2011-09-14: The undocumented password validation algorithm of Adobe Reader X, Guillaume Delugré
- 2013-11-05: Malicious PDF Analysis Evasion Techniques, Michael Du
Analysis techniques
- 2009-07-06: Is this PDF malicious?, Frédéric Raynal
- 2009: Analyzing Malicious Documents Cheat Sheet, Lenny Zeltser
- 2010-01-07: Static analysis of malicious PDFs and Static analysis of malicious PDFs part #2, Daniel Wesemann
- 2010-04-05: Matt's Primer for PDF Analysis, Sourcefire VRT
- 2010-08-30: Getting Owned By Malicious PDF - Analysis, Mahmud Ab Rahman
- 2010-09-26: Malicious PDF Analysis E-book, Didier Stevens
- 2011-05-04: How to Extract Flash Objects from Malicious PDF Files, Lenny Zeltser
- 2011-05-25: Malicious PDF Analysis Workshop Screencasts (HITB Amsterdam), Didier Stevens
- 2016-03-25: Caradoc: a pragmatic approach to PDF parsing and validation, Guillaume Endignoux, Olivier Levillain, Jean-Yves Migeon
(listed in no particular order)
Command-line
- pdfid: PDF analysis tool written in Python (basic parsing, useful to detect malware).
- pdf-parser: PDF analysis tool written in Python (more complete parser).
- Origami: PDF analysis framework written in Ruby (full parser/builder, includes many scripts and a GUI).
- opaf: PDF analysis framework written in Python (full parser) - see also this blog post
- pdf structazer (documentation)
- pdftk: PDF manipulation tool, useful to analyze obfuscated PDFs
- QPDF: another PDF manipulation tool to remove encryption, linearization or object streams
- jsunpack-n: to extract JavaScript from various formats including PDF - an online version is also available.
- pyew: a malware analysis tool with PDF analysis features
- peepdf: malicious PDF analysis tool written in Python
- caradoc: a parser and validator of PDF files written in OCaml
- veraPDF: an open source PDF/A validator supported by the PDF industry and funded by the European Union’s PREFORMA project
GUI
- Origami: PDF analysis framework written in Ruby (full parser/builder, includes many scripts and a GUI).
- PDF Dissector: a commercial tool to analyze malicious PDF files
- pdfubar: an open-source GUI written in Python using pdf-parser, Yara and jsunpack-n to analyze PDF files (pretty basic for now but promising)
- PDF Stream Dumper: malicious PDF analysis tool written in VB with a GUI
Linux distributions
- REMnux and Mercury: Linux distributions with many malware analysis tools ready to use (including Origami, pdfid, pdf-parser, jsunpack-n, etc)
Online
- jsunpack-n: to extract JavaScript from various formats including PDF - an online version is also available.
- wepawet: online malware analysis supporting PDF
- joedoc: online PDF exploit detection based on sandboxing and tracing
- PDF Examiner: online PDF analysis tool
- Gallus: online PDF analysis tool
- pdf-parser: PDF parser for Python
- Origami: PDF parser and builder for Ruby
- jsunpack-n: includes a PDF parser in Python
- opaf: PDF analysis framework written in Python (full parser)
- PDFbox: PDF parser and builder for Java
- pyPdf: Python module to read and write PDF files
- PDFMiner: PDF parser and analyzer written in Python
- caradoc: a parser and validator of PDF files written in OCaml