Cybersecurity threat detection and response is a game of action and reaction. Cybercriminals continually devise creative new methods of misdirection and obfuscation, while security experts work tirelessly to identify, research, and develop new ways to guard against these threats.
The weaponization of PDFs is one way bad actors have widened the cybersecurity battlefield, turning what many see as an innocuous file into something used for nefarious purposes.
The Portable Document Format, better known as PDF, was developed by Adobe in 1993 to let anyone view a complex document independently of the software, hardware, and operating system used to create it. PDF files can be opened on virtually any operating system, either in a web browser or in a dedicated PDF reader.
PDF is based on the PostScript language[i] and can contain a wide variety of information, including text, hyperlinks, multimedia, images, attachments, metadata, and other data, which makes it a rich and complex format. The PDF reference documentation that describes this powerful and versatile file format runs to more than 800 pages.
In this blog post, we’ll take a closer look at the PDF format and the various ways cybercriminals use it to evade detection and infiltrate networks. We’ll also show how Deep Instinct detects weaponized PDFs and prevents them from being opened.
A PDF file is composed of four main sections: the header, the body, the cross-reference (xref) table, and the trailer.
Header
A simple section that specifies the PDF file format and version number, in our case 1.7, as in %PDF-1.7 (the “%” sign in PDF marks a comment).
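To make the header concrete, here is a minimal Python sketch that extracts the version number from raw file bytes. The function name `pdf_version` and the 1024-byte search window are our own assumptions for illustration. The sketch searches a window rather than only offset 0 because some readers tolerate bytes before the header, a leniency that can be abused to confuse simpler parsers.

```python
import re
from typing import Optional

# The header is the first line of a PDF file, e.g. b"%PDF-1.7".
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def pdf_version(data: bytes, search_window: int = 1024) -> Optional[str]:
    """Return the PDF version from the header, or None if no header is found.

    Only the first `search_window` bytes are searched (an assumed limit), and
    the header is not required to sit at offset 0.
    """
    match = HEADER_RE.search(data[:search_window])
    return match.group(1).decode("ascii") if match else None


# The second line of many PDFs is a comment holding a few binary bytes.
sample = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(pdf_version(sample))  # 1.7
```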
Body
The body contains all of the objects that make up the document, including images, fonts, pages, scripting code, and more.
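Objects in the body are stored as numbered indirect objects of the form `N G obj ... endobj`, where N is the object number and G is its generation number. The sketch below lists them with a regular expression. `list_objects` is a hypothetical helper, and the approach is deliberately naive: it ignores compressed object streams and binary stream data that happens to contain `endobj`, so treat it as an illustration rather than a real parser.

```python
import re
from typing import List, Optional, Tuple

# An indirect object: "N G obj ... endobj" (object number, generation number).
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# The /Type entry names what kind of object it is (Catalog, Pages, Page, ...).
TYPE_RE = re.compile(rb"/Type\s*/(\w+)")


def list_objects(data: bytes) -> List[Tuple[int, int, Optional[str]]]:
    """Return (object number, generation, /Type or None) for each object found."""
    objects = []
    for match in OBJ_RE.finditer(data):
        type_match = TYPE_RE.search(match.group(3))
        obj_type = type_match.group(1).decode("ascii") if type_match else None
        objects.append((int(match.group(1)), int(match.group(2)), obj_type))
    return objects


# "2 0 R" is an indirect reference pointing at object 2, generation 0.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
)
print(list_objects(sample))
# [(1, 0, 'Catalog'), (2, 0, 'Pages'), (3, 0, 'Page')]
```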
Cross-Reference (Xref) Table
As the name suggests, this is a table of cross-references: a data structure that stores the byte offset of each object (images, fonts, and so on) within the file. It allows the PDF reader to jump directly to any object without loading the entire document into memory, and to navigate quickly between pages.
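A classic xref table starts with the `xref` keyword, followed by one or more subsections. Each subsection begins with the first object number and an entry count, and each entry is a fixed 20-byte line holding a 10-digit byte offset, a 5-digit generation number, and `n` (in use) or `f` (free). The following sketch parses such a table into a lookup map. `parse_xref` is a hypothetical name, and the sketch does not handle the cross-reference streams that PDF 1.5 and later files may use instead.

```python
from typing import Dict, Tuple


def parse_xref(section: bytes) -> Dict[int, Tuple[int, int]]:
    """Map object number -> (byte offset, generation) for in-use xref entries."""
    lines = [line.strip() for line in section.splitlines() if line.strip()]
    if not lines or lines[0] != b"xref":
        raise ValueError("expected a classic 'xref' table")
    table: Dict[int, Tuple[int, int]] = {}
    i = 1
    while i < len(lines):
        parts = lines[i].split()
        # A subsection header is exactly two integers: first object number, entry count.
        if len(parts) != 2 or not (parts[0].isdigit() and parts[1].isdigit()):
            break  # e.g. the "trailer" keyword ends the table
        first, count = int(parts[0]), int(parts[1])
        if i + count >= len(lines):
            raise ValueError("xref subsection is truncated")
        for k in range(count):
            offset, generation, kind = lines[i + 1 + k].split()
            if kind == b"n":  # 'n' = object in use, 'f' = free entry
                table[first + k] = (int(offset), int(generation))
        i += 1 + count
    return table


sample = (
    b"xref\n"
    b"0 4\n"
    b"0000000000 65535 f \n"
    b"0000000009 00000 n \n"
    b"0000000058 00000 n \n"
    b"0000000115 00000 n \n"
    b"trailer\n"
)
print(parse_xref(sample))  # {1: (9, 0), 2: (58, 0), 3: (115, 0)}
```

With this map, a reader can seek straight to `table[n][0]` to load object n instead of scanning the whole file.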
Trailer
A PDF reader parses the file starting from the end: it first looks at the trailer, which contains the information needed to parse the rest of the file. The most important entries are the byte offset of the cross-reference table (given after the startxref keyword), the number of objects in the file (/Size), and a reference to the catalog, or root, object (/Root). The catalog is the root of the object hierarchy in the body; it defines how the document is displayed (for example, its page layout), its outlines, and how its content is presented.
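The trailer lookup can be sketched the same way. The hypothetical `read_trailer` helper below scans only the tail of the file, as a reader does, and takes the last `startxref` value (files with incremental updates append new trailers, so the last one is the current one). It then pulls `/Size` and `/Root` out of the last `trailer` dictionary. The 1024-byte tail window is an assumption, and files that rely on cross-reference streams have no `trailer` keyword at all.

```python
import re
from typing import Any, Dict

STARTXREF_RE = re.compile(rb"startxref\s+(\d+)\s+%%EOF")
SIZE_RE = re.compile(rb"/Size\s+(\d+)")
ROOT_RE = re.compile(rb"/Root\s+(\d+)\s+(\d+)\s+R")


def read_trailer(data: bytes, tail_size: int = 1024) -> Dict[str, Any]:
    """Read the xref offset and key trailer entries from the end of a PDF."""
    tail = data[-tail_size:]  # readers start parsing from the end of the file
    markers = list(STARTXREF_RE.finditer(tail))
    if not markers:
        raise ValueError("no 'startxref ... %%EOF' marker near the end of the file")
    trailer_pos = tail.rfind(b"trailer")
    if trailer_pos == -1:
        raise ValueError("no trailer dictionary (the file may use an xref stream)")
    trailer = tail[trailer_pos:]
    size = SIZE_RE.search(trailer)
    root = ROOT_RE.search(trailer)
    return {
        "startxref": int(markers[-1].group(1)),  # byte offset of the xref table
        "size": int(size.group(1)) if size else None,  # number of xref entries
        "root": (int(root.group(1)), int(root.group(2))) if root else None,  # catalog
    }


# Offsets in this sample are illustrative only.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"trailer\n<< /Size 4 /Root 1 0 R >>\n"
    b"startxref\n173\n%%EOF\n"
)
print(read_trailer(sample))  # {'startxref': 173, 'size': 4, 'root': (1, 0)}
```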
Common Attack Vectors
Now that we’ve reviewed how PDFs are architected, let’s examine how they are being compromised and used by cybercriminals.
Like Microsoft Office documents, PDF files have been in widespread use for many years. Because the format is both popular and cross-platform, PDF files are very appealing to threat actors, who use them as an initial point of infection.
From simple phishing attacks to sophisticated exploits, PDFs can be weaponized in many ways.
When PDF became a popular document format, cybercriminals recognized an opportunity to use social engineering techniques to deliver phishing PDFs.
As with Office documents, email attachments have become the primary means of distribution.
Other document types, such as Office files, have earned a bad reputation over the years because they’ve been widely exploited through various vulnerabilities and macros. PDFs, by contrast, have historically been viewed as a safer, less dangerous format, because fewer PDF-related breaches have been publicized and fewer actual threats have been seen in the wild.
Phishing PDFs typically contain a link to an external URL that is designed to forward the user to one of these:
These methods are fairly simple and don’t require a very sophisticated attacker to carry them out.
A common trend in recent years is a CAPTCHA image with an embedded URL that redirects the victim to a phishing or unwanted advertising website.
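Because these phishing PDFs depend on an external URL, pulling the link targets out of a file is a natural first triage step. Link annotations typically carry the target in a URI action (`/S /URI /URI (...)`). The sketch below, built around a hypothetical `extract_uris` helper, handles only the literal-string form; hex-encoded strings, octal escapes, and objects inside compressed object streams would slip past it.

```python
import re
from typing import List

# /URI followed by a literal string; (?:\\.|[^\\)])* allows escaped characters such as "\)".
URI_RE = re.compile(rb"/URI\s*\(((?:\\.|[^\\)])*)\)")
ESCAPE_RE = re.compile(rb"\\(.)")


def extract_uris(data: bytes) -> List[str]:
    """Return the targets of URI actions written as literal strings."""
    uris = []
    for match in URI_RE.finditer(data):
        # Drop the backslash from simple escapes like "\(" and "\)"; octal
        # escapes and other PDF string encodings are not decoded here.
        raw = ESCAPE_RE.sub(rb"\1", match.group(1))
        uris.append(raw.decode("latin-1"))
    return uris


# Hypothetical link annotation, e.g. the clickable area of a fake CAPTCHA image.
sample = (
    b"7 0 obj\n<< /Type /Annot /Subtype /Link /Rect [100 100 300 200]\n"
    b"   /A << /S /URI /URI (https://captcha-check.example/verify) >> >>\nendobj\n"
)
print(extract_uris(sample))  # ['https://captcha-check.example/verify']
```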
Most PDF users are familiar with, and use, PDF editing capabilities; for many, this may be the extent of their PDF use. But PDFs also support a feature called “Actions,” which can be used to perform a variety of tasks, including the following:
These actions are triggered by “Events,” such as saving or closing the PDF, clicking the mouse, and so on. For example, an author can create a document that opens a specific web page when it is closed.
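Event-triggered actions leave recognizable keys in the file: `/OpenAction` runs when the document opens; `/AA` (additional actions) holds triggers such as the catalog’s `/WC` (will close) and `/WS` (will save) entries; and `/JavaScript`, `/Launch`, `/SubmitForm`, and `/URI` name action types, with `/JS` holding the script itself. The sketch below counts these keys, similar in spirit to Didier Stevens’ pdfid tool but not its actual code; the watch list and helper names are our own assumptions. It also decodes `#xx` hex escapes in names, since `/J#61vaScript` is a valid spelling of `/JavaScript` that can dodge naive string matching.

```python
import re
from collections import Counter
from typing import Dict

# Keys tied to actions and the events that trigger them (an assumed watch list).
WATCHED_KEYS = ["/OpenAction", "/AA", "/JavaScript", "/JS", "/Launch", "/SubmitForm", "/URI"]

# A PDF name: "/" followed by regular characters (no whitespace or delimiters).
# Naive: "names" that appear inside strings or streams are counted too.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(name: bytes) -> bytes:
    """Decode #xx escapes, so b"/J#61vaScript" becomes b"/JavaScript"."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def count_action_keys(data: bytes) -> Dict[str, int]:
    """Count watched keys in the file after normalizing every name."""
    counts = Counter(
        normalize_name(m.group(0)).decode("latin-1") for m in NAME_RE.finditer(data)
    )
    return {key: counts[key] for key in WATCHED_KEYS if counts[key]}


# The catalog runs object 4 on open; object 4 hides its action type behind a hex escape.
sample = (
    b"1 0 obj\n<< /Type /Catalog /OpenAction 4 0 R >>\nendobj\n"
    b"4 0 obj\n<< /S /J#61vaScript /JS (app.alert\\(1\\)) >>\nendobj\n"
)
print(count_action_keys(sample))  # {'/OpenAction': 1, '/JavaScript': 1, '/JS': 1}
```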
Some of the actions and triggers that can be created using Adobe Acrobat:
In the figure below, we can see a basic SubmitForm action that mimics an Amazon email-and-password form. Using social engineering, an attacker can convince potential victims to fill out the form, which then sends their credentials to a remote server.
Like the phishing scams described earlier, this simple type of attack can affect most PDF readers and browsers. Rather than exploiting vulnerabilities in PDF readers, these attacks rely on features that readers provide by design, so they will always work if the user allows them.
Action that mimics an Amazon form and the malicious code behind it.
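A SubmitForm action stores its destination in the `/F` entry, either as a plain string or inside a URL file specification (`/F << /FS /URL /F (...) >>`). The following sketch, with the hypothetical helper `submit_form_targets`, reports where a form would send its data. It uses the same naive object matching as the earlier sketches, reads only literal strings, and is meant as a quick triage aid rather than a full parser.

```python
import re
from typing import List

OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)\bendobj", re.DOTALL)
SUBMIT_RE = re.compile(rb"/S\s*/SubmitForm\b")
# /F holds the target: a plain string, or the /F inside a /FS /URL file specification.
TARGET_RE = re.compile(rb"/F\s*\(([^)]*)\)")


def submit_form_targets(data: bytes) -> List[str]:
    """Return the destinations of SubmitForm actions found in the file."""
    targets = []
    for match in OBJ_RE.finditer(data):
        body = match.group(1)
        if SUBMIT_RE.search(body):  # only objects that are SubmitForm actions
            targets.extend(t.decode("latin-1") for t in TARGET_RE.findall(body))
    return targets


sample = (
    b"5 0 obj\n<< /Type /Action /S /SubmitForm\n"
    b"   /F << /FS /URL /F (https://collector.example/login) >> >>\nendobj\n"
    b"6 0 obj\n<< /Type /Filespec /F (readme.txt) >>\nendobj\n"
)
print(submit_form_targets(sample))  # ['https://collector.example/login']
```

Object 6 also has an `/F` string, but it is skipped because it is not a SubmitForm action.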
Exploits and vulnerabilities
PDF documents can be viewed in web browsers or in dedicated PDF reader applications, and each of these may have vulnerabilities that threat actors can exploit.
These vulnerabilities can be divided into several categories, including, but not limited to, the following:
Common Vulnerabilities and Exposures (CVE) is a list of publicly disclosed security weaknesses. There are hundreds of CVEs for PDF readers[vii] and almost 300 known vulnerabilities for Adobe Acrobat Reader alone[viii].
While vendors update their products regularly, it is important to remember that there are still many “zero-day”[ix] vulnerabilities that haven’t been discovered or reported yet. Additionally, many users still run older, unpatched versions of PDF readers and browsers.
Security researchers and threat actors find new PDF-related vulnerabilities almost daily, and “out-of-bounds read” vulnerabilities are among the most common.
An out-of-bounds read vulnerability occurs when a program reads data from before the beginning of a buffer or past its end. In computer science, a buffer is a region of memory that temporarily holds data while it is being processed.
This usually happens when pointer arithmetic or an index calculation produces a position outside the bounds of the allocated memory; the invalid read may crash the program or return data from other, unrelated memory locations.
An attacker can use this vulnerability to their advantage. By reading out-of-bounds memory, such as memory addresses or secret values, the attacker can gather information that helps them exploit a separate weakness and execute arbitrary code on the victim’s machine.
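Python does not let a program read past the end of a buffer the way C does, so the sketch below illustrates the defensive check instead. A parser reading a table from attacker-controlled data must validate every offset against the actual buffer size, not just against counts declared inside the file. The table layout here (a 4-byte entry count followed by 4-byte entries) and both helper names are hypothetical; this is not the real XPS or font format.

```python
import struct


def read_u32_be(buf: bytes, offset: int) -> int:
    """Read a big-endian 32-bit integer, rejecting any out-of-bounds access.

    C code that skips this check can read before the buffer's start or past
    its end. Python slicing would instead silently return fewer bytes, and
    negative indexes count back from the end, so the range is checked explicitly.
    """
    if offset < 0 or offset + 4 > len(buf):
        raise IndexError(f"4-byte read at offset {offset} is outside a {len(buf)}-byte buffer")
    return struct.unpack_from(">I", buf, offset)[0]


def read_table_entry(buf: bytes, index: int) -> int:
    """Return entry `index` from a hypothetical table: a 4-byte count, then 4-byte entries."""
    count = read_u32_be(buf, 0)  # attacker-controlled: the file can lie about this
    if not 0 <= index < count:
        raise IndexError(f"entry {index} is outside the declared count {count}")
    return read_u32_be(buf, 4 + 4 * index)  # still checked against the real size


good = struct.pack(">III", 2, 10, 20)  # count = 2, entries 10 and 20
print(read_table_entry(good, 1))  # 20

evil = struct.pack(">II", 1000, 7)  # claims 1000 entries but holds only one
try:
    read_table_entry(evil, 5)
except IndexError as exc:
    print("rejected:", exc)
```

The second call shows why both checks matter: the file claims 1000 entries, so the declared-count check passes, and only the check against the real buffer length stops the read. In C, that unchecked read is exactly the out-of-bounds read described above.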
Here is an example of this type of vulnerability exploited in the wild:
CVE-2018-4893: This vulnerability is caused by a computation that reads data past the end of the target buffer; the computation is part of XPS (XML Paper Specification)[x] font processing. Although this critical vulnerability was published in February 2018, making it three and a half years old at the time of writing, it is still being seen in the wild.
Antivirus vendors vs. malicious PDFs
Distinguishing a phishing PDF from a legitimate one can be challenging. The example below shows just how similar the two documents appear: both consist of a single page with little text and a hyperlink that redirects the user to a web page.
Even next-gen antivirus vendors can struggle with this task. The specific challenges vary by vendor, but weak detection, identification, or classification capabilities can lead to a high false-positive rate, and many antivirus vendors fail to correctly detect phishing PDFs.
Very similar PDF documents: a legitimate Booking invoice (left) and an Apple phishing email (right).
The figure below is taken from a phishing PDF[xi] that tries to persuade the user to review the document but redirects them to a fraudulent web page.
The PDF was first submitted to VirusTotal on January 14, 2021. Not only did no vendor detect it at the time, but the document was analyzed multiple times with zero detections on each analysis.
We then dropped the sample onto a lab machine running the Deep Instinct agent and received a popup alerting us that a threat had been detected. Every attempt to open the file was immediately blocked.
Furthermore, we’ve demonstrated how difficult it is to distinguish a phishing PDF from a legitimate one, and how even next-gen antivirus vendors have difficulty detecting these types of attacks.
Deep Instinct protects its clients with a robust, deep learning-based protection mechanism that seamlessly prevents zero-day threats in the form of malicious PDF, PE, and other files.
If you’d like to hear more about our industry-leading approach to stopping malware, please contact us and we’ll set up a demo.