pdf-parser is a Python script written by Didier Stevens that parses a PDF document to identify the fundamental elements used in the analyzed file.
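To illustrate what "identifying the fundamental elements" involves, here is a minimal Python sketch that lists the indirect objects ("N G obj ... endobj") found in raw PDF bytes. It is a hypothetical illustration, not pdf-parser's code. The list_objects helper and the sample bytes are assumptions, and the regular expression ignores cross-reference tables, object streams and malformed files, all of which a real parser has to cope with.

```python
import re

# Hypothetical helper: an indirect object looks like "3 0 obj ... endobj".
# The non-greedy body stops at the first "endobj", so a binary stream that
# happens to contain that keyword would confuse this simple pattern.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.DOTALL)


def list_objects(data: bytes):
    """Return (object number, generation number, body) for each indirect object."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in OBJ_RE.finditer(data)
    ]


sample = (
    b"%PDF-1.4\n"
    b"3 0 obj\n<< /JavaScript 5 0 R >>\nendobj\n"
    b"5 0 obj\n<< /Names [(A)6 0 R ]>>\nendobj\n"
)
for number, generation, body in list_objects(sample):
    print(number, generation, body)
```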


$ cd /data/src/
$ wget http://didierstevens.com/files/software/pdf-parser_V0_4_3.zip
$ unzip pdf-parser_V0_4_3.zip
$ chmod +x pdf-parser.py



Usage: pdf-parser.py [options] pdf-file


show program's version number and exit
-h, --help
show this help message and exit
-s SEARCH, --search=SEARCH
string to search in indirect objects (except streams)
-f, --filter
pass stream object through filters (FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode and RunLengthDecode only)
-o OBJECT, --object=OBJECT
id of indirect object to select (version independent)
id of indirect object being referenced (version independent)
-e ELEMENTS, --elements=ELEMENTS
type of elements to select (cxtsi)
-w, --raw
raw output for data and filters
-a, --stats
display stats for pdf document
-t TYPE, --type=TYPE
type of indirect object to select
-v, --verbose
display malformed PDF elements
-x EXTRACT, --extract=EXTRACT
filename to extract to
-H, --hash
display hash of objects
-n, --nocanonicalizedoutput
do not canonicalize the output
-d DUMP, --dump=DUMP
filename to dump stream content to
-D, --debug
display debug info
-c, --content
display the content for objects without streams or with streams without filters
string to search in streams
search in unfiltered streams
case sensitive search in streams
use regex to search in streams


Confirm presence of JavaScript

Using pdfid, we have already detected the presence of JavaScript in the PDF file.
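pdfid's triage boils down to counting PDF names that often signal active content, such as /JS and /JavaScript. The Python sketch below shows that idea under simplifying assumptions: the keyword list is a hypothetical subset, count_keywords is an illustrative helper rather than pdfid's code, and names obfuscated with # hex escapes (for example /J#61vaScript) are not normalized here.

```python
import re

# Hypothetical subset of the names a pdfid-style triage looks for.
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA"]

# A PDF name ends at whitespace, a delimiter or the end of the data, so "/JS"
# must not count the prefix of a longer name such as "/JSX".
NAME_END = rb"(?![^\s()<>\[\]{}/%])"


def count_keywords(data: bytes) -> dict:
    """Count each keyword as a complete PDF name (hex-escaped names not handled)."""
    return {
        kw.decode(): len(re.findall(re.escape(kw) + NAME_END, data))
        for kw in KEYWORDS
    }


sample = b"<< /S /JavaScript /JS 7 0 R >> << /OpenAction 3 0 R >>"
print(count_keywords(sample))
# {'/JS': 1, '/JavaScript': 1, '/OpenAction': 1, '/AA': 0}
```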

Highlight links between objects

Using pdf-parser

Let's use pdf-parser to dig deeper into this PDF file.

$ ./pdf-parser.py --search=javascript jsunpack-n-read-only/samples/pdf-thisCreator.file
obj 3 0
 Referencing: 5 0 R

    /JavaScript 5 0 R

obj 6 0
 Referencing: 111611 0 R

    /JS 111611 0 R
    /S /JavaScript
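Each "Referencing:" line is derived from indirect references, which PDF syntax writes as an object number, a generation number and the letter R (as in "5 0 R"). A minimal Python sketch of pulling such references out of an object's text, assuming a plain regular expression is good enough; unlike a real tokenizer it would also match look-alikes inside PDF strings. The references helper is hypothetical.

```python
import re

# An indirect reference is "<object number> <generation number> R".
# The \b stops matches on content-stream operators such as "RG".
REF_RE = re.compile(r"(\d+)\s+(\d+)\s+R\b")


def references(body: str) -> list:
    """Return (object, generation) pairs referenced in an object's text."""
    return [(int(num), int(gen)) for num, gen in REF_RE.findall(body)]


print(references("/JS 111611 0 R /S /JavaScript"))  # [(111611, 0)]
print(references("/Names [(A)6 0 R ]"))             # [(6, 0)]
```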

The search output shows two links: object 3 references object 5, and object 6 references object 111611. Let's check whether object 5 references any other objects:

$ ./pdf-parser.py --object=5 jsunpack-n-read-only/samples/pdf-thisCreator.file
obj 5 0
 Referencing: 6 0 R

    /Names [(A)6 0 R ]

Object 5 references object 6, which completes the map: object 3 -> object 5 -> object 6 -> object 111611.
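Once the references are known, the map is simply a directed graph. This hypothetical Python sketch hard-codes the edges read off the pdf-parser output above and walks them breadth-first from object 3; the REFS table and the walk helper are illustrative assumptions, not part of pdf-parser.

```python
from collections import deque

# Edges read off the pdf-parser output: object -> objects it references.
REFS = {3: [5], 5: [6], 6: [111611], 111611: []}


def walk(graph: dict, start: int) -> list:
    """Breadth-first traversal returning objects in the order they are reached."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:  # guard against reference cycles
                seen.add(nxt)
                queue.append(nxt)
    return order


print(" -> ".join(str(n) for n in walk(REFS, 3)))  # 3 -> 5 -> 6 -> 111611
```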


Using pdfobjflow

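pdfobjflow offers a quicker way to obtain this map: it reads the output of pdf-parser from standard input and draws the object graph into pdfobjflow.png, which can then be opened with an image viewer such as eog: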

$ ./pdf-parser.py /data/tools/jsunpack-n-read-only/samples/pdf-thisCreator.file | ./pdfobjflow.py
$ eog pdfobjflow.png
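pdfobjflow's internals are not covered here, but the same idea can be sketched in Python: read pdf-parser's text output, pair each "obj N G" line with the object numbers on its "Referencing:" line, and emit a Graphviz DOT graph. The function names and the abridged sample text are assumptions for illustration, and this is not necessarily how pdfobjflow works. The printed DOT text could be saved to a file and rendered with Graphviz, for example dot -Tpng map.dot -o map.png.

```python
import re

OBJ_LINE = re.compile(r"^obj (\d+) (\d+)")
REF = re.compile(r"(\d+) (\d+) R")


def edges_from_pdf_parser(text: str) -> list:
    """Pair each 'obj N G' header with the objects on its 'Referencing:' line."""
    edges, current = [], None
    for line in text.splitlines():
        header = OBJ_LINE.match(line)
        if header:
            current = int(header.group(1))
        elif current is not None and line.strip().startswith("Referencing:"):
            edges.extend((current, int(num)) for num, _gen in REF.findall(line))
    return edges


def to_dot(edges: list) -> str:
    """Render the edges as a Graphviz DOT digraph."""
    body = [f'    "{src}" -> "{dst}";' for src, dst in edges]
    return "\n".join(["digraph pdf {", *body, "}"])


# Abridged pdf-parser output based on the searches shown earlier.
SAMPLE = """obj 3 0
 Referencing: 5 0 R

    /JavaScript 5 0 R

obj 5 0
 Referencing: 6 0 R

obj 6 0
 Referencing: 111611 0 R
"""

print(to_dot(edges_from_pdf_parser(SAMPLE)))
```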



Decompress JavaScript

Now let's decompress the JavaScript stored in the stream of object 111611. The --filter option passes the stream through its filter (here /FlateDecode), and --raw prints the decoded data in raw form:

$ ./pdf-parser.py --object=111611 --filter --raw jsunpack-n-read-only/samples/pdf-thisCreator.file > out.js
$ cat out.js
obj 111611 0
 Contains stream

    /Filter /FlateDecode
    /Length 142

 /*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/var b/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/=/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/
this.creator;/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/var a/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/=/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/
unescape(/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/b/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/);/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/eval(
/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/unescape(/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/this.creator.replace(/z/igm,'%')
/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/)/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/);
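The /Filter /FlateDecode entry says the stream is zlib/deflate-compressed, which is what --filter undoes. A minimal Python sketch of that decoding step with the standard zlib module, run on a stand-in stream; the flate_decode helper and the sample bytes are hypothetical, and real PDF streams may also carry /DecodeParms predictors, which this ignores.

```python
import zlib


def flate_decode(raw: bytes) -> bytes:
    """Undo /FlateDecode: the stream is zlib-wrapped deflate data."""
    return zlib.decompress(raw)


# Stand-in for the bytes between "stream" and "endstream" in object 111611.
compressed = zlib.compress(b"var b = this.creator;")
print(flate_decode(compressed).decode())  # var b = this.creator;
```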

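The decompressed stream reveals obfuscated JavaScript: the actual statements are buried among identical junk comments (/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/). Piping the output through a few commands helps decode it: tail -n +11 skips pdf-parser's object header, js_beautify reformats the script, grep -v removes the lines starting with /*, and indent tidies the spacing: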

$ tail -n +11 out.js | js_beautify - | grep -v "^\/\*" | indent 
var b = this.creator;
var a = unescape (b);
eval (unescape (this.creator.replace (/z / igm, '%')));
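The decoded script reads the document's creator string (this.creator, i.e. the Creator entry of the document information), replaces every z with %, and passes the result to unescape() before eval() runs it, so the real payload hides in the PDF's metadata rather than in the stream. Note that "/z / igm" in the indent output is a formatting artifact; out.js shows the regex as /z/igm. The Python sketch below reproduces both decoding steps under stated assumptions: the comment stripper is naive (it would also strip "/*" inside string literals), js_unescape only approximates JavaScript's unescape(), and the creator value is a made-up stand-in because the real one is not shown here.

```python
import re


def strip_junk_comments(js: str) -> str:
    """Replace /* ... */ comments with a space, as a JS parser treats them.

    Naive: a "/*" inside a string literal would be stripped too.
    """
    return re.sub(r"/\*.*?\*/", " ", js, flags=re.DOTALL)


def js_unescape(s: str) -> str:
    """Approximate JavaScript unescape(): decode %uXXXX and %XX sequences."""
    def repl(m):
        return chr(int(m.group(1) or m.group(2), 16))
    return re.sub(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})", repl, s)


def decode_creator(creator: str) -> str:
    """Mimic unescape(this.creator.replace(/z/igm, '%'))."""
    return js_unescape(re.sub("z", "%", creator, flags=re.IGNORECASE))


print(strip_junk_comments("var b/*x*/=/*x*/this.creator;"))  # var b = this.creator;
# Made-up creator value; the sample's real Creator string is not shown here.
print(decode_creator("z68z65z6Cz6Cz6F"))  # hello
```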