Pdf-parser

From aldeid
You might also see: pdfid
You might also see: make-pdf-javascript

Description

pdf-parser is a Python script written by Didier Stevens that parses a PDF document and identifies the fundamental elements used in the analyzed file.

Installation

$ cd /data/src/
$ wget http://didierstevens.com/files/software/pdf-parser_V0_4_3.zip
$ unzip pdf-parser_V0_4_3.zip
$ chmod +x pdf-parser.py

Usage

Syntax

Usage: pdf-parser.py [options] pdf-file

Options

--version
show program's version number and exit
-h, --help
show this help message and exit
-s SEARCH, --search=SEARCH
string to search in indirect objects (except streams)
-f, --filter
pass stream object through filters (FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode and RunLengthDecode only)
-o OBJECT, --object=OBJECT
id of indirect object to select (version independent)
-r REFERENCE, --reference=REFERENCE
id of indirect object being referenced (version independent)
-e ELEMENTS, --elements=ELEMENTS
type of elements to select (cxtsi)
-w, --raw
raw output for data and filters
-a, --stats
display stats for pdf document
-t TYPE, --type=TYPE
type of indirect object to select
-v, --verbose
display malformed PDF elements
-x EXTRACT, --extract=EXTRACT
filename to extract to
-H, --hash
display hash of objects
-n, --nocanonicalizedoutput
do not canonicalize the output
-d DUMP, --dump=DUMP
filename to dump stream content to
-D, --debug
display debug info
-c, --content
display the content for objects without streams or with streams without filters
--searchstream=SEARCHSTREAM
string to search in streams
--unfiltered
search in unfiltered streams
--casesensitive
case sensitive search in streams
--regex
use regex to search in streams

Example

Confirm presence of JavaScript

With pdfid, we were able to detect the presence of JavaScript in the PDF file.

Using pdf-parser

Let's use pdf-parser to dig deeper into this PDF file.

$ ./pdf-parser.py --search=javascript jsunpack-n-read-only/samples/pdf-thisCreator.file
obj 3 0
 Type: 
 Referencing: 5 0 R

  <<
    /JavaScript 5 0 R
  >>

obj 6 0
 Type: 
 Referencing: 111611 0 R

  <<
    /JS 111611 0 R
    /S /JavaScript
  >>

The output shows two references: object 3 points to object 5, and object 6 points to object 111611. Let's check whether object 5 references other objects:

$ ./pdf-parser.py --object=5 jsunpack-n-read-only/samples/pdf-thisCreator.file
obj 5 0
 Type: 
 Referencing: 6 0 R

  <<
    /Names [(A)6 0 R ]
  >>

Object 5 references object 6, so we now have the complete chain: 3 -> 5 -> 6 -> 111611.
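The manual steps above can be sketched in Python. This is only an illustration, not part of pdf-parser: it assumes the text layout shown in the output above ("obj N G" followed by a "Referencing:" line), and the names parse_references and follow_chain are hypothetical helpers introduced here.

```python
import re

# Sample text condensed from the pdf-parser output shown above
# (dictionary contents omitted; only the header lines are needed).
SAMPLE = """\
obj 3 0
 Type:
 Referencing: 5 0 R

obj 5 0
 Type:
 Referencing: 6 0 R

obj 6 0
 Type:
 Referencing: 111611 0 R
"""

OBJ_RE = re.compile(r"^obj (\d+) (\d+)\s*$")
REF_RE = re.compile(r"(\d+) (\d+) R")


def parse_references(text):
    """Map each object id to the list of object ids it references."""
    graph = {}
    current = None
    for line in text.splitlines():
        match = OBJ_RE.match(line)
        if match:
            current = int(match.group(1))
            graph.setdefault(current, [])
        elif current is not None and line.strip().startswith("Referencing:"):
            graph[current].extend(int(num) for num, _gen in REF_RE.findall(line))
    return graph


def follow_chain(graph, start):
    """Follow the first reference of each object until a leaf or a cycle."""
    chain = [start]
    seen = {start}
    while graph.get(chain[-1]):
        nxt = graph[chain[-1]][0]
        if nxt in seen:
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


graph = parse_references(SAMPLE)
print(follow_chain(graph, 3))  # [3, 5, 6, 111611]
```

The sketch only follows the first reference of each object, which is enough for this sample; a real document may need a full graph traversal.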

Using pdfobjflow

pdfobjflow offers a quicker way to produce the map:

$ ./pdf-parser.py /data/tools/jsunpack-n-read-only/samples/pdf-thisCreator.file | ./pdfobjflow.py
$ eog pdfobjflow.png

The resulting image, pdfobjflow.png, shows the map of object references.
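To give an idea of what such a map contains, the chain found earlier (3 -> 5 -> 6 -> 111611) can be written as a Graphviz DOT graph. This is a minimal sketch and not how pdfobjflow itself is implemented; the function to_dot and the graph name are assumptions made for illustration.

```python
def to_dot(edges, name="pdfobjflow"):
    """Render (source, target) object-id pairs as a Graphviz DOT digraph."""
    lines = ["digraph %s {" % name]
    for src, dst in edges:
        lines.append('    "%d" -> "%d";' % (src, dst))
    lines.append("}")
    return "\n".join(lines)


# Edges taken from the pdf-parser output earlier on this page.
EDGES = [(3, 5), (5, 6), (6, 111611)]
print(to_dot(EDGES))
```

Saving the printed text to a file (for example map.dot, a hypothetical name) lets Graphviz render it with `dot -Tpng map.dot -o map.png`.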

Decompress JavaScript

Now, let's decompress the JavaScript contained in object 111611 with the --filter and --raw options:

$ ./pdf-parser.py --object=111611 --filter --raw jsunpack-n-read-only/samples/pdf-thisCreator.file > out.js
$ cat out.js
obj 111611 0
 Type: 
 Referencing: 
 Contains stream

  <<
    /Filter /FlateDecode
    /Length 142
  >>

 /*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/var b/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/=/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/
this.creator;/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/var a/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/=/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/
unescape(/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/b/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/);/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/eval(
/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/unescape(/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/this.creator.replace(/z/igm,'%')
/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/)/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/);
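The /Filter /FlateDecode entry in the dictionary above means the stream is zlib/deflate-compressed, which is what --filter undoes. The following Python sketch shows that decoding step in isolation. The payload is a stand-in compressed on the spot, not the actual bytes of object 111611, and flate_decode is a hypothetical helper name.

```python
import zlib


def flate_decode(data: bytes) -> bytes:
    """Decode a FlateDecode stream body (zlib-wrapped deflate data)."""
    return zlib.decompress(data)


# Stand-in payload: we compress a short script ourselves, because the raw
# stream bytes of the sample PDF are not reproduced on this page.
original = b"var b = this.creator;"
compressed = zlib.compress(original)

print(flate_decode(compressed).decode("ascii"))
```

Real-world malicious PDFs sometimes carry truncated or otherwise malformed streams, in which case zlib.decompress raises zlib.error and a more tolerant decoder is needed.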

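The output contains obfuscated JavaScript padded with junk comments. Piping it through a few commands makes it readable (note that indent, a C formatter, inserts spaces into the regular expression, which is /z/igm in the original):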

$ tail -n +11 out.js | js_beautify - | grep -v "^\/\*" | indent 
var b = this.creator;
var a = unescape (b);
eval (unescape (this.creator.replace (/z / igm, '%')));
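The cleaned script shows that the real payload is stored in the document's Creator field, with every "%" replaced by "z". The sketch below reproduces both steps in Python: stripping the junk comments and emulating unescape(this.creator.replace(/z/igm,'%')). It rests on several assumptions: the Creator value used here is invented for illustration (the real one is not shown on this page), urllib.parse.unquote only approximates JavaScript's unescape (it does not handle %uXXXX sequences), and both function names are hypothetical.

```python
import re
from urllib.parse import unquote

# First statement of the obfuscated script, copied from out.js above.
OBFUSCATED = (
    "/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/var b"
    "/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/="
    "/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/this.creator;"
)


def strip_block_comments(js):
    """Remove /* ... */ comments (naive: ignores comments inside strings)."""
    return re.sub(r"/\*.*?\*/", "", js, flags=re.DOTALL)


def decode_creator(creator):
    """Approximate unescape(creator.replace(/z/igm, '%')) for %XX escapes."""
    percent_encoded = re.sub("z", "%", creator, flags=re.IGNORECASE)
    return unquote(percent_encoded, encoding="latin-1")


print(strip_block_comments(OBFUSCATED))  # var b=this.creator;

# Hypothetical Creator value, NOT taken from the sample file.
fake_creator = "z61z6cz65z72z74z28z31z29"
print(decode_creator(fake_creator))  # alert(1)
```

To decode the real payload, the Creator string would first have to be read from the document's information dictionary, for example by searching for it with pdf-parser's --search option.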
