Category:Digital-Forensics/Malicious-Documents/PDF

From aldeid

PDF file structure

$ xxd helloworld.pdf 
0000000: 2550 4446 2d31 2e34 0a25 e1e9 ebd3 0a32  %PDF-1.4.%.....2
0000010: 2030 206f 626a 0a3c 3c2f 5479 7065 202f   0 obj.<</Type /
0000020: 4361 7461 6c6f 670a 2f50 6167 6573 2031  Catalog./Pages 1
0000030: 2030 2052 0a3e 3e0a 656e 646f 626a 0a33   0 R.>>.endobj.3
0000040: 2030 206f 626a 0a3c 3c2f 5479 7065 202f   0 obj.<</Type /
0000050: 5061 6765 0a2f 5061 7265 6e74 2031 2030  Page./Parent 1 0
0000060: 2052 0a2f 5265 736f 7572 6365 7320 3c3c   R./Resources <<
0000070: 2f50 726f 6353 6574 7320 5b2f 5044 4620  /ProcSets [/PDF 
0000080: 2f54 6578 7420 2f49 6d61 6765 4220 2f49  /Text /ImageB /I
0000090: 6d61 6765 4320 2f49 6d61 6765 495d 0a2f  mageC /ImageI]./
00000a0: 4578 7447 5374 6174 6520 3c3c 2f47 3020  ExtGState <</G0 
00000b0: 3420 3020 520a 3e3e 0a2f 466f 6e74 203c  4 0 R.>>./Font <
00000c0: 3c2f 4630 2035 2030 2052 0a3e 3e0a 3e3e  </F0 5 0 R.>>.>>
00000d0: 0a2f 4d65 6469 6142 6f78 205b 3020 3020  ./MediaBox [0 0 
00000e0: 3631 3220 3739 325d 0a2f 436f 6e74 656e  612 792]./Conten
00000f0: 7473 2036 2030 2052 0a3e 3e0a 656e 646f  ts 6 0 R.>>.endo
0000100: 626a 0a36 2030 206f 626a 0a3c 3c2f 4669  bj.6 0 obj.<</Fi
0000110: 6c74 6572 202f 466c 6174 6544 6563 6f64  lter /FlateDecod
0000120: 650a 2f4c 656e 6774 6820 3230 380a 3e3e  e./Length 208.>>
[SNIP]
0003b70: 0a3c 3c2f 5479 7065 202f 5061 6765 730a  .<</Type /Pages.
0003b80: 2f43 6f75 6e74 2031 0a2f 4b69 6473 205b  /Count 1./Kids [
0003b90: 3320 3020 525d 0a3e 3e0a 656e 646f 626a  3 0 R].>>.endobj
0003ba0: 0a78 7265 660a 3020 3131 0a30 3030 3030  .xref.0 11.00000
0003bb0: 3030 3030 3020 3635 3533 3520 6620 0a30  00000 65535 f .0
0003bc0: 3030 3030 3135 3230 3920 3030 3030 3020  000015209 00000 
0003bd0: 6e20 0a30 3030 3030 3030 3031 3520 3030  n .0000000015 00
0003be0: 3030 3020 6e20 0a30 3030 3030 3030 3036  000 n .000000006
0003bf0: 3320 3030 3030 3020 6e20 0a30 3030 3030  3 00000 n .00000
0003c00: 3030 3533 3820 3030 3030 3020 6e20 0a30  00538 00000 n .0
0003c10: 3030 3030 3030 3633 3220 3030 3030 3020  000000632 00000 
0003c20: 6e20 0a30 3030 3030 3030 3235 3920 3030  n .0000000259 00
0003c30: 3030 3020 6e20 0a30 3030 3030 3030 3736  000 n .000000076
0003c40: 3320 3030 3030 3020 6e20 0a30 3030 3030  3 00000 n .00000
0003c50: 3134 3836 3820 3030 3030 3020 6e20 0a30  14868 00000 n .0
0003c60: 3030 3030 3031 3036 3020 3030 3030 3020  000001060 00000 
0003c70: 6e20 0a30 3030 3030 3031 3238 3520 3030  n .0000001285 00
0003c80: 3030 3020 6e20 0a74 7261 696c 6572 0a3c  000 n .trailer.<
0003c90: 3c2f 5369 7a65 2031 310a 2f52 6f6f 7420  </Size 11./Root 
0003ca0: 3220 3020 520a 3e3e 0a73 7461 7274 7872  2 0 R.>>.startxr
0003cb0: 6566 0a31 3532 3635 0a25 2545 4f46       ef.15265.%%EOF

A PDF file is composed of the following elements:

      header Gives the PDF version (e.g. %PDF-1.4)
      objects The objects (text, fonts, graphics, JavaScript, forms, ...) used in the PDF file.
      xref Table with the byte offsets of the objects in the file
      trailer Gives the number of xref entries (/Size), the root object (/Root) and, optionally, a reference to metadata such as creation date and author (/Info). The startxref keyword that follows gives the byte offset of the xref table
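
The elements above can be located with a few regular expressions. The following is a minimal sketch, not a full parser: the function name summarize_pdf is hypothetical, it assumes a classic xref table (PDF 1.5+ cross-reference streams have no "xref" keyword), and it can report false positives if a stream happens to contain matching bytes. In the dump above, startxref 15265 is 0x3ba1, which is exactly where the "xref" keyword begins.

```python
import re

def summarize_pdf(data: bytes) -> dict:
    """Locate the header, indirect objects, xref table and trailer of a PDF."""
    # Header: "%PDF-x.y" at the very start of the file.
    header = re.match(rb"%PDF-(\d+\.\d+)", data)
    # Objects: every "X Y obj" opener (the lookbehind avoids splitting numbers).
    objects = re.findall(rb"(?<!\d)\d+\s+\d+\s+obj\b", data)
    # startxref: byte offset of the xref table; the last one wins when a file
    # has been incrementally updated.
    offsets = re.findall(rb"startxref\s+(\d+)", data)
    startxref = int(offsets[-1]) if offsets else None
    # Trailer dictionary: only /Size is read here.
    trailer = re.search(rb"trailer\s*<<(.*?)>>", data, re.DOTALL)
    size = re.search(rb"/Size\s+(\d+)", trailer.group(1)) if trailer else None
    return {
        "version": header.group(1).decode() if header else None,
        "object_count": len(objects),
        "startxref": startxref,
        # True when the offset really points at the "xref" keyword.
        "xref_found": startxref is not None
        and data[startxref:startxref + 4] == b"xref",
        "trailer_size": int(size.group(1)) if size else None,
    }
```

It could be called as summarize_pdf(open("helloworld.pdf", "rb").read()). A startxref offset that does not point at "xref" is worth a closer look, since it may indicate a damaged, appended or hand-edited file.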

PDF objects

Delimiters

PDF objects are delimited by the following keywords (X is the object number, Y the generation number):

X Y obj
endobj

Indirect objects and references

An indirect object is a PDF object that can be referenced.

The example below shows indirect object #2 (generation 0), which points ("R" means reference) to indirect object #10 (generation 0).

2 0 obj
<<
  /Type /Catalog
  /Pages 10 0 R
>>
>>
endobj
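
To follow references during analysis, it helps to list which objects each indirect object points to. The sketch below is an assumption-laden illustration: the name object_references is hypothetical, it only sees objects stored in the clear (not those packed inside object streams), and it treats any "N G R" token inside an object body as a reference.

```python
import re

# "X Y obj ... endobj" with the body captured.
OBJ_RE = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)
# "N G R" indirect reference.
REF_RE = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+R\b")

def object_references(data: bytes) -> dict:
    """Map (object number, generation) to the (number, generation) pairs it references."""
    graph = {}
    for num, gen, body in OBJ_RE.findall(data):
        refs = [(int(n), int(g)) for n, g in REF_RE.findall(body)]
        graph[(int(num), int(gen))] = refs
    return graph
```

For a catalog object 2 0 whose /Pages entry is "10 0 R", the result maps (2, 0) to [(10, 0)].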

Compression

  • Streams can be compressed
  • Filters are used to encode or compress stream data (abbreviated names are given in parentheses):
    • ASCIIHexDecode (AHx)
    • ASCII85Decode (A85)
    • CCITTFaxDecode (CCF)
    • DCTDecode (DCT)
    • FlateDecode (Fl)
    • LZWDecode (LZW)
    • RunLengthDecode (RL)
  • FlateDecode is a commonly used filter based on the zlib/deflate algorithm (the same deflate compression that gzip uses)
  • The syntax is as follows:
8 0 obj
<<
    /Filter /FlateDecode
    /Length 270
>>
stream
    binary stream
endstream
endobj

Here is an example:

          3820 3020 6f62 6a0a 3c3c 2f46      8 0 obj.<</F
696c 7465 7220 2f46 6c61 7465 4465 636f  ilter /FlateDeco
6465 0a2f 4c65 6e67 7468 2032 3730 0a3e  de./Length 270.>
3e20 7374 7265 616d 0a78 9c5d 91cb 6a85  > stream.x.]..j.
3010 86f7 798a 599e 2e0e 5eea 3976 2182  0...y.Y...^.9v!.
b508 2e7a a1b6 0fa0 c9e8 09d4 1862 5cf8  ...z.........b\.
f68d 138f 8506 12f8 98f9 ffb9 2428 eb97  ............$(..
5a49 0bc1 8799 7883 167a a984 c179 5a0c  ZI....x..z...yZ.
47e8 7090 8a45 3108 c9ed 4ef4 f2b1 d52c  G.p..E1...N....,
70e2 669d 2d8e b5ea 2796 6500 c1a7 8bce  p.f.-...'.e.....
d6ac 702a c4d4 e103 0bde 8d40 23d5 00a7  ..p*.......@#...
efb2 71dc 2c5a ffe0 88ca 42c8 f21c 04f6  ..q.,Z....B.....
cee9 b5d5 6fed 8810 90ec 5c0b 1797 763d  ....o.....\...v=
3bcd 5fc6 d7aa 1162 e2c8 77c3 2781 b36e  ;._....b..w.'..n
399a 560d c8b2 d09d 1cb2 ca9d 9ca1 12ff  9.V.............
e257 afea 7a7e 6b0d 653f baec 302c c27c  .W..z~k.e?..0,.|
a3f8 9928 7922 4a2a a26b 4974 893d 559e  ...(y"J*.kIt.=U.
2e44 69ec a9f0 9452 cddd 3dba d73a 5a4b  .Di....R..=..:ZK
d2bb 3d39 257b b68f 6fcd 6e4b 3d36 c117  ..=9%{..o.nK=6..
63dc 1268 f334 fd36 b754 787c 8e9e f4a6  c..h.4.6.Tx|....
daee 2fc8 9d8a 740a 656e 6473 7472 6561  ../...t.endstrea
6d0a 656e 646f 626a                      m.endobj
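
A FlateDecode stream such as the one above can be inflated with Python's zlib module; the bytes 78 9c that follow "stream" in the dump are a typical zlib header. The sketch below is a simplification: the name inflate_streams is hypothetical, it ignores /Length and /Filter and simply tries zlib on every stream, and it does not handle filter chains, predictors or encrypted documents.

```python
import re
import zlib

# Everything between "stream<EOL>" and "endstream"; the lookbehind keeps the
# pattern from matching the tail of a previous "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(data: bytes) -> list:
    """Return the decompressed payload of every stream that zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj stops at the end of the zlib data, so the EOL
            # bytes before "endstream" are ignored instead of raising an error.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not FlateDecode (or damaged/encrypted): skip rather than guess.
            continue
    return results
```

Streams that fail to inflate are skipped silently here; in a real analysis they deserve attention, because they may use another filter or a cascade of filters.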

Keywords

Keywords define the object types:

Name Description
/AA Automatically execute an action or a script
/AcroForm Adobe Forms. Can launch scripts or actions
/Action Can launch scripts or actions
/Colors
/EmbeddedFile
/Encrypt
/GoTo, /GoToR, /GoToE Changes the view to a specified destination: in the same PDF file (GoTo), in another PDF file (GoToR) or in an embedded file (GoToE)
/JavaScript Embeds JavaScript code in an object
/JBIG2Decode Used for images
/JS Embeds JavaScript code in an object
/Launch Launches a program or opens a document
/Names Can launch scripts or actions
/ObjStm Can hide objects in an object stream
/OpenAction Automatically execute an action or a script
/Page Page object; the number of /Page entries gives the number of pages in the PDF file
/RichMedia Embeds rich media (e.g. Flash) in an object
/SubmitForm Can send data to URL
/URI Accesses the resource by the URL
Warning
  • Names are case sensitive
  • Names can be obfuscated by writing any character as a #xx hexadecimal escape (e.g. /#4a#61#76#61#53#63#72#69#70#74 is the hexadecimal representation of /JavaScript); octal escapes apply to strings, not to names
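
Because names can be hex-escaped, a keyword search should normalise each name before comparing it. The following sketch decodes the #xx escapes and then counts a subset of the keywords from the table. The names SUSPICIOUS, normalize_name and count_keywords are hypothetical, the keyword selection is only illustrative, and the scan works on raw bytes, so names hidden inside compressed streams or object streams are not seen.

```python
import re
from collections import Counter

# Illustrative subset of the keywords listed in the table above.
SUSPICIOUS = ("/AA", "/AcroForm", "/JavaScript", "/JS", "/Launch",
              "/OpenAction", "/ObjStm", "/RichMedia", "/SubmitForm", "/URI")

# A name starts with "/" and runs until whitespace or a PDF delimiter.
NAME_RE = re.compile(rb"/[^\s/<>\[\]\(\)\{\}%]+")
HEX_RE = re.compile(rb"#([0-9A-Fa-f]{2})")

def normalize_name(name: bytes) -> str:
    """Replace every #xx escape in a PDF name with the character it encodes."""
    decoded = HEX_RE.sub(lambda m: bytes([int(m.group(1), 16)]), name)
    return decoded.decode("latin-1")

def count_keywords(data: bytes) -> Counter:
    """Count the suspicious names in data, after decoding hex escapes."""
    names = Counter(normalize_name(n) for n in NAME_RE.findall(data))
    return Counter({k: names[k] for k in SUSPICIOUS if names[k]})
```

Whole names are compared (names are case sensitive), so /JS is not counted when only /JavaScript is present.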

Specific JavaScript methods

Some JavaScript methods are specific to PDF documents:

Method Description
app.setTimeOut() Executes a script after the specified time (in milliseconds), e.g.:
var sh = app.setTimeOut("start()", 10)
app.viewerVersion Returns the version of the PDF viewer
syncAnnotScan(), getAnnots() Retrieves annotations embedded in the document
spell.customDictionaryOpen() Opens a custom dictionary
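
Once JavaScript has been extracted from a document, the methods listed above can serve as quick indicators. The sketch below simply checks which of them appear in a script; PDF_JS_APIS and find_pdf_js_apis are hypothetical names, and a plain text match like this is defeated by obfuscation such as string concatenation or eval().

```python
import re

# The PDF-specific API names from the table above.
PDF_JS_APIS = ("app.setTimeOut", "app.viewerVersion", "syncAnnotScan",
               "getAnnots", "spell.customDictionaryOpen")

def find_pdf_js_apis(script: str) -> list:
    """Return the PDF-specific API names that appear in the script text."""
    return [api for api in PDF_JS_APIS
            if re.search(r"\b" + re.escape(api) + r"\b", script)]
```

A hit is only a hint about where to start reading the script; it does not by itself mean the script is malicious.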
