34

I cannot decode the data from a stream like this:

    56 0 obj 
    << /Length 1242 /Filter /FlateDecode >>
    stream
    x]êΩnƒ Ñ{ûbÀKq¬æê¢....(whole binary is omitted)
    endstream
    endobj

I tried to isolate the binary content (x]êΩnƒ Ñ{ûbÀKq¬æ\âê¢....) in a file and in a binary string. The decoding function gzinflate($encripted_data) returns a decoding error, and I think this happens because the encoded content is not "deflated" or something similar.

In the PDF Reference v1.7 (sixth edition), on page 67, I found the description of the /FlateDecode filter: "...Decompresses data encoded using the zlib/deflate compression method, reproducing the original text or binary data."

I need a real, raw solution, i.e. a PHP function and/or an algorithm for handling this /FlateDecode stream.

Thank You!

  • Do you need this function for selected objects only or for all compressed streams (and all compression schemes)? – Kurt Pfeifle Jul 31 '12 at 1:17
  • Dear Kurt! I'd be glad to know how to deal with all kinds of filters, like ASCIIHexDecode, ASCII85Decode, LZWDecode, RunLengthDecode, CCITTFaxDecode, JBIG2Decode, DCTDecode, JPXDecode and Crypt, but in real life FlateDecode is the one most used in PDF files produced by "print to PDF..." tools, and right now I really need to deal with this single filter. – Ruben Kazumov Jul 31 '12 at 1:35
  • You say 'I think it happens because encoded content is not "deflated" or so'. -- That's why I gave you the hint about qpdf in my answer. You can use it (at least) to verify or falsify your own efforts, even if it turns out to not be meeting your direct requirements. Also your 56 0 obj-object can be anything. If you don't tell from where in the PDF it is referenced as 56 0 R there is no way to know if it is an ICC profile, a font, an image, some page content or something else... – Kurt Pfeifle Jul 31 '12 at 2:19
  • Dear Kurt! Maybe qpdf is a good solution for tasks like this, but unfortunately qpdf is a "shell" or command-line solution. Not my case. Please forgive me! Thanks for the hints! – Ruben Kazumov Jul 31 '12 at 2:56
53

Since you didn't say whether you need to access only one decompressed stream or need all streams decompressed, I'll suggest a simple command-line tool that does it in one go for the complete PDF: Jay Berkenbilt's qpdf.

Example commandline:

 qpdf --qdf --object-streams=disable in.pdf out.pdf

out.pdf can then be inspected in a text editor (only embedded ICC profiles, images and fonts could still be binary).

qpdf will also automatically re-order the objects and display the PDF syntax in a normalized way (and tell you in a comment what the original object ID of each decompressed object was).

Should you need to re-compress the file (perhaps after editing it), just run this command:

 qpdf out-edited.pdf out-recompressed.pdf

(You may see some warning message, telling that the utility was attempting to repair a damaged file....)

qpdf is multi-platform and available from Sourceforge.

  • How can we re-compress pdf file, for example, after modifying a text in uncompressed file? – Kemal Dağ Feb 6 '15 at 14:31
  • 1
    @KemalDağ: See my updated answer. – Kurt Pfeifle Feb 6 '15 at 15:44
  • Thanks. It re-compresses the original one. But after opening it in Adobe Reader, it raises the following error: "This document enabled extended features in Adobe Reader. The document has been changed since it was created and use of extended features is no longer available. Please contact the author for the original version of this document." There are fillable fields in the PDF form. Is there a way to modify PDF files without Adobe Reader raising the above error? Adobe disables the fillable fields after recompression. – Kemal Dağ Feb 10 '15 at 12:22
  • And after just uncompressing and re-compressing a file, the output file is different from the original input. Shouldn't the two be the same? – Kemal Dağ Feb 10 '15 at 12:27
  • @KemalDağ: Using QPDF to uncompress and re-compress all PDF objects will not restore the original PDF exactly. QPDF is providing "content preserving" transformations of a PDF. As I said, un-compressing does also "re-order the objects" and "display the PDF syntax in a normalized way". Upon re-compressing it does not restore the original order of the objects (different order does not change the visibly rendered contents of the pages). – Kurt Pfeifle Oct 16 '16 at 19:34
15
header('Content-Type: text/plain');     // the decoded result is sent as plain text
$n = "binary_file.bin";                 // file holding the isolated stream bytes
$f = fopen($n, "rb");                   // open it in binary mode
$c = fread($f, filesize($n));           // read the whole compressed stream
fclose($f);
$u = gzuncompress($c);                  // zlib-format decoder, which fits the /FlateDecode filter
$out = fopen("php://output", "wb");     // write to the response body
fwrite($out, $u);                       // output the decoded data

Jingle bells! Jingle bells!...

gzuncompress() - the solution
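
A likely reason gzinflate() failed while gzuncompress() works: /FlateDecode data normally carries the zlib wrapper (a two-byte header plus an Adler-32 checksum), which gzuncompress() expects, whereas gzinflate() expects raw deflate data with no wrapper. The leading x (byte 0x78) in the stream from the question is consistent with a zlib header. A minimal sketch of the same distinction in Python (the language of another answer in this thread); the function name flate_decode is made up for illustration, and the raw-deflate fallback is an assumption for malformed files, not something the PDF Reference asks for:

```python
import zlib


def flate_decode(data: bytes) -> bytes:
    """Inflate the body of a /FlateDecode stream (hypothetical helper)."""
    try:
        # Default wbits: expects the zlib wrapper, like PHP's gzuncompress().
        return zlib.decompress(data)
    except zlib.error:
        # wbits=-15: raw deflate without header or checksum, like PHP's gzinflate().
        return zlib.decompress(data, wbits=-15)
```

If the first call succeeds, the stream was zlib-wrapped, which is the usual case for PDF files.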

5

Long overdue, but someone might find it helpful. In this case: << /Length 1242 /Filter /FlateDecode >> all you need is to pass the isolated binary string (so basically everything between "stream" and "endstream") to zlib.decompress:

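import zlib
with open("binary_file.bin", "rb") as f:  # assumed file holding the isolated stream bytes
    stream = f.read()                      # pasting the bytes into a b"..." literal fails for non-ASCII characters
data = zlib.decompress(stream)             # here you have your clean decompressed stream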

However, if your PDF object has /DecodeParms, things become more complicated. You will need the /Predictor value and the number of columns (/Columns). It is better to use PyPDF2 for this.
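
To give an idea of what the /DecodeParms step involves, here is a rough sketch that inflates the stream and then undoes the PNG predictors (/Predictor 10 to 15), where every row is prefixed with one filter-type byte. The function names are made up for illustration, the defaults mirror the PDF defaults (Predictor 1, Colors 1, BitsPerComponent 8, Columns 1), and the TIFF predictor (/Predictor 2) is deliberately left out:

```python
import zlib


def undo_png_predictor(data: bytes, columns: int, colors: int = 1, bits: int = 8) -> bytes:
    """Reverse PNG row filters; each row is 1 filter byte + row_len data bytes."""
    bpp = max(1, colors * bits // 8)             # bytes per complete pixel
    row_len = (columns * colors * bits + 7) // 8
    stride = row_len + 1
    prev = bytearray(row_len)                    # row above the first row is all zeros
    out = bytearray()
    for start in range(0, len(data) - stride + 1, stride):
        ftype = data[start]
        row = bytearray(data[start + 1:start + stride])
        for i in range(row_len):
            left = row[i - bpp] if i >= bpp else 0       # already decoded
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            if ftype == 0:                               # None
                pred = 0
            elif ftype == 1:                             # Sub
                pred = left
            elif ftype == 2:                             # Up
                pred = up
            elif ftype == 3:                             # Average
                pred = (left + up) // 2
            elif ftype == 4:                             # Paeth
                p = left + up - up_left
                pa, pb, pc = abs(p - left), abs(p - up), abs(p - up_left)
                pred = left if pa <= pb and pa <= pc else (up if pb <= pc else up_left)
            else:
                raise ValueError(f"unknown PNG filter type {ftype}")
            row[i] = (row[i] + pred) & 0xFF
        out += row
        prev = row
    return bytes(out)


def flate_decode_with_parms(data: bytes, predictor: int = 1, columns: int = 1,
                            colors: int = 1, bits: int = 8) -> bytes:
    """Inflate, then apply the predictor named in /DecodeParms (sketch only)."""
    inflated = zlib.decompress(data)
    if predictor == 1:                           # no prediction
        return inflated
    if predictor >= 10:                          # PNG predictors
        return undo_png_predictor(inflated, columns, colors, bits)
    raise NotImplementedError("TIFF predictor (2) is not handled in this sketch")
```

For a cross-reference stream with /DecodeParms << /Predictor 12 /Columns 4 >>, for example, the call would be flate_decode_with_parms(stream, predictor=12, columns=4).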

  • 3
    The question is asking for PHP, while this solution suggests using Python. That's not a very good fit. Anyway, this may be obvious to you but not to everyone else: you'll need to pass everything in between stream and endstream except the leading and trailing EOL markers. – IInspectable Jan 14 '17 at 15:05
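
The EOL handling mentioned in the comment above could look roughly like this in Python. It is only a sketch under simplifying assumptions: obj holds the bytes of a single object (from "N G obj" to "endobj"), the dictionary itself does not contain the word stream, and the helper name extract_stream_bytes is made up. A direct /Length value is used when present; for an indirect one (such as /Length 57 0 R) the sketch falls back to stripping one EOL before endstream:

```python
import re


def extract_stream_bytes(obj: bytes) -> bytes:
    """Return the raw bytes between 'stream' and 'endstream' of one PDF object."""
    start = obj.index(b"stream") + len(b"stream")
    # The stream keyword is followed by CRLF or a single LF.
    if obj.startswith(b"\r\n", start):
        start += 2
    elif obj.startswith(b"\n", start):
        start += 1
    end = obj.rindex(b"endstream")
    body = obj[start:end]

    # Prefer a direct integer /Length (skip indirect references like "57 0 R").
    match = re.search(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)", obj[:start])
    if match and int(match.group(1)) <= len(body):
        return body[:int(match.group(1))]

    # Fallback: drop the single EOL marker that precedes endstream.
    for eol in (b"\r\n", b"\n", b"\r"):
        if body.endswith(eol):
            return body[:-len(eol)]
    return body
```

The result can then be passed to zlib.decompress() as shown in the answer.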
-1

I just used

import de.intarsys.pdf.filter.FlateFilter;

from jPod (on SourceForge), and it works well:

FlateFilter filter = new FlateFilter(null);
byte[] decoded = filter.decode(bytes, start, end - start);

The bytes are taken straight from the PDF file.
