[olug] Convert TIFF to PDF with OCR?

Obi-Wan obiwan at jedi.com
Sat Feb 20 04:20:47 UTC 2010


> I have a few large TIFF files (scanned documents) that have the text
> embedded in them (via OCR) so that they can be searched.
> 
> Problem is, the files are very large (about 60MB total).
> 
> Since TIFF is un-compressed, I'd like to convert to a PDF.  My only
> converter is print-to-PDF, but that loses the OCR text.
> 
> Any suggestions on making the conversion?

TIFF is not inherently uncompressed: it supports numerous methods of
compression, including JPEG, LZW, and CCITT.  I'm pretty familiar with
the TIFF format spec, but I'm not aware of a standard method of
attaching OCR text to a TIFF, so it's probably done with a custom tag
(tags being the "T" in TIFF).  Therefore, I think you're unlikely to
find any app (other than the one that did the initial encoding) that
will retain your text when converting to PDF.
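
To see what a given file actually contains, here is a minimal Python
sketch (standard library only) that reads the first image directory of
a TIFF and reports its compression scheme plus any private tags
(numbered 32768 and up, which is where an application-specific tag
would normally live).  The function name and the layout of the returned
dictionary are made up for illustration, and it assumes a classic TIFF,
not BigTIFF.

```python
import struct

# Common values of the Compression tag (259) from the TIFF 6.0 spec.
COMPRESSION_NAMES = {
    1: "none",
    3: "CCITT Group 3",
    4: "CCITT Group 4",
    5: "LZW",
    7: "JPEG",
    32773: "PackBits",
}


def inspect_first_ifd(data: bytes) -> dict:
    """Report compression and private tags in the first IFD of a TIFF."""
    # Bytes 0-1 give the byte order: "II" little-endian, "MM" big-endian.
    endian = {b"II": "<", b"MM": ">"}.get(data[:2])
    if endian is None:
        raise ValueError("not a TIFF file")
    # Bytes 2-3 hold the magic number 42, bytes 4-7 the first IFD offset.
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    tags = []
    compression = 1  # the spec's default when tag 259 is absent
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each entry is 12 bytes
        tag, field_type, _n = struct.unpack_from(endian + "HHI", data, entry)
        tags.append(tag)
        if tag == 259 and field_type == 3:  # Compression, stored as SHORT
            (compression,) = struct.unpack_from(endian + "H", data, entry + 8)

    return {
        "compression": COMPRESSION_NAMES.get(compression, str(compression)),
        "tags": tags,
        "private_tags": [t for t in tags if t >= 32768],
    }
```

It could be called as inspect_first_ifd(open("scan.tif", "rb").read()),
where scan.tif is a placeholder filename.  A non-empty "private_tags"
list would be a hint, though not proof, of where the OCR text is kept.
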
I think you've got two options to pursue:

1) Re-OCR the images using something that will write directly to a PDF.

2) Open the TIFFs in an image processing package and then save them
   as compressed TIFFs, in the hope that the package will pass any
   unrecognized tags along verbatim to the output file.  You may have
   to try or research several packages to find one that does this.
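
To test whether a package really did pass the tags through in option 2,
one rough check is to compare the tag numbers in the original and the
re-saved file.  The following Python sketch (standard library only,
classic TIFF assumed, function names invented for illustration) walks
every page's directory and lists the tag numbers that went missing.  It
only shows that a tag number is still present, not that its contents
were copied verbatim.

```python
import struct


def tiff_tag_ids(data: bytes) -> set:
    """Collect the tag numbers from every IFD (page) in a classic TIFF."""
    endian = {b"II": "<", b"MM": ">"}.get(data[:2])
    if endian is None:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    tags, visited = set(), set()
    # A next-IFD offset of 0 ends the chain; "visited" guards against loops.
    while offset and offset not in visited:
        visited.add(offset)
        (count,) = struct.unpack_from(endian + "H", data, offset)
        for i in range(count):
            # The tag number is the first 2 bytes of each 12-byte entry.
            (tag,) = struct.unpack_from(endian + "H", data, offset + 2 + 12 * i)
            tags.add(tag)
        # The 4 bytes after the last entry point at the next IFD.
        (offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)
    return tags


def dropped_tags(original: bytes, resaved: bytes) -> list:
    """Tag numbers present in the original but absent after re-saving."""
    return sorted(tiff_tag_ids(original) - tiff_tag_ids(resaved))
```

An empty list from dropped_tags() means every tag number survived; any
number of 32768 or above in the result suggests the package discarded a
private tag.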

-- 
Ben "Obi-Wan" Hollingsworth                             obiwan at jedi.com
   The stuff of earth competes for the allegiance I owe only to the
     Giver of all good things, so if I stand, let me stand on the
       promise that You will pull me through.  -- Rich Mullins


