Provided by: exactimage_0.7.4-2ubuntu1_i386

NAME

       hocr2pdf - hOCR to PDF converter of the ExactImage library

SYNOPSIS

       hocr2pdf [-i|--input FILE] [-o|--output FILE] [-n|--no-image]
       [-r|--resolution RESOLUTION] [-s|--sloppy-text] [-t|--text]

       hocr2pdf --help

DESCRIPTION

       ExactImage is a fast C++ image processing library. Unlike  ImageMagick,
       it  allows  operation  in several color spaces and bit depths natively,
       resulting in much lower memory  and  computational  requirements.  Some
       optimized algorithms run in 1/20 of the time ImageMagick requires, and
       displaying large images can take as little as 1/10 of the time the
       "display" program takes.

       hocr2pdf is a command-line front end to the image processing library.
       It creates searchable PDF files with a precisely laid-out text layer
       from hOCR (annotated HTML) input obtained from an OCR system.

OPTIONS

       -i|--input FILE
           Input image filename.

       -o|--output FILE
           Output PDF filename.

       -n|--no-image
           Do not place the image over the text.

       -r|--resolution RESOLUTION
           Override the image resolution.

       -s|--sloppy-text
           Sloppily place text, group words, do not draw single glyphs.

       -t|--text
           Extract text, including trying to remove hyphens.

       -h|--help
           Show summary of options.

EXAMPLES

       Creating a Searchable PDF from hOCR input

       The hOCR (annotated HTML) input must be provided on standard input;
       the image data is read from the file named by the -i or --input
       argument. For example:

       $ hocr2pdf -i scan.tiff -o test.pdf < cuneiform-out.hocr
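
       For a multi-page job, the same invocation can be wrapped in a loop.
       The following is a minimal sketch, not part of ExactImage: it assumes
       each page is stored as NAME.tiff with its OCR result in NAME.hocr in
       the current directory. The function name convert_pages and the
       HOCR2PDF variable are hypothetical helpers introduced for
       illustration; only the -i and -o options and the standard-input
       convention come from this manual page.

```shell
#!/bin/sh
# Sketch: convert every page that has both an image and a matching hOCR file.
# Assumption: pages are named NAME.tiff / NAME.hocr in the current directory.
# HOCR2PDF (hypothetical variable) may point at a different hocr2pdf binary.
HOCR2PDF=${HOCR2PDF:-hocr2pdf}

convert_pages() {
    for image in *.tiff; do
        # If no *.tiff file exists the pattern stays literal; skip it.
        [ -e "$image" ] || continue
        base=${image%.tiff}
        if [ ! -f "$base.hocr" ]; then
            echo "skipping $image: no $base.hocr" >&2
            continue
        fi
        # hOCR on standard input, image via -i, resulting PDF via -o.
        "$HOCR2PDF" -i "$image" -o "$base.pdf" < "$base.hocr" || return 1
    done
}
```

       Calling convert_pages in a directory of scans writes one NAME.pdf per
       page and reports pages that lack hOCR data on standard error.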

       By default the text layer is hidden behind the real image data. The
       image data can be omitted with the -n or --no-image option, so that
       only the text recognized by the OCR is visible, e.g. for debugging or
       to save storage space:

       $ hocr2pdf -i scan.tiff -n -o test.pdf < cuneiform-out.hocr

       Too many gaps between letters in individual words

       This can be caused by imprecise OCR data or by justified text with
       large gaps. ExactImage includes a special mode, activated with the
       command line argument -s or --sloppy-text, that groups the glyphs
       between whitespace into words. This can help PDF viewers produce
       better results when copying and pasting text:

       $ hocr2pdf -i scan.tiff -s -o test.pdf < cuneiform-out.hocr
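
       To judge which mode suits a given scan, it can help to build the
       variants side by side. The sketch below is an illustration only: the
       function name make_variants, the HOCR2PDF variable and the output
       file naming are assumptions, while the -i, -o, -n and -s options are
       the ones documented above.

```shell
#!/bin/sh
# Sketch: build three variants of one page so the text layers can be compared.
# make_variants and HOCR2PDF are hypothetical names, not part of ExactImage.
HOCR2PDF=${HOCR2PDF:-hocr2pdf}

make_variants() {
    image=$1
    hocr=$2
    base=${image%.*}
    # Default: image drawn over the hidden text layer.
    "$HOCR2PDF" -i "$image" -o "$base-default.pdf" < "$hocr" || return 1
    # -n: text layer only, useful for checking where the OCR text lands.
    "$HOCR2PDF" -i "$image" -n -o "$base-text-only.pdf" < "$hocr" || return 1
    # -s: glyphs between whitespace grouped into words.
    "$HOCR2PDF" -i "$image" -s -o "$base-sloppy.pdf" < "$hocr" || return 1
}
```

       For example, make_variants scan.tiff cuneiform-out.hocr writes
       scan-default.pdf, scan-text-only.pdf and scan-sloppy.pdf, which can
       then be opened in a PDF viewer to compare copy-and-paste behaviour.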

SEE ALSO

       exactimage(7)

       bardecode(1)

       e2mtiff(1)

       econvert(1)

       edentify(1)

       empty-page(1)

       optimize2bw(1)

HOMEPAGE

       More information about hocr2pdf and the ExactImage project can be found
       at <http://www.exactcode.de/site/open_source/exactimage/>.

AUTHOR

       ExactImage was written by ExactCODE GmbH <http://www.exactcode.de/>.

       This manual page was written by Daniel Baumann <daniel@debian.org>, for
       the Debian project (but may be used by others).