Provided by:
exactimage_0.7.4-2ubuntu1_i386 
NAME
hocr2pdf - hOCR to PDF converter of the ExactImage library
SYNOPSIS
hocr2pdf [-i|--input FILE] [-o|--output FILE] [-n|--no-image]
[-r|--resolution RESOLUTION] [-s|--sloppy-text] [-t|--text]
hocr2pdf --help
DESCRIPTION
ExactImage is a fast C++ image processing library. Unlike ImageMagick,
it allows operation in several color spaces and bit depths natively,
resulting in much lower memory and computational requirements. Some
optimized algorithms run in 1/20 of the time ImageMagick requires,
and displaying large images can take as little as 1/10 of the time the
"display" program takes.
hocr2pdf is a command line front-end for the image processing library.
It creates accurately laid out, searchable PDF files from hOCR
(annotated HTML) input obtained from an OCR system.
OPTIONS
-i|--input FILE
Input image filename.
-o|--output FILE
Output PDF filename.
-n|--no-image
Do not place the image over the text.
-r|--resolution RESOLUTION
Override the image resolution.
-s|--sloppy-text
Place text sloppily: group glyphs into words instead of drawing single glyphs.
-t|--text
Extract the text and try to remove hyphens.
-h|--help
Show summary of options.
EXAMPLES
Creating a Searchable PDF from hOCR input
The hOCR (annotated HTML) input must be provided on STDIN. The image
data is read from the file named by the -i or --input argument. For
example:
$ hocr2pdf -i scan.tiff -o test.pdf < cuneiform-out.hocr
By default the text layer is hidden by the real image data. The image
data can be left out with the -n or --no-image option, so that only the
text recognized by the OCR is visible, e.g. for debugging or to save
storage space:
$ hocr2pdf -i scan.tiff -n -o test.pdf < cuneiform-out.hocr
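Converting several pages in one run
The invocation above handles one image per call, so a scanned document
with many pages needs a loop. The following is a minimal sketch, not
part of ExactImage: it assumes that every NAME.tiff in a directory has a
matching NAME.hocr next to it, and the function name convert_pages and
the HOCR2PDF variable (an override for the program name) are
hypothetical. Only the -i and -o options and STDIN are used; any further
arguments are passed through unchanged.

```shell
#!/bin/sh
# Convert every NAME.tiff that has a matching NAME.hocr into NAME.pdf.
# The NAME.tiff / NAME.hocr pairing is an assumption of this sketch.
# HOCR2PDF is a hypothetical override for the program name.

convert_pages() {
    dir=$1
    shift                                  # remaining arguments go to hocr2pdf
    for image in "$dir"/*.tiff; do
        [ -e "$image" ] || continue        # the glob matched nothing
        base=${image%.tiff}
        if [ ! -r "$base.hocr" ]; then
            echo "skipping $image: no $base.hocr" >&2
            continue
        fi
        # hOCR goes to STDIN; -i names the image, -o names the PDF.
        "${HOCR2PDF:-hocr2pdf}" -i "$image" "$@" -o "$base.pdf" < "$base.hocr" \
            || echo "failed: $image" >&2
    done
}
```

After the function is defined, a call such as

$ convert_pages scans -n

would write one text-only PDF per page into the scans directory. Pages
without an hOCR file are reported on STDERR and skipped.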
Too many gaps between letters in individual words
This can happen with imprecise OCR data or with justified text that has
large gaps. ExactImage includes a special mode, activated with the
command line argument -s or --sloppy-text, that groups the glyphs
between whitespace into words. This can help PDF viewers produce better
results when copying and pasting text:
$ hocr2pdf -i scan.tiff -s -o test.pdf < cuneiform-out.hocr
SEE ALSO
exactimage(7)
bardecode(1)
e2mtiff(1)
econvert(1)
edentify(1)
empty-page(1)
optimize2bw(1)
HOMEPAGE
More information about hocr2pdf and the ExactImage project can be found
at <http://www.exactcode.de/site/open_source/exactimage/>.
AUTHOR
ExactImage was written by ExactCODE GmbH <http://www.exactcode.de/>.
This manual page was written by Daniel Baumann <daniel@debian.org>, for
the Debian project (but may be used by others).