Ubuntu Manpage: djvu2hocr - DjVu to hOCR converter

NAME

       djvu2hocr - DjVu to hOCR converter

SYNOPSIS


       djvu2hocr [option...] djvu-file

       djvu2hocr {--version | --help | -h}

DESCRIPTION

       djvu2hocr converts hidden text from a DjVu file to the hOCR[1] format.

OPTIONS

   Input selection options
       -p, --pages=page-range
           Specifies pages to convert. page-range is a comma-separated list of sub-ranges. Each sub-range is
           either a single page (e.g. 17) or a contiguous range of pages (e.g. 37-42). Pages are numbered from
           1.

           The default is to convert all pages.
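
           The page-range syntax described above can be illustrated with a short Python sketch. It is not
           taken from djvu2hocr itself: the function name parse_page_range and its error handling are
           assumptions made for illustration only.

```python
def parse_page_range(spec):
    """Expand a page-range string such as "17,37-42" into a list of page numbers."""
    pages = []
    for sub_range in spec.split(","):
        first, sep, last = sub_range.partition("-")
        start = int(first)
        # A sub-range without "-" is a single page.
        end = int(last) if sep else start
        # Pages are numbered from 1, and a range must not run backwards.
        if start < 1 or end < start:
            raise ValueError("invalid sub-range: {!r}".format(sub_range))
        pages.extend(range(start, end + 1))
    return pages


print(parse_page_range("17,37-42"))  # prints [17, 37, 38, 39, 40, 41, 42]
```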

   Text segmentation options
       --word-segmentation=simple
           Use the same word segmentation as found in the DjVu file.

           This is the default.

       --word-segmentation=uax29
           Use the Unicode Text Segmentation[2] algorithm to break lines into words, possibly fixing word
           segmentation found in the DjVu file.

   HTML output options
       --title=title
           Specifies the document title.

           The default is “DjVu hidden text layer”.

       --css=style
           Add the specified CSS style to the document.

           For example, --css='.ocrx_line { display: block; }' can be used to visually preserve line breaks.

   Other options
       --version
           Output version information and exit.

       -h, --help
           Display help and exit.
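
       As a rough illustration of how the options above combine, the following Python sketch assembles a
       djvu2hocr command line from the documented options only. The helper build_command and the file name
       book.djvu are hypothetical; the sketch prints the command rather than running it.

```python
import shlex


def build_command(djvu_file, pages=None, segmentation=None, title=None, css=None):
    """Build an argument list for djvu2hocr using only the options documented above."""
    argv = ["djvu2hocr"]
    if pages is not None:
        argv.append("--pages=" + pages)
    if segmentation is not None:
        # Documented values are "simple" (the default) and "uax29".
        argv.append("--word-segmentation=" + segmentation)
    if title is not None:
        argv.append("--title=" + title)
    if css is not None:
        argv.append("--css=" + css)
    argv.append(djvu_file)  # the input file comes last, as in the SYNOPSIS
    return argv


cmd = build_command(
    "book.djvu",  # hypothetical input file
    pages="1-3,7",
    segmentation="uax29",
    css=".ocrx_line { display: block; }",
)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

       Passing the resulting list to subprocess.run would avoid shell quoting altogether.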

PORTABILITY

       djvu2hocr uses a custom extension to hOCR to retain characters that cannot be directly represented in
       an HTML/XML document. For example, the control character BEL (^G, U+0007) is converted into the
       following HTML chunk: <span class="djvu_char" title="#x07"> </span>
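
       The BEL example can be reproduced with a minimal Python sketch. Only the BEL case is documented
       here, so the function name djvu_char_span and the general title format (the code point written as
       at least two lowercase hexadecimal digits) are assumptions, not a description of djvu2hocr's code.

```python
def djvu_char_span(ch):
    """Return an HTML chunk in the style of the documented BEL example."""
    # Assumption: the title is "#x" followed by the code point as at least
    # two lowercase hex digits; only title="#x07" is confirmed by the manual.
    title = "#x{:02x}".format(ord(ch))
    return '<span class="djvu_char" title="{}"> </span>'.format(title)


print(djvu_char_span("\a"))  # prints <span class="djvu_char" title="#x07"> </span>
```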

BUGS

       Please report bugs at: https://github.com/jwilk/ocrodjvu/issues

NOTES

        1. hOCR
           https://docs.google.com/View?docid=dfxcv4vc_67g844kf

        2. Unicode Text Segmentation
           http://unicode.org/reports/tr29/

djvu2hocr 0.10.2                                   2017-02-07                                       DJVU2HOCR(1)
