Provided by: ocrodjvu_0.10.4-1_all


       djvu2hocr - DjVu to hOCR converter


       djvu2hocr [option...] djvu-file

       djvu2hocr {--version | --help | -h}


       djvu2hocr converts hidden text from a DjVu file to the hOCR[1] format.


   Input selection options
       -p, --pages=page-range
           Specifies pages to convert. page-range is a comma-separated list of sub-ranges. Each
           sub-range is either a single page (e.g. 17) or a contiguous range of pages
           (e.g. 37-42). Pages are numbered from 1.

           The default is to convert all pages.
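
           The page-range syntax described above can be illustrated with a short Python
           sketch. The function name parse_page_range is hypothetical, and the validation
           rules (rejecting page 0 and backwards ranges) are assumptions; this is not the
           parser djvu2hocr itself uses.

```python
def parse_page_range(spec):
    """Expand a page-range string such as "17,37-42" into a list of page numbers."""
    pages = []
    for sub_range in spec.split(","):
        sub_range = sub_range.strip()
        if "-" in sub_range:
            # Contiguous range of pages, e.g. "37-42".
            first_text, last_text = sub_range.split("-", 1)
            first, last = int(first_text), int(last_text)
        else:
            # Single page, e.g. "17".
            first = last = int(sub_range)
        # Assumption: pages are numbered from 1 and ranges must not run backwards.
        if first < 1 or last < first:
            raise ValueError("invalid sub-range: %r" % sub_range)
        pages.extend(range(first, last + 1))
    return pages


print(parse_page_range("17,37-42"))  # prints [17, 37, 38, 39, 40, 41, 42]
```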

   Text segmentation options
       --word-segmentation=simple
           Use the same word segmentation as found in the DjVu file.

           This is the default.

       --word-segmentation=uax29
           Use the Unicode Text Segmentation[2] algorithm to break lines into words, possibly
           fixing word segmentation found in the DjVu file.
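
           To give a rough idea of what re-segmenting a line into words means, the Python
           sketch below splits a line with a regular expression. This is a crude stand-in
           chosen for brevity, not the Unicode Text Segmentation algorithm and not what
           djvu2hocr runs; the names TOKEN and resegment are hypothetical.

```python
import re

# Crude approximation of word breaking: a run of word characters
# (optionally joined by apostrophes), or a single punctuation mark.
# Real Unicode Text Segmentation has many more rules than this.
TOKEN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def resegment(line):
    """Split a line of text into word-like tokens, dropping whitespace."""
    return TOKEN.findall(line)


# "Hello,world!" might be stored as one word in a DjVu file;
# re-segmentation separates the words from the punctuation.
print(resegment("Hello,world! It's fine."))
# prints ['Hello', ',', 'world', '!', "It's", 'fine', '.']
```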

   HTML output options
       --title=title
           Specifies the document title.

           The default is “DjVu hidden text layer”.

       --css=style
           Add the specified CSS style to the document.

           For example, --css='.ocrx_line { display: block; }' can be used to visually preserve
           line breaks.
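
           When djvu2hocr is driven from a script, the quoting of the CSS argument is easy
           to get wrong. The Python sketch below assembles an argument list using only the
           --pages and --css options; the helper name build_command and the file name
           book.djvu are hypothetical.

```python
import shlex


def build_command(djvu_file, pages=None, css=None):
    """Build an argument list for djvu2hocr (nothing is executed here)."""
    argv = ["djvu2hocr"]
    if pages is not None:
        argv.append("--pages=" + pages)
    if css is not None:
        # Passed as a single list element, so no shell quoting is needed.
        argv.append("--css=" + css)
    argv.append(djvu_file)
    return argv


argv = build_command(
    "book.djvu",  # hypothetical input file
    pages="1-3,7",
    css=".ocrx_line { display: block; }",
)
# Show a shell-safe rendering of the same command line.
print(" ".join(shlex.quote(arg) for arg in argv))
```

           The resulting list can be handed to subprocess.run(argv, capture_output=True);
           that the hOCR document then arrives on standard output is an assumption here.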

   Other options
       --version
           Output version information and exit.

       -h, --help
           Display help and exit.


       djvu2hocr uses a custom extension to hOCR to retain characters which cannot be directly
       represented in an HTML/XML document. For example, the control character BEL (^G, U+0007)
       is converted into the following HTML chunk:

           <span class="djvu_char" title="#x07"> </span>
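
       The encoding of unrepresentable characters can be sketched in Python as follows. The
       function name encode_text is hypothetical. Which characters are treated this way
       (here: C0 control characters other than tab, line feed and carriage return) and the
       two-digit lowercase hexadecimal format are assumptions generalized from the single
       BEL example; djvu2hocr's actual rules may differ.

```python
from html import escape


def encode_text(text):
    """Return text as HTML, wrapping unrepresentable control characters."""
    chunks = []
    for char in text:
        code = ord(char)
        if code < 0x20 and char not in "\t\n\r":
            # Assumption: the title carries the code point as "#x" plus hex digits,
            # and the span body is a single space, as in the BEL example.
            chunks.append('<span class="djvu_char" title="#x%02x"> </span>' % code)
        else:
            # Ordinary characters only need the usual &, <, > escaping.
            chunks.append(escape(char, quote=False))
    return "".join(chunks)


print(encode_text("ding\x07 & dong"))
# prints ding<span class="djvu_char" title="#x07"> </span> &amp; dong
```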


       Please report bugs at:


       djvu(1), hocr2djvused(1), ocrodjvu(1)


        1. hOCR

        2. Unicode Text Segmentation