Provided by: djvulibre-bin_3.5.27.1-5ubuntu0.1_amd64

NAME

       djvutoxml, djvuxmlparser - DjVuLibre XML Tools.

SYNOPSIS

       djvutoxml [options] inputdjvufile [outputxmlfile]
       djvuxmlparser [ -o djvufile ] inputxmlfile

DESCRIPTION

       The  DjVuLibre  XML  Tools  provide  for  editing the metadata, hyperlinks and hidden text
       associated with DjVu files.  Unlike djvused(1) the DjVuLibre XML Tools  rely  on  the  XML
       technology and can take advantage of XML editors and verifiers.

DJVUTOXML

       Program djvutoxml creates an XML file outputxmlfile containing a reference to the original
       DjVu document inputdjvufile as well as tags describing the metadata, hyperlinks, and
       hidden text associated with the DjVu file.

       The following options are supported:

       --page pagenum
              Select a page in a multi-page document.  Without this option, djvutoxml outputs the
              XML corresponding to all pages of the document.

       --with-text
              Specifies that the HIDDENTEXT element for each page should be included in the
              output.  If specified without the --with-anno flag, then the --without-anno flag
              is implied.  If none of the --with-text, --without-text, --with-anno, or
              --without-anno flags is specified, then the --with-text and --with-anno flags are
              implied.

       --without-text
              Specifies that the HIDDENTEXT element for each page should not be included in the
              output.  If specified without the --without-anno flag, then the --with-anno flag
              is implied.

       --with-anno
              Specifies that the area MAP element for each page should be included in the
              output.  If specified without the --with-text flag, then the --without-text flag
              is implied.

       --without-anno
              Specifies that the area MAP element for each page should not be included in the
              output.  If specified without the --without-text flag, then the --with-text flag
              is implied.
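
       The implication rules above can be summarized in a small Python sketch.  The function
       name resolve_output is hypothetical and not part of DjVuLibre, and rejecting
       contradictory flag pairs is an assumption, since this page does not describe that case.

```python
def resolve_output(with_text=False, without_text=False,
                   with_anno=False, without_anno=False):
    """Return (emit_hiddentext, emit_map) following the rules listed above."""
    if (with_text and without_text) or (with_anno and without_anno):
        # Not covered by the manual page; rejecting it is an assumption.
        raise ValueError("contradictory flags")

    # None means "the user said nothing about this element".
    text = True if with_text else False if without_text else None
    anno = True if with_anno else False if without_anno else None

    if text is None and anno is None:
        return True, True          # no flags: both elements are written
    if text is None:
        text = not anno            # only an anno flag given: text is the opposite
    if anno is None:
        anno = not text            # only a text flag given: anno is the opposite
    return text, anno
```

       For example, resolve_output(with_text=True) yields (True, False): hidden text is
       written and the MAP element is omitted.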

DJVUXMLPARSER

       Files produced by djvutoxml can then be modified using either a text editor or an XML
       editor.  Program djvuxmlparser parses the XML file inputxmlfile in order to modify the
       metadata of the corresponding DjVu file.

       -o djvufile
              By default, the target DjVu file is the file referenced by the OBJECT element of
              the XML file.  This option overrides the filename specified in the OBJECT
              element.
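
       A round trip (export, edit, re-import) could be scripted roughly as follows.  This is a
       sketch that only builds the argument lists from the synopsis above.  The helper names and
       the file names in the usage note are hypothetical.

```python
def djvutoxml_argv(djvu_file, xml_file, page=None):
    """Build the argument list for: djvutoxml [--page N] input.djvu output.xml"""
    argv = ["djvutoxml"]
    if page is not None:
        argv += ["--page", str(page)]
    argv += [djvu_file, xml_file]
    return argv


def djvuxmlparser_argv(xml_file, target=None):
    """Build the argument list for: djvuxmlparser [-o target.djvu] input.xml"""
    argv = ["djvuxmlparser"]
    if target is not None:
        # -o overrides the DjVu file named in the OBJECT element.
        argv += ["-o", target]
    argv.append(xml_file)
    return argv
```

       Passing djvutoxml_argv("book.djvu", "book.xml") and then, after editing book.xml,
       djvuxmlparser_argv("book.xml", target="copy.djvu") to subprocess.run(..., check=True)
       would perform the round trip against copy.djvu instead of the file named in the XML.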

DJVUXML DOCUMENT TYPE DEFINITION

       The document type definition file (DTD)

         /usr/share/djvu/pubtext/DjVuXML-s.dtd

       defines the input and output of the DjVu XML tools.

       The DjVuXML-s DTD is a simplification of the HTML DTD:

         http://www.w3c.org/TR/1998/REC-html40-19980424/sgml/dtd.html

       with a few new attributes added specific to DjVu.  Each of the specified pages of a DjVu
       document is represented as an OBJECT element within the BODY element of the XML file.
       Each OBJECT element may contain multiple PARAM elements to specify attributes like page
       name, resolution, and gamma factor.  Each OBJECT element may also contain one HIDDENTEXT
       element to specify the hidden text (usually generated with an OCR engine) within the DjVu
       page.  In addition, each OBJECT element may reference a single area MAP element, which
       contains multiple AREA elements to represent all the hyperlink and highlight areas within
       the DjVu page.

   PARAM Elements
       Legal  PARAM  elements of a DjVu OBJECT include but are not limited to PAGE for specifying
       the page-name, GAMMA for specifying the gamma correction factor (normally  2.2),  and  DPI
       for specifying the page resolution.
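
       As an illustration, the PARAM values of one page can be collected into a dictionary
       with Python's standard library.  The fragment below is hand-written rather than real
       djvutoxml output, and the name/value attribute pair is assumed from the HTML 4 PARAM
       element on which the DTD is based.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; attribute names follow HTML 4 and are assumptions here.
SAMPLE = """\
<OBJECT data="file:example.djvu" type="image/x.djvu" usemap="p0001.djvu">
  <PARAM name="PAGE" value="p0001.djvu"/>
  <PARAM name="GAMMA" value="2.2"/>
  <PARAM name="DPI" value="300"/>
</OBJECT>
"""


def object_params(obj):
    """Map each PARAM name of an OBJECT element to its value (as strings)."""
    return {p.get("name"): p.get("value") for p in obj.findall("PARAM")}


params = object_params(ET.fromstring(SAMPLE))
```

       Values stay strings; a caller that needs the resolution as a number would convert
       params["DPI"] explicitly.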

   HIDDENTEXT Elements
       The HIDDENTEXT element consists of nested PAGECOLUMNS, REGION, PARAGRAPH, LINE, and WORD
       elements.  The most deeply nested element specified should give the bounding coordinates
       of the element in top-down orientation, and its body should contain the text.  Most DjVu
       documents use either LINE or WORD as the lowest level element, but any element is legal
       as the lowest level element.  A space is always added between WORD elements and a line
       feed is always added between LINE elements.  Since languages such as Japanese do not use
       spaces between words, it is quite common for Asian OCR engines to use one WORD element
       per character instead.
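
       The joining rule (a space between WORD elements, a line feed between LINE elements) can
       be sketched in Python as follows.  The fragment is hand-written, omits the intermediate
       nesting levels for brevity, and has not been validated against the DTD.  The coords
       attribute name is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; real files nest LINE inside the higher-level elements.
SAMPLE = """\
<HIDDENTEXT>
  <LINE>
    <WORD coords="10,40,60,20">Hello</WORD>
    <WORD coords="70,40,130,20">DjVu</WORD>
  </LINE>
  <LINE>
    <WORD coords="10,80,90,60">world</WORD>
  </LINE>
</HIDDENTEXT>
"""


def hidden_text(root):
    """Join WORD bodies with spaces and LINE elements with line feeds."""
    lines = []
    for line in root.iter("LINE"):
        words = [w.text.strip() for w in line.iter("WORD")
                 if w.text and w.text.strip()]
        if words:
            lines.append(" ".join(words))
        elif line.text and line.text.strip():
            # LINE itself is the lowest level element and carries the text.
            lines.append(line.text.strip())
    return "\n".join(lines)


text = hidden_text(ET.fromstring(SAMPLE))
```

       Because root.iter searches at any depth, the same function works whether or not the
       LINE elements are wrapped in the higher-level elements.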

   MAP Elements
       The body of a MAP element consists of AREA elements.  In addition to the attributes
       listed in

         http://www.w3.org/TR/1998/REC-html40-19980424/struct/objects.html#edef-AREA,

       the attributes bordertype, bordercolor, border, and highlight have been added to specify
       border type, border color, border width, and highlight color, respectively.  Legal values
       for each of these attributes are listed in the DjVuXML-s DTD.  In addition, the shape oval
       has been added to the legal list of shapes.  An oval uses a rectangular bounding box.
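
       Since an oval is described by its bounding box, its center and semi-axes follow
       directly from the coords attribute.  The sketch below assumes the HTML 4 rect ordering
       (left, top, right, bottom).  The fragment is hand-written, and its attribute values are
       placeholders that have not been checked against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; attribute values are placeholders, not DTD-verified.
SAMPLE = """\
<MAP name="p0001.djvu">
  <AREA shape="rect" coords="10,20,110,60" href="http://example.org/"
        bordercolor="#FF0000" border="1"/>
  <AREA shape="oval" coords="200,50,300,120" href="#2" highlight="#FFFF00"/>
</MAP>
"""


def bounding_box(area):
    """Parse coords as (left, top, right, bottom), assuming rect ordering."""
    left, top, right, bottom = (float(v) for v in area.get("coords").split(","))
    return left, top, right, bottom


def oval_axes(area):
    """Return ((center_x, center_y), (radius_x, radius_y)) for an oval AREA."""
    left, top, right, bottom = bounding_box(area)
    center = ((left + right) / 2, (top + bottom) / 2)
    radii = (abs(right - left) / 2, abs(bottom - top) / 2)
    return center, radii


areas = ET.fromstring(SAMPLE).findall("AREA")
ovals = [oval_axes(a) for a in areas if a.get("shape") == "oval"]
```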

BUGS

       Perhaps it would have been better to use CSS2 style sheets with standard HTML elements
       instead of defining the HIDDENTEXT element.

CREDITS

       The  DjVu  XML tools and DTD were written by Bill C. Riemers <docbill@sourceforge.net> and
       Fred Crary.

SEE ALSO

       djvu(1), djvused(1), and utf8(7).