Provided by: djvulibre-bin_3.5.27.1-5ubuntu0.1_amd64 bug

NAME

       DjVu - DjVu and DjVuLibre.

INTRODUCTION

       Although  the  Internet  has  given  us  a  worldwide infrastructure on which to build the
       universal library, much of the world knowledge, history, and literature is  still  trapped
       on paper in the basements of the world's traditional libraries. Many libraries and content
       owners are in the process of  digitizing  their  collections.   While  many  such  efforts
       involve  the  painstaking process of converting paper documents to computer-friendly form,
       such as SGML based formats, the  high  cost  of  such  conversions  limits  their  extent.
       Scanning  documents,  and  distributing  the  resulting  images electronically is not only
       considerably cheaper, but also more faithful to the original document because it preserves
       its visual aspect.

       Despite  the  quickly  improving speed of network connections and computers, the number of
       scanned document images accessible on the Web today is relatively small. There are several
       reasons for this.

       The  first reason is the relatively high cost of scanning anything else but unbound sheets
       in black and white. This problem is slowly going away with the appearance of fast and low-
       cost color scanners with sheet feeders.

       The  second  reason  is that long-established image compression standards and file formats
       have proved inadequate for distributing scanned documents at high resolution, particularly
       color documents.  Not only are the file sizes and download times impractical, the decoding
       and rendering times are also prohibitive.  A typical magazine page scanned in color at 100
       dpi  in  JPEG  would  typically  occupy  100  KB  to 200 KB , but the text would be hardly
       readable: insufficient for screen viewing and totally unacceptable for printing. The  same
       page  at 300 dpi would have sufficient quality for viewing and printing, but the file size
       would be 300 KB to 1000 KB at best, which is impractical for remote access. Another  major
       problem  is that a fully decoded 300 dpi color images of a letter-size page occupies 24 MB
       of memory and easily causes disk swapping.

       The third reason is that digital documents are more than just a collection  of  individual
       page  images.  Pages in a scanned documents have a natural serial order. Special provision
       must be made to ensure that flipping pages  be  instantaneous  and  effortless  so  as  to
       maintain a good user experience. Even more important, most existing document formats force
       users to download the entire document first before displaying  a  chosen  page.   However,
       users  often  want  to  jump  to  individual pages of the document without waiting for the
       entire document to download.  Efficient browsing requires efficient  random  page  access,
       fast  sequential  page  flipping,  and  quick  rendering.  This  can  be  achieved  with a
       combination of advanced compression, pre-fetching, pre-decoding, caching, and  progressive
       rendering.  DjVu decomposes each page into multiple components (text, backgrounds, images,
       libraries of common shapes...)  that may be shared by  several  pages  and  downloaded  on
       demand.   All  these  requirements  call for a very sophisticated but parsimonious control
       mechanism  to  handle  on-demand  downloading,  pre-fetching,   decoding,   caching,   and
       progressive  rendering  of  the  page images.  What is being considered here is not just a
       document image compression technique, but a whole platform for document delivery.

       DjVu is an image compression technique, a document format, and  a  software  platform  for
       delivering documents images over the Internet that fulfills the above requirements.

DJVU IMAGE COMPRESSION

       The DjVu image compression is based on three technologies:

   DjVuPhoto
       DjVuPhoto,  also  known  as  IW44,  is  a  wavelet-based continuous-tone image compression
       technique with progressive decoding/rendering.  It is best used for encoding  photographic
       images in colors or in shades of gray.  Images are typically half the size as JPEG for the
       same distortion.

   DjVuBitonal
       DjVuBitonal, also known as JB2, is a bitonal image compression  that  takes  advantage  of
       repetitions  of  nearly  identical  shapes on the page (such as characters) to efficiently
       compress text images.  It is best used to compress black  and  white  images  representing
       text and simple drawings.  A typical 300 dpi page in DjVuBitonal occupies 5 to 25 KB (3 to
       8 times better than TIFF-G4 or PDF ).

   DjVuDocument
       DjVuDocument is a compression technique specifically designed for color digital  documents
       images  containing  both  pictures  and  text, such as a page of a magazine.  DjVuDocument
       represents images into separately compressed layers.   The  foreground  layer  is  usually
       compressed  with DjVu Bitonal and contains the text and drawings.  The background layer is
       usually compressed with DjVuPhoto and contains the background texture and the pictures  at
       lower resolution.

DJVU DOCUMENT DELIVERY PLATFORM

       The  DjVu  technology  is designed from the ground up to support the efficient delivery of
       digital documents over the Internet.  It provides various ways  to  deal  with  multi-page
       documents,  and various ways to enrich the content with hyper-links, meta-data, searchable
       text, etc.

   MIME types
       The DjVu format has an official MIME  type  of  image/vnd.djvu,  which  is  the  preferred
       content-type  to  be  given  by  http  servers for DjVu files.  Unofficial mime types used
       historically are image/x.djvu and image/x-djvu, which may still be encountered.   Ideally,
       clients should be configured to handle all three.  (For web server configuration help, see
       http://www.djvuzone.org/support/tutorial/chapter-authoring1.html.)

   Bundled multi-page documents
       Bundled multi-page DjVu document uses a single file  to  represent  the  entire  document.
       This  single  file  contains all the pages as well as ancillary information (e.g. the page
       directory, data shared by several pages, thumbnails, etc.).  Using a single file format is
       very convenient for storing documents or for sending email attachments.

       When you type the URL of a multi-page document, the DjVu browser plugin starts downloading
       the whole file, but displays the  first  page  as  soon  as  it  is  available.   You  can
       immediately  navigate  to  other  pages  using the DjVu toolbar.  Suppose however that the
       document is stored on a remote web server.  You can easily access the first page  and  see
       that this is not the document you wanted.  Although you will never display the other pages
       the browser is transferring data for these pages and is  wasting  the  bandwidth  of  your
       server  (and  the  bandwidth  of the Internet too).  You could also see the summary of the
       document on the first page and jump to page 100.  But page 100 cannot be  displayed  until
       data  for  pages  1 to 99 has been received.  You may have to wait for the transmission of
       unnecessary page data.  This second problem (the unnecessary wait) can be solved using the
       ``byte serving'' options of the HTTP/1.1 protocol.  This option has to be supported by the
       web server, the proxies, the caches and the browser.  Byte serving however does not  solve
       the first problem (the waste of bandwidth).

   Indirect multi-page documents
       Indirect  multi-page  DjVu  documents  solve  both  problems.  An indirect multi-page DjVu
       document is composed of several files.  The main file is named the index  file.   You  can
       browse  a document using the URL of the index file, just like you do with a bundled multi-
       page document.  The index file however is very small.  It  simply  contains  the  document
       directory  and  the  URLs of secondary files containing the page data.  When you browse an
       indirect multi-page document, the browser  only  accesses  data  for  the  pages  you  are
       viewing.   This can be done at a reasonable speed because the browser maintains a cache of
       pages and sometimes pre-fetches a few pages ahead of the current page.   This  model  uses
       the  web  serving  bandwidth much more effectively.  It also eliminates unnecessary delays
       when jumping ahead to pages located anywhere in a long document.

   Annotations
       Every DjVu image optionally includes so-called annotation chunks.  The annotation chunk is
       often  used  to  define  hyper-links  to  other  document pages or to arbitrary web pages.
       Annotation chunks can also be used for other purposes such as setting the initial  viewing
       mode  of a page, defining highlighted zones, or storing arbitrary meta-data about the page
       or the document.

   Hidden text
       Every DjVu image optionally  includes  a  hidden  text  layer  that  associated  graphical
       features  with  the  corresponding  text.   The  hidden text layer is usually generated by
       running an Optical Character Recognition software.  This textual information provides  for
       indexing DjVu documents and copying/pasting text from DjVu page images.

   Thumbnails
       DjVu documents sometimes contain pre-computed page thumbnails.

   Outline
       DjVu  documents  sometimes  contain  a  navigation chunk containing an outline, that is, a
       hierarchical table of contents with pointers to the corresponding document pages.

DJVUZONE AND DJVULIBRE

       The DjVu technology was initially created by a few researchers in AT&T Labs  between  1995
       and  1999.   Lizardtech,  Inc.  (  http://www.lizardtech.com  ) then obtained a commercial
       license from AT&T and continued the development.  They have now a variety of solutions for
       producing and distributing documents using the DjVu technology.

       The  DjVuZone  web  site  (  http://www.djvuzone.org  )  is  managed  by the few AT&T Labs
       researchers who created the DjVu technology in the  first  place.   We  promote  the  DjVu
       technology by providing an independent source of information about DjVu.

       Understanding  how  little  room  there  is  for a proprietary document format, Lizardtech
       released the DjVu Reference Library under the GNU Public License in December  2000.   This
       library  entirely  defines  the  compression  format and the elementary codecs.  Six month
       later, Lizardtech released an updated DjVu Reference Library as well as the source code of
       the Unix viewer.

       These  two  releases  form  the  basis of our initial DjVuLibre software.  We modified the
       build system to comply with the expectations of the open source community.   Various  bugs
       and  portability  issues  have  been  fixed.   We also tried to make it simpler to use and
       install, while preserving the essential structure of the Lizardtech releases.

       The DjVuLibre software contains the following components:

       bzz(1) A general purpose compression  command  line  program.   Many  internal  DjVu  data
              structures are compressed using this technique.

       c44(1) A DjVuPhoto command line encoder. This state-of-the-art wavelet compressor produces
              DjVuPhoto images from PPM or JPEG images.

       cjb2(1)
              A DjVuBitonal command line encoder. This soft-pattern-matching compressor  produces
              DjVuBitonal  images  from  PBM  images.   It  can  encode  images  without loss, or
              introduce small changes in order to improve the compression  ratio.   The  lossless
              encoding mode is competitive with that of the Lizardtech commercial encoders.

       cpaldjvu(1)
              A  DjVuDocument  command  line encoder for images with few colors.  This encoder is
              well suited to compressing images with a small  number  of  distinct  colors  (e.g.
              screen-shots).   The  dominant color is encoded by the background layer.  The other
              colors are encoded by the foreground layer.

       csepdjvu(1)
              A DjVuDocument command line encoder for separated images.   This  encoder  takes  a
              file  containing  pre-segmented  foreground  and  background  images and produces a
              DjVuDocument image.

       ddjvu(1)
              A command line decoder  for  DjVu  images.   This  program  produces  a  PNM  image
              representing any segment of any page of a DjVu document at any resolution.

       djview(1)
              A  stand-alone  viewer  for  DjVu  images.  This sophisticated viewer displays DjVu
              documents.  It implements document navigation as well as fast zooming and panning.

       nsdejavu(1)
              A web browser plugin for viewing DjVu images.  This small plugin allows for viewing
              DjVu  documents from web browsers.  It internally uses djview to perform the actual
              work.

       djvups(1)
              A command line tool for converting DjVu documents into PostScript .

       djvm(1)
              A command line tool for  manipulating  bundled  multi-page  DjVu  documents.   This
              program is often used to collect individual pages and produce a bundled document.

       djvmcvt(1)
              A  command  line  tool  for  converting bundled documents to indirect documents and
              conversely.

       djvused(1)
              A powerful command line tool for manipulating  multi-page  documents,  creating  or
              editing  annotation  chunks,  creating or editing hidden text layers, pre-computing
              thumbnail images, and more...

       djvutxt(1)
              A command line tool to extract the hidden text from DjVu documents.

       djvudump(1)
              A command line tool  for  inspecting  DjVu  files  and  displaying  their  internal
              structure.

       djvuextract(1)
              A command line tool for dis-assembling DjVu image files.

       djvumake(1)
              A command line tool for assembling DjVu image files.

       djvuserve(1)
              A CGI program for generating indirect multi-page DjVu documents on the fly.

       djvutoxml(1), djvuxmlparser(1)
              Command line tools to edit DjVu metadata as XML files.

DJVU ENCODERS AND ANY2DJVU

       DjVuLibre  comes  with  a variety of specialized encoders, c44(1) for photographic images,
       cjb2(1) for bitonal images, and cpaldjvu(1) for images with few distinct colors.  Although
       these  encoders perform well in their specialized domain, they cannot handle complex tasks
       involving segmentation and multipage encoding.

       The Lizardtech commercial products (see http://www.lizardtech.com/solutions/document)  can
       perform these complex encoding tasks

       Another  solution is provided by the compression server at (http://any2djvu.djvuzone.org).
       This machine uses pre-lizardtech prototype encoders from AT&T Labs and performs almost  as
       well  as  the  commercial  Lizardtech encoders.  Please note that the Any2DjVu compression
       server comes with no guarantee, that nothing is done to ensure that  your  documents  will
       remain confidential, and that there is only one computer working for the whole planet.

CREDITS

       Numerous  people  have  contributed  to  the  DjVu source code during the last five years.
       Please submit a sourceforge bug report to update the following list.

          Yoshua Bengio, Léon Bottou, Chakradhar Chandaluri, Regis M. Chaplin, Ming  Chen,  Parag
          Deshmukh,  Royce  Edwards,  Andrew  Erofeev,  Praveen  Guduru, Patrick Haffner, Paul G.
          Howard, Orlando Keise, Yann Le Cun, Artem  Mikheev,  Florin  Nicsa,  Joseph  M.  Orost,
          Steven  Pigeon,  Bill  Riemers,  Patrice  Simard,  Jeffery  Triggs, Luc Vincent, Pascal
          Vincent.