Ubuntu Manpage: csepdjvu - DjVu encoder for separated data files.

Provided by: djvulibre-bin_3.5.27.1-5ubuntu0.1_amd64

NAME

       csepdjvu - DjVu encoder for separated data files.

SYNOPSIS

       csepdjvu  [options] [sepfiles]... outputdjvufile

DESCRIPTION

       This  program  creates  a  DjVuDocument  file  outputdjvufile  from  separated  data files
       sepfiles.  It can read separated data from the standard input when  given  a  single  dash
       instead  of  the  separated  data file names.  This feature is intended for pre-processing
       programs that push separated data into csepdjvu via a pipe.

       Each separated data file represents one or more page images.  When the  program  arguments
       specify  multiple  pages,  all  the  pages  are  encoded and saved as a bundled multi-page
       document.  When the program arguments specify a single page, the page is encoded and saved
       as a single page file.

OPTIONS

-d n Specify the resolution information encoded into the output file, expressed in dots
per inch. The resolution information encoded in DjVu files determines how the
decoder scales the image on a particular display. Meaningful resolutions range
from 25 to 6000. The default value is 300 dpi.

-q n,...,n

-q n+...+n
Specify the encoding quality of the IW44 encoded background layer. The option
argument contains several integers (one per chunk) separated by either commas or
pluses. This option is similar to option -slice of program c44. Please refer to
the c44(1) man page for additional details. The default quality specification is
-q 72,83,93,103.

This option does not apply to uniformly white backgrounds that were not specified by
the separated data but are called for by the DjVu specification. Such background
images always come at the lowest possible resolution and with a standard quality
setting that ensures the color uniformity.

-t Program csepdjvu interprets certain comments in the separated file to construct a
hidden text layer in the DjVu file. By default, this layer records the location of
each word for highlighting purposes. This option reduces the file size by recording
only the location of each line.

-v Display a brief message describing each page.

-vv Display extensive informational messages during encoding.
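
As an illustration of how the options above combine, here is a minimal Python sketch that assembles a csepdjvu command line. The function name and its parameters are hypothetical helpers, not part of djvulibre; only the flags -d, -q, -t, -v and -vv come from this page.

```python
def build_csepdjvu_command(sepfiles, output, dpi=None, quality=None,
                           lines_only=False, verbose=0):
    """Return an argument list for csepdjvu (hypothetical helper)."""
    cmd = ["csepdjvu"]
    if dpi is not None:
        # The page states that meaningful resolutions range from 25 to 6000.
        if not 25 <= dpi <= 6000:
            raise ValueError("dpi should be between 25 and 6000")
        cmd += ["-d", str(dpi)]
    if quality:
        # One integer per chunk, joined with commas (pluses also work).
        cmd += ["-q", ",".join(str(q) for q in quality)]
    if lines_only:
        cmd.append("-t")
    if verbose == 1:
        cmd.append("-v")
    elif verbose >= 2:
        cmd.append("-vv")
    # Separated data files come first, the output file comes last.
    cmd += list(sepfiles) + [output]
    return cmd
```

The resulting list can be handed to subprocess.run(cmd, check=True). Passing ["-"] as sepfiles asks csepdjvu to read the separated data from the standard input.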

SEPARATED DATA FILE FORMAT

Each separated data file contains a concatenation of one or more separated page images.
Each page is logically represented by a foreground image with a transparent color and by a
background image visible through the transparent pixels. The data for each separated page
image is the concatenation of the following data blocks:

* A foreground image encoded using either the "Color RLE format" or the "Bitonal RLE
format". These formats are described later in this section.

* An optional background image encoded as a "Portable Pixmap" (PPM). This well-known
format is summarized later in this section. The absence of a background image simply
indicates that a uniformly white background should be assumed.

* An arbitrary number of comment lines starting with character "#" and terminated by a
linefeed character. Comment lines whose first word starts with a capital letter have
special meanings documented later in this document.

The dimensions (width and height) of the background image must be obtained by rounding up
the quotient of the foreground image dimensions by an integer reduction factor ranging
from 1 to 12. Assume, for instance, that the width of the foreground is 2507 and the
reduction factor is 3. The width of the background image is then the integer quotient
(2507+2)/3, that is, 836.
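
The rounding rule can be sketched in Python as a ceiling division. The function name is hypothetical, and the foreground height of 3300 in the usage note is an arbitrary illustrative value.

```python
def background_size(fg_width, fg_height, reduction):
    """Background dimensions for a given foreground size (hypothetical helper)."""
    if not 1 <= reduction <= 12:
        raise ValueError("reduction factor must be between 1 and 12")
    # (n + r - 1) // r rounds the quotient n / r up to the next integer.
    bg_width = (fg_width + reduction - 1) // reduction
    bg_height = (fg_height + reduction - 1) // reduction
    return bg_width, bg_height
```

For a 2507 by 3300 foreground and a reduction factor of 3, background_size(2507, 3300, 3) returns (836, 1100).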

Color RLE format
The Color RLE format is a simple run-length encoding scheme for color images with a
limited number of distinct colors. The data always begin with a text header composed of
the two characters "R6", the number of columns, the number of rows, and the number of
color palette entries. All numbers are expressed in decimal ASCII. These four items are
separated by blank characters (space, tab, carriage return, or linefeed) or by comment
lines introduced by character "#". The last number is followed by exactly one character
which usually is a linefeed character.

The header is followed by the color palette containing three bytes per color entry. The
bytes represent the red, green, and blue components of the color.

The palette is followed by a collection of four-byte integers (most significant byte
first) representing runs of pixels with an identical color. The twelve upper bits of this
integer indicate the index of the run color in the palette. The twenty lower bits
of the integer indicate the run length. Color indices greater than 0xff0 are reserved.
Color index 0xfff is used for transparent runs. Each row is represented by a sequence of
runs whose lengths add up to the image width. Rows are encoded starting with the top row
and progressing toward the bottom row.
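
The layout described above can be illustrated with a small Python writer. This is a sketch under stated assumptions, not the reference implementation: the function name and the (index, length) input representation are hypothetical, and the run integers are assumed to be stored in big-endian byte order.

```python
import struct

TRANSPARENT = 0xFFF   # color index used for transparent runs
MAX_RUN = 0xFFFFF     # largest length that fits in the twenty lower bits

def encode_color_rle(width, height, palette, rows):
    """Serialize an image as Color RLE ("R6") bytes.

    palette: list of (red, green, blue) tuples, one byte per component
    rows:    list of rows, top row first; each row is a list of
             (color_index, run_length) pairs
    """
    if len(rows) != height:
        raise ValueError("number of rows does not match the height")
    # Text header: "R6", columns, rows, palette size, then one linefeed.
    out = bytearray(b"R6 %d %d %d\n" % (width, height, len(palette)))
    for red, green, blue in palette:
        out += bytes((red, green, blue))
    for row in rows:
        if sum(length for _, length in row) != width:
            raise ValueError("run lengths must add up to the image width")
        for index, length in row:
            if index != TRANSPARENT and not 0 <= index < len(palette):
                raise ValueError("color index outside the palette")
            if not 0 <= length <= MAX_RUN:
                raise ValueError("run length does not fit in 20 bits")
            # Twelve upper bits: palette index; twenty lower bits: length.
            out += struct.pack(">I", (index << 20) | length)
    return bytes(out)
```

A one-row image with three red pixels followed by one transparent pixel would be encoded with encode_color_rle(4, 1, [(255, 0, 0)], [[(0, 3), (TRANSPARENT, 1)]]).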

Bitonal RLE format
The Bitonal RLE format is a simple run-length encoding scheme for bitonal images. The
data always begin with a text header composed of the two characters "R4", the number of
columns, and the number of rows. All numbers are expressed in decimal ASCII. These three
items are separated by blank characters (space, tab, carriage return, or linefeed) or by
comment lines introduced by character "#". The last number is followed by exactly one
character which usually is a linefeed character.

The rest of the file encodes a sequence of numbers representing the lengths of alternating
runs of transparent and black pixels. Lines are encoded starting with the top line and
progressing toward the bottom line. Each line starts with a transparent run. The decoder knows
that a line is finished when the sum of the run lengths for that line is equal to the
number of columns in the image. Numbers in range 0 to 191 are represented by a single
byte in range 0x00 to 0xbf. Numbers in range 192 to 16383 are represented by a two byte
sequence: the first byte, in range 0xc0 to 0xff, encodes the six most significant bits of
the number, the second byte encodes the remaining eight bits of the number. This scheme
allows for runs of length zero, which are useful when a line starts with a black pixel,
and when a very long run (whose length exceeds 16383) must be split into smaller runs.
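
The run-length byte encoding can be sketched in Python as follows. The function names are hypothetical. Long runs are split by inserting a zero-length run of the opposite color, which is one way to apply the splitting rule described above.

```python
MAX_RUN = 16383  # largest run length that fits in the two-byte form

def encode_run(n):
    """Encode one run length as one or two bytes."""
    if not 0 <= n <= MAX_RUN:
        raise ValueError("run length must be between 0 and 16383")
    if n <= 191:
        return bytes((n,))                 # single byte, 0x00 to 0xbf
    # First byte 0xc0..0xff carries the six most significant bits,
    # second byte carries the remaining eight bits.
    return bytes((0xC0 | (n >> 8), n & 0xFF))

def encode_long_run(n):
    """Encode a run of any length, splitting it when it exceeds 16383."""
    out = bytearray()
    while n > MAX_RUN:
        out += encode_run(MAX_RUN)
        out += encode_run(0)               # zero-length run of the other color
        n -= MAX_RUN
    out += encode_run(n)
    return bytes(out)

def encode_line(runs):
    """Encode one line given alternating run lengths, transparent run first."""
    return b"".join(encode_long_run(n) for n in runs)
```

A line that starts with a black pixel is written with a leading zero-length run, for example encode_line([0, 5, 3]) for five black pixels followed by three transparent ones.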

Portable Pixmap (PPM) format
The Portable Pixmap format is a well-known format for representing color images. Check
the ppm(5) man page for complete information.

The data always begin with a text header composed of the two characters "P6", the number
of columns, the number of rows, and the maximal value of a color component (usually 255).
All numbers are expressed in decimal ASCII. These four items are separated by blank
characters (space, tab, carriage return, or linefeed) or by comment lines introduced by
character "#". The last number is followed by exactly one character which usually is a
linefeed character.

The rest of the file encodes all the pixels. Each pixel is represented by three bytes
representing the red, green, and blue components of the pixel. Pixels are ordered left
to right, top to bottom.
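
A background image in this format can be produced with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and it assumes a maximal component value of at most 255 so that each component fits in one byte.

```python
def encode_ppm(width, height, pixels, maxval=255):
    """Serialize pixels as raw PPM ("P6") bytes.

    pixels: list of (red, green, blue) tuples ordered left to right,
            top to bottom
    """
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match the dimensions")
    if not 0 < maxval <= 255:
        raise ValueError("this sketch only handles one byte per component")
    # Text header: "P6", columns, rows, maximal value, then one linefeed.
    out = bytearray(b"P6 %d %d %d\n" % (width, height, maxval))
    for red, green, blue in pixels:
        out += bytes((red, green, blue))
    return bytes(out)
```

For example, encode_ppm(2, 1, [(255, 255, 255), (0, 0, 0)]) yields the header "P6 2 1 255" followed by six bytes of pixel data.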

Comments in separated files
Each page is followed by an arbitrary number of comment lines starting with character "#"
and terminated by a linefeed character. Comment lines whose first word starts with a
capital letter have special meanings. The following constructs are currently defined:

* # T px:py dx:dy wxh+x+y (string)
This construct indicates that the piece of text string must be associated with an area
of size wxh at position x,y relative to the lower left corner of the page. The string
is UTF-8 encoded. Special characters can be escaped as in PostScript using the
backslash character. Integers px and py represent the position of the current point
on the text baseline before the text was drawn. The drawing operation then moves the
current point by dx and dy pixels. When such comments are present, csepdjvu produces
a hidden text layer for the corresponding pages.

* # L wxh+x+y (url)
This construct indicates that a hyperlink to URL url should be associated with an area of
size wxh at position x,y. When such comments are present, csepdjvu produces pages with
an annotation chunk containing the specified hyperlinks.

* # B count (string) (#pageno)
This construct provides outline information for the document. An outline entry
entitled string is associated with page pageno. Integer count indicates how many of
the following outline entries must be attached to the current entry as subentries.
When such comments are present in the first page, csepdjvu produces a navigation chunk
with the specified outline.

* # P (string)
Provides title string for the current page.
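
The comment constructs above can be generated with small Python helpers. This is a sketch: all function names are hypothetical, and applying the PostScript-style escaping to URLs and titles as well as to text strings is an assumption, since this page only states it for the T construct.

```python
def ps_escape(text):
    """Backslash-escape backslashes and parentheses, as in PostScript strings."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def area(w, h, x, y):
    """Format an area as wxh+x+y."""
    return f"{w}x{h}+{x}+{y}"

def text_comment(px, py, dx, dy, w, h, x, y, string):
    """# T px:py dx:dy wxh+x+y (string)"""
    return f"# T {px}:{py} {dx}:{dy} {area(w, h, x, y)} ({ps_escape(string)})\n"

def link_comment(w, h, x, y, url):
    """# L wxh+x+y (url)"""
    return f"# L {area(w, h, x, y)} ({ps_escape(url)})\n"

def outline_comment(count, title, pageno):
    """# B count (string) (#pageno)"""
    return f"# B {count} ({ps_escape(title)}) (#{pageno})\n"

def page_title_comment(title):
    """# P (string)"""
    return f"# P ({ps_escape(title)})\n"
```

Each helper returns one linefeed-terminated line. Encode the result as UTF-8 before appending it after the image data of the page.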

CREDITS