lunar (1) catdvi.1.gz

Provided by: catdvi_0.14-14_amd64 bug

NAME

       catdvi - a DVI to plain text converter

SYNOPSIS

       catdvi    [-d debuglevel,    --debug=debuglevel]   [-e outenc,   --output-encoding=outenc]
       [-p pagespec, --first-page=pagespec] [-l pagespec, --last-page=pagespec] [-N, --list-page-
       numbers]   [-s,   --sequential]   [-U,  --show-unknown-glyphs]  [-h,  --help]  [--version]
       [--copyright] [dvi-file]

DESCRIPTION

       This manual page documents catdvi version 0.14

       catdvi reads the DVI (typesetter DeVice Independent) file dvi-file and dumps a plain  text
       approximation of the document it describes to stdout.  If the argument dvi-file is omitted
       or a dash (`-'), catdvi  will  read  from  stdin.   Several  output  encodings  (different
       character sets of the plain text output) are supported, most notably UTF-8.

       The  current  version  of  catdvi  is  a work in progress; it may not be robust enough for
       production use, but already works  fine  with  linear  english  text.   Many  mathematical
       symbols  (e.g.  the uppercase greek letters) and moderately complex formulae also come out
       right.

       The program needs to read the TFM (Tex Font Metric) files corresponding to the fonts  used
       in  the DVI file.  These are searched (and, if necessary and possible, created on the fly)
       through the Kpathsea library.

       In order to correctly translate a DVI file to text, the input encoding of the  fonts  used
       in  it (i.e. a meaning-preserving mapping from font code points to Unicode) must be known.
       There are a lot of different font encodings  in  use.  At  the  time  of  writing,  catdvi
       understands the following input encodings:

       `TEX TEXT'
              Knuth's original font encoding, also known as OT1.

       `TEX TEXT WITHOUT F-LIGATURES'
              A variant of the above.

       `EXTENDED TEX FONT ENCODING - LATIN'
              The Cork encoding, also known as T1.

       `TEX MATH ITALIC'
              The encoding of Knuth's math italic fonts, also known as OML.

       `TEX MATH SYMBOLS'
              The encoding of Knuth's math symbol fonts, also known as OMS.

       `TEX MATH EXTENSION' (most of it)
              The  encoding of Knuth's math extension fonts (big operators, brackets, etc.), also
              known as OMX.

       `TEX TYPEWRITER TEXT'
              The encoding of Knuth's typewriter type fonts.

       `LATEX SYMBOLS'
              The encoding of the lasy fonts.

       Henrik Theilings European currency symbol (`eurosym') font.

       `TEX TEXT COMPANION SYMBOLS 1---TS1' (almost everything)
              The encoding of the text companion fonts.

       Martin Vogels symbol (`MarVoSym') font.
              Both the 1998 and the 2000 version are supported as far as possible --  about  half
              of the symbols are not representable in Unicode.

       `BLACKBOARD'
              The encoding of the blackboard bold math (`bbm') fonts.

       All AMS fonts except the Cyrillic ones.
              This  includes  the  AMS  math  symbols  group  A and group B, Euler fraktur, Euler
              cursive, Euler script and Euler compatible extension fonts.

       It is impossible to do perfect translation from unmarked-up DVI to plain text,  since  the
       former  does  only  describe  the  layout  of a page, and a translator such as this should
       really know where words and paragraphs end, and more importantly, which glyphs  should  be
       aligned vertically and which shouldn't.  The current alignment algorithm tries to preserve
       the relative horizontal positions of word beginnings; this works well in most cases.  Word
       breaks  are  detected  using simple heuristics; paragraphs are not detected at all (and no
       paragraph fill is attempted).

       The price of alignment is that the output will likely be more than 80 columns  wide,  even
       though  catdvi tries very hard not to use more columns than strictly necessary.  Output is
       usually less than 120 columns, almost always less than 132 columns wide. It may be a  good
       idea to switch your terminal to one of these modes if possible.

OPTIONS

       The program follows the usual GNU command line syntax, with long options starting with two
       dashes.

       -d debuglevel, --debug=debuglevel
              Set the debug output level to debuglevel (default is 10).  Large values will result
              in  lots  of  debug  output,  0  in  none  at  all.  The maximal debug output level
              currently used is 150.

       -e outenc, --output-encoding=outenc
              Specify the encoding of the output character set.  outenc can be one of the numbers
              or  names  from the table below.  Names are case insensitive.  The following output
              encodings should be available:

              0: UTF-8
              1: US-ASCII
              2: ISO-8859-1
              3: ISO-8859-15

              The command catdvi --help (see below) will give  a  more  up-to-date  list  of  all
              compiled-in output encodings. The default encoding is 1.

       -p pagespec, --first-page=pagespec
              Do  not  output  pages  before  page  pagespec.   Pages  can  be specified in three
              different ways; the first two are exactly the same as for dvips(1).

              A (possibly negative) number num specifies a TeX page number, which  is  stored  as
              the so-called count0 value in the DVI file for every page.  Plain TeX uses negative
              page numbers for roman-numbered frontmatter (title page, preface, TOC, etc.) so the
              count0 values compare as
                     -1 < -2 < -3 < ... < 1 < 2 < 3 < ...
              There  may  be  several pages with the same count0 value in a single DVI file. This
              usually happens in documents with a per-chapter page numbering scheme.

              A number prefixed by an equals sign (`=num') specifies a physical  page,  i.e.  the
              num-th page appearing in the DVI file. Numbering starts with 1.  Note that with the
              long form of the option you actually need two equals signs, one as part of the long
              option and one as part of the page specification. Example:
                     catdvi --first-page==5 foo.dvi

              The  third  form  of  a  page  specification,  two  numbers  separated  by  a colon
              (`num1:num2'),  is  useful  for  documents  with  separately-numbered  parts,  e.g.
              chapters.   It  refers  to  the  page  with  count0 value equal to num2 that catdvi
              believes to be in part num1.  Since those part numbers are not stored  in  the  DVI
              file,  the  program  has to guess them: an internal chapter counter is increased by
              one every time the count0 value of the  current  page  is  not  greater  (in  above
              ordering)  than  that of the previous page.  The counter is initialized to 1 if the
              first page has  negative  count0  value  and  to  0  otherwise.  (A  document  with
              separately  numbered  parts  will  probably have separately numbered frontmatter as
              well, and then this rule keeps the  internal  counter  equal  to  real  world  part
              numbers.)

       -l pagespec, --last-page=pagespec
              Do  not  output  pages after page pagespec.  Pages are specified exactly as for the
              --first-page option above.

       -N, --list-page-numbers
              Instead of the contents of pages, output their physical page  count,  count0  value
              and chapter count (see the --first-page option above for a definition of these).

       -s, --sequential
              Do not attempt to reproduce the page layout; output glyphs in the order they appear
              in the DVI file. This may be useful with e.g. multi-column page layouts.

       -U, --show-unknown-glyphs
              Show the Unicode number of unknown glyphs instead of `?'.

       -h, --help
              Show usage information and a list of available output encodings, then exit.

       --version
              Show version information and exit.

       --copyright
              Show copyright information and exit.

ENVIRONMENT

       The usual environment variables TFMFONTS, TEXFONTS, etc.  for  Kpathsea  font  search  and
       creation apply.  Refer to the Kpathsea documentation for details.

SEE ALSO

       xdvi(1), dvips(1), tex(1), mktextfm(1), the Kpathsea texinfo documentation, utf-8(7).

BUGS

       These things do not work (yet):

       •      No rules are converted.

       •      Extensible  recipes (very large brackets, braces, etc. built out of several smaller
              pieces) are not properly handled.

       •      Complicated  math  formulae  are  sometimes  misaligned  (mostly  due  to  lack  of
              appropriate word break heuristics).

       •      Some fonts and font encodings are not recognised yet.

       •      Most  mathematical symbols have no representation in the available output character
              sets except Unicode, and hence show up as  `?'  unless  UTF-8  output  encoding  is
              selected. A textual transcription would be desirable.

       Watch out for these:

       •      If  there  is  a space where it does not belong or if there is no space where there
              should be one, report this as a bug (send the DVI file to  the  catdvi  maintainer,
              stating where in the file the bug is seen).

AUTHORS

       catdvi was written by Antti-Juhani Kaijanaho <gaia@iki.fi>, based on a skeletal version by
       J.H.M. Dassen  (Ray).    Bjoern   Brill   <brill@fs.math.uni-frankfurt.de>   did   further
       improvements and currently maintains the program.

       The  manual  page  was  compiled  by Bjoern Brill, using material written by the first two
       program authors.

                                         8 November 2002                                CATDVI(1)