Provided by: doclifter_2.11-1_all

NAME

       doclifter - translate troff requests into DocBook

SYNOPSIS

       doclifter [-e encoding] [-h hintfile] [-q] [-x] [-v] [-w] [-V] [-D token=type] [-I path]
                 [-I path] file...

DESCRIPTION

       doclifter translates documents written in troff macros to DocBook. Structural subsets of
       the requests in man(7), mdoc(7), ms(7), me(7), mm(7), and troff(1) are supported.

       The translation brings over all the structure of the original document at section,
       subsection, and paragraph level. Command and C function synopses are translated into
       DocBook markup, not just a verbatim display. Tables (TBL markup) are translated into
       DocBook table markup. PIC diagrams are translated into SVG. Troff-level information that
       might have structural implications is preserved in XML comments.

       Where possible, font-change macros are translated into structural markup.  doclifter
       recognizes stereotyped patterns of markup and content (such as the use of italics in a
       FILES section to mark filenames) and lifts them. A means to edit, add, and save semantic
       hints about highlighting is supported.

       Some cliches are recognized and lifted to structural markup even without highlighting.
       Patterns recognized include such things as URLs, email addresses, man page references, and
       C program listings.

       The .in and .ti requests are passed through with complaints. They indicate
       presentation-level markup that doclifter cannot translate into structure; the output will
       require hand-fixing.

       The .ta request is passed through with a complaint unless the immediately following text
       lines contain a tab, in which case the following span of lines containing tabs is lifted
       to a table.
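
       As a rough illustration of the rule above, the span of tab-bearing lines that follows a
       .ta request could be located as in the sketch below. The function name and the list-of-
       lines input are assumptions made for this example; this is not doclifter's own code.

```python
def tab_span_after_ta(lines, ta_index):
    """Return the half-open (start, end) range of tab-bearing lines after a .ta request.

    `lines` is the document as a list of strings and `ta_index` is the
    position of the .ta request (both hypothetical inputs for this sketch).
    Returns None when the line right after the request contains no tab.
    """
    start = ta_index + 1
    end = start
    # Extend the span while consecutive lines still contain a tab character.
    while end < len(lines) and "\t" in lines[end]:
        end += 1
    return (start, end) if end > start else None
```

       A None result corresponds to the case where the request would be passed through with a
       complaint.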

       Under some circumstances, doclifter can even lift formatted manual pages and the text
       output produced by lynx(1) from HTML. If it finds no macros in the input, but does find a
       NAME section header, it tries to interpret the plain text as a manual page (skipping
       boilerplate headers and footers generated by lynx(1)). Translations produced in this way
       will be prone to miss structural features, but this fallback is good enough for simple man
       pages.
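
       The fallback test described above (no macros, but a NAME header) might be approximated
       as follows. This is a minimal sketch: the regular expression used to spot macro lines and
       the exact-match test for the NAME header are assumptions, and the real detection logic
       may differ.

```python
import re

# Assumption: a macro or request line starts with '.' or "'" followed by a letter.
_REQUEST_LINE = re.compile(r"^[.'][A-Za-z]")


def looks_like_formatted_manpage(text):
    """Guess whether plain text is a formatted manual page (hypothetical helper)."""
    lines = text.splitlines()
    has_macros = any(_REQUEST_LINE.match(line) for line in lines)
    has_name_header = any(line.strip() == "NAME" for line in lines)
    return (not has_macros) and has_name_header
```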

       doclifter does not do a perfect job, merely a surprisingly good one. Final polish should
       be applied by a human being capable of recognizing patterns too subtle for a computer. But
       doclifter will almost always produce translations that are good enough to be usable before
       hand-hacking.

       See the Troubleshooting section for discussion of how to solve document conversion
       problems.

OPTIONS

       If called without arguments doclifter acts as a filter, translating troff source input on
       standard input to DocBook markup on standard output. If called with arguments, each
       argument file is translated separately (but hints are retained, see below); the suffix
       .xml is given to the translated output.

       -h
           Name a file to which information on semantic hints gathered during analysis should be
           written.

       -D
           The -D option allows you to post a hint. This may be useful, for example, if doclifter is
           mis-parsing a synopsis because it doesn't recognize a token as a command. This hint is
           merged after hints in the input source have been read.

       -I
           The -I option adds its argument to the include path used when doclifter searches for
           inclusions. The include path is initially just the current directory.

       -e
           The -e option allows you to set the encoding field to be emitted in the output XML. It
           defaults to ISO-8859-1 (Latin-1).

       -q
           Normally, requests that doclifter could not interpret (usually because they're
           presentation-level) are passed through to XML comments in the output. The -q option
           suppresses this. It also suppresses listing of macros. Messages about requests that
           are unrecognized or cannot be translated go to standard error whatever the state of
           this option. This option is intended to reduce clutter when you believe you have a
           clean lift of a document and want to lose the troff legacy.

       -x
           The -x option requests that doclifter generate DocBook version 5 compatible XML
           content, rather than its default DocBook version 4.4 output. Inclusions and entities
           may not be handled correctly with this switch enabled.

       -v
           The -v option makes doclifter noisier about what it's doing. This is mainly useful for
           debugging.

       -w
           Enable strict portability checking. Multiple instances of -w increase the strictness.
           See the section called “PORTABILITY CHECKING”.

       -V
           With this option, the program emits a version message and exits.
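
       When driving doclifter from a script, the options above can be assembled into an argument
       vector. The helper below is only a sketch: its name and keyword parameters are invented
       for illustration, it uses just the flags documented in this section, and it builds the
       command line without running it.

```python
def build_doclifter_argv(files=(), encoding=None, hintfile=None, hints=(),
                         include_paths=(), quiet=False, docbook5=False,
                         verbose=False, strictness=0):
    """Build an argv list for doclifter (hypothetical helper, nothing is executed)."""
    argv = ["doclifter"]
    if encoding:
        argv += ["-e", encoding]          # encoding field for the output XML
    if hintfile:
        argv += ["-h", hintfile]          # where gathered hints are written
    if quiet:
        argv.append("-q")                 # drop passthrough comments
    if docbook5:
        argv.append("-x")                 # DocBook 5 instead of 4.4
    if verbose:
        argv.append("-v")
    argv += ["-w"] * strictness           # repeated -w raises strictness
    for token, kind in hints:
        argv += ["-D", "%s=%s" % (token, kind)]
    for path in include_paths:
        argv += ["-I", path]
    argv += list(files)
    return argv
```

       The resulting list can be handed to subprocess.run(); with no files, doclifter acts as a
       filter on standard input.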

TRANSLATION RULES

       Overall, you can expect that font changes will be turned into Emphasis macros with a Remap
       attribute taken from the troff font name. The basic font names are R, I, B, U, CW, and SM.

       Troff and macro-package special character escapes are mapped into ISO character entities.

       When doclifter encounters a .so directive, it searches for the file. If it can get read
       access to the file, and open it, and the file consists entirely of command lines and
       comments, then it is included. If any of these conditions fails, an entity reference for
       it is generated.
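
       The inclusion test for .so can be pictured with the sketch below. It assumes that a
       "command line" is any line beginning with a troff control character (. or '), that
       comments also begin that way, and that blank lines are ignored; these are assumptions of
       the example, not a statement of doclifter's exact checks.

```python
def so_should_inline(path):
    """Return True if the file is readable and holds only command/comment lines.

    Hypothetical helper: False models the case where an entity reference
    would be generated instead of an inclusion.
    """
    try:
        with open(path, encoding="latin-1") as handle:
            lines = handle.read().splitlines()
    except OSError:
        # No read access, or the file cannot be opened.
        return False
    # Every non-blank line must start with a control character.
    return all(line.startswith((".", "'")) for line in lines if line.strip())
```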

       doclifter performs special parsing when it recognizes a display such as is generated by
       .DS/.DE. It repeatedly tries to parse first a function synopsis, and then plain text off
       what remains in the display. Thus, most inline C function prototypes will be lifted to
       structured markup.

       Some notes on specific translations:

   Man Translation
       doclifter does a good job on most man pages. It knows about the extended UR/UE/UN and URL
       requests supported under Linux. If any .UR request is present, it will translate these but
       not wrap URLs outside them with Ulink tags. It also knows about the extended .L (literal)
       font markup from Bell Labs Version 8, and its friends.

       The .TH macro is used to generate a RefMeta section. If present, the date/source/manual
       arguments (see man(7)) are wrapped in RefMiscInfo tag pairs with those class attributes.
       Note that doclifter does not change the date.

       doclifter performs special parsing when it recognizes a synopsis section. It repeatedly
       tries to parse first a function synopsis, then a command synopsis, and then plain text off
       what remains in the section.

       The following man macros are translated into emphasis tags with a remap attribute: .B, .I,
       .L, .BI, .BR, .BL, .IB, .IR, .IL, .RB, .RI, .RL, .LB, .LI, .LR, .SB, .SM. Some stereotyped
       patterns involving these macros are recognized and turned into semantic markup.

       The following macros are translated into paragraph breaks: .LP, .PP, .P, .HP, and the
       single-argument form of .IP.

       The two-argument form of .IP is translated either as a VariableList (usually) or
       ItemizedList (if the tag is the troff bullet or square character).

       The following macros are translated semantically: .SH, .SS, .TP, .UR, .UE, .UN, .IX. A .UN
       call just before .SH or .SS sets the ID for the new section.

       The \*R, \*(Tm, \*(lq, and \*(rq symbols are translated.

       The following (purely presentation-level) macros are ignored: .PD, .DT.

       The .RS/.RE macros are translated differently depending on whether or not they precede
       list markup. When .RS occurs just before .TP or .IP the result is nested lists. Otherwise,
       the .RS/.RE pair is translated into a Blockquote tag-pair.

       .DS/.DE is not part of the documented man macro set, but is recognized because it shows up
       with some frequency on legacy man pages from older Unixes.

       Certain extension macros originally defined under Ultrix are translated structurally,
       including those that occasionally show up on the manual pages of Linux and other
       open-source Unixes: .EX/.EE (and the synonyms .Ex/.Ee), .Ds/.De, .NT/.NE, .PN, and .MS.

       The following extension macros used by the X distribution are also recognized and
       translated structurally: .FD, .FN, .IN, .ZN, .hN, and .C{/.C}. The .TA and .IN requests are
       ignored.

       When the man macros are active, any .Pp macro definition containing the request .PP will
       be ignored, and all instances of .Pp replaced with .PP. Similarly, .Tp will be replaced
       with .TP. This is the least painful way to deal with some frequently-encountered
       stereotyped wrapper definitions that would otherwise cause serious interpretation problems.

       Known problem areas with man translation:

       •   Weird uses of .TP. These will sometimes generate invalid XML and sometimes result in a
           FIXME comment in the generated XML (a warning message will also go to standard error).

       •   It is debatable how the man macros .HP and .IP without tag should be translated. We
           treat them as an ordinary paragraph break. We could visually simulate a hanging
           paragraph with list markup, but this would not be a structural translation.

   Pod2man Translation
       doclifter recognizes the extension macros produced by pod2man (.Sh, .Sp, .Ip, .Vb, .Ve)
       and translates them structurally.

       The results of lifting pages produced by pod2man should be checked carefully by eyeball,
       especially the rendering of command and function synopses.  Pod2man generates rather
       perverse markup; doclifter's struggle to untangle it is sometimes in vain.

       If possible, generate your DocBook from the POD sources. There is a pod2docbook module on
       CPAN that does this.

   Tkman Translation
       doclifter recognizes the extension macros used by the Tcl/Tk documentation system: .AP,
       .AS, .BS, .BE, .CS, .CE, .DS, .DE, .SO, .SE, .UL, .VS, .VE. The .AP, .CS, .CE, .SO, .SE,
       .UL, .QW and .PQ macros are translated structurally.

   Mandoc Translation
       doclifter should be able to do an excellent job on most mdoc(7) pages, because this macro
       package expresses a lot of semantic structure.

       Known problems with mandoc translation: All .Bd/.Ed display blocks are translated as
       LiteralLayout tag pairs.

   Ms Translation
       doclifter does a good job on most ms pages. One weak spot to watch out for is the
       generation of Author and Affiliation tags. The heuristics used to mine this information
       out of the .AU section work for authors who format their names in the way usual for
       English (e.g. "M. E. Lesk", "Eric S. Raymond") but are quite brittle.

       For a document to be recognized as containing ms markup, it must have the extension .ms.
       This avoids problems with false positives.

       The .TL, .AU, .AI, and .AE macros turn into article metainformation in the expected way.
       The .PP, .LP, .SH, and .NH macros turn into paragraph and section structure. The tagged
       form of .IP is translated either as a VariableList (usually) or ItemizedList (if the tag
       is the troff bullet or square character); the untagged version is treated as an ordinary
       paragraph break.
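
       The .IP rule just described reduces to a small decision on the tag. In the sketch below
       the function name and return strings are invented for illustration; \(bu and \(sq are the
       standard troff names for the bullet and square glyphs, and treating only those two
       spellings as list bullets is an assumption.

```python
# Standard troff special-character names for bullet and square.
BULLET_TAGS = {r"\(bu", r"\(sq"}


def ip_translation(tag):
    """Pick the DocBook construct for an .IP call (hypothetical helper)."""
    if not tag:
        # Untagged form: an ordinary paragraph break.
        return "paragraph break"
    if tag in BULLET_TAGS:
        return "ItemizedList"
    return "VariableList"
```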

       The .DS/.DE pair is translated to a LiteralLayout tag pair. The .FS/.FE pair is
       translated to a Footnote tag pair. The .QP/.QS/.QE requests define BlockQuotes.

       The .UL font change is mapped to U.  .SM and .LG become numeric plus or minus size steps
       suffixed to the Remap attribute.

       The .B1 and .B2 box macros are translated to a Sidebar tag pair.

       All macros relating to page footers, multicolumn mode, and keeps are ignored (.ND, .DA,
       .1C, .2C, .MC, .BX, .KS, .KE, .KF). The .R, .RS, and .RE macros are ignored as well.

   Me Translation
       Translation of me documents tends to produce crude results that need a lot of
       hand-hacking. The format has little usable structure, and documents written in it tend to
       use a lot of low-level troff macros; both these properties tend to confuse doclifter.

       For a document to be recognized as containing me markup, it must have the extension .me.
       This avoids problems with false positives.

       The following macros are translated into paragraph breaks: .lp, .pp. The .ip macro is
       translated into a VariableList. The .bp macro is translated into an ItemizedList. The .np
       macro is translated into an OrderedList.

       The b, i, and r fonts are mapped to emphasis tags with B, I, and R Remap attributes. The
       .rb ("real bold") font is treated the same as .b.

       .(q/.)q is translated structurally.

       Most other requests are ignored.

   Mm Translation
       Memorandum Macros documents translate well, as these macros carry a lot of structural
       information. The translation rules are tuned for Memorandum or Released Paper styles;
       information associated with external-letter style will be preserved in comments.

       For a document to be recognized as containing mm markup, it must have the extension .mm.
       This avoids problems with false positives.
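
       The extension requirement stated here, and likewise for ms and me above, can be checked
       before running a batch. This is a minimal sketch; the helper name and the returned labels
       are assumptions for the example.

```python
import os

# Extensions that this page says are required for ms, me, and mm recognition.
EXTENSION_MACRO_SETS = {".ms": "ms", ".me": "me", ".mm": "mm"}


def macro_set_from_extension(filename):
    """Return 'ms', 'me', or 'mm' based on the file extension, else None."""
    extension = os.path.splitext(filename)[1]
    return EXTENSION_MACRO_SETS.get(extension)
```

       A file that returns None here will not be treated as ms, me, or mm input, so rename it
       first if that is what it contains.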

       The following highlight macros are translated into Emphasis tags: .B, .I, .R, .BI, .BR,
       .IB, .IR, .RB, .RI.

       The following macros are structurally translated: .AE, .AF, .AL, .RL, .APP, .APPSK, .AS,
       .AT, .AU, .B1, .B2, .BE, .BL, .ML, .BS, .BVL, .VL, .DE, .DL, .DS, .FE, .FS, .H, .HU, .IA,
       .IE, .IND, .LB, .LC, .LE, .LI, .P, .RF, .SM, .TL, .VERBOFF, .VERBON, .WA, .WE.

       The following macros are ignored:

       .)E, .1C, .2C, .AST, .AV, .AVL, .COVER, .COVEND, .EF, .EH, .EDP, .EPIC, .FC, .FD, .HC,
       .HM, .GETR, .GETST, .INITI, .INITR, .INDP, .ISODATE, .MT, .NS, .ND, .OF, .OH, .OP,
       .PGFORM, .PGNH, .PE, .PF, .PH, .RP, .S, .SA, .SP, .SG, .SK, .TAB, .TB, .TC, .VM, .WC.

       The following macros generate warnings: .EC, .EX, .FG, .GETHN, .GETPN, .GETR, .GETST, .LT,
       .LD, .LO, .MOVE, .MULB, .MULN, .MULE, .NCOL, .nP, .PIC, .RD, .RS, .RE, .SETR.

       .BS/.BE and .IA/.IE pairs are passed through. The text inside them may need to be deleted
       or moved.

       The mark argument of .ML is ignored; the following list is formatted as a normal
       ItemizedList.

       The contents of .DS/.DE or .DF/.DE get turned into a Screen display. Arguments
       controlling presentation-level formatting are ignored.

   Mwww Translation
       The mwww macros are an extension to the man macros supported by groff(1) for producing web
       pages.

       The URL, FTP, MAILTO, IMAGE, and TAG tags are translated structurally. The HTMLINDEX,
       BODYCOLOR, BACKGROUND, HTML, and LINE tags are ignored.

   TBL Translation
       All structural features of TBL tables are translated, including both horizontal and
       vertical spanning with ‘s’ and ‘^’. The ‘l’, ‘r’, and ‘c’ formats are supported; the ‘n’
       column format is rendered as ‘r’. Line continuations with T{ and T} are handled correctly.
       So is .TH.

       The expand, box, doublebox, allbox, center, left, and right options are supported. The GNU
       synonyms frame and doubleframe are also recognized. But the distinction between single and
       double rules and boxes is lost.

       Table continuations (.T&) are not supported.

       If the first nonempty line of text immediately before a table is boldfaced, it is
       interpreted as a title for the table and the table is generated using a table and title.
       Otherwise the table is translated with informaltable.

       Most other presentation-level TBL commands are ignored. The ‘b’ format qualifier is
       processed, but point size and width qualifiers are not.

   Pic Translation
       PIC sections are translated to SVG.  doclifter calls out to pic2plot(1) to accomplish
       this; you must have that utility installed for PIC translation to work.

   Eqn Translation
       EQN sections are filtered into embedded MathML with eqn -TMathML if possible, otherwise
       passed through enclosed in LiteralLayout tags. After a delim statement has been seen,
       inline eqn delimiters are translated into an XML processing instruction. Exception: inline
       eqn equations consisting of a single character are translated to an Emphasis with a Role
       attribute of eqn.

   Troff Translation
       The troff translation is meant only to support interpretation of the macro sets. It is not
       useful standalone.

       The .nf and .fi macros are interpreted as literal-layout boundaries. Calls to the .so
       macro either cause inclusion or are translated into XML entity inclusions (see above).
       Calls to the .ul and .cu macros cause following lines to be wrapped in an Emphasis tag
       with a Remap attribute of "U". Calls to .ft generate corresponding start or end emphasis
       tags. Calls to .tr cause character translation on output. Calls to .bp generate a
       BeginPage tag (in paragraphed text only). Calls to .sp generate a paragraph break (in
       paragraphed text only). Calls to .ti wrap the following line in a BlockQuote. These are the
       only troff requests we translate to DocBook. The rest of the troff emulation exists
       because macro packages use it internally to expand macros into elements that might be
       structural.

       Requests relating to macro definitions and strings (.ds, .as, .de, .am, .rm, .rn, .em) are
       processed and expanded. The .ig macro is also processed.

       Conditional macros (.if, .ie, .el) are handled. The built-in conditions o, n, t, e, and c
       are evaluated as if for nroff on page one of a document. The m, d, and r troff
       conditionals are also interpreted. String comparisons are evaluated by straight textual
       comparison. All numeric expressions evaluate to true.
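
       The evaluation of built-in conditions "as if for nroff on page one" can be sketched as a
       lookup table. The example is deliberately partial: it leaves out the c, m, d, and r
       conditions and string comparisons, and its test for "numeric expression" (a leading
       digit) is a crude assumption made for the example.

```python
# nroff on page one: odd page, nroff mode, not troff mode, not an even page.
NROFF_PAGE_ONE = {"o": True, "n": True, "t": False, "e": False}


def evaluate_condition(condition):
    """Evaluate a simple .if/.ie condition (hypothetical, partial helper)."""
    if condition in NROFF_PAGE_ONE:
        return NROFF_PAGE_ONE[condition]
    if condition[:1].isdigit():
        # All numeric expressions evaluate to true, whatever their value.
        return True
    raise ValueError("condition not covered by this sketch: %r" % condition)
```

       One consequence worth noting is that a request such as ".if 0" is treated as true, so
       text guarded that way will appear in the output.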

       The extended groff requests cc, c2, ab, als, do, nop, return, and shift are
       interpreted. Its .PSPIC extension is translated into a MediaObject.

       The .tm macro writes its arguments to standard error (with -t). The .pm macro reports on
       defined macros and strings. These facilities may aid in debugging your translation.

       Some troff escape sequences are lifted:

        1. The \e and \\ escapes become a bare backslash, \. a period, and \- a bare dash.

        2. The troff escapes \^, \`, \', \&, \0, and \| are lifted to equivalent ISO special
           spacing characters.

        3. A \ followed by space is translated to an ISO non-breaking space entity.

        4. A \~ is also translated to an ISO non-breaking space entity; properly this should be a
           space that can't be used for a linebreak but stretches like ordinary whitespace during
           line adjustment, but there is no ISO or Unicode entity for that.

        5. The \u and \d half-line vertical motion escapes, when paired, become
           Superscript or Subscript tags.

        6. The \c escape is handled as a line continuation in circumstances where that matters
           (e.g. for token-pasting).

        7. The \f escape for font changes is translated in various context-dependent ways. First,
           doclifter looks for cliches involving font changes that have semantic meaning, and
           lifts to a structural tag. If it can't do that, it generates an Emphasis tag.

        8. The \m[] extension is translated into a phrase span with a remap attribute carrying
           the color. Note: Stylesheets typically won't render this!

        9. Some uses of the \o request are translated: pairs with a letter followed by one of the
           characters ` ' : ^ o ~ are translated to combining forms with diacriticals grave,
           acute, umlaut, circumflex, ring, and tilde respectively if the corresponding Latin-1
           or Latin-2 character exists as an ISO literal.

       Other escapes than these will yield warnings or errors.
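
       Item 1 of the list above is simple enough to show directly. The sketch below covers only
       those four escapes (\e, \\, \. and \-) and leaves every other escape untouched; the
       function name is invented for the example and this is not doclifter's implementation.

```python
import re

# The four escapes from item 1 and their plain-text results.
SIMPLE_ESCAPES = {"\\e": "\\", "\\\\": "\\", "\\.": ".", "\\-": "-"}

# A backslash followed by one of: e, backslash, period, dash.
_SIMPLE_ESCAPE = re.compile(r"\\[e\\.\-]")


def lift_simple_escapes(text):
    """Replace \\e, \\\\, \\. and \\- with their plain characters."""
    return _SIMPLE_ESCAPE.sub(lambda match: SIMPLE_ESCAPES[match.group(0)], text)
```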

       All other troff requests are ignored but passed through into XML comments. A few (such as
       .ce) also trigger a warning message.

PORTABILITY CHECKING

       When portability checking is enabled, doclifter emits portability warnings about markup
       which it can handle but which will break various other viewers and interpreters.

        1. At level 1, it will warn about constructions that would break man2html(1) (the C
           program distributed with Linux man(1), not the older and much less capable Perl
           script). A close derivative of this code is used in GNOME yelp. This should be the
           minimum level of portability you aim for, and corresponds to what is recommended on
           the groff_man(7) manual page.

        2. At level 2, it will warn about constructions that will break portability back to the
           Unix classic tools (including long macro names and glyph references with \[]).

SEMANTIC ANALYSIS

       doclifter keeps two lists of semantic hints that it picks up from analyzing source
       documents (especially from parsing command and function synopses). The local list
       includes:

       •   Names of function formal arguments

       •   Names of command options

       Local hints are used to mark up the individual page from which they are gathered. The
       global list includes:

       •   Names of functions

       •   Names of commands

       •   Names of function return types

       If doclifter is applied to multiple files, the global list is retained in memory. You can
       dump a report of global hints at the end of the run with the -h option. The format of the
       hints is as follows:

            .\" | mark <phrase> as <markup>

       where <phrase> is an item of text and <markup> is the DocBook markup text it should be
       wrapped with whenever it appears either highlighted or as a word surrounded by whitespace
       in the source text.
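
       When inspecting a hints file by script, lines in the format shown above can be picked
       apart with a regular expression. This sketch assumes that the phrase may contain spaces,
       that the markup name is a single word, and that the angle brackets in the template are
       placeholders rather than literal characters.

```python
import re

# Matches:  .\" | mark PHRASE as MARKUP
_HINT_LINE = re.compile(r'^\.\\" \| mark (?P<phrase>.+) as (?P<markup>\S+)\s*$')


def parse_hint(line):
    """Return (phrase, markup) for a hint line, or None for any other line."""
    match = _HINT_LINE.match(line)
    if match is None:
        return None
    return match.group("phrase"), match.group("markup")
```

       Filtering a dumped hints file through such a parser makes it easier to review each
       (phrase, markup) pair and delete false positives.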

       Hints derived from earlier files are also applied to later ones. This behavior may be
       useful when lifting collections of documents that apply to a function or command library.
       What should be more useful is the fact that a hints file dumped with -h can be one of the
       file arguments to doclifter; the code detects this special case and does not write XML
       output for such a file. Thus, a good procedure for lifting a large library is to generate
       a hints file with a first run, inspect it to delete false positives, and use it as the
       first input to a second run.
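
       The two-run procedure can be written down as a pair of command lines. The helper below
       only constructs them; the function name and the default hints filename are placeholders
       chosen for this example.

```python
def two_pass_commands(files, hints_path="library.hints"):
    """Return (first_run, second_run) argv lists for the two-run procedure.

    Hypothetical helper: nothing is executed, and the hints file should be
    inspected and pruned by hand between the two runs.
    """
    files = list(files)
    # Run 1: translate everything and dump the global hints with -h.
    first_run = ["doclifter", "-h", hints_path] + files
    # Run 2: feed the edited hints file back in as the first input file.
    second_run = ["doclifter", hints_path] + files
    return first_run, second_run
```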

       It is also possible to include a hints file directly in a troff sourcefile. This may be
       useful if you want to enrich the file by stages before converting to XML.

TROUBLESHOOTING

       doclifter tries to warn about problems that it can diagnose but not fix by itself.
       When it says "look for FIXME", do that in the generated XML; the markup around that token
       may be wrong.

       Occasionally (less than 2% of the time) doclifter will produce invalid DocBook markup even
       from correct troff markup. Usually this results from strange constructions in the source
       page, or macro calls that are beyond the ability of doclifter's macro processor to get
       right. Here are some things to watch for, and how to fix them:

   Malformed command synopses.
       If you get a message that says "command synopsis parse failed", try rewriting the synopsis
       in your manual page source. The most common cause of failure is unbalanced [] groupings, a
       bug that can be very difficult to notice by eyeball. To assist with this, the error
       message includes a token number in parentheses indicating on which token the parse failed.

       For more information, use the -v option. This will trigger a dump telling you what the
       command synopsis looked like after preprocessing, and indicate on which token the parse
       failed (both with a token number and a caret sign inserted in the dump of the synopsis
       tokens). To help locate unbalanced groupings, the error token dump also tries to insert
       ‘$’ at the point of the last nesting-depth increase, but the code that does this is
       failure-prone.
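
       Since unbalanced [] groupings are hard to spot by eye, a quick independent check of the
       synopsis text can help before rerunning doclifter. The sketch below is a generic bracket
       matcher written for this page, not part of doclifter; it works on raw characters rather
       than on doclifter's token stream.

```python
def find_unbalanced_bracket(synopsis):
    """Return the index of the first offending '[' or ']', or None if balanced.

    A ']' with no matching '[' is reported where it occurs; otherwise the
    innermost '[' that is never closed is reported.
    """
    open_positions = []
    for index, char in enumerate(synopsis):
        if char == "[":
            open_positions.append(index)
        elif char == "]":
            if not open_positions:
                return index
            open_positions.pop()
    return open_positions[-1] if open_positions else None
```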

   Confusing macro calls.
       Some manual page authors replace standard requests (like .PP, .SH and .TP) with versions
       that do different things in nroff and troff environments. While doclifter tries to cope
       and usually does a good job, the quirks of [nt]roff are legion and confusing macro calls
       sometimes lead to bad XML being generated. A common symptom of such problems is unclosed
       Emphasis tags.

   Malformed list syntax.
       The manual-page parser can be confused by .TP constructs that have header tags but no
       following body. If the XML produced doesn't validate, and the problem seems to be a
       misplaced listitem tag, try using the verbose (-v) option. This will enable line-numbered
       warnings that may help you zero in on the problem.

   Section nesting problems with SS.
       The message "possible section nesting error" means that the program has seen two adjacent
       subsection headers. In man pages, subsections don't have a depth argument, so doclifter
       cannot be certain how subsections should be nested. Any subsection heading between the
       indicated line and the beginning of the next top-level section might be wrong and require
       correcting by hand.

   Bad output with no doclifter error message
       If you're translating a page that uses user-defined macros, and doclifter fails to
       complain about it but you get bad output, the first thing to do is simplify or eliminate
       the user-defined macros. Replace them with stock requests where possible.

IMPROVING TRANSLATION QUALITY

       There are a few constructions that are a good idea to check by hand after lifting a page.

       Look near the BlockQuote tags. The troff temporary indent request (.ti) is translated into
       a BlockQuote wrapper around the following line. Sometimes LiteralLayout or ProgramListing
       would be a better translation, but doclifter has no way to know this.

       It is not possible to unambiguously detect candidates for wrapping in a DocBook option tag
       in running text. If you care, you'll have to check for these and fix them by hand.

BUGS AND LIMITATIONS

       About 3% of man pages will either make this program throw error status 1 or generate
       invalid XML. In almost all such cases the misbehavior is triggered by markup bugs in the
       source that are too severe to be coped with.

       Equation number arguments of EQN calls are ignored.

       The function-synopsis parser is crude (it's not a compiler) and prone to errors.
       Function-synopsis markup should be checked carefully by a human.

       If a man page has both paragraphed text in a Synopsis section and also a body section
       before the Synopsis section, bad things will happen.

       Running text (e.g., explanatory notes) at the end of a Synopsis section cannot reliably be
       distinguished from synopsis-syntax markup. (This problem is AI-complete.)

       Some firewalls put in to cope with common malformations in troff code mean that the tail
       end of a span between two \f{B,I,U,(CW} or .ft highlight changes may not be completely
       covered by corresponding Emphasis macros if (for example) the span crosses a boundary
       between filled and unfilled (.nf/.fi) text.

       The treatment of conditionals relies on the assumption that conditional macros never
       generate structural or font-highlight markup that differs between the if and else
       branches. This appears to be true of all the standard macro packages, but if you roll any
       of your own macros you're on your own.

       Macro definitions in a manual page NAME section are not interpreted.

       Uses of \c for line continuation sometimes are not translated, leaving the \c in the
       output XML. The program will print a warning when this occurs.

       The line numbers in doclifter error messages are unreliable in the presence of .EQ/.EN,
       .PS/.PE, and quantum fluctuations.

OLD MACRO SETS

       There is a conflict between Berkeley ms's documented .P1 print-header-on-page request and
       an undocumented Bell Labs use for displayed program and equation listings. The ms
       translator uses the Bell Labs interpretation when .P2 is present in the document, and
       otherwise ignores the request.

RETURN VALUES

       On successful completion, the program returns status 0. It returns 1 if some file or
       standard input could not be translated. It returns 2 if one of the input sources was a .so
       inclusion. It returns 3 if there is an error in reading or writing files. It returns 4 to
       indicate an internal error. It returns 5 when aborted by a keyboard interrupt.

       Note that a zero return does not guarantee that the output is valid DocBook. It will
       almost always (as in, more than 98% of cases) be syntactically valid XML, but in some rare
       cases fixups by hand may be necessary to meet the semantics of the DocBook DTD. Validation
       problems are most likely to occur with complicated list markup.
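
       For wrapper scripts, the status codes listed above can be turned into readable messages.
       The table is taken from this section; the helper name and the wording for statuses
       outside the table are assumptions of the example.

```python
# Exit statuses as documented in RETURN VALUES.
EXIT_MEANINGS = {
    0: "successful completion",
    1: "some file or standard input could not be translated",
    2: "one of the input sources was a .so inclusion",
    3: "error in reading or writing files",
    4: "internal error",
    5: "aborted by a keyboard interrupt",
}


def describe_exit(status):
    """Return a description of a doclifter exit status (hypothetical helper)."""
    return EXIT_MEANINGS.get(status, "undocumented status %d" % status)
```

       Even for status 0 the output should still be validated, for the reasons given above.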

REQUIREMENTS

       The pic2plot(1) utility must be installed in order to translate PIC diagrams to SVG.

SEE ALSO

       man(7), mdoc(7), ms(7), me(7), mm(7), mwww(7), troff(1).

AUTHOR

       Eric S. Raymond <esr@thyrsus.com>

       There is a project web page at http://www.catb.org/~esr/doclifter/.