Provided by: txt2pdbdoc_1.4.4-8_amd64

html2pdbtxt(1)                               General Commands Manual                              html2pdbtxt(1)

NAME

       html2pdbtxt - HTML to Doc Text converter for Palm Pilots

SYNOPSIS

       html2pdbtxt [ -bchars ] [ -ttitle ] [ -uURL ] file.html [ file.txt ]
       html2pdbtxt -v

DESCRIPTION

       html2pdbtxt converts HTML to text suitable for conversion to a Doc(4) file via txt2pdbdoc(1).  If no text
       filename is given, the generated text is sent to standard output.

   HTML Tags
       The following HTML tags (and corresponding ending tags) are recognized: ADDRESS, A NAME, BLOCKQUOTE,  BR,
       CENTER, DIV, DL, DT, H1, H2, H3, H4, H5, H6, OL, OPTION, PRE, P, SELECT, SCRIPT, STYLE, TABLE, TITLE, UL.
       In all cases, the most ``reasonable'' thing is done given the constraints of the Doc(4) format, which is
       essentially plain text.  ALT attributes (typically found in IMG tags) have their text extracted and
       placed between brackets [like this].  All other HTML tags are stripped.

   Character Entities
       Both named and numeric (decimal and hexadecimal) HTML character entity references are converted to their
       byte values according to the ISO 8859-1 (Latin 1) character set so that they appear properly on the
       Pilot.  For example, ``r&eacute;sum&eacute;'' becomes ``résumé'', with accented letter 'e's.
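
       Because the generated text is ISO 8859-1, accented characters may look wrong on a UTF-8 terminal.  The
       sketch below shows one way to inspect and preview such text.  The helper names latin1_to_utf8 and
       hex_bytes are made up for this illustration and are not part of txt2pdbdoc; the sketch assumes that
       iconv(1) and od(1) are available.

```shell
# Hypothetical helper (not part of txt2pdbdoc): re-encode Latin-1 text
# read from standard input as UTF-8 so it displays on a UTF-8 terminal.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# Hypothetical helper: print the bytes on standard input as hex digits.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# The e-acute that &eacute; stands for is the single byte 0xE9 (octal 351)
# in ISO 8859-1 and the two bytes 0xC3 0xA9 in UTF-8.
printf '\351' | hex_bytes; echo                     # prints e9
printf '\351' | latin1_to_utf8 | hex_bytes; echo    # prints c3a9

# Preview converted text only if html2pdbtxt is installed and alice.html
# (the file name used in the EXAMPLE section) exists.
if command -v html2pdbtxt >/dev/null 2>&1 && [ -f alice.html ]; then
    html2pdbtxt alice.html | latin1_to_utf8 | head -n 5
fi
```

       The re-encoding is for previewing only; the text handed to txt2pdbdoc(1) should presumably stay in
       ISO 8859-1.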

   Document Title
       Unless specified with the -t option, the HTML file is scanned for  <TITLE>  ...  </TITLE>  tags  and,  if
       found, the title is extracted and put on line 1 of the generated file.

   Bookmarks
       Bookmarks are placed into the generated file wherever <A NAME="..."> tags are found in the HTML file.

OPTIONS

       -bchars   Specify the character sequence that is to serve as the bookmark indicator.  The default is (*).
                 (See the CAVEATS.)

       -ttitle   Specify the title of the document that is to appear on line 1 of the generated file  overriding
                 any title found inside the HTML file between <TITLE> ... </TITLE> tags.

       -uURL     Specify  the  URL the HTML file supposedly came from and put it on the line after the title, if
                 any, in the generated file.

       -v        Print the version number to standard output and exit.

EXAMPLE

       To convert an HTML file to Doc:

            html2pdbtxt -u http://www.wonderland.org/ alice.html alice.txt
            txt2pdbdoc "$(head -n 1 alice.txt)" alice.txt alice.pdb
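
       The second command takes the document title from line 1 of the generated text.  If the HTML file has
       no TITLE and -t was not given, line 1 may not be a title at all.  The sketch below guards only against
       the simplest case, an empty first line.  The helper name doc_title and the fallback ``Untitled'' are
       assumptions made for this illustration.

```shell
# Hypothetical helper (not part of txt2pdbdoc): print line 1 of the text
# file $1, or the fallback $2 if the file is empty or line 1 is empty.
doc_title() {
    title=$(head -n 1 "$1")
    if [ -n "$title" ]; then
        printf '%s\n' "$title"
    else
        printf '%s\n' "$2"
    fi
}

# Build the Doc file only if txt2pdbdoc is installed and alice.txt exists.
if command -v txt2pdbdoc >/dev/null 2>&1 && [ -f alice.txt ]; then
    txt2pdbdoc "$(doc_title alice.txt Untitled)" alice.txt alice.pdb
fi
```

       Passing -t to html2pdbtxt is the more direct way to make sure line 1 holds the intended title.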

CAVEATS

       1.  Some Doc readers have a ``feature'' whereby, while scanning for bookmarks, they recognize the
           bookmark sequence of characters anywhere in the text and not just at the beginning of a line.

       2.  Some  Doc  readers do not allow the bookmark sequence to contain the > character since they interpret
           that as the sequence delimiter, e.g., <->> will be interpreted as the sequence being merely -.

       3.  Ordered lists (via the OL tag) are treated as unordered lists (like the UL tag).  Numbering the
           items would require the HTML to be parsed rather than handled by simple substitutions, which would
           greatly complicate the code.
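
       Caveats 1 and 2 suggest checking a candidate bookmark sequence before passing it to -b.  The sketch
       below is one way to do that.  The helper name bookmark_seq_ok is made up for this illustration, and
       searching a text file for the sequence is only an approximation of what a given Doc reader will
       match.

```shell
# Hypothetical helper (not part of txt2pdbdoc): succeed if the sequence $1
# looks safe to use as a bookmark indicator for the text in file $2.
bookmark_seq_ok() {
    case $1 in
        '' | *'>'*) return 1 ;;    # empty, or contains > (caveat 2)
    esac
    if grep -qF -- "$1" "$2"; then
        return 1                   # already occurs in the text (caveat 1)
    fi
    return 0
}

# Try a few candidate sequences against a small sample text.
sample=$(mktemp)
printf 'Footnotes are marked (*) in this text.\n' > "$sample"
for candidate in '(*)' '->' '@@'; do
    if bookmark_seq_ok "$candidate" "$sample"; then
        echo "usable:   $candidate"
    else
        echo "rejected: $candidate"
    fi
done
rm -f "$sample"
```

       For this sample, (*) is rejected because it already occurs in the text, -> is rejected because it
       contains >, and @@ is reported as usable.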

SEE ALSO

       pdbtxt2html(1), txt2pdbdoc(1), doc(4), pdb(4)

       International Organization for Standardization.  ``ISO 8859-1: Information Processing -- 8-bit
       single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1.''  1987.

       World  Wide  Web  Consortium.   ``Character  entity  references  in  HTML 4.0.''  HTML 4.0 Specification,
       http://www.w3.org/

AUTHOR

       Paul J. Lucas <pauljlucas@mac.com>

html2pdbtxt                                     January 21, 2005                                  html2pdbtxt(1)