Provided by: txt2pdbdoc_1.4.4-6_i386

html2pdbtxt(1)                                                  html2pdbtxt(1)

NAME

       html2pdbtxt - HTML to Doc Text converter for Palm Pilots

SYNOPSIS

       html2pdbtxt [ -bchars ] [ -ttitle ] [ -uURL ] file.html [ file.txt ]
       html2pdbtxt -v

DESCRIPTION

       html2pdbtxt  converts  HTML to text suitable for conversion to a Doc(4)
       file via txt2pdbdoc(1).  If no text filename is  given,  the  generated
       text is sent to standard output.

   HTML Tags
       The following HTML tags (and corresponding ending tags) are recognized:
       ADDRESS, A NAME, BLOCKQUOTE, BR, CENTER, DIV, DL, DT, H1, H2,  H3,  H4,
       H5,  H6,  OL,  OPTION, PRE, P, SELECT, SCRIPT, STYLE, TABLE, TITLE, UL.
       In  all  cases,  the  most  ``reasonable''  thing  is  done,  given the
       constraints of the Doc(4) format, which is essentially plain text.  ALT
       attributes (typically found in IMG tags) have their text extracted  and
       placed between brackets [like this].  All other HTML tags are stripped.
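
       The ALT handling described above can be sketched as follows.  This is a
       minimal illustration in Python, not the converter's own code: the
       function name strip_tags_keep_alt is hypothetical, the sketch only
       handles double-quoted ALT values, and it ignores the special treatment
       the recognized tags receive.

```python
import re

def strip_tags_keep_alt(html_text: str) -> str:
    """Replace any tag carrying an ALT attribute with [alt text]; drop all other tags."""
    def replace_tag(match: re.Match) -> str:
        # Look for a double-quoted ALT attribute inside this one tag.
        alt = re.search(r'\balt\s*=\s*"([^"]*)"', match.group(0), re.IGNORECASE)
        return f"[{alt.group(1)}]" if alt else ""
    return re.sub(r"<[^>]*>", replace_tag, html_text)

print(strip_tags_keep_alt('See <IMG SRC="cat.gif" ALT="a cat"> and <B>more</B>.'))
# prints: See [a cat] and more.
```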

   Character Entities
       Both  HTML  character  and  numeric  (decimal  and  hexadecimal) entity
       references are converted to their  byte  value  according  to  the  ISO
       8859-1  (Latin  1)  character set so they appear properly on the Pilot.
       For example, ``r&eacute;sum&eacute;'' becomes ``résumé'', with each
       accented 'e' stored as a single Latin 1 byte.
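
       To make the mapping concrete, the sketch below decodes named, decimal
       and hexadecimal references and encodes the result as ISO 8859-1 bytes.
       It relies on Python's standard html module, so its entity table and its
       handling of characters outside Latin 1 (replaced by ``?'' here) are
       assumptions of the sketch and may differ from what html2pdbtxt does.

```python
import html

def entities_to_latin1(text: str) -> bytes:
    """Decode HTML entity references, then encode the text as ISO 8859-1 bytes."""
    decoded = html.unescape(text)
    # Characters with no Latin 1 equivalent become "?" (a choice made for this sketch).
    return decoded.encode("latin-1", errors="replace")

print(entities_to_latin1("r&eacute;sum&eacute; &#233; &#xE9;"))
# prints: b'r\xe9sum\xe9 \xe9 \xe9'
```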

   Document Title
       Unless  specified  with  the  -t  option,  the HTML file is scanned for
       <TITLE> ... </TITLE> tags and, if found, the title is extracted and put
       on line 1 of the generated file.
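
       The precedence between the -t option and the <TITLE> tags can be
       pictured with the sketch below.  The function extract_title and its
       override parameter are hypothetical names standing in for the -t
       value, and the whitespace collapsing is an assumption of the sketch.

```python
import re
from typing import Optional

def extract_title(html_text: str, override: Optional[str] = None) -> Optional[str]:
    """Return the override if given, else the text between <TITLE> tags, else None."""
    if override is not None:
        return override  # an explicit title wins over anything in the file
    m = re.search(r"<title[^>]*>(.*?)</title>", html_text, re.IGNORECASE | re.DOTALL)
    if m is None:
        return None
    # Collapse runs of whitespace so the title fits on one line.
    return " ".join(m.group(1).split())

print(extract_title("<HTML><TITLE> Alice in\n Wonderland </TITLE></HTML>"))
# prints: Alice in Wonderland
```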

   Bookmarks
       Bookmarks  are  placed  into the generated file wherever <A NAME="...">
       tags are found in the HTML file.
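
       As a rough picture of bookmark placement, the sketch below starts a
       new line with the bookmark indicator wherever a named anchor occurs.
       The exact text html2pdbtxt emits at a bookmark is not documented here,
       so the output layout, the function name mark_bookmarks and the
       restriction to double-quoted names are all assumptions.

```python
import re

def mark_bookmarks(html_text: str, marker: str = "(*)") -> str:
    """Start a new line with the marker wherever an <A NAME="..."> tag occurs."""
    anchor = re.compile(r'<a\s+name\s*=\s*"[^"]*"\s*>', re.IGNORECASE)
    # A function replacement keeps the marker literal even if it contains backslashes.
    return anchor.sub(lambda _: "\n" + marker + " ", html_text)

print(mark_bookmarks('Intro.<A NAME="ch1">Chapter 1</A>'))
```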

OPTIONS

       -bchars   Specify the character  sequence  that  is  to  serve  as  the
                 bookmark indicator.  The default is (*).  (See the CAVEATS.)

       -ttitle   Specify the title of the document that is to appear on line 1
                 of the generated file overriding any title found  inside  the
                 HTML file between <TITLE> ... </TITLE> tags.

       -uURL     Specify the URL the HTML file supposedly came from and put it
                 on the line after the title, if any, in the generated file.

       -v        Print the version number to standard output and exit.

EXAMPLE

       To convert an HTML file to Doc:

            html2pdbtxt -u http://www.wonderland.org/ alice.html alice.txt
            txt2pdbdoc "`head -n 1 alice.txt`" alice.txt alice.pdb

CAVEATS

       1.  Some Doc readers have a ``feature'' whereby, while scanning for
           bookmarks, they recognize the bookmark character sequence anywhere
           in the text and not just at the beginning of a line.

       2.  Some Doc readers do not allow the bookmark sequence to contain  the
           >  character  since  they interpret that as the sequence delimiter,
           e.g., <->> will be interpreted as the sequence being merely -.

       3.  Ordered lists (via the OL tag) are treated as unordered lists (like
           the UL tag).  Numbering the items would require parsing the HTML
           rather than performing simple substitutions, which would greatly
           complicate the code.
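
       The first two caveats can be illustrated with a small model of a Doc
       reader.  Both functions are hypothetical and only mimic the behaviors
       described above: one reads a sequence delimited by < and > and stops
       at the first >, the other scans for bookmarks either at the start of a
       line or anywhere in it.

```python
import re
from typing import List

def parse_indicator(delimited: str) -> str:
    """Read a bookmark sequence written as <sequence>, stopping at the first '>'."""
    m = re.match(r"<([^>]*)>", delimited)
    return m.group(1) if m else ""

def bookmark_lines(text: str, marker: str, anywhere: bool) -> List[str]:
    """Return the lines a reader would treat as bookmarks."""
    lines = text.splitlines()
    if anywhere:
        return [ln for ln in lines if marker in ln]        # caveat 1 behavior
    return [ln for ln in lines if ln.startswith(marker)]   # start-of-line only

print(parse_indicator("<->>"))  # prints: -
sample = "(*) Chapter 1\nUse (*) sparingly."
print(bookmark_lines(sample, "(*)", anywhere=False))  # prints: ['(*) Chapter 1']
print(bookmark_lines(sample, "(*)", anywhere=True))
# prints: ['(*) Chapter 1', 'Use (*) sparingly.']
```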

SEE ALSO

       pdbtxt2html(1), txt2pdbdoc(1), doc(4), pdb(4)

       International Organization for Standardization.  ``ISO 8859-1:
       Information Processing -- 8-bit single-byte coded graphic character
       sets -- Part 1: Latin alphabet No. 1.''  1987.

       World  Wide  Web  Consortium.   ``Character  entity  references in HTML
       4.0.''  HTML 4.0 Specification, http://www.w3.org/

AUTHOR

       Paul J. Lucas <pauljlucas@mac.com>

html2pdbtxt                    January 21, 2005                 html2pdbtxt(1)