Provided by: txt2pdbdoc_1.4.4-8_amd64

html2pdbtxt(1)                       General Commands Manual                       html2pdbtxt(1)

NAME

       html2pdbtxt - HTML to Doc Text converter for Palm Pilots

SYNOPSIS

       html2pdbtxt [ -bchars ] [ -ttitle ] [ -uURL ] file.html [ file.txt ]
       html2pdbtxt -v

DESCRIPTION

       html2pdbtxt  converts  HTML  to  text  suitable  for  conversion  to  a  Doc(4)  file  via
       txt2pdbdoc(1).  If no text filename is given, the  generated  text  is  sent  to  standard
       output.

   HTML Tags
       The  following  HTML tags (and corresponding ending tags) are recognized: ADDRESS, A NAME,
       BLOCKQUOTE, BR, CENTER, DIV, DL, DT, H1, H2, H3, H4, H5, H6, OL, OPTION, PRE,  P,  SELECT,
       SCRIPT,  STYLE,  TABLE,  TITLE,  UL.   In all cases, the most ``reasonable'' thing is done
       given the constraints of  the  Doc(4)  format,  which  is  essentially  plain  text.   ALT
       attributes  (typically  found  in  IMG  tags) have their text extracted and placed between
       brackets [like this].  All other HTML tags are stripped.
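
       The ALT and tag-stripping behavior described above can be illustrated  with  a  minimal
       Python sketch.  This is not the html2pdbtxt implementation: the function strip_tags and
       both regular expressions are hypothetical, only double-quoted ALT values  are  handled,
       and the line-break handling for block tags such as P or BR is left out.

```python
import re

# Hypothetical helper, not part of html2pdbtxt.
# A tag carrying a double-quoted ALT attribute, e.g. <img src="x" alt="text">.
ALT_TAG = re.compile(r'<[^>]*\balt\s*=\s*"([^"]*)"[^>]*>', re.IGNORECASE)
# Any remaining tag.
ANY_TAG = re.compile(r'<[^>]+>')


def strip_tags(html_text: str) -> str:
    """Replace ALT-bearing tags with [alt text], then drop all other tags."""
    text = ALT_TAG.sub(lambda m: f"[{m.group(1)}]", html_text)
    return ANY_TAG.sub("", text)


print(strip_tags('<b>See</b> <img src="cat.png" alt="a cat"> here'))
```

       For the sample input, the IMG tag is replaced by its bracketed ALT text and the B  tags
       are removed.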

   Character Entities
       Both HTML character and numeric (decimal and hexadecimal) entity references are  converted
       to  their  byte  value  according to the ISO 8859-1 (Latin 1) character set so they appear
       properly on the Pilot.  For example, ``r&eacute;sum&eacute;'' becomes ``résumé''.
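
       As a rough illustration of this mapping (not the tool's  own  code),  the  Python  sketch
       below  resolves  named,  decimal, and hexadecimal references and encodes the result as
       ISO 8859-1 bytes.  The function name entities_to_latin1 is hypothetical, and replacing
       characters that fall outside Latin 1 with ``?'' is an assumption; this page does not say
       what html2pdbtxt does in that case.

```python
import html


def entities_to_latin1(text: str) -> bytes:
    """Resolve entity references, then encode as ISO 8859-1 bytes."""
    # html.unescape handles named (&eacute;), decimal (&#233;),
    # and hexadecimal (&#xE9;) references.
    decoded = html.unescape(text)
    # Assumption: characters that Latin 1 cannot represent become "?".
    return decoded.encode("latin-1", errors="replace")


print(entities_to_latin1("r&eacute;sum&#233; caf&#xE9;"))
```

       All three reference styles in the sample yield the same single byte, 0xE9.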

   Document Title
       Unless  specified  with  the  -t option, the HTML file is scanned for <TITLE> ... </TITLE>
       tags and, if found, the title is extracted and put on line 1 of the generated file.
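
       The precedence between the -t option and the <TITLE> tags can be sketched  as  follows.
       This  is  an  illustrative Python sketch only: pick_title and its override parameter are
       hypothetical names standing in for the -t value, and collapsing runs of  whitespace  in
       the extracted title is an assumption.

```python
import re
from typing import Optional

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def pick_title(html_text: str, override: Optional[str] = None) -> Optional[str]:
    """Return the title for line 1, or None if there is none."""
    # An explicit title (the role played by -t) always wins.
    if override is not None:
        return override
    match = TITLE_RE.search(html_text)
    if match is None:
        return None
    # Assumption: fold internal newlines and repeated spaces into one space.
    return " ".join(match.group(1).split())


page = "<html><head><TITLE>Alice in\n Wonderland</TITLE></head></html>"
print(pick_title(page))
print(pick_title(page, override="Through the Looking-Glass"))
```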

   Bookmarks
       Bookmarks are placed into the generated file wherever <A NAME="..."> tags are found in the
       HTML file.

OPTIONS

       -bchars   Specify  the character sequence that is to serve as the bookmark indicator.  The
                 default is (*).  (See CAVEATS.)

       -ttitle   Specify the title of the document that is to appear on line 1 of  the  generated
                 file  overriding  any  title  found  inside  the  HTML  file between <TITLE> ...
                 </TITLE> tags.

       -uURL     Specify the URL the HTML file supposedly came from and put it on the line  after
                 the title, if any, in the generated file.

       -v        Print the version number to standard output and exit.

EXAMPLE

       To convert an HTML file to Doc:

            html2pdbtxt -u http://www.wonderland.org/ alice.html alice.txt
            txt2pdbdoc "`head -1 alice.txt`" alice.txt alice.pdb

CAVEATS

       1.  Some Doc readers have a ``feature'' whereby, during the scan for bookmarks phase, they
           recognize the bookmark sequence of characters anywhere in the text and not just at the
           beginning of a line.

       2.  Some  Doc  readers do not allow the bookmark sequence to contain the > character since
           they interpret that as the sequence delimiter, e.g., <->> will be interpreted  as  the
           sequence being merely -.

       3.  Ordered lists (via the OL tag) are treated as unordered lists (like  the  UL  tag).
           Numbering  the  items  would require parsing the HTML rather than performing simple
           substitutions, which would greatly complicate the code.
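
       Caveats 1 and 2 suggest checking a bookmark sequence against the  generated  text  before
       loading  it  onto  a  device.   The  Python  sketch  below  is  one  way  to  do that;
       bookmark_warnings is a hypothetical helper, not something shipped with  txt2pdbdoc,  and
       the default marker is the (*) default documented for -b.

```python
from typing import List


def bookmark_warnings(text: str, marker: str = "(*)") -> List[str]:
    """Report uses of the bookmark marker that some Doc readers mishandle."""
    warnings = []
    # Caveat 2: some readers treat ">" as the sequence delimiter.
    if ">" in marker:
        warnings.append(f"marker {marker!r} contains '>'")
    # Caveat 1: some readers match the marker anywhere in a line,
    # so flag any occurrence that does not start at column 0.
    for number, line in enumerate(text.splitlines(), start=1):
        if line.find(marker, 1) != -1:
            warnings.append(f"line {number}: marker {marker!r} appears mid-line")
    return warnings


sample = "(*) Chapter 1\nSee (*) below\nplain text"
for warning in bookmark_warnings(sample):
    print(warning)
```

       If the check reports mid-line occurrences, choosing a different  sequence  with  -b  is
       the simplest remedy.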

SEE ALSO

       pdbtxt2html(1), txt2pdbdoc(1), doc(4), pdb(4)

       International Organization for Standardization.  ``ISO 8859-1: Information  Processing  --
       8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1.''  1987.

       World Wide Web Consortium.   ``Character  entity  references  in  HTML  4.0.''   HTML  4.0
       Specification, http://www.w3.org/

AUTHOR

       Paul J. Lucas <pauljlucas@mac.com>

html2pdbtxt                              January 21, 2005                          html2pdbtxt(1)