Provided by: libhtml-prettyprinter-perl_0.03-5_all

NAME

        HTML::PrettyPrinter - generate nice HTML files from HTML syntax trees

SYNOPSIS

         use HTML::TreeBuilder;
         # generate an HTML syntax tree
         my $tree = HTML::TreeBuilder->new;
         $tree->parse_file($file_name);
         # modify the tree if you want

         use HTML::PrettyPrinter;
         my $hpp = HTML::PrettyPrinter->new('linelength' => 130,
                                            'quote_attr' => 1);
         # configure
         $tree->address("0.1.0")->attr('_hpp_indent',0);  # for an individual element
         $hpp->set_force_nl(1,qw(body head));             # for tags
         $hpp->set_force_nl(1,qw(@SECTIONS));             # for tag groups
         $hpp->set_nl_inside(0,'default!');               # for all tags

         # format the source
         my $linearray_ref = $hpp->format($tree);
         print @$linearray_ref;

         # alternative: print directly to filehandle
         use FileHandle;
         my $fh = FileHandle->new(">$filename2");
         if (defined $fh) {
           $hpp->select($fh);
           $hpp->format($tree);
           undef $fh;
           $hpp->select(undef);
         }

DESCRIPTION

       HTML::PrettyPrinter produces nicely formatted HTML code from an HTML syntax tree. It is
       especially useful if the produced HTML file will be read or edited manually afterwards.
       Various parameters let you adapt the output to different styles and requirements.

       If you don't care what the HTML source looks like as long as it is valid and readable by
       browsers, you should use the as_HTML() method of HTML::Element instead of the pretty
       printer. It is about five times faster.

       The pretty printer handles line wrapping, indentation and structuring through the way the
       whitespace in the tree is represented in the output.  Furthermore, upper/lowercase markup,
       markup minimization, quoting of attribute values, the encoding of entities and the
       presence of optional end tags are configurable.

       There are two types of parameters that influence the output: individual parameters, which
       are set on a per element and per tag basis, and common parameters, which are set only once
       for each instance of a pretty printer.

       In order to facilitate the configuration, a mechanism to handle tag groups is provided.
       Thus, it is possible to modify a parameter for a group of tags (e.g. all known block
       elements) without writing each tag name explicitly.  Perhaps the code for tag groups will
       move to another Perl module in the future.

       For HTML::Elements that require special treatment, like <PRE>, <XMP>, <SCRIPT>, comments
       and declarations, the pretty printer falls back to the method "as_HTML()" of the HTML
       elements.

INDIVIDUAL PARAMETERS

       The following individual parameters exist:

       indent n
           The indent of new lines inside the element is increased by n columns. Default is 2
           for all tags.

       skip bool
           If true, the element and its content are omitted from the output.  Default is false.

       nl_before n
           Number of newlines before the start tag. Default is 0 for inline elements and 1 for
           other elements.

       nl_inside n
           Number of newlines between the tags and the contents of an element.  Default is 0.

       nl_after n
           Number of newlines after an element. Default is 0 for inline elements and 1 for other
           elements.

       force_nl bool
           Force linebreaks before and after an element even if the HTML tree does not contain
           whitespace at this place. Default is false for inline elements and true for all other
           elements. This parameter is superseded if the common parameter allow_forced_nl is set
           to false.

       endtag bool
           Print an optional endtag. Default is true.

   Access Methods
       The following access methods exist for each individual parameter.  Replace parameter by
       the respective name.

       $hpp->parameter($element)
           Takes a reference to an HTML element as argument. Returns the value of the parameter
           for that element. The priority to retrieve the value is:

           1.  The value of the element's internal attribute "_hpp_parameter".

           2.  The value specified inside the pretty printer for the tag of the element.

           3.  The value specified inside the pretty printer for 'default!'.

       $hpp->parameter('tag')
           Like "parameter($element)", except that only priorities 2 and 3 are evaluated.

       $hpp->set_parameter($value,'tag1','tag2',...)
           Sets the parameter for each tag in the list to $value.

           If $value is undefined, the entries for the tags are deleted.

           Besides individual tags, the list may include tag groups like '@BLOCK' (see below) and
           'default!'. Individual tag names are written in lower case, the names of tag groups
           start with an '@' and are written in upper case letters. Tag groups are expanded
           during the call of "set_parameter()".  'default!' sets the default value, which is
           retrieved if no value is defined for the individual element or tag.

       $hpp->set_parameter($value,'all!')
           Deletes all existing settings for parameter inside the pretty printer and sets the
           default to $value.
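
       The three-step priority described above can be illustrated with a small stand-alone
       sketch. It does not use the module's internals: the %by_tag table, the hash-based
       element and the lookup_parameter() helper are hypothetical stand-ins chosen for
       illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: %by_tag mimics the printer's per-tag table for one
# parameter, and an element is modelled as a plain hash reference.
my %by_tag = ('default!' => 2, 'p' => 8);

sub lookup_parameter {
    my ($name, $element) = @_;
    my $own = $element->{"_hpp_$name"};             # 1. element attribute
    return $own if defined $own;
    my $for_tag = $by_tag{ $element->{_tag} };      # 2. value for the tag
    return $for_tag if defined $for_tag;
    return $by_tag{'default!'};                     # 3. printer-wide default
}

print lookup_parameter('indent', { _tag => 'p' }), "\n";                    # 8
print lookup_parameter('indent', { _tag => 'p', _hpp_indent => 0 }), "\n";  # 0
print lookup_parameter('indent', { _tag => 'h1' }), "\n";                   # 2
```

       Because a defined value wins at each step, an element attribute of 0 still overrides
       the tag setting.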

COMMON PARAMETERS

       tabify n
           If non zero, each n spaces at the beginning of a line are converted into one TAB.
           Default is 8.

       linelength n
           The maximum number of characters a line should have. Default is 80.

           The linelength may be exceeded if there is no proper way to break a line without
           modifying the content, e.g. inside <PRE> and other special elements or if there is no
           whitespace.

       min_bool_attr bool
           Minimize boolean attributes, e.g. print <UL COMPACT> instead of <UL COMPACT=COMPACT>.
           Default is true.

       quote_attr bool
           Always quote attribute values. If false, attribute values consisting entirely of
           letters, digits, periods and hyphens are not put into quotes. Default is false.

       entities string
           The string contains all characters that are escaped to their entity names.  Default is
           the bare minimum of "&<>" plus the non breaking space 'nbsp' (because otherwise it is
           difficult for the human eye to distinguish it from a normal space in most editors).

       wrap_at_tagend NEVER|AFTER_ATTR|ALWAYS
           May the pretty printer wrap lines before the closing angle bracket of a start tag?
           Supported values are the predefined constants NEVER (allow line wraps at white space
           only), AFTER_ATTR (allow line wraps at the end of tags that contain attributes only) and
           ALWAYS (allow line wraps at the end of every start tag). Default is AFTER_ATTR.

       allow_forced_nl bool
           Allow the addition of white space that is not in the HTML tree.  If set to false (the
           default) the force_nl parameter is ignored.  It is recommended to set this parameter to
           true if the HTML tree was generated with ignore_ignorable_whitespace set to true.

       uppercase bool
           Use uppercase letters for markup. Default is the value of $HTML::Element::html_uc at
           the time the constructor is called.

   Access Method
       $hpp->parameter([value])
           Retrieves and optionally sets the parameter.
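
       To make the tabify and quote_attr rules above concrete, here is a minimal sketch in
       plain Perl. The helpers tabify_line() and needs_quotes() are hypothetical and only
       restate the documented rules; they are not the module's own code.

```perl
use strict;
use warnings;

# tabify: replace every full run of $n leading spaces with one TAB;
# leftover spaces are kept. A value of 0 disables the conversion.
sub tabify_line {
    my ($line, $n) = @_;
    return $line unless $n;
    my ($lead, $rest) = $line =~ /^( *)(.*)$/s;
    my $tabs = int(length($lead) / $n);
    return ("\t" x $tabs) . (' ' x (length($lead) % $n)) . $rest;
}

# quote_attr: quotes may be dropped only when the value consists entirely
# of letters, digits, periods and hyphens and quote_attr is false.
sub needs_quotes {
    my ($value, $quote_attr) = @_;
    return 1 if $quote_attr;
    return $value =~ /\A[A-Za-z0-9.\-]+\z/ ? 0 : 1;
}

print needs_quotes('JUSTIFY', 0), "\n";                 # 0
print needs_quotes('mailto:schotten@gmx.de', 0), "\n";  # 1
```

       With tabify set to 8, a line starting with ten spaces would begin with one TAB
       followed by two spaces under this model.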

OTHER METHODS

       $hpp = HTML::PrettyPrinter->new(%common_parameters)
           This class method creates a new HTML::PrettyPrinter and returns it.  Key/value pair
           arguments may be provided to override the default settings of common parameters.
           There is currently no mechanism to override the default values for individual
           parameters at construction. Use the "$hpp->set_parameter()" methods instead.

       $hpp->select($fh)
           Select a FileHandle object for output.

           If a FileHandle is selected the generated HTML is printed directly to that file. With
           $hpp->select(undef) you can switch back to the default behaviour.

       $line_array_ref = $hpp->format($tree,[$indent],[$line_array_ref])
           Format the HTML syntax (sub-) tree.

           $tree is not restricted to the root of the HTML syntax tree. A reference to any
           HTML::Element will do.

           The optional $indent indents the first element by $indent characters.

           Return value is the reference to an array with the generated lines.  If such a
           reference is provided as third argument, the lines will be appended to that array.
           Otherwise a new array will be created.

           If a FileHandle is selected by a previous call of the "$hpp->select($fh)" method, the
           lines are printed to the FileHandle object directly.  The array of lines is not
           changed in this case.

TAG GROUPS

       Tag groups are lists that contain the names of tags and other tag groups which are
       considered as subsets. This reflects the way allowed content is specified in HTML DTDs,
       where e.g. %flow consists of all %block and %inline elements and %inline covers several
       subsets like %phrase.

       If you add a tag name to a group A, it will be seen in any group that contains group A.
       Thus, it is easy to maintain groups of tags with similar properties (and to configure the
       HTML pretty printer for these tags).

       The names of tag groups are written in upper case letters with a leading '@' (e.g.
       '@BLOCK'). The names of simple tags are written all lower case.
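
       The nesting of groups can be pictured with a short stand-alone sketch. The group
       table and the expand_sketch() helper below are hypothetical and are not the module's
       data or implementation; they only show how a tag added to an inner group becomes
       visible through every group that contains it.

```perl
use strict;
use warnings;

# Hypothetical group table: names starting with '@' are groups, others are tags.
my %groups = (
    '@INLINE' => [qw(b i @PHRASE)],
    '@PHRASE' => [qw(em strong)],
    '@FLOW'   => [qw(@INLINE p div)],
);

# Expand tags and groups recursively; each tag is returned once.
sub expand_sketch {
    my @todo = @_;
    my (%seen_group, %tags);
    while (defined(my $name = shift @todo)) {
        if ($name =~ /^\@/) {
            next if $seen_group{$name}++;            # guards against cycles
            push @todo, @{ $groups{$name} || [] };
        } else {
            $tags{$name} = 1;
        }
    }
    return sort keys %tags;                          # sorted only for stable output
}

print join(' ', expand_sketch('@FLOW')), "\n";       # b div em i p strong
```

       The sort is only there to make the output reproducible; the order returned by the
       module's own group_expand() is documented as unspecified.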

   Functions
       All the functions to handle and modify tag groups are included in the @EXPORT_OK list of
       "HTML::PrettyPrinter".

       @tag_groups = list_groups()
           Returns a list with the names of all defined tag groups.

       @tags = group_expand('tag_or_tag_group0',['tag_or_tag_group1',...])
           Returns a list of every tag in the tag groups and their subgroups.  Each tag is listed
           once only. The order of the list is not specified.

       @tag_groups = sub_group('tag_group0',['tag_group1',...])
           Returns a list of every tag group and sub group in the list.  Each group is listed
           once only. The order of the list is not specified.

       group_get('@NAME')
           Return the (unexpanded) contents of a tag group.

       group_set('@NAME',['tag_or_tag_group0',...])
           Set a tag group.

       group_add('@NAME','tag_or_tag_group0',['tag_or_tag_group1',...])
           Add tags and tag groups to a group.

       group_remove('@NAME','tag_or_tag_group0',['tag_or_tag_group1',...])
           Remove tags or tag groups from a group. Subgroups are not expanded.  Thus,
           "group_remove('@A','@B')" will remove '@B' from '@A' if it is included directly. Tags
           included in '@B' will not be removed from '@A'.  Nor will '@A' be changed if '@B' is
           included in a subgroup of '@A' but not in '@A' directly.

   Predefined Tag Groups
       There are a couple of predefined tag groups. Use

         foreach my $tg (list_groups()) {
           print "'$tg' => qw(".join(',',group_get($tg)).")\n";
         }

       to get a list.

   Examples for tag groups
       1. create some groups

             group_set('@A',qw(a1 a2 a3));
             group_set('@B',qw(b1 b2));
             group_set('@C',qw(@A @B c1 @D));
             # @D needs to be defined when @C is expanded
             group_set('@D',qw(d1 @B));
             group_set('@E',qw(e1 @D));
             group_set('@F',qw(f1 @A));

       2. add tags

             group_add('@A',qw(a4 a5)); # @A contains (a1 a2 a3 a4 a5)
             group_add('@D',qw(d1));    # @D contains (d1 @B d1)
             group_add('@F',group_expand('@B'),'@F');
             # @F contains (f1 @A b1 b2 f1 @F)

       3. evaluate

             group_expand('@E');     # returns e1, d1, b1, b2
             sub_group('@E');        # returns @B, @D
             sub_group(qw(@E @F));   # returns @A, @B, @D
             group_get('@F');        # returns f1, @A, b1, b2, f1, @F

       4. remove tags

             group_remove('@E','@C');  # @E not changed, because it doesn't contain @C
             group_remove('@E','@D');  # @D removed from @E
             group_remove('@D','d1');  # all d1's are removed. Now @D contains @B only
             group_remove('@C','@B');  # @C now contains (@A c1 @D). Thus
             sub_group('@C');          # still returns @A, @B, @D,
                                       # because @B is included in @D, too

       5. application

             # set the indent for tags b1, b2, e1, g1 to 0
             $hpp->set_indent(0,qw(@D @E g1));

           If the groups @D or @E are modified afterwards, the configuration of the pretty
           printer is not affected, because "set_indent()" will expand the tag groups.
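
       The consequence of expanding groups at call time can be sketched without the module.
       Everything below (the %groups and %indent tables, expand_sketch() and
       set_indent_sketch()) is a hypothetical model of the behaviour described above, not
       the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical state matching the end of step 4: @D holds @B, @E holds e1.
my %groups = ('@D' => [qw(@B)], '@B' => [qw(b1 b2)], '@E' => [qw(e1)]);
my %indent;

# Recursive expansion; assumes the groups contain no cycles.
sub expand_sketch {
    my %seen;
    return grep { !$seen{$_}++ }
           map  { /^\@/ ? expand_sketch(@{ $groups{$_} || [] }) : $_ } @_;
}

# Groups are expanded immediately, and only plain tag names are stored.
sub set_indent_sketch {
    my ($value, @names) = @_;
    $indent{$_} = $value for expand_sketch(@names);
}

set_indent_sketch(0, qw(@D @E g1));
push @{ $groups{'@E'} }, 'e2';              # later change to @E
print join(' ', sort keys %indent), "\n";   # b1 b2 e1 g1
```

       The tag e2 has no setting, because it joined @E only after the call had already
       stored the expanded tag names.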

EXAMPLE

       Consider the following HTML tree

           <html> @0
             <head> @0.0
               <title> @0.0.0
                 "Demonstrate HTML::PrettyPrinter"
             <body> @0.1
               <h1> @0.1.0
                 "Headline"
               <p align="JUSTIFY"> @0.1.1
                 "Some text in "
                 <b> @0.1.1.1
                   "bold"
                 " and "
                 <i> @0.1.1.3
                   "italics"
                 " and with 'ä' & 'ü'."
               <table align="LEFT" border=0> @0.1.2
                 <tr> @0.1.2.0
                   <td align="RIGHT"> @0.1.2.0.0
                     "top right"
                 <tr> @0.1.2.1
                   <td align="LEFT"> @0.1.2.1.0
                     "bottom left"
               <hr noshade="NOSHADE" size=5> @0.1.3
               <address> @0.1.4
                 <a href="mailto:schotten@gmx.de"> @0.1.4.0
                   "Claus Schotten"

       and

         $hpp = HTML::PrettyPrinter->new('uppercase' => 1);
         print @{$hpp->format($tree)};

       will print

         <HTML><HEAD><TITLE>Demonstrate
               HTML::PrettyPrinter</TITLE></HEAD><BODY><H1>Headline</H1><P
               ALIGN=JUSTIFY>Some text in <B>bold</B> and
               <I>italics</I> and with 'ä' &amp; 'ü'.</P><TABLE
               ALIGN=LEFT BORDER=0><TR><TD ALIGN=RIGHT>top
                   right</TD></TR><TR><TD ALIGN=LEFT>bottom
                   left</TD></TR></TABLE><HR NOSHADE SIZE=5
               ><ADDRESS><A HREF="mailto:schotten@gmx.de"
                 >Claus&nbsp;Schotten</A></ADDRESS></BODY></HTML>

       That doesn't look very nice. What went wrong? By default HTML::PrettyPrinter takes a
       conservative approach to whitespace. It will enlarge existing whitespace, but it will not
       introduce new whitespace outside of tags, because that might change the way a browser
       renders the HTML document. However, the HTML tree was constructed with
       "ignore_ignorable_whitespace" turned on.  Thus, there is no whitespace between block
       elements that the pretty printer could format, so the pretty printer does line wrapping
       and indentation only.  E.g. the title is in the third level of the tree. Thus, the second
       line is indented six characters. The table cells in the fifth level are indented by ten
       characters. Furthermore, you see that whitespace is inserted after the last attribute of
       the <A> tag.

       Let's set $hpp->allow_forced_nl(1). Now the force_nl parameters are enabled. By default,
       they are set for all non-inline tags. That creates

        <HTML>
          <HEAD>
            <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
          </HEAD>
          <BODY>
            <H1>Headline</H1>
            <P ALIGN=JUSTIFY>Some text in <B>bold</B> and
              <I>italics</I> and with 'ä' &amp; 'ü'.</P>
            <TABLE ALIGN=LEFT BORDER=0>
              <TR>
                <TD ALIGN=RIGHT>top right</TD>
              </TR>
              <TR>
                <TD ALIGN=LEFT>bottom left</TD>
              </TR>
            </TABLE>
            <HR NOSHADE SIZE=5>
            <ADDRESS><A HREF="mailto:schotten@gmx.de"
                >Claus&nbsp;Schotten</A></ADDRESS>
          </BODY>
        </HTML>

       Much better, isn't it? Now let's improve the structuring.

         $hpp->set_nl_before(2,qw(body table));
         $hpp->set_nl_after(2,qw(table));

       will require two new lines in front of <body> and <table> tags and after <table> tags.

        <HTML>
          <HEAD>
            <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
          </HEAD>

          <BODY>
            <H1>Headline</H1>
            <P ALIGN=JUSTIFY>Some text in <B>bold</B> and
              <I>italics</I> and with 'ä' &amp; 'ü'.</P>

            <TABLE ALIGN=LEFT BORDER=0>
              <TR>
                <TD ALIGN=RIGHT>top right</TD>
              </TR>
              <TR>
                <TD ALIGN=LEFT>bottom left</TD>
              </TR>
            </TABLE>

            <HR NOSHADE SIZE=5>
            <ADDRESS><A HREF="mailto:schotten@gmx.de"
                >Claus&nbsp;Schotten</A></ADDRESS>
          </BODY>
        </HTML>

       Currently the mail address is the only attribute value which is quoted.  Here the quotes
       are required because of the '@' character. For all other attribute values quotes are
       optional and thus omitted by default. $hpp->quote_attr(1); will turn the quotes on.

       $hpp->set_endtag(0,'all!') turns all optional endtags off.  This affects the </p> (and
       should affect </tr> and </td>, see below).  Alternatively, we could use
       $hpp->set_endtag(0,'default!'). That would turn the default off, too. But it wouldn't
       delete settings for individual tags that supersede the default.

       $hpp->set_nl_after(3,'head') requires three new lines after the <head> element. Because
       there are already two new lines required by the start of <body>, only one additional line
       is added.
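
       The arithmetic in the previous paragraph suggests that the two requests are merged
       rather than summed. The sketch below models that reading with a hypothetical helper,
       newlines_between(); it is an assumption drawn from this example, not a description
       of the module's code.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Newlines between two adjacent elements: the larger of the two requests,
# not their sum (an assumption read off the example above).
sub newlines_between {
    my ($nl_after_prev, $nl_before_next) = @_;
    return max($nl_after_prev, $nl_before_next);
}

print newlines_between(3, 2), "\n";   # 3: <head> asks for 3 after, <body> for 2 before
print newlines_between(1, 2), "\n";   # 2
```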

       $hpp->set_force_nl(0,'td') will inhibit the introduction of whitespace around <td>. Thus,
       the table cells are now on the same line as the table rows.

         <HTML>
           <HEAD>
             <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
           </HEAD>

           <BODY>
             <H1>Headline</H1>
             <P ALIGN="JUSTIFY">Some text in <B>bold</B> and
               <I>italics</I> and with 'ä' &amp; 'ü'.

             <TABLE ALIGN="LEFT" BORDER="0">
               <TR><TD ALIGN="RIGHT">top right</TD></TR>
               <TR><TD ALIGN="LEFT">bottom left</TD></TR>
             </TABLE>

             <HR NOSHADE SIZE="5">
             <ADDRESS><A HREF="mailto:schotten@gmx.de"
                 >Claus&nbsp;Schotten</A></ADDRESS>
           </BODY>
         </HTML>

       The end tags </td> and </tr> are printed because HTML::Tagset says they are mandatory.

         map {$HTML::Tagset::optionalEndTag{$_}=1} qw(td tr th);

       will fix that.

       The additional new line after </head> doesn't look nice. With
       $hpp->set_nl_after(undef,'head') we will reset the parameter for the <head> tag.

       $hpp->entities($hpp->entities().'ä'); will enforce the entity encoding of 'ä'.

       $hpp->min_bool_attr(0); will inhibit the minimization of the NOSHADE attribute of <hr>.

       Let's fiddle with the indentation:
         $hpp->set_indent(8,'@TEXTBLOCK');
         $hpp->set_indent(0,'html');

       New lines inside text blocks (here inside <h1>, <p> and <address>) will be indented by 8
       characters instead of two, whereas the code directly under <html> will not be indented.

        <HTML>
        <HEAD>
          <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
        </HEAD>

        <BODY>
          <H1>Headline</H1>
          <P ALIGN="JUSTIFY">Some text in <B>bold</B> and
                  <I>italics</I> and with '&auml;' &amp; 'ü'.

          <TABLE ALIGN="LEFT" BORDER="0">
            <TR><TD ALIGN="RIGHT">top right
            <TR><TD ALIGN="LEFT">bottom left
          </TABLE>

          <HR NOSHADE="NOSHADE" SIZE="5">
          <ADDRESS><A HREF="mailto:schotten@gmx.de"
                    >Claus&nbsp;Schotten</A></ADDRESS>
        </BODY>
        </HTML>
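
       The indentation in the listings can be reproduced by summing the indent values along
       the path from the root to an element. The helper inner_indent() below is a
       hypothetical model that matches the numbers shown in this example; it is not taken
       from the module.

```perl
use strict;
use warnings;

# Indent of wrapped lines inside an element, modelled as the sum of the
# indent values of the element and all its ancestors.
sub inner_indent {
    my ($indent_for, @path) = @_;       # @path: tag names from root to element
    my $total = 0;
    for my $tag (@path) {
        $total += exists $indent_for->{$tag} ? $indent_for->{$tag}
                                             : $indent_for->{'default!'};
    }
    return $total;
}

my %defaults = ('default!' => 2);                     # default indent of 2
my %custom   = ('default!' => 2, html => 0, p => 8);  # after the set_indent() calls

print inner_indent(\%defaults, qw(html head title)), "\n";        # 6
print inner_indent(\%defaults, qw(html body table tr td)), "\n";  # 10
print inner_indent(\%custom,   qw(html body p)), "\n";            # 10
```

       The first two results match the six and ten characters of the first listing; the
       third matches the wrapped line inside <P> after the indent of <html> is set to 0 and
       that of the text blocks to 8.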

       $hpp->wrap_at_tagend(HTML::PrettyPrinter::NEVER); will disable the line wrap between the
       attribute and the '>' of the <a> tag. The resulting line exceeds the target line length by
       far, but there is no point left where the pretty printer could legally break this line.

       $hpp->set_endtag(1,'tr') will override the default. Thus, the </tr> appears in the code
       whereas the other optional endtags are still omitted.

       Finally, we customize some individual elements:

       $tree->address('0.1.1')->attr('_hpp_skip',1)
           will omit the <p> and its content from the output.

       $tree->address('0.1.2.1.0')->attr('_hpp_force_nl',1)
           will force new lines around the second <td>, but will not affect the first <td>.

        <HTML>
        <HEAD>
          <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
        </HEAD>

        <BODY>
          <H1>Headline</H1>

          <TABLE ALIGN="LEFT" BORDER="0">
            <TR><TD ALIGN="RIGHT">top right</TR>
            <TR>
              <TD ALIGN="LEFT">bottom left
            </TR>
          </TABLE>

          <HR NOSHADE="NOSHADE" SIZE="5">
          <ADDRESS><A
                    HREF="mailto:schotten@gmx.de">Claus&nbsp;Schotten</A></ADDRESS>
        </BODY>
        </HTML>

KNOWN BUGS

       •   This is early alpha code. The interfaces are subject to change.

       •   The module is tested with perl 5.005_03 only. It should work with perl 5.004 though.

       •   The predefined tag groups are incomplete. Several tags need to be added.

       •   Attribute values from a fixed set given in the DTD (e.g. ALIGN=LEFT|RIGHT etc.) should
           be converted to upper or lower case depending on the value of the uppercase parameter.
           Currently, they are printed as given in the HTML tree.

       •   No optimization for performance was done.

SEE ALSO

       HTML::TreeBuilder, HTML::Element, HTML::Tagset

COPYRIGHT

       Copyright 2000 Claus Schotten  schotten@gmx.de

       This library is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.

AUTHOR

       Claus Schotten <schotten@gmx.de>