Ubuntu Manpage: HTML::StripScripts - Strip scripting constructs out of HTML

Provided by: libhtml-stripscripts-perl_1.05-1_all

NAME

       HTML::StripScripts - Strip scripting constructs out of HTML

SYNOPSIS

         use HTML::StripScripts;

         my $hss = HTML::StripScripts->new({ Context => 'Inline' });

         $hss->input_start_document;

         $hss->input_start('<i>');
         $hss->input_text('hello, world!');
         $hss->input_end('</i>');

         $hss->input_end_document;

         print $hss->filtered_document;

DESCRIPTION

       This module strips scripting constructs out of HTML, leaving as much non-scripting markup in place as
       possible.  This allows web applications to display HTML originating from an untrusted source without
       introducing XSS (cross site scripting) vulnerabilities.

       You will probably use HTML::StripScripts::Parser rather than using this module directly.

       The process is based on whitelists of tags, attributes and attribute values.  This approach is the most
       secure against disguised scripting constructs hidden in malicious HTML documents.

       As well as removing scripting constructs, this module ensures that there is a matching end for each start
       tag, and that the tags are properly nested.

       Previously, in order to customise the output, you needed to subclass "HTML::StripScripts" and override
       methods.  Now, most customisation can be done through the "Rules" option provided to "new()". (See
       examples/declaration/ and examples/tags/ for cases where subclassing is necessary.)

       The HTML document must be parsed into start tags, end tags and text before it can be filtered by this
       module.  Use either HTML::StripScripts::Parser or HTML::StripScripts::Regex instead if you want to input
       an unparsed HTML document.

       See examples/direct/ for an example of how to feed tokens directly to HTML::StripScripts.

CONSTRUCTORS

       new ( CONFIG )
           Creates  a  new  "HTML::StripScripts"  filter  object,  bound  to  a particular filtering policy.  If
           present, the CONFIG parameter must be a hashref.  The following  keys  are  recognized  (unrecognized
           keys will be silently ignored).

               $s = HTML::StripScripts->new({
                   Context         => 'Document|Flow|Inline|NoTags',
                   BanList         => [qw( br img )] | {br => '1', img => '1'},
                   BanAllBut       => [qw(p div span)],
                   AllowSrc        => 0|1,
                   AllowHref       => 0|1,
                   AllowRelURL     => 0|1,
                   AllowMailto     => 0|1,
                   EscapeFiltered  => 0|1,
                   Rules           => { See below for details },
               });

           "Context"
               A string specifying the context in which the filtered document will be used.  This influences the
               set of tags that will be allowed.

               If present, the "Context" value must be one of:

               "Document"
                   If  "Context"  is  "Document"  then the filter will allow a full HTML document, including the
                   "HTML" tag and "HEAD" and "BODY" sections.

               "Flow"
                   If "Context" is "Flow" then most of the cosmetic tags that one would  expect  to  find  in  a
                   document body are allowed, including lists and tables but not including forms.

               "Inline"
                   If "Context" is "Inline" then only inline tags such as "B" and "FONT" are allowed.

               "NoTags"
                   If "Context" is "NoTags" then no tags are allowed.

               The default "Context" value is "Flow".

           "BanList"
               If present, this option must be an arrayref or a hashref.  Any tag that would normally be allowed
               (because  it  presents no XSS hazard) will be blocked if the lowercase name of the tag is in this
               list.

               For example, in a guestbook application where "HR" tags are used to separate posts, you may  wish
               to prevent posts from including "HR" tags, even though "HR" is not an XSS risk.

           "BanAllBut"
               If present, this option must be a reference to an array holding a list of lowercase tag names.
               This has the effect of adding all but the listed tags to the ban list, so that  only  those  tags
               listed will be allowed.

           "AllowSrc"
               By  default,  the  filter  won't  allow  constructs  that  cause  the  browser  to  fetch  things
               automatically, such as "SRC" attributes in "IMG" tags.  If this option is present and  true  then
               those constructs will be allowed.

           "AllowHref"
               By  default, the filter won't allow constructs that cause the browser to fetch things if the user
               clicks on something, such as the "HREF" attribute in "A" tags.  Set this option to a  true  value
               to allow this type of construct.

           "AllowRelURL"
               By  default,  the  filter  won't  allow  relative  URLs such as "../foo.html" in "SRC" and "HREF"
               attribute values.  Set this option to a true value to allow them. "AllowHref" and / or "AllowSrc"
               also need to be set to true for this to have any effect.

           "AllowMailto"
               By default, "mailto:" links are not allowed. If "AllowMailto" is set to a true value,  then  this
               construct will be allowed. This can be enabled separately from AllowHref.

           "EscapeFiltered"
               By default, any filtered tags are outputted as "<!--filtered-->". If "EscapeFiltered" is set to a
               true value, then the filtered tags are converted to HTML entities.

               For instance:

                 <br>  -->  &lt;br&gt;

           "Rules"
               The "Rules" option provides a very flexible way of customising the filter.

               The focus is safety-first, so the rules are applied after all of the previous validation.  This
               means that all malicious data should already have been cleared by the time your rules run.

               Rules can be specified for tags and for attributes. Any tag or attribute  not  explicitly  listed
               will be handled by the default "*" rules.

               The following is a synopsis of all of the options that you can use to configure rules.  Below, an
               example is broken into sections and explained.

                Rules => {

                    tag => 0 | 1 | sub { tag_callback }
                           | {
                               attr      => 0 | 1 | 'regex' | qr/regex/ | sub { attr_callback},
                               '*'       => 0 | 1 | 'regex' | qr/regex/ | sub { attr_callback},
                               required  => [qw(attrname attrname)],
                               tag       => sub { tag_callback }
                             },

                   '*' => 0 | 1 | sub { tag_callback }
                          | {
                              attr => 0 | 1 | 'regex' | qr/regex/ | sub { attr_callback},
                              '*'  => 0 | 1 | 'regex' | qr/regex/ | sub { attr_callback},
                              tag  => sub { tag_callback }
                            }

                   }

               EXAMPLE:

                   Rules => {

                       ##########################
                       ##### EXPLICIT RULES #####
                       ##########################

                       ## Allow <br> tags, reject <img> tags
                       br          => 1,
                       img         => 0,

                       ## Send all <div> tags to a sub
                       div         => sub { tag_callback },

                       ## Allow <blockquote> tags, and allow the 'cite' attribute
                       ## All other attributes are handled by the default '*' rule
                       blockquote  => {
                           cite    => 1,
                       },

                       ## Allow <a> tags, and
                       a  => {

                           ## Allow the 'title' attribute
                           title     => 1,

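                           ## Allow the 'href' attribute if it matches the regex, given either
                           ## as a string or as a compiled qr{^http://yourdomain\.com/}
                           ## (the trailing slash stops look-alike hosts from matching)
                           href      => '^http://yourdomain\.com/',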

                           ## 'style' attributes are handled by a sub
                           style     => sub { attr_callback },

                           ## All other attributes are rejected
                           '*'       => 0,

                           ## Additionally, the <a> tag should be handled by this sub
                           tag       => sub { tag_callback},

                           ## If the <a> tag doesn't have these attributes, filter the tag
                           required  => [qw(href title)],

                       },

                       ##########################
                       ##### DEFAULT RULES #####
                       ##########################

                       ## The default '*' rule - accepts all the same options as above.
                       ## If a tag or attribute is not mentioned above, then the default
                       ## rule is applied:

                       ## Reject all tags
                       '*'         => 0,

                       ## Allow all tags and all attributes
                       '*'         => 1,

                       ## Send all tags to the sub
                       '*'         => sub { tag_callback },

                       ## Allow all tags, reject all attributes
                       '*'         => { '*'  => 0 },

                       ## Allow all tags, and
                       '*' => {

                           ## Allow the 'title' attribute
                           title   => 1,

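                           ## Allow the 'href' attribute if it matches the regex, given either
                           ## as a string or as a compiled qr{^http://yourdomain\.com/}
                           ## (the trailing slash stops look-alike hosts from matching)
                           href    => '^http://yourdomain\.com/',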

                           ## 'style' attributes are handled by a sub
                           style   => sub { attr_callback },

                           ## All other attributes are rejected
                           '*'     => 0,

                           ## Additionally, all tags should be handled by this sub
                           tag     => sub { tag_callback},

                       },

                   }

               Tag Callbacks
                       sub tag_callback {
                           my ($filter,$element) = (@_);

                           $element = {
                               tag      => 'tag',
                               content  => 'inner_html',
                               attr     => {
                                   attr_name => 'attr_value',
                               }
                           };
                           return 0 | 1;
                       }

                   A tag callback accepts two parameters, the $filter object and the $element.  It should
                   return 0 to completely ignore the tag and its content (which includes any nested HTML tags),
                   or 1 to accept and output the tag.

                   The $element is a hash ref containing the keys:

               "tag"
                   This  is  the tagname in lowercase, eg "a", "br", "img". If you set the tag value to an empty
                   string, then the tag will not be outputted, but the tag contents will.

               "content"
                   This is the equivalent of DOM's innerHTML. It contains the text content  and  any  HTML  tags
                   contained  within  this  element.  You can change the content or set it to an empty string so
                   that it is not outputted.

               "attr"
                   "attr" contains a hashref containing the attribute names and values

               If, for instance, you wanted to replace "<b>" tags with "<span>" tags, you could do this:

                   sub b_callback {
                       my ($filter,$element)   = @_;
                       $element->{tag}         = 'span';
                       $element->{attr}{style} = 'font-weight:bold';
                       return 1;
                   }
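
               The other two documented outcomes of a tag callback, dropping only the tag or dropping the
               whole element, can be sketched in the same way.  The callback names below are made up for
               this example, and the callbacks are exercised by hand with a hash shaped like the $element
               described above rather than through a real filter.

```perl
use strict;
use warnings;

# Hypothetical callback: drop the tag itself but keep whatever it wrapped.
sub unwrap_callback {
    my ( $filter, $element ) = @_;
    $element->{tag} = '';    # empty tag name: the tag is dropped, its content is kept
    return 1;
}

# Hypothetical callback: discard the element entirely when it holds no text.
sub drop_if_empty_callback {
    my ( $filter, $element ) = @_;
    return 0 unless defined $element->{content} && $element->{content} =~ /\S/;
    return 1;
}

# Exercise the callbacks directly; no filter object is needed for that.
my $font   = { tag => 'font', content => 'hello', attr => { size => '7' } };
my $kept   = unwrap_callback( undef, $font );
my $blank  = { tag => 'p', content => '   ', attr => {} };
my $result = drop_if_empty_callback( undef, $blank );

print "tag is now '$font->{tag}', content is still '$font->{content}'\n";
```

               Such callbacks would then be attached to tag names under "Rules", for example:

                   Rules => { font => \&unwrap_callback, p => \&drop_if_empty_callback }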

           Attribute Callbacks
                   sub attr_callback {
                       my ( $filter, $tag, $attr_name, $attr_val ) = @_;
                       return undef | '' | 'value';
                   }

               An attribute callback accepts four parameters: the $filter object, the $tag name, the $attr_name
               and the $attr_val.

               It should return either "undef" to reject the attribute, or the value to be used. An empty string
               keeps the attribute, but without a value.
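
               As a minimal sketch of that contract, the callback below keeps an attribute value only when
               it starts with a fixed prefix.  The sub name, the prefix and the example URLs are all
               assumptions made for illustration; the callback is called by hand here, not by a filter.

```perl
use strict;
use warnings;

# Hypothetical policy: only links into this site are kept.
my $ALLOWED_PREFIX = 'http://www.example.com/';

sub same_site_href {
    my ( $filter, $tag, $attr_name, $attr_val ) = @_;

    # Returning undef rejects the attribute.
    return undef unless defined $attr_val;
    return undef unless index( $attr_val, $ALLOWED_PREFIX ) == 0;

    # Returning a string keeps the attribute with that value.
    return $attr_val;
}

my $kept     = same_site_href( undef, 'a', 'href', 'http://www.example.com/faq' );
my $rejected = same_site_href( undef, 'a', 'href', 'http://www.example.com.evil.test/' );

print defined $kept     ? "kept: $kept\n" : "rejected\n";
print defined $rejected ? "kept: $rejected\n" : "rejected\n";
```

               Because rules run after the built-in validation, a callback like this would presumably be
               combined with "AllowHref", for example:

                   AllowHref => 1,
                   Rules     => { a => { href => \&same_site_href } }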

           "BanList" vs "BanAllBut" vs "Rules"
               It is not necessary to use "BanList" or "BanAllBut", since everything can be done via "Rules".
               However, it may be simpler to write:

                   BanAllBut => [qw(p div span)]

               The logic works as follows:

                  * If BanAllBut exists, then ban everything but the tags in the list
                  * Add to the ban list any elements in BanList
                  * Any tags mentioned explicitly in Rules (eg a => 0, br => 1)
                    are added to or removed from the ban list
                  * A default rule of { '*' => 0 } would ban all tags except
                    those mentioned in Rules
                  * A default rule of { '*' => 1 } would allow all tags except
                    those disallowed in the ban list, or by explicit rules
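
               The precedence described in the list above can be modelled with a small predicate.  This is
               only a sketch of the documented logic for tags that already pass the built-in whitelists;
               tag_survives() is a hypothetical helper, not part of HTML::StripScripts, and the module's
               own implementation may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTML::StripScripts: a model of the
# documented interplay between BanAllBut, BanList and Rules.
sub tag_survives {
    my ( $config, $tag ) = @_;
    my $rules = $config->{Rules} || {};

    # BanList may be an arrayref or a hashref.
    my $ban_list = $config->{BanList} || [];
    my %banned =
      ref $ban_list eq 'HASH' ? %$ban_list : map { ( $_ => 1 ) } @$ban_list;

    # An explicit rule for the tag adds it to or removes it from the ban list.
    return $rules->{$tag} ? 1 : 0 if exists $rules->{$tag};

    # A false default rule bans every tag that has no explicit rule.
    return 0 if exists $rules->{'*'} && !$rules->{'*'};

    # BanAllBut bans everything outside its list; BanList adds to that.
    if ( my $keep = $config->{BanAllBut} ) {
        return 0 unless grep { $_ eq $tag } @$keep;
    }
    return $banned{$tag} ? 0 : 1;
}

my $config = {
    BanAllBut => [qw(p div span)],
    BanList   => ['div'],
    Rules     => { br => 1, span => 0 },
};

print "$_: ", ( tag_survives( $config, $_ ) ? 'allowed' : 'banned' ), "\n"
  for qw(p div span br img);
```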

METHODS

       This class provides the following methods:

       hss_init ()
           This method is called by new() and does the actual initialisation work for the new HTML::StripScripts
           object.

       input_start_document ()
           This method initializes the filter, and must be called once before starting on each HTML document  to
           be filtered.

       input_start ( TEXT )
           Handles a start tag from the input document.  TEXT must be the full text of the tag, including angle-
           brackets.

       input_end ( TEXT )
           Handles  an  end  tag  from the input document.  TEXT must be the full text of the end tag, including
           angle-brackets.

       input_text ( TEXT )
           Handles some non-tag text from the input document.

       input_process ( TEXT )
           Handles a processing instruction from the input document.

       input_comment ( TEXT )
           Handles an HTML comment from the input document.

       input_declaration ( TEXT )
           Handles a declaration from the input document.

       input_end_document ()
           Call this method to signal the end of the input document.

       filtered_document ()
           Returns the filtered document as a string.

SUBCLASSING

       The only reason for subclassing this module now is to add to the list of accepted  tags,  attributes  and
       styles (See "WHITELIST INITIALIZATION METHODS").  Everything else can be achieved with "Rules".

       The "HTML::StripScripts" class is subclassable.  Filter objects are plain hashes and "HTML::StripScripts"
       reserves  only  hash keys that start with "_hss".  The filter configuration can be set up by invoking the
       hss_init() method, which takes the same arguments as new().

OUTPUT METHODS

       The filter outputs a stream of  start  tags,  end  tags,  text,  comments,  declarations  and  processing
       instructions,  via  the  following  "output_*"  methods.   Subclasses may override these to intercept the
       filter output.

       The default implementations of the "output_*" methods pass the text  on  to  the  output()  method.   The
       default implementation of the output() method appends the text to a string, which can be fetched with the
       filtered_document() method once processing is complete.

       If  the  output()  method  or  the  individual  "output_*"  methods  are  overridden  in a subclass, then
       filtered_document() will not work in that subclass.

       output_start_document ()
           This method gets called once at the start of each HTML  document  passed  through  the  filter.   The
           default implementation does nothing.

       output_end_document ()
           This method gets called once at the end of each HTML document passed through the filter.  The default
           implementation does nothing.

       output_start ( TEXT )
           This method is used to output a filtered start tag.

       output_end ( TEXT )
           This method is used to output a filtered end tag.

       output_text ( TEXT )
           This method is used to output some filtered non-tag text.

       output_declaration ( TEXT )
           This method is used to output a filtered declaration.

       output_comment ( TEXT )
           This method is used to output a filtered HTML comment.

       output_process ( TEXT )
           This method is used to output a filtered processing instruction.

       output ( TEXT )
           This  method is invoked by all of the default "output_*" methods.  The default implementation appends
           the text to the string that the filtered_document() method will return.

       output_stack_entry ( TEXT )
           This method is invoked when a tag plus all text and nested HTML  content  within  the  tag  has  been
           processed. It adds the tag plus its content to the content for its parent tag.

REJECT METHODS

       When  the  filter encounters something in the input document which it cannot transform into an acceptable
       construct, it invokes one of the following "reject_*" methods to put something in the output document  to
       take the place of the unacceptable construct.

       The TEXT parameter is the full text of the unacceptable construct.

       The  default  implementations  of these methods output an HTML comment containing the text "filtered". If
       "EscapeFiltered" is set to true, then the rejected text is HTML escaped instead.

       Subclasses may override these methods, but should exercise caution.  The  TEXT  parameter  is  unfiltered
       input and may contain malicious constructs.

       reject_start ( TEXT )
       reject_end ( TEXT )
       reject_text ( TEXT )
       reject_declaration ( TEXT )
       reject_comment ( TEXT )
       reject_process ( TEXT )
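
       As a rough illustration of the two default behaviours, the following sketch bans the otherwise harmless
       "br" tag and filters the same tokens with and without "EscapeFiltered".  The helper name filter_with() is
       made up for this example; the options and methods are the ones described on this page.

```perl
use strict;
use warnings;
use HTML::StripScripts;

# Feed the same three tokens through a filter built from the given config.
sub filter_with {
    my ($config) = @_;
    my $hss = HTML::StripScripts->new($config);
    $hss->input_start_document;
    $hss->input_text('one');
    $hss->input_start('<br>');
    $hss->input_text('two');
    $hss->input_end_document;
    return $hss->filtered_document;
}

# "br" is not an XSS risk, so it is banned explicitly to trigger a reject.
my $commented = filter_with( { BanList => ['br'] } );
my $escaped   = filter_with( { BanList => ['br'], EscapeFiltered => 1 } );

print "$commented\n";    # rejected tag replaced by a "filtered" comment
print "$escaped\n";      # rejected tag shown as escaped text
```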

WHITELIST INITIALIZATION METHODS

       The  filter  refers  to various whitelists to determine which constructs are acceptable.  To modify these
       whitelists, subclasses can override the following methods.

       Each method is called once at object initialization time, and must return a reference to  a  nested  data
       structure.   These  references are installed into the object, and used whenever the filter needs to refer
       to a whitelist.

       The default implementations of these methods can be invoked as class methods.

       See examples/tags/ and examples/declaration/ for examples of how to override these methods.

       init_context_whitelist ()
           Returns a reference to the "Context" whitelist, which determines which tags may appear at each  point
           in the document, and which other tags may be nested within them.

           It is a hash, and the keys are context names, such as "Flow" and "Inline".

           The  values  in  the hash are hashrefs.  The keys in these subhashes are lowercase tag names, and the
           values are context names, specifying the context that the tag  provides  to  any  other  tags  nested
           within it.

           The  special context "EMPTY" as a value in a subhash indicates that nothing can be nested within that
           tag.

       init_attrib_whitelist ()
           Returns a reference to the "Attrib" whitelist, which determines which attributes each  tag  can  have
           and the values that those attributes can take.

           It is a hash, and the keys are lowercase tag names.

           The  values in the hash are hashrefs.  The keys in these subhashes are lowercase attribute names, and
           the values are attribute value class names, which are short strings describing  the  type  of  values
           that the attribute can take, such as "color" or "number".
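
           A subclass that extends this whitelist might look like the sketch below.  The package name
           My::Filter and the choice of the "tabindex" attribute are assumptions made for illustration; the
           "number" value class is the one mentioned above.  The default structure is copied before it is
           changed, on the assumption that it may be shared between filter objects.

```perl
use strict;
use warnings;

package My::Filter;    # hypothetical subclass name
use base 'HTML::StripScripts';

sub init_attrib_whitelist {
    my ($self) = @_;
    my $default = $self->SUPER::init_attrib_whitelist;

    # Shallow-copy the top level, then copy the one entry being extended,
    # so that the default whitelist itself is left untouched.
    my %whitelist = %$default;
    $whitelist{a} = { %{ $default->{a} || {} }, tabindex => 'number' };

    return \%whitelist;
}

package main;

# The override is picked up when the filter object is initialised.
my $hss       = My::Filter->new( { AllowHref => 1 } );
my $whitelist = My::Filter->init_attrib_whitelist;

print "tabindex class for <a>: $whitelist->{a}{tabindex}\n";
```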

       init_attval_whitelist ()
           Returns  a reference to the "AttVal" whitelist, which is a hash that maps attribute value class names
           from the "Attrib" whitelist to coderefs to subs to validate (and optionally transform)  a  particular
           attribute value.

           The filter calls the attribute value validation subs with the following parameters:

           "filter"
               A reference to the filter object.

           "tagname"
               The lowercase name of the tag in which the attribute appears.

           "attrname"
               The name of the attribute.

           "attrval"
               The attribute value found in the input document, in canonical form (see "CANONICAL FORM").

           The validation sub can return undef to indicate that the attribute should be removed from the tag, or
           it can return the new value for the attribute, in canonical form.

       init_style_whitelist ()
           Returns  a  reference  to  the  "Style"  whitelist,  which  determines which CSS style directives are
           permitted  in  "style"  tag  attributes.   The  keys  are   value   names   such   as   "color"   and
           "background-color", and the values are class names to be used as keys into the "AttVal" whitelist.

       init_deinter_whitelist ()
           Returns  a reference to the "DeInter" whitelist, which determines which inline tags the filter should
           attempt to automatically de-interleave if they are encountered interleaved.  For example, the  filter
           will transform:

             <b>hello <i>world</b> !</i>

           Into:

             <b>hello <i>world</i></b><i> !</i>

           because both "b" and "i" appear as keys in the "DeInter" whitelist.
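
           The same transformation can be reproduced by feeding the tokens by hand, as in this short sketch.
           It uses only the input methods described under "METHODS"; the choice of the "Inline" context is an
           assumption that suits these two tags.

```perl
use strict;
use warnings;
use HTML::StripScripts;

my $hss = HTML::StripScripts->new( { Context => 'Inline' } );

$hss->input_start_document;
$hss->input_start('<b>');
$hss->input_text('hello ');
$hss->input_start('<i>');
$hss->input_text('world');
$hss->input_end('</b>');    # closes <b> while <i> is still open
$hss->input_text(' !');
$hss->input_end('</i>');
$hss->input_end_document;

my $filtered = $hss->filtered_document;
print "$filtered\n";
```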

CHARACTER DATA PROCESSING

       These methods transform attribute values and non-tag text from the input document into canonical form
       (see "CANONICAL FORM"), and transform text in canonical form into a suitable form for the output
       document.

       text_to_canonical_form ( TEXT )
           This method is used to reduce non-tag text from the input document to canonical form before passing
           it to the filter_text() method.

           The default implementation unescapes all entities that map to "US-ASCII" characters other than
           ampersand, and replaces any ampersands that don't form part of valid entities with "&amp;".

       quoted_to_canonical_form ( VALUE )
           This method is used to reduce attribute values quoted with doublequotes or singlequotes to canonical
           form before passing it to the handler subs in the "AttVal" whitelist.

           The default behavior is the same as that of "text_to_canonical_form()", plus it converts any CR, LF
           or TAB characters to spaces.

       unquoted_to_canonical_form ( VALUE )
           This method is used to reduce attribute values without quotes to canonical form before passing it to
           the handler subs in the "AttVal" whitelist.

           The default implementation simply replaces all ampersands with "&amp;", since that corresponds with
           the way most browsers treat entities in unquoted values.

       canonical_form_to_text ( TEXT )
           This method is used to convert the text in canonical form returned by the filter_text() method to a
           form suitable for inclusion in the output document.

           The default implementation runs anything that doesn't look like a valid entity through the
           escape_html_metachars() method.

       canonical_form_to_attval ( ATTVAL )
           This method is used to convert the text in canonical form returned by the "AttVal" handler subs to a
           form suitable for inclusion in doublequotes in the output tag.

           The default implementation converts CR, LF and TAB characters to a single space, and runs anything
           that doesn't look like a valid entity through the escape_html_metachars() method.

       validate_href_attribute ( TEXT )
           If the "AllowHref" filter configuration option is set, then this method is used to validate "href"
           type attribute values. TEXT is the attribute value in canonical form. Returns a possibly modified
           attribute value (in canonical form) or "undef" to reject the attribute.

           The default implementation allows only absolute "http" and "https" URLs, permits port numbers and
           query strings, and imposes reasonable length limits.

           It does not URI escape the query string, and it does not guarantee properly formatted URIs, it just
           tries to give safe URIs. You can always use an attribute callback (see "Attribute Callbacks") to
           provide stricter handling.

       validate_mailto ( TEXT )
           If the "AllowMailto" filter configuration option is set, then this method is used to validate "href"
           type attribute values which begin with "mailto:". TEXT is the attribute value in canonical form.
           Returns a possibly modified attribute value (in canonical form) or "undef" to reject the attribute.

           This uses a lightweight regex and does not guarantee that email addresses are properly formatted. You
           can always use an attribute callback (see "Attribute Callbacks") to provide stricter handling.

       validate_src_attribute ( TEXT )
           If the "AllowSrc" filter configuration option is set, then this method is used to validate "src" type
           attribute values. TEXT is the attribute value in canonical form. Returns a possibly modified
           attribute value (in canonical form) or "undef" to reject the attribute.

           The default implementation behaves as validate_href_attribute().

OTHER METHODS TO OVERRIDE

       As well as the output, reject, init and cdata methods listed above, it might make sense for subclasses to
       override the following methods:

       filter_text ( TEXT )
           This method will be invoked to filter blocks of non-tag text in the input document. Both input and
           output are in canonical form, see "CANONICAL FORM".

           The default implementation does no filtering.

       escape_html_metachars ( TEXT )
           This method is used to escape all HTML metacharacters in TEXT. The return value must be a copy of
           TEXT with metacharacters escaped.

           The default implementation escapes a minimal set of metacharacters for security against XSS
           vulnerabilities. The set of characters to escape is a compromise between the need for security and
           the need to ensure that the filter will work for documents in as many different character sets as
           possible.

           Subclasses which make strong assumptions about the document character set will be able to escape much
           more aggressively.

       strip_nonprintable ( TEXT )
           Returns a copy of TEXT with runs of nonprintable characters replaced with spaces or some other
           harmless string. Avoids replacing anything with the empty string, as that can lead to other security
           issues.

           The default implementation strips out only NULL characters, in order to avoid scrambling text for as
           many different character sets as possible.

           Subclasses which make some sort of assumption about the character set in use will be able to have a
           much wider definition of a nonprintable character, and hence a more secure strip_nonprintable()
           implementation.
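
           As one possible sketch of such a wider definition, the function below assumes that documents are
           ASCII or UTF-8, so that control bytes never form part of a printable character.  The function name
           strip_control_chars() and the exact character ranges are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical body for a stricter strip_nonprintable(), assuming ASCII or
# UTF-8 input.
sub strip_control_chars {
    my ($text) = @_;

    # Replace each run of control characters (other than TAB, LF and CR)
    # with a single space; never with the empty string.
    $text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/ /g;
    return $text;
}

my $clean = strip_control_chars("java\x00\x01script");
print "$clean\n";
```

           A subclass could then delegate to it, for example:

               package My::Filter;
               use base 'HTML::StripScripts';
               sub strip_nonprintable { my ( $self, $text ) = @_; return strip_control_chars($text) }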

ATTRIBUTE VALUE HANDLER SUBS

       References to the following subs appear in the "AttVal" whitelist returned by the init_attval_whitelist()
       method.

       _hss_attval_style ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
           Attribute value handler for the "style" attribute.

       _hss_attval_size ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
           Attribute value handler for attributes whose values are some sort of size or length.

       _hss_attval_number ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
           Attribute value handler for attributes whose values are a simple integer.

       _hss_attval_color ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
           Attribute value handler for color attributes.

       _hss_attval_text ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
           Attribute value handler for text attributes.

       _hss_attval_word ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
           Attribute value handler for attributes whose values must consist of a single short word, with minus
           characters permitted.

       _hss_attval_wordlist ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
           Attribute value handler for attributes whose values must consist of one or more words, separated by
           spaces and/or commas.

       _hss_attval_wordlistq ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
           Attribute value handler for attributes whose values must consist of one or more words, separated by
           commas, with optional doublequotes around words and spaces allowed within the doublequotes.

       _hss_attval_href ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
           Attribute value handler for "href" type attributes. If the "AllowHref" or "AllowMailto"
           configuration options are set, uses the validate_href_attribute() method to check the attribute
           value.

       _hss_attval_src ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
           Attribute value handler for "src" type attributes. If the "AllowSrc" configuration option is set,
           uses the validate_src_attribute() method to check the attribute value.

       _hss_attval_stylesrc ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
           Attribute value handler for "src" type style pseudo attributes.

       _hss_attval_novalue ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
           Attribute value handler for attributes that have no value or a value that is ignored. Just returns
           the attribute name as the value.

CANONICAL FORM

       Many  of  the  methods  described  above  deal  with text from the input document, encoded in what I call
       "canonical form", defined as follows:

       All characters other than ampersands represent themselves.  Literal ampersands are  encoded  as  "&amp;".
       Non  "US-ASCII" characters may appear as literals in whatever character set is in use, or they may appear
       as named or numeric HTML entities such as "&aelig;", "&#31337;" and  "&#xFF;".   Unknown  named  entities
       such as "&foo;" may appear.

       The idea is to be able to reduce input text to a minimal form, without making too many assumptions about
       the character set in use.

PRIVATE METHODS

       The following methods are internal to this class, and should not be invoked from elsewhere. Subclasses
       should not use or override these methods.

       _hss_prepare_ban_list ( CFG )
           Returns a hash ref representing all the banned tags, based on the values of BanList and BanAllBut.

       _hss_prepare_rules ( CFG )
           Returns a hash ref representing the tag and attribute rules (See "Rules").

           Returns undef if no filters are specified, in which case the attribute filter code has very little
           performance impact. If any rules are specified, then every tag and attribute is checked.

       _hss_get_attr_filter ( DEFAULT_FILTERS, TAG_FILTERS, ATTR_NAME )
           Returns the attribute filter rule to apply to this particular attribute.

           Checks for:

             - a named attribute rule in a named tag
             - a default * attribute rule in a named tag
             - a named attribute rule in the default * rules
             - a default * attribute rule in the default * rules
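
           The lookup order above can be pictured with a short sketch.  find_attr_rule() is a hypothetical
           helper written for this page, not the module's own code, and it assumes both sets of rules are
           plain hashes keyed by attribute name as described under "Rules".

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's own code: walks the four documented
# lookups in order and returns the first rule it finds.
sub find_attr_rule {
    my ( $default_rules, $tag_rules, $attr_name ) = @_;

    for my $lookup (
        [ $tag_rules,     $attr_name ],    # named attribute in a named tag
        [ $tag_rules,     '*' ],           # default attribute in a named tag
        [ $default_rules, $attr_name ],    # named attribute in the default rules
        [ $default_rules, '*' ],           # default attribute in the default rules
      )
    {
        my ( $rules, $key ) = @$lookup;
        next unless ref $rules eq 'HASH';
        return $rules->{$key} if exists $rules->{$key};
    }
    return;    # no rule applies
}

my $default_rules = { title => 1, '*' => 0 };
my $a_rules       = { href => '^http://www\.example\.com/' };

my $href_rule  = find_attr_rule( $default_rules, $a_rules, 'href' );
my $title_rule = find_attr_rule( $default_rules, $a_rules, 'title' );
my $other_rule = find_attr_rule( $default_rules, $a_rules, 'onclick' );

print "href: $href_rule, title: $title_rule, onclick: $other_rule\n";
```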

       _hss_join_attribs ( FILTERED_ATTRIBS )
           Accepts a hash ref containing the attribute names as the keys, and the attribute values as the
           values. Escapes them and returns a string ready for output to HTML.

       _hss_decode_numeric ( NUMERIC )
           Returns the string that should replace the numeric entity NUMERIC in the text_to_canonical_form()
           method.

       _hss_tag_is_banned ( TAGNAME )
           Returns true if the lower case tag name TAGNAME is on the list of harmless tags that the filter is
           configured to block, false otherwise.

       _hss_get_to_valid_context ( TAG )
           Tries to get the filter to a context in which the tag TAG is allowed, by introducing extra end tags
           or start tags if necessary. TAG can be either the lower case name of a tag or the string 'CDATA'.

           Returns 1 if an allowed context is reached, or 0 if there's no reasonable way to get to an allowed
           context and the tag should just be rejected.

       _hss_close_innermost_tag ()
           Closes the innermost open tag.

       _hss_context ()
           Returns the current named context of the filter.

       _hss_valid_in_context ( TAG, CONTEXT )
           Returns true if the lowercase tag name TAG is valid in context CONTEXT, false otherwise.

       _hss_valid_in_current_context ( TAG )
           Returns true if the lowercase tag name TAG is valid in the filter's current context, false otherwise.

BUGS AND LIMITATIONS

       Performance
           This  module  does  a  lot  of  work  to ensure that tags are correctly nested and are not left open,
           causing unnecessary overhead for applications where that doesn't matter.

           Such applications may benefit from using the  more  lightweight  HTML::Scrubber::StripScripts  module
           instead.

       Strictness
           URIs  and  email  addresses are cleaned up to be safe, but not necessarily accurate.  That would have
           required adding dependencies.  Attribute callbacks can be used to add this functionality if required,
           or the validation methods can be overridden.

           By default, filtered HTML may not be valid strict XHTML, for instance empty required  attributes  may
           be outputted.  However, with "Rules", it should be possible to force the HTML to validate.

       REPORTING BUGS
           Please  report  any bugs or feature requests to bug-html-stripscripts@rt.cpan.org, or through the web
           interface at <http://rt.cpan.org>.

AUTHOR

       Original author Nick Cleaton <nick@cleaton.net>

       New code added and module maintained by Clinton Gormley <clint@traveljury.com>

COPYRIGHT

       Copyright (C) 2003 Nick Cleaton.  All Rights Reserved.

       Copyright (C) 2007 Clinton Gormley.  All Rights Reserved.

LICENSE

       This module is free software; you can redistribute it and/or modify it  under  the  same  terms  as  Perl
       itself.

perl v5.10.0                                       2009-11-05                                  StripScripts(3pm)