Ubuntu Manpage: HTML::TreeBuilder::XPath - add XPath support to HTML::TreeBuilder

trusty (3) HTML::TreeBuilder::XPath.3pm.gz

Provided by: libhtml-treebuilder-xpath-perl_0.14-1_all

NAME

       HTML::TreeBuilder::XPath - add XPath support to HTML::TreeBuilder

SYNOPSIS

         use HTML::TreeBuilder::XPath;
         my $tree= HTML::TreeBuilder::XPath->new;
         $tree->parse_file( "mypage.html");
         my $nb=$tree->findvalue( '/html/body//p[@class="section_title"]/span[@class="nb"]');
         my $id=$tree->findvalue( '/html/body//p[@class="section_title"]/@id');

         my $p= $tree->findnodes( '//p[@id="toto"]')->[0];
         my $link_texts= $p->findvalue( './a'); # the texts of all a elements in $p
         $tree->delete; # to avoid memory leaks, if you parse many HTML documents

DESCRIPTION

       This module adds typical XPath methods to HTML::TreeBuilder, to make it easy to query a document.

METHODS

       Extra methods added both to the tree object and to each element:

   findnodes ($path)
       Returns a list of nodes found by $path.  In scalar context returns a "Tree::XPathEngine::NodeSet"
       object.
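
       The difference between the two calling contexts can be sketched as follows. The HTML snippet and
       variable names are invented for illustration; indexing the NodeSet with ->[0] follows the SYNOPSIS,
       and ->size is the method mentioned under find.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample document, parsed from a string instead of a file.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse('<html><body><p class="a">one</p><p class="a">two</p><p>three</p></body></html>');
$tree->eof;

# List context: a plain Perl list of matching elements.
my @nodes = $tree->findnodes('//p[@class="a"]');
my $count = scalar @nodes;

# Scalar context: a Tree::XPathEngine::NodeSet object.
my $nodeset    = $tree->findnodes('//p[@class="a"]');
my $size       = $nodeset->size;
my $first_text = $nodeset->[0]->as_text;   # same indexing as in the SYNOPSIS

$tree->delete;
```

       Both calls run the same query; only the calling context changes what comes back.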

   findnodes_as_string ($path)
       Returns the text values of the nodes, as one string.

   findnodes_as_strings ($path)
       Returns a list of the values of the result nodes.

   findvalue ($path)
       Returns either a "Tree::XPathEngine::Literal", a "Tree::XPathEngine::Boolean" or a
       "Tree::XPathEngine::Number" object. If the path returns a NodeSet, $nodeset->xpath_to_literal is called
       automatically for you (and thus a "Tree::XPathEngine::Literal" is returned). Note that for each of the
       objects stringification is overloaded, so you can just print the value found, or manipulate it in the
       ways you would a normal perl value (e.g. using regular expressions).
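
       A minimal sketch of using the returned object like an ordinary Perl string, assuming a made-up
       document with a single matching element (the markup and variable names are illustrative only):

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample document.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse('<html><body><p class="section_title">Items: <span class="nb">42</span></p></body></html>');
$tree->eof;

# The path selects a node set, so findvalue hands back a literal object.
my $nb = $tree->findvalue('//p[@class="section_title"]/span[@class="nb"]');

# Stringification is overloaded: interpolation and regular expressions just work.
my $as_string = "$nb";
my ($digits)  = $nb =~ /(\d+)/;

$tree->delete;
```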

   findvalues ($path)
       Returns the values of the matching nodes as a list. This is mostly the same as findnodes_as_strings,
       except that the elements of the list are objects (with overloaded stringification) instead of plain
       strings.
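
       The two list-returning methods can be compared side by side. This is only a sketch on an invented
       list document; the joined strings are built to show that both results stringify the same way.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample document.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse('<html><body><ul><li>red</li><li>green</li><li>blue</li></ul></body></html>');
$tree->eof;

# Plain strings, one per matching node.
my @strings = $tree->findnodes_as_strings('//li');

# Objects with overloaded stringification, one per matching node.
my @values = $tree->findvalues('//li');

my $joined_strings = join '|', @strings;
my $joined_values  = join '|', @values;   # join stringifies each object

$tree->delete;
```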

   exists ($path)
       Returns true if $path matches at least one node.

   matches ($path)
       Returns true if the element matches the path.
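
       As a rough illustration, exists can guard a query on the whole tree, while matches tests a single
       element against a path. The document and the id "intro" are assumptions made for this sketch.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample document.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse('<html><body><p id="intro">See <a href="/docs">the docs</a>.</p></body></html>');
$tree->eof;

# exists: is there at least one node for this path?
my $has_link  = $tree->exists('//a[@href]');
my $has_table = $tree->exists('//table');

# matches: is this particular element among the nodes selected by the path?
my ($p) = $tree->findnodes('//p[@id="intro"]');
my $is_intro = $p->matches('//p[@id="intro"]');

$tree->delete;
```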

   find ($path)
       The find function takes an XPath expression (a string) and returns either a Tree::XPathEngine::NodeSet
       object containing the nodes it found (or empty if no nodes matched the path), or one of
       Tree::XPathEngine::Literal (a string), Tree::XPathEngine::Number, or Tree::XPathEngine::Boolean. It
       should always return something, and you can use ->isa() to find out what it returned. If you need to
       check how many nodes it found, check $nodeset->size.  See Tree::XPathEngine::NodeSet.

   as_XML_compact
       HTML::TreeBuilder's "as_XML" output is not very readable, so this module adds a method that can be
       used as a simple replacement for it.  It escapes only '<', '>' and '&' (plus '"' in attribute
       values), and wraps CDATA elements in CDATA sections.

       Note that the XML is not guaranteed to be valid at this point. Nothing is done about the encoding
       of the string. Patches or just ideas of how it could work are welcome.

   as_XML_indented
       Same as as_XML, except that the output is indented.
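
       A short sketch of calling both serializers on a parsed tree. The input document is invented, and
       the exact layout of the output is not shown because it may differ between versions.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample document containing a character that must be escaped.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse('<html><body><p>Fish &amp; chips</p></body></html>');
$tree->eof;

# Compact serialization: only <, > and & are escaped in text.
my $compact = $tree->as_XML_compact;

# Indented serialization of the same tree.
my $indented = $tree->as_XML_indented;

print $compact, "\n";
print $indented, "\n";

$tree->delete;
```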

REPOSITORY

       <https://github.com/mirod/HTML--TreeBuilder--XPath>

AUTHOR

       Michel Rodriguez, <mirod@cpan.org>

COPYRIGHT AND LICENSE

       Copyright (C) 2006-2011 by Michel Rodriguez

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl
       itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.
