Ubuntu Manpage: xsxp - eXtremely Simple Xml Parser

NAME

       xsxp - eXtremely Simple Xml Parser

SYNOPSIS

       package require Tcl  8.4

       package require xsxp  1

       package require xml

       xsxp::parse xml

       xsxp::fetch pxml path ?part?

       xsxp::fetchall pxml_list path ?part?

       xsxp::only pxml tagname

       xsxp::prettyprint pxml ?chan?

________________________________________________________________________________________________________________

DESCRIPTION

       This  package provides a simple interface to parse XML into a pure-value list.  It also provides accessor
       routines to pull out specific subtags, not unlike DOM access.  This package was written for and  is  used
       by Darren New's Amazon S3 access package.

       This is pretty lame, but I needed something like this for S3, and at the time, TclDOM would not work with
       the new 8.5 Tcl due to version number problems.

       In addition, this is a pure-value implementation: there is no garbage to clean up if, for example, an
       error is thrown.  This simplifies the code for sufficiently small XML documents, which is what Amazon's
       S3 guarantees.

       Copyright 2006 Darren New. All Rights Reserved.  NO WARRANTIES OF ANY TYPE ARE PROVIDED.  COPYING OR  USE
       INDEMNIFIES  THE  AUTHOR IN ALL WAYS.  This software is licensed under essentially the same terms as Tcl.
       See LICENSE.txt for the terms.

COMMANDS

       The package implements five rather simple procedures.  One parses, one is for  debugging,  and  the  rest
       pull various parts of the parsed document out for processing.

       xsxp::parse xml
              This parses an XML document (using the standard xml tcllib module in a SAX-like, event-driven way)
              and builds a nested list, which it returns if parsing succeeds. The return value is referred to
              herein as a "pxml", or "parsed xml". The list consists of two or more elements:

              •      The first element is the name of the tag.

              •      The  second  element  is  an  array-get  formatted  list  of  key/value pairs. The keys are
                     attribute names and the values are attribute values. This is an empty list if there are  no
                     attributes on the tag.

              •      The  third  through  end  elements  are  the  children  of the node, if any. Each child is,
                     recursively, a pxml.

              •      Note that if the zeroth element, i.e. the tag name, is "%PCDATA", then the attributes will
                     be empty and the third element will be the text of the element. In addition, if an
                     element's contents consist only of PCDATA, it will have only one child, and all the PCDATA
                     will be concatenated. In other words, this parser works poorly for XML with elements that
                     contain both child tags and PCDATA.  Since Amazon S3 does not do this (and for that matter
                     most uses of XML where XML is a poor choice don't do this), this is probably not a serious
                     limitation.
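              As an illustration, the value below is a pxml written out by hand for the invented document
              <Owner id="42"><ID>abc123</ID><DisplayName>alice</DisplayName></Owner>. It is built with list
              instead of xsxp::parse so that the sketch runs without the xml package; that xsxp::parse returns
              exactly this list for that input is an assumption based on the rules above.

```tcl
# A pxml value written by hand, following the documented layout:
#   {tagname attribute-list child child ...}
# Text content is a child whose tag name is %PCDATA.
set pxml [list Owner {id 42} \
    [list ID {} [list %PCDATA {} abc123]] \
    [list DisplayName {} [list %PCDATA {} alice]]]

# Element 0 is the tag name.
set tag [lindex $pxml 0]

# Element 1 is an array-get style list, so it can be loaded into an array.
array set attrs [lindex $pxml 1]
set ownerId $attrs(id)

# Elements 2..end are the children, each itself a pxml.
set children [lrange $pxml 2 end]

# The first child's only child is a %PCDATA node; its element 2 is the text.
set firstText [lindex [lindex $children 0] 2 2]

puts "tag=$tag id=$ownerId children=[llength $children] text=$firstText"
# prints: tag=Owner id=42 children=2 text=abc123
```

              Because a pxml is an ordinary Tcl value, plain list commands such as lindex and lrange are enough
              to read it; the accessors described below add tag-based lookup on top of that.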

       xsxp::fetch pxml path ?part?
              pxml is a parsed XML, as returned from xsxp::parse.  path is a list of  element  tag  names.  Each
              element  is  the  name  of a child to look up, optionally followed by a hash ("#") and a string of
              digits. An empty list or an initial empty element selects pxml. If no hash sign  is  present,  the
              behavior  is  as  if  "#0"  had been appended to that element. (In addition to a list, slashes can
              separate subparts where convenient.)

              Each element of path scans the children at the current level for the n'th instance (counting from
              zero) of a child whose tag matches the part of the element before the hash sign. If an element is
              simply "#" followed by digits, the child at that index is selected, regardless of the tags of the
              children. Hence, an element of "#3" always selects the fourth child of the node under consideration.
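              The lookup rules above can be sketched in plain Tcl. The procedures select_child and walk_path are
              hypothetical helpers written for this page, not the code inside xsxp, and they return only the
              selected element (what %ALL would give). They assume each path element is a tag, a tag followed
              by "#n", or a bare "#n", with n counted from zero; the sample HTML structure is invented.

```tcl
# Hypothetical helpers illustrating the documented path rules;
# this is not the xsxp implementation.

# Select one child of pxml for a single path element of the form
# "tag", "tag#n" or "#n", where n counts from zero.
proc select_child {pxml elem} {
    set hash [string first "#" $elem]
    if {$hash < 0} {
        # No hash sign: behave as if "#0" had been appended.
        set tag $elem
        set n 0
    } else {
        set tag [string range $elem 0 [expr {$hash - 1}]]
        if {[scan [string range $elem [expr {$hash + 1}] end] %d n] != 1} {
            error "bad index in path element \"$elem\""
        }
    }
    set children [lrange $pxml 2 end]
    if {$tag eq ""} {
        # A bare "#n" picks the n'th child whatever its tag.
        if {$n < 0 || $n >= [llength $children]} {
            error "no child at index $n"
        }
        return [lindex $children $n]
    }
    # Otherwise count only the children whose tag matches.
    foreach child $children {
        if {[lindex $child 0] eq $tag} {
            if {$n == 0} {
                return $child
            }
            incr n -1
        }
    }
    error "no child matching \"$elem\""
}

# Follow a whole path, given as a list, with slashes, or both.
proc walk_path {pxml path} {
    foreach elem [split [join $path /] /] {
        # An empty element (such as a leading one) keeps the current node.
        if {$elem eq ""} {
            continue
        }
        set pxml [select_child $pxml $elem]
    }
    return $pxml
}

# A hand-built <html> whose <body> holds five <p> elements,
# the last of which contains a <b>.
set b [list b {} [list %PCDATA {} bold]]
set p [list p {} [list %PCDATA {} text]]
set html [list html {} [list head {}] \
    [list body {} $p $p $p $p [list p {} [list %PCDATA {} "fifth "] $b]]]

puts [walk_path $html {body p#4 b}]   ;# prints: b {} {%PCDATA {} bold}
```

              Here walk_path $html body/p#4/b selects the same element, since the sketch treats slashes and list
              elements as interchangeable separators, and walk_path $html {#1} picks the body element purely by
              position.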

              part defaults to "%ALL". It can be one of the following case-sensitive terms:

              %ALL   returns the entire selected element.

              %TAGNAME
                      returns lindex 0 of the selected element, i.e. its tag name.

              %ATTRIBUTES
                      returns lindex 1 of the selected element, i.e. its attribute list.

              %CHILDREN
                      returns lrange 2 end of the selected element, i.e. the list of its child elements.

              %PCDATA
                      returns a concatenation of all the bodies of direct children of this node whose tag is
                      %PCDATA.  It throws an error if no such children are found. That is, part=%PCDATA returns
                      the textual content found directly in that node, but not the text inside its child nodes.

              %PCDATA?
                     is like %PCDATA, but returns an empty string if no PCDATA is found.
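              The part selection can likewise be sketched as a plain Tcl procedure. select_part is a hypothetical
              helper, not part of xsxp; it applies the documented meaning of each part to an element that has
              already been selected. The sample element is hand-built with mixed content to show that %PCDATA
              ignores text nested in child elements; as noted under xsxp::parse, a real document with mixed
              content may not parse into this shape.

```tcl
# Hypothetical helper mirroring the documented meaning of each part;
# this is not the xsxp implementation.
proc select_part {pxml part} {
    switch -exact -- $part {
        %ALL        { return $pxml }
        %TAGNAME    { return [lindex $pxml 0] }
        %ATTRIBUTES { return [lindex $pxml 1] }
        %CHILDREN   { return [lrange $pxml 2 end] }
        %PCDATA -
        %PCDATA? {
            # Concatenate the text of direct %PCDATA children only.
            set text ""
            set found 0
            foreach child [lrange $pxml 2 end] {
                if {[lindex $child 0] eq "%PCDATA"} {
                    append text [lindex $child 2]
                    set found 1
                }
            }
            # %PCDATA insists on finding text; %PCDATA? settles for "".
            if {!$found && $part eq "%PCDATA"} {
                error "no PCDATA children in <[lindex $pxml 0]>"
            }
            return $text
        }
        default {
            error "unknown part \"$part\""
        }
    }
}

# A hand-built element <p class="intro">Hello, <b>world</b>!</p>.
set node [list p {class intro} \
    [list %PCDATA {} "Hello, "] \
    [list b {} [list %PCDATA {} world]] \
    [list %PCDATA {} "!"]]

puts [select_part $node %TAGNAME]   ;# prints: p
puts [select_part $node %PCDATA]    ;# prints: Hello, !
```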

       For example, to fetch the text of the first b (bold) element in the fifth paragraph of the body of a
       parsed HTML document:

              xsxp::fetch $pxml {body p#4 b} %PCDATA

       xsxp::fetchall pxml_list path ?part?
              This iterates over each pxml in pxml_list (which must be a list of pxmls), selecting the indicated
              path (and part) from each one, and returns a new list of the selected data.

              For example, pxml_list might be the %CHILDREN of a particular element, and the path and part might
              select from each child a sub-element in which we're interested.
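              The iteration itself is simple to sketch. fetchall_text below is a hypothetical, simplified
              stand-in, not xsxp code: its path is a single child tag and its part is always the PCDATA text,
              whereas xsxp::fetchall accepts any path and part. The S3-style element names in the sample data
              are assumptions made for the example.

```tcl
# Hypothetical, simplified stand-in for xsxp::fetchall; not the xsxp code.

# Return the concatenated %PCDATA text of the first child tagged $tag.
proc child_text {pxml tag} {
    foreach child [lrange $pxml 2 end] {
        if {[lindex $child 0] eq $tag} {
            set text ""
            foreach grandchild [lrange $child 2 end] {
                if {[lindex $grandchild 0] eq "%PCDATA"} {
                    append text [lindex $grandchild 2]
                }
            }
            return $text
        }
    }
    error "no <$tag> child in <[lindex $pxml 0]>"
}

# Apply the same selection to every pxml in the list, collecting results.
proc fetchall_text {pxml_list tag} {
    set result {}
    foreach pxml $pxml_list {
        lappend result [child_text $pxml $tag]
    }
    return $result
}

# Two hand-built <Contents> entries of an S3-style bucket listing.
set contents [list \
    [list Contents {} \
        [list Key {} [list %PCDATA {} a.txt]] \
        [list Size {} [list %PCDATA {} 10]]] \
    [list Contents {} \
        [list Key {} [list %PCDATA {} b.txt]] \
        [list Size {} [list %PCDATA {} 20]]]]

puts [fetchall_text $contents Key]   ;# prints: a.txt b.txt
```

              By the rules described above, xsxp::fetchall $contents Key %PCDATA would be expected to return the
              same list for this input.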

       xsxp::only pxml tagname
              This iterates over the direct children of pxml, selects only those whose tag is tagname, and
              returns a list of the matching elements.
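              A minimal sketch of the same filtering in plain Tcl follows. only_tag is a hypothetical helper
              written for this page, not the xsxp implementation, and the bucket-listing element names are
              assumptions used only for illustration.

```tcl
# Hypothetical equivalent of the documented xsxp::only behaviour.
proc only_tag {pxml tagname} {
    set matches {}
    # Only direct children are examined; deeper descendants are ignored.
    foreach child [lrange $pxml 2 end] {
        if {[lindex $child 0] eq $tagname} {
            lappend matches $child
        }
    }
    return $matches
}

# A hand-built listing with one <Name> and two <Contents> children.
set listing [list ListBucketResult {} \
    [list Name {} [list %PCDATA {} mybucket]] \
    [list Contents {} [list Key {} [list %PCDATA {} a.txt]]] \
    [list Contents {} [list Key {} [list %PCDATA {} b.txt]]]]

puts [llength [only_tag $listing Contents]]   ;# prints: 2
```

              Filtering first and then fetching from each match combines naturally with xsxp::fetchall; for this
              data, xsxp::fetchall [xsxp::only $listing Contents] Key %PCDATA would be expected to yield the list
              a.txt b.txt.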

       xsxp::prettyprint pxml ?chan?
              This writes a pretty-printed version of pxml to chan (default stdout).

BUGS, IDEAS, FEEDBACK

       This  document,  and  the package it describes, will undoubtedly contain bugs and other problems.  Please
       report such in the category amazon-s3  of  the  Tcllib  Trackers  [http://core.tcl.tk/tcllib/reportlist].
       Please also report any ideas for enhancements you may have for either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

       Note  further  that  attachments  are strongly preferred over inlined patches. Attachments can be made by
       going to the Edit form of the ticket immediately after its creation, and then using the left-most  button
       in the secondary navigation bar.

KEYWORDS

       dom, parser, xml

COPYRIGHT

       2006 Darren New. All Rights Reserved.
