Ubuntu Manpage: xsxp - eXtremely Simple Xml Parser

NAME

       xsxp - eXtremely Simple Xml Parser

SYNOPSIS

       package require Tcl  8.4

       package require xsxp  1

       package require xml

       xsxp::parse xml

       xsxp::fetch pxml path ?part?

       xsxp::fetchall pxml_list path ?part?

       xsxp::only pxml tagname

       xsxp::prettyprint pxml ?chan?

_________________________________________________________________________________________________

DESCRIPTION

       This  package  provides  a  simple interface to parse XML into a pure-value list.  It also
       provides accessor routines to pull out specific subtags,  not  unlike  DOM  access.   This
       package was written for and is used by Darren New's Amazon S3 access package.

       This  is  pretty  lame,  but  I needed something like this for S3, and at the time, TclDOM
       would not work with the new 8.5 Tcl due to version number problems.

       In addition, this is a pure-value implementation. There is no garbage to clean up  in  the
       event of a thrown error, for example.  This simplifies the code for sufficiently small XML
       documents, which is what Amazon's S3 guarantees.

       Copyright 2006 Darren New. All Rights Reserved.  NO WARRANTIES OF ANY TYPE  ARE  PROVIDED.
       COPYING  OR  USE  INDEMNIFIES  THE  AUTHOR  IN  ALL WAYS.  This software is licensed under
       essentially the same terms as Tcl. See LICENSE.txt for the terms.

COMMANDS

       The package implements five rather simple procedures.  One parses, one is  for  debugging,
       and the rest pull various parts of the parsed document out for processing.

       xsxp::parse xml
              This  parses an XML document (using the standard xml tcllib module in a SAX sort of
              way) and builds a data structure which it returns if  the  parsing  succeeded.  The
              return  value is referred to herein as a "pxml", or "parsed xml". The list consists
              of two or more elements:

              •      The first element is the name of the tag.

              •      The second element is an array-get formatted list of  key/value  pairs.  The
                     keys  are  attribute  names  and the values are attribute values. This is an
                     empty list if there are no attributes on the tag.

              •      The third through end elements are the children of the node,  if  any.  Each
                     child is, recursively, a pxml.

              •      Note  that if the zero'th element, i.e. the tag name, is "%PCDATA", then the
                     attributes will be empty and the third element  will  be  the  text  of  the
                     element.  In  addition, if an element's contents consists only of PCDATA, it
                     will have only one child, and all the PCDATA will be concatenated. In  other
                     words,  this  parser  works  poorly  for XML with elements that contain both
                     child tags and PCDATA.  Since Amazon S3 does  not  do  this  (and  for  that
                     matter  most  uses of XML where XML is a poor choice don't do this), this is
                     probably not a serious limitation.

       xsxp::fetch pxml path ?part?
              pxml is a parsed XML, as returned from xsxp::parse.  path is a list of element  tag
              names.  Each  element  is  the name of a child to look up, optionally followed by a
              hash ("#") and a string of digits. An  empty  list  or  an  initial  empty  element
              selects  pxml.  If  no  hash  sign  is present, the behavior is as if "#0" had been
              appended to that element. (In addition to a list,  slashes  can  separate  subparts
              where convenient.)

              An  element of path scans the children at the indicated level for the n'th instance
              of a child whose tag matches the part of the element before the hash  sign.  If  an
              element  is  simply  "#"   followed  by  digits,  that  indexed  child is selected,
              regardless of the tags in the children. Hence,  an  element  of  "#3"  will  always
              select the fourth child of the node under consideration.

              part defaults to "%ALL". It can be one of the following case-sensitive terms:

              %ALL   returns the entire selected element.

              %TAGNAME
                     returns lindex 0 of the selected element.

              %ATTRIBUTES
                     returns index 1 of the selected element.

              %CHILDREN
                     returns lrange 2 through end of the selected element, resulting in a list of
                     elements being returned.

              %PCDATA
                     returns a concatenation of all the bodies of direct children  of  this  node
                     whose  tag  is  %PCDATA.   It throws an error if no such children are found.
                     That is, part=%PCDATA means return the textual content found  in  that  node
                     but not its children nodes.

              %PCDATA?
                     is like %PCDATA, but returns an empty string if no PCDATA is found.

       For  example,  to  fetch  the first bold text from the fifth paragraph of the body of your
       HTML file,

              xsxp::fetch $pxml {body p#4 b} %PCDATA

       xsxp::fetchall pxml_list path ?part?
              This iterates over each PXML in pxml_list (which must be a list of pxmls) selecting
              the  indicated  path  from  it,  building  a  new  list with the selected data, and
              returning that new list.

              For example, pxml_list might be the %CHILDREN of a particular element, and the path
              and part might select from each child a sub-element in which we're interested.

       xsxp::only pxml tagname
              This  iterates over the direct children of pxml and selects only those with tagname
              as their tag. Returns a list of matching elements.

       xsxp::prettyprint pxml ?chan?
              This outputs to chan (default stdout) a pretty-printed version of pxml.

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes,  will  undoubtedly  contain  bugs  and  other
       problems.    Please  report  such  in  the  category  amazon-s3  of  the  Tcllib  Trackers
       [http://core.tcl.tk/tcllib/reportlist].  Please also report any ideas for enhancements you
       may have for either package and/or documentation.

KEYWORDS

       dom, parser, xml

COPYRIGHT

       2006 Darren New. All Rights Reserved.

NAME

SYNOPSIS

DESCRIPTION

COMMANDS

BUGS, IDEAS, FEEDBACK

KEYWORDS

CATEGORY

COPYRIGHT