Provided by: tclxml_3.3~svn11-3_amd64

NAME

       TclXML - XML parser support for Tcl

SYNOPSIS

       package require xml

       package require parserclass

       ::xml::parserclass option ?arg arg ...?

       ::xml::parser ?name? ?-option value ...?

       parser option arg

DESCRIPTION

       TclXML provides event-based parsing of XML documents.  The application may register callback scripts  for
       certain  document  features, and when the parser encounters those features while parsing the document the
       callback is evaluated.

       The parser may also perform other functions, such as normalisation, validation and/or  entity  expansion.
       Generally,  these  functions are under the control of configuration options.  Whether these functions can
       be performed at all depends on the parser implementation.

       The TclXML package provides a generic interface for use by a Tcl  application,  along  with  a  low-level
       interface  for  use  by a parser implementation.  Each implementation provides a class of XML parser, and
       these register themselves using the ::xml::parserclass create command.   One  of  the  registered  parser
       classes will be the default parser class.

       Loading the package with the generic package require xml command allows the package to automatically
       determine the default parser class.  To select a particular parser class as the default, load that
       class's package directly, e.g. package require xml::libxml2.  In all cases, all available parser classes
       are registered with the TclXML package; the only difference is which one becomes the default.

COMMANDS

   ::xml::parserclass
       The ::xml::parserclass command is used to manage XML parser classes.

   Command Options
       The following command options may be used:

              create
                      create name ?-createcommand script? ?-createentityparsercommand script? ?-parsecommand
                      script? ?-configurecommand script? ?-getcommand script? ?-deletecommand script?

              Creates an XML parser class with the given name.

              destroy
                      destroy name

              Destroys an XML parser class.

              info
                      info names
                      info default

              Returns information about registered XML parser classes: either the names of all registered
              classes, or the name of the default class.
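              As a rough sketch, an application can inspect the registered parser classes before choosing one.
              This assumes that info names returns a Tcl list of class names and that info default returns a
              single class name; the output format is illustrative only.

```tcl
package require xml

# List every registered parser class, marking the current default.
# Assumption: "info names" returns a list, "info default" a single name.
set classes [::xml::parserclass info names]
set default [::xml::parserclass info default]

foreach class $classes {
    if {$class eq $default} {
        puts "$class (default)"
    } else {
        puts $class
    }
}
```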

   ::xml::parser
       The ::xml::parser command creates an XML parser object.  The return value of the command is the  name  of
       the newly created parser.

       The parser scans an XML document's syntactic structure, evaluating callback scripts for each feature
       found.  At the very least the parser will normalise the document and check it for well-formedness.  If
       the document is not well-formed then the -errorcommand callback is evaluated.  Some parser classes may
       perform additional functions, such as validation.  Additional features provided by the various parser
       classes are described in the section Parser Classes.

       Parsing is performed synchronously: the command blocks until the entire document has been parsed.
       Parsing may be terminated by an application callback; see the section Callback Return Codes.
       Incremental parsing is also supported by using the -final configuration option.

   Configuration Options
       The ::xml::parser command accepts the following configuration options:

               -attlistdeclcommand
                      -attlistdeclcommand script

              Specifies  the  prefix  of a Tcl command to be evaluated whenever an attribute list declaration is
              encountered in the DTD subset of an XML document.  The command evaluated is: script name  attrname
              type default value

              where:

                                                  name

                                               Element type name

                                                  attrname

                                               Attribute name being declared

                                                  type

                                               Attribute type

                                                  default

                                               Attribute default, such as #IMPLIED

                                                  value

                                               Default attribute value.  Empty string if none given.

               -baseuri -baseurl
                      -baseuri URI

                      -baseurl URI

              Specifies  the  base URI for resolving relative URIs that may be used in the XML document to refer
              to external entities.

               -baseurl is deprecated in favour of  -baseuri.

               -characterdatacommand
                      -characterdatacommand script

              Specifies the prefix of a Tcl command to be evaluated whenever character data  is  encountered  in
              the XML document being parsed.  The command evaluated is: script data

              where:

                                                  data

                                               Character data in the document

               -commentcommand
                      -commentcommand script

              Specifies the prefix of a Tcl command to be evaluated whenever a comment is encountered in the XML
              document being parsed.  The command evaluated is: script data

              where:

                                                  data

                                               Comment data

               -defaultcommand
                      -defaultcommand script

              Specifies  the prefix of a Tcl command to be evaluated when no other callback has been defined for
              a document feature which has been encountered.  The command evaluated is: script data

              where:

                                                  data

                                               Document data

               -defaultexpandinternalentities
                      -defaultexpandinternalentities boolean

              Specifies whether entities declared in the internal DTD subset are expanded with their replacement
              text.  If entities are not expanded then the entity references will be reported with no expansion.

               -doctypecommand
                      -doctypecommand script

              Specifies the prefix of a Tcl command to be  evaluated  when  the  document  type  declaration  is
              encountered.  The command evaluated is: script name public system dtd

              where:

                                                  name

                                               The name of the document element

                                                  public

                                               Public identifier for the external DTD subset

                                                  system

                                               System identifier for the external DTD subset.  Usually a URI.

                                                  dtd

                                               The internal DTD subset

              See also  -startdoctypedeclcommand and  -enddoctypedeclcommand.
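              A minimal sketch of a -doctypecommand callback, showing the order in which the four arguments
              arrive.  The proc name DocType and the sample document are hypothetical; the sketch assumes the
              default (non-validating) configuration, in which the external DTD is not loaded.

```tcl
package require xml

# Record the arguments passed to the -doctypecommand callback:
# document element name, public id, system id, internal subset.
proc DocType {name public system dtd} {
    set ::doctype [list $name $public $system]
}

set ::doctype {}
set parser [::xml::parser -doctypecommand DocType]
$parser parse {<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>hello</note>}
$parser free

lassign $::doctype name public system
puts "document element: $name"
puts "system identifier: $system"
```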

               -elementdeclcommand
                      -elementdeclcommand script

              Specifies  the  prefix  of  a  Tcl  command  to be evaluated when an element markup declaration is
              encountered.  The command evaluated is: script name model

              where:

                                                  name

                                               The element type name

                                                  model

                                               Content model specification

               -elementendcommand
                      -elementendcommand script

              Specifies the prefix of a Tcl command to be evaluated when an element end tag is encountered.  The
              command evaluated is: script name args

              where:

                                                  name

                                               The element type name that has ended

                                                  args

                                               Additional information about this element

              Additional information about the element  takes  the  form  of  configuration  options.   Possible
              options are:

                                                -empty

                                                  boolean

                                               The empty element syntax was used for this element

                                                -namespace

                                                  uri

                                               The element is in the XML namespace associated with the given URI

               -elementstartcommand
                      -elementstartcommand script

              Specifies  the  prefix  of a Tcl command to be evaluated when an element start tag is encountered.
              The command evaluated is: script name attlist args

              where:

                                                  name

                                               The element type name that has started

                                                  attlist

                                               A Tcl list containing the attributes for this element.  The  list
                            of attributes is formatted as pairs of attribute names and their values.

                                                  args

                                               Additional information about this element

              Additional  information  about  the  element  takes  the  form of configuration options.  Possible
              options are:

                                                -empty

                                                  boolean

                                               The empty element syntax was used for this element

                                                -namespace

                                                  uri

                                               The element is in the XML namespace associated with the given URI

                                                -namespacedecls

                                                  list

                                               The start tag included one or more  XML  Namespace  declarations.
                            list  is  a Tcl list giving the namespaces declared.  The list is formatted as pairs
                            of values, the first value is the namespace URI and the second value is  the  prefix
                            used  for  the namespace in this document.  A default XML namespace declaration will
                            have an empty string for the prefix.
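              Because attlist is a flat list of name/value pairs and the trailing arguments are option/value
              pairs, both can be treated as Tcl dicts.  A minimal sketch, assuming Tcl 8.5 or later; the proc
              name EStart and the sample document are hypothetical.

```tcl
package require xml

# Convert the attribute list (name/value pairs) into a dict, and pick up
# the optional -namespace value from the trailing option/value arguments.
proc EStart {name attlist args} {
    set attrs [dict create {*}$attlist]
    set ns ""
    if {[dict exists $args -namespace]} {
        set ns [dict get $args -namespace]
    }
    lappend ::elements [list $name $attrs $ns]
}

set ::elements {}
set parser [::xml::parser -elementstartcommand EStart]
$parser parse {<book xmlns="http://example.org/ns" id="b1"><title lang="en">Tcl</title></book>}
$parser free

foreach e $::elements {
    lassign $e name attrs ns
    puts "$name attrs={$attrs} namespace={$ns}"
}
```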

               -encoding
                      -encoding value

              Gives the character encoding of the document.  This option only has an effect before a document is
              parsed.  The default character encoding is utf-8.  If the value unknown is  given,  or  any  value
              other  than  utf-8,  then  the  document text is treated as binary data.  If the value is given as
              unknown then the parser will attempt to automatically determine  the  character  encoding  of  the
              document  (using  byte-order-marks,  etc).  If any value other than utf-8 or unknown is given then
              the parser will read the document text using that character encoding.

               -endcdatasectioncommand
                      -endcdatasectioncommand script

              Specifies the prefix of a Tcl command to be evaluated when the end of a CDATA section is
              encountered.  The command is evaluated with no further arguments.

               -enddoctypedeclcommand
                      -enddoctypedeclcommand script

              Specifies the prefix of a Tcl command to be evaluated when the end of the document type declaration
              is encountered.  The command is evaluated with no further arguments.

               -entitydeclcommand
                      -entitydeclcommand script

              Specifies  the  prefix of a Tcl command to be evaluated when an entity declaration is encountered.
              The command evaluated is: script name args

              where:

                                                  name

                                               The name of the entity being declared

                                                  args

                                               Additional information about the entity declaration.  An internal
                            entity shall have a single argument, the replacement text.  An external parsed
                            entity shall have two additional arguments, the public and system identifiers of
                            the external resource.  An external unparsed entity shall have three additional
                            arguments, the public and system identifiers followed by the notation name.

               -entityreferencecommand
                      -entityreferencecommand script

              Specifies  the  prefix  of  a Tcl command to be evaluated when an entity reference is encountered.
              The command evaluated is: script name

              where:

                                                  name

                                               The name of the entity being referenced

               -errorcommand
                      -errorcommand script

              Specifies the prefix of a Tcl command to be evaluated when a fatal error is detected.   The  error
              may  be  due to the XML document not being well-formed.  In the case of a validating parser class,
              the error may also be due to the XML document not obeying validity  constraints.   By  default,  a
              callback  script  is  provided  which causes an error return code, but an application may supply a
              script which attempts to continue parsing.  The command evaluated is: script errorcode errormsg

              where:

                                                  errorcode

                                               A single word description of the error, intended for  use  by  an
                            application

                                                  errormsg

                                               A human-readable description of the error
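              A minimal sketch of an application-supplied -errorcommand that records the error and then raises
              a Tcl error, so that the parse method returns TCL_ERROR as described in Callback Return Codes.
              The proc name ParseError and the malformed sample document are hypothetical.

```tcl
package require xml

# Record the error code and message, then raise a Tcl error so that
# the parse method itself fails.
proc ParseError {errorcode errormsg} {
    set ::lastError [list $errorcode $errormsg]
    return -code error $errormsg
}

set ::lastError {}
set parser [::xml::parser -errorcommand ParseError]

# The end tag does not match the start tag: not well-formed.
set failed [catch {$parser parse {<a><b></a>}} msg]
if {$failed} {
    puts "parse failed: $msg"
}
$parser free
```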

               -externalentitycommand
                      -externalentitycommand script

              Specifies the prefix of a Tcl command to be evaluated to resolve an external entity reference.  If
              the  parser  has  been  configured to validate the XML document, a default script is supplied that
              resolves the URI given as the system identifier of the external entity and recursively parses  the
              entity's  data.   If  the  parser  has been configured as a non-validating parser, then by default
              external entities are not resolved.  This option can be used to override  the  default  behaviour.
              The command evaluated is: script name baseuri uri id

              where:

                                                  name

                                               The Tcl command name of the current parser

                                                  baseuri

                                               An  absolute  URI  for  the current entity which is to be used to
                            resolve relative URIs

                                                  uri

                                               The system identifier of the external entity, usually a URI

                                                  id

                                               The public identifier of  the  external  entity.   If  no  public
                            identifier was given in the entity declaration then id will be an empty string.

              The return result of the callback script determines the action of the parser.  Note that these
              codes are interpreted differently from those returned by other callbacks.

                     TCL_OK

                     The return result of the callback script is used as the external entity's  data.   The  URI
                     passed to the callback script is used as the entity's base URI.

                     This  is  useful to either override the normal loading of an entity's data, or to implement
                     new or alternative URI schemes.  As an example, the script below sets  an  external  entity
                     handler that intercepts "tcl:" URIs and evaluates them as inline Tcl scripts:

                     package require xml

                     proc External {name baseuri uri id} {
                         switch -glob -- $uri {
                             tcl:* {
                                 regexp {^tcl:(.*)$} $uri discard script
                                 # Decode %20 escapes so the URI text becomes a runnable script
                                 set script [string map {%20 { }} $script]
                                 return [uplevel #0 $script]
                             }
                             default {
                                 return -code continue {}
                             }
                         }
                     }

                     set parser [xml::parser -externalentitycommand External]
                     $parser parse {<!DOCTYPE example [
                       <!ENTITY example SYSTEM "tcl:set%20example%20HelloWorld">
                     ]>
                     <example>
                       &example;
                     </example>
                     }

                     puts $example

                     This script will print "HelloWorld" to stdout.

                     TCL_CONTINUE

                     In a normal (non-safe) interpreter, the default external entity handler is used to load the
                     external  entity data as per normal operation of the parser.  If the parser is executing in
                     a Safe Tcl interpreter then the entity is not loaded at all.

                     This is useful for interposing on the loading of external entities (for example, to log
                     which entities are requested) without otherwise changing how they are loaded.

                     TCL_BREAK

                     No data is returned for this entity, i.e. the entity is ignored.  No error is propagated.

                     TCL_ERROR

                     No data is returned for this entity, i.e. the entity is ignored.  A background error is
                     registered, using the result of the callback script.
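              Returning TCL_BREAK gives a simple way to refuse every external entity, which may be useful when
              parsing untrusted documents.  A minimal sketch; the proc name NoExternal and the entity URI are
              hypothetical, and the sketch relies only on the TCL_BREAK behaviour described above.

```tcl
package require xml

# Ignore every external entity: no data is loaded and no error is raised.
proc NoExternal {name baseuri uri id} {
    lappend ::refused $uri
    return -code break
}

set ::refused {}
set parser [::xml::parser -externalentitycommand NoExternal]
$parser parse {<!DOCTYPE doc [
  <!ENTITY ext SYSTEM "http://example.com/data.xml">
]>
<doc>&ext;</doc>}
$parser free

puts "refused: $::refused"
```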

               -final
                      -final boolean

              Specifies whether the XML document being parsed is complete.  If the document is to be parsed
              incrementally then this option should be set to false, and then set to true before the last
              fragment of the document is parsed.  For example,

              set parser [::xml::parser -final 0]
              $parser parse $data1
              $parser parse $data2
              $parser configure -final 1
              $parser parse $finaldata
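              The same pattern can be applied to a channel, reading the document in fixed-size chunks and
              switching -final on once end-of-file is reached.  A minimal sketch, assuming Tcl 8.6 (for file
              tempfile); the proc names ParseChannel and CountStart and the 4096-byte chunk size are
              hypothetical choices.

```tcl
package require xml

# Parse a document from an open channel in fixed-size chunks.
# -final stays false until the last chunk has been read.
proc ParseChannel {parser chan {size 4096}} {
    $parser configure -final 0
    while {1} {
        set chunk [read $chan $size]
        if {[eof $chan]} {
            # Last fragment: tell the parser the document is complete.
            $parser configure -final 1
            $parser parse $chunk
            return
        }
        $parser parse $chunk
    }
}

proc CountStart {name attlist args} {
    incr ::count
}

# Write a small sample document to a temporary file.
set out [file tempfile path]
puts $out {<root><a/><b/></root>}
close $out

set ::count 0
set parser [::xml::parser -elementstartcommand CountStart]
set chan [open $path r]
ParseChannel $parser $chan
close $chan
$parser free
file delete $path

puts "elements: $::count"
```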

               -ignorewhitespace
                      -ignorewhitespace boolean

              If this option is set to true then spans of character data in the XML document which are  composed
              only of white-space (CR, LF, space, tab) will not be reported to the application.  In other words,
              the data passed to every invocation of the  -characterdatacommand script will contain at least one
              non-white-space character.
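              A minimal sketch showing the effect of -ignorewhitespace on the indentation between elements.
              The proc name CData is hypothetical; it accepts extra arguments in the same way as the cdata
              procedure in Application Examples.

```tcl
package require xml

# Collect every span of character data reported by the parser.
proc CData {data args} {
    lappend ::spans $data
}

set xml "<list>\n  <item>one</item>\n  <item>two</item>\n</list>"

set ::spans {}
set parser [::xml::parser -characterdatacommand CData -ignorewhitespace 1]
$parser parse $xml
$parser free

# With -ignorewhitespace enabled, no reported span is white space only.
foreach span $::spans {
    puts [list $span]
}
```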

               -notationdeclcommand
                      -notationdeclcommand script

              Specifies  the prefix of a Tcl command to be evaluated when a notation declaration is encountered.
              The command evaluated is: script name uri

              where:

                                                  name

                                               The name of the notation

                                                  uri

                                               An external identifier for the notation, usually a URI.

               -notstandalonecommand
                      -notstandalonecommand script

              Specifies the prefix of a Tcl command to be evaluated when the  parser  determines  that  the  XML
              document being parsed is not a standalone document.

               -paramentityparsing
                      -paramentityparsing boolean

              Controls whether external parameter entities are parsed.

               -parameterentitydeclcommand
                      -parameterentitydeclcommand script

              Specifies  the  prefix  of  a  Tcl  command to be evaluated when a parameter entity declaration is
              encountered.  The command evaluated is: script name args

              where:

                                                  name

                                               The name of the parameter entity

                                                  args

                                               For an internal parameter entity there  is  only  one  additional
                            argument,  the  replacement  text.   For  external  parameter entities there are two
                            additional arguments, the system and public identifiers respectively.

               -parser
                      -parser name

              The name of the parser class to instantiate for this parser  object.   This  option  may  only  be
              specified when the parser instance is created.

               -processinginstructioncommand
                      -processinginstructioncommand script

              Specifies  the  prefix  of  a  Tcl  command  to  be  evaluated  when  a  processing instruction is
              encountered.  The command evaluated is: script target data

              where:

                                                  target

                                               The name of the processing instruction target

                                                  data

                                               Remaining data from the processing instruction
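              A minimal sketch of a -processinginstructioncommand callback.  The proc name PI and the sample
              stylesheet instruction are hypothetical; the sketch assumes the XML declaration itself is
              reported through -xmldeclcommand rather than as a processing instruction.

```tcl
package require xml

# Record each processing instruction as a target/data pair.
proc PI {target data} {
    lappend ::pis $target $data
}

set ::pis {}
set parser [::xml::parser -processinginstructioncommand PI]
$parser parse {<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<doc/>}
$parser free

foreach {target data} $::pis {
    puts "target=$target data=$data"
}
```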

               -reportempty
                      -reportempty boolean

              If this option is enabled then when an element is encountered that uses the special empty  element
              syntax,  additional  arguments  are  appended to the  -elementstartcommand and  -elementendcommand
              callbacks.  The arguments  -empty 1 are appended.  For example: script -empty 1

               -startcdatasectioncommand
                      -startcdatasectioncommand script

              Specifies the prefix of a Tcl command to be evaluated when the start of a CDATA section is
              encountered.  No arguments are appended to the script.

               -startdoctypedeclcommand
                      -startdoctypedeclcommand script

              Specifies the prefix of a Tcl command to be evaluated at the start of a document type declaration.
              No arguments are appended to the script.

               -unknownencodingcommand
                      -unknownencodingcommand script

              Specifies the prefix of a Tcl command to be evaluated when a  character  is  encountered  with  an
              unknown encoding.  This option has not been implemented.

               -unparsedentitydeclcommand
                      -unparsedentitydeclcommand script

              Specifies  the  prefix  of  a Tcl command to be evaluated when a declaration is encountered for an
              unparsed entity.  The command evaluated is: script system public notation

              where:

                                                  system

                                               The system identifier of the external entity, usually a URI

                                                  public

                                               The public identifier of the external entity

                                                  notation

                                               The name of the notation for the external entity

               -validate
                      -validate boolean

              Enables validation of the XML document to be parsed.  Any changes to this option are ignored after
              an XML document has started to be parsed, but the option may be changed after a reset.

               -warningcommand
                      -warningcommand script

              Specifies the prefix of a Tcl command to be evaluated when a warning  condition  is  detected.   A
              warning  condition  is  where the XML document has not been authored correctly, but is still well-
              formed and may be valid.  For example, the special empty element syntax may be used for an element
              which has not been declared to have empty content.  By default,  a  callback  script  is  provided
              which silently ignores the warning.  The command evaluated is: script warningcode warningmsg

              where:

                                                  warningcode

                                               A  single word description of the warning, intended for use by an
                            application

                                                  warningmsg

                                               A human-readable description of the warning

               -xmldeclcommand
                      -xmldeclcommand script

              Specifies the prefix of a Tcl command to be evaluated when the  XML  declaration  is  encountered.
              The command evaluated is: script version encoding standalone

              where:

                                                  version

                                               The  version  number  of  the  XML  specification  to  which this
                            document purports to conform

                                                  encoding

                                               The character encoding of the document

                                                  standalone

                                               A boolean declaring whether the document is standalone

   Parser Command
       The ::xml::parser command creates a new Tcl command with the same name as the parser.  This command may
       be used to invoke various operations on the parser object.  It has the following general form:

              name option ?arg ...?

       Option and the args determine the exact behaviour of the command.  The following commands are possible
       for parser objects:

               cget
                      cget -option

              Returns  the  current value of the configuration option given by  option.   Option may have any of
              the values accepted by the parser object.

               configure
                      configure -option value

              Modify the configuration options of the parser  object.    Option  may  have  any  of  the  values
              accepted by the parser object.

               entityparser
                      entityparser option value

              Creates a new parser object.  The new object inherits the same configuration options as the parent
              parser  object,  but  is able to parse XML data in a parsed entity.  The option  -dtdsubset allows
              markup declarations to be treated as being in the internal or external DTD subset.

               free
                      free name

              Frees all resources associated with the parser object.   The  object  is  not  usable  after  this
              command has been invoked.

               get
                      get name args

              Returns  information  about  the  XML document being parsed.  Each parser class provides different
              information, see the documentation for the parser class.

               parse
                      parse xml args

              Parses the XML document.  The usual desired effect is for various application callbacks to be
              evaluated.  The parser class also performs other functions; at the very least this includes
              checking the XML document for well-formedness.

               reset
                      reset

              Initialises the parser object in preparation for parsing a new XML document.
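              The parser object methods can be combined to reuse one parser for several documents.  A minimal
              sketch; the proc name CountStart is hypothetical, and the sketch assumes that reset keeps the
              configured callbacks while clearing per-document state.

```tcl
package require xml

proc CountStart {name attlist args} {
    incr ::count
}

set parser [::xml::parser -elementstartcommand CountStart]

# Options can be changed after creation and read back with cget.
$parser configure -ignorewhitespace 1
puts "ignorewhitespace: [$parser cget -ignorewhitespace]"

# Parse one document, reset, then reuse the same parser for another.
set ::count 0
$parser parse {<a><b/><c/></a>}
set first $::count

$parser reset
set ::count 0
$parser parse {<x><y/></x>}
set second $::count

# Release the parser when it is no longer needed.
$parser free
puts "first: $first, second: $second"
```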

CALLBACK RETURN CODES

       Every callback script evaluated by a parser may return a return code other than   TCL_OK.   Return  codes
       are interpreted as follows:

              break

              Suppresses invocation of all further callback scripts.  The parse method returns the TCL_OK
              return code.

              continue

              Suppresses invocation of further callback scripts until the current element has finished.

              error

              Suppresses invocation of all further callback scripts.  The parse method also returns the
              TCL_ERROR return code.

              default

              Any other return code suppresses invocation of all further callback scripts.  The parse method
              returns the same return code.
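       A minimal sketch of the break return code: the callback stops after the first element, and the parse
       method still returns normally.  The proc name FirstElement and the sample document are hypothetical.

```tcl
package require xml

proc FirstElement {name attlist args} {
    lappend ::seen $name
    # Suppress all further callbacks; parse still returns TCL_OK.
    return -code break
}

set ::seen {}
set parser [::xml::parser -elementstartcommand FirstElement]
set result [catch {$parser parse {<root><a/><b/></root>}} msg]
$parser free

puts "return code: $result, elements seen: $::seen"
```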

ERROR MESSAGES

       If  an  error  or  warning  condition  is detected then an error message is returned.  These messages are
       structured as a Tcl list, as described below:

       {domain level code node line message int1 int2 string1 string2 string3}

              domain

              A code for the subsystem that detected the error.

              level

              Severity level of the problem.

              code

              A one word string describing the error.

              node

              If available, the token of the DOM node associated with the problem.

              line

              If known, the line number in the source XML document where the problem was detected.

              message

              A human-readable description of the problem.

              int1

              Additional integer data.  For the parser domain, this is  usually  the  column  number  where  the
              problem was detected.

              int2

              Additional integer data.

              string1

              Additional string data.

              string2

              Additional string data.

              string3

              Additional string data.
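       A minimal sketch of a helper that turns a structured error message into a one-line summary.  The proc
       name FormatXMLError is hypothetical, and whether a given parser class reports errors in exactly this
       eleven-element form is an assumption, so other values are passed through unchanged.

```tcl
# Format an error message list with the structure
# {domain level code node line message int1 int2 string1 string2 string3}
# into a one-line summary.  For parser-domain errors int1 is usually
# the column number.
proc FormatXMLError {err} {
    if {[llength $err] != 11} {
        # Not in the structured form: return it unchanged.
        return $err
    }
    lassign $err domain level code node line message int1 int2 s1 s2 s3
    set where ""
    if {$line ne ""} {
        set where "line $line"
        if {$int1 ne ""} {
            append where ", column $int1"
        }
        append where ": "
    }
    return "$where$message ($domain/$level/$code)"
}
```

       Typical use would be: if {[catch {$parser parse $xml} err]} { puts [FormatXMLError $err] }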

APPLICATION EXAMPLES

       This script outputs the character data of an XML document read from stdin.

       package require xml

       proc cdata {data args} {
           puts -nonewline $data
       }

       set parser [::xml::parser -characterdatacommand cdata]
       $parser parse [read stdin]

       This script counts the number of elements in an XML document read from stdin.

       package require xml

       proc EStart {varName name attlist args} {
           upvar #0 $varName var
           incr var
       }

       set count 0
       set parser [::xml::parser -elementstartcommand [list EStart count]]
       $parser parse [read stdin]
       puts "The XML document contains $count elements"

SAFE XML

       TclXML/tcl and TclXML/libxml2 may be used in a Safe Tcl interpreter.  When a document is parsed in a Safe
       Tcl interpreter, any attempt by the XML document to load an external entity is handled by the
       -externalentitycommand callback.  This callback is evaluated in the context of the safe interpreter and
       is therefore subject to the security policy in force for that interpreter.  The default entity loader
       is not invoked, even if the callback script returns a TCL_CONTINUE code.

       See the description of the -externalentitycommand option for further details.

PARSER CLASSES

       This section describes the parser classes provided with TclXML and their implementation-specific behaviour.

   Tcl Parser Class
       The pure-Tcl parser class requires no compilation: it is a collection of Tcl scripts.  This parser
       implementation is non-validating, i.e. it can only check the well-formedness of a document.  However,
       when the -validate option is enabled it will read the document's DTD and resolve external entities.
       This parser class is referred to as TclXML/tcl.

       This parser implementation aims to implement XML v1.0 and supports XML Namespaces.

       Generally, the parser produces XML Infoset information items.  That is, it gives the application a
       slightly higher-level view than the raw XML syntax.  For example, it does not report CDATA sections.

       TclXML/tcl is not able to handle character encodings other than UTF-8.

   libxml2 Parser Class
       The  libxml2  parser class provides a Tcl interface to the libxml2 XML parser library.  This parser class
       is referred to as TclXML/libxml2.

       When the package is loaded the variable ::xml::libxml2::libxml2version is set to the  version  number  of
       the libxml2 library being used.
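       A short sketch of reporting which libxml2 library is in use.  It relies only on the variable described
       above and loads the generic package first, which is also the order required on MS Windows.

```tcl
package require xml
package require xml::libxml2

# Set by TclXML/libxml2 when the package is loaded
puts "Using libxml2 version $::xml::libxml2::libxml2version"
```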

       On MS Windows, it is necessary to load the generic XML package first, and then the TclXML/libxml2
       package.  For example:

       package require xml
       package require xml::libxml2

   get Method
       TclXML/libxml2 provides the following arguments to the get method:

               document

              Returns the parsed document object.  libxml2  builds  an  in-memory  data  structure  of  the  XML
              document it parses (a DOM tree).  This method returns a handle (or token) for that structure.

              TclXML/libxml2 manages the document object as a Tcl object.  See the -keep option for further
              information.
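       A minimal sketch of obtaining the document token after parsing.  Because document data is otherwise
       discarded once parsing completes (see -retainpath below), the sketch assumes that setting -retainpath to
       the document element keeps the tree available; the path /doc and the sample document are illustrative.

```tcl
package require xml
package require xml::libxml2

# Keep the tree rooted at /doc after parsing (by default TclXML/libxml2
# discards document data once it has been parsed)
set parser [::xml::parser -retainpath /doc]
$parser parse {<doc><item>one</item><item>two</item></doc>}

# Token (handle) for libxml2's in-memory tree.  With the default
# "-keep implicit", the document lives as long as this Tcl value does.
set doc [$parser get document]
puts "Parsed document token: $doc"
```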

   Additional Options
               -keep
                      -keep normal | implicit

              Controls how the TclXML/libxml2 package manages the document object.  The default value is
              implicit: the document is destroyed when the Tcl object's internal representation is freed.  If
              the option is given the value normal, the document must be explicitly destroyed.  The only way
              to explicitly destroy the document is by using the C API.

               -retainpath
                      -retainpath xpath

              The given XPath location path specifies which part of the document is to be kept after the parsing
              operation has completed.  By default, all document data is discarded after it has been parsed.

               -retainpathns
                      -retainpathns prefix ns ...

              The  value  of  this  option  is a list of pairs of XML Namespace prefixes and their corresponding
              namespace URIs.  These are used by the XPath location path given in the  -retainpath option.
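       A sketch of combining -retainpath with -retainpathns for a namespaced document.  The prefix ex, the URI
       http://example.org/ns and the sample document are illustrative.  As with XPath generally, the prefix
       used in the location path only has to be bound in -retainpathns; it need not match the prefix used in
       the document itself.

```tcl
package require xml
package require xml::libxml2

# Bind the prefix "ex" for use in the -retainpath location path
set parser [::xml::parser \
    -retainpath /ex:doc/ex:item \
    -retainpathns [list ex http://example.org/ns]]

# The document uses a different prefix (d) for the same namespace URI
$parser parse {<d:doc xmlns:d="http://example.org/ns"><d:item>one</d:item><d:note>two</d:note></d:doc>}

# Only the parts matched by the location path are retained
set doc [$parser get document]
```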

   Limitations
       The libxml2 parser class has the following limitations:

              *  -reportempty has no effect.  libxml2 does not report empty element syntax.

              *  Incremental (push) parsing, i.e. -final 0, is not supported.

              *  TclXML/libxml2 does not provide DTD validation, W3C XML Schema (WXS) validation or RELAX NG
              validation, although the libxml2 library does provide those functions.  These functions are
              provided by the TclDOM/libxml2 package, but only a posteriori (i.e. only after the document has
              been parsed).

              *  libxml2 supports XML Namespaces.  The use of XML Namespaces can be queried, but the declaration
              of an XML Namespace is not reported.

KEYWORDS

TclXML                                                 3.2                                          TclXML(3tcl)