Provided by: tclxml_3.3~svn11-3_amd64

NAME

       TclXML - XML parser support for Tcl

SYNOPSIS

       package require xml

       package require parserclass

       ::xml::parserclass option ?arg arg ...?

       ::xml::parser ?name? ?-option value ...?

       parser option arg

DESCRIPTION

       TclXML  provides  event-based  parsing  of  XML  documents.   The application may register
       callback scripts for certain document features,  and  when  the  parser  encounters  those
       features while parsing the document the callback is evaluated.

       The  parser  may  also  perform  other functions, such as normalisation, validation and/or
       entity expansion.  Generally, these functions  are  under  the  control  of  configuration
       options.   Whether  these  functions  can  be  performed  at  all  depends  on  the parser
       implementation.

       The TclXML package provides a generic interface for use by a Tcl application, along with a
       low-level  interface  for  use by a parser implementation.  Each implementation provides a
       class of XML parser, and these register themselves  using  the  ::xml::parserclass  create
       command.  One of the registered parser classes will be the default parser class.

       Loading the package with the generic package require xml command allows the package to
       automatically determine the default parser class.  To select a particular parser class as
       the default, that class's package may be loaded directly, e.g. package require
       xml::libxml2.  In all cases, all available parser classes are registered with the TclXML
       package; the difference is simply in which one becomes the default.

COMMANDS

   ::xml::parserclass
       The ::xml::parserclass command is used to manage XML parser classes.

   Command Options
       The following command options may be used:

              create
                      create name ?-createcommand script? ?-createentityparsercommand script?
                      ?-parsecommand script? ?-configurecommand script? ?-getcommand script?
                      ?-deletecommand script?

              Creates an XML parser class with the given name.

              destroy
                      destroy name

              Destroys an XML parser class.

              info
                      info names default

              Returns information about registered XML parser classes.

   ::xml::parser
       The  ::xml::parser  command creates an XML parser object.  The return value of the command
       is the name of the newly created parser.

       The parser scans an XML document's syntactical structure, evaluating callback scripts  for
       each  feature  found.   At the very least the parser will normalise the document and check
       the  document  for  well-formedness.   If  the  document  is  not  well-formed  then   the
       -errorcommand  option  will  be  evaluated.   Some  parser  classes may perform additional
       functions, such as validation.  Additional features provided by the various parser classes
       are described in the section Parser Classes.

       Parsing is performed synchronously.  The command blocks until the entire document has been
       parsed.  Parsing may be terminated by an application callback (see the section Callback
       Return Codes).  Incremental parsing is also supported by using the -final configuration
       option.

   Configuration Options
       The ::xml::parser command accepts the following configuration options:

               -attlistdeclcommand
                      -attlistdeclcommand script

              Specifies the prefix of a Tcl command to be evaluated whenever  an  attribute  list
              declaration  is  encountered  in  the  DTD  subset of an XML document.  The command
              evaluated is: script name attrname type default value

              where:

                     name
                            Element type name

                     attrname
                            Attribute name being declared

                     type
                            Attribute type

                     default
                            Attribute default, such as #IMPLIED

                     value
                            Default attribute value.  Empty string if none given.

               -baseuri -baseurl
                      -baseuri URI

                      -baseurl URI

              Specifies  the  base  URI  for  resolving relative URIs that may be used in the XML
              document to refer to external entities.

               -baseurl is deprecated in favour of  -baseuri.

               -characterdatacommand
                      -characterdatacommand script

              Specifies the prefix of a Tcl command to be evaluated whenever  character  data  is
              encountered  in  the  XML  document being parsed.  The command evaluated is: script
              data

              where:

                     data
                            Character data in the document

               -commentcommand
                      -commentcommand script

              Specifies the prefix of a Tcl  command  to  be  evaluated  whenever  a  comment  is
              encountered  in  the  XML  document being parsed.  The command evaluated is: script
              data

              where:

                     data
                            Comment data

               -defaultcommand
                      -defaultcommand script

              Specifies the prefix of a Tcl command to be evaluated when no  other  callback  has
              been  defined  for  a  document  feature  which  has been encountered.  The command
              evaluated is: script data

              where:

                     data
                            Document data

               -defaultexpandinternalentities
                      -defaultexpandinternalentities boolean

              Specifies whether entities declared in the internal DTD subset  are  expanded  with
              their  replacement  text.   If entities are not expanded then the entity references
              will be reported with no expansion.

               -doctypecommand
                      -doctypecommand script

              Specifies the prefix of a Tcl command  to  be  evaluated  when  the  document  type
              declaration  is  encountered.   The command evaluated is: script name public system
              dtd

              where:

                     name
                            The name of the document element

                     public
                            Public identifier for the external DTD subset

                     system
                            System identifier for the external DTD subset.  Usually a URI.

                     dtd
                            The internal DTD subset

              See also  -startdoctypedeclcommand and  -enddoctypedeclcommand.

               -elementdeclcommand
                      -elementdeclcommand script

              Specifies  the  prefix  of  a  Tcl  command  to be evaluated when an element markup
              declaration is encountered.  The command evaluated is: script name model

              where:

                     name
                            The element type name

                     model
                            Content model specification

               -elementendcommand
                      -elementendcommand script

              Specifies the prefix of a Tcl command to be evaluated when an element  end  tag  is
              encountered.  The command evaluated is: script name args

              where:

                     name
                            The element type name that has ended

                     args
                            Additional information about this element

              Additional information about the element takes the form of configuration options.
              Possible options are:

                     -empty boolean
                            The empty element syntax was used for this element

                     -namespace uri
                            The element is in the XML namespace associated with the given URI

               -elementstartcommand
                      -elementstartcommand script

              Specifies  the prefix of a Tcl command to be evaluated when an element start tag is
              encountered.  The command evaluated is: script name attlist args

              where:

                     name
                            The element type name that has started

                     attlist
                            A Tcl list containing the attributes for this element.  The list of
                            attributes is formatted as pairs of attribute names and their values.

                     args
                            Additional information about this element

              Additional information about the element takes the form of configuration options.
              Possible options are:

                     -empty boolean
                            The empty element syntax was used for this element

                     -namespace uri
                            The element is in the XML namespace associated with the given URI

                     -namespacedecls list
                            The start tag included one or more XML Namespace declarations.  list
                            is a Tcl list giving the namespaces declared.  The list is formatted
                            as pairs of values: the first value is the namespace URI and the
                            second value is the prefix used for the namespace in this document.
                            A default XML namespace declaration will have an empty string for the
                            prefix.
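
              The argument layout described above might be unpacked as in the following sketch.
              The procedure name StartTag and the sample values are hypothetical, and the direct
              call at the end only imitates what a parser configured with -elementstartcommand
              StartTag would pass.  The callback returns a summary string purely so the result is
              easy to inspect; a real callback would more likely store or print the information.

```tcl
# Hypothetical start-tag callback: builds a one-line summary of the element.
proc StartTag {name attlist args} {
    # Defaults for options that may be absent from args.
    array set opt {-empty 0 -namespace {} -namespacedecls {}}
    array set opt $args

    set summary "element=$name"
    if {$opt(-namespace) ne ""} {
        append summary " ns=$opt(-namespace)"
    }
    # attlist is a flat list of attribute name/value pairs.
    foreach {attr value} $attlist {
        append summary " $attr=\"$value\""
    }
    # -namespacedecls is a flat list of URI/prefix pairs; an empty prefix
    # marks a default namespace declaration.
    foreach {uri prefix} $opt(-namespacedecls) {
        if {$prefix eq ""} {
            append summary " xmlns=$uri"
        } else {
            append summary " xmlns:$prefix=$uri"
        }
    }
    return $summary
}

# Simulated invocation using the argument shape described above
# (sample values are invented for illustration).
set summary [StartTag item {id 42 lang en} \
    -namespace urn:example -namespacedecls {urn:example {}}]
puts $summary
```

              Setting defaults before merging args keeps the callback usable whether or not the
              parser supplies every option.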

               -encoding
                      -encoding value

              Gives the character encoding of the document.  This option only has an effect
              before a document is parsed.  The default character encoding is utf-8.  If any
              other value is given, including unknown, then the document text is treated as
              binary data.  If the value is given as unknown then the parser will attempt to
              automatically determine the character encoding of the document (using byte-order
              marks, etc).  If any value other than utf-8 or unknown is given then the parser
              will read the document text using that character encoding.

               -endcdatasectioncommand
                      -endcdatasectioncommand script

              Specifies the prefix of a Tcl command to be evaluated when the end of a CDATA
              section is encountered.  The command is evaluated with no further arguments.

               -enddoctypedeclcommand
                      -enddoctypedeclcommand script

              Specifies the prefix of a Tcl command to be evaluated when the end of the document
              type declaration is encountered.  The command is evaluated with no further
              arguments.

               -entitydeclcommand
                      -entitydeclcommand script

              Specifies the prefix of a Tcl command to be evaluated when an entity declaration is
              encountered.  The command evaluated is: script name args

              where:

                     name
                            The name of the entity being declared

                     args
                            Additional information about the entity declaration.  An internal
                            entity shall have a single argument, the replacement text.  An
                            external parsed entity shall have two additional arguments, the
                            public and system identifiers of the external resource.  An external
                            unparsed entity shall have three additional arguments, the public and
                            system identifiers followed by the notation name.
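
              Since the kind of entity is conveyed only by the number of additional arguments, a
              callback can dispatch on that count.  The following is a minimal sketch; the
              procedure name EntityDecl and the sample declarations are hypothetical, and the
              direct calls stand in for what a parser configured with -entitydeclcommand
              EntityDecl would evaluate.

```tcl
# Hypothetical entity-declaration callback: classifies the declaration
# by the number of additional arguments it receives.
proc EntityDecl {name args} {
    switch -- [llength $args] {
        1 {
            lassign $args text
            return "internal entity $name = \"$text\""
        }
        2 {
            lassign $args public system
            return "external parsed entity $name (public '$public', system '$system')"
        }
        3 {
            lassign $args public system notation
            return "unparsed entity $name (system '$system', notation $notation)"
        }
        default {
            return -code error "unexpected entity declaration arguments: $args"
        }
    }
}

# Simulated invocations (sample values are invented for illustration).
puts [EntityDecl copy "(c) Example"]
puts [EntityDecl chapter1 {} chapter1.xml]
puts [EntityDecl logo {} logo.png png]
```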

               -entityreferencecommand
                      -entityreferencecommand script

              Specifies the prefix of a Tcl command to be evaluated when an entity  reference  is
              encountered.  The command evaluated is: script name

              where:

                     name
                            The name of the entity being referenced

               -errorcommand
                      -errorcommand script

              Specifies  the  prefix  of  a  Tcl  command  to  be evaluated when a fatal error is
              detected.  The error may be due to the XML document not being well-formed.  In  the
              case  of  a  validating parser class, the error may also be due to the XML document
              not obeying validity constraints.  By default, a callback script is provided  which
              causes  an error return code, but an application may supply a script which attempts
              to continue parsing.  The command evaluated is: script errorcode errormsg

              where:

                     errorcode
                            A single word description of the error, intended for use by an
                            application

                     errormsg
                            A human-readable description of the error

               -externalentitycommand
                      -externalentitycommand script

              Specifies the prefix of a Tcl command to be evaluated to resolve an external entity
              reference.  If the parser has been configured  to  validate  the  XML  document,  a
              default  script is supplied that resolves the URI given as the system identifier of
              the external entity and recursively parses the entity's data.  If  the  parser  has
              been  configured  as a non-validating parser, then by default external entities are
              not resolved.  This option can be used to  override  the  default  behaviour.   The
              command evaluated is: script name baseuri uri id

              where:

                     name
                            The Tcl command name of the current parser

                     baseuri
                            An absolute URI for the current entity which is to be used to resolve
                            relative URIs

                     uri
                            The system identifier of the external entity, usually a URI

                     id
                            The public identifier of the external entity.  If no public identifier
                            was given in the entity declaration then id will be an empty string.

              The return result of the callback script determines the action of the parser.  Note
              that these codes are interpreted in a different manner to other callbacks.

                     TCL_OK

                     The return result of the callback script is used as  the  external  entity's
                     data.   The  URI  passed to the callback script is used as the entity's base
                     URI.

                     This is useful to either override the normal loading of an entity's data, or
                     to  implement  new  or  alternative  URI schemes.  As an example, the script
                     below sets an external  entity  handler  that  intercepts  "tcl:"  URIs  and
                     evaluates them as inline Tcl scripts:

                     package require xml

                     proc External {name baseuri uri id} {
                         switch -glob -- $uri {
                             tcl:* {
                                 regexp {^tcl:(.*)$} $uri discard script
                                 return [uplevel #0 $script]
                             }
                             default {
                                 return -code continue {}
                             }
                         }
                     }

                     set parser [xml::parser -externalentitycommand External]
                     $parser parse {<!DOCTYPE example [
                       <!ENTITY example SYSTEM "tcl:set%20example%20HelloWorld">
                     ]>
                     <example>
                       &example;
                     </example>
                     }

                     puts $example

                     This script will print "HelloWorld" to stdout.

                     TCL_CONTINUE

                     In a normal (non-safe) interpreter, the default external entity  handler  is
                     used to load the external entity data as per normal operation of the parser.
                     If the parser is executing in a Safe Tcl interpreter then the entity is  not
                     loaded at all.

                     This  is  useful  to  interpose  on the loading of external entities without
                     interfering with the loading of entities.

                     TCL_BREAK

                     No data is returned for this entity, ie. the entity is ignored.  No error is
                     propagated.

                     TCL_ERROR

                     No  data  is  returned  for  this  entity,  ie.  the  entity  is ignored.  A
                     background error is registered, using the result of the callback script.

               -final
                      -final boolean

              Specifies whether the XML document being parsed is complete.  If the document is to
              be incrementally parsed then this option is set to false, and when the last
              fragment of the document is parsed it is set to true.  For example:

              set parser [::xml::parser -final 0]
              $parser parse $data1
              $parser parse $data2
              $parser configure -final 1
              $parser parse $finaldata

               -ignorewhitespace
                      -ignorewhitespace boolean

              If  this  option  is  set  to true then spans of character data in the XML document
              which are composed only of white-space (CR, LF, space, tab) will not be reported to
              the  application.   In  other  words,  the  data  passed to every invocation of the
              -characterdatacommand script will contain at least one non-white-space character.

               -notationdeclcommand
                      -notationdeclcommand script

              Specifies the prefix of a Tcl command to be evaluated when a  notation  declaration
              is encountered.  The command evaluated is: script name uri

              where:

                     name
                            The name of the notation

                     uri
                            An external identifier for the notation, usually a URI.

               -notstandalonecommand
                      -notstandalonecommand script

              Specifies the prefix of a Tcl command to be evaluated when  the  parser  determines
              that the XML document being parsed is not a standalone document.

               -paramentityparsing
                      -paramentityparsing boolean

              Controls whether external parameter entities are parsed.

               -parameterentitydeclcommand
                      -parameterentitydeclcommand script

              Specifies  the  prefix  of  a  Tcl  command to be evaluated when a parameter entity
              declaration is encountered.  The command evaluated is: script name args

              where:

                     name
                            The name of the parameter entity

                     args
                            For an internal parameter entity there is only one additional
                            argument, the replacement text.  For external parameter entities there
                            are two additional arguments, the system and public identifiers
                            respectively.

               -parser
                      -parser name

              The  name  of  the parser class to instantiate for this parser object.  This option
              may only be specified when the parser instance is created.

               -processinginstructioncommand
                      -processinginstructioncommand script

              Specifies the prefix of a Tcl command to be evaluated when a processing instruction
              is encountered.  The command evaluated is: script target data

              where:

                     target
                            The name of the processing instruction target

                     data
                            Remaining data from the processing instruction

               -reportempty
                      -reportempty boolean

              If this option is enabled then when an element is encountered that uses the special
              empty   element   syntax,   additional    arguments    are    appended    to    the
              -elementstartcommand  and   -elementendcommand  callbacks.  The arguments  -empty 1
              are appended.  For example: script -empty 1

               -startcdatasectioncommand
                      -startcdatasectioncommand script

              Specifies the prefix of a Tcl command to be evaluated when the start of a CDATA
              section is encountered.  No arguments are appended to the script.

               -startdoctypedeclcommand
                      -startdoctypedeclcommand script

              Specifies  the  prefix  of a Tcl command to be evaluated at the start of a document
              type declaration.  No arguments are appended to the script.

               -unknownencodingcommand
                      -unknownencodingcommand script

              Specifies the prefix of  a  Tcl  command  to  be  evaluated  when  a  character  is
              encountered with an unknown encoding.  This option has not been implemented.

               -unparsedentitydeclcommand
                      -unparsedentitydeclcommand script

              Specifies  the  prefix  of  a  Tcl  command  to  be evaluated when a declaration is
              encountered for an unparsed entity.  The command evaluated is: script system public
              notation

              where:

                     system
                            The system identifier of the external entity, usually a URI

                     public
                            The public identifier of the external entity

                     notation
                            The name of the notation for the external entity

               -validate
                      -validate boolean

              Enables validation of the XML document to be parsed.  Any changes  to  this  option
              are  ignored  after an XML document has started to be parsed, but the option may be
              changed after a reset.

               -warningcommand
                      -warningcommand script

              Specifies the prefix of a Tcl command to be evaluated when a warning  condition  is
              detected.   A  warning  condition  is  where the XML document has not been authored
              correctly, but is still well-formed and may be valid.   For  example,  the  special
              empty element syntax may be used for an element which has not been declared to have
              empty content.  By default, a callback script is provided  which  silently  ignores
              the warning.  The command evaluated is: script warningcode warningmsg

              where:

                     warningcode
                            A single word description of the warning, intended for use by an
                            application

                     warningmsg
                            A human-readable description of the warning

               -xmldeclcommand
                      -xmldeclcommand script

              Specifies the prefix of a Tcl command to be evaluated when the XML  declaration  is
              encountered.  The command evaluated is: script version encoding standalone

              where:

                     version
                            The version number of the XML specification to which this document
                            purports to conform

                     encoding
                            The character encoding of the document

                     standalone
                            A boolean declaring whether the document is standalone

   Parser Command
       The ::xml::parser command creates a new Tcl command with the same name as the parser.
       This command may be used to invoke various operations on the parser object.  It has the
       following general form: name option arg

       The option and arg arguments determine the exact behaviour of the command.  The following
       commands are possible for parser objects:

               cget
                      cget -option

              Returns the current value of the configuration option given  by   option.    Option
              may have any of the values accepted by the parser object.

               configure
                      configure -option value

              Modify the configuration options of the parser object.   Option may have any of the
              values accepted by the parser object.

               entityparser
                      entityparser option value

              Creates a new parser object.   The  new  object  inherits  the  same  configuration
              options  as  the  parent  parser  object, but is able to parse XML data in a parsed
              entity.  The option  -dtdsubset allows markup declarations to be treated  as  being
              in the internal or external DTD subset.

               free
                      free name

              Frees  all  resources  associated with the parser object.  The object is not usable
              after this command has been invoked.

               get
                      get name args

              Returns information about  the  XML  document  being  parsed.   Each  parser  class
              provides different information, see the documentation for the parser class.

               parse
                      parse xml args

              Parses the XML document.  The usual desired effect is for various application
              callbacks to be evaluated.  Other functions will also be performed by the parser
              class; at the very least, this includes checking the XML document for
              well-formedness.

               reset
                      reset

              Initialises the parser object in preparation for parsing a new XML document.

CALLBACK RETURN CODES

       Every callback script evaluated by a parser may return a return code other  than   TCL_OK.
       Return codes are interpreted as follows:

              break  Suppresses  invocation  of  all  further callback scripts.  The parse method
              returns the TCL_OK return code.

              continue Suppresses invocation  of  further  callback  scripts  until  the  current
              element has finished.

              error Suppresses invocation of all further callback scripts.  The parse method also
              returns the TCL_ERROR return code.

              default Any other  return  code  suppresses  invocation  of  all  further  callback
              scripts.  The parse method returns the same return code.
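
       To illustrate how a callback signals one of these codes, the sketch below uses the standard
       Tcl return -code and catch commands.  The procedure name UntilBody, the global variable
       seen and the list of element names are hypothetical; the foreach loop merely stands in for
       a parser invoking an -elementstartcommand callback and honouring a break code, and is not
       how TclXML itself is implemented.

```tcl
# Hypothetical callback: records element names and stops at the first "body".
proc UntilBody {name attlist args} {
    lappend ::seen $name
    if {$name eq "body"} {
        # Ask the caller to suppress all further callbacks.
        return -code break
    }
    return
}

set seen {}
set codes {}
# catch exposes the return code a caller (such as a parser) would receive:
# 0 = TCL_OK, 1 = TCL_ERROR, 3 = TCL_BREAK, 4 = TCL_CONTINUE.
foreach name {html head body p} {
    set rc [catch {UntilBody $name {}}]
    lappend codes $rc
    if {$rc == 3} break
}
puts "callbacks run for: $seen; return codes: $codes"
```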

ERROR MESSAGES

       If  an  error  or  warning condition is detected then an error message is returned.  These
       messages are structured as a Tcl list, as described below:

       {domain level code node line message int1 int2 string1 string2 string3}

              domain

              A code for the subsystem that detected the error.

              level

              Severity level of the problem.

              code

              A one word string describing the error.

              node

              If available, the token of the DOM node associated with the problem.

              line

              If known, the line number  in  the  source  XML  document  where  the  problem  was
              detected.

              message

              A human-readable description of the problem.

              int1

              Additional  integer data.  For the parser domain, this is usually the column number
              where the problem was detected.

              int2

              Additional integer data.

              string1

              Additional string data.

              string2

              Additional string data.

              string3

              Additional string data.
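
       Because the message is an ordinary Tcl list, its fields can be separated with lassign.  The
       sketch below is an illustration only: the procedure name FormatError is hypothetical, and
       the sample message, including the domain value parser and the code unclosedelement, is
       invented to match the field order given above.  Actual field values depend on the parser
       class.

```tcl
# Hypothetical helper: turns a structured error message into one line of text.
proc FormatError {errmsg} {
    lassign $errmsg domain level code node line message int1 int2 \
        string1 string2 string3
    set where ""
    if {$line ne ""} {
        set where " at line $line"
        # For the parser domain, int1 is usually the column number.
        if {$domain eq "parser" && $int1 ne ""} {
            append where ", column $int1"
        }
    }
    return "${level}: $message ($domain/$code)$where"
}

# Sample message invented for illustration.
set sample {parser error unclosedelement {} 12 {element "para" is not closed} 7 0 {} {} {}}
puts [FormatError $sample]
```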

APPLICATION EXAMPLES

       This script outputs the character data of an XML document read from stdin.

       package require xml

       proc cdata {data args} {
           puts -nonewline $data
       }

       set parser [::xml::parser -characterdatacommand cdata]
       $parser parse [read stdin]

       This script counts the number of elements in an XML document read from stdin.

       package require xml

       proc EStart {varName name attlist args} {
           upvar #0 $varName var
           incr var
       }

       set count 0
       set parser [::xml::parser -elementstartcommand [list EStart count]]
       $parser parse [read stdin]
       puts "The XML document contains $count elements"

SAFE XML

       TclXML/Tcl  and  TclXML/libxml2 may be used in a Safe Tcl interpreter.  When a document is
       parsed in a Safe Tcl interpreter, any attempt by the XML  document  to  load  an  external
       entity  is  handled by the -externalentitycommand callback.  This callback is evaluated in
       the context of the safe interpreter and therefore is subject to  the  security  policy  in
       force  for  that  interpreter.  The default entity loader will not be invoked, even if the
       callback script returns a TCL_CONTINUE code.

       See the description of the -externalentitycommand option for further details.

PARSER CLASSES

       This section will discuss how a parser class is implemented.

   Tcl Parser Class
       The pure-Tcl parser class requires no compilation - it is a  collection  of  Tcl  scripts.
       This parser implementation is non-validating, i.e. it can only check a document for
       well-formedness.  However, when the -validate option is enabled it will read the
       document's DTD and resolve external entities.  This parser class is referred to as
       TclXML/tcl.

       This parser implementation aims to implement XML v1.0 and supports XML Namespaces.

       Generally  the  parser  produces  XML  Infoset  information  items.  That is, it gives the
       application a slightly higher-level view than the raw XML syntax.  For  example,  it  does
       not report CDATA Sections.

       TclXML/tcl is not able to handle character encodings other than UTF-8.

   libxml2 Parser Class
       The libxml2 parser class provides a Tcl interface to the libxml2 XML parser library.  This
       parser class is referred to as TclXML/libxml2.

       When the package is loaded the  variable  ::xml::libxml2::libxml2version  is  set  to  the
       version number of the libxml2 library being used.

       On  MS  Windows,  it  is  necessary  to  load  the generic XML package first, and then the
       TclXML/libxml2 package.  For example,

       package require xml
       package require xml::libxml2

   get Method
       TclXML/libxml2 provides the following arguments to the get method:

               document

              Returns the parsed document object.  libxml2 builds an in-memory data structure  of
              the  XML  document it parses (a DOM tree).  This method returns a handle (or token)
              for that structure.

              TclXML/libxml2 manages the document object as a Tcl object.  See the -keep option
              for further information.

   Additional Options
               -keep
                      -keep normal | implicit

              Controls how the TclXML/libxml2 package manages the document object.  The default
              value is implicit; the document is destroyed when the Tcl object's internal
              representation is freed.  If the option is given the value normal then the document
              must be explicitly destroyed.  The only way to explicitly destroy the document is
              by using the C API.

               -retainpath
                      -retainpath xpath

              The  given  XPath  location path specifies which part of the document is to be kept
              after the parsing operation has completed.  By default, all document data is
              discarded after it has been parsed.

               -retainpathns
                      -retainpathns prefix ns ...

              The  value  of  this  option is a list of pairs of XML Namespace prefixes and their
              corresponding namespace URIs.  These are used by the XPath location path  given  in
              the  -retainpath option.
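
       A minimal sketch of how these options might be combined with the get method.  It
       assumes that the options are accepted when the parser is created, like any other
       -option value pair shown in the SYNOPSIS.  The XPath location path, the prefix ex, the
       namespace URI and the sample document are all hypothetical.  The sketch only reports
       that the package is unavailable if TclXML/libxml2 cannot be loaded.

```tcl
# Load the generic package first, then the libxml2 parser class
# (the order that is required on MS Windows).
if {[catch {package require xml; package require xml::libxml2}]} {
    set outcome unavailable
    puts "TclXML/libxml2 is not available"
} elseif {[catch {
    # Assumption: options may be given at parser creation time.
    set parser [::xml::parser \
        -retainpath   /ex:catalog/ex:item \
        -retainpathns {ex http://example.org/catalog}]
    $parser parse \
        {<catalog xmlns="http://example.org/catalog"><item/></catalog>}
    # Token for the document object built by libxml2.
    set doc [$parser get document]
} err]} {
    set outcome failed
    puts "parsing failed: $err"
} else {
    set outcome parsed
    puts "document token: $doc"
}
```

       With the default -keep implicit setting, the document is destroyed when the Tcl
       object's internal representation is freed, as described above.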

   Limitations
       The libxml2 parser class has the following limitations:

              *  -reportempty has no effect.  libxml2 does not report empty element syntax.

              *  Incremental (push) parsing, i.e. -final 0, is not supported.

              *  TclXML/libxml2 does not provide DTD validation, WXS schema validation or
              Relax NG validation, although the libxml2 library does provide those functions.
              These functions are provided by the TclDOM/libxml2 package, but only
              a posteriori (i.e. only after the document has been parsed).

              *  libxml2 supports XML Namespaces.  The use of XML Namespaces can be queried, but
              the declaration of an XML Namespace is not reported.

KEYWORDS