Provided by: tclxml_3.3~svn11-3_amd64

NAME

       TclXML - XML parser support for Tcl

SYNOPSIS

       package require xml

       package require parserclass

       ::xml::parserclass option ?arg arg ...?

       ::xml::parser ?name? ?-option value ...?

       parser option arg

DESCRIPTION

       TclXML  provides event-based parsing of XML documents.  The application may register callback scripts for
       certain document features, and when the parser encounters those features while parsing the  document  the
       callback is evaluated.

       The  parser  may also perform other functions, such as normalisation, validation and/or entity expansion.
       Generally, these functions are under the control of configuration options.  Whether these  functions  can
       be performed at all depends on the parser implementation.

       The  TclXML  package  provides  a  generic interface for use by a Tcl application, along with a low-level
       interface for use by a parser implementation.  Each implementation provides a class of  XML  parser,  and
       these  register  themselves  using  the  ::xml::parserclass create command.  One of the registered parser
       classes will be the default parser class.

       Loading the package with the generic package require xml command allows the package to automatically
       determine the default parser class.  To select a particular parser class as the default, load that
       class's package directly, e.g. package require xml::libxml2.  In either case, all available parser
       classes are registered with the TclXML package; the only difference is which one becomes the default.

COMMANDS

   ::xml::parserclass
       The ::xml::parserclass command is used to manage XML parser classes.

   Command Options
       The following command options may be used:

              create
                      create name ?-createcommand script? ?-createentityparsercommand script? ?-parsecommand
                     script? ?-configurecommand script? ?-getcommand script? ?-deletecommand script?

              Creates an XML parser class with the given name.

              destroy
                      destroy name

              Destroys an XML parser class.

              info
                      info names default

              Returns information about registered XML parser classes.
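
              The info option can be used to check which parser classes were registered and which one
              package require xml chose as the default.  The sketch below assumes that names and default
              are separate sub-options of info, as the usage line above suggests; the values printed
              depend on which parser classes are installed.

```tcl
package require xml

# Ask the registry which parser classes exist and which one is the default.
# Assumption: "names" and "default" are separate sub-options of "info".
set classes [::xml::parserclass info names]
set default [::xml::parserclass info default]

puts "registered parser classes: $classes"
puts "default parser class:      $default"
```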

   ::xml::parser
       The  ::xml::parser  command creates an XML parser object.  The return value of the command is the name of
       the newly created parser.

       The parser scans an XML document's syntactic structure, evaluating callback scripts for each feature
       found.  At the very least the parser will normalise the document and check it for well-formedness.
       If the document is not well-formed then the -errorcommand callback will be evaluated.  Some
       parser classes may perform additional functions, such as validation.  Additional features provided by the
       various parser classes are described in the section Parser Classes.

       Parsing is performed synchronously.  The command blocks  until  the  entire  document  has  been  parsed.
       Parsing may be terminated by an application callback, see the section Callback Return Codes.  Incremental
       parsing is also supported by using the  -final configuration option.

   Configuration Options
       The ::xml::parser command accepts the following configuration options:

               -attlistdeclcommand
                      -attlistdeclcommand script

              Specifies the prefix of a Tcl command to be evaluated whenever an attribute  list  declaration  is
              encountered  in the DTD subset of an XML document.  The command evaluated is: script name attrname
              type default value

              where:

                                                  name

                                               Element type name

                                                  attrname

                                               Attribute name being declared

                                                  type

                                               Attribute type

                                                  default

                                               Attribute default, such as #IMPLIED

                                                  value

                                               Default attribute value.  Empty string if none given.

               -baseuri -baseurl
                      -baseuri URI

                      -baseurl URI

              Specifies the base URI for resolving relative URIs that may be used in the XML document  to  refer
              to external entities.

               -baseurl is deprecated in favour of  -baseuri.

               -characterdatacommand
                      -characterdatacommand script

              Specifies  the  prefix  of a Tcl command to be evaluated whenever character data is encountered in
              the XML document being parsed.  The command evaluated is: script data

              where:

                                                  data

                                               Character data in the document

               -commentcommand
                      -commentcommand script

              Specifies the prefix of a Tcl command to be evaluated whenever a comment is encountered in the XML
              document being parsed.  The command evaluated is: script data

              where:

                                                  data

                                               Comment data

               -defaultcommand
                      -defaultcommand script

              Specifies  the prefix of a Tcl command to be evaluated when no other callback has been defined for
              a document feature which has been encountered.  The command evaluated is: script data

              where:

                                                  data

                                               Document data

               -defaultexpandinternalentities
                      -defaultexpandinternalentities boolean

              Specifies whether entities declared in the internal DTD subset are expanded with their replacement
              text.  If entities are not expanded then the entity references will be reported with no expansion.

               -doctypecommand
                      -doctypecommand script

              Specifies  the  prefix  of  a  Tcl  command  to be evaluated when the document type declaration is
              encountered.  The command evaluated is: script name public system dtd

              where:

                                                  name

                                               The name of the document element

                                                  public

                                               Public identifier for the external DTD subset

                                                  system

                                               System identifier for the external DTD subset.  Usually a URI.

                                                  dtd

                                               The internal DTD subset

              See also  -startdoctypedeclcommand and  -enddoctypedeclcommand.

               -elementdeclcommand
                      -elementdeclcommand script

              Specifies the prefix of a Tcl command to be  evaluated  when  an  element  markup  declaration  is
              encountered.  The command evaluated is: script name model

              where:

                                                  name

                                               The element type name

                                                  model

                                               Content model specification

               -elementendcommand
                      -elementendcommand script

              Specifies the prefix of a Tcl command to be evaluated when an element end tag is encountered.  The
              command evaluated is: script name args

              where:

                                                  name

                                               The element type name that has ended

                                                  args

                                               Additional information about this element

              Additional information about the element  takes  the  form  of  configuration  options.   Possible
              options are:

                                                -empty

                                                  boolean

                                               The empty element syntax was used for this element

                                                -namespace

                                                  uri

                                               The element is in the XML namespace associated with the given URI

               -elementstartcommand
                      -elementstartcommand script

              Specifies  the  prefix  of a Tcl command to be evaluated when an element start tag is encountered.
              The command evaluated is: script name attlist args

              where:

                                                  name

                                               The element type name that has started

                                                  attlist

                                               A Tcl list containing the attributes for this element.  The  list
                            of attributes is formatted as pairs of attribute names and their values.

                                                  args

                                               Additional information about this element

              Additional  information  about  the  element  takes  the  form of configuration options.  Possible
              options are:

                                                -empty

                                                  boolean

                                               The empty element syntax was used for this element

                                                -namespace

                                                  uri

                                               The element is in the XML namespace associated with the given URI

                                                -namespacedecls

                                                  list

                                               The start tag included one or more  XML  Namespace  declarations.
                            list  is  a Tcl list giving the namespaces declared.  The list is formatted as pairs
                            of values, the first value is the namespace URI and the second value is  the  prefix
                            used  for  the namespace in this document.  A default XML namespace declaration will
                            have an empty string for the prefix.
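
              As an illustration of the argument layout described above, the following sketch registers a
              start-tag handler that walks the attribute pairs and treats the trailing arguments as
              option/value pairs.  The handler name OnStart and the global variable seen are made up for
              this example; which options actually appear in args depends on the parser class and its
              configuration.

```tcl
package require xml

# Hypothetical handler; the argument order follows the description above.
proc OnStart {name attlist args} {
    # Record the element name so the result can be inspected afterwards.
    lappend ::seen $name

    # attlist is a flat list of attribute name/value pairs.
    foreach {attName attValue} $attlist {
        puts "$name: attribute $attName = $attValue"
    }

    # args is a list of option/value pairs (-empty, -namespace, -namespacedecls).
    array set opts $args
    if {[info exists opts(-namespace)]} {
        puts "$name: namespace $opts(-namespace)"
    }
}

set seen {}
set parser [::xml::parser -elementstartcommand OnStart]
$parser parse {<doc id="42"><item/></doc>}
puts "elements seen: $seen"
```

              Copying args into an array keeps the handler tolerant of options it does not know about.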

               -encoding
                      -encoding value

              Gives the character encoding of the document.  This option only has an effect before a document is
              parsed.  The default character encoding is utf-8.  If any other value is given, including
              unknown, then the document text is treated as binary data.  If the value is unknown then the
              parser will attempt to determine the character encoding of the document automatically (using
              byte-order marks, etc.).  If any value other than utf-8 or unknown is given then the parser will
              read the document text using that character encoding.

               -endcdatasectioncommand
                      -endcdatasectioncommand script

              Specifies the prefix of a Tcl command to be evaluated when the end of a CDATA section is
              encountered.  The command is evaluated with no further arguments.

               -enddoctypedeclcommand
                      -enddoctypedeclcommand script

              Specifies the prefix of a Tcl command to be evaluated when the end of the document type
              declaration is encountered.  The command is evaluated with no further arguments.

               -entitydeclcommand
                      -entitydeclcommand script

              Specifies  the  prefix of a Tcl command to be evaluated when an entity declaration is encountered.
              The command evaluated is: script name args

              where:

                                                  name

                                               The name of the entity being declared

                                                  args

                                               Additional information about the entity declaration.  An internal
                            entity shall have a single argument, the replacement text.  An external parsed
                            entity shall have two additional arguments, the public and system identifiers of
                            the external resource.  An external unparsed entity shall have three additional
                            arguments, the public and system identifiers followed by the notation name.

               -entityreferencecommand
                      -entityreferencecommand script

              Specifies the prefix of a Tcl command to be evaluated when an  entity  reference  is  encountered.
              The command evaluated is: script name

              where:

                                                  name

                                               The name of the entity being referenced

               -errorcommand
                      -errorcommand script

              Specifies  the  prefix of a Tcl command to be evaluated when a fatal error is detected.  The error
              may be due to the XML document not being well-formed.  In the case of a validating  parser  class,
              the  error  may  also  be due to the XML document not obeying validity constraints.  By default, a
              callback script is provided which causes an error return code, but an  application  may  supply  a
              script which attempts to continue parsing.  The command evaluated is: script errorcode errormsg

              where:

                                                  errorcode

                                               A  single  word  description of the error, intended for use by an
                            application

                                                  errormsg

                                               A human-readable description of the error
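
              A minimal sketch of an application-supplied error handler follows.  It collects errors
              rather than aborting, using the two-argument form described above.  The handler name
              OnError and the global variable errors are invented for the example, and the parse call is
              wrapped in catch because a parser class may still raise a Tcl error for a document that is
              not well-formed.

```tcl
package require xml

# Hypothetical handler: collect fatal errors instead of printing them.
proc OnError {errorcode errormsg} {
    lappend ::errors [list $errorcode $errormsg]
}

set errors {}
set parser [::xml::parser -errorcommand OnError]

# The end tag does not match the open element, so this is not well-formed.
# catch guards against parser classes that also raise a Tcl error.
set rc [catch {$parser parse {<a><b></a>}} result]

puts "parse return code: $rc"
foreach err $errors {
    lassign $err code msg
    puts "error $code: $msg"
}
```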

               -externalentitycommand
                      -externalentitycommand script

              Specifies the prefix of a Tcl command to be evaluated to resolve an external entity reference.  If
              the  parser  has  been  configured to validate the XML document, a default script is supplied that
              resolves the URI given as the system identifier of the external entity and recursively parses  the
              entity's  data.   If  the  parser  has been configured as a non-validating parser, then by default
              external entities are not resolved.  This option can be used to override  the  default  behaviour.
              The command evaluated is: script name baseuri uri id

              where:

                                                  name

                                               The Tcl command name of the current parser

                                                  baseuri

                                               An  absolute  URI  for  the current entity which is to be used to
                            resolve relative URIs

                                                  uri

                                               The system identifier of the external entity, usually a URI

                                                  id

                                               The public identifier of  the  external  entity.   If  no  public
                            identifier was given in the entity declaration then id will be an empty string.

              The  return  result  of  the callback script determines the action of the parser.  Note that these
              codes are interpreted in a different manner to other callbacks.

                     TCL_OK

                     The return result of the callback script is used as the external entity's  data.   The  URI
                     passed to the callback script is used as the entity's base URI.

                     This  is  useful to either override the normal loading of an entity's data, or to implement
                     new or alternative URI schemes.  As an example, the script below sets  an  external  entity
                     handler that intercepts "tcl:" URIs and evaluates them as inline Tcl scripts:

                     package require xml

                     proc External {name baseuri uri id} {
                         switch -glob -- $uri {
                             tcl:* {
                                 regexp {^tcl:(.*)$} $uri discard script
                                 return [uplevel #0 $script]
                             }
                             default {
                                 return -code continue {}
                             }
                         }
                     }

                     set parser [xml::parser -externalentitycommand External]
                     $parser parse {<!DOCTYPE example [
                       <!ENTITY example SYSTEM "tcl:set%20example%20HelloWorld">
                     ]>
                     <example>
                       &example;
                     </example>
                     }

                     puts $example

                     This script will print "HelloWorld" to stdout.

                     TCL_CONTINUE

                     In a normal (non-safe) interpreter, the default external entity handler is used to load the
                     external entity data as per normal operation of the parser.  If the parser is executing  in
                     a Safe Tcl interpreter then the entity is not loaded at all.

                     This  is  useful  to interpose on the loading of external entities without interfering with
                     the loading of entities.

                     TCL_BREAK

                     No data is returned for this entity, ie. the entity is ignored.  No error is propagated.

                     TCL_ERROR

                     No data is returned for this entity, ie. the entity is  ignored.   A  background  error  is
                     registered, using the result of the callback script.

               -final
                      -final boolean

              Specifies whether the XML document being parsed is complete.  If the document is to be
              parsed incrementally then this option is set to false for every fragment except the last, and
              set to true before the last fragment of the document is parsed.  For example,

              set parser [::xml::parser -final 0]
              $parser parse $data1
              $parser parse $data2
              $parser configure -final 1
              $parser parse $finaldata

               -ignorewhitespace
                      -ignorewhitespace boolean

              If this option is set to true then spans of character data in the XML document which are  composed
              only of white-space (CR, LF, space, tab) will not be reported to the application.  In other words,
              the data passed to every invocation of the  -characterdatacommand script will contain at least one
              non-white-space character.
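
              The effect can be observed with a small script such as the sketch below, which collects
              every span of character data reported for a document that is indented with newlines and
              spaces.  The handler name OnText and the variable chunks are invented for the example; a
              parser class may still split the remaining text across several callback invocations.

```tcl
package require xml

# Hypothetical handler: remember every span of character data reported.
proc OnText {data args} {
    lappend ::chunks $data
}

set chunks {}
set parser [::xml::parser -characterdatacommand OnText -ignorewhitespace 1]
$parser parse "<list>\n  <item>one</item>\n  <item>two</item>\n</list>"

foreach chunk $chunks {
    puts "text: '$chunk'"
}
```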

               -notationdeclcommand
                      -notationdeclcommand script

              Specifies  the prefix of a Tcl command to be evaluated when a notation declaration is encountered.
              The command evaluated is: script name uri

              where:

                                                  name

                                               The name of the notation

                                                  uri

                                               An external identifier for the notation, usually a URI.

               -notstandalonecommand
                      -notstandalonecommand script

              Specifies the prefix of a Tcl command to be evaluated when the  parser  determines  that  the  XML
              document being parsed is not a standalone document.

               -paramentityparsing
                      -paramentityparsing boolean

              Controls whether external parameter entities are parsed.

               -parameterentitydeclcommand
                      -parameterentitydeclcommand script

              Specifies  the  prefix  of  a  Tcl  command to be evaluated when a parameter entity declaration is
              encountered.  The command evaluated is: script name args

              where:

                                                  name

                                               The name of the parameter entity

                                                  args

                                               For an internal parameter entity there  is  only  one  additional
                            argument,  the  replacement  text.   For  external  parameter entities there are two
                            additional arguments, the system and public identifiers respectively.

               -parser
                      -parser name

              The name of the parser class to instantiate for this parser  object.   This  option  may  only  be
              specified when the parser instance is created.
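
              The sketch below picks a parser class explicitly instead of relying on the default.  To
              avoid hard-coding a class name, it takes the first name reported by ::xml::parserclass info
              names; this assumes that the reported names are the values accepted by -parser, which the
              text above does not state outright.

```tcl
package require xml

# Assumption: "info names" returns class names that -parser accepts.
set class [lindex [::xml::parserclass info names] 0]

# -parser must be given at creation time; it cannot be changed later.
set parser [::xml::parser -parser $class]
puts "created $parser using parser class $class"
```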

               -processinginstructioncommand
                      -processinginstructioncommand script

              Specifies  the  prefix  of  a  Tcl  command  to  be  evaluated  when  a  processing instruction is
              encountered.  The command evaluated is: script target data

              where:

                                                  target

                                               The name of the processing instruction target

                                                  data

                                               Remaining data from the processing instruction
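
              For instance, a handler for processing instructions might be sketched as follows.  The
              handler name OnPI, the variable pis and the render target in the sample document are all
              made up for the example.

```tcl
package require xml

# Hypothetical handler: record each processing instruction as {target data}.
proc OnPI {target data} {
    lappend ::pis [list $target $data]
}

set pis {}
set parser [::xml::parser -processinginstructioncommand OnPI]
$parser parse {<doc><?render mode="fast"?></doc>}

foreach pi $pis {
    lassign $pi target data
    puts "PI target=$target data=$data"
}
```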

               -reportempty
                      -reportempty boolean

              If this option is enabled then when an element is encountered that uses the special empty  element
              syntax,  additional  arguments  are  appended to the  -elementstartcommand and  -elementendcommand
              callbacks.  The arguments  -empty 1 are appended.  For example: script -empty 1

               -startcdatasectioncommand
                      -startcdatasectioncommand script

              Specifies the prefix of a Tcl command to be evaluated when the start of a CDATA section is
              encountered.  No arguments are appended to the script.

               -startdoctypedeclcommand
                      -startdoctypedeclcommand script

              Specifies the prefix of a Tcl command to be evaluated at the start of a document type declaration.
              No arguments are appended to the script.

               -unknownencodingcommand
                      -unknownencodingcommand script

              Specifies the prefix of a Tcl command to be evaluated when a  character  is  encountered  with  an
              unknown encoding.  This option has not been implemented.

               -unparsedentitydeclcommand
                      -unparsedentitydeclcommand script

              Specifies  the  prefix  of  a Tcl command to be evaluated when a declaration is encountered for an
              unparsed entity.  The command evaluated is: script system public notation

              where:

                                                  system

                                               The system identifier of the external entity, usually a URI

                                                  public

                                               The public identifier of the external entity

                                                  notation

                                               The name of the notation for the external entity

               -validate
                      -validate boolean

              Enables validation of the XML document to be parsed.  Any changes to this option are ignored after
              an XML document has started to be parsed, but the option may be changed after a reset.

               -warningcommand
                      -warningcommand script

              Specifies  the  prefix  of  a Tcl command to be evaluated when a warning condition is detected.  A
              warning condition is where the XML document has not been authored correctly, but  is  still  well-
              formed and may be valid.  For example, the special empty element syntax may be used for an element
              which has not been declared to have empty content.  By default,  a  callback  script  is  provided
              which silently ignores the warning.  The command evaluated is: script warningcode warningmsg

              where:

                                                  warningcode

                                               A  single word description of the warning, intended for use by an
                            application

                                                  warningmsg

                                               A human-readable description of the warning

               -xmldeclcommand
                      -xmldeclcommand script

              Specifies the prefix of a Tcl command to be evaluated when the  XML  declaration  is  encountered.
              The command evaluated is: script version encoding standalone

              where:

                                                  version

                                               The  version  number  of  the  XML  specification  to  which this
                            document purports to conform

                                                  encoding

                                               The character encoding of the document

                                                  standalone

                                               A boolean declaring whether the document is standalone

   Parser Command
       The ::xml::parser command creates a new Tcl command with the same name as the parser.  This  command  may
       be  used  to  invoke  various  operations  on the parser object.  It has the following general form: name
       option arg

       option and arg determine the exact behaviour of the command.  The following commands are possible
       for parser objects:

               cget
                      cget -option

              Returns  the  current value of the configuration option given by  option.   Option may have any of
              the values accepted by the parser object.

               configure
                      configure -option value

              Modify the configuration options of the parser  object.    Option  may  have  any  of  the  values
              accepted by the parser object.

               entityparser
                      entityparser option value

              Creates a new parser object.  The new object inherits the same configuration options as the parent
              parser object, but is able to parse XML data in a parsed entity.  The  option   -dtdsubset  allows
              markup declarations to be treated as being in the internal or external DTD subset.

               free
                      free name

              Frees  all  resources  associated  with  the  parser  object.  The object is not usable after this
              command has been invoked.

               get
                      get name args

              Returns information about the XML document being parsed.  Each  parser  class  provides  different
              information, see the documentation for the parser class.

               parse
                      parse xml args

              Parses  the  XML  document.   The  usual desired effect is for various application callbacks to be
              evaluated.  Other functions will also be performed by the parser class, at  the  very  least  this
              includes checking the XML document for well-formedness.

               reset
                      reset

              Initialises the parser object in preparation for parsing a new XML document.
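
              A sketch of reusing one parser object for several documents follows.  The text above does
              not say whether configuration options survive a reset, so the example re-registers its
              callback after each reset; that step is a precaution and may well be unnecessary.  The
              names Count, count and counts are invented for the example.

```tcl
package require xml

# Hypothetical handler: count start tags.
proc Count {name attlist args} {
    incr ::count
}

set parser [::xml::parser -elementstartcommand Count]
set counts {}

foreach doc {{<a><b/></a>} {<x><y/><z/></x>}} {
    set count 0
    $parser parse $doc
    lappend counts $count

    # Prepare the same parser object for the next document.
    $parser reset
    # Precaution (assumption): restore the callback in case reset cleared it.
    $parser configure -elementstartcommand Count
}

puts "elements per document: $counts"
```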

CALLBACK RETURN CODES

       Every  callback  script  evaluated by a parser may return a return code other than  TCL_OK.  Return codes
       are interpreted as follows:

              break Suppresses invocation of all further callback scripts.  The parse method returns the  TCL_OK
              return code.

              continue Suppresses invocation of further callback scripts until the current element has finished.

              error  Suppresses  invocation  of all further callback scripts.  The parse method also returns the
              TCL_ERROR return code.

              default Any other return code suppresses invocation of all further callback  scripts.   The  parse
              method returns the same return code.
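
       As a sketch of the break code in use, the handler below stops the parse as soon as the first item
       element is seen, so later start tags are never reported.  The handler name FindFirst and the variable
       visited are invented for the example.

```tcl
package require xml

# Hypothetical handler: stop the parse at the first <item> start tag.
proc FindFirst {name attlist args} {
    lappend ::visited $name
    if {$name eq "item"} {
        # break: no further callbacks are invoked and parse returns TCL_OK.
        return -code break
    }
}

set visited {}
set parser [::xml::parser -elementstartcommand FindFirst]
$parser parse {<list><item/><item/><other/></list>}
puts "visited: $visited"
```

       Returning with -code continue instead would, per the list above, only skip callbacks up to the end of
       the current element.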

ERROR MESSAGES

       If  an  error  or  warning  condition  is detected then an error message is returned.  These messages are
       structured as a Tcl list, as described below:

       {domain level code node line message int1 int2 string1 string2 string3}

              domain

              A code for the subsystem that detected the error.

              level

              Severity level of the problem.

              code

              A one word string describing the error.

              node

              If available, the token of the DOM node associated with the problem.

              line

              If known, the line number in the source XML document where the problem was detected.

              message

              A human-readable description of the problem.

              int1

              Additional integer data.  For the parser domain, this is  usually  the  column  number  where  the
              problem was detected.

              int2

              Additional integer data.

              string1

              Additional string data.

              string2

              Additional string data.

              string3

              Additional string data.
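
       Since a message is an ordinary Tcl list, its fields can be unpacked by position.  The sketch below
       does so with lassign (Tcl 8.5 or later).  The sample message is entirely hypothetical; its field
       values, including the error code, were made up to show the layout and are not taken from any
       parser class.

```tcl
# Hypothetical sample message; every field value is invented for illustration.
set msg {parser error tag-mismatch {} 3 {Opening and ending tag mismatch} 7 0 b a {}}

# Unpack the eleven fields in the documented order.
lassign $msg domain level code node line message int1 int2 string1 string2 string3

# For the parser domain, int1 is usually the column number.
puts "$level in $domain at line $line, column $int1: $message ($code)"
```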

APPLICATION EXAMPLES

       This script outputs the character data of an XML document read from stdin.

       package require xml

       proc cdata {data args} {
           puts -nonewline $data
       }

       set parser [::xml::parser -characterdatacommand cdata]
       $parser parse [read stdin]

       This script counts the number of elements in an XML document read from stdin.

       package require xml

       proc EStart {varName name attlist args} {
           upvar #0 $varName var
           incr var
       }

       set count 0
       set parser [::xml::parser -elementstartcommand [list EStart count]]
       $parser parse [read stdin]
       puts "The XML document contains $count elements"

SAFE XML

       TclXML/Tcl and TclXML/libxml2 may be used in a Safe Tcl interpreter.  When a document is parsed in a Safe
       Tcl  interpreter,  any  attempt  by  the  XML  document  to  load  an  external  entity is handled by the
       -externalentitycommand callback.  This callback is evaluated in the context of the safe  interpreter  and
       therefore  is  subject  to  the security policy in force for that interpreter.  The default entity loader
       will not be invoked, even if the callback script returns a TCL_CONTINUE code.

       See the description of the -externalentitycommand for further details.

PARSER CLASSES

       This section discusses how each parser class is implemented.

   Tcl Parser Class
       The pure-Tcl parser class requires no compilation; it is a collection of Tcl scripts.  This parser
       implementation is non-validating, i.e. it can only check a document for well-formedness.  However, when
       the -validate option is enabled it will read the document's DTD and resolve external entities.  This
       parser class is referred to as TclXML/tcl.

       This parser implementation aims to implement XML v1.0 and supports XML Namespaces.

       Generally  the  parser  produces  XML  Infoset  information  items.   That is, it gives the application a
       slightly higher-level view than the raw XML syntax.  For example, it does not report CDATA Sections.

       TclXML/tcl is not able to handle character encodings other than UTF-8.

   libxml2 Parser Class
       The libxml2 parser class provides a Tcl interface to the libxml2 XML parser library.  This  parser  class
       is referred to as TclXML/libxml2.

       When  the  package  is loaded the variable ::xml::libxml2::libxml2version is set to the version number of
       the libxml2 library being used.

       On MS Windows, it is necessary to load the  generic  XML  package  first,  and  then  the  TclXML/libxml2
       package.  For example,

       package require xml
       package require xml::libxml2
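
       The load order described above can be combined with the version variable to report which libxml2
       library is in use.  This is a small sketch; the catch wrapper and the wording of the messages are
       illustrative choices, not part of the package.

```tcl
# Load the generic package first, then TclXML/libxml2 (the order that
# the MS Windows note above requires), and report the library version.
if {[catch {
    package require xml
    package require xml::libxml2
} err]} {
    puts stderr "TclXML/libxml2 is not available: $err"
} else {
    puts "libxml2 version: $::xml::libxml2::libxml2version"
}
```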

   get Method
       TclXML/libxml2 provides the following arguments to the get method:

               document

              Returns  the  parsed  document  object.   libxml2  builds  an  in-memory data structure of the XML
              document it parses (a DOM tree).  This method returns a handle (or token) for that structure.

              TclXML/libxml2 manages the document object as a Tcl object.  See the -keep option for further
              information.

   Additional Options
               -keep
                      -keep normal | implicit

              Controls how the TclXML/libxml2 package manages the document object.  The default value is
              implicit; the document is destroyed when the Tcl object's internal representation is freed.  If
              the option is given the value normal then the document must be explicitly destroyed.  The only way
              to explicitly destroy the document is by using the C API.
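
       As an illustration of the get method together with -keep, the sketch below parses a string and
       returns the document token.  The helper name parseToDocument is hypothetical, and the sketch assumes
       that loading xml::libxml2 makes TclXML/libxml2 the default parser class.

```tcl
# Hypothetical helper: parse a string with TclXML/libxml2 and return
# the token of the document that libxml2 built.
proc parseToDocument {xml} {
    # Generic package first, then the libxml2 parser class.
    package require xml
    package require xml::libxml2
    # "implicit" is the documented default; it is spelled out here
    # only to make the choice visible.
    set parser [::xml::parser -keep implicit]
    $parser parse $xml
    return [$parser get document]
}
```

       With -keep implicit the script does not free the document itself; it is destroyed when the Tcl
       object holding the token is released.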

               -retainpath
                      -retainpath xpath

              The given XPath location path specifies which part of the document is to be kept after the parsing
              operation has completed.  By default, all document data is discarded after it has been parsed.

               -retainpathns
                      -retainpathns prefix ns ...

              The  value  of  this  option  is a list of pairs of XML Namespace prefixes and their corresponding
              namespace URIs.  These are used by the XPath location path given in the  -retainpath option.
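
       The two options are typically given together when the location path uses prefixed names.  The
       following sketch shows one way to configure them; the helper name makeItemParser, the prefix inv,
       the namespace URI and the location path are all invented for illustration.

```tcl
# Hypothetical helper: create a parser that retains only the item
# elements of a namespaced inventory document.
proc makeItemParser {} {
    package require xml
    package require xml::libxml2
    # -retainpathns takes prefix/URI pairs; the prefix only has to
    # match the one used in the -retainpath expression.
    return [::xml::parser \
        -retainpath /inv:inventory/inv:item \
        -retainpathns [list inv http://example.org/inventory]]
}
```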

   Limitations
       The libxml2 parser class has the following limitations:

              *  -reportempty has no effect.  libxml2 does not report empty element syntax.

              *  Incremental (push) parsing, i.e. -final 0, is not supported.

              *  TclXML/libxml2 does not provide DTD validation, WXS schema validation or RELAX NG
              validation, although the libxml2 library does provide those functions.  These functions are
              provided by the TclDOM/libxml2 package, but only in an "a posteriori" fashion (i.e. only after the
              document has been parsed).

              *  libxml2 supports XML Namespaces.  The use of XML Namespaces can be queried, but the declaration
              of an XML Namespace is not reported.
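
       Because incremental parsing is not supported, a document has to be handed to the parser in a single
       call.  The sketch below reads a whole file and then parses it once.  The helper name parseWholeFile
       is hypothetical, and reading the file as UTF-8 is an assumption about the input.

```tcl
# Hypothetical helper: read an entire file and pass it to the parser
# in one call, since push parsing (-final 0) is unavailable.
proc parseWholeFile {parser path} {
    set chan [open $path r]
    # Assumption: the document is stored as UTF-8.
    fconfigure $chan -encoding utf-8
    set xml [read $chan]
    close $chan
    return [$parser parse $xml]
}
```

       A call would look like parseWholeFile $parser document.xml, where $parser is a command returned by
       ::xml::parser.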

KEYWORDS