Provided by: tdom_0.9.2-1_amd64

NAME

       expat - Creates an instance of an expat parser object

SYNOPSIS

       package require tdom

       expat ?parsername? ?-namespace? ?arg arg ...?

       xml::parser ?parsername? ?-namespace? ?arg arg ...?
_________________________________________________________________

DESCRIPTION

       Parsers created with expat or xml::parser (which is just another name for the same command in its own
       namespace) are able to parse any kind of well-formed XML. They are stream-oriented XML parsers: you
       register handler scripts with the parser prior to starting the parse, and these handler scripts are
       called when the parser discovers the associated structures in the document being parsed. A start tag
       is an example of the kind of structure for which you may register a handler script.

       The parsers always check the input for XML well-formedness (and report an error if the input isn't
       well-formed). They parse the internal DTD and, on request, the external DTD and external entities,
       provided you resolve the identifiers of the external entities with the -externalentitycommand script
       (see there). If you use the -validateCmd option (see there), the input is additionally validated.

       Additionally, the Tcl extension code that implements this command provides an API for adding handlers
       coded at the C level. So far, one such parser extension command exists: "tdom". The handler set
       installed by this extension builds an in-memory "tDOM" DOM tree while the parser is parsing the input.

       It is possible to register an arbitrary number of different handler scripts and C level handlers for
       most of the events. When the event occurs, they are called in turn.
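
       As a minimal sketch of the register-then-parse flow described above, the following script records
       the callbacks in the order they arrive. The proc names onStart, onEnd and onText and the sample
       document are made up for illustration.

```tcl
package require tdom

# Record every event so the order of the callbacks can be inspected afterwards.
set events {}
proc onStart {name attlist} { lappend ::events [list start $name $attlist] }
proc onEnd   {name}         { lappend ::events [list end $name] }
proc onText  {data}         { lappend ::events [list text $data] }

# Register the handler scripts first, then start the parse.
set parser [expat -elementstartcommand  onStart \
                  -elementendcommand    onEnd \
                  -characterdatacommand onText]
$parser parse {<doc><item id="1">one</item></doc>}
$parser delete

foreach event $events { puts $event }
```

       The handlers only collect data here; a real application would typically act on each event as it
       arrives.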

COMMAND OPTIONS

       -namespace

              Enables  namespace  parsing.  You must use this option while creating the parser with the expat or
              xml::parser command. You can't enable (nor disable) namespace parsing with  <parserobj>  configure
              ....

       -namespaceseparator  char

              This option has an effect only if used together with the option -namespace. If given, this
              option determines the character inserted between the namespace URI and the local name when an
              XML element name is reported to a handler script. The default is the character ':'. The value
              must be a one-character string less than or equal to \u00FF, preferably a 7-bit ASCII character,
              or the empty string. If the value is the empty string (or \x00), the namespace URI and the local
              name are concatenated without any separator.
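
              The following sketch shows how element names might be reported with namespace parsing
              enabled and the default separator ':'. The proc name onStart and the namespace URIs are
              invented for the example.

```tcl
package require tdom

set names {}
proc onStart {name attlist} { lappend ::names $name }

# -namespace has to be given when the parser is created.
set parser [expat -namespace -elementstartcommand onStart]
$parser parse {<doc xmlns="http://example.com/ns"><p:item xmlns:p="urn:example:other"/></doc>}
$parser delete

# Each reported name is the namespace URI, the separator and the local name.
foreach name $names { puts $name }
```

              Because URIs usually contain ':' themselves, a handler that needs to split the reported
              name should split at the last separator.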

       -final  boolean

              This option indicates whether the document data next presented to the parse method is the final
              part of the document. A value of "0" indicates that more data is expected. A value of "1"
              indicates that no more data is expected. The default value is "1".

              If this option is set to "0" then the parser will not report certain errors if the XML data is not
              well-formed upon end of input, such as unclosed or unbalanced start or end tags. Instead some data
              may be saved by the parser until the next call to the parse method, thus delaying the reporting of
              some of the data.

              If  this  option  is  set  to  "1" then documents which are not well-formed upon end of input will
              generate an error.
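
              A sketch of feeding a document in pieces, assuming the chunk boundaries shown here (they
              are arbitrary) and a made-up handler name onStart:

```tcl
package require tdom

set names {}
proc onStart {name attlist} { lappend ::names $name }

# With -final 0 the parser expects more data after each call to parse.
set parser [expat -elementstartcommand onStart -final 0]
$parser parse {<doc><a/>}
$parser parse {<b/>}

# Announce that the next chunk is the last one, then finish the document.
$parser configure -final 1
$parser parse {</doc>}
$parser delete

puts $names
```

              Well-formedness problems such as a missing end tag would only be reported by the last call,
              once -final is "1".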

       -validateCmd  <tdom schema cmd>

              This option expects the name of a tDOM schema command. If this option is given, then the input  is
              also  validated. If the schema command hasn't set a reportcmd then the first validation error will
              stop further parsing (as a well-formedness error).

       -baseurl  url

              Reports the base url of the document to the parser.

       -elementstartcommand  script

              Specifies a Tcl command to associate with the start tag of an element. The actual command consists
              of this option followed by at least two arguments: the element type name and the attribute list.

              The attribute list is a Tcl list consisting of name/value pairs, suitable for passing to the array
              set Tcl command.

              Example:

                     proc HandleStart {name attlist} {
                         puts stderr "Element start ==> $name has attributes $attlist"
                     }

                     $parser configure -elementstartcommand HandleStart

                     $parser parse {<test id="123"></test>}

              This would result in the following command being invoked:

                     HandleStart test {id 123}

       -elementendcommand  script

              Specifies a Tcl command to associate with the end tag of an element. The actual  command  consists
              of  this  option  followed  by  at  least one argument: the element type name. In addition, if the
              -reportempty option is set then the command may be invoked with the -empty configuration option to
              indicate  whether  it  is  an empty element. See the description of the -reportempty option for an
              example.

              Example:

                     proc HandleEnd {name} {
                         puts stderr "Element end ==> $name"
                     }

                     $parser configure -elementendcommand HandleEnd

                     $parser parse {<test id="123"></test>}

              This would result in the following command being invoked:

                     HandleEnd test

       -characterdatacommand  script

              Specifies a Tcl command to associate with character data in the document, i.e. text. The actual
              command consists of this option followed by one argument: the text.

              It  is  not  guaranteed  that character data will be passed to the application in a single call to
              this command. That is, the application should be prepared to receive multiple invocations of  this
              callback with no intervening callbacks from other features.

              Example:

                     proc HandleText {data} {
                         puts stderr "Character data ==> $data"
                     }

                     $parser configure -characterdatacommand HandleText

                     $parser parse {<test>this is a test document</test>}

              This would result in the following command being invoked:

                     HandleText {this is a test document}

       -processinginstructioncommand  script

              Specifies  a  Tcl  command  to  associate with processing instructions in the document. The actual
              command consists of this option followed by two arguments: the PI target and the PI data.

              Example:

                     proc HandlePI {target data} {
                         puts stderr "Processing instruction ==> $target $data"
                     }

                     $parser configure -processinginstructioncommand HandlePI

                     $parser parse {<test><?special this is a processing instruction?></test>}

              This would result in the following command being invoked:

                     HandlePI special {this is a processing instruction}

        -notationdeclcommand  script

              Specifies a Tcl command to associate with notation declarations in the document. The actual
              command consists of this option followed by four arguments: the notation name, the base URI of
              the document (that is, whatever was set by the -baseurl option), the system identifier and the
              public identifier. The notation name is never empty; the other arguments may be.
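
              A small sketch of a notation handler; the proc name onNotation, the notation and the base
              URL are chosen for illustration only.

```tcl
package require tdom

set notations {}
proc onNotation {name base systemId publicId} {
    lappend ::notations [list $name $base $systemId $publicId]
}

set parser [expat -notationdeclcommand onNotation \
                  -baseurl "file:///local/doc/doc.xml"]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE doc [
<!NOTATION gif SYSTEM "image/gif">
]>
<doc/>}
$parser delete

# One list per declaration: name, base URI, system identifier, public identifier.
foreach notation $notations { puts $notation }
```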

        -externalentitycommand  script

              Specifies  a  Tcl  command  to associate with references to external entities in the document. The
              actual command consists of this option followed by three  arguments:  the  base  uri,  the  system
              identifier  of  the  entity  and  the public identifier of the entity. The base uri and the public
              identifier may be the empty list.

              This handler script has to return a Tcl list consisting of three elements. The first element of
              this list signals how the external entity is returned to the processor. At the moment, the three
              allowed types are "string", "channel" and "filename". The second element of the list has to be
              the (absolute) base URI of the external entity to be parsed. The third element of the list is
              the data: the already read content of the external entity as a string in the case of type
              "string", the name of a Tcl channel in the case of type "channel", or the path to the external
              entity to be read in the case of type "filename".

              Behind the scenes, the external entity referenced by the returned Tcl channel, string or file
              name is parsed with an expat external entity parser that uses the same handler sets as the main
              parser. If parsing of the external entity fails, the whole parse is stopped with an error
              message. If a Tcl command registered as -externalentitycommand isn't able to resolve an external
              entity, it is allowed to return TCL_CONTINUE (return -code continue). In this case, the wrapper
              gives the next registered -externalentitycommand a try. If no -externalentitycommand is able to
              handle the external entity, parsing stops with an error.

              Example:

                     proc externalEntityRefHandler {base systemId publicId} {
                         if {![regexp {^[a-zA-Z]+:/} $systemId]}  {
                             regsub {^[a-zA-Z]+:} $base {} base
                             set basedir [file dirname $base]
                             set systemId "[set basedir]/[set systemId]"
                         } else {
                             regsub {^[a-zA-Z]+:} $systemId {} systemId
                         }
                         if {[catch {set fd [open $systemId]}]} {
                             return -code error \
                                     "Failed to open external entity $systemId"
                         }
                         return [list channel $systemId $fd]
                     }

                     set parser [expat -externalentitycommand externalEntityRefHandler \
                                       -baseurl "file:///local/doc/doc.xml" \
                                       -paramentityparsing notstandalone]
                     $parser parse {<?xml version='1.0'?>
                     <!DOCTYPE test SYSTEM "test.dtd">
                     <test/>}

              This would result in the following command being invoked:

                     externalEntityRefHandler file:///local/doc/doc.xml test.dtd {}

              External entities are resolved via this handler script only if necessary. This means that
              external parameter entities trigger this handler only if -paramentityparsing is used with the
              argument "always", or if -paramentityparsing is used with the argument "notstandalone" and the
              document isn't marked as standalone.
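
              The "string" return type can be sketched as follows. The resolver resolveEntity is
              hypothetical: instead of reading anything from disk it hands back a fixed DTD fragment, and
              the base URI it returns is an assumption made for the example.

```tcl
package require tdom

set declared {}
proc onElementDecl {name content} { lappend ::declared $name }

# Hypothetical resolver: returns the entity content directly as a string.
# The list elements are the type, the base URI of the entity and the data.
proc resolveEntity {base systemId publicId} {
    return [list string "file:///local/doc/$systemId" {<!ELEMENT test EMPTY>}]
}

set parser [expat -externalentitycommand resolveEntity \
                  -elementdeclcommand    onElementDecl \
                  -baseurl "file:///local/doc/doc.xml" \
                  -paramentityparsing always]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE test SYSTEM "test.dtd">
<test/>}
$parser delete

# The declaration found in the external subset reaches the same handler set.
puts $declared
```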

        -unknownencodingcommand  script

              Not implemented at Tcl level.

       -startnamespacedeclcommand  script

              Specifies a Tcl command to associate with the start of the scope of namespace declarations in
              the document. The actual command consists of this option followed by two arguments: the
              namespace prefix and the namespace URI. For an xmlns attribute, prefix will be the empty list.
              For an xmlns="" attribute, uri will be the empty list. The calls to the start and end element
              handlers occur between the calls to the start and end namespace declaration handlers.

        -endnamespacedeclcommand  script

              Specifies a Tcl command to associate with the end of the scope of namespace declarations in the
              document. The actual command consists of this option followed by one argument: the namespace
              prefix. For an xmlns attribute, prefix will be the empty list. The calls to the start and end
              element handlers occur between the calls to the start and end namespace declaration handlers.
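
              A sketch of how the namespace declaration callbacks bracket the element callbacks. All proc
              names and the namespace URI are invented for the example.

```tcl
package require tdom

set events {}
proc onNsStart {prefix uri}   { lappend ::events [list ns-start $prefix $uri] }
proc onNsEnd   {prefix}       { lappend ::events [list ns-end $prefix] }
proc onStart   {name attlist} { lappend ::events [list start $name] }
proc onEnd     {name}         { lappend ::events [list end $name] }

set parser [expat -namespace \
                  -startnamespacedeclcommand onNsStart \
                  -endnamespacedeclcommand   onNsEnd \
                  -elementstartcommand       onStart \
                  -elementendcommand         onEnd]
$parser parse {<doc xmlns:p="urn:example"><p:a/></doc>}
$parser delete

foreach event $events { puts $event }
```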

        -commentcommand  script

              Specifies a Tcl command to associate with comments in the document. The actual command consists of
              this option followed by one argument: the comment data.

              Example:

                     proc HandleComment {data} {
                         puts stderr "Comment ==> $data"
                     }

                     $parser configure -commentcommand HandleComment

                     $parser parse {<test><!-- this is <obviously> a comment --></test>}

              This would result in the following command being invoked:

                     HandleComment { this is <obviously> a comment }

        -notstandalonecommand  script

              This Tcl command is called if the document is not standalone (it has an external subset or a
              reference to a parameter entity, but does not have standalone="yes"). It is called with no
              additional arguments.

        -startcdatasectioncommand  script

              Specifies  a  Tcl  command  to  associate with the start of a CDATA section.  It is called with no
              additional arguments.

        -endcdatasectioncommand  script

              Specifies a Tcl command to associate with the end of a  CDATA  section.   It  is  called  with  no
              additional arguments.
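
              The two CDATA section callbacks can be combined with -characterdatacommand to find out which
              text came from a CDATA section. A sketch, with invented proc names:

```tcl
package require tdom

set events {}
proc onCdataStart {}     { lappend ::events cdata-start }
proc onCdataEnd   {}     { lappend ::events cdata-end }
proc onText       {data} { lappend ::events [list text $data] }

set parser [expat -startcdatasectioncommand onCdataStart \
                  -endcdatasectioncommand   onCdataEnd \
                  -characterdatacommand     onText]
$parser parse {<doc><![CDATA[a < b]]></doc>}
$parser delete

# The text events between cdata-start and cdata-end belong to the CDATA section.
foreach event $events { puts $event }
```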

        -elementdeclcommand  script

              Specifies a Tcl command to associate with element declarations. The actual command consists of
              this option followed by two arguments: the name of the element and the content model. The
              content model argument is a Tcl list of four elements. The first list element specifies the type
              of the XML element; the six possible types are reported as "MIXED", "NAME", "EMPTY", "CHOICE",
              "SEQ" or "ANY". The second list element reports the quantifier of the content model in XML
              syntax ("?", "*" or "+") or is the empty list. If the type is "MIXED", then the quantifier is
              either "{}", indicating a PCDATA-only element, or "*", with the elements allowed to intermix
              with PCDATA given as a Tcl list in the fourth list element. If the type is "NAME", the name is
              the third list element; otherwise the third list element is the empty list. If the type is
              "CHOICE" or "SEQ", the fourth list element contains a list of content models built like this
              one. The "EMPTY", "ANY" and "MIXED" types only occur at top level.

              Examples:

                     proc elDeclHandler {name content} {
                          puts "$name $content"
                     }

                     set parser [expat -elementdeclcommand elDeclHandler]
                     $parser parse {<?xml version='1.0'?>
                     <!DOCTYPE test [
                     <!ELEMENT test (#PCDATA)>
                     ]>
                     <test>foo</test>}

              This would result in the following command being invoked:

                     elDeclHandler test {MIXED {} {} {}}

                     $parser reset
                     $parser parse {<?xml version='1.0'?>
                     <!DOCTYPE test [
                     <!ELEMENT test (a|b)>
                     ]>
                     <test><a/></test>}

              This would result in the following command being invoked:

                     elDeclHandler test {CHOICE {} {} {{NAME {} a {}} {NAME {} b {}}}}

        -attlistdeclcommand  script

              Specifies  a  Tcl  command  to associate with attlist declarations. The actual command consists of
              this option followed by five arguments.  The Attlist declaration  handler  is  called  for  *each*
              attribute.  So  a  single  Attlist  declaration  with  multiple  attributes declared will generate
              multiple calls to this handler. The arguments are the element name this attribute belongs to,  the
              name  of the attribute, the type of the attribute, the default value (may be the empty list) and a
              required flag. If this flag is true and the default value is not the empty list, then  this  is  a
              "#FIXED" default.

              Example:

                     proc attlistHandler {elname name type default isRequired} {
                         puts "$elname $name $type $default $isRequired"
                     }

                     set parser [expat -attlistdeclcommand attlistHandler]
                     $parser parse {<?xml version='1.0'?>
                     <!DOCTYPE test [
                     <!ELEMENT test EMPTY>
                     <!ATTLIST test
                               id      ID      #REQUIRED
                               name    CDATA   #IMPLIED>
                     ]>
                     <test/>}

              This would result in the following commands being invoked:

                     attlistHandler test id ID {} 1
                     attlistHandler test name CDATA {} 0

        -startdoctypedeclcommand  script

              Specifies a Tcl command to associate with the start of the DOCTYPE declaration. This command is
              called before any DTD or internal subset is parsed. The actual command consists of this option
              followed by four arguments: the doctype name, the system identifier, the public identifier and a
              boolean that indicates whether the DOCTYPE has an internal subset.

        -enddoctypedeclcommand  script

              Specifies a Tcl command to associate with the end of the  DOCTYPE  declaration.  This  command  is
              called after processing any external subset.  It is called with no additional arguments.
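
              A sketch of both DOCTYPE callbacks for a document with an internal subset only; the proc
              names are illustrative.

```tcl
package require tdom

set events {}
proc onDoctypeStart {name systemId publicId hasInternalSubset} {
    lappend ::events [list doctype-start $name $systemId $publicId $hasInternalSubset]
}
proc onDoctypeEnd {} { lappend ::events doctype-end }

set parser [expat -startdoctypedeclcommand onDoctypeStart \
                  -enddoctypedeclcommand   onDoctypeEnd]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE doc [
<!ELEMENT doc EMPTY>
]>
<doc/>}
$parser delete

foreach event $events { puts $event }
```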

        -paramentityparsing  never|notstandalone|always

              "never" disables expansion of parameter entities, "always" always expands them, and
              "notstandalone" expands them only if the document isn't marked "standalone='yes'". The default
              is "never".

        -entitydeclcommand  script

              Specifies a Tcl command to associate with any entity declaration. The actual command consists of
              this option followed by seven arguments: the entity name, a boolean identifying parameter
              entities, the value of the entity, the base URI, the system identifier, the public identifier
              and the notation name. Depending on the type of entity declaration, some of these arguments may
              be the empty list.
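
              A sketch of an entity declaration handler for an internal general entity. The proc name
              onEntityDecl and the entity are made up; only the first three arguments are recorded.

```tcl
package require tdom

set entities {}
proc onEntityDecl {name isParameterEntity value base systemId publicId notationName} {
    lappend ::entities [list $name $isParameterEntity $value]
}

set parser [expat -entitydeclcommand onEntityDecl]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE doc [
<!ENTITY greeting "hello">
]>
<doc>&greeting;</doc>}
$parser delete

foreach entity $entities { puts $entity }
```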

        -ignorewhitecdata  boolean

              If this flag is set, element content that contains only whitespace isn't reported with the
              -characterdatacommand.

        -ignorewhitespace  boolean

              Another name for -ignorewhitecdata; see there.
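
              A sketch of the effect on an indented document; the proc name onText is illustrative.

```tcl
package require tdom

set texts {}
proc onText {data} { lappend ::texts $data }

# The line breaks and indentation between the tags are whitespace-only text.
set parser [expat -characterdatacommand onText -ignorewhitecdata 1]
$parser parse "<doc>\n  <a>x</a>\n</doc>"
$parser delete

puts $texts
```

              Without -ignorewhitecdata the handler would also be called for the whitespace before and
              after the a element.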

        -handlerset  name

              This option sets the Tcl handler set scope for the configure options. Every option/value pair
              following this option in the same call to the parser modifies the named Tcl handler set. If you
              don't use this option, you are modifying the default Tcl handler set, named "default".
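
              A sketch with two independent handler sets that both react to start tags. The handler set
              names "counter" and "logger" and the proc names are invented for the example.

```tcl
package require tdom

set count 0
set log {}
proc countStart {name attlist} { incr ::count }
proc logStart   {name attlist} { lappend ::log $name }

set parser [expat]
# Each configure call below modifies only the handler set it names.
$parser configure -handlerset counter -elementstartcommand countStart
$parser configure -handlerset logger  -elementstartcommand logStart
$parser parse {<doc><a/><b/></doc>}
$parser delete

puts "$count start tags: $log"
```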

        -noexpand  boolean

              Normally, the parser will try to expand references to entities defined in the internal subset.
              If this option is set to a true value, these entities are not expanded but reported literally
              via the default handler. Warning: if you set this option to true and don't install a default
              handler (with the -defaultcommand option) for every handler set of the parser, all internal
              entities are silently lost for the handler sets without a default handler.
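
              A sketch of catching unexpanded entity references. It assumes that the -defaultcommand
              script is called with a single argument holding the literal data, which is not spelled out
              in this manual; the proc name onDefault is made up.

```tcl
package require tdom

# Assumption: the default handler receives one argument, the literal data.
set literal ""
proc onDefault {data} { append ::literal $data }

set parser [expat -defaultcommand onDefault -noexpand 1]
$parser parse {<!DOCTYPE doc [<!ENTITY e "text">]><doc>a&e;b</doc>}
$parser delete

# The reference shows up unexpanded in what the default handler collected.
puts [expr {[string first "&e;" $literal] >= 0}]
```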

       -useForeignDTD  boolean

              If boolean is true and the document does not have an external subset, the parser will call the
              -externalentitycommand script with empty values for the systemId and publicId arguments. This
              option must be set before the first piece of data is parsed; setting it after parsing has
              started has no effect. The default is not to use a foreign DTD. The default is restored after
              resetting the parser. Please note that a -paramentityparsing value of "never" (which is the
              default) suppresses any call to the -externalentitycommand script. Please also note that, if
              the document doesn't have an internal subset either, the -startdoctypedeclcommand and
              -enddoctypedeclcommand scripts, if set, are not called.

COMMAND METHODS

       parser configure option value ?option value?

              Sets configuration options for the parser. Every command option except -namespace can be set or
              modified with this method.

       parser cget ?-handlerset name? option

              Returns the current value of the given configuration option for the parser.

              If the -handlerset option is used, the configuration for the named handler set is returned.

       parser currentmarkup

              Returns the current markup as found in the XML, if called from within one of the parser's
              markup event handler scripts (-elementstartcommand, -elementendcommand, -commentcommand and
              -processinginstructioncommand). Otherwise it returns the empty string.
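
              A sketch of reading the literal start tags from within a start handler. The proc name
              onStart is illustrative, and the handler reaches the parser through the global variable
              parser, which is a choice made for this example.

```tcl
package require tdom

set markup {}
proc onStart {name attlist} {
    # Ask the parser for the markup that triggered this callback.
    lappend ::markup [$::parser currentmarkup]
}

set parser [expat -elementstartcommand onStart]
$parser parse {<doc><item id="1" >text</item></doc>}
$parser delete

foreach tag $markup { puts $tag }
```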

       parser delete

              Deletes  the  parser  and  the  parser  command. A parser cannot be deleted from within one of its
              handler callbacks (neither directly nor indirectly) and will raise a tcl error in this case.

       parser free

              Another name for the method delete; see there.

       parser get -specifiedattributecount|-idattributeindex|-currentbytecount|-currentlinenumber|-currentcolumnnumber|-currentbyteindex

              -specifiedattributecount

                     Returns the number of attribute/value pairs passed in the last call to the
                     -elementstartcommand script that were specified in the start tag rather than defaulted.
                     Each attribute/value pair counts as 2; thus this corresponds to an index into the
                     attribute list passed to the -elementstartcommand script.

              -idattributeindex

                     Returns the index of the ID attribute passed in the last call to the
                     -elementstartcommand script, or -1 if there is no ID attribute. Each attribute/value
                     pair counts as 2; thus this corresponds to an index into the attribute list passed to
                     the -elementstartcommand script.

              -currentbytecount

                     Returns the number of bytes in the current event. Returns 0 if the event is in an
                     internal entity.

              -currentlinenumber

                     Returns the line number of the current parse location.

              -currentcolumnnumber

                     Returns the column number of the current parse location.

              -currentbyteindex

                     Returns the byte index of the current parse location.

              Only one value may be requested at a time.
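
              A sketch of using the get method inside a handler to attach line numbers to elements. The
              proc name onStart is illustrative, and the handler reaches the parser through the global
              variable parser.

```tcl
package require tdom

set positions {}
proc onStart {name attlist} {
    # Record where in the input the current start tag was found.
    lappend ::positions [list $name [$::parser get -currentlinenumber]]
}

set parser [expat -elementstartcommand onStart]
$parser parse "<doc>\n<a/>\n<b/>\n</doc>"
$parser delete

foreach position $positions { puts $position }
```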

       parser parse data

              Parses the XML string data. The event callback scripts are called as their triggering events
              occur. This method cannot be used from within a callback (neither directly nor indirectly) of
              the same parser; doing so raises an error.

       parser parsechannel channelID

              Reads the XML data out of the tcl channel channelID (starting  at  the  current  access  position,
              without  any  seek)  up to the end of file condition and parses that data. The channel encoding is
              respected. Use the helper proc tDOM::xmlOpenFile out of the tDOM script library to open a file, if
              you  want  to use this method. This method cannot be used from within a callback (neither directly
              nor indirectly) of the parser to be used and will raise an error in this case.
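
              A sketch of parsing from a channel. For brevity it opens the file with the plain Tcl open
              command, which is only adequate here because the sample document is pure ASCII; for real
              documents the tDOM::xmlOpenFile helper mentioned above is the safer choice. The temporary
              file and the proc name onStart are part of the example setup.

```tcl
package require tdom

set names {}
proc onStart {name attlist} { lappend ::names $name }

# Write a small ASCII document to a temporary file (file tempfile needs Tcl 8.6).
set out [file tempfile path]
puts -nonewline $out {<doc><a/></doc>}
close $out

set parser [expat -elementstartcommand onStart]
set in [open $path r]
# Reads from the current position of the channel up to end of file.
$parser parsechannel $in
close $in
file delete $path
$parser delete

puts $names
```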

       parser parsefile filename

              Reads the XML data directly out of the file with the filename filename and parses that data.
              This is done with low level file operations. The XML data must be in US-ASCII, ISO-8859-1,
              UTF-8 or UTF-16 encoding. Where applicable, this is the fastest way to parse XML data. This
              method cannot be used from within a callback (neither directly nor indirectly) of the parser to
              be used and will raise an error in this case.

       parser reset

              Resets the parser in preparation for parsing another document. A parser cannot be reset from
              within one of its handler callbacks (neither directly nor indirectly) and will raise a Tcl
              error in this case.

CALLBACK COMMAND RETURN CODES

       A script invoked for any of the parser callback commands, such as -elementstartcommand,
       -elementendcommand, etc., may return a code other than "ok" or "error". All callbacks may in
       addition return "break", "continue" or "return".

       If a callback script returns an "error" error code then processing of the document is terminated and  the
       error is propagated in the usual fashion.

       If a callback script returns a "break" code, then all handler scripts of this Tcl handler set are
       suppressed for the rest of the parse. This does not influence any other handler set.

       If  a  callback  script  returns  a "continue" error code then processing of the current element, and its
       children, ceases for every handler script out of this Tcl handler set and processing continues  with  the
       next (sibling) element. This does not influence any other handler set.

       If  a  callback script returns a "return" error code then parsing is canceled altogether, but no error is
       raised.
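
       A sketch of the "continue" code: the handler skips the content of one element and picks up again
       at its next sibling. The proc name onStart and the element names are invented for the example.

```tcl
package require tdom

set seen {}
proc onStart {name attlist} {
    lappend ::seen $name
    if {$name eq "skip"} {
        # Suppress the callbacks for the children of this element.
        return -code continue
    }
}

set parser [expat -elementstartcommand onStart]
$parser parse {<doc><skip><inner/></skip><after/></doc>}
$parser delete

puts $seen
```

       The inner element never reaches the handler, while the sibling element after is reported as usual.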

SEE ALSO

       expatapi, tdom

KEYWORDS

       SAX, push, pushparser