Provided by: tdom_0.9.3-1build1_amd64

NAME

       expat - Creates an instance of an expat parser object

SYNOPSIS

       package require tdom

       expat ?parsername? ?-namespace? ?arg arg ...?

       xml::parser ?parsername? ?-namespace? ?arg arg ...?
_________________________________________________________________

DESCRIPTION

       The parsers created with expat or xml::parser (which is just another name for the same
       command in its own namespace) are able to parse any kind of well-formed XML. The parsers
       are stream-oriented XML parsers. This means that you register handler scripts with the
       parser prior to starting the parse. These handler scripts are called when the parser
       discovers the associated structures in the document being parsed. A start tag is an
       example of the kind of structure for which you may register a handler script.

       The parsers always check the input for XML well-formedness (and report an error if the
       input isn't well-formed). They parse the internal DTD and, on request, the external DTD
       and external entities, if you resolve the identifiers of the external entities with the
       -externalentitycommand script (see there). If you use the -validateCmd option (see there),
       the input is additionally validated.

       Additionally, the Tcl extension code that implements this command provides an API for
       adding handlers coded at C level. Up to now, there exists the parser extension command
       "tdom". The handler set installed by this extension builds an in-memory "tDOM" DOM tree
       while the parser is parsing the input.

       It is possible to register an arbitrary number of different handler scripts and C level
       handlers for most of the events. If the event occurs, they are called in turn.
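
       As a minimal sketch of the stream-oriented model described above, the following script
       registers three handler scripts and records the events in the order the parser reports
       them. The proc names onStart, onEnd and onText and the sample document are made up for
       this illustration.

```tcl
package require tdom

# Collect events in a list so the order of the callbacks is visible.
set events {}

proc onStart {name attlist} { lappend ::events [list start $name $attlist] }
proc onEnd   {name}         { lappend ::events [list end $name] }
proc onText  {data}         { lappend ::events [list text $data] }

# Handlers are registered before the parse starts.
set parser [expat -elementstartcommand onStart \
                  -elementendcommand onEnd \
                  -characterdatacommand onText]

$parser parse {<doc id="1"><item>hello</item></doc>}
$parser delete

foreach event $events { puts $event }
```

       The first recorded event is the start tag of doc together with its attribute list, the
       last one is the end tag of doc.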

COMMAND OPTIONS

       -namespace

              Enables namespace parsing. You must use this option while creating the parser  with
              the  expat or xml::parser command. You can't enable (nor disable) namespace parsing
              with <parserobj> configure ....

       -namespaceseparator  char

              This option has an effect only if used together with the option -namespace. If
              given, this option determines the character inserted between the namespace URI and
              the local name when an XML element name is reported to a handler script. The
              default is the character ':'. The value must be a one-character string less than or
              equal to \u00FF, preferably a 7-bit ASCII character, or the empty string. If the
              value is the empty string (or \x00), the namespace URI and the local name are
              concatenated without any separator.
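
              The following sketch shows how element names might be reported when namespace
              parsing is enabled with a custom separator. The separator "|", the proc name
              onStart and the namespace URIs are arbitrary choices for this illustration.

```tcl
package require tdom

set names {}
proc onStart {name attlist} { lappend ::names $name }

# -namespace has to be given when the parser is created.
set parser [expat -namespace -namespaceseparator "|" \
                  -elementstartcommand onStart]

$parser parse {<a xmlns="urn:example:one"><p:b xmlns:p="urn:example:two"/></a>}
$parser delete

# Each reported name is the namespace URI, the separator and the local name.
foreach name $names { puts $name }
```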

       -final  boolean

              This option indicates whether the document data next presented to the parse  method
              is  the  final  part  of  the  document. A value of "0" indicates that more data is
              expected. A value of "1" indicates that no more is expected.  The default value  is
              "1".

              If  this option is set to "0" then the parser will not report certain errors if the
              XML data is not well-formed upon end of input, such as unclosed or unbalanced start
              or  end  tags.  Instead some data may be saved by the parser until the next call to
              the parse method, thus delaying the reporting of some of the data.

              If this option is set to "1" then documents which are not well-formed upon  end  of
              input will generate an error.
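
              A sketch of feeding a document to the parser in several pieces, assuming the data
              arrives in chunks (for example from a network connection). The chunk boundaries
              and the proc name onStart are invented for this illustration.

```tcl
package require tdom

set seen {}
proc onStart {name attlist} { lappend ::seen $name }

# -final 0: more data will follow, so open elements at the end of a
# chunk are not reported as an error.
set parser [expat -final 0 -elementstartcommand onStart]

$parser parse {<doc><item>}
$parser parse {</item><last/>}

# Announce that the next chunk is the last one.
$parser configure -final 1
$parser parse {</doc>}
$parser delete

puts $seen
```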

       -validateCmd  <tdom schema cmd>

              This  option  expects  the  name of a tDOM schema command. If this option is given,
              then the input is also validated. If the schema command hasn't set a reportcmd then
              the first validation error will stop further parsing (as a well-formedness error).

       -baseurl  url

              Reports the base url of the document to the parser.

       -elementstartcommand  script

              Specifies  a  Tcl command to associate with the start tag of an element. The actual
              command consists of this option followed by at least  two  arguments:  the  element
              type name and the attribute list.

              The  attribute  list  is  a  Tcl  list consisting of name/value pairs, suitable for
              passing to the array set Tcl command.

              Example:

                     proc HandleStart {name attlist} {
                         puts stderr "Element start ==> $name has attributes $attlist"
                     }

                     $parser configure -elementstartcommand HandleStart

                     $parser parse {<test id="123"></test>}

              This would result in the following command being invoked:

                     HandleStart test {id 123}

       -elementendcommand  script

              Specifies a Tcl command to associate with the end tag of  an  element.  The  actual
              command consists of this option followed by at least one argument: the element type
              name. In addition, if the -reportempty option  is  set  then  the  command  may  be
              invoked  with  the  -empty  configuration option to indicate whether it is an empty
              element. See the description of the -reportempty option for an example.

              Example:

                     proc HandleEnd {name} {
                         puts stderr "Element end ==> $name"
                     }

                     $parser configure -elementendcommand HandleEnd

                     $parser parse {<test id="123"></test>}

              This would result in the following command being invoked:

                     HandleEnd test

       -characterdatacommand  script

              Specifies a Tcl command to associate with character data in the document, i.e. text.
              The actual command consists of this option followed by one argument: the text.

              It  is  not  guaranteed  that character data will be passed to the application in a
              single call to this command. That is, the application should be prepared to receive
              multiple  invocations  of  this  callback  with no intervening callbacks from other
              features.

              Example:

                     proc HandleText {data} {
                         puts stderr "Character data ==> $data"
                     }

                     $parser configure -characterdatacommand HandleText

                     $parser parse {<test>this is a test document</test>}

              This would result in the following command being invoked:

                     HandleText {this is a test document}
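
              Because the text of one element may arrive in several calls, a handler that needs
              the complete text can collect the pieces and use them in the end tag handler. A
              minimal sketch; the proc names, the global variables buffer and titles and the
              sample document are assumptions of this example.

```tcl
package require tdom

set buffer ""
set titles {}

# Start collecting afresh whenever a title element opens.
proc onStart {name attlist} {
    if {$name eq "title"} { set ::buffer "" }
}

# Append every piece of character data; there may be more than one.
proc onText {data} { append ::buffer $data }

# The text is known to be complete only when the end tag is seen.
proc onEnd {name} {
    if {$name eq "title"} { lappend ::titles $::buffer }
}

set parser [expat -elementstartcommand onStart \
                  -characterdatacommand onText \
                  -elementendcommand onEnd]

$parser parse {<books><title>Tcl &amp; Tk</title><title>XML</title></books>}
$parser delete

foreach title $titles { puts $title }
```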

       -processinginstructioncommand  script

              Specifies a Tcl command to associate with processing instructions in the  document.
              The actual command consists of this option followed by two arguments: the PI target
              and the PI data.

              Example:

                     proc HandlePI {target data} {
                         puts stderr "Processing instruction ==> $target $data"
                     }

                     $parser configure -processinginstructioncommand HandlePI

                     $parser parse {<test><?special this is a processing instruction?></test>}

              This would result in the following command being invoked:

                     HandlePI special {this is a processing instruction}

       -notationdeclcommand  script

              Specifies a Tcl command to associate with notation declarations in the document.
              The actual command consists of this option followed by four arguments: the notation
              name, the base URI of the document (that is, whatever was set by the -baseurl
              option), the system identifier and the public identifier. The notation name is
              never empty; the other arguments may be.

       -externalentitycommand  script

              Specifies a Tcl command to associate with references to external entities in the
              document. The actual command consists of this option followed by three arguments:
              the base URI, the system identifier of the entity and the public identifier of the
              entity. The base URI and the public identifier may be the empty list.

              This handler script has to return a Tcl list consisting of three elements. The
              first element of this list signals how the external entity is returned to the
              processor. At the moment, the three allowed types are "string", "channel" and
              "filename". The second element of the list has to be the (absolute) base URI of the
              external entity to be parsed. The third element of the list is the data: the
              already read content of the external entity as a string in the case of type
              "string", the name of a Tcl channel in the case of type "channel", or the path to
              the external entity to be read in the case of type "filename".

              Behind the scenes, the external entity referenced by the returned Tcl channel,
              string or file name is parsed with an expat external entity parser with the same
              handler sets as the main parser. If parsing of the external entity fails, the whole
              parse is stopped with an error message. If a Tcl command registered as
              -externalentitycommand isn't able to resolve an external entity, it is allowed to
              return TCL_CONTINUE (for example with return -code continue). In this case, the
              wrapper gives the next registered -externalentitycommand a try. If no
              -externalentitycommand is able to handle the external entity, parsing stops with an
              error.

              Example:

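                     proc externalEntityRefHandler {base systemId publicId} {
                         if {![regexp {^[a-zA-Z]+:/} $systemId]}  {
                             # Relative system identifier: resolve it against the
                             # directory of the base URI (with the scheme stripped).
                             regsub {^[a-zA-Z]+:} $base {} base
                             set basedir [file dirname $base]
                             set systemId "[set basedir]/[set systemId]"
                         } else {
                             # Absolute URI: strip the scheme to get a file path.
                             regsub {^[a-zA-Z]+:} $systemId {} systemId
                         }
                         if {[catch {set fd [open $systemId]}]} {
                             return -code error \
                                     "Failed to open external entity $systemId"
                         }
                         # Return the type, the base URI of the entity and the channel.
                         return [list channel $systemId $fd]
                     }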

                     set parser [expat -externalentitycommand externalEntityRefHandler \
                                       -baseurl "file:///local/doc/doc.xml" \
                                       -paramentityparsing notstandalone]
                     $parser parse {<?xml version='1.0'?>
                     <!DOCTYPE test SYSTEM "test.dtd">
                     <test/>}

              This would result in the following command being invoked:

                     externalEntityRefHandler file:///local/doc/doc.xml test.dtd {}

              The parser tries to resolve external entities via this handler script only if
              necessary. This means that external parameter entities trigger this handler only if
              -paramentityparsing is used with the argument "always", or if -paramentityparsing is
              used with the argument "notstandalone" and the document isn't marked as standalone.

        -unknownencodingcommand  script

              Not implemented at Tcl level.

       -startnamespacedeclcommand  script

              Specifies a Tcl command to associate with the start of the scope of namespace
              declarations in the document. The actual command consists of this option followed
              by two arguments: the namespace prefix and the namespace URI. For an xmlns
              attribute, prefix will be the empty list. For an xmlns="" attribute, uri will be
              the empty list. The calls to the start and end element handlers occur between the
              calls to the start and end namespace declaration handlers.

       -endnamespacedeclcommand  script

              Specifies a Tcl command to associate with the end of the scope of namespace
              declarations in the document. The actual command consists of this option followed
              by the namespace prefix as argument. In case of an xmlns attribute, prefix will be
              the empty list. The calls to the start and end element handlers occur between the
              calls to the start and end namespace declaration handlers.

        -commentcommand  script

              Specifies a Tcl command to associate with comments  in  the  document.  The  actual
              command consists of this option followed by one argument: the comment data.

              Example:

                     proc HandleComment {data} {
                         puts stderr "Comment ==> $data"
                     }

                     $parser configure -commentcommand HandleComment

                     $parser parse {<test><!-- this is <obviously> a comment --></test>}

              This would result in the following command being invoked:

                     HandleComment { this is <obviously> a comment }

        -notstandalonecommand  script

              This  Tcl  command is called, if the document is not standalone (it has an external
              subset or a reference to a parameter entity, but does not  have  standalone="yes").
              It is called with no additional arguments.

        -startcdatasectioncommand  script

              Specifies  a  Tcl  command  to  associate with the start of a CDATA section.  It is
              called with no additional arguments.

        -endcdatasectioncommand  script

              Specifies a Tcl command to associate with the end of a CDATA section.  It is called
              with no additional arguments.

        -elementdeclcommand  script

              Specifies  a Tcl command to associate with element declarations. The actual command
              consists of this option followed by two arguments: the name of the element and  the
              content model. The content model argument is a Tcl list of four elements. The first
              list element specifies the type of the XML element; the six different possible
              types are reported as "MIXED", "NAME", "EMPTY", "CHOICE", "SEQ" or "ANY". The
              second list element reports the quantifier of the content model in XML syntax ("?",
              "*" or "+") or is the empty list. If the type is "MIXED", then the quantifier will
              be "{}", indicating a PCDATA-only element, or "*", with the elements allowed to
              intermix with PCDATA given as a Tcl list in the fourth element. If the type is
              "NAME", the name is the third element; otherwise the third element is the empty
              list. If the type is "CHOICE" or "SEQ", the fourth element contains a list of
              content models built like this one. The "EMPTY", "ANY", and "MIXED" types will only
              occur at top level.

              Examples:

                     proc elDeclHandler {name content} {
                          puts "$name $content"
                     }

                     set parser [expat -elementdeclcommand elDeclHandler]
                     $parser parse {<?xml version='1.0'?>
                     <!DOCTYPE test [
                     <!ELEMENT test (#PCDATA)>
                     ]>
                     <test>foo</test>}

              This would result in the following command being invoked:

                     elDeclHandler test {MIXED {} {} {}}

                     $parser reset
                     $parser parse {<?xml version='1.0'?>
                     <!DOCTYPE test [
                     <!ELEMENT test (a|b)>
                     ]>
                     <test><a/></test>}

              This would result in the following command being invoked:

                     elDeclHandler test {CHOICE {} {} {{NAME {} a {}} {NAME {} b {}}}}

        -attlistdeclcommand  script

              Specifies a Tcl command to associate with attlist declarations. The actual  command
              consists  of  this  option  followed  by  five  arguments.  The Attlist declaration
              handler is called for *each*  attribute.  So  a  single  Attlist  declaration  with
              multiple  attributes  declared  will  generate  multiple calls to this handler. The
              arguments are the  element  name  this  attribute  belongs  to,  the  name  of  the
              attribute, the type of the attribute, the default value (may be the empty list) and
              a required flag. If this flag is true and the default value is not the empty  list,
              then this is a "#FIXED" default.

              Example:

                     proc attlistHandler {elname name type default isRequired} {
                         puts "$elname $name $type $default $isRequired"
                     }

                     set parser [expat -attlistdeclcommand attlistHandler]
                     $parser parse {<?xml version='1.0'?>
                     <!DOCTYPE test [
                     <!ELEMENT test EMPTY>
                     <!ATTLIST test
                               id      ID      #REQUIRED
                               name    CDATA   #IMPLIED>
                     ]>
                     <test/>}

              This would result in the following commands being invoked:

                     attlistHandler test id ID {} 1
                     attlistHandler test name CDATA {} 0

        -startdoctypedeclcommand  script

              Specifies  a  Tcl  command  to associate with the start of the DOCTYPE declaration.
              This command is called before any DTD or internal subset  is  parsed.   The  actual
              command  consists  of this option followed by four arguments: the doctype name, the
              system identifier, the public identifier and a boolean, that shows if  the  DOCTYPE
              has an internal subset.

        -enddoctypedeclcommand  script

              Specifies  a Tcl command to associate with the end of the DOCTYPE declaration. This
              command is called after processing any external  subset.   It  is  called  with  no
              additional arguments.

       -paramentityparsing  never|notstandalone|always

              "never" disables expansion of parameter entities, "always" always expands them, and
              "notstandalone" expands them only if the document isn't marked as standalone
              (standalone="yes"). The default is "never".

        -entitydeclcommand  script

              Specifies a Tcl command to associate with any entity declaration. The actual
              command consists of this option followed by seven arguments: the entity name, a
              boolean identifying parameter entities, the value of the entity, the base URI, the
              system identifier, the public identifier and the notation name. Depending on the
              type of entity declaration, some of these arguments may be the empty list.
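
              A sketch of an entity declaration handler that stores the seven arguments in a
              dict per declaration. The proc name onEntityDecl, the dict keys and the sample
              DTD are made up for this illustration.

```tcl
package require tdom

set decls {}

# The seven arguments arrive in the order documented above.
proc onEntityDecl {name isParam value base systemId publicId notation} {
    lappend ::decls [dict create name $name isParam $isParam value $value \
                         base $base systemId $systemId publicId $publicId \
                         notation $notation]
}

set parser [expat -entitydeclcommand onEntityDecl]
$parser parse {<?xml version="1.0"?>
<!DOCTYPE doc [
<!ENTITY greeting "hello">
<!ENTITY chapter SYSTEM "chapter1.xml">
]>
<doc/>}
$parser delete

foreach decl $decls {
    puts "[dict get $decl name]: value='[dict get $decl value]'\
          systemId='[dict get $decl systemId]'"
}
```

              The internal entity carries its replacement text in the value argument, while the
              external entity is identified by its system identifier.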

       -ignorewhitecdata  boolean

              If this flag is set, element content which contains only whitespace isn't reported
              with the -characterdatacommand.

       -ignorewhitespace  boolean

              Another name for -ignorewhitecdata; see there.
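
              A small sketch of the effect on an indented document: with the flag set, the
              line breaks and indentation between the tags are not passed to the character data
              handler. The proc name onText and the sample document are assumptions of this
              example.

```tcl
package require tdom

set chunks {}
proc onText {data} { lappend ::chunks $data }

set parser [expat -ignorewhitecdata 1 -characterdatacommand onText]

# The newlines and the indentation are whitespace-only content.
$parser parse "<doc>\n  <item>x</item>\n</doc>"
$parser delete

puts [join $chunks ""]
```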

        -handlerset  name

              This option sets the Tcl handler set scope for the configure options. Any
              option/value pair following this option in the same call to the parser modifies the
              named Tcl handler set. If you don't use this option, you are modifying the default
              Tcl handler set, named "default".
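
              A sketch with two handler sets that both react to start tags: the default set
              logs the element names, while a second set, here called "counter", only counts
              them. The handler set name and the proc names are invented for this
              illustration.

```tcl
package require tdom

set log {}
set count 0

proc logStart   {name attlist} { lappend ::log $name }
proc countStart {name attlist} { incr ::count }

set parser [expat]

# Without -handlerset the "default" handler set is modified.
$parser configure -elementstartcommand logStart

# Options after -handlerset in the same call go to the named set.
$parser configure -handlerset counter -elementstartcommand countStart

$parser parse {<a><b/></a>}
$parser delete

puts "log: $log, count: $count"
```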

        -noexpand  boolean

              Normally, the parser will try to expand references to entities defined in the
              internal subset. If this option is set to a true value, these entities are not
              expanded, but reported literally via the default handler. Warning: if you set this
              option to true and don't install a default handler (with the -defaultcommand
              option) for every handler set of the parser, all internal entities are silently
              lost for the handler sets without a default handler.

       -useForeignDTD  <boolean>
              If <boolean> is true and the document does not have an external subset, the parser
              will call the -externalentitycommand script with empty values for the systemId and
              publicId arguments. This option must be set before the first piece of data is
              parsed. Setting this option after the parsing has started has no effect. The
              default is not to use a foreign DTD. The default is restored after resetting the
              parser. Please note that a -paramentityparsing value of "never" (which is the
              default) suppresses any call to the -externalentitycommand script. Please also note
              that, if the document also doesn't have an internal subset, the
              -startdoctypedeclcommand and -enddoctypedeclcommand scripts, if set, are not called.

       -billionLaughsAttackProtectionMaximumAmplification  <float>
              This option, together with -billionLaughsAttackProtectionActivationThreshold, gives
              control over the parser limits that protect against billion laughs attacks
              (https://en.wikipedia.org/wiki/Billion_laughs_attack). This option expects a float
              >= 1.0 as argument. You should never need to use this option, because the default
              value (100.0) should work for any real data. If you ever need to increase this
              value for a non-attack payload, please report it.

       -billionLaughsAttackProtectionActivationThreshold  <long>
              This option, together with -billionLaughsAttackProtectionMaximumAmplification, gives
              control over the parser limits that protect against billion laughs attacks
              (https://en.wikipedia.org/wiki/Billion_laughs_attack). This option expects a
              positive integer as argument. You should never need to use this option, because the
              default value (8388608) should work for any real data. If you ever need to
              increase this value for a non-attack payload, please report it.

COMMAND METHODS

       parser configure option value ?option value?

              Sets configuration options for the parser. Every command option, except  -namespace
              can be set or modified with this method.

       parser cget ?-handlerset name? option

              Returns the current value of the configuration option option for the parser.

              If  the  -handlerset option is used, the configuration for the named handler set is
              returned.

       parser currentmarkup

              Returns the current markup as found in the XML, if called from within one of the
              parser's markup event handler scripts (-elementstartcommand, -elementendcommand,
              -commentcommand and -processinginstructioncommand). Otherwise it returns the empty
              string.

       parser delete

              Deletes  the  parser and the parser command. A parser cannot be deleted from within
              one of its handler callbacks (neither directly nor indirectly) and will raise a tcl
              error in this case.

       parser free

              Another name to call the method delete, see there.

       parser get -specifiedattributecount|-idattributeindex|-currentbytecount|-currentlinenumber|-currentcolumnnumber|-currentbyteindex

              -specifiedattributecount

                     Returns  the  number of the attribute/value pairs passed in last call to the
                     elementstartcommand  that  were  specified  in  the  start-tag  rather  than
                     defaulted.  Each  attribute/value pair counts as 2; thus this corresponds to
                     an index into the attribute list passed to the elementstartcommand.

              -idattributeindex

                     Returns  the  index  of  the  ID  attribute  passed  in  the  last  call  to
                     XML_StartElementHandler,   or   -1  if  there  is  no  ID  attribute.   Each
                     attribute/value pair counts as 2; thus this corresponds to an index into the
                     attributes list passed to the elementstartcommand.

              -currentbytecount

                     Returns the number of bytes in the current event. Returns 0 if the event is
                     in an internal entity.

              -currentlinenumber

                     Returns the line number of the current parse location.

              -currentcolumnnumber

                     Returns the column number of the current parse location.

              -currentbyteindex

                     Returns the byte index of the current parse location.

              Only one value may be requested at a time.
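
              A sketch of querying the parse position from within a handler, for example to
              build error messages that point at the source line. The proc name onStart and the
              global variable parser used to reach the parser object are assumptions of this
              example.

```tcl
package require tdom

set positions {}

# The handler asks the parser for the line of the current event.
proc onStart {name attlist} {
    lappend ::positions [list $name [$::parser get -currentlinenumber]]
}

set parser [expat -elementstartcommand onStart]
$parser parse "<doc>\n  <item/>\n  <item/>\n</doc>"
$parser delete

foreach position $positions {
    lassign $position name line
    puts "$name starts on line $line"
}
```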

       parser parse data

              Parses the XML string data. The event callback scripts are called as their
              triggering events happen. This method cannot be used from within a callback
              (neither directly nor indirectly) of the parser to be used and will raise an error
              in this case.

       parser parsechannel channelID

              Reads the XML data out of the tcl channel channelID (starting at the current access
              position, without any seek) up to the end of file condition and parses  that  data.
              The channel encoding is respected. Use the helper proc tDOM::xmlOpenFile out of the
              tDOM script library to open a file, if you want to use  this  method.  This  method
              cannot  be  used  from  within  a callback (neither directly nor indirectly) of the
              parser to be used and will raise an error in this case.

       parser parsefile filename

              Reads the XML data directly out of the file with the filename filename  and  parses
              that data. This is done with low level file operations. The XML data must be in US-
              ASCII, ISO-8859-1, UTF-8 or UTF-16 encoding. If applicable,  this  is  the  fastest
              way,  to parse XML data. This method cannot be used from within a callback (neither
              directly nor indirectly) of the parser to be used and will raise an error  in  this
              case.

       parser reset

              Resets the parser in preparation for parsing another document. A parser cannot be
              reset from within one of its handler callbacks (neither directly nor indirectly)
              and will raise a Tcl error in this case.

CALLBACK COMMAND RETURN CODES

       A script invoked for any of the parser callback commands, such as -elementstartcommand,
       -elementendcommand, etc., may return a return code other than "ok" or "error". All
       callbacks may in addition return "break", "continue" or "return".

       If  a  callback  script  returns  an "error" error code then processing of the document is
       terminated and the error is propagated in the usual fashion.

       If a callback script returns a "break" error code then all  further  processing  of  every
       handler  script  out  of  this Tcl handler set is suppressed for the further parsing. This
       does not influence any other handler set.

       If a callback script returns a "continue"  error  code  then  processing  of  the  current
       element, and its children, ceases for every handler script out of this Tcl handler set and
       processing continues with the next (sibling) element. This does not  influence  any  other
       handler set.

       If  a  callback  script returns a "return" error code then parsing is canceled altogether,
       but no error is raised.
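
       As a sketch of the "return" code, the following start tag handler cancels the parse as
       soon as it sees an element named stop, for example because the application has found
       what it was looking for. The element name stop, the proc name onStart and the sample
       document are made up for this illustration.

```tcl
package require tdom

set seen {}

proc onStart {name attlist} {
    lappend ::seen $name
    if {$name eq "stop"} {
        # Make the handler script finish with the "return" code,
        # which asks the parser to cancel the parse without an error.
        return -code return
    }
}

set parser [expat -elementstartcommand onStart]

# catch records the completion code of the parse call.
set code [catch {$parser parse {<r><a/><stop/><b/><c/></r>}} result]
$parser delete

puts "elements seen before cancelling: $seen"
```

       The elements following stop are not reported. A handler that wants to skip only one
       subtree, or to silence its handler set for the rest of the document, would use return
       -code continue or return -code break instead.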

SEE ALSO

       expatapi, tdom

KEYWORDS

       SAX, push, pushparser