NAME

       dom - Create an in-memory DOM tree from XML

SYNOPSIS

       package require tdom

       dom method ?arg arg ...?
_________________________________________________________________

DESCRIPTION

This command creates DOM trees in memory. In the usual case, a string containing XML data is parsed
and converted into a DOM tree. Other possible parse inputs are HTML and JSON. The method argument
selects a specific subcommand.

The valid methods are:

dom parse ?options? ?data?
Parses the XML data and builds the DOM tree in memory, providing a Tcl object command for the
resulting DOM document object. Example:

dom parse $xml doc
$doc documentElement root

parses the XML in the variable xml, creates the DOM tree in memory, makes a reference to the
document object, visible in Tcl as a document object command, and assigns this new object name to
the variable doc. When doc gets freed, the DOM tree and the associated Tcl command objects
(document and all node objects) are freed automatically.

set document [dom parse $xml]
set root [$document documentElement]

parses the XML in the variable xml, creates the DOM tree in memory, makes a reference to the
document object, visible in Tcl as a document object command, and returns this new object name,
which is then stored in document. To free the underlying DOM tree and the associated Tcl object
commands (document + nodes + fragment nodes), the document object command has to be explicitly
deleted by either of:

$document delete

or

rename $document ""
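
As a minimal sketch of the second form, assuming a small inline XML string (the element names
order and item are illustrative only, not part of tDOM):

```tcl
package require tdom

# Illustrative input; any well-formed XML string works here.
set xml {<order id="42"><item>tea</item><item>milk</item></order>}

# Parse and keep the returned document command name.
set document [dom parse $xml]
set root [$document documentElement]

# Inspect the tree through the node command.
set rootName [$root nodeName]
set itemCount [llength [$root selectNodes item]]

# Free the DOM tree and all associated node commands explicitly.
$document delete
```

Because the result form was used instead of the objVar form, nothing is freed until the
delete call runs.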

The valid options are:

-simple
If -simple is specified, a simple but fast parser is used (it does not fully conform to the
XML recommendation). That should double parsing and DOM generation speed. The encoding of
the data is not transformed inside the parser. The simple parser does not respect any
encoding information in the XML declaration. It skips over the internal DTD subset and
ignores any information in it. Therefore it doesn't include defaulted attribute values in
the tree, even if the corresponding attribute declaration is in the internal subset. It
also doesn't expand internal or external entity references other than the predefined
entities and character references.

-html If -html is specified, a fast HTML parser is used, which tries to parse even badly formed
HTML into a DOM tree.

-html5 This option is only available if tDOM was built with --enable-html5. Use the featureinfo
method if you need to know whether this feature is built in. If -html5 is specified, the
gumbo lib html5 parser (https://github.com/google/gumbo-parser) is used to build the DOM
tree. This is, as far as it goes, XML namespace-aware. Since many users probably don't want
this, and it only adds a burden in many use cases, -html5 can be combined with
-ignorexmlns, in which case no node or attribute in the DOM tree is in an XML namespace.
All tag and attribute names in the DOM tree will be lower case, even for foreign elements
not in the xhtml, svg or mathml namespace. The DOM tree may include nodes that the parser
inserted because they are implied by the context (such as <head>, <tbody>, etc.).

-json If -json is specified, the data is expected to be a valid JSON string (according to RFC
7159). The command returns an ordinary DOM document, with the nesting of tokens inside the
JSON data translated into the tree hierarchy. If a JSON array value is itself an object or
array, then container element nodes named (in a default build) objectcontainer or
arraycontainer, respectively, are inserted into the tree. The JSON serialization of this
document (with the domDoc method asJSON) is the same JSON information as the data,
preserving JSON datatypes, allowing non-unique member names of objects while preserving
their order, and supporting the full range of JSON string values. JSON datatype handling is
done with an additional property attached to the doc and tree nodes. This property isn't
contained in an XML serialization of the document. If you need to store the JSON data
represented by a document, store the JSON serialization and parse it back from there.
Apart from this JSON type information, the returned doc command or handle is an ordinary
DOM doc, which may be inspected or modified with the full range of the doc and node
methods. Please note that the element node names and the text node values within the tree
may be outside of what the corresponding XML productions allow.

-jsonroot <document element name>
If given, makes the given element name the document element of the resulting doc. The
parsed content of the JSON string will be the children of this document element node.

-jsonmaxnesting integer
This option only has an effect if used together with the -json option. The current
implementation uses a recursive descent JSON parser. In order to avoid using excess stack
space, any JSON input that has more than a certain number of nesting levels is considered
invalid. The default maximum nesting is 2000. The option -jsonmaxnesting allows the user to
adjust that.

-- The option -- marks the end of options. While respected in general, this option is only
needed when parsing JSON data, which may start with a "-".
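
The JSON-related options above can be combined as in the following sketch. The JSON text and
the root element name data are illustrative assumptions; only -json, -jsonroot, -- and the
domDoc method asJSON come from this page.

```tcl
package require tdom

# Illustrative JSON input; "--" guards against data that starts with "-".
set json {{"name":"tdom","tags":["xml","json"],"stable":true}}
set doc [dom parse -json -- $json]

# Serialize back to JSON; the datatypes (string, array, boolean) are kept.
set roundTrip [$doc asJSON]

# With -jsonroot the parsed members become children of the named element.
set doc2 [dom parse -json -jsonroot data -- $json]
set rootName [[$doc2 documentElement] nodeName]

$doc delete
$doc2 delete
```

The variable roundTrip is expected to carry the same JSON information as the input, although
whitespace in the serialization may differ.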

-keepEmpties
If -keepEmpties is specified, then text nodes which contain only whitespace will be part of
the resulting DOM tree. By default (-keepEmpties not given) those whitespace-only text
nodes are removed at parsing time.
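
A short sketch of the difference, assuming an illustrative three-element document that is
indented with newlines and spaces:

```tcl
package require tdom

# Two child elements separated by whitespace-only text.
set xml "<a>\n  <b/>\n  <c/>\n</a>"

# Default: whitespace-only text nodes are dropped while parsing.
set doc [dom parse $xml]
set withoutEmpties [llength [[$doc documentElement] childNodes]]
$doc delete

# With -keepEmpties the whitespace between the elements stays in the tree.
set doc [dom parse -keepEmpties $xml]
set withEmpties [llength [[$doc documentElement] childNodes]]
$doc delete
```

Here withoutEmpties counts only the two element children, while withEmpties also counts the
three whitespace text nodes around them.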

-keepCDATA
If -keepCDATA is specified, CDATA sections are added to the tree as CDATA_SECTION_NODEs.
Without this option they are added as text nodes (and, if necessary, combined with sibling
text nodes into one text node). Please note that the resulting tree isn't prepared for
XPath selects or to be the source or the stylesheet of an XSLT transformation. Unless
-keepCDATA is combined with -keepEmpties, only CDATA sections that are not whitespace-only
will be added to the resulting DOM tree.

-channel <channel-ID>
If -channel <channel-ID> is specified, the input to be parsed is read from the specified
channel. The encoding setting of the channel (via fconfigure -encoding) is respected, i.e.
the data read from the channel is converted to UTF-8 according to the encoding settings
before it is parsed.
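
A minimal sketch of parsing from a channel. It assumes Tcl 8.6 or later (for file tempfile)
and writes a small illustrative file first so that the example is self-contained:

```tcl
package require tdom

# Create an illustrative input file.
set fd [file tempfile path]
fconfigure $fd -encoding utf-8
puts -nonewline $fd {<greeting>hello</greeting>}
close $fd

# Reopen it for reading; the channel encoding is respected by the parser.
set fd [open $path r]
fconfigure $fd -encoding utf-8
set doc [dom parse -channel $fd]
close $fd

set text [[$doc documentElement] text]

$doc delete
file delete $path
```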

-baseurl <baseURI>
If -baseurl <baseURI> is specified, the baseURI is used as the base URI of the document.
External entity references in the document are resolved relative to this base URI. This
base URI is also stored within the DOM tree.

-feedbackAfter <#bytes>
If -feedbackAfter <#bytes> is specified, the Tcl command given by -feedbackcmd is evaluated
at the first element start within the document (or an external entity) once #bytes have
been parsed since the start of the document or external entity, or since the last such
call. For backward compatibility, if no -feedbackcmd is given but there is a Tcl proc named
::dom::domParseFeedback, this proc is used as -feedbackcmd. If there isn't such a proc and
-feedbackAfter is used, it is an error not to also use -feedbackcmd. If the called script
raises an error, parsing is aborted and the dom parse call returns an error, with the
script's error message as its error message. If the called script returns -code break,
parsing is aborted and the dom parse call returns the empty string.

-feedbackcmd <script>
If -feedbackcmd <script> is specified, the script is evaluated at the first element start
within the document (or an external entity) once the number of bytes given by the
-feedbackAfter option have been parsed since the start of the document or external entity,
or since the last such call. If -feedbackAfter isn't given, this option doesn't have any
effect. If the called script raises an error, parsing is aborted and the dom parse call
returns an error, with the script's error message as its error message. If the called
script returns -code break, parsing is aborted and the dom parse call returns the empty
string.

-externalentitycommand <script>
If -externalentitycommand <script> is specified, the specified Tcl script is called to
resolve any external entities of the document. The actual evaluated command consists of
this option's value followed by three arguments: the base URI, the system identifier of the
entity and the public identifier of the entity. The base URI and the public identifier may
be the empty list. The script has to return a Tcl list consisting of three elements. The
first element of this list signals how the external entity is returned to the processor.
Currently the two allowed types are "string" and "channel". The second element of the list
has to be the (absolute) base URI of the external entity to be parsed. The third element
of the list is the data: either the already read content of the external entity as a
string, in the case of type "string", or the name of a Tcl channel, in the case of type
"channel". Note that if the script returns a Tcl channel, it will not be closed by the
processor. It must be closed separately if it is no longer needed.
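
The calling convention described above can be sketched as follows. The proc name
resolveEntity, the file:///virtual/ base URI and the entity content are hypothetical; a real
resolver would normally look up the system identifier relative to the base URI.

```tcl
package require tdom

# Hypothetical resolver: serves a fixed string for every external entity.
# It receives the base URI, the system identifier and the public identifier.
proc resolveEntity {base systemId publicId} {
    # Return list: type, base URI of the entity, entity data.
    return [list string "file:///virtual/$systemId" "<included/>"]
}

# Illustrative document that references one external general entity.
set xml {<!DOCTYPE root [<!ENTITY ext SYSTEM "ext.xml">]><root>&ext;</root>}

set doc [dom parse -externalentitycommand resolveEntity $xml]
set childName [[[$doc documentElement] firstChild] nodeName]
$doc delete
```

With type "string" no channel is involved, so there is nothing for the caller to close
afterwards.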

-useForeignDTD <boolean>
If <boolean> is true and the document does not have an external subset, the parser will
call the -externalentitycommand script with empty values for the systemId and publicID
arguments. Please note that if the document also doesn't have an internal subset, the
-startdoctypedeclcommand and -enddoctypedeclcommand scripts, if set, are not called. The
-useForeignDTD respects

-paramentityparsing <always|never|notstandalone>
The -paramentityparsing option controls whether the parser tries to resolve the external
entities (including the external DTD subset) of the document while building the DOM tree.
-paramentityparsing requires an argument, which must be either "always", "never", or
"notstandalone". The value "always" means that the parser tries to resolve (recursively)
all external entities of the XML source. This is the default if -paramentityparsing is
omitted. The value "never" means that only the given XML source is parsed and no external
entity (including the external subset) will be resolved and parsed. The value
"notstandalone" means that all external entities will be resolved and parsed, except for
documents that explicitly state standalone="yes" in their XML declaration.

-ignorexmlns
It is recommended that you only use this option with the -html5 option. If this option is
given, no node within the created DOM tree will be internally marked as placed into an XML
namespace, even if there is a default namespace in scope for un-prefixed elements or even
if the element has a defined namespace prefix. One consequence is that XPath node
expressions on such a DOM tree don't work as may be expected. Prefixed element nodes can't
be selected naively, and element nodes without a prefix will be seen by XPath expressions
as if they are not in any namespace (no matter whether they should in fact be in a default
namespace). If you need to inject prefixed node names into an XPath expression, use the
'%' syntax described in the documentation of the domNode command method selectNodes.

dom createDocument docElemName ?objVar?
Creates a new DOM document object with one element node with node name docElemName. The objVar
controls the memory handling as explained above.

dom createDocumentNS uri docElemName ?objVar?
Creates a new DOM document object with one element node with node name docElemName. Uri gives the
namespace of the document element to create. The objVar controls the memory handling as explained
above.

dom createDocumentNode ?objVar?
Creates a new 'empty' DOM document object without any element node. objVar controls the memory
handling as explained above.
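
A minimal sketch of the createDocument and createDocumentNS methods. The element names, the
namespace URI http://example.com/ns and the prefix ex are illustrative assumptions:

```tcl
package require tdom

# Result form: the caller owns the document and must delete it.
set doc [dom createDocument inventory]
set root [$doc documentElement]
$root appendChild [$doc createElement item]
set xml [$doc asXML -indent none]

# Namespaced variant of the same call.
set nsDoc [dom createDocumentNS http://example.com/ns ex:inventory]
set nsUri [[$nsDoc documentElement] namespaceURI]

$doc delete
$nsDoc delete
```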

dom createNodeCmd ?-returnNodeCmd? ?-tagName name? ?-jsonType jsonType? ?-namespace URI?
(element|comment|text|cdata|pi)Node commandName
This method creates Tcl commands, which in turn create tDOM nodes. Tcl commands created by this
command are only available inside a script given to the domNode methods appendFromScript or
insertBeforeFromScript. If a command created with createNodeCmd is invoked in any other context,
it returns an error. The created command commandName replaces any existing command or procedure
with that name. If the commandName includes any namespace qualifiers, it is created in the
specified namespace. The -tagName option is only allowed for the elementNode type. The -jsonType
option is only allowed for elementNode and textNode types.

If such a command is invoked inside a script given as argument to the domNode method
appendFromScript or insertBeforeFromScript, it creates a new node and appends this node at the end
of the child list of the invoking element node. If the option -returnNodeCmd was given, the
command returns the created node as a Tcl command. If this option was omitted, the command returns
nothing. Each command always creates the same type of node. Which type of node is created by the
command is determined by the first argument to createNodeCmd. The syntax of the created
command depends on the type of the node it creates.

If the command type to create is elementNode, the created command will create an element node
when called. Without the -tagName option, the tag name of the created node is commandName without
namespace qualifiers. If the -tagName option was given, the created elements will have this tag
name. If the -jsonType option was given, the created element nodes will have the given JSON
type. If the -namespace option is given, the created element node will be XML namespaced and in
the namespace given by the option. The element name will be used literally as given, either by
the command name or by the -tagName option, if that was given. An appropriate XML namespace
declaration will be added automatically, to bind the prefix (if the element name has one) or the
default namespace (if the element name has no prefix) to the namespace if such a binding isn't
in scope.

The syntax of the created command is:

elementNodeCmd ?attributeName attributeValue ...? ?script?
elementNodeCmd ?-attributeName attributeValue ...? ?script?
elementNodeCmd name_value_list script

The command syntax allows three different ways to specify the attributes of the resulting element.
These can be specified with attributeName attributeValue argument pairs, in an "option style"
way with -attributeName attributeValue argument pairs (the '-' character is only syntactic sugar
and will be stripped off) or as a Tcl list with elements interpreted as attribute name and the
corresponding attribute value. The attribute name elements in the list may have a leading '-'
character, which will be stripped off.

Every elementNodeCmd accepts an optional Tcl script as last argument. This script is evaluated as
recursive appendFromScript script with the node created by the elementNodeCmd as parent of all
nodes created by the script.

If the first argument of the method is textNode, the command will create a text node. If the
-jsonType option was given then the created text node will have that JSON type. The syntax of the
created command is:

textNodeCmd ?-disableOutputEscaping? data

If the optional flag -disableOutputEscaping is given, the escaping of the ampersand character (&)
and the left angle bracket (<) inside the data is disabled. You should use this flag carefully.

If the first argument of the method is commentNode or cdataNode, the command will create a comment
node or CDATA section node. The syntax of the created command is:

nodeCmd data

If the first argument of the method is piNode, the command will create a processing instruction
node. The syntax of the created command is:

piNodeCmd target data

dom setStoreLineColumn ?boolean?
If switched on, the DOM nodes will contain line and column position information for the original
XML document after parsing. The default is not to store line and column position information.
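
A minimal sketch, assuming the domNode methods getLine and getColumn are used to read the
stored positions (the input document is illustrative):

```tcl
package require tdom

# Switch position tracking on before parsing.
dom setStoreLineColumn 1
set doc [dom parse "<a>\n  <b/>\n</a>"]

set b [[$doc documentElement] selectNodes b]
set line [$b getLine]
set column [$b getColumn]

$doc delete

# Restore the default so later parses do not pay for the extra data.
dom setStoreLineColumn 0
```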

dom setNameCheck ?boolean?
If NameCheck is true, every method which expects an XML Name, a fully qualified name or a
processing instruction target will check whether the given string is valid according to its
production rule. For commands created with the createNodeCmd method to be used in the context of
appendFromScript, the status of the flag at creation time decides. If NameCheck is true at
creation time, the command will check its arguments, otherwise not. The setNameCheck method sets
this flag. It returns the current NameCheck flag state. The default state for NameCheck is true.

dom setTextCheck ?boolean?
If TextCheck is true, every command which expects XML Chars, a comment, a CDATA section value or a
processing instruction value will check whether the given string is valid according to its
production rule. For commands created with the createNodeCmd method to be used in the context of
appendFromScript, the status of the flag at creation time decides. If TextCheck is true at
creation time, the command will check its arguments, otherwise not. The setTextCheck method sets
this flag. It returns the current TextCheck flag state. The default state for TextCheck is true.

dom setObjectCommands ?(automatic|token|command)?
Controls whether documents and nodes are created as Tcl commands or as tokens to be used with the
domNode and domDoc commands. If the mode is 'automatic', then methods invoked on Tcl commands will
create Tcl commands and methods invoked on doc or node tokens will create tokens. If the mode is
'command', then Tcl commands will always be created. If the mode is 'token', then tokens will
always be created. The method returns the current mode. This method is an experimental interface.

dom isName name
Returns 1 if name is a valid XML Name according to production 5 of the XML 1.0 recommendation.
This means that name is a valid XML element or attribute name. Otherwise it returns 0.

dom isPIName name
Returns 1 if name is a valid XML processing instruction target according to production 17 of the
XML 1.0 recommendation. Otherwise it returns 0.

dom isNCName name
Returns 1 if name is a valid NCName according to production 4 of the Namespaces in XML
recommendation. Otherwise it returns 0.

dom isQName name
Returns 1 if name is a valid QName according to production 6 of the Namespaces in XML
recommendation. Otherwise it returns 0.

dom isCharData string
Returns 1 if every character in string is a valid XML Char according to production 2 of the XML
1.0 recommendation. Otherwise it returns 0.

dom clearString string
Returns the string given as argument with all characters removed that are not allowed as XML
parsed character data.

dom isBMPCharData string
Returns 1 if every character in string is a valid XML Char with a Unicode code point within the
Basic Multilingual Plane (that means, that every character within the string is at most 3 bytes
long). Otherwise it returns 0.

dom isComment string
Returns 1 if string is a valid comment according to production 15 of the XML 1.0 recommendation.
Otherwise it returns 0.

dom isCDATA string
Returns 1 if string is valid according to production 20 of the XML 1.0 recommendation. Otherwise
it returns 0.

dom isPIValue string
Returns 1 if string is valid according to production 16 of the XML 1.0 recommendation. Otherwise
it returns 0.
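
The checking methods above can be tried out as in this sketch. The sample strings are
illustrative, and each result is collected into a list:

```tcl
package require tdom

set checks [list \
    [dom isName item] \
    [dom isName 1item] \
    [dom isNCName ex:item] \
    [dom isQName ex:item] \
    [dom isCharData "a\u0001b"] \
    [dom isComment "a--b"] \
    [dom isCDATA "a]]>b"] \
    [dom isPIValue "a?>b"]]

# Strip the control character that is not allowed in XML character data.
set cleaned [dom clearString "a\u0001b"]
```

A name must not start with a digit, an NCName must not contain a colon, and the sequences
"--", "]]>" and "?>" are not allowed inside a comment, a CDATA section and a processing
instruction value, respectively.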

dom featureinfo feature
This method provides information about the used build options and the expat version. The valid
values for the feature argument are:

expatversion
Returns the version of the underlying expat library as a string, something like
"expat_2.1.0". This is what the expat API function XML_ExpatVersion() returns.

expatmajorversion
Returns the major version of the expat version used at build time as an integer.

expatminorversion
Returns the minor version of the expat version used at build time as an integer.

expatmicroversion
Returns the micro version of the expat version used at build time as an integer.

dtd Returns a boolean indicating whether tDOM was built with --enable-dtd.

ns Returns a boolean indicating whether tDOM was built with --enable-ns.

unknown
Returns a boolean indicating whether tDOM was built with --enable-unknown.

tdomalloc
Returns a boolean indicating whether tDOM was built with --enable-tdomalloc.

lessns Returns a boolean indicating whether tDOM was built with --enable-lessns.

TCL_UTF_MAX
Returns the TCL_UTF_MAX value of the Tcl core that tDOM was built with, as an integer.

html5 Returns a boolean indicating whether tDOM was built with --enable-html5.

versionhash
Returns the fossil repository version hash.

pullparser
Returns a boolean indicating whether the pullparser command is built in.

schema Returns a boolean indicating whether the tDOM schema features are built in.
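
One possible use of featureinfo is to pick a parser depending on the build options. The
following is a sketch only; the HTML input is illustrative:

```tcl
package require tdom

set expatVersion [dom featureinfo expatversion]

# Prefer the HTML5 parser only when it was compiled in.
set html {<html><body><p>hello</p></body></html>}
if {[dom featureinfo html5]} {
    set doc [dom parse -html5 -ignorexmlns $html]
} else {
    set doc [dom parse -html $html]
}

set text [[$doc selectNodes {//p}] text]
$doc delete
```

Combining -html5 with -ignorexmlns keeps the XPath expression free of namespace prefixes, so
the same selectNodes call works for both parsers.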

KEYWORDS

       XML, DOM, document, node, parsing

Tcl                                                                                                    dom(3tcl)