Provided by: libxml-easy-perl_0.011-4build1_amd64

NAME

       XML::Easy::Text - XML parsing and serialisation

SYNOPSIS

           use XML::Easy::Text qw(
               xml10_read_content_object xml10_read_element
               xml10_read_document xml10_read_extparsedent_object);

           $content = xml10_read_content_object($text);
           $element = xml10_read_element($text);
           $element = xml10_read_document($text);
           $content = xml10_read_extparsedent_object($text);

           use XML::Easy::Text qw(
               xml10_write_content xml10_write_element
               xml10_write_document xml10_write_extparsedent);

           $text = xml10_write_content($content);
           $text = xml10_write_element($element);
           $text = xml10_write_document($element, "UTF-8");
           $text = xml10_write_extparsedent($content, "UTF-8");

DESCRIPTION

       This module supplies functions that parse and serialise XML data according to the XML 1.0 specification.

       This module is oriented towards the use of XML to represent data for interchange purposes, rather than
       the use of XML as markup of principally textual data.  It performs no schema processing: it does not
       interpret DTDs or any other kind of schema.  Apart from that omission, it adheres strictly to the XML
       specification, in all its awkward details.

       XML data in memory is represented using a tree of XML::Easy::Content and XML::Easy::Element objects.
       Such a tree encapsulates all the structure and data content of an XML element or document, without any
       irrelevant detail resulting from the textual syntax.  These node trees are readily manipulated by the
       functions in XML::Easy::NodeBasics.

       The functions of this module are implemented in C for performance, with a pure Perl backup version (which
       has good performance compared to other pure Perl parsers) for systems that can't handle XS modules.

FUNCTIONS

       All functions "die" on error.
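
       Because errors are reported with "die", a caller that has to survive malformed input would normally
       trap the exception.  The following is a minimal sketch using a plain "eval"; the helper name
       "try_parse" is hypothetical and is not part of this module.

```perl
use strict;
use warnings;
use XML::Easy::Text qw(xml10_read_document);

# Hypothetical helper (not part of XML::Easy::Text): returns the root
# element on success, or undef if the parser died.
sub try_parse {
    my ($text) = @_;
    my $element = eval { xml10_read_document($text) };
    return $element;
}

my $good = try_parse("<greeting>hello</greeting>");
my $bad  = try_parse("<greeting>hello");    # end-tag is missing

print "good: ", (defined($good) ? "parsed" : "rejected"), "\n";
print "bad: ",  (defined($bad)  ? "parsed" : "rejected"), "\n";
```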

   Parsing
       These functions take textual XML and extract the abstract XML content.  In the terminology of the XML
       specification, they constitute a non-validating processor: they check for well-formedness of the XML, but
       not for adherence of the content to any schema.

       The inputs (to be parsed) for these functions are always character strings.  XML text is frequently
       encoded using UTF-8, or some other Unicode encoding, so that it can contain characters from the full
       Unicode repertoire.  In that case, something must perform UTF-8 decoding (or decoding of some other
       character encoding) to convert the octets of a file to the characters on which these functions operate.
       A Perl I/O layer can do the job (see perlio), or it can be performed explicitly using the "decode"
       function in the Encode module.
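
       As an illustration of the explicit route, the sketch below decodes UTF-8 octets with "decode" before
       handing the result to the parser.  The sample input is invented for the example, and the strict CHECK
       flags are one possible choice rather than a requirement of this module.

```perl
use strict;
use warnings;
use Encode qw(decode);
use XML::Easy::Text qw(xml10_read_document);

# Octets as they might arrive from a file or socket; the accented
# letter is the two UTF-8 octets C3 A9.
my $octets = "<name>caf\xc3\xa9</name>";

# Decode to characters first: the parser works on characters, not octets.
# FB_CROAK makes invalid UTF-8 fatal; LEAVE_SRC keeps $octets unmodified.
my $chars = decode("UTF-8", $octets,
    Encode::FB_CROAK() | Encode::LEAVE_SRC());

my $root = xml10_read_document($chars);
my $text = $root->content_twine->[0];
printf "%d octets in, %d characters of content\n",
    length($octets), length($text);
```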

       xml10_read_content_object(TEXT)
           TEXT must be a character string.  It is parsed against the content production of the XML 1.0 grammar;
           i.e., as a sequence of the kind of matter that can appear between the start-tag and end-tag of an
           element.  Returns a reference to an XML::Easy::Content object.

           Normally one would not want to use this function directly, but prefer the higher-level
           "xml10_read_document" function.  This function exists for the construction of custom XML parsers in
           situations that don't match the full XML grammar.

       xml10_read_content_twine(TEXT)
           Performs the same parsing job as "xml10_read_content_object", but returns the resulting content chunk
           in the form of twine (see "Twine" in XML::Easy::NodeBasics) rather than a content object.

           The returned array must not be subsequently modified.  If possible, it will be marked as read-only in
           order to prevent modification.
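
           As described under "Twine" in XML::Easy::NodeBasics, twine is an array that alternates between
           character data strings and element references, beginning and ending with a string.  A small
           sketch of walking such an array without modifying it (the input text is invented for the
           example):

```perl
use strict;
use warnings;
use XML::Easy::Text qw(xml10_read_content_twine);

my $twine = xml10_read_content_twine("foo<bar/>baz");

# Even indices hold character data; odd indices hold element objects.
for my $i (0 .. $#$twine) {
    my $item = $twine->[$i];
    if ($i % 2 == 0) {
        print "text: '$item'\n";
    } else {
        print "element: ", $item->type_name, "\n";
    }
}
```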

       xml10_read_content(TEXT)
           Deprecated alias for "xml10_read_content_twine".

       xml10_read_element(TEXT)
           TEXT must be a character string.  It is parsed against the element production of the XML 1.0 grammar;
           i.e., as an item bracketed by tags and containing content that may recursively include other
           elements.  Returns a reference to an XML::Easy::Element object.

           Normally one would not want to use this function directly, but prefer the higher-level
           "xml10_read_document" function.  This function exists for the construction of custom XML parsers in
           situations that don't match the full XML grammar.

       xml10_read_document(TEXT)
           TEXT must be a character string.  It is parsed against the document production of the XML 1.0
           grammar; i.e., as a root element (possibly containing subelements) optionally preceded and followed
           by non-content matter, possibly headed by an XML declaration.  (A document type declaration is not
           accepted; this module does not process schemata.)  Returns a reference to an XML::Easy::Element
           object which represents the root element.  Nothing is returned relating to the XML declaration or
           other non-content matter.

           This is the most likely function to use to process incoming XML data.  Beware that the encoding
           declaration in the XML declaration, if any, does not affect the interpretation of the input as a
           sequence of characters.
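
           The sketch below illustrates both points with invented input: the declared encoding is ignored
           because the input is already a character string, and a document type declaration causes the
           call to die.  The accessor methods used are those of XML::Easy::Element.

```perl
use strict;
use warnings;
use XML::Easy::Text qw(xml10_read_document);

# The declaration claims ISO-8859-1, but $text is already a character
# string, so the claim has no effect on how it is read.
my $text = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n}
         . qq{<item id="42">\x{263a}</item>\n};

my $root = xml10_read_document($text);
print "type: ", $root->type_name, "\n";
print "id: ", $root->attribute("id"), "\n";
print "content length: ", length($root->content_twine->[0]), "\n";

# A document type declaration is not accepted, so this call dies.
my $with_dtd = qq{<!DOCTYPE item SYSTEM "item.dtd">\n<item/>\n};
my $ok = eval { xml10_read_document($with_dtd); 1 };
print "DTD accepted: ", ($ok ? "yes" : "no"), "\n";
```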

       xml10_read_extparsedent_object(TEXT)
           TEXT must be a character string.  It is parsed against the extParsedEnt production of the XML 1.0
           grammar; i.e., as a sequence of content (containing character data and subelements), possibly headed
           by a text declaration (which is similar to, but not the same as, an XML declaration).  Returns a
           reference to an XML::Easy::Content object.

           This is a relatively obscure part of the XML grammar, used when a subpart of a document is stored in
           a separate file.  You're more likely to require the "xml10_read_document" function.

       xml10_read_extparsedent_twine(TEXT)
           Performs the same parsing job as "xml10_read_extparsedent_object", but returns the resulting content
           chunk in the form of twine (see "Twine" in XML::Easy::NodeBasics) rather than a content object.

           The returned array must not be subsequently modified.  If possible, it will be marked as read-only in
           order to prevent modification.

       xml10_read_extparsedent(TEXT)
           Deprecated alias for "xml10_read_extparsedent_twine".

   Serialisation
       These functions take abstract XML data and serialise it as textual XML.  They do not perform indentation,
       default attribute suppression, or any other schema-dependent processing.

       The outputs of these functions are always character strings.  XML text is frequently encoded using UTF-8,
       or some other Unicode encoding, so that it can contain characters from the full Unicode repertoire.  In
       that case, something must perform UTF-8 encoding (or encoding into some other character encoding) to
       convert the characters generated by these functions to the octets of a file.  A Perl I/O layer can do the
       job (see perlio), or it can be performed explicitly using the "encode" function in the Encode module.
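
       A minimal sketch of the explicit route on the output side, assuming the result is destined for a
       UTF-8 file.  The element is built with the XML::Easy::Element constructor purely to have something
       to serialise.

```perl
use strict;
use warnings;
use Encode qw(encode);
use XML::Easy::Element;
use XML::Easy::Text qw(xml10_write_document xml10_read_document);

# Build a one-element document whose content includes a non-ASCII character.
my $root = XML::Easy::Element->new("name", {}, ["caf\x{e9}"]);

# The serialiser returns characters; the encoding name only goes into
# the XML declaration.
my $chars = xml10_write_document($root, "UTF-8");

# Encode explicitly before writing the octets anywhere.
my $octets = encode("UTF-8", $chars);
printf "%d characters, %d octets\n", length($chars), length($octets);
```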

       xml10_write_content(CONTENT)
           CONTENT must be a reference to either an XML::Easy::Content object or a twine array (see "Twine" in
           XML::Easy::NodeBasics).  The XML 1.0 textual representation of that content is returned.

       xml10_write_element(ELEMENT)
           ELEMENT must be a reference to an XML::Easy::Element object.  The XML 1.0 textual representation of
           that element is returned.

       xml10_write_document(ELEMENT[, ENCODING])
           ELEMENT must be a reference to an XML::Easy::Element object.  The XML 1.0 textual form of a document
           with that element as the root element is returned.  The document includes an XML declaration.  If
           ENCODING is supplied, it must be a valid character encoding name, and the XML declaration specifies
           it in an encoding declaration.  (The returned string consists of unencoded characters regardless of
           the encoding specified.)
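
           As a rough illustration, the sketch below serialises the same element with and without
           ENCODING and then parses one result back.  The sample element is invented, and the exact
           layout of the generated XML declaration is deliberately not assumed.

```perl
use strict;
use warnings;
use XML::Easy::Text qw(
    xml10_read_element xml10_read_document xml10_write_document);

my $root = xml10_read_element(q{<list kind="todo"><item>tea</item></list>});

my $with_enc    = xml10_write_document($root, "UTF-8");
my $without_enc = xml10_write_document($root);
print $with_enc, "\n", $without_enc, "\n";

# Parsing the output again should give back an equivalent root element.
my $again = xml10_read_document($with_enc);
print "root: ", $again->type_name, "\n";
```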

       xml10_write_extparsedent(CONTENT[, ENCODING])
           CONTENT must be a reference to either an XML::Easy::Content object or a twine array (see "Twine" in
           XML::Easy::NodeBasics).  The XML 1.0 textual form of an external parsed entity encapsulating that
           content is returned.  If ENCODING is supplied, it must be a valid character encoding name, and the
           returned entity includes a text declaration that specifies the encoding name in an encoding
           declaration.  (The returned string consists of unencoded characters regardless of the encoding
           specified.)
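
           A short sketch of writing an external parsed entity from twine and reading it back.  The
           content is invented for the example, and nothing is assumed about the exact form of the text
           declaration beyond what is stated above.

```perl
use strict;
use warnings;
use XML::Easy::Text qw(
    xml10_read_extparsedent_twine xml10_write_extparsedent);

# Twine for the content "one<sep/>two", obtained here by parsing.
my $twine = xml10_read_extparsedent_twine("one<sep/>two");

# Serialise with a text declaration naming the encoding.
my $entity = xml10_write_extparsedent($twine, "UTF-8");
print $entity, "\n";

# The result parses as an external parsed entity again.
my $again = xml10_read_extparsedent_twine($entity);
print scalar(@$again), " twine items\n";
```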

SEE ALSO

       XML::Easy::NodeBasics, XML::Easy::Syntax, <http://www.w3.org/TR/REC-xml/>

AUTHOR

       Andrew Main (Zefram) <zefram@fysh.org>

       Copyright (C) 2008, 2009 PhotoBox Ltd

       Copyright (C) 2009, 2010, 2011, 2017 Andrew Main (Zefram) <zefram@fysh.org>

LICENSE

       This module is free software; you can redistribute it and/or modify it under the same terms as Perl
       itself.