Ubuntu Manpage: XML::LibXML::Simple - XML::LibXML clone of XML::Simple::XMLin()

Provided by: libxml-libxml-simple-perl_0.95-1_all

NAME

       XML::LibXML::Simple - XML::LibXML clone of XML::Simple::XMLin()

INHERITANCE

        XML::LibXML::Simple
          is an Exporter

SYNOPSIS

         my $xml  = ...;  # filename, fh, string, or XML::LibXML-node

       Imperative:

         use XML::LibXML::Simple   qw(XMLin);
         my $data = XMLin $xml, %options;

       Or the Object Oriented way:

         use XML::LibXML::Simple   ();
         my $xs   = XML::LibXML::Simple->new(%options);
         my $data = $xs->XMLin($xml, %options);

DESCRIPTION

       This module is a blunt rewrite of XML::Simple (by Grant McLean) to use the XML::LibXML
       parser for XML structures, where the original uses plain Perl or SAX parsers.

METHODS

   Constructors
       XML::LibXML::Simple->new(%options)
           Instantiate an object on which XMLin() can be called.  You can provide %options
           to this constructor (to be reused for each call to XMLin()) and with each call of
           XMLin() (to be used once).

           For descriptions of the %options see the "DETAILS" section of this manual page.
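
           A minimal sketch of how constructor options and per-call options can be combined.
           The XML string and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML::Simple ();

# Options given to new() are reused for every XMLin() call on this object.
my $xs  = XML::LibXML::Simple->new(ForceArray => ['item']);
my $xml = '<opt><item>a</item></opt>';

# Uses only the constructor options.
my $plain = $xs->XMLin($xml);

# Adds KeepRoot for this one call.
my $rooted = $xs->XMLin($xml, KeepRoot => 1);

print scalar(@{ $plain->{item} }), "\n";   # number of <item> values
print join(',', keys %$rooted), "\n";      # top-level keys with KeepRoot
```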

   Translators
       $obj->XMLin($xmldata, %options)
           For $xmldata and descriptions of the %options see the "DETAILS" section of this manual
           page.

FUNCTIONS

       The functions "XMLin" (exported implicitly) and "xml_in" (exported on request) simply call
       "XML::LibXML::Simple->new->XMLin()" with the provided parameters.

DETAILS

   Parameter $xmldata
       As the first parameter to XMLin() you must provide the XML message to be translated into a Perl
       structure.  Choose one of the following:

       A filename
           If the filename contains no directory components, "XMLin()" will look for the file in
           each directory in the SearchPath (see "Parameter %options" below) and in the current
           directory.  eg:

             $data = XMLin('/etc/params.xml', %options);

           Note, the filename "-" (dash) can be used to parse from STDIN.

       undef
           If there is no XML specifier, "XMLin()" will check the script directory and each of
           the SearchPath directories for a file with the same name as the script but with the
           extension '.xml'.  Note: if you wish to specify options, you must specify the value
           'undef'.  eg:

             $data = XMLin(undef, ForceArray => 1);

       A string of XML
           A string containing XML (recognised by the presence of '<' and '>' characters) will be
           parsed directly.  eg:

             $data = XMLin('<opt username="bob" password="flurp" />', %options);

       An IO::Handle object
           In this case, XML::LibXML::Parser will read the XML data directly from the provided
           file.

             use IO::File;
             $fh   = IO::File->new('/etc/params.xml');
             $data = XMLin($fh, %options);

       An XML::LibXML::Document or ::Element
           [Not available in XML::Simple] When you have a pre-parsed XML::LibXML node, you can
           pass that.
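
           As a sketch, a document parsed with XML::LibXML, or an element taken from it, can
           be handed to XMLin() directly.  The XML content and variable names below are
           illustrative only.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Simple qw(XMLin);

# Parse once with XML::LibXML, then pass the whole document to XMLin().
my $doc = XML::LibXML->load_xml(
    string => '<opt><user login="bob" /><user login="eve" /></opt>');
my $data = XMLin($doc);

# A single element from the same document can be passed the same way.
my ($first) = $doc->documentElement->getChildrenByTagName('user');
my $user    = XMLin($first);

print "$data->{user}[1]{login} $user->{login}\n";
```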

   Parameter %options
       XML::LibXML::Simple supports most options defined by XML::Simple, so the interface is
       quite compatible.  Minor changes apply.  This explanation is extracted from the
       XML::Simple manual-page.

       •   check out "ForceArray" because you'll almost certainly want to turn it on

       •   make sure you know what the "KeyAttr" option does and what its default value is
           because it may surprise you otherwise.

       •   Option names are case-insensitive, so you can use the mixed case versions shown here;
           you can also add underscores between the words (eg: key_attr) if you like.
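
       The naming rule in the last point can be tried out with a short sketch.  The three
       spellings below are expected to select the same option; the XML string is made up.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><name>value</name></opt>';

# Three spellings of one option: mixed case, lower case, with underscore.
my @results;
for my $spelling (qw(ForceArray forcearray force_array)) {
    push @results, XMLin($xml, $spelling => ['name']);
}

# Each result should hold <name> as an array reference.
print ref($_->{name}), "\n" for @results;
```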

       In alphabetic order:

       ContentKey => 'keyname' # seldom used
           When text content is parsed to a hash value, this option lets you specify a name for
           the hash key to override the default 'content'.  So for example:

             XMLin('<opt one="1">Two</opt>', ContentKey => 'text')

           will parse to:

             { one => 1, text => 'Two' }

           instead of:

             { one => 1, content => 'Two' }

           You can also prefix your selected key name with a '-' character to have "XMLin()" try
           a little harder to eliminate unnecessary 'content' keys after array folding.  For
           example:

             XMLin(
               '<opt><item name="one">First</item><item name="two">Second</item></opt>',
               KeyAttr => {item => 'name'},
               ForceArray => [ 'item' ],
               ContentKey => '-content'
             )

           will parse to:

             {
               item => {
                 one => 'First',
                 two => 'Second'
               }
             }

           rather than this (without the '-'):

             {
               item => {
                 one => { content => 'First' },
                 two => { content => 'Second' }
               }
             }

       ForceArray => 1 # important
           This option should be set to '1' to force nested elements to be represented as arrays
           even when there is only one.  Eg, with ForceArray enabled, this XML:

               <opt>
                 <name>value</name>
               </opt>

           would parse to this:

               { name => [ 'value' ] }

           instead of this (the default):

               { name => 'value' }

           This option is especially useful if the data structure is likely to be written back
           out as XML and the default behaviour of rolling single nested elements up into
           attributes is not desirable.

           If you are using the array folding feature, you should almost certainly enable this
           option.  If you do not, single nested elements will not be parsed to arrays and
           therefore will not be candidates for folding to a hash.  (Given that the default value
           of 'KeyAttr' enables array folding, this option should probably have been enabled by
           default as well).

       ForceArray => [ names ] # important
           This alternative (and preferred) form of the 'ForceArray' option allows you to specify
           a list of element names which should always be forced into an array representation,
           rather than the 'all or nothing' approach above.

           It is also possible to include compiled regular expressions in the list: any element
           names which match the pattern will be forced to arrays.  If the list contains only a
           single regex, then it is not necessary to enclose it in an arrayref.  Eg:

             ForceArray => qr/_list$/
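
           A small sketch of the regex form in use.  The element names "host_list" and
           "owner" are invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><host_list>alpha</host_list><owner>bob</owner></opt>';

# Only element names ending in "_list" are forced into arrays.
my $data = XMLin($xml, ForceArray => qr/_list$/);

print ref($data->{host_list}) || 'scalar', "\n";
print ref($data->{owner})     || 'scalar', "\n";
```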

       ForceContent => 1 # seldom used
           When "XMLin()" parses elements which have text content as well as attributes, the text
           content must be represented as a hash value rather than a simple scalar.  This option
           allows you to force text content to always parse to a hash value even when there are
           no attributes.  So for example:

             XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)

           will parse to:

             {
               x => {         content => 'text1' },
               y => { a => 2, content => 'text2' }
             }

           instead of:

             {
               x => 'text1',
               y => { 'a' => 2, 'content' => 'text2' }
             }

       GroupTags => { grouping tag => grouped tag } # handy
           You can use this option to eliminate extra levels of indirection in your Perl data
           structure.  For example this XML:

             <opt>
               <searchpath>
                 <dir>/usr/bin</dir>
                 <dir>/usr/local/bin</dir>
                 <dir>/usr/X11/bin</dir>
               </searchpath>
             </opt>

           Would normally be read into a structure like this:

             {
               searchpath => {
                  dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
               }
             }

           But when read in with the appropriate value for 'GroupTags':

             my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });

           It will return this simpler structure:

             {
               searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
             }

           The grouping element ("<searchpath>" in the example) must not contain any attributes
           or elements other than the grouped element.

           You can specify multiple 'grouping element' to 'grouped element' mappings in the same
           hashref.  If this option is combined with "KeyAttr", the array folding will occur
           first and then the grouped element names will be eliminated.

       KeepRoot => 1 # handy
           In its attempt to return a data structure free of superfluous detail and unnecessary
           levels of indirection, "XMLin()" normally discards the root element name.  Setting the
           'KeepRoot' option to '1' will cause the root element name to be retained.  So after
           executing this code:

             $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)

           You'll be able to reference the tempdir as "$config->{config}->{tempdir}" instead of
           the default "$config->{tempdir}".

       KeyAttr => [ list ] # important
           This option controls the 'array folding' feature which translates nested elements from
           an array to a hash.  It also controls the 'unfolding' of hashes to arrays.

           For example, this XML:

               <opt>
                 <user login="grep" fullname="Gary R Epstein" />
                 <user login="stty" fullname="Simon T Tyson" />
               </opt>

           would, by default, parse to this:

               {
                 user => [
                    { login    => 'grep',
                      fullname => 'Gary R Epstein'
                    },
                    { login    => 'stty',
                      fullname => 'Simon T Tyson'
                    }
                 ]
               }

           If the option 'KeyAttr => "login"' were used to specify that the 'login' attribute is
           a key, the same XML would parse to:

               {
                 user => {
                    stty => { fullname => 'Simon T Tyson' },
                    grep => { fullname => 'Gary R Epstein' }
                 }
               }

           The key attribute names should be supplied in an arrayref if there is more than one.
           "XMLin()" will attempt to match attribute names in the order supplied.

           Note 1: The default value for 'KeyAttr' is "['name', 'key', 'id']".  If you do not
           want folding on input or unfolding on output you must set this option to an empty
           list to disable the feature.

           Note 2: If you wish to use this option, you should also enable the "ForceArray"
           option.  Without 'ForceArray', a single nested element will be rolled up into a scalar
           rather than an array and therefore will not be folded (since only arrays get folded).
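
           The effect described in Note 1 can be sketched as follows.  The XML is a made-up
           variant of the example above that uses the default key attribute 'name'.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><user name="grep" /><user name="stty" /></opt>';

# Default KeyAttr: the "name" attribute becomes the hash key.
my $folded = XMLin($xml);

# Empty list: folding is switched off and the array is kept.
my $unfolded = XMLin($xml, KeyAttr => []);

print ref($folded->{user}), ' ', ref($unfolded->{user}), "\n";
```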

       KeyAttr => { list } # important
           This alternative (and preferred) method of specifying the key attributes allows more
           fine grained control over which elements are folded and on which attributes.  For
           example the option 'KeyAttr => { package => "id" }' will cause any package elements to
           be folded on the 'id' attribute.  No other elements which have an 'id' attribute will
           be folded at all.

           Two further variations are made possible by prefixing a '+' or a '-' character to the
           attribute name:

           The option 'KeyAttr => { user => "+login" }' will cause this XML:

               <opt>
                 <user login="grep" fullname="Gary R Epstein" />
                 <user login="stty" fullname="Simon T Tyson" />
               </opt>

           to parse to this data structure:

               {
                 user => {
                    stty => {
                       fullname => 'Simon T Tyson',
                       login    => 'stty'
                    },
                    grep => {
                       fullname => 'Gary R Epstein',
                       login    => 'grep'
                    }
                 }
               }

           The '+' indicates that the value of the key attribute should be copied rather than
           moved to the folded hash key.

           A '-' prefix would produce this result:

               {
                 user => {
                    stty => {
                       fullname => 'Simon T Tyson',
                       -login   => 'stty'
                    },
                    grep => {
                       fullname => 'Gary R Epstein',
                       -login   => 'grep'
                    }
                 }
               }

       NoAttr => 1 # handy
           When used with "XMLin()", any attributes in the XML will be ignored.
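
           A brief sketch comparing a parse with and without this option.  The XML and its
           attribute names are illustrative.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt version="2"><name lang="en">value</name></opt>';

my $with    = XMLin($xml);                # attributes are kept
my $without = XMLin($xml, NoAttr => 1);   # attributes are ignored

print join(',', sort keys %$with), "\n";
print join(',', sort keys %$without), "\n";
```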

       NormaliseSpace => 0 | 1 | 2 # handy
           This option controls how whitespace in text content is handled.  Recognised values for
           the option are:

           "0" (default) whitespace is passed through unaltered (except of course for the
               normalisation of whitespace in attribute values which is mandated by the XML
               recommendation)

           "1" whitespace is normalised in any value used as a hash key (normalising means
               removing leading and trailing whitespace and collapsing sequences of whitespace
               characters to a single space)

           "2" whitespace is normalised in all text content

           Note: you can spell this option with a 'z' if that is more natural for you.

       Parser => OBJECT
           You may pass your own XML::LibXML object, instead of having one created for you. This
           is useful when you need specific configuration on that object (See
           XML::LibXML::Parser) or have implemented your own extension to that object.

           The internally created parser object is configured in safe mode.  Read the
           XML::LibXML::Parser manual about security issues with certain parameter settings.  The
           default configuration of XML::LibXML itself is unsafe!

       ParserOpts => HASH|ARRAY
           Pass parameters to the creation of a new internal parser object. You can overrule the
           options which will create a safe parser. It may be more readable to use the "Parser"
           parameter.
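
           A sketch of both approaches.  The parser settings shown are one cautious choice
           made for this example, not necessarily the ones the module applies internally;
           see XML::LibXML::Parser for what each setting means.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><a>1</a></opt>';

# A parser configured by the caller (settings chosen for illustration).
my $parser = XML::LibXML->new(
    no_network      => 1,
    expand_entities => 0,
    load_ext_dtd    => 0,
);
my $own = XMLin($xml, Parser => $parser);

# Alternatively, adjust the internally created parser.
my $tuned = XMLin($xml, ParserOpts => { no_blanks => 1 });

print "$own->{a} $tuned->{a}\n";
```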

       SearchPath => [ list ] # handy
           If you pass "XMLin()" a filename, but the filename includes no directory component, you
           can use this option to specify which directories should be searched to locate the
           file.  You might use this option to search first in the user's home directory, then in
           a global directory such as /etc.

           If a filename is provided to "XMLin()" but SearchPath is not defined, the file is
           assumed to be in the current directory.

           If the first parameter to "XMLin()" is undefined, the default SearchPath will contain
           only the directory in which the script itself is located.  Otherwise the default
           SearchPath will be empty.
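
           The lookup can be sketched with a temporary directory.  The file name and the
           '/nonexistent' directory are hypothetical and exist only for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use XML::LibXML::Simple qw(XMLin);

# Create a throw-away directory holding a small XML file.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'xls_demo_params.xml');
open my $out, '>', $path or die "cannot write $path: $!";
print {$out} '<opt tempdir="/tmp" />';
close $out or die "cannot close $path: $!";

# The bare filename is looked up in the SearchPath directories.
my $data = XMLin('xls_demo_params.xml',
    SearchPath => [ '/nonexistent', $dir ]);

print "$data->{tempdir}\n";
```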

       ValueAttr => [ names ] # handy
           Use this option to deal with elements which always have a single attribute and no content.
           Eg:

             <opt>
               <colour value="red" />
               <size   value="XXL" />
             </opt>

           Setting "ValueAttr => [ 'value' ]" will cause the above XML to parse to:

             {
               colour => 'red',
               size   => 'XXL'
             }

           instead of this (the default):

             {
               colour => { value => 'red' },
               size   => { value => 'XXL' }
             }

       NsExpand => 0  advised
           When namespaces are used, the default behavior is to include the prefix in the key
           name.  However, this is very dangerous: the prefixes can be changed without a change
           of the XML message meaning.  Therefore, it is better to use this "NsExpand" option.
           The downside, however, is that the labels get very long.

           Without this option:

             <record xmlns:x="http://xyz">
               <x:field1>42</x:field1>
             </record>
             <record xmlns:y="http://xyz">
               <y:field1>42</y:field1>
             </record>

           translates into

             { 'x:field1' => 42 }
             { 'y:field1' => 42 }

           but both source components have exactly the same meaning.  When "NsExpand" is used, the
           result is:

             { '{http://xyz}field1' => 42 }
             { '{http://xyz}field1' => 42 }

           Of course, addressing these fields is more work.  It is advised to implement it like
           this:

             my $ns = 'http://xyz';
             $data->{"{$ns}field1"};

       NsStrip => 0 sloppy coding
           [not available in XML::Simple] Namespaces are really important to avoid name
           collisions, but they are a bit of a hassle.  To do it correctly, use option
           "NsExpand".  To do it sloppily, use "NsStrip".  With this option set, the above example
           will return

             { field1 => 42 }
             { field1 => 42 }
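
           The three ways of naming a namespaced element can be compared in one sketch.  It
           reuses the first record of the example above and only looks up the element key,
           since the rest of the returned structure is not described here.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<record xmlns:x="http://xyz"><x:field1>42</x:field1></record>';
my $ns  = 'http://xyz';

my $prefixed = XMLin($xml);                  # key carries the prefix
my $expanded = XMLin($xml, NsExpand => 1);   # key carries the namespace URI
my $stripped = XMLin($xml, NsStrip  => 1);   # key is the local name only

print $prefixed->{'x:field1'}, "\n";
print $expanded->{"{$ns}field1"}, "\n";
print $stripped->{field1}, "\n";
```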

EXAMPLES

       When "XMLin()" reads the following very simple piece of XML:

           <opt username="testuser" password="frodo"></opt>

       it returns the following data structure:

           {
             username => 'testuser',
             password => 'frodo'
           }

       The identical result could have been produced with this alternative XML:

           <opt username="testuser" password="frodo" />

       Or this (although see 'ForceArray' option for variations):

           <opt>
             <username>testuser</username>
             <password>frodo</password>
           </opt>

       Repeated nested elements are represented as anonymous arrays:

           <opt>
             <person firstname="Joe" lastname="Smith">
               <email>joe@smith.com</email>
               <email>jsmith@yahoo.com</email>
             </person>
             <person firstname="Bob" lastname="Smith">
               <email>bob@smith.com</email>
             </person>
           </opt>

           {
             person => [
               { email     => [ 'joe@smith.com', 'jsmith@yahoo.com' ],
                 firstname => 'Joe',
                 lastname  => 'Smith'
               },
               { email     => 'bob@smith.com',
                 firstname => 'Bob',
                 lastname  => 'Smith'
               }
             ]
           }

       Nested elements with a recognised key attribute are transformed (folded) from an array
       into a hash keyed on the value of that attribute (see the "KeyAttr" option):

           <opt>
             <person key="jsmith" firstname="Joe" lastname="Smith" />
             <person key="tsmith" firstname="Tom" lastname="Smith" />
             <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
           </opt>

           {
             person => {
                jbloggs => {
                   firstname => 'Joe',
                   lastname  => 'Bloggs'
                },
                tsmith  => {
                   firstname => 'Tom',
                   lastname  => 'Smith'
                },
                jsmith => {
                   firstname => 'Joe',
                   lastname => 'Smith'
                }
             }
           }

       The <anon> tag can be used to form anonymous arrays:

           <opt>
             <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
             <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
             <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
             <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
           </opt>

           {
             head => [ [ 'Col 1', 'Col 2', 'Col 3' ] ],
             data => [ [ 'R1C1', 'R1C2', 'R1C3' ],
                       [ 'R2C1', 'R2C2', 'R2C3' ],
                       [ 'R3C1', 'R3C2', 'R3C3' ]
                     ]
           }

       Anonymous arrays can be nested to arbitrary levels and, as a special case, if the
       surrounding tags for an XML document contain only an anonymous array the arrayref will be
       returned directly rather than the usual hashref:

           <opt>
             <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
             <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
             <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
           </opt>

           [
             [ 'Col 1', 'Col 2' ],
             [ 'R1C1', 'R1C2' ],
             [ 'R2C1', 'R2C2' ]
           ]

       Elements which only contain text content will simply be represented as a scalar.  Where an
       element has both attributes and text content, the element will be represented as a hashref
       with the text content in the 'content' key (see the "ContentKey" option):

         <opt>
           <one>first</one>
           <two attr="value">second</two>
         </opt>

         {
           one => 'first',
           two => { attr => 'value', content => 'second' }
         }

       Mixed content (elements which contain both text content and nested elements) will not
       be represented in a useful way - element order and significant whitespace will be lost.
       If you need to work with mixed content, then XML::Simple is not the right tool for your
       job - check out the next section.

   Differences to XML::Simple
       In general, the output and the options are equivalent, although this module has some
       differences with XML::Simple to be aware of.

       only XMLin() is supported
           If you want to write XML then use a schema (for instance with XML::Compile). Do not
           attempt to create XML by hand!  If you still think you need it, then have a look at
           XMLout() as implemented by XML::Simple or any of a zillion template systems.

       no "variables" option
           IMO, you should use a templating system if you want variables filled-in in the input:
           it is not a task for this module.

       empty elements are not removed
           Being empty has a meaning which should not be ignored.

       ForceArray options
           There are a few small differences in the result of the "forcearray" option, because
           XML::Simple seems to behave inconsistently.

COPYRIGHTS

       The interface design and large parts of the documentation were taken from the XML::Simple
       module, written by Grant McLean <grantm@cpan.org>

       Copyrights of the Perl code and the related documentation 2008-2014 by Mark Overmeer.
       For other contributors see ChangeLog.

       This program is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.  See http://www.perl.com/perl/misc/Artistic.html
