Ubuntu Manpage: XML::LibXML::Simple - XML::LibXML clone of XML::Simple::XMLin()

Provided by: libxml-libxml-simple-perl_0.93-1_all

NAME

       XML::LibXML::Simple - XML::LibXML clone of XML::Simple::XMLin()

INHERITANCE

        XML::LibXML::Simple
          is a Exporter

SYNOPSIS

         use XML::LibXML::Simple   qw(XMLin);
         my $xml = XMLin <xml file or string>, OPTIONS;

       Or the Object Oriented way:

         use XML::LibXML::Simple   ();
         my $xs = XML::LibXML::Simple->new(OPTIONS);
         my $ref = $xs->XMLin(<xml file or string>, OPTIONS);

DESCRIPTION

       This module is a blunt rewrite of XML::Simple (by Grant McLean) to use the XML::LibXML parser for XML
       structures, where the original uses plain Perl or SAX parsers.

METHODS

   Constructors
       XML::LibXML::Simple->new(OPTIONS)
           Instantiate  an  object,  which  can  be  used  to  call XMLin() on.  You can provide OPTIONS to this
           constructor (to be reused for each call to XMLin) and with each call of XMLin (to be used once)

           For XML-DATA and descriptions of the OPTIONS see the "DETAILS" section of this manual page.

   Translators
       $obj->XMLin(XML-DATA, OPTIONS)
           For XML-DATA and descriptions of the OPTIONS see the "DETAILS" section of this manual page.

FUNCTIONS

       The  functions  "XMLin"  (exported  implictly)  and  "xml_in"   (exported   on   request)   simply   call
       "<XML::LibXML::Simple-"new->XMLin() >> with the provided parameters.

DETAILS

   Differences with XML::Simple
       In  general,  the  output  and the options are equivalent, although this module has some differences with
       XML::Simple to be aware of.

       only XMLin() is supported
           If you want to write XML then use a schema (for instance with XML::Compile). Do not attempt to create
           XML by hand!  If you still think you need it,  then  have  a  look  at  XMLout()  as  implemented  by
           XML::Simple or any of a zillion template systems.

       no "variables" option
           IMO,  you  should  use  a templating system if you want variables filled-in in the input: it is not a
           task for this module.

       empty elements are not removed
           Being empty has a meaning which should not be ignored.

       ForceArray options
           There are a few small differences in the result of the "forcearray" option, because XML::Simple seems
           to behave inconsequently.

   Parameter XML-DATA
       As first parameter to XMLin() must provide the XML message  to  be  translated  into  a  Perl  structure.
       Choose one of the following:

       A filename
           If  the filename contains no directory components, "XMLin()" will look for the file in each directory
           in the SearchPath (see OPTIONS below) and in the current directory.  eg:

             $ref = XMLin('/etc/params.xml');

           Note, the filename "-" (dash) can be used to parse from STDIN.

       undef
           If there is no XML specifier, "XMLin()" will check the script directory and each  of  the  SearchPath
           directories  for a file with the same name as the script but with the extension '.xml'.  Note: if you
           wish to specify options, you must specify the value 'undef'.  eg:

             $ref = XMLin(undef, ForceArray => 1);

       A string of XML
           A string containing XML (recognised by the presence  of  '<'  and  '>'  characters)  will  be  parsed
           directly.  eg:

             $ref = XMLin('<opt username="bob" password="flurp" />');

       An IO::Handle object
           An IO::Handle object will be read to EOF and its contents parsed. eg:

             $fh = IO::File->new('/etc/params.xml');
             $ref = XMLin($fh);

   OPTIONS
       XML::LibXML::Simple  supports  most options defined by XML::Simple, so the interface is quite compatible.
       Minor changes apply.  This explanation is extracted from the XML::Simple manual-page.

       •   check out "ForceArray" because you'll almost certainly want to turn it on

       •   make sure you know what the "KeyAttr" option does and what  its  default  value  is  because  it  may
           surprise you otherwise.

       •   Option  names  are  case  in-sensitive so you can use the mixed case versions shown here; you can add
           underscores between the words (eg: key_attr) if you like.

       In alphabetic order:

       ContentKey => 'keyname' # seldom used
           When text content is parsed to a hash value, this option let's you specify a name for the hash key to
           override the default 'content'.  So for example:

             XMLin('<opt one="1">Two</opt>', ContentKey => 'text')

           will parse to:

             { one => 1, text => 'Two' }

           instead of:

             { one => 1, content => 'Two' }

           You can also prefix your selected key name with a '-' character to have "XMLin()" try a little harder
           to eliminate unnecessary 'content' keys after array folding.  For example:

             XMLin(
               '<opt><item name="one">First</item><item name="two">Second</item></opt>',
               KeyAttr => {item => 'name'},
               ForceArray => [ 'item' ],
               ContentKey => '-content'
             )

           will parse to:

             {
                item => {
                 one =>  'First'
                 two =>  'Second'
               }
             }

           rather than this (without the '-'):

             {
               item => {
                 one => { content => 'First' }
                 two => { content => 'Second' }
               }
             }

       ForceArray => 1 # important
           This option should be set to '1' to force nested elements to be represented as arrays even when there
           is only one.  Eg, with ForceArray enabled, this XML:

               <opt>
                 <name>value</name>
               </opt>

           would parse to this:

               { name => [ 'value' ] }

           instead of this (the default):

               { name => 'value' }

           This option is especially useful if the data structure is likely to be written back out  as  XML  and
           the default behaviour of rolling single nested elements up into attributes is not desirable.

           If  you  are using the array folding feature, you should almost certainly enable this option.  If you
           do not, single nested elements will not be parsed to arrays and therefore will not be candidates  for
           folding  to  a  hash.   (Given that the default value of 'KeyAttr' enables array folding, the default
           value of this option should probably also have been enabled as well).

       ForceArray => [ names ] # important
           This alternative (and preferred) form of the 'ForceArray' option allows you  to  specify  a  list  of
           element  names  which  should  always be forced into an array representation, rather than the 'all or
           nothing' approach above.

           It is also possible to include compiled regular expressions in the list  --any  element  names  which
           match the pattern will be forced to arrays.  If the list contains only a single regex, then it is not
           necessary to enclose it in an arrayref.  Eg:

             ForceArray => qr/_list$/

       ForceContent => 1 # seldom used
           When  "XMLin()"  parses elements which have text content as well as attributes, the text content must
           be represented as a hash value rather than a simple scalar.  This option allows  you  to  force  text
           content to always parse to a hash value even when there are no attributes.  So for example:

             XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)

           will parse to:

             {
               x => {         content => 'text1' },
               y => { a => 2, content => 'text2' }
             }

           instead of:

             {
               x => 'text1',
               y => { 'a' => 2, 'content' => 'text2' }
             }

       GroupTags => { grouping tag => grouped tag } # handy
           You  can  use  this option to eliminate extra levels of indirection in your Perl data structure.  For
           example this XML:

             <opt>
              <searchpath>
                <dir>/usr/bin</dir>
                <dir>/usr/local/bin</dir>
                <dir>/usr/X11/bin</dir>
              </searchpath>
            </opt>

           Would normally be read into a structure like this:

             {
               searchpath => {
                  dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
               }
             }

           But when read in with the appropriate value for 'GroupTags':

             my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });

           It will return this simpler structure:

             {
               searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
             }

           The grouping element ("<searchpath>" in the example) must not  contain  any  attributes  or  elements
           other than the grouped element.

           You  can  specify  multiple 'grouping element' to 'grouped element' mappings in the same hashref.  If
           this option is combined with "KeyAttr", the array folding will  occur  first  and  then  the  grouped
           element names will be eliminated.

       KeepRoot => 1 # handy
           In  its  attempt  to  return  a  data  structure free of superfluous detail and unnecessary levels of
           indirection, "XMLin()" normally discards the root element name.  Setting the 'KeepRoot' option to '1'
           will cause the root element name to be retained.  So after executing this code:

             $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)

           You'll be able to reference the tempdir as  "$config->{config}->{tempdir}"  instead  of  the  default
           "$config->{tempdir}".

       KeyAttr => [ list ] # important
           This  option controls the 'array folding' feature which translates nested elements from an array to a
           hash.  It also controls the 'unfolding' of hashes to arrays.

           For example, this XML:

               <opt>
                 <user login="grep" fullname="Gary R Epstein" />
                 <user login="stty" fullname="Simon T Tyson" />
               </opt>

           would, by default, parse to this:

               {
                 user => [
                    { login    => 'grep',
                      fullname => 'Gary R Epstein'
                    },
                    { login    => 'stty',
                      fullname => 'Simon T Tyson'
                    }
                 ]
               }

           If the option 'KeyAttr => "login"' were used to specify that the 'login' attribute is a key, the same
           XML would parse to:

               {
                 user => {
                    stty => { fullname => 'Simon T Tyson' },
                    grep => { fullname => 'Gary R Epstein' }
                 }
               }

           The key attribute names should be supplied in an arrayref if there is more than one.  "XMLin()"  will
           attempt to match attribute names in the order supplied.

           Note  1:  The  default value for 'KeyAttr' is "['name', 'key', 'id']".  If you do not want folding on
           input or unfolding on output you must setting this option to an empty list to disable the feature.

           Note 2: If you wish to use this option, you should also  enable  the  "ForceArray"  option.   Without
           'ForceArray',  a  single  nested  element  will  be  rolled up into a scalar rather than an array and
           therefore will not be folded (since only arrays get folded).

       KeyAttr => { list } # important
           This alternative (and preferred) method of specifiying the key attributes allows  more  fine  grained
           control over which elements are folded and on which attributes.  For example the option 'KeyAttr => {
           package  =>  'id'  }  will  cause  any package elements to be folded on the 'id' attribute.  No other
           elements which have an 'id' attribute will be folded at all.

           Two further variations are made possible by prefixing a '+' or a '-' character to the attribute name:

           The option 'KeyAttr => { user => "+login" }' will cause this XML:

               <opt>
                 <user login="grep" fullname="Gary R Epstein" />
                 <user login="stty" fullname="Simon T Tyson" />
               </opt>

           to parse to this data structure:

               {
                 user => {
                    stty => {
                       fullname => 'Simon T Tyson',
                       login    => 'stty'
                    },
                    grep => {
                       fullname => 'Gary R Epstein',
                       login    => 'grep'
                    }
                 }
               }

           The '+' indicates that the value of the key attribute should be  copied  rather  than  moved  to  the
           folded hash key.

           A '-' prefix would produce this result:

               {
                 user => {
                    stty => {
                       fullname => 'Simon T Tyson',
                       -login   => 'stty'
                    },
                    grep => {
                       fullname => 'Gary R Epstein',
                       -login    => 'grep'
                    }
                 }
               }

       NoAttr => 1 # handy
           When used with "XMLin()", any attributes in the XML will be ignored.

       NormaliseSpace => 0 | 1 | 2 # handy
           This  option  controls  how  whitespace in text content is handled.  Recognised values for the option
           are:

           "0" (default) whitespace is passed through unaltered (except  of  course  for  the  normalisation  of
               whitespace in attribute values which is mandated by the XML recommendation)

           "1" whitespace  is normalised in any value used as a hash key (normalising means removing leading and
               trailing whitespace and collapsing sequences of whitespace characters to a single space)

           "2" whitespace is normalised in all text content

           Note: you can spell this option with a 'z' if that is more natural for you.

       Parser => OBJECT
           You may pass your own XML::LibXML object, in stead of having one created for you. This is useful when
           you need specific configuration on that object (See XML::LibXML::Parser) or have implemented your own
           extension to that object.

           The internally created parser object is configured in safe mode.  Read the XML::LibXML::Parser manual
           about security issues with certain parameter settings.  The default is unsafe!

       ParserOpts => HASH|ARRAY
           Pass parameters to the creation of a new internal parser object. You can overrule the  options  which
           will create a safe parser. It may be more readible to use the "Parser" parameter.

       SearchPath => [ list ] # handy
           If  you  pass "XMLin()" a filename, but the filename include no directory component, you can use this
           option to specify which directories should be searched to locate the file.  You might use this option
           to search first in the user's home directory, then in a global directory such as /etc.

           If a filename is provided to "XMLin()" but SearchPath is not defined, the file is assumed  to  be  in
           the current directory.

           If  the  first  parameter  to  "XMLin()"  is  undefined, the default SearchPath will contain only the
           directory in which the script itself is located.  Otherwise the default SearchPath will be empty.

       ValueAttr => [ names ] # handy
           Use this option to deal elements which always have a single attribute and no content.  Eg:

             <opt>
               <colour value="red" />
               <size   value="XXL" />
             </opt>

           Setting "ValueAttr => [ 'value' ]" will cause the above XML to parse to:

             {
               colour => 'red',
               size   => 'XXL'
             }

           instead of this (the default):

             {
               colour => { value => 'red' },
               size   => { value => 'XXL' }
             }

       NsExpand => 0  advised
           When name-spaces are used, the default behavior is to include the prefix in the key  name.   However,
           this  is  very  dangerous:  the  prefixes can be changed without a change of the XML message meaning.
           Therefore, you can better use this "NsExpand" option.  The downside, however, is that the labels  get
           very long.

           Without this option:

             <record xmlns:x="http://xyz">
               <x:field1>42</x:field1>
             </record>
             <record xmlns:y="http://xyz">
               <y:field1>42</y:field1>
             </record>

           translates into

             { 'x:field1' => 42 }
             { 'y:field1' => 42 }

           but both source component have exactly the same meaning.  When "NsExpand" is used, the result is:

             { '{http://xyz}field1' => 42 }
             { '{http://xyz}field1' => 42 }

           Of course, addressing these fields is more work.  It is advised to implement it like this:

             my $ns = 'http://xyz';
             $data->{"{$ns}field1"};

       NsStrip => 0 sloppy coding
           [not  available  in  XML::Simple] Namespaces are really important to avoid name collissions, but they
           are a bit of a hassle.  To do it correctly, use option "NsExpand".  To do it sloppy,  use  "NsStrip".
           With this option set, the above example will return

             { field1 => 42 }
             { field1 => 42 }

EXAMPLES

       When "XMLin()" reads the following very simple piece of XML:

           <opt username="testuser" password="frodo"></opt>

       it returns the following data structure:

           {
             username => 'testuser',
             password => 'frodo'
           }

       The identical result could have been produced with this alternative XML:

           <opt username="testuser" password="frodo" />

       Or this (although see 'ForceArray' option for variations):

           <opt>
             <username>testuser</username>
             <password>frodo</password>
           </opt>

       Repeated nested elements are represented as anonymous arrays:

           <opt>
             <person firstname="Joe" lastname="Smith">
               <email>joe@smith.com</email>
               <email>jsmith@yahoo.com</email>
             </person>
             <person firstname="Bob" lastname="Smith">
               <email>bob@smith.com</email>
             </person>
           </opt>

           {
             person => [
               { email     => [ 'joe@smith.com', 'jsmith@yahoo.com' ],
                 firstname => 'Joe',
                 lastname  => 'Smith'
               },
               { email     => 'bob@smith.com',
                 firstname => 'Bob',
                 lastname  => 'Smith'
               }
             ]
           }

       Nested  elements with a recognised key attribute are transformed (folded) from an array into a hash keyed
       on the value of that attribute (see the "KeyAttr" option):

           <opt>
             <person key="jsmith" firstname="Joe" lastname="Smith" />
             <person key="tsmith" firstname="Tom" lastname="Smith" />
             <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
           </opt>

           {
             person => {
                jbloggs => {
                   firstname => 'Joe',
                   lastname  => 'Bloggs'
                },
                tsmith  => {
                   firstname => 'Tom',
                   lastname  => 'Smith'
                },
                jsmith => {
                   firstname => 'Joe',
                   lastname => 'Smith'
                }
             }
           }

       The <anon> tag can be used to form anonymous arrays:

           <opt>
             <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
             <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
             <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
             <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
           </opt>

           {
             head => [ [ 'Col 1', 'Col 2', 'Col 3' ] ],
             data => [ [ 'R1C1', 'R1C2', 'R1C3' ],
                       [ 'R2C1', 'R2C2', 'R2C3' ],
                       [ 'R3C1', 'R3C2', 'R3C3' ]
                     ]
           }

       Anonymous arrays can be nested to arbirtrary levels and as a special case, if the surrounding tags for an
       XML document contain only an anonymous array the arrayref will be returned directly rather than the usual
       hashref:

           <opt>
             <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
             <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
             <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
           </opt>

           [
             [ 'Col 1', 'Col 2' ],
             [ 'R1C1', 'R1C2' ],
             [ 'R2C1', 'R2C2' ]
           ]

       Elements which only contain text content will simply be represented as a scalar.  Where  an  element  has
       both  attributes  and text content, the element will be represented as a hashref with the text content in
       the 'content' key (see the "ContentKey" option):

         <opt>
           <one>first</one>
           <two attr="value">second</two>
         </opt>

         {
           one => 'first',
           two => { attr => 'value', content => 'second' }
         }

       Mixed content (elements which contain both text content and nested elements) will be not  be  represented
       in  a useful way - element order and significant whitespace will be lost.  If you need to work with mixed
       content, then XML::Simple is not the right tool for your job - check out the next section.

COPYRIGHTS

       The interface design and large parts of the documentation were taken from the XML::Simple module, written
       by Grant McLean <grantm@cpan.org>

       Copyrights of the perl code and the related documentation by 2008-2013  by  [Mark  Overmeer].  For  other
       contributors see ChangeLog.

       This  program  is  free  software;  you can redistribute it and/or modify it under the same terms as Perl
       itself.  See http://www.perl.com/perl/misc/Artistic.html

perl v5.14.2                                       2013-03-02                           XML::LibXML::Simple(3pm)

NAME

INHERITANCE

SYNOPSIS

DESCRIPTION

METHODS

FUNCTIONS

DETAILS

EXAMPLES

SEE ALSO

COPYRIGHTS