Ubuntu Manpage: Locale::Po4a::Xml - convert XML documents and derivates from/to PO files

NAME

       Locale::Po4a::Xml - convert XML documents and derivates from/to PO files

DESCRIPTION

       The po4a (PO for anything) project goal is to ease translations (and more interestingly, the maintenance
       of translations) using gettext tools on areas where they were not expected like documentation.

       Locale::Po4a::Xml is a module to help the translation of XML documents into other [human] languages. It
       can also be used as a base to build modules for XML-based documents.

TRANSLATING WITH PO4A::XML

       This module can be used directly to handle generic XML documents.  By default it extracts the content of
       all tags and no attributes, since the content is where the text lives in most XML-based documents.

       Some options (described in the next section) customize this behavior.  If they do not fit your document
       format, you are encouraged to write your own module derived from this one to describe your format's
       details.  See the section WRITING DERIVATE MODULES below for a description of the process.

OPTIONS ACCEPTED BY THIS MODULE

       The global debug option causes this module to show the excluded strings, in order to see if it skips
       something important.

       These are this module's particular options:

       nostrip
           Prevents the module from stripping the spaces around the extracted strings.

       wrap
           Canonicalizes the string to translate, treating whitespace as insignificant, and wraps the translated
           document.  This option can be overridden by custom tag options.  See the "tags" option below.

       unwrap_attributes
           Attributes are wrapped by default. This option disables wrapping.

       caseinsensitive
           Makes the search for tags and attributes case-insensitive.  If it's defined, <BooK>laNG and
           <BOOK>Lang are treated as <book>lang.

       escapequotes
           Escape quotes in output strings.  Necessary, for example, for creating string resources  for  use  by
           Android build tools.

           See also: https://developer.android.com/guide/topics/resources/string-resource.html

       includeexternal
           When  defined,  external  entities  are  included in the generated (translated) document, and for the
           extraction of strings.  If it's not defined, you will have to translate external entities  separately
           as independent documents.

       ontagerror
           This  option  defines the behavior of the module when it encounters invalid XML syntax (a closing tag
           which does not match the last opening tag, or a tag's attribute without  value).   It  can  take  the
           following values:

           fail
               This is the default value.  The module will exit with an error.

           warn
               The module will continue, and will issue a warning.

           silent
               The module will continue without any warnings.

           Be careful when using this option.  It is generally recommended to fix the input file.

       tagsonly
           Extracts  only  the  specified  tags  in  the "tags" option.  Otherwise, it will extract all the tags
           except the ones specified.

           Note: This option is deprecated.

       doctype
           String that is matched against the first line of the document's doctype (if defined).  If it does not
           match, a warning indicates that the document might be of the wrong type.

       addlang
           String  indicating  the  path (e.g. <bbb><aaa>) of a tag where a lang="..." attribute shall be added.
           The language will be defined as the basename of the PO file without any .po extension.

       tags
           Space-separated list of tags you want to translate or skip.  By default, the specified tags will be
           excluded, but if you use the "tagsonly" option, the specified tags will be the only ones included.
           The tags must be in the form <aaa>, but you can join some (<bbb><aaa>) to say that the content of the
           tag <aaa> will only be translated when it's within a <bbb> tag.

           You can also specify some tag options by putting some characters in front of the tag  hierarchy.  For
           example, you can put 'w' (wrap) or 'W' (don't wrap) to override the default behavior specified by the
           global "wrap" option.

           Example: W<chapter><title>

           Note: This option is deprecated.  You should use the translated and untranslated options instead.

       attributes
           Space-separated list of tag attributes you want to translate.  You can specify an attribute by its
           name (for example, "lang"), or prefix the name with a tag hierarchy to specify that the attribute
           will only be translated when it's in the specified tag.  For example: <bbb><aaa>lang specifies that
           the lang attribute will only be translated if it's in an <aaa> tag that is itself inside a <bbb>
           tag.

       foldattributes
           Do  not  translate  attributes  in  inline  tags.   Instead,  replace  all  attributes  of  a  tag by
           po4a-id=<id>.

           This is useful when  attributes  shall  not  be  translated,  as  this  simplifies  the  strings  for
           translators, and avoids typos.

       customtag
           Space-separated  list of tags which should not be treated as tags.  These tags are treated as inline,
           and do not need to be closed.

       break
           Space-separated list of tags which should break  the  sequence.   By  default,  all  tags  break  the
           sequence.

           The  tags must be in the form <aaa>, but you can join some (<bbb><aaa>), if a tag (<aaa>) should only
           be considered when it's within another tag (<bbb>).

       inline
           Space-separated list of tags which should be treated as inline.   By  default,  all  tags  break  the
           sequence.

           The  tags must be in the form <aaa>, but you can join some (<bbb><aaa>), if a tag (<aaa>) should only
           be considered when it's within another tag (<bbb>).

       placeholder
           Space-separated list of tags which should be treated as placeholders.  Placeholders do not break  the
           sequence, but the content of placeholders is translated separately.

           The location of the placeholder in its block will be marked with a string similar to:

             <placeholder type=\"footnote\" id=\"0\"/>

           The  tags must be in the form <aaa>, but you can join some (<bbb><aaa>), if a tag (<aaa>) should only
           be considered when it's within another tag (<bbb>).

       nodefault
           Space-separated list of tags that the module should not try to set by default in any category.

       cpp Support C preprocessor directives.  When this option is set, po4a will consider preprocessor
           directives as paragraph separators.  This is important if the XML file must be preprocessed, because
           otherwise the directives may be inserted in the middle of lines if po4a considers that they belong to
           the current paragraph, and they won't be recognized by the preprocessor.  Note: the preprocessor
           directives must only appear between tags (they must not break a tag).

       translated
           Space-separated list of tags you want to translate.

           The tags must be in the form <aaa>, but you can join some (<bbb><aaa>), if a tag (<aaa>) should  only
           be considered when it's within another tag (<bbb>).

           You  can  also specify some tag options by putting some characters in front of the tag hierarchy. For
           example, you can put 'w' (wrap) or 'W' (don't wrap) to override the default behavior specified by the
           global "wrap" option.

           Example: W<chapter><title>

       untranslated
           Space-separated list of tags you do not want to translate.

           The tags must be in the form <aaa>, but you can join some (<bbb><aaa>), if a tag (<aaa>) should  only
           be considered when it's within another tag (<bbb>).

       defaulttranslateoption
           The default categories for tags that are not listed in any of the translated, untranslated, break,
           inline, or placeholder options.

           This is a set of letters:

           w   Tags should be translated and content can be re-wrapped.

           W   Tags should be translated and content should not be re-wrapped.

           i   Tags should be translated inline.

           p   Tags should be translated as placeholders.

WRITING DERIVATE MODULES

   DEFINE WHAT TAGS AND ATTRIBUTES TO TRANSLATE
       The simplest customization is to define which tags and attributes you want the parser to translate.  This
       should be done in the initialize function.  First you  should  call  the  main  initialize,  to  get  the
       command-line options, and then, append your custom definitions to the options hash.  If you want to treat
       some new options from command line, you should define them before calling the main initialize:

         $self->{options}{'new_option'}='';
         $self->SUPER::initialize(%options);
         $self->{options}{'_default_translated'}.=' <p> <head><title>';
         $self->{options}{'attributes'}.=' <p>lang id';
         $self->{options}{'_default_inline'}.=' <br>';
         $self->treat_options;

       You should use the _default_inline, _default_break, _default_placeholder, _default_translated,
       _default_untranslated, and _default_attributes options in derived modules.  This allows users to override
       the default behavior defined in your module with command-line options.

   OVERRIDING THE found_string FUNCTION
       Another simple step is to override the function "found_string", which receives the extracted strings from
       the parser, in order to translate them.  There you can control which strings you want to  translate,  and
       perform transformations to them before or after the translation itself.

       It receives the extracted text, a reference to where it was found, and a hash that contains extra
       information to control which strings to translate, how to translate them, and how to generate the
       comment.

       The content of these options depends on the kind of string it is (specified in an entry of this hash):

       type="tag"
           The  found  string  is the content of a translatable tag. The entry "tag_options" contains the option
           characters in front of the tag hierarchy in the module "tags" option.

       type="attribute"
           Means that the found string is the value of a translatable attribute. The entry "attribute"  has  the
           name of the attribute.

       It must return the text that will replace the original in the translated document. Here's a basic example
       of this function:

         sub found_string {
           my ($self,$text,$ref,$options)=@_;
           $text = $self->translate($text,$ref,"type ".$options->{'type'},
             'wrap'=>$self->{options}{'wrap'});
           return $text;
         }
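
       As a hedged illustration of controlling which strings get translated, the sketch below overrides
       found_string so that strings without any letter are returned unchanged.  The package My::StubParser and
       its translate method are hypothetical stand-ins, defined only so that the snippet runs without a PO
       file; in a real derived module, translate is inherited from the po4a base classes.

```perl
use strict;
use warnings;

package My::StubParser;    # hypothetical stand-in, not part of po4a

sub new {
    my $class = shift;
    my $self  = { options => { wrap => 1 } };
    return bless $self, $class;
}

# Stand-in for the inherited translate(): it only tags the text so the
# effect is visible without a PO file.
sub translate {
    my ($self, $text, $ref, $comment, %opts) = @_;
    return "[translated] $text";
}

sub found_string {
    my ($self, $text, $ref, $options) = @_;

    # Strings without any letter (numbers, punctuation) are left as they are.
    return $text unless $text =~ /\p{L}/;

    return $self->translate($text, $ref, "type " . $options->{'type'},
        'wrap' => $self->{options}{'wrap'});
}

package main;

my $parser = My::StubParser->new;
printf "%s\n", $parser->found_string('Hello', 'doc.xml:3', { type => 'tag' });
printf "%s\n", $parser->found_string('42',    'doc.xml:4', { type => 'tag' });
```

       With the stand-in above, only the first string goes through translate; the second is returned as is.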

       There's another simple example in the new Dia module, which only filters some strings.

   MODIFYING TAG TYPES (TODO)
       This customization is more complex, but it enables (almost) total control.  It's based on a list of
       hashes, each one defining a tag type's behavior.  The list should be sorted so that the most general tags
       come after the most concrete ones (sorted first by the beginning and then by the end keys).  To define a
       tag type you'll have to make a hash with the following keys:

       beginning
           Specifies the beginning of the tag, after the "<".

       end Specifies the end of the tag, before the ">".

       breaking
           It says if this is a breaking tag class.  A non-breaking (inline) tag is one that  can  be  taken  as
           part of the content of another tag.  It can take the values false (0), true (1) or undefined.  If you
           leave  this undefined, you'll have to define the f_breaking function that will say whether a concrete
           tag of this class is a breaking tag or not.

       f_breaking
           It's a function that will tell if the next tag is a breaking one or not.  It should be defined if the
           breaking option is not.

       f_extract
           If you leave this key undefined, the generic extraction function will have to extract the tag itself.
           It's useful for tags that can contain other tags or special structures, so that the main parser does
           not get confused.  This function receives a boolean that says whether the tag should be removed from
           the input stream.

       f_translate
           This function receives the tag (in the get_string_until() format)  and  returns  the  translated  tag
           (translated attributes or all needed transformations) as a single string.
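
       To show why the ordering of the list matters, here is a minimal sketch of such a list and of a lookup
       that returns the first entry whose beginning and end fit a tag.  The entries, the breaking values, and
       the helper first_matching_type are illustrative assumptions, not the module's actual tag_types
       structure; the f_* function keys are left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical tag type list, ordered from the most concrete entries to the
# most general one. The "breaking" values are illustrative only.
my @tag_types = (
    { beginning => '!--', end => '--', breaking => 0 },    # comments
    { beginning => '?',   end => '?',  breaking => 1 },    # processing instructions
    { beginning => '/',   end => '',   breaking => 1 },    # closing tags
    { beginning => '',    end => '/',  breaking => 1 },    # self-closing tags
    { beginning => '',    end => '',   breaking => 1 },    # anything else
);

# Return the index of the first type whose beginning/end fit the text found
# between "<" and ">", or -1 if no entry fits.
sub first_matching_type {
    my ($inner) = @_;
    for my $i (0 .. $#tag_types) {
        my ($begin, $end) = @{ $tag_types[$i] }{qw(beginning end)};
        return $i if $inner =~ /^\Q$begin\E/ && $inner =~ /\Q$end\E$/;
    }
    return -1;
}

for my $inner ('!-- note --', '/p', 'br/', 'p') {
    printf "<%s> -> type %d\n", $inner, first_matching_type($inner);
}
```

       If the general entry (empty beginning and end) came first, it would match every tag and the more
       concrete entries would never be reached.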

INTERNAL FUNCTIONS used to write derivated parsers

   WORKING WITH TAGS
       get_path()
           This  function  returns  the  path  to  the  current  tag  from  the  document's  root,  in  the form
           <html><body><p>.

           An additional array of tags (without brackets) can be passed as argument.  These  path  elements  are
           added to the end of the current path.

       tag_type()
           This function returns the index in the tag_types list of the entry that fits the next tag in the
           input stream, or -1 if it's at the end of the input file.

       extract_tag($$)
           This function returns the next tag from the input stream without the beginning and end, in  an  array
           form, to maintain the references from the input file.  It has two parameters: the type of the tag (as
           returned by tag_type) and a boolean, that indicates if it should be removed from the input stream.

       get_tag_name(@)
           This  function  returns  the  name  of  the  tag passed as an argument, in the array form returned by
           extract_tag.

       breaking_tag()
           This function returns a boolean that says if the next tag in the input stream is a  breaking  tag  or
           not (inline tag).  It leaves the input stream intact.

       treat_tag()
           This function translates the next tag from the input stream, using each tag type's custom
           translation functions.

       tag_in_list($@)
           This function returns a string value that says if the first argument (a tag hierarchy) matches any of
           the tags from the second argument (a list of tags or  tag  hierarchies).  If  it  doesn't  match,  it
           returns  0.  Else, it returns the matched tag's options (the characters in front of the tag) or 1 (if
           that tag doesn't have options).
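
       The path format and the return values described above can be sketched as follows.  Both functions are
       hypothetical re-implementations written for illustration (the real ones are methods of the module and
       are not reproduced here), and the sketch assumes that a tag hierarchy matches when the path ends with
       it.

```perl
use strict;
use warnings;

# Hypothetical equivalent of get_path: build "<a><b><c>" from a stack of
# open tags plus optional extra elements appended at the end.
sub sketch_get_path {
    my ($stack, @extra) = @_;
    return join '', map { sprintf '<%s>', $_ } @$stack, @extra;
}

# Hypothetical equivalent of tag_in_list: return the option characters of
# the first matching entry, 1 if it has none, or 0 if nothing matches.
sub sketch_tag_in_list {
    my ($path, @list) = @_;
    for my $entry (@list) {
        # Split optional option characters (e.g. "W") from the hierarchy.
        my ($opts, $hier) = $entry =~ /^([^<]*)(<.*)$/ or next;
        # Assumption: the hierarchy matches when the path ends with it.
        next if length($hier) > length($path);
        next if substr($path, -length($hier)) ne $hier;
        return length($opts) ? $opts : 1;
    }
    return 0;
}

my @list = ('W<chapter><title>', '<para>');
my $path = sketch_get_path(['book', 'chapter'], 'title');

printf "%s\n", sketch_tag_in_list($path,          @list);    # W
printf "%s\n", sketch_tag_in_list('<book><para>', @list);    # 1
printf "%s\n", sketch_tag_in_list('<book><note>', @list);    # 0
```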

   WORKING WITH ATTRIBUTES
       treat_attributes(@)
           This function handles the translation of the tags'  attributes.  It  receives  the  tag  without  the
           beginning  /  end  marks,  and  then it finds the attributes, and it translates the translatable ones
           (specified by the module option "attributes").  This returns a plain string with the translated tag.

   WORKING WITH THE MODULE OPTIONS
       treat_options()
           This function fills the internal structures that contain the tags, attributes and  inline  data  with
           the options of the module (specified in the command-line or in the initialize function).

   GETTING TEXT FROM THE INPUT DOCUMENT
       get_string_until($%)
           This function returns an array with the lines (and references) from the input document until it finds
           the first argument.  The second argument is an options hash. Value 0 means disabled (the default) and
           1, enabled.

           The valid options are:

           include
               This makes the returned array contain the searched text

           remove
               This removes the returned stream from the input

           unquoted
               This ensures that the searched text is outside any quotes

       skip_spaces(\@)
           This function receives as argument a reference to a paragraph (in the format returned by
           get_string_until), skips its leading spaces and returns them as a simple string.

       join_lines(@)
           This function returns a simple  string  with  the  text  from  the  argument  array  (discarding  the
           references).
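
       The sketch below illustrates how such helpers can operate on a paragraph.  It assumes that the array
       format is a flat list alternating a piece of text with its reference; this page does not spell the
       format out, so treat the layout as an assumption.  The two functions are hypothetical stand-alone
       equivalents, not the module's own methods.

```perl
use strict;
use warnings;

# Assumed layout: text, reference, text, reference, ...
my @paragraph = (
    "  Hello ", "doc.xml:3",
    "world",    "doc.xml:4",
);

# Hypothetical equivalent of join_lines: keep the text, drop the references.
sub sketch_join_lines {
    my @lines = @_;
    my $text  = '';
    for (my $i = 0; $i < @lines; $i += 2) {
        $text .= $lines[$i];
    }
    return $text;
}

# Hypothetical equivalent of skip_spaces: remove the leading whitespace from
# the paragraph in place and return what was removed.
sub sketch_skip_spaces {
    my ($para) = @_;
    my $spaces = '';
    while (@$para) {
        $para->[0] =~ s/^(\s*)//;
        $spaces .= $1;
        last if length $para->[0];
        splice @$para, 0, 2;    # the line was only whitespace: drop it and its reference
    }
    return $spaces;
}

my $removed = sketch_skip_spaces(\@paragraph);
my $joined  = sketch_join_lines(@paragraph);
printf "removed %d leading space(s); text is now '%s'\n", length($removed), $joined;
```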

STATUS OF THIS MODULE

       This module can translate tags and attributes.

TODO LIST

       DOCTYPE (ENTITIES)

       There is minimal support for the translation of entities.  They are translated as a whole, and tags are
       not taken into account.  Multiline entities are not supported, and entities are always rewrapped during
       the translation.

       MODIFY TAG TYPES FROM INHERITED MODULES (move the tag_types structure inside the $self hash?)

AUTHORS

        Jordi Vilalta <jvprat@gmail.com>
        Nicolas François <nicolas.francois@centraliens.net>

COPYRIGHT AND LICENSE

        Copyright (c) 2004 by Jordi Vilalta  <jvprat@gmail.com>
        Copyright (c) 2008-2009 by Nicolas François <nicolas.francois@centraliens.net>

       This  program  is free software; you may redistribute it and/or modify it under the terms of GPL (see the
       COPYING file).

Po4a Tools                                         2017-08-26                             Locale::Po4a::Xml(3pm)
