Provided by: po4a_0.47-2_all

NAME

       po4a - framework to translate documentation and other materials

Introduction

       The goal of the po4a (PO for anything) project is to ease translations (and, more interestingly, the
       maintenance of translations) using gettext tools in areas where they were not expected, such as
       documentation.

Table of contents

       This document is organized as follows:

       1 Why should I use po4a? What is it good for?
           This introductory chapter explains the motivation of the project and its philosophy. You should read
           it first if you are in the process of evaluating po4a for your own translations.

       2 How to use po4a?
           This chapter is a sort of reference manual, trying to answer the users' questions and to give you a
           better understanding of the whole process. It introduces how to do things with po4a and serves as an
           introduction to the documentation of the specific tools.

           HOWTO begin a new translation?
           HOWTO change the translation back to a documentation file?
           HOWTO update a po4a translation?
           HOWTO convert a pre-existing translation to po4a?
           HOWTO add extra text to translations (like translator's name)?
           HOWTO do all this in one program invocation?
           HOWTO customize po4a?
       3 How does it work?
           This chapter gives you a brief overview of the po4a internals, so that you may feel confident enough
           to help us maintain and improve it. It may also help you understand why it does not do what you
           expected, and how to solve your problems.

       4 FAQ
           This chapter groups the Frequently Asked Questions. In fact, most of the questions so far can be
           phrased as: "Why is it designed this way, and not that one?" If you think po4a isn't the right
           answer to documentation translation, you should consider reading this section. If it does not
           answer your question, please contact us on the <po4a-devel@lists.alioth.debian.org> mailing list. We
           love feedback.

       5 Specific notes about modules
           This chapter presents the specifics of each module from the translator's and the original author's
           points of view. Read this to learn the syntax you will encounter when translating material in this
           module, or the rules you should follow in your original document to make translators' lives easier.

           Actually, this section is not really part of this document. Instead, it is placed in each module's
           documentation. This helps ensure that the information is up to date by keeping the documentation
           and the code together.

Why should I use po4a? What is it good for?

       I like the idea of open-source software, making it possible for everybody  to  access  software  and  its
       source  code.  But  being  French,  I'm  well aware that the licensing is not the only restriction to the
       openness of software: non-translated free software is useless for non-English speakers, and we still have
       some work to make it available to really everybody out there.

       The perception of this situation by the open-source actors has improved dramatically recently. We, as
       translators, won the first battle and convinced everybody of the importance of translations. But
       unfortunately, that was the easy part. Now, we have to do the job and actually translate all this stuff.

       Open-source software itself actually benefits from a rather decent level of translation, thanks to the
       wonderful gettext tool suite. It is able to extract the strings to translate from the program, present a
       uniform format to translators, and then use the result of their work at run time to display translated
       messages to the user.

       But the situation is rather different when it comes to documentation. Too often, the translated
       documentation is not visible enough (not distributed as a part of the program), only partial, or not up
       to date. This last situation is by far the worst possible one. An outdated translation can turn out to be
       worse for users than no translation at all, because it describes old program behavior which is no longer
       in use.

   The problem to solve
       Translating documentation is not very difficult in itself. Texts are far longer than the messages of the
       program and thus take longer to complete, but no technical skill is really needed to do so. The
       difficult part comes when you have to maintain your work. Detecting which parts changed and need to be
       updated is very difficult, error-prone and highly unpleasant. I guess that this explains why so much
       translated documentation out there is outdated.

   The po4a answers
       So, the whole point of po4a is to make the documentation translation maintainable. The idea is to reuse
       the gettext methodology in this new field. As in gettext, texts are extracted from their original
       locations in order to be presented in a uniform format to the translators. The classical gettext tools
       help them update their work when a new release of the original comes out. But unlike the classical
       gettext model, the translations are then re-injected into the structure of the original document so that
       they can be processed and distributed just like the English version.

       Thanks to this, discovering which parts of the document were changed and need an update becomes very
       easy. Another good point is that the tools will do almost all the work when the structure of the
       original document gets fundamentally reorganized and when some chapters are moved around, merged or
       split. By extracting the text to translate from the document structure, it also keeps you away from the
       text formatting complexity and reduces your chances of getting a broken document (even if it does not
       completely prevent you from doing so).

       Please  also  see  the  FAQ  below  in  this  document  for  a  more  complete list of the advantages and
       disadvantages of this approach.

   Supported formats
       Currently, this approach has been successfully implemented for several kinds of text formats:

       man

       The good old manual page format, used by so many programs out there. The po4a support is very welcome
       here since this format is somewhat difficult to use and not really friendly to newcomers. The
       Locale::Po4a::Man(3pm) module also supports the mdoc format, used by the BSD man pages (they are also
       quite common on Linux).

       pod

       This is the Plain Old Documentation format used by Perl. The language and its extensions are themselves
       documented that way, as well as most of the existing Perl scripts. It makes it easy to keep the
       documentation close to the actual code by embedding them both in the same file. It makes programmers'
       lives easier, but unfortunately not translators'.

       sgml

       Even if somewhat superseded by XML nowadays, this format is still used rather often for documents which
       are more than a few screens long. It allows you to make complete books. Updating the translation of such
       long documents can prove to be a real nightmare. diff often proves useless when the original text was
       re-indented after an update. Fortunately, po4a can help you in that process.

       Currently, only the DebianDoc and DocBook DTDs are supported, but adding support for a new one is really
       easy. It is even possible to use po4a on an unknown SGML DTD without changing the code by providing the
       needed information on the command line. See Locale::Po4a::Sgml(3pm) for details.

       TeX / LaTeX

       The  LaTeX  format  is a major documentation format used in the Free Software world and for publications.
       The  Locale::Po4a::LaTeX(3pm)  module  was  tested  with  the  Python  documentation,  a  book  and  some
       presentations.

       texinfo

       All the GNU documentation is written in this format (that's even one of the requirements to become an
       official GNU project). The texinfo support provided by Locale::Po4a::Texinfo(3pm) is still in its early
       stages. Please report bugs and feature requests.

       xml

       The XML format is a base format for many documentation formats.

       Currently, the DocBook DTD is supported by po4a. See Locale::Po4a::Docbook(3pm) for details.

       others

       Po4a can also handle some rarer or more specialized formats, such as the documentation of compilation
       options for the 2.4.x kernels or the diagrams produced by the dia tool. Adding a new one is often very
       easy and the main task is to come up with a parser of your target format. See
       Locale::Po4a::TransTractor(3pm) for more information about this.

   Unsupported formats
       Unfortunately, po4a still lacks support for several documentation formats.

       There is a whole bunch of other formats we would like to support in po4a, and not only documentation
       ones. Indeed, we aim at plugging all "market holes" left by the classical gettext tools. This encompasses
       package descriptions (deb and rpm), questions asked by package installation scripts, package changelogs,
       and all the specialized file formats used by programs, such as game scenarios or wine resource files.

How to use po4a?

       This chapter is a sort of reference manual, trying to answer the users' questions and to give you a
       better understanding of the whole process. It introduces how to do things with po4a and serves as an
       introduction to the documentation of the specific tools.

   Graphical overview
       The following schema gives an overview of the process of translating documentation using po4a. Do not be
       intimidated by its apparent complexity; it comes from the fact that the whole process is represented
       here. Once you have converted your project to po4a, only the right part of the graphic is relevant.

       Note that master.doc is taken as an example for the documentation to be translated and translation.doc is
       the  corresponding  translated  text.   The suffix could be .pod, .xml, or .sgml depending on its format.
       Each part of the picture will be detailed in the next sections.

                                          master.doc
                                              |
                                              V
            +<-----<----+<-----<-----<--------+------->-------->-------+
            :           |                     |                        :
       {translation}    |         { update of master.doc }             :
            :           |                     |                        :
          XX.doc        |                     V                        V
       (optional)       |                 master.doc ->-------->------>+
            :           |                   (new)                      |
            V           V                     |                        |
         [po4a-gettextize]   doc.XX.po--->+   |                        |
                 |            (old)       |   |                        |
                 |              ^         V   V                        |
                 |              |     [po4a-updatepo]                  |
                 V              |           |                          V
          translation.pot       ^           V                          |
                 |              |         doc.XX.po                    |
                 |              |         (fuzzy)                      |
          { translation }       |           |                          |
                 |              ^           V                          V
                 |              |     {manual editing}                 |
                 |              |           |                          |
                 V              |           V                          V
             doc.XX.po --->---->+<---<---- doc.XX.po   addendum     master.doc
             (initial)                   (up-to-date) (optional)   (up-to-date)
                 :                          |            |             |
                 :                          V            |             |
                 +----->----->----->------> +            |             |
                                            |            |             |
                                            V            V             V
                                            +------>-----+------<------+
                                                         |
                                                         V
                                                  [po4a-translate]
                                                         |
                                                         V
                                                       XX.doc
                                                   (up-to-date)

       The left part shows the conversion of a translation not using po4a to this system. The top of the right
       part depicts the action of the original author (updating the documentation). The middle of the right
       part depicts the automatic actions of po4a. The new material is extracted and compared against the
       existing translation. Parts which didn't change are found, and the previous translation is reused. Parts
       which were partially modified are also connected to the previous translation, but with a specific marker
       indicating that the translation must be updated. The bottom of the figure shows how a formatted document
       is built.

       Actually,  as a translator, the only manual operation you have to do is the part marked {manual editing}.
       Yeah, I'm sorry, but po4a helps you translate.  It does not translate anything for you...

   HOWTO begin a new translation?
       This section presents the steps required to begin a new translation with po4a. The refinements
       involved in converting an existing project to this system are detailed in the relevant section.

       To begin a new translation using po4a, follow these steps:

       - Extract the text which has to be translated from the original <master.doc> document into a new
         translation template <translation.pot> file (the gettext format). For that, use the po4a-gettextize
         program this way:

           $ po4a-gettextize -f <format> -m <master.doc> -p <translation.pot>

         <format>  is  naturally  the  format used in the master.doc document. As expected, the output goes into
         translation.pot.  Please refer to po4a-gettextize(1) for more details about the existing options.

       - Actually translate what should be translated. For that, you have to rename the POT file, for example to
         doc.XX.po (where XX is the ISO 639 code of the language you are translating to, e.g. fr for French), and
         edit the resulting file. It is often a good idea not to name the file XX.po, to avoid confusion with the
         translation of the program messages, but this is your call. Don't forget to update the PO file headers;
         they are important.

         The actual translation can be done using the Emacs' or Vi's PO mode, Lokalize (KDE based),  Gtranslator
         (GNOME based) or whichever program you prefer to use for them (e.g. Virtaal).

         If you wish to learn more about this, you definitely need to refer to the gettext documentation,
         available in the gettext-doc package.

   HOWTO change the translation back to a documentation file?
       Once you're done with the translation, you want to get the translated documentation and distribute it to
       users along with the original one. For that, use the po4a-translate(1) program like this (where XX is
       the language code):

         $ po4a-translate -f <format> -m <master.doc> -p <doc.XX.po> -l <XX.doc>

       As before, <format> is the format used in the master.doc document. But this time, the  PO  file  provided
       with the -p flag is part of the input. This is your translation. The output goes into XX.doc.

       Please refer to po4a-translate(1) for more details.

   HOWTO update a po4a translation?
       To update your translation when the original master.doc file has changed, use the po4a-updatepo(1)
       program like this:

         $ po4a-updatepo -f <format> -m <new_master.doc> -p <old_doc.XX.po>

       (Please refer to po4a-updatepo(1) for more details)

       Naturally, new paragraphs in the document won't get magically translated in the PO file by this
       operation, and you'll need to update the PO file manually. Likewise, you may have to rework the
       translation of paragraphs which were modified a bit. To make sure you won't miss any of them, they are
       marked as "fuzzy" during the process and you have to remove this marker before the translation can be
       used by po4a-translate. As for the initial translation, it is best to use your favorite PO editor here.

       Once your PO file is up-to-date again, without any untranslated or fuzzy string left, you can generate  a
       translated documentation file, as explained in the previous section.

   HOWTO convert a pre-existing translation to po4a?
       Often, you happily translated the document manually until a major reorganization of the original
       master.doc document happened. Then, after some unpleasant attempts with diff or similar tools, you want
       to convert to po4a. But of course, you don't want to lose your existing translation in the process. Don't
       worry, this case is also handled by po4a tools and is called gettextization.

       The key here is to have the same structure in the translated document and in the original one so that the
       tools can match the content accordingly.

       If you are lucky (i.e., if the structures of both documents perfectly match), it will work seamlessly and
       you will be set in a few seconds. Otherwise, you may understand why this process has such an ugly name,
       and you'd better be prepared for some grunt work here. In any case, remember that it is the price to pay
       to get the comfort of po4a afterward. And the good point is that you have to do so only once.

       I cannot emphasize this too much: to ease the process, it is important that you find the exact version
       of the original which was used to do the translation. The best situation is when you noted down the VCS
       revision used for the translation and you didn't modify it in the translation process, so that you can
       use it.

       It  won't work well when you use the updated original text with the old translation. It remains possible,
       but is harder and really should be avoided if possible. In fact, I guess that if you  fail  to  find  the
       original  text again, the best solution is to find someone to do the gettextization for you (but, please,
       not me ;).

       Maybe I'm too dramatic here. Even when things go wrong, it remains way faster than translating
       everything again. I was able to gettextize the existing French translation of the Perl documentation in
       one day, even though things did go wrong. That was more than two megabytes of text, and a new
       translation would have taken months or more.

       Let me explain the basics of the procedure first; I will come back to hints for making it work when the
       process goes wrong. To ease comprehension, let's use the above example once again.

       Once you have found the old master.doc which matches the translation XX.doc, the gettextization can be
       done directly into the PO file doc.XX.po, without manually translating the translation.pot file:

        $ po4a-gettextize -f <format> -m <old_master.doc> -l <XX.doc> -p <doc.XX.po>

       When you're lucky, that's it. You converted your old translation to po4a and can begin with the updating
       task right away. Just follow the procedure explained a few sections ago to synchronize your PO file with
       the newest original document, and update the translation accordingly.

       Please note that even when things seem to work properly, there is still room for errors in this process.
       The point is that po4a is unable to understand the text to make sure that the translation matches the
       original. That's why all strings are marked as "fuzzy" in the process. You should check each of them
       carefully before removing those markers.

       Often the document structures don't match exactly, preventing po4a-gettextize from doing its job
       properly. At that point, the whole game is about editing the files to get their damn structures to match.

       It  may  help  to  read  the section Gettextization: how does it work? below.  Understanding the internal
       process will help you to make this work. The good point is that po4a-gettextize is rather  verbose  about
       what went wrong when it happens. First, it pinpoints where in the documents the structures' discrepancies
       are.  You  will  learn the strings that don't match, their positions in the text, and the type of each of
       them. Moreover, the PO file generated so far will be dumped to gettextization.failed.po.

       -   Remove all extra parts of the translations, such as the section in which you give the translator name
           and thank everyone who contributed to the translation. Addenda, which are described in the next
           section, will allow you to re-add them afterward.

       -   Do not hesitate to edit both the original and the translation. The most important thing is to get the
           PO  file. You will be able to update it afterward. That being said, editing the translation should be
           preferred when both are possible since it makes things easier when the gettextization is done.

       -   If needed, kill some parts of the original if they happen not to be translated. When synchronizing
           the PO with the document afterward, they will come back by themselves.

       -   If  you  changed  the  structure  a  bit  (to merge two paragraphs, or split another one), undo those
           changes. If there are issues in the original, you should inform the original author. Fixing  them  in
           your  translation  only  fixes  them  for a part of the community. And moreover, it's impossible when
           using po4a ;)

       -   Sometimes, the paragraph contents do match, but their types don't. Fixing it is rather
           format-dependent. In POD and man, it often comes from the fact that one of the two contains a line
           beginning with a white space where the other doesn't. In those formats, such paragraphs cannot be
           wrapped and thus become a different type. Just remove the space and you are fine. It may also be a
           typo in the tag name.

           Likewise,  two  paragraphs  may  get  merged  together  in POD when the separating line contains some
           spaces, or when there is no empty line between the =item line and the content of the item.

       -   Sometimes, there is a desynchronization between the files, and the translation is attached to the
           wrong original paragraph. This is a sign that the real problem lies earlier in the files. Check
           gettextization.failed.po to see where the desynchronization begins, and fix it there.

       -   Sometimes, you get the strong feeling that po4a ate some parts of the text, either the original or
           the translation. gettextization.failed.po indicates that both of them were matching nicely, and then
           the gettextization fails because it tried to match one paragraph with the one after (or before) the
           right one, as if the right one had disappeared. Curse po4a as I did when it first happened to me.
           Generously.

           This  unfortunate  situation  happens  when the same paragraph is repeated over the document. In that
           case, no new entry is created in the PO file, but a new  reference  is  added  to  the  existing  one
           instead.

           So, when the same paragraph appears twice in the original but the two occurrences are not translated
           in the exact same way, you will get the feeling that a paragraph of the original disappeared. Just
           kill the new translation. If you prefer to kill the first translation instead because the second one
           was actually better, remove the second one from where it is and put it in the place of the first one.

           Conversely, if two similar but different paragraphs were translated in the exact same way, you
           will get the feeling that a paragraph of the translation disappeared. A solution is to add a stupid
           string to the original paragraph (such as "I'm different"). Don't be afraid, those things will
           disappear during the synchronization, and when the added text is short enough, gettext will match
           your translation to the existing text (marking it as fuzzy, but you don't really care since all
           strings are fuzzy after gettextization).
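
       The "disappearing paragraph" effect described in the last tip can be reproduced with a small Python
       sketch. It is only a simplified model, not po4a's actual code: the function name and the sample
       paragraphs are made up for illustration, and it assumes that the catalog keeps a single entry per
       distinct original string.

```python
def build_entries(paragraphs):
    """Model a PO catalog: one entry per distinct string, with its references."""
    entries = {}
    for position, text in enumerate(paragraphs, start=1):
        # A repeated paragraph adds a reference instead of creating a new entry.
        entries.setdefault(text, []).append(position)
    return entries


original = ["Intro", "See the manual.", "Usage", "See the manual."]
entries = build_entries(original)
print(len(original), "paragraphs ->", len(entries), "entries")
print(entries["See the manual."])
```

       Four paragraphs yield only three entries here, so a translation which renders the repeated paragraph
       in two different ways has one string too many to pair up.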

       Hopefully, those tips will help you make your gettextization work and obtain your precious PO file. You
       are now ready to synchronize your file and begin your translation. Please note that on large texts, the
       first synchronization may take a long time.

       For example, the first po4a-updatepo of the Perl documentation's French translation (5.5 MB PO file) took
       about two full days on a 1 GHz G5 computer. Yes, 48 hours. But the subsequent ones only take a dozen
       seconds on my old laptop. This is because the first time, most of the msgids of the PO file don't match
       any of the POT file ones. This forces gettext to search for the closest one using a costly string
       proximity algorithm.
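
       To get a feeling for why that first run is so expensive, here is a rough Python sketch. It does not use
       gettext's real proximity algorithm: the standard difflib module stands in for it, and the strings and
       the closest() helper are invented for this example. The point is only that every old msgid without an
       exact match has to be scored against every new one.

```python
import difflib

old_msgids = ["Install the programm.", "Run the test suite."]
new_msgids = ["Install the program.", "Run the whole test suite.", "Read the FAQ."]


def closest(old, candidates):
    """Return the candidate most similar to old (difflib is only a stand-in)."""
    scored = [(difflib.SequenceMatcher(None, old, new).ratio(), new) for new in candidates]
    return max(scored)[1]


comparisons = 0
for old in old_msgids:
    # No exact match exists, so all candidates are compared with this string.
    comparisons += len(new_msgids)
    print(repr(old), "is closest to", repr(closest(old, new_msgids)))

print("similarity computations:", comparisons)
```

       With thousands of long paragraphs on both sides, the number of such comparisons grows with the product
       of the two catalog sizes, while later runs mostly find exact matches and skip the search.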

   HOWTO add extra text to translations (like translator's name)?
       Because of the gettext approach, doing this becomes more difficult in po4a than it was when simply
       editing a new file alongside the original one. But it remains possible, thanks to the so-called addenda.

       It may help to think of addenda as a sort of patch applied to the localized document after processing.
       They are rather different from the usual patches (they have only one line of context, which can embed
       a Perl regular expression, and they can only add new text without removing any), but the functionality
       is the same.

       Their goal is to allow the translator to add extra content to the document which is not translated from
       the original document. The most common usage is to add a section about the translation itself, listing
       contributors and explaining how to report bugs against the translation.

       An addendum must be provided as a separate file. The first line constitutes a header indicating where in
       the produced document it should be placed. The rest of the addendum file will be added verbatim at the
       determined position of the resulting document.

       The header has a pretty rigid syntax: it must begin with the string PO4A-HEADER:, followed by a
       semicolon-separated (;) list of key=value fields. White spaces ARE important. Note that you cannot use
       the semicolon character (;) in the value, and that quoting it doesn't help.
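
       As a rough illustration of this syntax, the following Python sketch splits a header line into its
       fields. It is not po4a's own parser: the function name is hypothetical, and how white space around
       keys is treated is an assumption made here (keys are trimmed, values are kept verbatim).

```python
def parse_addendum_header(line):
    """Split a PO4A-HEADER line into a dict of fields (illustrative only)."""
    prefix = "PO4A-HEADER:"
    if not line.startswith(prefix):
        raise ValueError("header must begin with " + prefix)
    fields = {}
    for part in line[len(prefix):].split(";"):
        # Every ';' ends a field, which is why a value cannot contain one.
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % part)
        # Assumption: keys are trimmed, values are kept exactly as written.
        fields[key.strip()] = value
    return fields


header = "PO4A-HEADER:mode=after;position=AUTHORS;beginboundary=\\.SH"
print(parse_addendum_header(header))
```

       Because the line is split on every semicolon before anything else happens, a semicolon inside a value
       would simply start a new, malformed field; this model rejects such a header.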

       Again, it sounds scary, but the examples given below should help you to find how to write the header line
       you  need.  To illustrate the discussion, assume we want to add a section called "About this translation"
       after the "About this document" one.

       Here are the possible header keys:

       position (mandatory)
           a regexp. The addendum will be placed near the line matching this regexp. Note that we're speaking
           about the translated document here, not the original. If more than one line matches this expression
           (or none), the addition will fail. It is indeed better to report an error than to insert the
           addendum at the wrong location.

           This line is called the position point in the following. The point where the addendum is added is
           called the insertion point. Those two points are near each other, but not equal. For example, if you
           want to insert a new section, it is easier to put the position point on the title of the preceding
           section and tell po4a where the section ends (remember that the position point is given by a regexp
           which should match a unique line).

           The location of the insertion point with regard to the position point is controlled by the mode,
           beginboundary and endboundary fields, as explained below.

           In our case, we would have:

                position=<title>About this document</title>

       mode (mandatory)
           It can be either the string before or after, specifying the position of the addendum, relative to the
           position point.

           Since we want the new section to be placed below the one we are matching, we have:

                mode=after

       beginboundary (used only when mode=after, and mandatory in that case)
       endboundary (idem)
           regexp matching the end of the section after which the addendum goes.

           When  mode=after,  the  insertion  point  is  after the position point, but not directly after! It is
           placed at the end of the section beginning at the position point, i.e.,  after  or  before  the  line
           matched by the ???boundary argument, depending on whether you used beginboundary or endboundary.

           In our case, we can choose to indicate the end of the section we match by adding:

              endboundary=</section>

           or to indicate the beginning of the next section by indicating:

              beginboundary=<section>

           In both cases, our addendum will be placed after the </section> and before the <section>. The first
           one is better since it will work even if the document gets reorganized.

           Both forms exist because documentation formats are different. In some of them, there is a way to mark
           the end of a section (just like the </section> we just used), while others don't explicitly mark
           the end of a section (like in man). In the former case, you want to make a boundary matching the end
           of a section, so that the insertion point comes after it. In the latter case, you want to make a
           boundary matching the beginning of the next section, so that the insertion point comes just before
           it.

       This can seem obscure, but hopefully, the next examples will enlighten you.

       To sum up the example we used so far, in order to add a section called "About this translation" after
       the "About this document" one in an SGML document, you can use either of these header lines:

          PO4A-HEADER: mode=after; position=About this document; endboundary=</section>
          PO4A-HEADER: mode=after; position=About this document; beginboundary=<section>

        If you want to add something after the following nroff section:
           .SH "AUTHORS"

         you  should  put  a position matching this line, and a beginboundary matching the beginning of the next
         section (i.e., ^\.SH). The addendum will then be added after the position point and immediately  before
         the first line matching the beginboundary. That is to say:

          PO4A-HEADER:mode=after;position=AUTHORS;beginboundary=\.SH

        If you want to add something into a section (like after "Copyright Big Dude") instead of adding a whole
       section, give a position matching this line, and give a beginboundary matching any line.
          PO4A-HEADER:mode=after;position=Copyright Big Dude, 2004;beginboundary=^

       If you want to add something at the end of the document, give a position matching any line of your
       document (but only one line; po4a won't proceed if the match is not unique), and give a boundary matching
       nothing. Don't use simple strings like "EOF" here; prefer strings that are unlikely to appear in your
       document.
          PO4A-HEADER:mode=after;position=<title>About</title>;beginboundary=FakePo4aBoundary

       In any case, remember that these are regexps. For example, if you want to match the end of an nroff
       section ending with the line

         .fi

       don't use .fi as endboundary, because it will match "the[ fi]le", which is obviously not what you
       expect. The correct endboundary in that case is: ^\.fi$.
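
       The difference between the two boundaries can be checked with a few lines of Perl. This is only a
       sketch: the sample lines and the matching_lines() helper are made up for the illustration and are not
       part of po4a.

```perl
use strict;
use warnings;

# Two sample lines from a hypothetical nroff document.
my @lines = ("Edit the file by hand.", ".fi");

# Return the candidate lines matched by a boundary regexp.
sub matching_lines {
    my ($regexp, @candidates) = @_;
    return grep { /$regexp/ } @candidates;
}

# Unanchored and unescaped: "." matches any character, so " fi" in
# "the file" matches as well as the real ".fi" line.
my @loose = matching_lines('.fi', @lines);

# Anchored and escaped: only the line that is exactly ".fi" matches.
my @strict = matching_lines('^\.fi$', @lines);

printf "loose:  %d match(es)\n", scalar @loose;
printf "strict: %d match(es)\n", scalar @strict;
```

       The loose form matches both sample lines, while the anchored form matches only the .fi line.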

       If the addendum doesn't go where you expected, try to pass the -vv argument to the tools, so that they
       explain to you what they do while placing the addendum.

       More detailed example

       Original document (POD formatted):

        |=head1 NAME
        |
        |dummy - a dummy program
        |
        |=head1 AUTHOR
        |
        |me

       Then, the following addendum will ensure that a section (in French) about the translator is added at the
       end of the file (in French, "TRADUCTEUR" means "TRANSLATOR", and "moi" means "me"):

        |PO4A-HEADER:mode=after;position=AUTEUR;beginboundary=^=head
        |
        |=head1 TRADUCTEUR
        |
        |moi

       In order to put your addendum before the AUTHOR section, use the following header:

        PO4A-HEADER:mode=after;position=NOM;beginboundary=^=head1

       This works because  the  next  line  matching  the  beginboundary  /^=head1/  after  the  section  "NAME"
       (translated  to  "NOM" in French), is the one declaring the authors. So, the addendum will be put between
       both sections.

   HOWTO do all this in one program invocation?
       Using po4a proved to be a bit error-prone for users, since you have to call two different programs in
       the right order (po4a-updatepo and then po4a-translate), each of them needing more than 3 arguments.
       Moreover, it was difficult with this system to use only one PO file for all your documents when more
       than one format was used.

       The  po4a(1)  program  was  designed  to  solve those difficulties. Once your project is converted to the
       system, you write a simple configuration file explaining where your translation files are (PO  and  POT),
       where the original documents are, their formats and where their translations should be placed.

       Then, calling po4a(1) on this file ensures that the PO files are synchronized against the original
       document, and that the translated documents are generated properly. Of course, you will want to call
       this program twice: once before editing the PO files to update them, and once afterward to get a
       completely updated translated document. But you only need to remember one command line.

   HOWTO customize po4a?
       po4a modules have options (specified with the -o option) that can be used to change the module behavior.

       It is also possible to customize a module, or to add new, derivative or modified modules, by putting a
       module in lib/Locale/Po4a/ and adding lib to the paths specified by the PERLLIB or PERL5LIB environment
       variable. For example:

          PERLLIB=$PWD/lib po4a --previous po4a/po4a.cfg

       Note: the actual name of the lib directory is not important.

How does it work?

       This chapter gives you a brief overview of the po4a internals, so that you may feel more confident to
       help us maintain and improve it. It may also help you understand why it does not do what you expected,
       and how to solve your problems.

   What's the big picture here?
       The po4a architecture is object oriented (in Perl. Isn't that neat?). The common ancestor to all parser
       classes is called TransTractor. This strange name comes from the fact that it is at the same time in
       charge of translating documents and extracting strings.

       More formally, it takes a document to translate plus a PO file containing the translations to use as
       input, while producing two separate outputs: another PO file (resulting from the extraction of
       translatable strings from the input document), and a translated document (with the same structure as the
       input one, but with all translatable strings replaced with the content of the input PO). Here is a
       graphical representation of this:

          Input document --\                             /---> Output document
                            \      TransTractor::       /       (translated)
                             +-->--   parse()  --------+
                            /                           \
          Input PO --------/                             \---> Output PO
                                                                (extracted)

       This little bone is the core of all the po4a architecture. If you omit the input PO and the output
       document, you get po4a-gettextize. If you provide both inputs and disregard the output PO, you get
       po4a-translate.

       TransTractor::parse() is a virtual function implemented by each module. Here is a little example to  show
       you how it works. It parses a list of paragraphs, each of them beginning with <p>.

         1 sub parse {
         2   PARAGRAPH: while (1) {
         3     my ($paragraph,$pararef,$line,$lref)=("","","","");
         4     my $first=1;
         5     while ((($line,$lref)=$document->shiftline()) && defined($line)) {
         6       if ($line =~ m/<p>/ && !$first--) {  # second <p>: a new paragraph begins
         7         $document->unshiftline($line,$lref);
         8
         9         $paragraph =~ s/^<p>//s;
        10         $document->pushline("<p>".$document->translate($paragraph,$pararef));
        11
        12         next PARAGRAPH;
        13       } else {
        14         $paragraph .= $line;
        15         $pararef = $lref unless(length($pararef));
        16       }
        17     }
        18     return; # Did not get a defined line? End of input file.
        19   }
        20 }

       On line 6, we encounter <p> for the second time. That's the signal of the next paragraph. We should thus
       put the line just obtained back into the original document (line 7) and push the paragraph built so far
       into the outputs. After removing the leading <p> from it on line 9, we push the concatenation of this
       tag with the translation of the rest of the paragraph (line 10).

       This translate() function is very cool. It pushes its argument into the output PO file (extraction) and
       returns its translation as found in the input PO file (translation). Since it's used as part of the
       argument of pushline(), this translation lands in the output document.

       Isn't that cool? It is possible to build a complete po4a module in less than 20 lines when the format  is
       simple enough...

       You can learn more about this in Locale::Po4a::TransTractor(3pm).
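
       The double role of translate() can be pictured with the following stand-alone sketch. It is not the
       TransTractor implementation: the %input_po hash, the @output_po array and translate_sketch() are
       hypothetical names, and falling back to the original string when no translation is found is an
       assumption of this sketch.

```perl
use strict;
use warnings;

# Input PO: known translations (msgid => msgstr). Hypothetical content.
my %input_po = ("Hello world." => "Bonjour le monde.");

# Output PO: every string seen, in document order (extraction).
my @output_po;

# Simplified stand-in for translate(): record the string, then return
# its translation. Falling back to the original string when there is
# no translation is an assumption of this sketch.
sub translate_sketch {
    my ($string) = @_;
    push @output_po, $string;
    my $translation = $input_po{$string};
    return (defined $translation && length $translation) ? $translation : $string;
}

my @translated = map { translate_sketch($_) } ("Hello world.", "Not yet translated.");
print "$_\n" for @translated;
```

       A single call thus feeds both outputs of the diagram above: the string is appended to the extracted
       list, and the returned value is what would be passed to pushline().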

   Gettextization: how does it work?
       The  idea  here  is  to take the original document and its translation, and to say that the Nth extracted
       string from the translation is the translation of the Nth extracted string from the original. In order to
       work, both files must share exactly the same structure. For example, if  the  files  have  the  following
       structure,  it is very unlikely that the 4th string in translation (of type 'chapter') is the translation
       of the 4th string in original (of type 'paragraph').

           Original         Translation

         chapter            chapter
           paragraph          paragraph
           paragraph          paragraph
           paragraph        chapter
         chapter              paragraph
           paragraph          paragraph

       For that, po4a parsers are used on both the original and the translation files to extract PO  files,  and
       then a third PO file is built from them taking strings from the second as translation of strings from the
       first.  In  order  to check that the strings we put together are actually the translations of each other,
       document parsers in po4a should put information about the syntactical type of extracted  strings  in  the
       document  (all  existing ones do so, yours should also). Then, this information is used to make sure that
       both documents have the same syntax. In the previous example, it would allow us to detect that  string  4
       is a paragraph in one case, and a chapter title in another case and to report the problem.

       In theory, it would be possible to detect the problem, and resynchronize the files afterward (just like
       diff does). But what we should do with the few strings before the desynchronization is not clear, and it
       would sometimes produce bad results. That's why the current implementation doesn't try to resynchronize
       anything and fails verbosely when something goes wrong, requiring manual modification of files to fix
       the problem.

       Even with these precautions, things can go wrong very easily here. That's why  all  translations  guessed
       this way are marked fuzzy to make sure that the translator reviews and checks them.
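
       The pairing and the type check described in this section can be sketched as follows. This is an
       illustration only, not the po4a-gettextize code: representing each extracted string as a [type, text]
       pair and the name gettextize_sketch() are assumptions made for the example.

```perl
use strict;
use warnings;

# Pair the Nth string of the original with the Nth string of the
# translation, and stop as soon as their types differ.
# Each string is a [type, text] pair (a layout assumed by this sketch).
sub gettextize_sketch {
    my ($orig, $trans) = @_;
    die "String counts differ\n" unless @$orig == @$trans;
    my @entries;
    for my $i (0 .. $#{$orig}) {
        my ($otype, $otext) = @{ $orig->[$i] };
        my ($ttype, $ttext) = @{ $trans->[$i] };
        die sprintf("String %d is a %s in the original but a %s in the translation\n",
                    $i + 1, $otype, $ttype)
            unless $otype eq $ttype;
        # Guessed pairs are flagged fuzzy so that a translator reviews them.
        push @entries, { msgid => $otext, msgstr => $ttext, fuzzy => 1 };
    }
    return \@entries;
}

# The two structures from the table above.
my @original    = map { [$_, "text"] }  qw(chapter paragraph paragraph paragraph chapter paragraph);
my @translation = map { [$_, "texte"] } qw(chapter paragraph paragraph chapter paragraph paragraph);

my $result = eval { gettextize_sketch(\@original, \@translation) };
my $error  = $@;
print $error;
```

       With the structures of the table, the sketch stops at string 4 and reports the type mismatch instead of
       producing a PO file.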

   Addendum: How does it work?
       Well,  that's  pretty  easy  here.  The  translated document is not written directly to disk, but kept in
       memory until all the addenda are applied. The algorithms involved here  are  rather  straightforward.  We
       look  for a line matching the position regexp, and insert the addendum before it if we're in mode=before.
       If not, we search for the next line matching the boundary and insert the addendum after this line if it's
       an endboundary or before this line if it's a beginboundary.
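
       The placement rules above can be sketched in a few lines of Perl. This is a simplified model, not the
       code used by po4a: place_addendum() and its argument layout are hypothetical, the search for the
       boundary is assumed to start on the line following the position line, and the addendum is assumed to go
       to the end of the document when no line matches the boundary.

```perl
use strict;
use warnings;

# Insert @$addendum into @$doc. %opt holds mode, position and either
# beginboundary or endboundary (all regexps, as in a PO4A-HEADER line).
sub place_addendum {
    my ($doc, $addendum, %opt) = @_;
    my @hits = grep { $doc->[$_] =~ /$opt{position}/ } 0 .. $#{$doc};
    die "position must match exactly one line\n" unless @hits == 1;
    my $at = $hits[0];                  # mode=before: insert before this line

    if ($opt{mode} eq 'after') {
        my $is_end   = defined $opt{endboundary};
        my $boundary = $is_end ? $opt{endboundary} : $opt{beginboundary};
        my $i = $at + 1;                # assumed: search starts after the position line
        $i++ while $i < @$doc && $doc->[$i] !~ /$boundary/;
        # After an endboundary, before a beginboundary; end of the
        # document when no boundary was found.
        $at = ($i < @$doc && $is_end) ? $i + 1 : $i;
    }
    my @result = @$doc;
    splice @result, $at, 0, @$addendum;
    return @result;
}

# The POD example from the HOWTO, with its headings translated.
my @doc      = ("=head1 NOM", "", "dummy - a dummy program", "", "=head1 AUTEUR", "", "moi");
my @addendum = ("", "=head1 TRADUCTEUR", "", "moi");

my @at_end    = place_addendum(\@doc, \@addendum,
    mode => 'after', position => 'AUTEUR', beginboundary => '^=head');
my @in_middle = place_addendum(\@doc, \@addendum,
    mode => 'after', position => 'NOM', beginboundary => '^=head1');

print join("\n", @in_middle), "\n";
```

       With the first header, no line after AUTEUR matches the boundary, so the new section ends up at the end
       of the document. With the second, it lands just before the AUTEUR heading.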

FAQ

       This chapter groups the Frequently Asked Questions. In fact, most of  the  questions  for  now  could  be
       formulated  that  way: "Why is it designed this way, and not that one?" If you think po4a isn't the right
       answer to documentation translation, you should consider reading this section. If it does not answer your
       question, please contact us on the <po4a-devel@lists.alioth.debian.org> mailing list. We love feedback.

   Why to translate each paragraph separately?
       Yes, in po4a, each paragraph is translated separately  (in  fact,  each  module  decides  this,  but  all
       existing modules do so, and yours should also).  There are two main advantages to this approach:

       • When the technical parts of the document are hidden from the scene, the translator can't mess with
         them. The fewer markers we present to the translator, the fewer errors they can make.

       • Cutting the document helps in isolating the changes to the original  document.  When  the  original  is
         modified, finding what parts of the translation need to be updated is eased by this process.

       Even  with  these  advantages,  some people don't like the idea of translating each paragraph separately.
       Here are some of the answers I can give to their fear:

       • This approach proved successful in the KDE project and allows people there to produce the biggest
         corpus of translated and up to date documentation I know.

       • The translators can still use the context to translate, since the strings in the PO file are in the
         same order as in the original document. Translating sequentially is thus rather comparable whether
         you use po4a or not. And in any case, the best way to get the context remains to convert the document
         to a printable format since the text formatting ones are not really readable, IMHO.

       • This approach is the one used by professional translators. I agree that they have somewhat different
         goals than open-source translators. The maintenance is for example often less critical to them since
         the content changes rarely.

   Why not to split on sentence level (or smaller)?
       Professional translator tools sometimes split the document at the sentence level in order to maximize the
       reusability of previous translations and speed up their process.  The problem is that the  same  sentence
       may have several translations, depending on the context.

       Paragraphs  are  by  definition  longer  than  sentences.  It  will hopefully ensure that having the same
       paragraph in two documents will have the same meaning (and translation), regardless  of  the  context  in
       each case.

       Splitting on smaller parts than the sentence would be very bad. It would be a bit long to explain why
       here, but interested readers can refer to the Locale::Maketext::TPJ13(3pm) man page (which comes with
       the Perl documentation), for example. In short, each language has its specific syntactic rules, and
       there is no way to build sentences by aggregating parts of sentences that works for all existing
       languages (or even for 5 of the 10 most spoken ones, or even fewer).

   Why not put the original as comment along with translation (or the other way around)?
       At first glance, gettext doesn't seem to be adapted to all kinds of translations. For example, it
       didn't seem adapted to debconf, the interface all Debian packages use for their interaction with the
       user during installation. In that case, the texts to translate were pretty short (a dozen lines for each
       package), and it was difficult to put the translation in a specialized file since it has to be available
       before the package installation.

       That's why the debconf developers decided to implement another solution, where translations are placed
       in the same file as the original. This is rather appealing. One would even want to do this for XML, for
       example. It would look like that:

        <section>
         <title lang="en">My title</title>
         <title lang="fr">Mon titre</title>

         <para>
          <text lang="en">My text.</text>
          <text lang="fr">Mon texte.</text>
         </para>
        </section>

       But  it  was  so problematic that a PO-based approach is now used. Only the original can be edited in the
       file, and the translations must take place in PO files extracted from the  master  template  (and  placed
       back at package compilation time). The old system was deprecated because of several issues:

       •   maintenance problems

           If several translators provide a patch at the same time, it gets hard to merge them together.

           How  will  you detect changes to the original, which need to be applied to the translations? In order
           to use diff, you have to note which version of the original you translated. I.e., you need a PO  file
           in your file ;)

       •   encoding problems

           This solution is viable when only European languages are involved, but the introduction of Korean,
           Russian and/or Arabic really complicates the picture. UTF could be a solution, but there are still
           some problems with it.

           Moreover, such problems are hard to detect (i.e., only Korean readers will detect that  the  encoding
           of Korean is broken [because of the Russian translator])

       gettext solves all those problems together.

   But gettext wasn't designed for that use!
       That's true, but until now nobody has come up with a better solution. The only known alternative is
       manual translation, with all the maintenance issues.

   What about the other translation tools for documentation using gettext?
       As far as I know, there are only two of them:

       poxml
           This is the tool developed by KDE people to handle DocBook XML. AFAIK, it was the  first  program  to
           extract strings to translate from documentation to PO files, and inject them back after translation.

           It can only handle XML, and only a particular DTD. I'm quite unhappy with the handling of lists,
           which end up in one big msgid. When the list becomes big, the chunk becomes harder to swallow.

       po-debiandoc
           This program, written by Denis Barbier, is a sort of precursor of the po4a SGML module, which more or
           less deprecates it. As the name says, it handles only the DebianDoc DTD, which is more or less a
           deprecated DTD.

       The  main advantages of po4a over them are the ease of extra content addition (which is even worse there)
       and the ability to achieve gettextization.

   Educating developers about translation
       When you try to translate documentation or programs, you face three kinds of problems: linguistic (not
       everybody speaks two languages), technical (that's why po4a exists) and relational/human. Not all
       developers understand the necessity of translating stuff. Even when well-intentioned, they may not know
       how to ease the work of translators. To help with that, po4a comes with a lot of documentation which can
       be referred to.

       Another important point is that each translated file begins with a short comment indicating what the file
       is and how to use it. This should help the poor developers flooded with tons of files in different
       languages they hardly speak, and help them deal with these files correctly.

       In the po4a project, translated documents are not source files anymore. Since SGML files are usually
       source files, it's an easy mistake to make. That's why all files present this header:

        |       *****************************************************
        |       *           GENERATED FILE, DO NOT EDIT             *
        |       * THIS IS NO SOURCE FILE, BUT RESULT OF COMPILATION *
        |       *****************************************************
        |
        | This file was generated by po4a-translate(1). Do not store it (in VCS,
        | for example), but store the PO file used as source file by po4a-translate.
        |
        | In fact, consider this as a binary, and the PO file as a regular source file:
        | If the PO gets lost, keeping this translation up-to-date will be harder ;)

       Likewise, gettext's regular PO files only need to be copied to the po/ directory. But this is not the
       case for the ones manipulated by po4a. The major risk here is that a developer erases the existing
       translation of their program with the translation of their documentation. (Both of them can't be stored
       in the same PO file, because the program needs to install its translation as an mo file while the
       documentation only uses its translation at compile time.) That's why the PO files produced by the
       po-debiandoc module contain the following header:

        #
        #  ADVISES TO DEVELOPERS:
        #    - you do not need to manually edit POT or PO files.
        #    - this file contains the translation of your debconf templates.
        #      Do not replace the translation of your program with this !!
        #        (or your translators will get very upset)
        #
        #  ADVISES TO TRANSLATORS:
        #    If you are not familiar with the PO format, gettext documentation
        #     is worth reading, especially sections dedicated to this format.
        #    For example, run:
        #         info -n '(gettext)PO Files'
        #         info -n '(gettext)Header Entry'
        #
        #    Some information specific to po-debconf are available at
        #            /usr/share/doc/po-debconf/README-trans
        #         or http://www.debian.org/intl/l10n/po-debconf/README-trans
        #

   SUMMARY of the advantages of the gettext based approach
       • The translations are not stored along  with  the  original,  which  makes  it  possible  to  detect  if
         translations become out of date.

       • The  translations are stored in separate files from each other, which prevents translators of different
         languages from interfering, both when submitting their patch and at the file encoding level.

       • It is based internally on gettext (but po4a offers a very simple interface so that you don't need to
         understand the internals to use it). That way, we don't have to reinvent the wheel, and because of
         their wide use, we can think that these tools are more or less bug free.

       • Nothing changed for the end-user (besides the fact that translations will hopefully be better
         maintained). The resulting documentation file distributed is exactly the same.

       • No need for translators to learn a new file syntax, and their favorite PO file editor (like Emacs' PO
         mode, Lokalize or Gtranslator) will work just fine.

       • gettext offers a simple way to get statistics about what is done, what should be reviewed and updated,
         and what is still to do. Some examples can be found at these addresses:

          - http://kv-53.narod.ru/kaider1.png
          - http://www.debian.org/intl/l10n/

       But everything isn't green, and this approach also has some disadvantages we have to deal with.

       • Addenda are... strange at first glance.

       • You can't adapt the translated text to your preferences, like splitting a paragraph here,  and  joining
         two  other ones there. But in some sense, if there is an issue with the original, it should be reported
         as a bug anyway.

       • Even with an easy interface, it remains a new tool people have to learn.

         One of my dreams would be to somehow integrate po4a into Gtranslator or Lokalize. When an SGML file is
         opened, the strings are automatically extracted. When it's saved, a translated SGML file can be written
         to disk. If we manage to do an MS Word (TM) module (or at least RTF), professional translators may even
         use it.

AUTHORS

        Denis Barbier <barbier,linuxfr.org>
        Martin Quinson (mquinson#debian.org)

Po4a Tools                                         2016-01-05                                            PO4A(7)