Provided by: po4a_0.52-1_all

NAME

       po4a - framework to translate documentation and other materials

Introduction

       The goal of the po4a (PO for anything) project is to ease translation (and, more interestingly, the
       maintenance of translations) by using gettext tools in areas where they were not expected, such as
       documentation.

Table of contents

       This document is organized as follows:

       1 Why should I use po4a? What is it good for?
           This introductory chapter explains the motivation of the project and its philosophy. You should read
           it first if you are in the process of evaluating po4a for your own translations.

       2 How to use po4a?
           This chapter is a sort of reference manual, trying to answer the users' questions and to give you a
           better understanding of the whole process. It introduces how to do things with po4a and serves as an
           introduction to the documentation of the specific tools.

           HOWTO begin a new translation?
           HOWTO change the translation back to a documentation file?
           HOWTO update a po4a translation?
           HOWTO convert a pre-existing translation to po4a?
           HOWTO add extra text to translations (like translator's name)?
           HOWTO do all this in one program invocation?
           HOWTO customize po4a?

       3 How does it work?
           This chapter gives you a brief overview of the po4a internals, so that you may feel more confident to
           help us maintain and improve it. It may also help you understand why it does not do what you
           expected, and how to solve your problems.

       4 FAQ
           This chapter groups the Frequently Asked Questions. In fact, most of the questions so far can be
           phrased as: "Why is it designed this way, and not that one?" If you think po4a isn't the right answer
           to documentation translation, you should consider reading this section. If it does not answer your
           question, please contact us on the <po4a-devel@lists.alioth.debian.org> mailing list. We love
           feedback.

       5 Specific notes about modules
           This chapter presents the specifics of each module from the point of view of both the translator and
           the original author. Read it to learn the syntax you will encounter when translating content handled
           by a module, or the rules you should follow in your original document to make translators' lives
           easier.

           Actually, this section is not really part of this document. Instead, it is placed in each module's
           documentation. This helps ensure that the information is up to date by keeping the documentation and
           the code together.

Why should I use po4a? What is it good for?

       I like the idea of open-source software, making it possible for everybody to access software and its
       source code. But being French, I'm well aware that licensing is not the only restriction on the openness
       of software: untranslated free software is useless to non-English speakers, and we still have some work
       to do to make it available to truly everybody out there.

       Awareness of this situation among open-source actors has improved dramatically in recent years. We, as
       translators, won the first battle and convinced everybody of the importance of translations. But
       unfortunately, that was the easy part. Now, we have to do the job and actually translate all this stuff.

       Open-source programs themselves actually enjoy a rather decent level of translation, thanks to the
       wonderful gettext tool suite. It is able to extract the strings to translate from the program, present
       them to translators in a uniform format, and then use the result of their work at run time to display
       translated messages to the user.

       But the situation is rather different when it comes to documentation. Too often, the translated
       documentation is not visible enough (not distributed as part of the program), only partial, or not up to
       date. This last situation is by far the worst. An outdated translation can turn out to be worse for users
       than no translation at all, by describing old program behavior that is no longer in use.

   The problem to solve
       Translating documentation is not very difficult in itself. Texts are far longer than the messages of the
       program and thus take longer to complete, but no real technical skill is needed to do so. The difficult
       part comes when you have to maintain your work. Detecting which parts changed and need to be updated is
       very difficult, error-prone and highly unpleasant. I guess this explains why so much of the translated
       documentation out there is outdated.

   The po4a answers
       So, the whole point of po4a is to make documentation translation maintainable. The idea is to apply the
       gettext methodology to this new field. As in gettext, texts are extracted from their original locations
       in order to be presented in a uniform format to the translators. The classical gettext tools help them
       update their work when a new release of the original comes out. But unlike the classical gettext model,
       the translations are then re-injected into the structure of the original document so that they can be
       processed and distributed just like the English version.

       Thanks to this, discovering which parts of the document were changed and need an update becomes very
       easy. Another good point is that the tools will do almost all the work when the structure of the original
       document gets fundamentally reorganized and when some chapters are moved around, merged or split. By
       extracting the text to translate from the document structure, po4a also shields you from the complexity
       of text formatting and reduces your chances of producing a broken document (even if it does not
       completely prevent it).

       Please  also  see  the  FAQ  below  in  this  document  for  a  more  complete list of the advantages and
       disadvantages of this approach.

   Supported formats
       Currently, this approach has been successfully implemented for several kinds of text formats:

       man

       The good old manual page format, used by so many programs out there. The po4a support is very welcome
       here since this format is somewhat difficult to use and not really friendly to newcomers. The
       Locale::Po4a::Man(3pm) module also supports the mdoc format, used by the BSD man pages (which are also
       quite common on Linux).

       pod

       This is the Perl Plain Old Documentation format. The language and extensions themselves are documented
       this way, as well as most of the existing Perl scripts. It makes it easy to keep the documentation close
       to the actual code by embedding them both in the same file. It makes the programmer's life easier, but
       unfortunately not the translator's.

       sgml

       Even if somewhat superseded by XML nowadays, this format is still used rather often for documents that
       are more than a few screens long. It allows you to make complete books. Updating the translation of such
       long documents can prove to be a real nightmare. diff often proves useless when the original text was
       re-indented after an update. Fortunately, po4a can help you in that process.

       Currently, only the DebianDoc and DocBook DTDs are supported, but adding support for a new one is really
       easy. It is even possible to use po4a on an unknown SGML DTD without changing the code by providing the
       needed information on the command line. See Locale::Po4a::Sgml(3pm) for details.

       TeX / LaTeX

       The  LaTeX  format  is a major documentation format used in the Free Software world and for publications.
       The  Locale::Po4a::LaTeX(3pm)  module  was  tested  with  the  Python  documentation,  a  book  and  some
       presentations.

       texinfo

       All the GNU documentation is written in this format (that's even one of the requirements to become an
       official GNU project). The support for Locale::Po4a::Texinfo(3pm) in po4a is still in its early stages.
       Please report bugs and feature requests.

       xml

       The XML format is a base format for many documentation formats.

       Currently, the DocBook DTD is supported by po4a. See Locale::Po4a::Docbook(3pm) for details.

       others

       Po4a can also handle some rarer or more specialized formats, such as the documentation of compilation
       options for the 2.4+ Linux kernels or the diagrams produced by the dia tool. Adding a new one is often
       very easy, and the main task is to come up with a parser for your target format. See
       Locale::Po4a::TransTractor(3pm) for more information about this.

   Unsupported formats
       Unfortunately, po4a still lacks support for several documentation formats.

       There is a whole bunch of other formats we would like to support in po4a, and not only documentation
       ones. Indeed, we aim at plugging all the "market holes" left by the classical gettext tools. This
       encompasses package descriptions (deb and rpm), questions asked by package installation scripts, package
       changelogs, and all the specialized file formats used by programs, such as game scenarios or wine
       resource files.

How to use po4a?

       This chapter is a sort of reference manual, trying to answer the users' questions and to give you a
       better understanding of the whole process. It introduces how to do things with po4a and serves as an
       introduction to the documentation of the specific tools.

   Graphical overview
       The following schema gives an overview of the process of translating documentation using po4a. Do not be
       put off by its apparent complexity; it comes from the fact that the whole process is represented here.
       Once you have converted your project to po4a, only the right part of the graphic is relevant.

       Note that master.doc is taken as an example for the documentation to be translated and translation.doc is
       the  corresponding  translated  text.   The suffix could be .pod, .xml, or .sgml depending on its format.
       Each part of the picture will be detailed in the next sections.

                                          master.doc
                                              |
                                              V
            +<-----<----+<-----<-----<--------+------->-------->-------+
            :           |                     |                        :
       {translation}    |         { update of master.doc }             :
            :           |                     |                        :
          XX.doc        |                     V                        V
        (optional)      |                 master.doc ->-------->------>+
            :           |                   (new)                      |
            V           V                     |                        |
         [po4a-gettextize]   doc.XX.po -->+   |                        |
                 |            (old)       |   |                        |
                 |              ^         V   V                        |
                 |              |     [po4a-updatepo]                  |
                 V              |           |                          V
          translation.pot       ^           V                          |
                 |              |        doc.XX.po                     |
                 |              |         (fuzzy)                      |
          { translation }       |           |                          |
                 |              ^           V                          V
                 |              |     {manual editing}                 |
                 |              |           |                          |
                 V              |           V                          V
             doc.XX.po --->---->+<---<-- doc.XX.po    addendum     master.doc
             (initial)                 (up-to-date)  (optional)   (up-to-date)
                 :                          |            |             |
                 :                          V            |             |
                 +----->----->----->------> +            |             |
                                            |            |             |
                                            V            V             V
                                            +------>-----+------<------+
                                                         |
                                                         V
                                                  [po4a-translate]
                                                         |
                                                         V
                                                       XX.doc
                                                    (up-to-date)

       The left part shows the conversion of a translation not using po4a to this system. The top of the right
       part depicts the action of the original author (updating the documentation). The middle of the right
       part is where the automatic actions of po4a are depicted. The new material is extracted and compared
       against the existing translation. Parts that did not change are found, and the previous translation is
       used. Parts that were partially modified are also connected to the previous translation, but with a
       specific marker indicating that the translation must be updated. The bottom of the figure shows how a
       formatted document is built.

       Actually,  as a translator, the only manual operation you have to do is the part marked {manual editing}.
       Yeah, I'm sorry, but po4a helps you translate.  It does not translate anything for you...

   HOWTO begin a new translation?
       This section presents the steps required to begin a new translation with po4a. The refinements involved
       in converting an existing project to this system are detailed in the relevant section.

       To begin a new translation using po4a, you have to do the following steps:

       - Extract the text that has to be translated from the original <master.doc> document into a new
         translation template <translation.pot> file (the gettext format). For that, use the po4a-gettextize
         program this way:

           $ po4a-gettextize -f <format> -m <master.doc> -p <translation.pot>

         <format>  is  naturally  the  format used in the master.doc document. As expected, the output goes into
         translation.pot.  Please refer to po4a-gettextize(1) for more details about the existing options.

       - Actually translate what should be translated. For that, you have to rename the POT file, for example
         to doc.XX.po (where XX is the ISO 639-1 code of the language you are translating to, e.g. fr for
         French), and edit the resulting file. It is often a good idea not to name the file XX.po, to avoid
         confusion with the translation of the program messages, but this is your call. Don't forget to update
         the PO file headers; they are important.

         The actual translation can be done using the Emacs or Vi PO mode, Lokalize (KDE based), Gtranslator
         (GNOME based) or whichever program you prefer (e.g. Virtaal).

         If you wish to learn more about this, you definitely need to refer to the gettext documentation,
         available in the gettext-doc package.
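       The two steps above can be scripted if you start many translations. The following Python sketch is
       only an illustration, not part of po4a: the helper names gettextize_cmd and start_translation are
       hypothetical, only the -f, -m and -p options shown above are used, and it assumes po4a-gettextize is on
       your PATH when run=True.

```python
from __future__ import annotations

import shutil
import subprocess


def gettextize_cmd(fmt: str, master: str, pot: str) -> list[str]:
    """Build the po4a-gettextize command line shown above (hypothetical helper)."""
    return ["po4a-gettextize", "-f", fmt, "-m", master, "-p", pot]


def start_translation(fmt: str, master: str, lang: str,
                      pot: str = "translation.pot", run: bool = False) -> str:
    """Create the POT file from master, then copy it to doc.<lang>.po.

    Returns the PO file name. With run=False (the default) nothing is
    executed, so the planned steps can be checked first.
    """
    po = f"doc.{lang}.po"  # e.g. doc.fr.po, using the ISO 639-1 code
    if run:
        # Assumes po4a-gettextize is on the PATH; check=True raises on failure.
        subprocess.run(gettextize_cmd(fmt, master, pot), check=True)
        # The PO file starts as a plain copy of the template.
        shutil.copyfile(pot, po)
    return po
```

       The copied file still has a template header, so the PO header fields must be filled in by hand.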

   HOWTO change the translation back to a documentation file?
       Once you're done with the translation, you want to get the translated documentation and distribute it to
       users along with the original one. For that, use the po4a-translate(1) program as follows (where XX is
       the language code):

         $ po4a-translate -f <format> -m <master.doc> -p <doc.XX.po> -l <XX.doc>

       As  before,  <format> is the format used in the master.doc document.  But this time, the PO file provided
       with the -p flag is part of the input.  This is your translation. The output goes into XX.doc.

       Please refer to po4a-translate(1) for more details.

   HOWTO update a po4a translation?
       To update your translation when the original master.doc file has changed, use the po4a-updatepo(1)
       program as follows:

         $ po4a-updatepo -f <format> -m <new_master.doc> -p <old_doc.XX.po>

       (Please refer to po4a-updatepo(1) for more details)

       Naturally, new paragraphs in the document won't get magically translated in the PO file by this
       operation, and you'll need to update the PO file manually. Likewise, you may have to rework the
       translation of paragraphs that were slightly modified. To make sure you won't miss any of them, they are
       marked as "fuzzy" during the process, and you have to remove this marker before the translation can be
       used by po4a-translate. As for the initial translation, your favorite PO editor is the best tool here.

       Once  your PO file is up-to-date again, without any untranslated or fuzzy string left, you can generate a
       translated documentation file, as explained in the previous section.
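       Before running po4a-translate, it helps to check that no fuzzy or untranslated entry is left. gettext's
       own "msgfmt --statistics" reports these counts; the Python sketch below only illustrates what that check
       means. The function names are hypothetical, and the parser is deliberately simplified: it assumes
       entries are separated by blank lines, skips obsolete (#~) entries and ignores plural forms.

```python
from __future__ import annotations


def _unquote(s: str) -> str:
    """Strip the surrounding double quotes of a PO string literal."""
    s = s.strip()
    return s[1:-1] if len(s) >= 2 and s[0] == s[-1] == '"' else s


def po_stats(po_text: str) -> dict[str, int]:
    """Count translated, fuzzy and untranslated entries (simplified parser)."""
    stats = {"translated": 0, "fuzzy": 0, "untranslated": 0}
    for block in po_text.strip().split("\n\n"):
        lines = [l.strip() for l in block.splitlines() if l.strip()]
        if not lines or all(l.startswith("#~") for l in lines):
            continue  # empty block or obsolete entry
        msgid: list[str] = []
        msgstr: list[str] = []
        current = None
        fuzzy = False
        for line in lines:
            if line.startswith("#,") and "fuzzy" in line:
                fuzzy = True
            elif line.startswith("msgid "):
                current = msgid
                current.append(_unquote(line[len("msgid "):]))
            elif line.startswith("msgstr "):
                current = msgstr
                current.append(_unquote(line[len("msgstr "):]))
            elif line.startswith('"') and current is not None:
                current.append(_unquote(line))  # continuation line
        if not "".join(msgid):
            continue  # the header entry has an empty msgid
        if fuzzy:
            stats["fuzzy"] += 1
        elif not "".join(msgstr):
            stats["untranslated"] += 1
        else:
            stats["translated"] += 1
    return stats


def ready_for_translate(po_text: str) -> bool:
    """True when no fuzzy or untranslated entry is left."""
    stats = po_stats(po_text)
    return stats["fuzzy"] == 0 and stats["untranslated"] == 0
```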

   HOWTO convert a pre-existing translation to po4a?
       Often, you were happily translating the document manually until a major reorganization of the original
       master.doc document happened. Then, after some unpleasant attempts with diff or similar tools, you want
       to convert to po4a. But of course, you don't want to lose your existing translation in the process.
       Don't worry, this case is also handled by the po4a tools and is called gettextization.

       The key here is to have the same structure in the translated document and in the original one so that the
       tools can match the content accordingly.

       If you are lucky (i.e., if the structures of both documents perfectly match), it will work seamlessly and
       you will be set in a few seconds. Otherwise, you may understand why this process has such an ugly name,
       and you'd better be prepared for some grunt work here. In any case, remember that it is the price to pay
       to get the comfort of po4a afterward. And the good point is that you have to do so only once.

       I cannot emphasize this too much. In order to ease the process, it is important that you find the exact
       version of the original that was used to do the translation. The best situation is when you noted down
       the VCS revision used for the translation and you didn't modify it in the translation process, so that
       you can use it.

       It won't work well when you use the updated original text with the old translation. It remains  possible,
       but  is  harder  and  really should be avoided if possible. In fact, I guess that if you fail to find the
       original text again, the best solution is to find someone to do the gettextization for you (but,  please,
       not me ;).

       Maybe I'm too dramatic here. Even when things go wrong, it is still way faster than translating
       everything again. I was able to gettextize the existing French translation of the Perl documentation in
       one day, even though things did go wrong. That was more than two megabytes of text, and a new
       translation would have taken months or more.

       Let me explain the basis of the procedure first, and I will come back to hints for achieving it when the
       process goes wrong. To ease comprehension, let's use the above example once again.

       Once you have the old master.doc again, which matches the translation XX.doc, the gettextization can be
       done directly to the PO file doc.XX.po without manually translating the translation.pot file:

        $ po4a-gettextize -f <format> -m <old_master.doc> -l <XX.doc> -p <doc.XX.po>

       When you're lucky, that's it. You converted your old translation to po4a and can begin with the updating
       task right away. Just follow the procedure explained a few sections ago to synchronize your PO file with
       the newest original document, and update the translation accordingly.
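       The gettextization call differs from a fresh start only by the -l option that feeds in the existing
       translation. The Python sketch below wraps it; the helper names are hypothetical, and three details are
       assumptions rather than documented behavior: po4a-gettextize is on the PATH, it exits with a non-zero
       status when the structures do not match, and gettextization.failed.po is written to the current
       directory.

```python
from __future__ import annotations

import os
import subprocess


def gettextize_existing_cmd(fmt: str, old_master: str,
                            translation: str, po: str) -> list[str]:
    """Build the po4a-gettextize call pairing an old original with its translation."""
    return ["po4a-gettextize", "-f", fmt, "-m", old_master,
            "-l", translation, "-p", po]


def gettextize_existing(fmt: str, old_master: str,
                        translation: str, po: str) -> bool:
    """Run the conversion and report whether it succeeded (see assumptions above)."""
    cmd = gettextize_existing_cmd(fmt, old_master, translation, po)
    if subprocess.run(cmd).returncode == 0:
        return True
    # Assumed location of the partial result dumped on failure.
    if os.path.exists("gettextization.failed.po"):
        print("Structures differ; inspect gettextization.failed.po")
    return False
```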

       Please note that even when things seem to work properly, there is still room for errors in this process.
       The point is that po4a is unable to understand the text to make sure that the translation matches the
       original. That's why all strings are marked as "fuzzy" in the process. You should check each of them
       carefully before removing those markers.

       Often  the  document  structures  don't  match  exactly,  preventing  po4a-gettextize  from doing its job
       properly. At that point, the whole game is about editing the files to get their damn structures matching.

       It may help to read the section Gettextization: how does it  work?  below.   Understanding  the  internal
       process  will  help you to make this work. The good point is that po4a-gettextize is rather verbose about
       what went wrong when it happens. First, it pinpoints where in the documents the structures' discrepancies
       are. You will learn the strings that don't match, their positions in the text, and the type  of  each  of
       them. Moreover, the PO file generated so far will be dumped to gettextization.failed.po.

       -   Remove all extra parts of the translations, such as the section in which you give the translator name
           and thank everyone who contributed to the translation. Addenda, which are described in the next
           section, will allow you to re-add them afterward.

       -   Do not hesitate to edit both the original and the translation. The most important thing is to get the
           PO file. You will be able to update it afterward. That being said, editing the translation should  be
           preferred when both are possible since it makes things easier when the gettextization is done.

       -   If needed, kill some parts of the original if they happen not to be translated. When synchronizing
           the PO with the document afterward, they will come back by themselves.

       -   If you changed the structure a bit (to merge two  paragraphs,  or  split  another  one),  undo  those
           changes.  If  there are issues in the original, you should inform the original author. Fixing them in
           your translation only fixes them for a part of the community.  And  moreover,  it's  impossible  when
           using po4a ;)

       -   Sometimes, the paragraph content does match, but their types don't. Fixing it is rather
           format-dependent. In POD and man, it often comes from the fact that one of the two contains a line
           beginning with a white space where the other doesn't. In those formats, such paragraphs cannot be
           wrapped and thus become a different type. Just remove the space and you are fine. It may also be a
           typo in the tag name.

           Likewise, two paragraphs may get merged together in  POD  when  the  separating  line  contains  some
           spaces, or when there is no empty line between the =item line and the content of the item.

       -   Sometimes, there is a desynchronization between the files, and the translation is attached to the
           wrong original paragraph. It is the sign that the real problem lies earlier in the files. Check
           gettextization.failed.po to see where the desynchronization begins, and fix it there.

       -   Sometimes, you get the strong feeling that po4a ate some parts of the text, either the original or
           the translation. gettextization.failed.po indicates that both of them were gently matching, and then
           the gettextization fails because it tried to match one paragraph with the one after (or before) the
           right one, as if the right one disappeared. Curse po4a as I did when it first happened to me.
           Generously.

           This unfortunate situation happens when the same paragraph is repeated over  the  document.  In  that
           case,  no  new  entry  is  created  in  the PO file, but a new reference is added to the existing one
           instead.

           So, when the same paragraph appears twice in the original but the two occurrences are not translated
           in exactly the same way, you will get the feeling that a paragraph of the original disappeared. Just
           kill the new translation. If you prefer to kill the first translation instead, when the second one was
           actually better, replace the first one with the second.

           Conversely, if two similar but different paragraphs were translated in exactly the same way, you will
           get the feeling that a paragraph of the translation disappeared. A solution is to add a stupid string
           to the original paragraph (such as "I'm different"). Don't be afraid, those things will disappear
           during the synchronization, and when the added text is short enough, gettext will match your
           translation to the existing text (marking it as fuzzy, but you don't really care since all strings
           are fuzzy after gettextization).
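       Several of the tips above come down to spotting a few patterns in the source files: lines starting with
       white space, separator lines that contain only spaces, and paragraphs repeated verbatim. The Python
       sketch below is a hypothetical helper, not part of po4a, that flags them so you can run it on both the
       original and the translation and compare. It approximates paragraphs as text separated by truly empty
       lines, which mirrors the POD behavior described above.

```python
from __future__ import annotations

from collections import Counter


def structure_hints(text: str) -> dict[str, list]:
    """Flag common causes of paragraph mismatches in POD or man sources."""
    lines = text.split("\n")
    # Lines starting with white space become non-wrappable paragraphs (1-based numbers).
    indented = [i + 1 for i, l in enumerate(lines)
                if l[:1] in (" ", "\t") and l.strip()]
    # Whitespace-only lines do not separate paragraphs, so neighbors merge.
    whitespace_only = [i + 1 for i, l in enumerate(lines) if l and not l.strip()]
    # Identical paragraphs end up as a single PO entry with several references.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    duplicates = [p for p, n in Counter(paragraphs).items() if n > 1]
    return {"indented": indented, "whitespace_only": whitespace_only,
            "duplicates": duplicates}
```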

       Hopefully, those tips will help you make your gettextization work and obtain your precious PO file. You
       are now ready to synchronize your file and begin your translation. Please note that on large texts, the
       first synchronization may take a long time.

       For example, the first po4a-updatepo of the Perl documentation's French translation (5.5 MB PO file)
       took about two full days on a 1 GHz G5 computer. Yes, 48 hours. But the subsequent ones only take a
       dozen seconds on my old laptop. This is because the first time, most of the msgids of the PO file don't
       match any of the POT file ones. This forces gettext to search for the closest one using a costly string
       proximity algorithm.
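       To see why the first run is so slow, consider what the synchronization has to do for each new msgid.
       The Python sketch below uses difflib as a stand-in for gettext's own fuzzy matching (msgmerge uses a
       different algorithm); the function name fuzzy_merge is hypothetical. Exact matches are cheap dictionary
       lookups, but every unmatched msgid triggers a scan of all old msgids, so a mostly unmatched file costs
       roughly len(new) * len(old) string comparisons.

```python
from __future__ import annotations

import difflib


def fuzzy_merge(old: dict[str, str], new_msgids: list[str],
                cutoff: float = 0.6) -> dict[str, tuple[str, bool]]:
    """Map each new msgid to (translation, is_fuzzy)."""
    result: dict[str, tuple[str, bool]] = {}
    for msgid in new_msgids:
        if msgid in old:
            # Unchanged paragraph: reuse the translation as-is (cheap lookup).
            result[msgid] = (old[msgid], False)
            continue
        # Costly path: compare against every old msgid to find the closest one.
        close = difflib.get_close_matches(msgid, old.keys(), n=1, cutoff=cutoff)
        if close:
            # Similar paragraph: reuse its translation, marked fuzzy.
            result[msgid] = (old[close[0]], True)
        else:
            # New paragraph: left untranslated.
            result[msgid] = ("", False)
    return result
```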

   HOWTO add extra text to translations (like translator's name)?
       Because of the gettext approach, doing this becomes more difficult in po4a than it was when simply
       editing a new file alongside the original one. But it remains possible, thanks to the so-called addenda.

       It may help the comprehension to consider addenda as a sort of patch applied to the localized document
       after processing. They are rather different from the usual patches (they have only one line of context,
       which can embed Perl regular expressions, and they can only add new text without removing any), but the
       purpose is the same.

       Their goal is to allow the translator to add extra content to the document which is not translated from
       the original document. The most common usage is to add a section about the translation itself, listing
       contributors and explaining how to report bugs against the translation.

       An addendum must be provided as a separate file. The first line constitutes a header indicating where in
       the produced document it should be placed. The rest of the addendum file will be added verbatim at the
       determined position of the resulting document.

       The header has a pretty rigid syntax: it must begin with the string PO4A-HEADER:, followed by a
       semicolon-separated (;) list of key=value fields. White space IS important. Note that you cannot use the
       semicolon character (;) in a value, and that quoting it doesn't help.
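       To make the header syntax concrete, here is a Python sketch of how an addendum file could be split into
       its header fields and its verbatim body. It is not po4a's own parser; read_addendum is a hypothetical
       name, and the way white space is handled (blanks before a key ignored, values kept exactly as written,
       keys compared case-insensitively) is an assumption of this sketch. The mandatory-key checks follow the
       rules listed below.

```python
from __future__ import annotations


def read_addendum(text: str) -> tuple[dict[str, str], str]:
    """Split an addendum into its header fields and its verbatim body."""
    header, _, body = text.partition("\n")
    prefix = "PO4A-HEADER:"
    if not header.startswith(prefix):
        raise ValueError("first line must start with PO4A-HEADER:")
    fields: dict[str, str] = {}
    # Values cannot contain ';' since it always separates fields.
    for part in header[len(prefix):].split(";"):
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"syntax error in header field: {part!r}")
        # Assumption: blanks around the key are ignored, the value is kept as-is.
        fields[key.strip().lower()] = value
    for required in ("mode", "position"):
        if required not in fields:
            raise ValueError(f"missing mandatory key: {required}")
    if fields["mode"] == "after" and not (
            "beginboundary" in fields or "endboundary" in fields):
        raise ValueError("mode=after needs beginboundary or endboundary")
    return fields, body
```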

       Again, it sounds scary, but the examples given below should help you figure out how to write the header
       line you need. To illustrate the discussion, assume we want to add a section called "About this
       translation" after the "About this document" one.

       Here are the possible header keys:

       position (mandatory)
           a Perl regexp. The addendum will be placed near the line matching this regexp. Note that we're
           speaking about the translated document here, not the original. If more than one line matches this
           expression (or none does), the addition will fail. It is indeed better to report an error than to
           insert the addendum at the wrong location.

           This line is called the position point in the following. The point where the addendum is added is
           called the insertion point. Those two points are near each other, but not equal. For example, if you
           want to insert a new section, it is easier to put the position point on the title of the preceding
           section and tell po4a where the section ends (remember that the position point is given by a regexp
           which should match a unique line).

           The location of the insertion point with regard to the position point is controlled by the mode,
           beginboundary and endboundary fields, as explained below.

           In our case, we would have:

                position=<title>About this document</title>

       mode (mandatory)
           It can be either the string before or after, specifying the position of the addendum relative to the
           position point. If before is given, the insertion point will be placed exactly before the position
           point. The after behavior is detailed below.

           Since we want the new section to be placed below the one we are matching, we have:

                mode=after

       beginboundary (used only when mode=after, and mandatory in that case)
       endboundary (idem)
           regexp matching the end of the section after which the addendum goes.

           When mode=after, the insertion point is after the position point,  but  not  directly  after!  It  is
           placed  at  the  end  of  the section beginning at the position point, i.e., after or before the line
           matched by the ???boundary argument, depending on whether you used beginboundary or endboundary.

           In our case, we can choose to indicate the end of the section we match by adding:

              endboundary=</section>

           or to indicate the beginning of the next section by indicating:

              beginboundary=<section>

           In both cases, our addendum will be placed after the </section> and before the <section>. The first
           one is better since it will work even if the document gets reorganized.

           Both forms exist because documentation formats are different. In some of them, there is a way to mark
           the end of a section (just like the </section> we just used), while others don't explicitly mark the
           end of a section (as in man). In the former case, you want to make a boundary matching the end of a
           section, so that the insertion point comes after it. In the latter case, you want to make a boundary
           matching the beginning of the next section, so that the insertion point comes just before it.

       This may seem obscure, but hopefully the next examples will enlighten you.

       To sum up the example used so far: to add a section called "About this translation" after the
       "About this document" one in an SGML document, you can use either of these header lines:

          PO4A-HEADER: mode=after; position=About this document; endboundary=</section>
          PO4A-HEADER: mode=after; position=About this document; beginboundary=<section>
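       A header line is a list of key=value pairs separated by semicolons. The Perl sketch below shows one
       way to read such a line. parse_po4a_header() is a hypothetical helper written for this illustration,
       not part of the po4a API, and po4a's own parser may differ in details such as whitespace handling.
       It splits each pair on its first "=" only, since boundary regexps such as ^=head1 contain "=".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of po4a): turn a PO4A-HEADER line into a
# hash of options and check the constraints described in this section.
sub parse_po4a_header {
    my ($line) = @_;
    $line =~ s/^\s*PO4A-HEADER:\s*//
        or die "not a PO4A-HEADER line: $line\n";

    my %opt;
    for my $pair (split /;/, $line) {
        # Split on the first '=' only: values such as "^=head1" contain '='.
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && $key =~ /\S/;
        die "no value given for '$key'\n" unless defined $value;
        s/^\s+|\s+$//g for $key, $value;    # trim surrounding blanks
        $opt{$key} = $value;
    }

    die "mode must be 'before' or 'after'\n"
        unless defined $opt{mode} && $opt{mode} =~ /^(?:before|after)$/;
    die "position is missing\n" unless defined $opt{position};
    die "mode=after needs a beginboundary or an endboundary\n"
        if $opt{mode} eq 'after'
        && !defined $opt{beginboundary} && !defined $opt{endboundary};
    return \%opt;
}

my $h = parse_po4a_header(
    'PO4A-HEADER: mode=after; position=About this document; endboundary=</section>');
print "$_ => $h->{$_}\n" for sort keys %$h;
```

       Checking for a missing boundary up front mirrors the rule that a boundary is mandatory with
       mode=after; treating mode and position as required is an assumption of this sketch.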

       If you want to add something after the following nroff section:

          .SH "AUTHORS"

       you should put a position matching this line, and a beginboundary matching the beginning of the next
       section (i.e., ^\.SH). The addendum will then be added after the position point and immediately before
       the first line matching the beginboundary. That is to say:

          PO4A-HEADER:mode=after;position=AUTHORS;beginboundary=^\.SH

       If you want to add something into a section (like after "Copyright Big Dude") instead of adding a whole
       section, give a position matching this line, and give a beginboundary matching any line.

          PO4A-HEADER:mode=after;position=Copyright Big Dude, 2004;beginboundary=^

       If you want to add something at the end of the document, give a position matching any line of your
       document (but only one line: po4a won't proceed if the match is not unique), and give a boundary
       matching nothing. Don't use simple strings like "EOF" here; prefer strings that are unlikely to
       appear in your document.

          PO4A-HEADER:mode=after;position=<title>About</title>;beginboundary=FakePo4aBoundary

       In any case, remember that these are regexps. For example, if you want to match the end of an nroff
       section ending with the line

         .fi

       don't use .fi as endboundary, because it will also match "the[ fi]le", which is obviously not what
       you expect. The correct endboundary in that case is: ^\.fi$.
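       The pitfall is easy to reproduce, since po4a is written in Perl and these are Perl regexps. In the
       following sketch, the three nroff lines are made up for illustration: the unanchored .fi also
       matches the line of ordinary text, while ^\.fi$ matches only the .fi request itself.

```perl
use strict;
use warnings;

# Made-up nroff fragment: a no-fill block around one line of ordinary text.
my @lines = ('.nf', 'Edit the file by hand.', '.fi');

# '.' is a metacharacter: /.fi/ matches any character followed by "fi",
# such as the " fi" inside "the file".
my @naive_hits = grep { /.fi/ } @lines;

# Escaped dot plus anchors: only a line consisting exactly of ".fi".
my @strict_hits = grep { /^\.fi$/ } @lines;

print "naive:  ", join(' | ', @naive_hits),  "\n";
print "strict: ", join(' | ', @strict_hits), "\n";
```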

       If the addendum doesn't go where you expected, try passing the -vv argument to the tools, so that they
       explain what they do while placing the addendum.

       More detailed example

       Original document (POD formatted):

        |=head1 NAME
        |
        |dummy - a dummy program
        |
        |=head1 AUTHOR
        |
        |me

       The following addendum will then ensure that a section (in French) about the translator is added at
       the end of the file (in French, "TRADUCTEUR" means "TRANSLATOR", and "moi" means "me"). The position
       refers to the translated document, which is why it reads AUTEUR rather than AUTHOR:

        |PO4A-HEADER:mode=after;position=AUTEUR;beginboundary=^=head
        |
        |=head1 TRADUCTEUR
        |
        |moi
        |

       In order to put your addendum before the AUTHOR section, use the following header:

        PO4A-HEADER:mode=after;position=NOM;beginboundary=^=head1

       This works because the next line matching the beginboundary /^=head1/ after the section "NAME"
       (translated to "NOM" in French) is the one declaring the authors. So the addendum will be put between
       both sections. Note that if another section is later added between the NAME and AUTHOR sections, this
       example breaks: the addendum will then be added before the newly added section.

       To avoid this, you can achieve the same result using mode=before:

        PO4A-HEADER:mode=before;position=^=head1 AUTEUR

   HOWTO do all this in one program invocation?
       The use of po4a proved to be a bit error-prone for users, since you have to call two different
       programs in the right order (po4a-updatepo and then po4a-translate), each of them needing more than 3
       arguments. Moreover, with this system it was difficult to use only one PO file for all your documents
       when more than one format was used.

       The  po4a(1)  program  was  designed  to  solve those difficulties. Once your project is converted to the
       system, you write a simple configuration file explaining where your translation files are (PO  and  POT),
       where the original documents are, their formats and where their translations should be placed.

       Then, calling po4a(1) on this file ensures that the PO files are synchronized against the original
       documents, and that the translated documents are generated properly. Of course, you will want to call
       this program twice: once before editing the PO files, to update them, and once afterward, to get
       completely updated translated documents. But you only need to remember one command line.
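       As a sketch, assuming a project with one POD document and one man page translated into French and
       German (all file names here are hypothetical), the configuration file could look like this. The
       [po4a_langs], [po4a_paths] and [type: ...] lines follow the syntax described in po4a(1), which remains
       the reference for your version:

```
[po4a_langs] fr de
[po4a_paths] doc/po/mydoc.pot $lang:doc/po/$lang.po

[type: pod] doc/mydoc.pod $lang:doc/$lang/mydoc.pod
[type: man] doc/mydoc.1   $lang:doc/$lang/mydoc.1
```

       With this file, running po4a doc/po4a.cfg updates the POT and PO files and then generates the
       translated documents; by default, only sufficiently translated documents are written (see the --keep
       option in po4a(1)).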

   HOWTO customize po4a?
       po4a modules have options (specified with the -o option) that can be used to change the module behavior.

       You can also edit the source code of the existing modules or even write your own modules. To make them
       visible to po4a, copy your modules into a path called "/bli/blah/blu/lib/Locale/Po4a/" and then add
       the path "/bli/blah/blu/lib" to the "PERLLIB" or "PERL5LIB" environment variable. For example:

          PERLLIB=$PWD/lib po4a --previous po4a/po4a.cfg

       Note: the actual name of the lib directory is not important.

How does it work?

       This chapter gives you a brief overview of the po4a internals, so that you may feel more confident to
       help us maintain and improve it. It may also help you understand why it does not do what you
       expected, and how to solve your problems.

   What's the big picture here?
       The po4a architecture is object-oriented (in Perl. Isn't that neat?). The common ancestor of all parser
       classes is called TransTractor. This strange name comes from the fact that it is in charge of both
       translating the document and extracting strings.

       More formally, it takes as input a document to translate plus a PO file containing the translations to
       use, and produces two separate outputs: another PO file (resulting from the extraction of translatable
       strings from the input document), and a translated document (with the same structure as the input one,
       but with all translatable strings replaced with content of the input PO). Here is a graphical
       representation of this:

          Input document --\                             /---> Output document
                            \      TransTractor::       /       (translated)
                             +-->--   parse()  --------+
                            /                           \
          Input PO --------/                             \---> Output PO
                                                                (extracted)

       This simple scheme is the backbone of the whole po4a architecture. If you omit the input PO and the
       output document, you get po4a-gettextize. If you provide both inputs and disregard the output PO, you
       get po4a-translate.
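       To make this data flow concrete, here is a toy model in Perl. ToyTransTractor is a hypothetical class
       written for this illustration, not the real Locale::Po4a::TransTractor: the document is already split
       into paragraphs, the input PO is a plain hash, and the output PO is a list of extracted msgids. With
       an empty input PO, only the extracted strings are of interest (the po4a-gettextize case above); with
       a filled one, the translated document is what you keep (the po4a-translate case).

```perl
use strict;
use warnings;

# Toy model of the TransTractor data flow, NOT the real
# Locale::Po4a::TransTractor class: the document is a list of
# paragraphs, the input PO a hash (msgid => msgstr), and the output
# PO a list of extracted msgids.
package ToyTransTractor;

sub new {
    my ($class, %args) = @_;
    return bless {
        doc_in  => [ @{ $args{doc} } ],   # input document
        po_in   => $args{po} // {},       # input PO
        doc_out => [],                    # output document (translated)
        po_out  => [],                    # output PO (extracted)
    }, $class;
}

# Record the string for the output PO and return its translation from the
# input PO, falling back to the original when no translation is known.
sub translate {
    my ($self, $string) = @_;
    push @{ $self->{po_out} }, $string;
    my $msgstr = $self->{po_in}{$string};
    return (defined $msgstr && length $msgstr) ? $msgstr : $string;
}

# parse(): here every paragraph is translatable, so each one goes
# through translate() and its result lands in the output document.
sub parse {
    my ($self) = @_;
    push @{ $self->{doc_out} }, $self->translate($_) for @{ $self->{doc_in} };
    return;
}

package main;

my $tt = ToyTransTractor->new(
    doc => [ 'Hello.', 'Goodbye.' ],
    po  => { 'Hello.' => 'Bonjour.' },
);
$tt->parse;
print "output document: @{ $tt->{doc_out} }\n";
print "output PO msgids: @{ $tt->{po_out} }\n";
```

       A single call to translate() feeds both outputs at once, which is the point of the diagram: the
       untranslated "Goodbye." still appears in the output PO, ready for a translator.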

       TransTractor::parse() is a virtual function implemented by each module. Here is a little example to  show
       you how it works. It parses a list of paragraphs, each of them beginning with <p>.

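         1 sub parse {
         2   my $self = shift;
         3   PARAGRAPH: while (1) {
         4     my ($paragraph, $pararef) = ("", "");
         5     my $first = 1;
         6     my ($line, $lref) = $self->shiftline();
         7     while (defined($line)) {
         8       if ($line =~ m/<p>/ && !$first--) {
         9         # Second <p>: give this line back to the input document
        10         $self->unshiftline($line, $lref);
        11
        12         # Strip the leading tag, then push it with the translation
        13         $paragraph =~ s/^<p>//s;
        14         $self->pushline("<p>" . $self->translate($paragraph, $pararef, "p"));
        15
        16         next PARAGRAPH;
        17       } else {
        18         # Accumulate the line; remember where the paragraph started
        19         $paragraph .= $line;
        20         $pararef = $lref unless length($pararef);
        21       }
        22       ($line, $lref) = $self->shiftline();
        23     }
        24     # End of input: flush the last paragraph so that it is not lost
        25     if (length($paragraph)) {
        26       $paragraph =~ s/^<p>//s;
        27       $self->pushline("<p>" . $self->translate($paragraph, $pararef, "p"));
        28     }
        29     return;
        30   }
        31 }

       On line 8, we encounter <p> for the second time. That's the signal that the next paragraph begins. We
       thus put the line just obtained back into the input document (line 10) and push the paragraph built so
       far into the outputs: after removing its leading <p> on line 13, we push the concatenation of this tag
       with the translation of the rest of the paragraph (line 14). When the input runs out, lines 25 to 28
       do the same for the last paragraph, which would otherwise be lost.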

       This translate() function is very cool. It pushes its argument into the output PO file (extraction) and
       returns its translation as found in the input PO file (translation). Since it's used as part of the
       argument of pushline(), this translation lands in the output document.

       Isn't that cool? It is possible to build a complete po4a module in less than 20 lines when the format  is
       simple enough...

       You can learn more about this in Locale::Po4a::TransTractor(3pm).

   Gettextization: how does it work?
       The idea here is to take the original document and its translation, and to say that the Nth extracted
       string from the translation is the translation of the Nth extracted string from the original. For this
       to work, both files must share exactly the same structure. For example, if the files have the following
       structure, it is very unlikely that the 4th string in the translation (of type 'chapter') is the
       translation of the 4th string in the original (of type 'paragraph').

           Original         Translation

         chapter            chapter
           paragraph          paragraph
           paragraph          paragraph
           paragraph        chapter
         chapter              paragraph
           paragraph          paragraph

       For that, po4a parsers are used on both the original and the translation files to extract PO files, and
       then a third PO file is built from them, taking strings from the second as translations of strings from
       the first. In order to check that the strings we put together are actually translations of each other,
       document parsers in po4a should record the syntactical type of each extracted string (all existing ones
       do so, and yours should too). This information is then used to make sure that both documents have the
       same syntax. In the previous example, it would allow us to detect that string 4 is a paragraph in one
       case and a chapter title in the other, and to report the problem.

       In theory, it would be possible to detect the problem and resynchronize the files afterward (just like
       diff does). But it is not clear what to do with the few strings before the desynchronization, and it
       would sometimes produce bad results. That's why the current implementation doesn't try to resynchronize
       anything and fails verbosely when something goes wrong, requiring manual modification of the files to
       fix the problem.

       Even with these precautions, things can go wrong very easily here. That's why  all  translations  guessed
       this way are marked fuzzy to make sure that the translator reviews and checks them.
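       The pairing and the type check can be sketched in Perl as follows. The function gettextize() and its
       data layout (each extracted string as a [type, text] pair, in document order) are assumptions made for
       this illustration; the real po4a-gettextize works on the PO files produced by the parsers. The sample
       data reproduces the structure of the table above.

```perl
use strict;
use warnings;

# Hypothetical sketch: pair the Nth string of the translation with the Nth
# string of the original, refusing to go on when the structures differ.
# Each string is a [type, text] pair, listed in document order.
sub gettextize {
    my ($orig, $trans) = @_;
    my $common = @$orig < @$trans ? scalar @$orig : scalar @$trans;
    my @entries;
    for my $i (0 .. $common - 1) {
        my ($otype, $otext) = @{ $orig->[$i] };
        my ($ttype, $ttext) = @{ $trans->[$i] };
        die sprintf("string %d is a %s in the original but a %s in the translation\n",
                    $i + 1, $otype, $ttype)
            unless $otype eq $ttype;
        # Pairing by position is only a guess, so every entry is marked fuzzy.
        push @entries, { msgid => $otext, msgstr => $ttext, fuzzy => 1 };
    }
    die sprintf("the original has %d strings, the translation %d\n",
                scalar @$orig, scalar @$trans)
        unless @$orig == @$trans;
    return \@entries;
}

# Same structure as the table above.
my @orig  = ([chapter => 'C1'], [paragraph => 'P1'], [paragraph => 'P2'],
             [paragraph => 'P3'], [chapter => 'C2'], [paragraph => 'P4']);
my @trans = ([chapter => 'C1-fr'], [paragraph => 'P1-fr'], [paragraph => 'P2-fr'],
             [chapter => 'C2-fr'], [paragraph => 'P4-fr'], [paragraph => 'P5-fr']);

my $entries = eval { gettextize(\@orig, \@trans) };
print defined $entries ? "paired " . @$entries . " strings\n" : "error: $@";
```

       Like the implementation described above, the sketch stops at the first mismatch instead of trying
       to resynchronize.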

   Addendum: How does it work?
       Well,  that's  pretty  easy  here.  The  translated document is not written directly to disk, but kept in
       memory until all the addenda are applied. The algorithms involved here  are  rather  straightforward.  We
       look  for a line matching the position regexp, and insert the addendum before it if we're in mode=before.
       If not, we search for the next line matching the boundary and insert the addendum after this line if it's
       an endboundary or before this line if it's a beginboundary.
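       A minimal Perl sketch of these rules, operating on the translated document held in memory as an array
       of lines. insert_addendum() is a hypothetical function written for this illustration, not po4a's actual
       code. Two details are assumptions: the boundary search starts on the line after the position match (so
       that beginboundary=^ places the addendum right after that line), and when no line matches the boundary,
       the addendum is appended at the end, which is what the end-of-document trick described earlier relies
       on.

```perl
use strict;
use warnings;

# Hypothetical sketch of addendum placement. $doc is the translated
# document as an array ref of lines; %h holds the header options.
sub insert_addendum {
    my ($doc, $addendum, %h) = @_;

    # The position regexp must match exactly one line.
    my @hits = grep { $doc->[$_] =~ /$h{position}/ } 0 .. $#$doc;
    die "position matches " . scalar(@hits) . " lines, expected 1\n"
        unless @hits == 1;
    my $at = $hits[0];

    if ($h{mode} eq 'before') {
        splice @$doc, $at, 0, @$addendum;    # just before the position line
        return;
    }

    # mode=after: look for the next line matching the boundary.
    my $is_end = defined $h{endboundary};
    my $re     = $is_end ? $h{endboundary} : $h{beginboundary};
    my $i      = $at + 1;
    $i++ while $i <= $#$doc && $doc->[$i] !~ /$re/;

    # endboundary: insert after the matched line; beginboundary: before it.
    # If nothing matched, $i is already past the last line: append.
    $i++ if $is_end && $i <= $#$doc;
    splice @$doc, $i, 0, @$addendum;
    return;
}

# The POD example shown earlier, after translation to French.
my @doc = ('=head1 NOM', '', 'dummy - un programme factice', '',
           '=head1 AUTEUR', '', 'moi');
insert_addendum(\@doc, ['=head1 TRADUCTEUR', '', 'moi', ''],
                mode => 'after', position => 'NOM', beginboundary => '^=head1');
print "$_\n" for @doc;
```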

FAQ

       This chapter groups the Frequently Asked Questions. In fact, most of the questions for now could be
       phrased this way: "Why is it designed this way, and not that one?" If you think po4a isn't the right
       answer to documentation translation, you should consider reading this section. If it does not answer your
       question, please contact us on the <po4a-devel@lists.alioth.debian.org> mailing list. We love feedback.

   Why translate each paragraph separately?
       Yes, in po4a, each paragraph is translated separately  (in  fact,  each  module  decides  this,  but  all
       existing modules do so, and yours should also).  There are two main advantages to this approach:

       • When the technical parts of the document are hidden from the scene, the translator can't mess with
         them. The fewer markers we present to the translator, the fewer errors they can make.

       • Cutting the document helps in isolating the changes to the original  document.  When  the  original  is
         modified, finding what parts of the translation need to be updated is eased by this process.
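       A toy illustration of that isolation in Perl: the document is split on blank lines (a simplification,
       since each real po4a module decides how to split its own format), and each paragraph is looked up in a
       hash standing in for the existing PO file. Only the edited paragraph comes out as needing work. This
       sketch uses exact matching only; gettext's msgmerge, which po4a relies on to update PO files, can also
       propose fuzzy matches for slightly modified paragraphs.

```perl
use strict;
use warnings;

# Split text into paragraphs separated by blank lines, trimming each one.
sub paragraphs {
    my ($text) = @_;
    my @paras;
    for my $chunk (split /\n[ \t]*\n/, $text) {
        $chunk =~ s/^\s+|\s+$//g;
        push @paras, $chunk if length $chunk;
    }
    return @paras;
}

# Stand-in for the existing PO file: msgid => msgstr (made-up data).
my %po = (
    'First paragraph.'  => 'Premier paragraphe.',
    'Second paragraph.' => 'Paragraphe deux.',
);

# New version of the original: only the second paragraph was edited.
my $new = "First paragraph.\n\nSecond paragraph, now edited.\n";

my @todo = grep { !exists $po{$_} } paragraphs($new);
print "needs (re)translation: $_\n" for @todo;
```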

       Even with these advantages, some people don't like the idea of translating each paragraph separately.
       Here are some of the answers I can give to their fears:

       • This approach proved successful in the KDE project and allowed people there to produce the biggest
         corpus of translated and up-to-date documentation I know of.

       • The translators can still use the context to translate, since the strings in the PO file are in the
         same order as in the original document. Translating sequentially is thus rather comparable whether
         you use po4a or not. And in any case, the best way to get the context remains to convert the document
         to a printable format, since the source markup formats are not really readable, IMHO.

       • This approach is the one used by professional translators. I agree that they have somewhat different
         goals from open-source translators. Maintenance, for example, is often less critical to them since
         the content rarely changes.

   Why not split at the sentence level (or smaller)?
       Professional translation tools sometimes split the document at the sentence level in order to maximize
       the reusability of previous translations and speed up their process. The problem is that the same
       sentence may have several translations, depending on the context.

       Paragraphs are by definition longer than sentences. This will hopefully ensure that the same paragraph
       appearing in two documents has the same meaning (and translation), regardless of the context in each
       case.

       Splitting into parts smaller than the sentence would be very bad. It would take a while to explain why
       here, but interested readers can refer to the Locale::Maketext::TPJ13(3pm) man page (which comes with
       the Perl documentation), for example. In short, each language has its specific syntactic rules, and
       there is no way to build sentences by aggregating parts of sentences that works for all existing
       languages (or even for 5 of the 10 most spoken ones, or even fewer).

   Why not put the original as a comment alongside the translation (or the other way around)?
       At first glance, gettext doesn't seem suited to all kinds of translations. For example, it didn't seem
       suited to debconf, the interface all Debian packages use to interact with the user during
       installation. In that case, the texts to translate were pretty short (a dozen lines for each package),
       and it was difficult to put the translation in a specialized file since it has to be available before
       the package installation.

       That's why the debconf developer decided to implement another solution, where translations are placed
       in the same file as the original. This is rather appealing. One would even want to do this for XML,
       for example. It would look like this:

        <section>
         <title lang="en">My title</title>
         <title lang="fr">Mon titre</title>

         <para>
          <text lang="en">My text.</text>
          <text lang="fr">Mon texte.</text>
         </para>
        </section>

       But it was so problematic that a PO-based approach is now used. Only the original can be  edited  in  the
       file,  and  the  translations  must take place in PO files extracted from the master template (and placed
       back at package compilation time). The old system was deprecated because of several issues:

       •   maintenance problems

           If several translators provide a patch at the same time, it gets hard to merge them together.

           How will you detect changes to the original, which need to be applied to the translations?  In  order
           to  use diff, you have to note which version of the original you translated. I.e., you need a PO file
           in your file ;)

       •   encoding problems

           This solution is viable when only European languages are involved, but the introduction of Korean,
           Russian and/or Arabic really complicates the picture. UTF-8 could be a solution, but there are still
           some problems with it.

           Moreover,  such  problems are hard to detect (i.e., only Korean readers will detect that the encoding
           of Korean is broken [because of the Russian translator])

       gettext solves all those problems together.

   But gettext wasn't designed for that use!
       That's true, but until now nobody has come up with a better solution. The only known alternative is
       manual translation, with all the maintenance issues.

   What about the other translation tools for documentation using gettext?
       As far as I know, there are only two of them:

       poxml
           This  is  the  tool developed by KDE people to handle DocBook XML. AFAIK, it was the first program to
           extract strings to translate from documentation to PO files, and inject them back after translation.

           It can only handle XML, and only a particular DTD. I'm quite unhappy with the handling of lists,
           which end up in one big msgid. When the list becomes big, the chunk becomes harder to swallow.

       po-debiandoc
           This program, written by Denis Barbier, is a sort of precursor of the po4a SGML module, which more or
           less deprecates it. As the name says, it handles only the DebianDoc DTD, which is itself more or less
           deprecated.

       The main advantages of po4a over them are the ease of adding extra content (which is even harder with
       them) and the ability to achieve gettextization.

   Educating developers about translation
       When you try to translate documentation or programs, you face three kinds of problems: linguistic (not
       everybody speaks two languages), technical (that's why po4a exists) and relational/human. Not all
       developers understand the necessity of translating stuff. Even when well-intentioned, they may not
       know how to ease the work of translators. To help with that, po4a comes with a lot of documentation
       which can be referred to.

       Another important point is that each translated file begins with a short comment indicating what the
       file is and how to use it. This should help the poor developers flooded with tons of files in
       languages they hardly speak, and help them deal with those files correctly.

       In the po4a project, translated documents are not source files anymore, in the sense that these files
       are not the preferred form of the work for making modifications to it. Since this is rather
       unconventional, it is an easy source of mistakes. That's why all files present this header:

        |       *****************************************************
        |       *           GENERATED FILE, DO NOT EDIT             *
        |       * THIS IS NO SOURCE FILE, BUT RESULT OF COMPILATION *
        |       *****************************************************
        |
        | This file was generated by po4a-translate(1). Do not store it (in VCS,
        | for example), but store the PO file used as source file by po4a-translate.
        |
        | In fact, consider this as a binary, and the PO file as a regular source file:
        | If the PO gets lost, keeping this translation up-to-date will be harder ;)

       Likewise, regular gettext PO files only need to be copied to the po/ directory. But this is not the
       case for the ones manipulated by po4a. The major risk here is that a developer overwrites the existing
       translation of their program with the translation of their documentation. (Both of them can't be
       stored in the same PO file, because the program needs to install its translation as an mo file while
       the documentation only uses its translation at compile time.) That's why the PO files produced by
       po-debconf, for example, contain the following header:

        #
        #  ADVISES TO DEVELOPERS:
        #    - you do not need to manually edit POT or PO files.
        #    - this file contains the translation of your debconf templates.
        #      Do not replace the translation of your program with this !!
        #        (or your translators will get very upset)
        #
        #  ADVISES TO TRANSLATORS:
        #    If you are not familiar with the PO format, gettext documentation
        #     is worth reading, especially sections dedicated to this format.
        #    For example, run:
        #         info -n '(gettext)PO Files'
        #         info -n '(gettext)Header Entry'
        #
        #    Some information specific to po-debconf are available at
        #            /usr/share/doc/po-debconf/README-trans
        #         or http://www.debian.org/intl/l10n/po-debconf/README-trans
        #

   SUMMARY of the advantages of the gettext based approach
       • The  translations  are  not  stored  along  with  the  original,  which  makes it possible to detect if
         translations become out of date.

       • The translations are stored in separate files from each other, which prevents translators of  different
         languages from interfering, both when submitting their patch and at the file encoding level.

       • It is based internally on gettext (but po4a offers a very simple interface so that you don't need to
         understand the internals to use it). That way, we don't have to reinvent the wheel, and because of
         their wide use, we can assume that these tools are more or less bug-free.

       • Nothing changes for the end user (besides the fact that translations will hopefully be better
         maintained). The resulting documentation file distributed is exactly the same.

       • Translators don't need to learn a new file syntax, and their favorite PO file editor (like Emacs' PO
         mode, Lokalize or Gtranslator) will work just fine.

       • gettext offers a simple way to get statistics about what is done, what should be reviewed and  updated,
         and what is still to do. Some examples can be found at these addresses:

          - http://kv-53.narod.ru/kaider1.png
          - http://www.debian.org/intl/l10n/

       But not everything is perfect, and this approach also has some disadvantages we have to deal with.

       • Addenda are... strange at first glance.

       • You  can't  adapt the translated text to your preferences, like splitting a paragraph here, and joining
         two other ones there. But in some sense, if there is an issue with the original, it should be  reported
         as a bug anyway.

       • Even with an easy interface, it remains a new tool people have to learn.

         One of my dreams would be to somehow integrate po4a into Gtranslator or Lokalize. When a documentation
         file is opened, the strings would be automatically extracted, and a translated file + PO file could be
         written to disk. If we manage to write an MS Word (TM) module (or at least an RTF one), professional
         translators may even use it.

AUTHORS

        Denis Barbier <barbier,linuxfr.org>
        Martin Quinson (mquinson#debian.org)

Po4a Tools                                         2017-08-26                                            PO4A(7)