Provided by: po4a_0.57-2_all

NAME

       po4a - framework to translate documentation and other materials

Introduction

       The goal of the po4a (PO for anything) project is to ease translation (and, more interestingly, the
       maintenance of translations) by using gettext tools in areas where they were not expected, such as
       documentation.

Table of contents

       This document is organized as follows:

       1 Why should I use po4a? What is it good for?
           This introductory chapter explains the motivation of the project and its philosophy. You should read
           it first if you are in the process of evaluating po4a for your own translations.

       2 How to use po4a?
           This chapter is a sort of reference manual, trying to answer the users' questions and to give you a
           better understanding of the whole process. It introduces how to do things with po4a and serves as an
           introduction to the documentation of the specific tools.

           HOWTO begin a new translation?
           HOWTO change the translation back to a documentation file?
           HOWTO update a po4a translation?
           HOWTO convert a pre-existing translation to po4a?
           HOWTO add extra text to translations (like translator's name)?
           HOWTO do all this in one program invocation?
           HOWTO customize po4a?
       3 How does it work?
           This chapter gives you a brief overview of the po4a internals, so that you may feel more confident
           helping us maintain and improve it. It may also help you understand why it does not do what you
           expected, and how to solve your problems.

       4 FAQ
           This chapter groups the Frequently Asked Questions. In fact, most of the questions so far could be
           phrased this way: "Why is it designed this way, and not that way?" If you think po4a isn't the
           right answer to documentation translation, you should consider reading this section. If it does not
           answer your question, please contact us on the <devel@lists.po4a.org> mailing list. We love feedback.

       5 Specific notes about modules
           This chapter presents the specifics of each module from both the translator's and the original
           author's point of view. Read this to learn the syntax you will encounter when translating material
           in this module, or the rules you should follow in your original document to make translators' lives
           easier.

           Actually, this section is not really part of this document. Instead, it is placed in each module's
           documentation. This helps ensure that the information is up to date by keeping the documentation
           and the code together.

Why should I use po4a? What is it good for?

       I  like  the  idea  of  open-source software, making it possible for everybody to access software and its
       source code. But being French, I'm well aware that the licensing is  not  the  only  restriction  to  the
       openness of software: non-translated free software is useless for non-English speakers, and we still have
       some work to do to make it available to truly everybody out there.

       The perception of this situation by the open-source actors has improved dramatically recently. We, as
       translators, won the first battle and convinced everybody of the importance of translations. But
       unfortunately, that was the easy part. Now we have to do the job and actually translate all this stuff.

       Actually, open-source software itself benefits from a rather decent level of translation, thanks to the
       wonderful gettext tool suite. It is able to extract the strings to translate from the program, present a
       uniform format to translators, and then use the result of their work at run time to display translated
       messages to the user.

       But the situation is rather different when it comes to documentation. Too often, the translated
       documentation is not visible enough (not distributed as a part of the program), only partial, or not up
       to date. This last situation is by far the worst possible one. An outdated translation can turn out to
       be worse for users than no translation at all, because it describes old program behavior which is no
       longer in use.

   The problem to solve
       Translating documentation is not very difficult in itself. Texts are far longer than the messages of the
       program and thus take longer to translate, but no technical skill is really needed to do so. The
       difficult part comes when you have to maintain your work. Detecting which parts changed and need to be
       updated is very difficult, error-prone and highly unpleasant. I guess that this explains why so much
       translated documentation out there is outdated.

   The po4a answers
       So, the whole point of po4a is to make the documentation translation maintainable. The idea is to apply
       the gettext methodology to this new field. As in gettext, texts are extracted from their original
       locations in order to be presented in a uniform format to the translators. The classical gettext tools
       help them update their work when a new release of the original comes out. But unlike the classical
       gettext model, the translations are then re-injected into the structure of the original document so
       that they can be processed and distributed just like the English version.

       Thanks to this, discovering which parts of the document were changed and need an update becomes very
       easy. Another good point is that the tools will do almost all the work when the structure of the
       original document gets fundamentally reorganized and when some chapters are moved around, merged or
       split. By extracting the text to translate from the document structure, it also keeps you away from the
       text formatting complexity and reduces your chances of getting a broken document (even if it does not
       completely prevent it).

       Please also see the FAQ below  in  this  document  for  a  more  complete  list  of  the  advantages  and
       disadvantages of this approach.

   Supported formats
       Currently, this approach has been successfully implemented for several text formatting formats:

       man

       The good old manual page format, used by so many programs out there. The po4a support is very welcome
       here since this format is somewhat difficult to use and not really friendly to newcomers. The
       Locale::Po4a::Man(3pm) module also supports the mdoc format, used by the BSD man pages (they are also
       quite common on Linux).

       pod

       This is the Plain Old Documentation format used by Perl. The language and extensions themselves are
       documented that way, as well as most of the existing Perl scripts. It makes it easy to keep the
       documentation close to the actual code by embedding them both in the same file. It makes the
       programmer's life easier, but unfortunately, not the translator's.

       sgml

       Even if somewhat superseded by XML nowadays, this format is still used rather often for documents which
       are more than a few screens long. It allows you to make complete books. Updating the translation of
       such long documents can turn out to be a real nightmare. diff often proves useless when the original
       text was re-indented after an update. Fortunately, po4a can help you in that process.

       Currently, only the DebianDoc and DocBook DTDs are supported, but adding support for a new one is really
       easy.  It  is even possible to use po4a on an unknown SGML DTD without changing the code by providing the
       needed information on the command line. See Locale::Po4a::Sgml(3pm) for details.

       TeX / LaTeX

       The LaTeX format is a major documentation format used in the Free Software world  and  for  publications.
       The  Locale::Po4a::LaTeX(3pm)  module  was  tested  with  the  Python  documentation,  a  book  and  some
       presentations.

       texinfo

       All the GNU documentation is written in this format (that's even one of the requirements to become an
       official GNU project). The support for Locale::Po4a::Texinfo(3pm) in po4a is still in its early stages.
       Please report bugs and feature requests.

       xml

       The XML format is a base format for many documentation formats.

       Currently, the DocBook DTD is supported by po4a. See Locale::Po4a::Docbook(3pm) for details.

       others

       Po4a can also handle some rarer or more specialized formats, such as the documentation of compilation
       options for the 2.4+ Linux kernels or the diagrams produced by the dia tool. Adding a new one is often
       very easy and the main task is to come up with a parser for your target format. See
       Locale::Po4a::TransTractor(3pm) for more information about this.

   Unsupported formats
       Unfortunately, po4a still lacks support for several documentation formats.

       There is a whole bunch of other formats we would like to support in po4a, and not only documentation
       ones. Indeed, we aim at plugging all "market holes" left by the classical gettext tools. This
       encompasses package descriptions (deb and rpm), package installation script questions, package
       changelogs, and all the specialized file formats used by programs, such as game scenarios or wine
       resource files.

How to use po4a?

       This chapter is a sort of reference manual, trying to answer the users' questions and to give you a
       better understanding of the whole process. It introduces how to do things with po4a and serves as an
       introduction to the documentation of the specific tools.

   Graphical overview
       The following schema gives an overview of the process of translating documentation using po4a. Do not be
       put off by its apparent complexity: it comes from the fact that the whole process is represented here.
       Once you have converted your project to po4a, only the right part of the graphic is relevant.

       Note that master.doc is taken as an example for the documentation to be translated and translation.doc is
       the corresponding translated text.  The suffix could be .pod, .xml, or .sgml  depending  on  its  format.
       Each part of the picture will be detailed in the next sections.

                                          master.doc
                                              |
                                              V
            +<-----<----+<-----<-----<--------+------->-------->-------+
            :           |                     |                        :
       {translation}    |         { update of master.doc }             :
            :           |                     |                        :
          XX.doc        |                     V                        V
        (optional)      |                 master.doc ->-------->------>+
            :           |                   (new)                      |
            V           V                     |                        |
         [po4a-gettextize]   doc.XX.po -->+   |                        |
                 |            (old)       |   |                        |
                 |              ^         V   V                        |
                 |              |     [po4a-updatepo]                  |
                 V              |           |                          V
          translation.pot       ^           V                          |
                 |              |        doc.XX.po                     |
                 |              |         (fuzzy)                      |
          { translation }       |           |                          |
                 |              ^           V                          V
                 |              |     {manual editing}                 |
                 |              |           |                          |
                 V              |           V                          V
             doc.XX.po --->---->+<---<-- doc.XX.po    addendum     master.doc
             (initial)                 (up-to-date)  (optional)   (up-to-date)
                 :                          |            |             |
                 :                          V            |             |
                 +----->----->----->------> +            |             |
                                            |            |             |
                                            V            V             V
                                            +------>-----+------<------+
                                                         |
                                                         V
                                                  [po4a-translate]
                                                         |
                                                         V
                                                       XX.doc
                                                    (up-to-date)

       On the left part, the conversion of a translation not using po4a to this system is shown. At the top of
       the right part, the action of the original author is depicted (updating the documentation). The middle
       of the right part is where the automatic actions of po4a are depicted. The new material is extracted
       and compared against the existing translation. Parts which didn't change are found, and the previous
       translation is used. Parts which were partially modified are also connected to the previous translation,
       but with a specific marker indicating that the translation must be updated. The bottom of the figure
       shows how a formatted document is built.

       Actually,  as a translator, the only manual operation you have to do is the part marked {manual editing}.
       Yeah, I'm sorry, but po4a helps you translate.  It does not translate anything for you…

   HOWTO begin a new translation?
       This section presents the steps required to begin a new translation with po4a. The refinements
       involved in converting an existing project to this system are detailed in the relevant section.

       To begin a new translation using po4a, you have to follow these steps:

       - Extract the text which has to be translated from the original <master.doc> document into a new
         translation template <translation.pot> file (the gettext format). For that, use the po4a-gettextize
         program this way:

           $ po4a-gettextize -f <format> -m <master.doc> -p <translation.pot>

         <format>  is  naturally  the  format used in the master.doc document. As expected, the output goes into
         translation.pot.  Please refer to po4a-gettextize(1) for more details about the existing options.

       - Actually translate what should be translated. For that, you have to rename the POT file, for example to
         doc.XX.po (where XX is the ISO 639-1 code of the language you are translating to, e.g. fr for French),
         and edit the resulting file. It is often a good idea not to name the file XX.po, to avoid confusion
         with the translation of the program messages, but this is your call. Don't forget to update the PO
         file headers; they are important.

         The actual translation can be done using Emacs' or Vi's PO mode, Lokalize (KDE based), Gtranslator
         (GNOME based) or whichever program you prefer (e.g. Virtaal).

         If you wish to learn more about this, you definitely need to refer to the gettext documentation,
         available in the gettext-doc package.
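
       As a concrete sketch of the two steps above, the following assumes a small POD master document and a
       French translation. The directory name po4a-demo, the file names and the sample text are illustrative
       assumptions; only the po4a-gettextize options come from this document. The extraction is skipped when
       po4a is not installed.

```shell
#!/bin/sh
# Hypothetical sample project; any POD file would do as master document.
mkdir -p po4a-demo
cat > po4a-demo/master.pod <<'EOF'
=head1 NAME

hello - print a greeting

=head1 DESCRIPTION

This program prints a friendly greeting.
EOF

if command -v po4a-gettextize >/dev/null 2>&1; then
    # Step 1: extract the translatable strings into a POT template.
    # Step 2: copy the template to a per-language PO file, then edit that copy.
    po4a-gettextize -f pod -m po4a-demo/master.pod -p po4a-demo/translation.pot &&
        cp po4a-demo/translation.pot po4a-demo/doc.fr.po
else
    echo "po4a-gettextize not found; skipping extraction" >&2
fi
```

       The copy named doc.fr.po is the file to open in a PO editor; translation.pot stays untouched as the
       template.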

   HOWTO change the translation back to a documentation file?
       Once you're done with the translation, you want to get the translated documentation and distribute it to
       users along with the original one. For that, use the po4a-translate(1) program like this (where XX is
       the language code):

         $ po4a-translate -f <format> -m <master.doc> -p <doc.XX.po> -l <XX.doc>

       As  before,  <format> is the format used in the master.doc document.  But this time, the PO file provided
       with the -p flag is part of the input.  This is your translation. The output goes into XX.doc.

       Please refer to po4a-translate(1) for more details.
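
       One point that often surprises newcomers is that po4a-translate may write no output file when too few
       strings are translated. The sketch below assumes the -k (--keep) option documented in
       po4a-translate(1), which sets the minimal percentage of translated strings needed to write the result
       (80 by default, as far as that page states). File names are hypothetical, and the commands are skipped
       when po4a is not installed.

```shell
#!/bin/sh
# Hypothetical file names; the master document is a tiny POD file.
mkdir -p po4a-demo
printf '=head1 NAME\n\nhello - print a greeting\n' > po4a-demo/hello.pod

if command -v po4a-gettextize >/dev/null 2>&1 &&
   command -v po4a-translate >/dev/null 2>&1; then
    # Start from an untranslated PO file (a plain copy of the template).
    po4a-gettextize -f pod -m po4a-demo/hello.pod -p po4a-demo/hello.fr.po &&
    # -k 0: write the output even though nothing is translated yet.
    # Without it, a PO file below the threshold yields no output document.
    po4a-translate -f pod -m po4a-demo/hello.pod -p po4a-demo/hello.fr.po \
        -l po4a-demo/hello.fr.pod -k 0
else
    echo "po4a not found; skipping" >&2
fi
```

       Lowering the threshold is mostly useful for previewing work in progress; for a release, the default
       keeps mostly untranslated documents from being shipped.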

   HOWTO update a po4a translation?
       To update your translation when the original master.doc file has changed, use the po4a-updatepo(1)
       program like this:

         $ po4a-updatepo -f <format> -m <new_master.doc> -p <old_doc.XX.po>

       (Please refer to po4a-updatepo(1) for more details)

       Naturally, the new paragraphs in the document won't get magically translated in the PO file with this
       operation, and you'll need to update the PO file manually. Likewise, you may have to rework the
       translation of paragraphs which were modified a bit. To make sure you won't miss any of them, they are
       marked as "fuzzy" during the process and you have to remove this marker before the translation can be
       used by po4a-translate. As for the initial translation, it is best to use your favorite PO editor here.

       Once  your PO file is up-to-date again, without any untranslated or fuzzy string left, you can generate a
       translated documentation file, as explained in the previous section.
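
       Before regenerating the document, it can be useful to check how many entries are still marked fuzzy.
       The sketch below counts the flag lines with grep on a hand-written sample PO file; the file name and
       its content are illustrative assumptions. It is only a crude check: msgfmt --statistics from the
       gettext tools gives a fuller report, including untranslated entries.

```shell
#!/bin/sh
# Hypothetical PO file with one translated and one fuzzy entry.
mkdir -p po4a-demo
cat > po4a-demo/update.fr.po <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

#. type: textblock
#: master.pod:5
msgid "This program prints a greeting."
msgstr "Ce programme affiche une salutation."

#. type: textblock
#: master.pod:9
#, fuzzy
msgid "It accepts one optional argument."
msgstr "Il accepte un argument."
EOF

# Flag lines start with "#," and list flags such as fuzzy or no-wrap.
# grep exits non-zero when the count is 0, hence the "|| true".
fuzzy_count=$(grep -c '^#,.*fuzzy' po4a-demo/update.fr.po || true)

if [ "$fuzzy_count" -gt 0 ]; then
    echo "$fuzzy_count fuzzy entries left; review them before running po4a-translate"
else
    echo "no fuzzy entries left"
fi
```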

   HOWTO convert a pre-existing translation to po4a?
       Often, you happily translated the document manually until a major reorganization of the original
       master.doc document happened. Then, after some unpleasant attempts with diff or similar tools, you want
       to convert to po4a. But of course, you don't want to lose your existing translation in the process.
       Don't worry, this case is also handled by the po4a tools and is called gettextization.

       The key here is to have the same structure in the translated document and in the original one so that the
       tools can match the content accordingly.
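
       A quick way to spot gross structural differences before attempting the gettextization is to compare
       the number of paragraphs in both files. This sketch is only a rough pre-check with hypothetical file
       names and contents: po4a also compares paragraph types, which a simple count cannot see.

```shell
#!/bin/sh
# Count blank-line-separated paragraphs in a file.
count_paragraphs() {
    # RS="" switches awk to paragraph mode (records separated by empty lines).
    awk 'BEGIN { RS = "" } END { print NR }' "$1"
}

# Hypothetical original and translation; the translator merged two paragraphs.
mkdir -p po4a-demo
printf 'First paragraph.\n\nSecond paragraph.\n\nThird paragraph.\n' > po4a-demo/old_master.txt
printf 'Premier paragraphe.\n\nDeuxieme et troisieme paragraphes.\n' > po4a-demo/XX.txt

orig=$(count_paragraphs po4a-demo/old_master.txt)
trans=$(count_paragraphs po4a-demo/XX.txt)

if [ "$orig" -eq "$trans" ]; then
    echo "same paragraph count ($orig)"
else
    echo "paragraph counts differ: original=$orig translation=$trans"
fi
```

       Equal counts do not guarantee a successful gettextization, but different counts are a reliable sign
       that some manual editing will be needed first.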

       If you are lucky (i.e., if the structures of both documents perfectly match), it will work seamlessly and
       you  will  be set in a few seconds. Otherwise, you may understand why this process has such an ugly name,
       and you'd better be prepared for some grunt work here. In any case, remember that it is the price to pay
       to get the comfort of po4a afterward. And the good point is that you have to do so only once.

       I cannot emphasize this too much: to ease the process, it is important that you find the exact version
       of the original which was used to do the translation. The best situation is when you noted down the VCS
       revision used for the translation and didn't modify the original during the translation process, so
       that you can use it.

       It won't work well when you use the updated original text with the old translation. It remains  possible,
       but  is  harder  and  really should be avoided if possible. In fact, I guess that if you fail to find the
       original text again, the best solution is to find someone to do the gettextization for you (but,  please,
       not me ;).

       Maybe I'm too dramatic here. Even when things go wrong, it remains way faster than translating
       everything again. I was able to gettextize the existing French translation of the Perl documentation in
       one day, even though things did go wrong. That was more than two megabytes of text, and a new
       translation would have taken months or more.

       Let me explain the basis of the procedure first; I will come back to hints for making it work when the
       process goes wrong. To ease comprehension, let's use the above example once again.

       Once you have the old master.doc again, matching the translation XX.doc, the gettextization can
       be done directly to the PO file doc.XX.po without manual translation of the translation.pot file:

        $ po4a-gettextize -f <format> -m <old_master.doc> -l <XX.doc> -p <doc.XX.po>

       When you're lucky, that's it. You converted your old translation to po4a and can begin with the updating
       task right away. Just follow the procedure explained a few sections ago to synchronize your PO file with
       the newest original document, and update the translation accordingly.

       Please note that even when things seem to work properly, there is still room for errors in this  process.
       The point is that po4a is unable to understand the text to make sure that the translation matches the
       original. That's why all strings are marked as "fuzzy" in the process. You  should  check  each  of  them
       carefully before removing those markers.

       Often  the  document  structures  don't  match  exactly,  preventing  po4a-gettextize  from doing its job
       properly. At that point, the whole game is about editing the files to get their damn structures matching.

       It may help to read the section Gettextization: how does it  work?  below.   Understanding  the  internal
       process  will  help you to make this work. The good point is that po4a-gettextize is rather verbose about
       what went wrong when it happens. First, it pinpoints where in the documents the structures' discrepancies
       are. You will learn the strings that don't match, their positions in the text, and the type  of  each  of
       them. Moreover, the PO file generated so far will be dumped to gettextization.failed.po.

       -   Remove all extra parts of the translations, such as the section in which you give the translator name
           and thank everyone who contributed to the translation. Addenda, which are described in the next
           section, will allow you to re-add them afterward.

       -   Do not hesitate to edit both the original and the translation. The most important thing is to get the
           PO file. You will be able to update it afterward. That being said, editing the translation should  be
           preferred when both are possible since it makes things easier when the gettextization is done.

       -   If needed, remove some parts of the original if they happen not to be translated. When synchronizing
           the PO with the document afterward, they will come back by themselves.

       -   If you changed the structure a bit (to merge two  paragraphs,  or  split  another  one),  undo  those
           changes.  If  there are issues in the original, you should inform the original author. Fixing them in
           your translation only fixes them for a part of the community.  And  moreover,  it's  impossible  when
           using po4a ;)

       -   Sometimes, the paragraph contents do match, but their types don't. Fixing it is rather
           format-dependent. In POD and man, it often comes from the fact that one of the two contains a line
           beginning with a white space where the other doesn't. In those formats, such paragraphs cannot be
           wrapped and thus become a different type. Just remove the space and you are fine. It may also be a
           typo in the tag name.

           Likewise, two paragraphs may get merged together in  POD  when  the  separating  line  contains  some
           spaces, or when there is no empty line between the =item line and the content of the item.

       -   Sometimes, there is a desynchronization between the files, and the translation is attached to the
           wrong original paragraph. It is a sign that the real problem lies earlier in the files. Check
           gettextization.failed.po to see where the desynchronization begins, and fix it there.

       -   Sometimes, you get the strong feeling that po4a ate some parts of the text, either the original or
           the translation. gettextization.failed.po indicates that both of them were matching nicely, and then
           the gettextization fails because it tried to match one paragraph with the one after (or before) the
           right one, as if the right one had disappeared. Curse po4a as I did when it first happened to me.
           Generously.

           This  unfortunate  situation  happens  when the same paragraph is repeated over the document. In that
           case, no new entry is created in the PO file, but a new  reference  is  added  to  the  existing  one
           instead.

           So, when the same paragraph appears twice in the original but is not translated in the exact
           same way each time, you will get the feeling that a paragraph of the original disappeared. Just
           remove the new translation. If the second translation was actually better and you would rather drop
           the first one, replace the first one with the second.

           Conversely, if two similar but different paragraphs were translated in the exact same way, you
           will get the feeling that a paragraph of the translation disappeared. A solution is to add a dummy
           string to the original paragraph (such as "I'm different"). Don't be afraid, those things will
           disappear during the synchronization, and when the added text is short enough, gettext will match
           your translation to the existing text (marking it as fuzzy, but you don't really care since all
           strings are fuzzy after gettextization).

       Hopefully, those tips will help you make your gettextization work and obtain your precious PO file. You
       are now ready to synchronize your file and begin your translation. Please note that on large texts, the
       first synchronization may take a long time.

       For example, the first po4a-updatepo of the Perl documentation's French translation (5.5 MB PO file) took
       about two full days on a 1 GHz G5 computer. Yes, 48 hours. But the subsequent ones only take a dozen
       seconds on my old laptop. This is because the first time, most of the msgids of the PO file don't match
       any of those in the POT file. This forces gettext to search for the closest one using a costly string
       proximity algorithm.

   HOWTO add extra text to translations (like translator's name)?
       Because of the gettext approach, doing this becomes more difficult in po4a than it was when simply
       editing a new file alongside the original one. But it remains possible, thanks to the so-called addenda.

       It may help comprehension to think of addenda as a sort of patch applied to the localized document
       after processing. They are rather different from the usual patches (they have only one line of context,
       which can embed a Perl regular expression, and they can only add new text without removing any), but the
       functionality is the same.

       Their goal is to allow the translator to add extra content to the document which is not  translated  from
       the  original  document.  The most common usage is to add a section about the translation itself, listing
       contributors and explaining how to report bugs against the translation.

       An addendum must be provided as a separate file. The first line constitutes a header indicating where in
       the produced document it should be placed. The rest of the addendum file will be added verbatim at the
       determined position of the resulting document.

       The header line which specifies the context has a pretty rigid syntax: it must begin with the string
       PO4A-HEADER:, followed by a semicolon (;) separated list of key=value fields. White spaces ARE
       important. Note that you cannot use the semicolon character (;) in the value, and that quoting it
       doesn't help. Optionally, spaces ( ) may be inserted before key for readability.

       Although  this  context  search  may  be  considered  to  operate  roughly on each line of the translated
       document, it actually operates on the internal data string of the  translated  document.   This  internal
       data string may be a text spanning a paragraph containing multiple lines or may be an XML tag by
       itself. The exact insertion point of the addendum must be before or after the internal data string and
       cannot be within the internal data string.

       The  actual  internal data string of the translated document can be visualized by executing po4a in debug
       mode.

       Again, it sounds scary, but the examples given below should help you to find how to write the header line
       you need. To illustrate the discussion, assume we want to add a section called "About  this  translation"
       after the "About this document" one.

       Here are the possible header keys:

       mode (mandatory)
           It can be either the string before or after.

           If mode=before, the insertion point is determined by a one-step regex match specified by the position
           argument regex. The insertion point is immediately before the uniquely matched internal data string
           of the translated document.

           If mode=after, the insertion point is determined by a two-step regex match specified by the position
           argument regex, and by the beginboundary or endboundary argument regex.

           Since there may be multiple sections for the assumed case, let's use the two-step approach.

                mode=after

       position (mandatory)
           A Perl regexp for specifying the context.

           If more than one internal data string matches this expression (or none), the search for the insertion
           point and the addition of the addendum will fail. It is indeed better to report an error than to
           insert the addendum at the wrong location.

           If mode=before, the insertion point is specified to be immediately before the  internal  data  string
           uniquely matching the position argument regex.

           If  mode=after,  the  search  for the insertion point is narrowed down to the data after the internal
           data string uniquely matching the position argument regex.  The  exact  insertion  point  is  further
           specified by the beginboundary or endboundary.

           In our case, we need to skip several preceding sections by narrowing down the search using the
           section title string.

                position=About this document

           (In reality, you need to use the translated section title string here, instead.)

       beginboundary (used only when mode=after, and mandatory in that case)
       endboundary (idem)
           A second Perl regexp required only when mode=after. The addendum will be placed immediately before or
           after the first internal data string matching  the  beginboundary  or  endboundary  argument  regexp,
           respectively.

           In our case, we can choose to indicate the end of the section we match by adding:

              endboundary=</section>

           or to indicate the beginning of the next section by indicating:

              beginboundary=<section>

           In  both  cases, our addendum will be placed after the </section> and before the <section>. The first
           one is better since it will work even if the document gets reorganized.

           Both forms exist because documentation formats are different. In some of them, there is a way to mark
           the end of a section (just like the </section> we just used), while others don't explicitly mark
           the end of a section (like in man). In the former case, you want to make a boundary matching the end of
           a  section,  so  that  the  insertion  point  comes  after it. In the latter case, you want to make a
           boundary matching the beginning of the next section, so that the insertion point  comes  just  before
           it.

       This can seem obscure, but hopefully, the next examples will enlighten you.

       To sum up the example we used so far, in order to add a section called "About this translation" after
       the "About this document" one in an SGML document, you can use either of these header lines:
          PO4A-HEADER: mode=after; position=About this document; endboundary=</section>
          PO4A-HEADER: mode=after; position=About this document; beginboundary=<section>

        If you want to add something after the following nroff section:
           .SH "AUTHORS"

         You should use a two-step approach by setting mode=after. First, narrow down the search to the lines
         after AUTHORS with the position argument regex. Then, match the beginning of the next section
         (i.e., ^\.SH) with the beginboundary argument regex. That is to say:

          PO4A-HEADER:mode=after;position=AUTHORS;beginboundary=^\.SH

        If you want to add something into a section (like after "Copyright Big Dude") instead of adding a whole
       section, give a position matching this line, and give a beginboundary matching any line.
          PO4A-HEADER:mode=after;position=Copyright Big Dude, 2004;beginboundary=^

       If you want to add something at the end of the document, give a position matching any line of your
       document (but only one line: po4a won't proceed if the match is not unique), and give a boundary
       matching nothing. Don't use simple strings like "EOF" here; prefer strings that are unlikely to appear
       in your document.
          PO4A-HEADER:mode=after;position=About this document;beginboundary=FakePo4aBoundary

       In any case, remember that these are regexps. For example, if you want to match the end of a nroff
       section ending with the line

         .fi

       don't use .fi as endboundary, because it will match "the[ fi]le", which is obviously not what you
       expect. The correct endboundary in that case is: ^\.fi$.
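
       The difference between the two regexps can be checked with a few lines of Perl. This is only a sketch:
       the sample lines and the variable names $loose and $strict are made up for the illustration and are
       not part of po4a.

```perl
use strict;
use warnings;

# Two candidate boundary regexps for a line containing only ".fi".
my $loose  = qr/.fi/;       # unescaped ".": matches any character before "fi"
my $strict = qr/^\.fi$/;    # literal dot, anchored at both ends of the line

# Hypothetical sample lines, chosen for this illustration.
my @lines = ("the file", ".fi", "profile");

for my $line (@lines) {
    printf "%-12s loose=%-5s strict=%s\n",
        "'$line'",
        ($line =~ $loose  ? "match" : "no"),
        ($line =~ $strict ? "match" : "no");
}
```

       Only the anchored, escaped form singles out the real ".fi" line; the loose form also matches ordinary
       text such as "the file".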

       If the addendum doesn't go where you expected, try to pass the -vv argument to the tools, so that they
       explain what they do while placing the addendum.

       More detailed example

       Original document (POD formatted):

        |=head1 NAME
        |
        |dummy - a dummy program
        |
        |=head1 AUTHOR
        |
        |me

       Then,  the following addendum will ensure that a section (in French) about the translator is added at the
       end of the file (in French, "TRADUCTEUR" means "TRANSLATOR", and "moi" means "me").

        |PO4A-HEADER:mode=after;position=AUTEUR;beginboundary=^=head
        |
        |=head1 TRADUCTEUR
        |
        |moi
        |

       In order to put your addendum before the AUTHOR, use the following header:

        PO4A-HEADER:mode=after;position=NOM;beginboundary=^=head1

       This works because the next line matching the beginboundary /^=head1/ after the section "NAME"
       (translated to "NOM" in French) is the one declaring the authors. So, the addendum will be put between
       both sections. Note that if another section is later added between the NAME and AUTHOR sections, po4a
       will wrongly put the addendum before the new section.

       To avoid this, you may accomplish the same using mode=before:

        PO4A-HEADER:mode=before;position=^=head1 AUTEUR

   HOWTO do all this in one program invocation?
       Using po4a proved to be a bit error-prone for users, since you have to call two different programs in
       the right order (po4a-updatepo and then po4a-translate), each of them needing more than 3 arguments.
       Moreover, it was difficult with this system to use only one PO file for all your documents when more
       than one format was used.

       The po4a(1) program was designed to solve those difficulties. Once  your  project  is  converted  to  the
       system,  you  write a simple configuration file explaining where your translation files are (PO and POT),
       where the original documents are, their formats and where their translations should be placed.

       Then, calling po4a(1) on this file ensures that the PO files are synchronized against the original
       documents, and that the translated documents are generated properly. Of course, you will want to call
       this program twice: once before editing the PO files to update them, and once afterward to get a
       completely updated translated document. But you only need to remember one command line.

   HOWTO customize po4a?
       po4a modules have options (specified with the -o option) that can be used to change the module behavior.

       You can also edit the source code of the existing modules or even write your own modules. To make them
       visible to po4a, copy your modules into a path called "/bli/blah/blu/lib/Locale/Po4a/" and then add
       the path "/bli/blah/blu/lib" to the "PERLLIB" or "PERL5LIB" environment variable. For example:

          PERLLIB=$PWD/lib po4a --previous po4a/po4a.cfg

       Note: the actual name of the lib directory is not important.

How does it work?

       This chapter gives you a brief overview of the po4a internals, so that you may feel more confident to
       help us maintain and improve it. It may also help you understand why it does not do what you
       expected, and how to solve your problems.

   What's the big picture here?
       The po4a architecture is object oriented (in Perl. Isn't that neat?). The common ancestor of all parser
       classes is called TransTractor. This strange name comes from the fact that it is in charge of both
       translating documents and extracting strings.

       More formally, it takes a document to translate plus a PO file containing the translations to use as
       input, while producing two separate outputs: another PO file (resulting from the extraction of
       translatable strings from the input document), and a translated document (with the same structure as
       the input one, but with all translatable strings replaced with the content of the input PO). Here is a
       graphical representation of this:

          Input document --\                             /---> Output document
                            \      TransTractor::       /       (translated)
                             +-->--   parse()  --------+
                            /                           \
          Input PO --------/                             \---> Output PO
                                                                (extracted)

       This little bone is the core of all the po4a architecture. If you omit the input PO and the output
       document, you get po4a-gettextize. If you provide both inputs and disregard the output PO, you get
       po4a-translate. The po4a program calls TransTractor twice and calls msgmerge -U between these
       TransTractor invocations to provide a one-stop solution with a single configuration file.

       TransTractor::parse() is a virtual function implemented by each module. Here is a little example to  show
       you how it works. It parses a list of paragraphs, each of them beginning with <p>.

         1 sub parse { my $document = shift;   # the parser (TransTractor) object
         2   PARAGRAPH: while (1) {
         3     my ($paragraph,$pararef,$line,$lref)=("","","","");
         4     my $first=1;
         5     while ((($line,$lref)=$document->shiftline()) && defined($line)) {
         6       if ($line =~ m/<p>/ && !$first--) {
         7         $document->unshiftline($line,$lref);
         8
         9         $paragraph =~ s/^<p>//s;
        10         $document->pushline("<p>".$document->translate($paragraph,$pararef));
        11
        12         next PARAGRAPH;
        13       } else {
        14         $paragraph .= $line;
        15         $pararef = $lref unless(length($pararef));
        16       }
        17     }
        18     return; # Did not get a defined line? End of input file.
        19   }
        20 }

       On lines 5 and 7, we encounter "shiftline()" and "unshiftline()". These let you read and unread the
       head of the internal input data stream of the master document as a line string and its reference.
       Here, the reference is a string of the form "$filename:$linenum". Remember that Perl only has
       one-dimensional arrays, so the code handling the internal input data stream is a bit cryptic.

       On line 6, we encounter <p> for the second time. That's the signal of the next paragraph. We should  thus
       put  the  just obtained line back into the original document (line 7) and push the paragraph built so far
       into the outputs. After removing the leading <p> of it on line 9, we push the concatenation of  this  tag
       with the translation of the rest of the paragraph.

       This translate() function is very cool. It pushes its argument into the output PO file (extraction) and
       returns its translation as found in the input PO file (translation). Since it's used as part of the
       argument of pushline(), this translation lands in the output document.
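
       The double role of translate() can be mimicked with plain Perl data structures. The following is a
       toy model, not the real Locale::Po4a code: the names toy_translate, toy_pushline, %po_in, @po_out
       and @doc_out are invented for this sketch, and falling back to the original string when no
       translation is known is an assumption of the sketch.

```perl
use strict;
use warnings;

# Toy stand-ins for the parser state (all names are hypothetical):
my %po_in = ("Hello world\n" => "Bonjour le monde\n");   # input PO: msgid => msgstr
my @po_out;                                              # output PO: extracted strings
my @doc_out;                                             # output (translated) document

# Record the string for extraction and return its translation.
# Assumption of this sketch: an unknown string is returned unchanged.
sub toy_translate {
    my ($string, $ref) = @_;
    push @po_out, { msgid => $string, ref => $ref };
    return exists $po_in{$string} ? $po_in{$string} : $string;
}

# Append text to the output document.
sub toy_pushline { push @doc_out, @_ }

toy_pushline("<p>" . toy_translate("Hello world\n", "doc.html:1"));
toy_pushline("<p>" . toy_translate("Not yet translated\n", "doc.html:3"));

print @doc_out;
```

       Both strings end up in @po_out (extraction), while @doc_out holds the translated text where a
       translation exists.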

       Isn't  that cool? It is possible to build a complete po4a module in less than 20 lines when the format is
       simple enough…

       You can learn more about this in Locale::Po4a::TransTractor(3pm).

   Gettextization: how does it work?
       The idea here is to take the original document and its translation, and to say  that  the  Nth  extracted
       string from the translation is the translation of the Nth extracted string from the original. In order to
       work,  both  files  must  share  exactly the same structure. For example, if the files have the following
       structure, it is very unlikely that the 4th string in translation (of type 'chapter') is the  translation
       of the 4th string in original (of type 'paragraph').

           Original         Translation

         chapter            chapter
           paragraph          paragraph
           paragraph          paragraph
           paragraph        chapter
         chapter              paragraph
           paragraph          paragraph

       For  that,  po4a parsers are used on both the original and the translation files to extract PO files, and
       then a third PO file is built from them taking strings from the second as translation of strings from the
       first. In order to check that the strings we put together are actually the translations  of  each  other,
       document  parsers  in  po4a should put information about the syntactical type of extracted strings in the
       document (all existing ones do so, yours should also). Then, this information is used to make  sure  that
       both  documents  have the same syntax. In the previous example, it would allow us to detect that string 4
       is a paragraph in one case, and a chapter title in another case and to report the problem.

       In theory, it would be possible to detect the problem, and resynchronize the files afterward (just like
       diff does). But what we should do with the few strings before the desynchronization is not clear, and
       it would sometimes produce bad results. That's why the current implementation doesn't try to
       resynchronize anything and fails verbosely when something goes wrong, requiring manual modification of
       the files to fix the problem.

       Even  with  these  precautions, things can go wrong very easily here. That's why all translations guessed
       this way are marked fuzzy to make sure that the translator reviews and checks them.
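
       The pairing logic described above can be sketched in a few lines of Perl. This is a simplified
       illustration, not the po4a-gettextize implementation: the function name toy_gettextize and the
       [type, text] representation of extracted strings are assumptions made for the example.

```perl
use strict;
use warnings;

# Each extracted string is modelled as [type, text]. This layout is
# hypothetical and only serves the illustration.
sub toy_gettextize {
    my ($orig, $trans) = @_;
    die sprintf("Original has %d strings, translation has %d\n",
                scalar @$orig, scalar @$trans)
        unless @$orig == @$trans;

    my @po;
    for my $i (0 .. $#$orig) {
        my ($otype, $otext) = @{ $orig->[$i] };
        my ($ttype, $ttext) = @{ $trans->[$i] };
        # Refuse to pair strings whose syntactic types differ.
        die sprintf("String %d: '%s' in original but '%s' in translation\n",
                    $i + 1, $otype, $ttype)
            unless $otype eq $ttype;
        # Guessed pairs are always marked fuzzy for later review.
        push @po, { msgid => $otext, msgstr => $ttext, fuzzy => 1 };
    }
    return \@po;
}

# Same shape as the mismatching structure shown earlier: string 4 differs.
my @orig  = (['chapter', 'Intro'],        ['paragraph', 'Hello.'],
             ['paragraph', 'Bye.'],       ['paragraph', 'The end.']);
my @trans = (['chapter', 'Introduction'], ['paragraph', 'Bonjour.'],
             ['paragraph', 'Au revoir.'], ['chapter', 'Fin']);

my $po = eval { toy_gettextize(\@orig, \@trans) };
print defined $po ? "paired\n" : "failed: $@";
```

       No resynchronization is attempted: the first type mismatch aborts the whole run, which mirrors the
       fail-early behavior described in the text.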

   Addendum: How does it work?
       Well, that's pretty easy here. The translated document is not written  directly  to  disk,  but  kept  in
       memory  until  all  the  addenda are applied. The algorithms involved here are rather straightforward. We
       look for a line matching the position regexp, and insert the addendum before it if we're in  mode=before.
       If not, we search for the next line matching the boundary and insert the addendum after this line if it's
       an endboundary or before this line if it's a beginboundary.
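
       As a rough sketch of this algorithm (not the actual po4a code), the function below works on a list of
       lines held in memory. The name toy_addendum and its calling convention are invented for the example,
       and appending at the end of the document when no boundary matches is an assumption of this sketch.

```perl
use strict;
use warnings;

# Insert $addendum into the lines of @$doc and return the new list.
# %opt mirrors the PO4A-HEADER fields: mode, position, and either
# beginboundary or endboundary (position and boundaries are regexps).
sub toy_addendum {
    my ($doc, $addendum, %opt) = @_;

    # The position regexp must match exactly one line.
    my @hits = grep { $doc->[$_] =~ /$opt{position}/ } 0 .. $#$doc;
    die "position must match exactly one line\n" unless @hits == 1;
    my $at = $hits[0];    # index before which the addendum is inserted

    if ($opt{mode} eq 'after') {
        my $is_end = defined $opt{endboundary};
        my $re     = $is_end ? $opt{endboundary} : $opt{beginboundary};
        # Search for the boundary only in the lines after the position match.
        $at++;
        $at++ while $at < @$doc && $doc->[$at] !~ /$re/;
        # endboundary: insert after the matching line rather than before it.
        # Assumption of this sketch: with no match at all, append at the end.
        $at++ if $is_end && $at < @$doc;
    }

    my @out = @$doc;
    splice @out, $at, 0, $addendum;
    return \@out;
}

my @doc = ("=head1 NOM\n", "dummy\n", "=head1 AUTEUR\n", "moi\n");
my $new = toy_addendum(\@doc, "=head1 TRADUCTEUR\n",
    mode => 'after', position => 'AUTEUR', beginboundary => '^=head');
print @$new;
```

       With position=NOM and beginboundary=^=head1 the same function places the addendum just before the
       AUTEUR section, and mode=before with position=^=head1 AUTEUR gives the same result.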

FAQ

       This  chapter  groups  the  Frequently  Asked  Questions. In fact, most of the questions for now could be
       formulated that way: "Why is it designed this way, and not that one?" If you think po4a isn't  the  right
       answer to documentation translation, you should consider reading this section. If it does not answer your
       question, please contact us on the <devel@lists.po4a.org> mailing list. We love feedback.

   Why translate each paragraph separately?
       Yes,  in  po4a,  each  paragraph  is  translated  separately  (in fact, each module decides this, but all
       existing modules do so, and yours should also).  There are two main advantages to this approach:

       • When the technical parts of the document are hidden from the scene, the translator can't mess with
         them. The fewer markers we present to the translator, the fewer errors they can make.

       • Cutting  the  document  helps  in  isolating the changes to the original document. When the original is
         modified, finding what parts of the translation need to be updated is eased by this process.

       Even with these advantages, some people don't like the idea of translating each paragraph separately.
       Here are some of the answers I can give to their fears:

       • This approach proved successful in the KDE project and allows people there to produce the biggest
         corpus of translated and up-to-date documentation I know.

       • The translators can still use the context to translate, since the strings in the PO file are in the
         same order as in the original document. Translating sequentially is thus rather comparable whether
         you use po4a or not. And in any case, the best way to get the context remains to convert the document
         to a printable format, since the text formatting ones are not really readable, IMHO.

       • This approach is the one used by professional translators. I agree that they have somewhat different
         goals than open-source translators. For example, maintenance is often less critical to them since
         the content rarely changes.

   Why not split on sentence level (or smaller)?
       Professional translator tools sometimes split the document at the sentence level in order to maximize the
       reusability  of  previous translations and speed up their process.  The problem is that the same sentence
       may have several translations, depending on the context.

       Paragraphs are by definition longer than sentences.  It  will  hopefully  ensure  that  having  the  same
       paragraph  in  two  documents  will have the same meaning (and translation), regardless of the context in
       each case.

       Splitting on smaller parts than the sentence would be very bad. It would be a bit long to explain why
       here, but interested readers can refer to the Locale::Maketext::TPJ13(3pm) man page (which comes with
       the Perl documentation), for example. In short, each language has its specific syntactic rules, and
       there is no way to build sentences by aggregating parts of sentences that works for all existing
       languages (or even for 5 of the 10 most spoken ones, or even fewer).

   Why not put the original as comment along with translation (or the other way around)?
       At first glance, gettext doesn't seem to be adapted to all kinds of translations. For example, it
       didn't seem adapted to debconf, the interface all Debian packages use for their interaction with the
       user during installation. In that case, the texts to translate were pretty short (a dozen lines for
       each package), and it was difficult to put the translation in a specialized file since it has to be
       available before the package installation.

       That's why the debconf developer decided to implement another solution, where translations are placed
       in the same file as the original. This is rather appealing. One would even want to do this for XML,
       for example. It would look like this:

        <section>
         <title lang="en">My title</title>
         <title lang="fr">Mon titre</title>

         <para>
          <text lang="en">My text.</text>
          <text lang="fr">Mon texte.</text>
         </para>
        </section>

       But  it  was  so problematic that a PO-based approach is now used. Only the original can be edited in the
       file, and the translations must take place in PO files extracted from the  master  template  (and  placed
       back at package compilation time). The old system was deprecated because of several issues:

       •   maintenance problems

           If several translators provide a patch at the same time, it gets hard to merge them together.

           How  will  you detect changes to the original, which need to be applied to the translations? In order
           to use diff, you have to note which version of the original you translated. I.e., you need a PO  file
           in your file ;)

       •   encoding problems

           This solution is viable when only European languages are involved, but the introduction of Korean,
           Russian and/or Arabic really complicates the picture. UTF could be a solution, but there are still
           some problems with it.

           Moreover, such problems are hard to detect (i.e., only Korean readers will detect that  the  encoding
           of Korean is broken [because of the Russian translator]).

       gettext solves all those problems together.

   But gettext wasn't designed for that use!
       That's true, but until now nobody has come up with a better solution. The only known alternative is
       manual translation, with all the maintenance issues.

   What about the other translation tools for documentation using gettext?
       As far as I know, there are only two of them:

       poxml
           This is the tool developed by KDE people to handle DocBook XML. AFAIK, it was the  first  program  to
           extract strings to translate from documentation to PO files, and inject them back after translation.

           It can only handle XML, and only a particular DTD. I'm quite unhappy with the handling of lists,
           which end up in one big msgid. When the list becomes big, the chunk becomes harder to swallow.

       po-debiandoc
           This program, written by Denis Barbier, is a sort of precursor of the po4a SGML module, which more
           or less deprecates it. As the name says, it handles only the DebianDoc DTD, which is more or less
           a deprecated DTD.

       The main advantages of po4a over them are the ease of adding extra content (which is even harder
       there) and the ability to achieve gettextization.

   Educating developers about translation
       When you try to translate documentation or programs, you face three kinds of problems: linguistic (not
       everybody speaks two languages), technical (that's why po4a exists) and relational/human. Not all
       developers understand the necessity of translating stuff. Even when well-intentioned, they may not know
       how to ease the work of translators. To help with that, po4a comes with a lot of documentation which
       can be referred to.

       Another important point is that each translated file begins with a short comment indicating what the
       file is and how to use it. This should help the poor developers flooded with tons of files in different
       languages they hardly speak, and help them deal correctly with it.

       In the po4a project, translated documents are not source files anymore, in the sense that these files are
       not the preferred form of the work for making modifications to it. Since this is  rather  unconventional,
       that's a source of easy mistakes. That's why all files present this header:

        |       *****************************************************
        |       *           GENERATED FILE, DO NOT EDIT             *
        |       * THIS IS NO SOURCE FILE, BUT RESULT OF COMPILATION *
        |       *****************************************************
        |
        | This file was generated by po4a-translate(1). Do not store it (in VCS,
        | for example), but store the PO file used as source file by po4a-translate.
        |
        | In fact, consider this as a binary, and the PO file as a regular source file:
        | If the PO gets lost, keeping this translation up-to-date will be harder ;)

       Likewise, gettext's regular PO files only need to be copied to the po/ directory. But this is not the
       case for the ones manipulated by po4a. The major risk here is that a developer erases the existing
       translation of his program with the translation of his documentation. (Both of them can't be stored in
       the same PO file, because the program needs to install its translation as an mo file while the
       documentation only uses its translation at compile time.) That's why the PO files produced by the
       po-debiandoc module contain the following header:

        #
        #  ADVISES TO DEVELOPERS:
        #    - you do not need to manually edit POT or PO files.
        #    - this file contains the translation of your debconf templates.
        #      Do not replace the translation of your program with this !!
        #        (or your translators will get very upset)
        #
        #  ADVISES TO TRANSLATORS:
        #    If you are not familiar with the PO format, gettext documentation
        #     is worth reading, especially sections dedicated to this format.
        #    For example, run:
        #         info -n '(gettext)PO Files'
        #         info -n '(gettext)Header Entry'
        #
        #    Some information specific to po-debconf are available at
        #            /usr/share/doc/po-debconf/README-trans
        #         or http://www.debian.org/intl/l10n/po-debconf/README-trans
        #

   SUMMARY of the advantages of the gettext based approach
       • The translations are not stored along  with  the  original,  which  makes  it  possible  to  detect  if
         translations become out of date.

       • The  translations are stored in separate files from each other, which prevents translators of different
         languages from interfering, both when submitting their patch and at the file encoding level.

       • It is based internally on gettext (but po4a offers a very simple interface so that you don't need to
         understand the internals to use it). That way, we don't have to reinvent the wheel, and because of
         their wide use, we can think that these tools are more or less bug free.

       • Nothing changes for the end user (besides the fact that translations will hopefully be better
         maintained). The resulting documentation file distributed is exactly the same.

       • No need for translators to learn a new file syntax, and their favorite PO file editor (like Emacs' PO
         mode, Lokalize or Gtranslator) will work just fine.

       • gettext offers a simple way to get statistics about what is done, what should be reviewed and updated,
         and what is still to do. Some examples can be found at these addresses:

          - https://docs.kde.org/stable5/en/kdesdk/lokalize/project-view.html
          - http://www.debian.org/intl/l10n/

       But everything isn't green, and this approach also has some disadvantages we have to deal with.

       • Addenda are… strange at first glance.

       • You can't adapt the translated text to your preferences, like splitting a paragraph here,  and  joining
         two  other ones there. But in some sense, if there is an issue with the original, it should be reported
         as a bug anyway.

       • Even with an easy interface, it remains a new tool people have to learn.

         One of my dreams would be to somehow integrate po4a into Gtranslator or Lokalize. When a
         documentation file is opened, the strings are automatically extracted, and a translated file + po
         file can be written to disk. If we manage to do an MS Word (TM) module (or at least RTF),
         professional translators may even use it.

AUTHORS

        Denis Barbier <barbier,linuxfr.org>
        Martin Quinson (mquinson#debian.org)

Po4a Tools                                         2020-04-15                                            PO4A(7)