Provided by: po4a_0.47-2_all

NAME

       po4a - framework to translate documentation and other materials

Introduction

       The po4a (PO for anything) project goal is to ease translations (and more interestingly,
       the maintenance of translations) using gettext tools on areas where they were not expected
       like documentation.

Table of contents

       This document is organized as follows:

       1 Why should I use po4a? What is it good for?
           This introductory chapter explains the motivation of the project and its philosophy.
           You should read it first if you are evaluating po4a for your own translations.

       2 How to use po4a?
           This chapter is a sort of reference manual, trying to answer the users' questions and
           to give you a better understanding of the whole process. It introduces how to do
           things with po4a and serves as an introduction to the documentation of the specific
           tools.

           HOWTO begin a new translation?
           HOWTO change the translation back to a documentation file?
           HOWTO update a po4a translation?
           HOWTO convert a pre-existing translation to po4a?
           HOWTO add extra text to translations (like translator's name)?
           HOWTO do all this in one program invocation?
           HOWTO customize po4a?

       3 How does it work?
           This chapter gives you a brief overview of the po4a internals, so that you may feel
           more confident helping us maintain and improve it. It may also help you understand
           why it does not do what you expected, and how to solve your problems.

       4 FAQ
           This chapter groups the Frequently Asked Questions. In fact, most of the questions so
           far can be phrased as: "Why is it designed this way, and not that one?" If
           you think po4a isn't the right answer to documentation translation, you should
           consider reading this section. If it does not answer your question, please contact us
           on the <po4a-devel@lists.alioth.debian.org> mailing list. We love feedback.

       5 Specific notes about modules
           This chapter presents the specifics of each module from the translator's and the
           original author's point of view. Read it to learn the syntax you will encounter when
           translating material handled by this module, or the rules you should follow in your
           original document to make translators' lives easier.

           Actually, this section is not really part of this document. Instead, it is placed in
           each module's documentation. This helps ensure that the information is up to date by
           keeping the documentation and the code together.

Why should I use po4a? What is it good for?

       I like the idea of open-source software, making it possible for everybody to access
       software and its source code. But being French, I'm well aware that the licensing is not
       the only restriction to the openness of software: non-translated free software is useless
       for non-English speakers, and we still have some work to do to make it available to
       everybody out there.

       The perception of this situation by open-source actors has improved dramatically
       recently. We, as translators, won the first battle and convinced everybody of the
       importance of translations. Unfortunately, that was the easy part. Now, we have to do
       the job and actually translate all this material.

       In fact, open-source software itself benefits from a rather decent level of translation,
       thanks to the wonderful gettext tool suite. It is able to extract the strings to translate
       from the program, present a uniform format to translators, and then use the result of
       their work at run time to display translated messages to the user.

       But the situation is rather different when it comes to documentation. Too often, the
       translated documentation is not visible enough (not distributed as a part of the program),
       only partial, or not up to date. This last situation is by far the worst possible one.
       An outdated translation can turn out to be worse than no translation at all, since it
       describes old program behavior which is no longer in use.

   The problem to solve
       Translating documentation is not very difficult in itself. Texts are far longer than the
       messages of the program and thus take longer to translate, but no technical skill is
       really needed to do so. The difficult part comes when you have to maintain your work.
       Detecting which parts changed and need to be updated is very difficult, error-prone and
       highly unpleasant. I guess that this explains why so much of the translated documentation
       out there is outdated.

   The po4a answers
       So, the whole point of po4a is to make documentation translation maintainable. The idea
       is to apply the gettext methodology to this new field. As in gettext, texts are extracted
       from their original locations in order to be presented in a uniform format to the
       translators. The classical gettext tools help them update their work when a new release
       of the original comes out. But unlike the classical gettext model, the translations are
       then re-injected into the structure of the original document so that they can be
       processed and distributed just like the English version.

       Thanks to this, discovering which parts of the document were changed and need an update
       becomes very easy. Another good point is that the tools will do almost all the work when
       the structure of the original document gets fundamentally reorganized and when some
       chapters are moved around, merged or split. By extracting the text to translate from the
       document structure, po4a also keeps you away from the complexity of text formatting and
       reduces your chances of producing a broken document (even if it does not completely
       prevent you from doing so).

       Please also see the FAQ below in this document for a more complete list of the advantages
       and disadvantages of this approach.

   Supported formats
       Currently, this approach has been successfully implemented for several text formats:

       man

       The good old manual pages' format, used by so many programs out there. The po4a support is
       very welcome here since this format is somewhat difficult to use and not really friendly
       to newcomers. The Locale::Po4a::Man(3pm) module also supports the mdoc format, used by
       the BSD man pages (which are also quite common on Linux).

       pod

       This is the Perl Online Documentation format. The language and extensions themselves are
       documented that way, as well as most of the existing Perl scripts. It makes it easy to
       keep the documentation close to the actual code by embedding them both in the same file.
       It makes programmers' lives easier but, unfortunately, not translators'.

       sgml

       Even if somewhat superseded by XML nowadays, this format is still used rather often for
       documents which are more than a few screens long. It allows you to make complete books.
       Updating the translation of such long documents can turn out to be a real nightmare. diff
       often proves useless when the original text was re-indented after an update.
       Fortunately, po4a can help you in that process.

       Currently, only the DebianDoc and DocBook DTDs are supported, but adding support for a new
       one is really easy. It is even possible to use po4a on an unknown SGML DTD without
       changing the code by providing the needed information on the command line. See
       Locale::Po4a::Sgml(3pm) for details.

       TeX / LaTeX

       The LaTeX format is a major documentation format used in the Free Software world and for
       publications.  The Locale::Po4a::LaTeX(3pm) module was tested with the Python
       documentation, a book and some presentations.

       texinfo

       All the GNU documentation is written in this format (that's even one of the requirements
       to become an official GNU project). The support for Locale::Po4a::Texinfo(3pm) in po4a is
       still in its early stages. Please report bugs and feature requests.

       xml

       The XML format is a base format for many documentation formats.

       Currently, the DocBook DTD is supported by po4a. See Locale::Po4a::Docbook(3pm) for
       details.

       others

       Po4a can also handle some rarer or more specialized formats, such as the documentation of
       compilation options for the 2.4.x kernels or the diagrams produced by the dia tool. Adding
       a new one is often very easy and the main task is to come up with a parser of your target
       format. See Locale::Po4a::TransTractor(3pm) for more information about this.

   Unsupported formats
       Unfortunately, po4a still lacks support for several documentation formats.

       There is a whole bunch of other formats we would like to support in po4a, and not only
       documentation ones. Indeed, we aim at plugging all "market holes" left by the classical
       gettext tools. This encompasses package descriptions (deb and rpm), questions asked by
       package installation scripts, package changelogs, and all the specialized file formats
       used by programs, such as game scenarios or wine resource files.

How to use po4a?

       This chapter is a sort of reference manual, trying to answer the users' questions and to
       give you a better understanding of the whole process. It introduces how to do things
       with po4a and serves as an introduction to the documentation of the specific tools.

   Graphical overview
       The following schema gives an overview of the process of translating documentation using
       po4a. Do not be afraid of its apparent complexity; it comes from the fact that the whole
       process is represented here. Once you have converted your project to po4a, only the right
       part of the diagram is relevant.

       Note that master.doc is taken as an example for the documentation to be translated and
       translation.doc is the corresponding translated text.  The suffix could be .pod, .xml, or
       .sgml depending on its format. Each part of the picture will be detailed in the next
       sections.

                                          master.doc
                                              |
                                              V
            +<-----<----+<-----<-----<--------+------->-------->-------+
            :           |                     |                        :
       {translation}    |         { update of master.doc }             :
            :           |                     |                        :
          XX.doc        |                     V                        V
       (optional)       |                 master.doc ->-------->------>+
            :           |                   (new)                      |
            V           V                     |                        |
         [po4a-gettextize]   doc.XX.po--->+   |                        |
                 |            (old)       |   |                        |
                 |              ^         V   V                        |
                 |              |     [po4a-updatepo]                  |
                 V              |           |                          V
          translation.pot       ^           V                          |
                 |              |         doc.XX.po                    |
                 |              |         (fuzzy)                      |
          { translation }       |           |                          |
                 |              ^           V                          V
                 |              |     {manual editing}                 |
                 |              |           |                          |
                 V              |           V                          V
             doc.XX.po --->---->+<---<---- doc.XX.po   addendum     master.doc
             (initial)                   (up-to-date) (optional)   (up-to-date)
                 :                          |            |             |
                 :                          V            |             |
                 +----->----->----->------> +            |             |
                                            |            |             |
                                            V            V             V
                                            +------>-----+------<------+
                                                         |
                                                         V
                                                  [po4a-translate]
                                                         |
                                                         V
                                                       XX.doc
                                                   (up-to-date)

       The left part shows how a translation that does not use po4a is converted to it. The top
       of the right part depicts the action of the original author (updating the documentation).
       The middle of the right part is where the automatic actions of po4a are depicted. The new
       material is extracted and compared against the existing translation. Parts which didn't
       change are found, and the previous translation is used. Parts which were partially
       modified are also connected to the previous translation, but with a specific marker
       indicating that the translation must be updated. The bottom of the figure shows how a
       formatted document is built.
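       The merge step in the middle of the diagram can be pictured with the following minimal
       Python sketch. It is only a model: po4a-updatepo relies on gettext's msgmerge for the real
       work, and the difflib similarity measure, the 0.6 cutoff and the merge_catalog name used
       here are assumptions chosen for illustration.

```python
import difflib


def merge_catalog(new_msgids, old_catalog, cutoff=0.6):
    """Merge freshly extracted strings with an existing translation.

    new_msgids: strings extracted from the new master document.
    old_catalog: dict mapping each old msgid to its translation (msgstr).
    Returns a list of (msgid, msgstr, fuzzy) tuples.
    """
    merged = []
    old_ids = list(old_catalog)
    for msgid in new_msgids:
        if msgid in old_catalog:
            # Unchanged paragraph: reuse the previous translation as is.
            merged.append((msgid, old_catalog[msgid], False))
            continue
        # Changed paragraph: look for the most similar old msgid.
        close = difflib.get_close_matches(msgid, old_ids, n=1, cutoff=cutoff)
        if close:
            # Reuse the old translation, but flag it for review.
            merged.append((msgid, old_catalog[close[0]], True))
        else:
            # Brand-new paragraph: left untranslated.
            merged.append((msgid, "", False))
    return merged
```

       Exact matches keep their translation. Close matches keep it but are flagged as fuzzy.
       Everything else is left untranslated. These are the three cases described above.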

       Actually, as a translator, the only manual operation you have to do is the part marked
       {manual editing}. Yeah, I'm sorry, but po4a helps you translate.  It does not translate
       anything for you...

   HOWTO begin a new translation?
       This section presents the steps required to begin a new translation with po4a. The
       refinements involved in converting an existing project to this system are detailed in the
       relevant section.

       To begin a new translation using po4a, you have to do the following steps:

       - Extract the text which has to be translated from the original <master.doc> document
         into a new translation template <translation.pot> file (the gettext format). For that,
         use the po4a-gettextize program this way:

           $ po4a-gettextize -f <format> -m <master.doc> -p <translation.pot>

         <format> is naturally the format used in the master.doc document. As expected, the
         output goes into translation.pot.  Please refer to po4a-gettextize(1) for more details
         about the existing options.

       - Actually translate what should be translated. For that, you have to rename the POT file,
         for example to doc.XX.po (where XX is the ISO 639 code of the language you are
         translating to, e.g. fr for French), and edit the resulting file. It is often a good
         idea not to name the file XX.po, to avoid confusion with the translation of the program
         messages, but this is your call. Don't forget to update the PO file headers; they are
         important.

         The actual translation can be done using the Emacs' or Vi's PO mode, Lokalize (KDE
         based), Gtranslator (GNOME based) or whichever program you prefer to use for them (e.g.
         Virtaal).

         If you wish to learn more about this, you definitely need to refer to the gettext
         documentation, available in the gettext-doc package.

   HOWTO change the translation back to a documentation file?
       Once you're done with the translation, you want to get the translated documentation and
       distribute it to users along with the original one.  For that, use the po4a-translate(1)
       program like this (where XX is the language code):

         $ po4a-translate -f <format> -m <master.doc> -p <doc.XX.po> -l <XX.doc>

       As before, <format> is the format used in the master.doc document. But this time, the PO
       file provided with the -p flag is part of the input. This is your translation. The output
       goes into XX.doc.

       Please refer to po4a-translate(1) for more details.

   HOWTO update a po4a translation?
       To update your translation when the original master.doc file has changed, use the
       po4a-updatepo(1) program like this:

         $ po4a-updatepo -f <format> -m <new_master.doc> -p <old_doc.XX.po>

       (Please refer to po4a-updatepo(1) for more details)

       Naturally, new paragraphs in the document won't get magically translated in the PO file
       by this operation, and you'll need to update the PO file manually. Likewise, you may have
       to rework the translation of paragraphs which were slightly modified. To make sure you
       won't miss any of them, they are marked as "fuzzy" during the process, and you have to
       remove this marker before the translation can be used by po4a-translate. As for the
       initial translation, it is best to use your favorite PO editor here.

       Once your PO file is up-to-date again, without any untranslated or fuzzy string left, you
       can generate a translated documentation file, as explained in the previous section.
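       Before running po4a-translate, it can be handy to check that no fuzzy or untranslated
       entry is left. The Python sketch below is a deliberately simplified PO reader written for
       this example (po_statistics is a hypothetical name). It ignores plural forms. For real
       files, gettext's own msgfmt --statistics is the more reliable tool.

```python
def _unquote(text):
    """Strip the surrounding double quotes of a PO string literal."""
    text = text.strip()
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    return text


def po_statistics(po_text):
    """Count translated, fuzzy and untranslated entries in PO file text.

    Simplified: handles msgid, msgstr, continuation lines and "#," flags.
    Plural forms are not supported; other comment lines (#:, #~, ...) are skipped.
    Example: po_statistics(open("doc.XX.po", encoding="utf-8").read())
    """
    entries = []
    current = {"flags": "", "msgid": None, "msgstr": None}
    field = None

    # The trailing "" acts as a final blank line so the last entry is kept.
    for raw in po_text.splitlines() + [""]:
        line = raw.strip()
        if not line:
            # A blank line ends the current entry.
            if current["msgid"] is not None:
                entries.append(dict(current))
            current = {"flags": "", "msgid": None, "msgstr": None}
            field = None
        elif line.startswith("#,"):
            current["flags"] += line[2:]
        elif line.startswith("msgid "):
            field = "msgid"
            current["msgid"] = _unquote(line[len("msgid "):])
        elif line.startswith("msgstr "):
            field = "msgstr"
            current["msgstr"] = _unquote(line[len("msgstr "):])
        elif line.startswith('"') and field is not None:
            # Continuation line: append to the field being read.
            current[field] += _unquote(line)

    stats = {"translated": 0, "fuzzy": 0, "untranslated": 0}
    for entry in entries:
        if entry["msgid"] == "":
            continue  # the PO header entry, not a real message
        if "fuzzy" in entry["flags"]:
            stats["fuzzy"] += 1
        elif entry["msgstr"]:
            stats["translated"] += 1
        else:
            stats["untranslated"] += 1
    return stats
```

       The PO file is in the state described above once both the fuzzy and the untranslated
       counts are zero.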

   HOWTO convert a pre-existing translation to po4a?
       Often, you happily translated the document manually until a major reorganization of the
       original master.doc document happened. Then, after some unpleasant tries with diff or
       similar tools, you want to convert to po4a. But of course, you don't want to lose your
       existing translation in the process. Don't worry, this case is also handled by po4a tools
       and is called gettextization.

       The key here is to have the same structure in the translated document and in the original
       one so that the tools can match the content accordingly.

       If you are lucky (i.e., if the structures of both documents perfectly match), it will work
       seamlessly and you will be set in a few seconds. Otherwise, you may understand why this
       process has such an ugly name, and you'd better be prepared for some grunt work here. In
       any case, remember that it is the price to pay to get the comfort of po4a afterward. And
       the good point is that you have to do so only once.

       I cannot emphasize this enough: in order to ease the process, it is important that you
       find the exact version of the original which was used to do the translation. The best
       situation is when you noted down the VCS revision used for the translation and did not
       modify the original during the translation process, so that you can use that revision
       as is.

       It won't work well when you use the updated original text with the old translation. It
       remains possible, but is harder and really should be avoided if possible. In fact, I guess
       that if you fail to find the original text again, the best solution is to find someone to
       do the gettextization for you (but, please, not me ;).

       Maybe I'm too dramatic here. Even when things go wrong, it remains much faster than
       translating everything again. I was able to gettextize the existing French translation of
       the Perl documentation in one day, even though things did go wrong. That was more than
       two megabytes of text, and a new translation would have taken months or more.

       Let me explain the basis of the procedure first, and I will come back to hints for
       achieving it when the process goes wrong. To ease comprehension, let's use the above
       example once again.

       Once you have the old master.doc again, which matches the translation XX.doc, the
       gettextization can be done directly to the PO file doc.XX.po without manual translation of
       translation.pot file:

        $ po4a-gettextize -f <format> -m <old_master.doc> -l <XX.doc> -p <doc.XX.po>

       When you're lucky, that's it. You converted your old translation to po4a and can begin
       with the updating task right away. Just follow the procedure explained a few sections ago
       to synchronize your PO file with the newest original document, and update the translation
       accordingly.

       Please note that even when things seem to work properly, there is still room for errors in
       this process. The point is that po4a is unable to understand the text to make sure that
       the translation matches the original. That's why all strings are marked as "fuzzy" in the
       process. You should check each of them carefully before removing those markers.
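       The basic idea of gettextization can be illustrated with the toy Python sketch below.
       Both documents are reduced to lists of (type, text) paragraphs and paired by position,
       and every pair becomes an entry marked fuzzy. The paragraph representation and the
       gettextize function are hypothetical simplifications made for this example. The real
       po4a-gettextize works on the output of the format module's parser and reports its errors
       in more detail.

```python
def gettextize(original, translation):
    """Pair paragraphs of the original and the translation by position.

    Each document is a list of (paragraph_type, text) tuples.
    Returns a list of (msgid, msgstr, flag) entries, all marked fuzzy.
    Raises ValueError at the first structural mismatch.
    """
    entries = []
    for index, (orig, trans) in enumerate(zip(original, translation)):
        orig_type, orig_text = orig
        trans_type, trans_text = trans
        if orig_type != trans_type:
            raise ValueError(
                f"paragraph {index}: type {orig_type!r} in the original "
                f"but {trans_type!r} in the translation"
            )
        # The tool cannot check that the texts really correspond: fuzzy.
        entries.append((orig_text, trans_text, "fuzzy"))
    if len(original) != len(translation):
        raise ValueError(
            f"{len(original)} paragraphs in the original, "
            f"{len(translation)} in the translation"
        )
    return entries
```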

       Often the document structures don't match exactly, preventing po4a-gettextize from doing
       its job properly. At that point, the whole game is about editing the files to get their
       damn structures matching.

       It may help to read the section Gettextization: how does it work? below.  Understanding
       the internal process will help you to make this work. The good point is that
       po4a-gettextize is rather verbose about what went wrong when it happens. First, it
       pinpoints where in the documents the structures' discrepancies are. You will learn the
       strings that don't match, their positions in the text, and the type of each of them.
       Moreover, the PO file generated so far will be dumped to gettextization.failed.po.

       -   Remove all extra parts of the translations, such as the section in which you give the
           translator's name and thank everyone who contributed to the translation. Addenda,
           which are described in the next section, will allow you to re-add them afterward.

       -   Do not hesitate to edit both the original and the translation. The most important
           thing is to get the PO file. You will be able to update it afterward. That being said,
           editing the translation should be preferred when both are possible since it makes
           things easier when the gettextization is done.

       -   If needed, kill some parts of the original if they happen to not be translated. When
           synchronizing the PO with the document afterward, they will come back by themselves.

       -   If you changed the structure a bit (to merge two paragraphs, or split another one),
           undo those changes. If there are issues in the original, you should inform the
           original author. Fixing them in your translation only fixes them for a part of the
           community. And moreover, it's impossible when using po4a ;)

       -   Sometimes, the paragraph contents match, but their types don't. Fixing it is rather
           format-dependent. In POD and man, it often comes from the fact that one of the two
           contains a line beginning with a white space where the other doesn't. In those
           formats, such paragraphs cannot be wrapped and thus become a different type. Just
           remove the space and you are fine. It may also be a typo in the tag name.

           Likewise, two paragraphs may get merged together in POD when the separating line
           contains some spaces, or when there is no empty line between the =item line and the
           content of the item.

       -   Sometimes, there is a desynchronization between the files, and the translation is
           attached to the wrong original paragraph. It is a sign that the real problem lies
           earlier in the files. Check gettextization.failed.po to see where the
           desynchronization begins, and fix it there.

       -   Sometimes, you get the strong feeling that po4a ate some parts of the text, either the
           original or the translation. gettextization.failed.po indicates that both of them
           were gently matching, and then the gettextization fails because it tried to match one
           paragraph with the one after (or before) the right one, as if the right one
           disappeared. Curse po4a as I did when it first happened to me. Generously.

           This unfortunate situation happens when the same paragraph is repeated over the
           document. In that case, no new entry is created in the PO file, but a new reference is
           added to the existing one instead.

           So, when the same paragraph appears twice in the original but is not translated in
           exactly the same way each time, you will get the feeling that a paragraph of the
           original disappeared. Just kill the new translation. If you prefer to kill the first
           translation instead when the second one was actually better, remove the second one
           from where it is and put the first one in the place of the second one.

           Conversely, if two similar but different paragraphs were translated in the exact
           same way, you will get the feeling that a paragraph of the translation disappeared. A
           solution is to add a stupid string to the original paragraph (such as "I'm
           different"). Don't be afraid, those things will disappear during the synchronization,
           and when the added text is short enough, gettext will match your translation to the
           existing text (marking it as fuzzy, but you don't really care since all strings are
           fuzzy after gettextization).
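       The effect described in the last item of the list above comes from the way identical
       strings are stored. In the minimal Python sketch below, a paragraph that appears twice
       produces one entry with two references, so only one translation can be attached to it.
       The build_catalog name and the file:line reference format are assumptions made for
       illustration.

```python
def build_catalog(paragraphs, filename="master.doc"):
    """Group extracted paragraphs into PO-like entries.

    paragraphs: list of (line_number, text) tuples, in document order.
    Returns a dict mapping each distinct text (msgid) to its list of
    "file:line" references, in order of first appearance.
    """
    catalog = {}
    for line_number, text in paragraphs:
        # A repeated text adds a reference instead of creating a new entry.
        catalog.setdefault(text, []).append(f"{filename}:{line_number}")
    return catalog


original = [(3, "See also."), (10, "Some text."), (20, "See also.")]
catalog = build_catalog(original)
for msgid, refs in catalog.items():
    print(refs, repr(msgid))
```

       Here, three paragraphs yield only two entries. Suppose they are paired by position with a
       translation that rendered the two "See also." paragraphs differently. One translated
       paragraph is then left without a counterpart.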

       Hopefully, those tips will help you make your gettextization work and obtain your
       precious PO file. You are now ready to synchronize your file and begin your translation.
       Please note that on large text, it may happen that the first synchronization takes a long
       time.

       For example, the first po4a-updatepo of the Perl documentation's French translation (a
       5.5 MB PO file) took about two full days on a 1 GHz G5 computer. Yes, 48 hours. But the
       subsequent ones only take a dozen seconds on my old laptop. This is because the first
       time, most of the msgids of the PO file don't match any of the POT file ones. This forces
       gettext to search for the closest one using a costly string proximity algorithm.

   HOWTO add extra text to translations (like translator's name)?
       Because of the gettext approach, doing this becomes more difficult in po4a than it was
       when simply editing a new file along the original one. But it remains possible, thanks to
       the so-called addenda.

       It may help the comprehension to consider addenda as a sort of patch applied to the
       localized document after processing. They are rather different from the usual patches
       (they have only one line of context, which can embed Perl regular expression, and they can
       only add new text without removing any), but the functionalities are the same.

       Their goal is to allow the translator to add extra content to the document which is not
       translated from the original document. The most common usage is to add a section about the
       translation itself, listing contributors and explaining how to report bugs against the
       translation.

       An addendum must be provided as a separate file. Its first line is a header indicating
       where in the produced document the addendum should be placed. The rest of the addendum
       file will be added verbatim at the determined position of the resulting document.

       The header has a pretty rigid syntax: it must begin with the string PO4A-HEADER:, followed
       by a semicolon (;) separated list of key=value fields. Whitespace IS important. Note that
       you cannot use the semicolon character (;) in a value, and that quoting it doesn't help.
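       To make this syntax concrete, here is a minimal Python sketch of a header parser. It also
       checks the mandatory keys listed below. This is not po4a's own (Perl) implementation.
       The read_addendum and parse_addendum_header names are hypothetical, and the sketch
       assumes that whitespace before a key is ignored while values are kept verbatim.

```python
PREFIX = "PO4A-HEADER:"


def parse_addendum_header(line):
    """Parse 'PO4A-HEADER:key=value;key=value' into a dict.

    Assumptions of this sketch: whitespace before a key is ignored, values
    are kept verbatim, and ';' always separates fields (it cannot be quoted).
    """
    line = line.rstrip("\n")
    if not line.startswith(PREFIX):
        raise ValueError(f"an addendum header must start with {PREFIX}")
    fields = {}
    for part in line[len(PREFIX):].split(";"):
        # Split on the first '=' only: regexps may contain '=' themselves.
        key, sep, value = part.lstrip().partition("=")
        if not sep:
            raise ValueError(f"malformed field: {part!r}")
        fields[key] = value
    if fields.get("mode") not in ("before", "after"):
        raise ValueError("mode must be 'before' or 'after'")
    if "position" not in fields:
        raise ValueError("position is mandatory")
    if fields["mode"] == "after" and not (
        "beginboundary" in fields or "endboundary" in fields
    ):
        raise ValueError("mode=after needs beginboundary or endboundary")
    return fields


def read_addendum(text):
    """Split an addendum into its parsed header and its verbatim body."""
    header, _, body = text.partition("\n")
    return parse_addendum_header(header), body
```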

       Again, it sounds scary, but the examples given below should help you to find how to write
       the header line you need. To illustrate the discussion, assume we want to add a section
       called "About this translation" after the "About this document" one.

       Here are the possible header keys:

       position (mandatory)
           a regexp. The addendum will be placed near the line matching this regexp. Note that
           we're speaking about the translated document here, not the original. If more than
           one line matches this expression (or if none does), the addition will fail. It is
           indeed better to report an error than to insert the addendum at the wrong location.

           This line is called the position point in the following. The point where the
           addendum is added is called the insertion point. Those two points are close to each
           other, but not equal. For example, if you want to insert a new section, it is easier
           to put the position point on the title of the preceding section and tell po4a where
           that section ends (remember that the position point is given by a regexp which should
           match a unique line).

           The location of the insertion point with regard to the position point is controlled
           by the mode, beginboundary and endboundary fields, as explained below.

           In our case, we would have:

                position=<title>About this document</title>

       mode (mandatory)
           It can be either the string before or after, specifying the position of the addendum,
           relative to the position point.

           Since we want the new section to be placed below the one we are matching, we have:

                mode=after

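       beginboundary (used only when mode=after, and mandatory in that case)
       endboundary (same as beginboundary)
           a regexp matching either the beginning of the section that follows the addendum
           (beginboundary) or the end of the section after which the addendum goes
           (endboundary). One of the two must be given.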

           When mode=after, the insertion point is after the position point, but not directly
           after! It is placed at the end of the section beginning at the position point, i.e.,
           just after the line matched by endboundary, or just before the line matched by
           beginboundary, depending on which one you used.

           In our case, we can choose to indicate the end of the section we match by adding:

              endboundary=</section>

           or to indicate the beginning of the next section by indicating:

              beginboundary=<section>

           In both cases, our addendum will be placed after the </section> and before the
           <section>. The first one is better since it will work even if the document gets
           reorganized.

           Both forms exist because documentation formats are different. In some of them, there
           is a way to mark the end of a section (just like the </section> we just used), while
           others don't explicitly mark the end of a section (like man). In the former case,
           you want to make a boundary matching the end of a section, so that the insertion point
           comes after it. In the latter case, you want to make a boundary matching the beginning
           of the next section, so that the insertion point comes just before it.

       This can seem obscure, but hopefully, the next examples will enlighten you.

       To sum up the example we used so far, in order to add a section called "About this
       translation" after the "About this document" one in an SGML document, you can use either
       of those header lines:

          PO4A-HEADER: mode=after; position=About this document; endboundary=</section>
          PO4A-HEADER: mode=after; position=About this document; beginboundary=<section>

       If you want to add something after the following nroff section:

          .SH "AUTHORS"

       you should put a position matching this line, and a beginboundary matching the beginning
       of the next section (i.e., ^\.SH). The addendum will then be added after the position
       point and immediately before the first line matching the beginboundary. That is to say:

          PO4A-HEADER:mode=after;position=AUTHORS;beginboundary=^\.SH

       If you want to add something into a section (like after "Copyright Big Dude") instead of
       adding a whole section, give a position matching this line, and give a beginboundary
       matching any line:

          PO4A-HEADER:mode=after;position=Copyright Big Dude, 2004;beginboundary=^

       If you want to add something at the end of the document, give a position matching any line
       of your document (but only one line: po4a won't proceed if the match is not unique), and
       give a beginboundary matching nothing. Don't use a simple string like "EOF" here; prefer
       one that is unlikely to appear in your document:

          PO4A-HEADER:mode=after;position=<title>About</title>;beginboundary=FakePo4aBoundary

       In any case, remember that these are regular expressions. For example, if you want to
       match the end of an nroff section ending with the line

         .fi

       don't use .fi as endboundary, because it will also match "the[ fi]le" (the "." matches
       any character, here the space before "fi"), which is obviously not what you expect. The
       correct endboundary in that case is: ^\.fi$.
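
       A quick Perl check makes the difference visible. The two patterns below are written for
       illustration only; they are the unanchored and the anchored forms discussed above.

```perl
use strict;
use warnings;

# Boundaries are regular expressions: "." matches any character,
# so the unanchored pattern also matches the " fi" inside "the file".
my $loose  = qr/.fi/;
my $strict = qr/^\.fi$/;    # only a line consisting of the .fi request

foreach my $line ("the file is here", ".fi") {
    printf "%-20s loose=%s strict=%s\n", "'$line'",
        ($line =~ $loose  ? "match" : "no"),
        ($line =~ $strict ? "match" : "no");
}
```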

       If the addendum doesn't go where you expected, try passing the -vv argument to the tools,
       so that they explain what they do while placing the addendum.

       More detailed example

       Original document (POD formatted):

        |=head1 NAME
        |
        |dummy - a dummy program
        |
        |=head1 AUTHOR
        |
        |me

       Then, the following addendum will ensure that a section (in French) about the translator
       is added at the end of the file (in French, "TRADUCTEUR" means "TRANSLATOR", and "moi"
       means "me"):

        |PO4A-HEADER:mode=after;position=AUTEUR;beginboundary=^=head
        |
        |=head1 TRADUCTEUR
        |
        |moi

       In order to put your addendum before the AUTHOR section, use the following header:

        PO4A-HEADER:mode=after;position=NOM;beginboundary=^=head1

       This works because the next line matching the beginboundary /^=head1/ after the section
       "NAME" (translated to "NOM" in French), is the one declaring the authors. So, the addendum
       will be put between both sections.

   HOWTO do all this in one program invocation?
       The use of po4a proved to be a bit error prone for the users since you have to call two
       different programs in the right order (po4a-updatepo and then po4a-translate), each of
       them needing more than 3 arguments. Moreover, it was difficult with this system to use
       only one PO file for all your documents when more than one format was used.

       The po4a(1) program was designed to solve those difficulties. Once your project is
       converted to the system, you write a simple configuration file explaining where your
       translation files are (PO and POT), where the original documents are, their formats and
       where their translations should be placed.

       Then, calling po4a(1) on this file ensures that the PO files are synchronized against the
       original documents, and that the translated documents are generated properly. Of course,
       you will want to call this program twice: once before editing the PO files to update them,
       and once afterward to get completely updated translated documents. But you only need to
       remember one command line.
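
       As a sketch of such a configuration file, here is a minimal example following the
       configuration file syntax described in po4a(1). The file names and the fr/de language
       list are hypothetical placeholders for your own project; check po4a(1) for the exact
       syntax supported by your version.

```
[po4a_paths] doc/l10n/project.doc.pot fr:doc/l10n/fr.doc.po de:doc/l10n/de.doc.po
[type: sgml] doc/my_stuff.sgml fr:doc/fr/mon_truc.sgml de:doc/de/mein_kram.sgml
```

       The first line tells po4a where the POT file and the per-language PO files live; the
       second declares one master document, its format, and where each translation should be
       written. Running "po4a po4a.cfg" then performs both the update and the translation steps.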

   HOWTO customize po4a?
       po4a modules have options (specified with the -o option) that can be used to change the
       module behavior.

       It is also possible to customize a module, or to use new, derived, or modified modules, by
       putting a module in lib/Locale/Po4a/ and adding lib to the paths specified by the PERLLIB
       or PERL5LIB environment variable. For example:

          PERLLIB=$PWD/lib po4a --previous po4a/po4a.cfg

       Note: the actual name of the lib directory is not important.

How does it work?

       This chapter gives you a brief overview of the po4a internals, so that you may feel more
       confident to help us maintain and improve it. It may also help you understand why it does
       not do what you expected, and how to solve your problems.

   What's the big picture here?
       The po4a architecture is object oriented (in Perl. Isn't that neat?). The common ancestor
       of all parser classes is called TransTractor. This strange name comes from the fact that
       it is in charge of both translating documents and extracting strings.

       More formally, it takes as input a document to translate plus a PO file containing the
       translations to use, and produces two separate outputs: another PO file (resulting from
       the extraction of translatable strings from the input document), and a translated
       document (with the same structure as the input one, but with all translatable strings
       replaced with the content of the input PO). Here is a graphical representation of this:

          Input document --\                             /---> Output document
                            \      TransTractor::       /       (translated)
                             +-->--   parse()  --------+
                            /                           \
          Input PO --------/                             \---> Output PO
                                                                (extracted)

       This small building block is the core of the whole po4a architecture. If you omit the
       input PO and the output document, you get po4a-gettextize. If you provide both inputs and
       disregard the output PO, you get po4a-translate.

       TransTractor::parse() is a virtual function implemented by each module. Here is a little
       example to show you how it works. It parses a list of paragraphs, each of them beginning
       with <p>.

         1 sub parse {
         2   my $self = shift;                  # the TransTractor-derived parser object
         3   PARAGRAPH: while (1) {
         4     my ($paragraph,$pararef,$line,$lref,$first)=("","","","",1);
         5     while ((($line,$lref)=$self->shiftline()) && defined($line)) {
         6       if ($line =~ m/<p>/ && !$first--) {
         7         $self->unshiftline($line,$lref);  # give the line back to the input
         8
         9         $paragraph =~ s/^<p>//s;          # strip the leading tag
        10         $self->pushline("<p>".$self->translate($paragraph,$pararef,"p"));
        11
        12         next PARAGRAPH;
        13       } else {
        14         $paragraph .= $line;
        15         $pararef = $lref unless(length($pararef));
        16       }
        17     }
        18     # No defined line: end of input. Flush the last paragraph, if any.
        19     if (length($paragraph)) {
        20       $paragraph =~ s/^<p>//s;
        21       $self->pushline("<p>".$self->translate($paragraph,$pararef,"p"));
        22     }
        23     return;
        24   }
        25 }

       On line 6, we encounter <p> for the second time. That's the signal that the next
       paragraph begins. We thus put the line we just read back into the input document (line
       7) and push the paragraph built so far into the outputs. After removing its leading <p>
       on line 9, we push the concatenation of this tag with the translation of the rest of the
       paragraph.

       This translate() function is very cool. It pushes its argument into the output PO file
       (extraction) and returns its translation as found in the input PO file (translation).
       Since it's used as part of the argument of pushline(), this translation lands in the
       output document.

       Isn't that cool? It is possible to build a complete po4a module in less than 20 lines when
       the format is simple enough...
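
       To see both outputs at once, the paragraph parser can be run against a small stand-in
       object. The MockTT class below is hypothetical: it only mimics the shiftline(),
       unshiftline(), pushline() and translate() calls used by the parse() example, keeps the
       PO data as plain Perl structures instead of real files, and falls back to the original
       string when no translation is known. The real Locale::Po4a::TransTractor does much more.

```perl
use strict;
use warnings;

package MockTT;    # hypothetical stand-in for Locale::Po4a::TransTractor

sub new {
    my ($class, $lines, $po_in) = @_;
    # Pair every input line with a reference, as shiftline() returns (line, ref).
    my @in = map { [ $lines->[$_], "input:" . ($_ + 1) ] } 0 .. $#$lines;
    return bless { in => \@in, doc_out => [], po_in => $po_in, po_out => [] }, $class;
}

sub shiftline {
    my $self  = shift;
    my $entry = shift @{ $self->{in} };
    return $entry ? @$entry : (undef, undef);
}

sub unshiftline {
    my ($self, $line, $ref) = @_;
    unshift @{ $self->{in} }, [ $line, $ref ];
}

sub pushline {
    my ($self, $line) = @_;
    push @{ $self->{doc_out} }, $line;
}

sub translate {
    my ($self, $string, $ref, $type) = @_;
    push @{ $self->{po_out} }, { msgid => $string, ref => $ref, type => $type };  # extraction
    my $msgstr = $self->{po_in}{$string};
    return defined $msgstr ? $msgstr : $string;    # translation, or the original
}

# Same logic as the parse() example: one PO entry per <p> paragraph.
sub parse {
    my $self = shift;
    PARAGRAPH: while (1) {
        my ($paragraph, $pararef, $line, $lref, $first) = ("", "", "", "", 1);
        while ((($line, $lref) = $self->shiftline()) && defined($line)) {
            if ($line =~ m/<p>/ && !$first--) {
                $self->unshiftline($line, $lref);
                $paragraph =~ s/^<p>//s;
                $self->pushline("<p>" . $self->translate($paragraph, $pararef, "p"));
                next PARAGRAPH;
            }
            $paragraph .= $line;
            $pararef = $lref unless length($pararef);
        }
        if (length($paragraph)) {    # flush the last paragraph at end of input
            $paragraph =~ s/^<p>//s;
            $self->pushline("<p>" . $self->translate($paragraph, $pararef, "p"));
        }
        return;
    }
}

package main;

my $tt = MockTT->new([ "<p>Hello\n", "world.\n", "<p>Bye.\n" ],
                     { "Hello\nworld.\n" => "Bonjour\nle monde.\n" });
$tt->parse();
print "document: ", join("", @{ $tt->{doc_out} });
print "PO entry: $_->{ref} ($_->{type})\n" for @{ $tt->{po_out} };
```

       The first paragraph comes out translated, while the second one, absent from the input PO,
       is copied unchanged; both still end up in the extracted PO data.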

       You can learn more about this in Locale::Po4a::TransTractor(3pm).

   Gettextization: how does it work?
       The idea here is to take the original document and its translation, and to say that the
       Nth extracted string from the translation is the translation of the Nth extracted string
       from the original. In order to work, both files must share exactly the same structure. For
       example, if the files have the following structure, it is very unlikely that the 4th
       string in translation (of type 'chapter') is the translation of the 4th string in original
       (of type 'paragraph').

           Original         Translation

         chapter            chapter
           paragraph          paragraph
           paragraph          paragraph
           paragraph        chapter
         chapter              paragraph
           paragraph          paragraph

       For that, po4a parsers are used on both the original and the translation files to extract
       PO files, and then a third PO file is built from them, taking strings from the second as
       translations of strings from the first. In order to check that the strings we put
       together are actually translations of each other, document parsers in po4a should record
       the syntactical type of each extracted string (all existing ones do so, yours should
       also). This information is then used to make sure that both documents have the same
       syntax. In the previous example, it would allow us to detect that string 4 is a paragraph
       in one case and a chapter title in the other, and to report the problem.
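
       The pairing and checking step can be sketched in a few lines of Perl. The gettextize
       function below is a hypothetical illustration of the idea, not po4a's actual
       implementation: each extracted string is represented as a [type, text] pair, the process
       stops at the first type mismatch, and every guessed entry is marked fuzzy.

```perl
use strict;
use warnings;

# Hypothetical sketch of gettextization (not po4a's actual code).
# Each document is a list of [type, text] pairs, in extraction order.
sub gettextize {
    my ($orig, $trans) = @_;
    die sprintf("%d strings in the original but %d in the translation\n",
                scalar @$orig, scalar @$trans)
        unless @$orig == @$trans;
    my @po;
    for my $i (0 .. $#$orig) {
        my ($otype, $otext) = @{ $orig->[$i] };
        my ($ttype, $ttext) = @{ $trans->[$i] };
        # Refuse to guess as soon as the structures diverge.
        die sprintf("string %d is a %s in the original but a %s in the translation\n",
                    $i + 1, $otype, $ttype)
            unless $otype eq $ttype;
        # The Nth translated string is assumed to translate the Nth original one;
        # the guess is marked fuzzy so that a human reviews it.
        push @po, { msgid => $otext, msgstr => $ttext, type => $otype, fuzzy => 1 };
    }
    return \@po;
}

# The structures from the table above: they diverge at the 4th string.
my @orig  = map { [ $_, "original $_" ] } qw(chapter paragraph paragraph paragraph chapter paragraph);
my @trans = map { [ $_, "translated $_" ] } qw(chapter paragraph paragraph chapter paragraph paragraph);
eval { gettextize(\@orig, \@trans); 1 } or print "gettextization failed: $@";
```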

       In theory, it would be possible to detect the problem, and resynchronize the files
       afterward (just like diff does). But it is not clear what to do with the few strings
       before the desynchronization, and it would sometimes produce bad results. That's why the
       current implementation doesn't try to resynchronize anything and fails verbosely when
       something goes wrong, requiring manual modification of the files to fix the problem.

       Even with these precautions, things can go wrong very easily here. That's why all
       translations guessed this way are marked fuzzy to make sure that the translator reviews
       and checks them.

   Addendum: How does it work?
       Well, that's pretty easy here. The translated document is not written directly to disk,
       but kept in memory until all the addenda are applied. The algorithms involved here are
       rather straightforward. We look for a line matching the position regexp, and insert the
       addendum before it if we're in mode=before. If not, we search for the next line matching
       the boundary and insert the addendum after this line if it's an endboundary or before this
       line if it's a beginboundary.
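
       This placement logic can be sketched as follows. The place_addendum function is a
       hypothetical illustration, not the code po4a actually uses: the translated document is
       an array of lines, the header fields are given as a hash, and appending at the end when
       no boundary line is found is an assumption based on the end-of-document trick described
       earlier.

```perl
use strict;
use warnings;

# Hypothetical sketch (not po4a's code): insert @addendum into the
# translated document @$doc (one element per line) according to %$hdr,
# which holds the parsed PO4A-HEADER fields.
sub place_addendum {
    my ($doc, $hdr, @addendum) = @_;
    my $pos_re = $hdr->{position};
    my @hits = grep { $doc->[$_] =~ /$pos_re/ } 0 .. $#$doc;
    die "position matched " . scalar(@hits) . " lines, expected exactly one\n"
        unless @hits == 1;
    my $at = $hits[0];

    if ($hdr->{mode} eq 'before') {
        splice @$doc, $at, 0, @addendum;     # insert just before the position line
        return $doc;
    }

    # mode=after: find the next line matching the boundary, after the position line.
    my $is_end   = exists $hdr->{endboundary};
    my $bound_re = $is_end ? $hdr->{endboundary} : $hdr->{beginboundary};
    die "mode=after needs a beginboundary or an endboundary\n" unless defined $bound_re;
    my $i = $at + 1;
    $i++ while $i <= $#$doc && $doc->[$i] !~ /$bound_re/;

    if ($i > $#$doc) {
        push @$doc, @addendum;               # assumed: no boundary found, append at the end
    } elsif ($is_end) {
        splice @$doc, $i + 1, 0, @addendum;  # endboundary: insert after the matched line
    } else {
        splice @$doc, $i, 0, @addendum;      # beginboundary: insert before the matched line
    }
    return $doc;
}

my @doc = ("=head1 NAME", "", "dummy - a dummy program", "", "=head1 AUTHOR", "", "me");
place_addendum(\@doc, { mode => 'after', position => '^=head1 NAME', beginboundary => '^=head1' },
               "=head1 TRANSLATOR", "", "moi", "");
print "$_\n" for @doc;
```

       The search for the boundary deliberately starts on the line after the position point;
       otherwise a beginboundary such as ^=head1 would match the position line itself.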

FAQ

       This chapter groups the Frequently Asked Questions. In fact, most of the questions for now
       could be phrased this way: "Why is it designed this way, and not that one?" If you think
       po4a isn't the right answer to documentation translation, you should consider reading
       this section. If it does not answer your question, please contact us on the
       <po4a-devel@lists.alioth.debian.org> mailing list. We love feedback.

   Why translate each paragraph separately?
       Yes, in po4a, each paragraph is translated separately (in fact, each module decides this,
       but all existing modules do so, and yours should also). There are two main advantages to
       this approach:

       • When the technical parts of the document are hidden from the scene, the translator can't
         mess with them. The fewer markers we present to the translator, the fewer errors they
         can make.

       • Cutting the document helps in isolating the changes to the original document. When the
         original is modified, finding what parts of the translation need to be updated is eased
         by this process.

       Even with these advantages, some people don't like the idea of translating each paragraph
       separately. Here are some of the answers I can give to their fears:

       • This approach proved successful in the KDE project and allowed people there to produce
         the biggest corpus of translated and up-to-date documentation I know of.

       • The translators can still use the context to translate, since the strings in the PO file
         are in the same order as in the original document. Translating sequentially is thus
         rather comparable whether you use po4a or not. And in any case, the best way to get the
         context remains to convert the document to a printable format, since the text
         formatting ones are not really readable, IMHO.

       • This approach is the one used by professional translators. I agree that they have
         somewhat different goals than open-source translators. For example, maintenance is
         often less critical to them since the content changes rarely.

   Why not split at the sentence level (or smaller)?
       Professional translator tools sometimes split the document at the sentence level in order
       to maximize the reusability of previous translations and speed up their process. The
       problem is that the same sentence may have several translations, depending on the context.

       Paragraphs are by definition longer than sentences, which hopefully ensures that the same
       paragraph appearing in two documents has the same meaning (and translation), regardless
       of the context in each case.

       Splitting into parts smaller than the sentence would be very bad. It would take a while to
       explain why here, but interested readers can refer to the Locale::Maketext::TPJ13(3pm)
       man page (which comes with the Perl documentation), for example. In short, each language
       has its specific syntactic rules, and there is no way to build sentences by aggregating
       parts of sentences that works for all existing languages (or even for 5 of the 10 most
       spoken ones, or even fewer).

   Why not put the original as comment along with translation (or the other way around)?
       At first glance, gettext doesn't seem suited to all kinds of translations. For example,
       it didn't seem suited to debconf, the interface all Debian packages use for their
       interaction with the user during installation. In that case, the texts to translate were
       pretty short (a dozen lines for each package), and it was difficult to put the
       translation in a specialized file since it has to be available before the package
       installation.

       That's why the debconf developer decided to implement another solution, where translations
       are placed in the same file as the original. This is rather appealing. One would even
       want to do this for XML, for example. It would look like this:

        <section>
         <title lang="en">My title</title>
         <title lang="fr">Mon titre</title>

         <para>
          <text lang="en">My text.</text>
          <text lang="fr">Mon texte.</text>
         </para>
        </section>

       But it was so problematic that a PO-based approach is now used. Only the original can be
       edited in the file, and the translations must take place in PO files extracted from the
       master template (and placed back at package compilation time). The old system was
       deprecated because of several issues:

       •   maintenance problems

           If several translators provide a patch at the same time, it gets hard to merge them
           together.

           How will you detect changes to the original, which need to be applied to the
           translations? In order to use diff, you have to note which version of the original you
           translated. I.e., you need a PO file in your file ;)

       •   encoding problems

           This solution is viable when only European languages are involved, but the
           introduction of Korean, Russian and/or Arabic really complicates the picture. UTF
           could be a solution, but there are still some problems with it.

           Moreover, such problems are hard to detect (i.e., only Korean readers will detect that
           the encoding of Korean is broken [because of the Russian translator]).

       gettext solves all those problems together.

   But gettext wasn't designed for that use!
       That's true, but until now nobody has come up with a better solution. The only known
       alternative is manual translation, with all its maintenance issues.

   What about the other translation tools for documentation using gettext?
       As far as I know, there are only two of them:

       poxml
           This is the tool developed by KDE people to handle DocBook XML. AFAIK, it was the
           first program to extract strings to translate from documentation to PO files, and
           inject them back after translation.

           It can only handle XML, and only a particular DTD. I'm quite unhappy with the handling
           of lists, which end up in one big msgid. When the list becomes big, the chunk becomes
           harder to swallow.

       po-debiandoc
           This program, written by Denis Barbier, is a sort of precursor of the po4a SGML
           module, which more or less deprecates it. As the name says, it handles only the
           DebianDoc DTD, which is itself more or less deprecated.

       The main advantages of po4a over them are the ease of adding extra content (which is even
       harder with those tools) and the ability to achieve gettextization.

   Educating developers about translation
       When you try to translate documentation or programs, you face three kinds of problems:
       linguistic (not everybody speaks two languages), technical (that's why po4a exists) and
       relational/human. Not all developers understand the necessity of translating stuff. Even
       when well-intentioned, they may not know how to ease the work of translators. To help
       with that, po4a comes with a lot of documentation which can be referred to.

       Another important point is that each translated file begins with a short comment
       indicating what the file is and how to use it. This should help the poor developers
       flooded with tons of files in different languages they hardly speak, and help them deal
       with these files correctly.

       In the po4a project, translated documents are not source files anymore. Since SGML files
       are usually source files, it's an easy mistake to make. That's why all files present this
       header:

        |       *****************************************************
        |       *           GENERATED FILE, DO NOT EDIT             *
        |       * THIS IS NO SOURCE FILE, BUT RESULT OF COMPILATION *
        |       *****************************************************
        |
        | This file was generated by po4a-translate(1). Do not store it (in VCS,
        | for example), but store the PO file used as source file by po4a-translate.
        |
        | In fact, consider this as a binary, and the PO file as a regular source file:
        | If the PO gets lost, keeping this translation up-to-date will be harder ;)

       Likewise, gettext's regular PO files only need to be copied to the po/ directory. But this
       is not the case for the ones manipulated by po4a. The major risk here is that a developer
       erases the existing translation of his program with the translation of his documentation.
       (Both of them can't be stored in the same PO file, because the program needs to install
       its translation as an mo file while the documentation only uses its translation at
       compile time.) That's why the PO files produced by po-debconf contain the following
       header:

        #
        #  ADVISES TO DEVELOPERS:
        #    - you do not need to manually edit POT or PO files.
        #    - this file contains the translation of your debconf templates.
        #      Do not replace the translation of your program with this !!
        #        (or your translators will get very upset)
        #
        #  ADVISES TO TRANSLATORS:
        #    If you are not familiar with the PO format, gettext documentation
        #     is worth reading, especially sections dedicated to this format.
        #    For example, run:
        #         info -n '(gettext)PO Files'
        #         info -n '(gettext)Header Entry'
        #
        #    Some information specific to po-debconf are available at
        #            /usr/share/doc/po-debconf/README-trans
        #         or http://www.debian.org/intl/l10n/po-debconf/README-trans
        #

   SUMMARY of the advantages of the gettext based approach
       • The translations are not stored along with the original, which makes it possible to
         detect if translations become out of date.

       • The translations are stored in separate files from each other, which prevents
         translators of different languages from interfering, both when submitting their patch
         and at the file encoding level.

       • It is based internally on gettext (but po4a offers a very simple interface so that you
         don't need to understand the internals to use it). That way, we don't have to reinvent
         the wheel, and because of their wide use, we can assume that these tools are more or
         less bug-free.

       • Nothing changes for the end user (besides the fact that translations will hopefully be
         better maintained). The resulting documentation file distributed is exactly the same.

       • Translators don't need to learn a new file syntax, and their favorite PO file editor
         (like Emacs' PO mode, Lokalize or Gtranslator) will work just fine.

       • gettext offers a simple way to get statistics about what is done, what should be
         reviewed and updated, and what is still to do. Some examples can be found at these
         addresses:

          - http://kv-53.narod.ru/kaider1.png
          - http://www.debian.org/intl/l10n/

       But everything isn't green, and this approach also has some disadvantages we have to deal
       with.

       • Addenda are... strange at first glance.

       • You can't adapt the translated text to your preferences, like splitting a paragraph
         here, and joining two other ones there. But in some sense, if there is an issue with the
         original, it should be reported as a bug anyway.

       • Even with an easy interface, it remains a new tool people have to learn.

         One of my dreams would be to somehow integrate po4a into Gtranslator or Lokalize. When
         an SGML file is opened, the strings would automatically be extracted. When it's saved,
         a translated SGML file could be written to disk. If we manage to write an MS Word (TM)
         module (or at least RTF), professional translators may even use it.

AUTHORS

        Denis Barbier <barbier,linuxfr.org>
        Martin Quinson (mquinson#debian.org)