Provided by: texlive-binaries_2009-11ubuntu2_amd64

NAME

       patgen - generate patterns for TeX hyphenation

SYNOPSIS

       patgen dictionary_file pattern_file patout_file translate_file

DESCRIPTION

       This manual page is not meant to be exhaustive.  See also the Info file or manual Web2C: A
       TeX implementation.

       The patgen program reads the dictionary_file, which contains a list of hyphenated words,
       and the pattern_file, which contains previously generated patterns (if any) for a
       particular language (not a complete TeX source file; see below). It then writes the
       patout_file with the hyphenation patterns for that language, both the previously
       generated and the newly generated ones. The translate_file defines language-specific
       values for the parameters left_hyphen_min and right_hyphen_min used by TeX's hyphenation
       algorithm, as well as the external representations of the lower and upper case
       version(s) of all `letters' of that language. Further details of the pattern generation
       process, such as hyphenation levels and pattern lengths, are requested interactively from
       the user's terminal. Optionally, patgen creates a new dictionary file pattmp.n showing the
       good and bad hyphens found by the generated patterns, where n is the highest hyphenation
       level.

       The patterns generated by patgen can be read by initex for use in hyphenating words. For a
       real-life  example of patgen's output, see $TEXMFMAIN/tex/generic/hyphen/hyphen.tex, which
       contains the patterns TeX uses for English by default.  At some sites, patterns for (many)
       other languages may be available, and the local tex programs may have them preloaded.
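
       The following Python sketch illustrates how such patterns are applied, following Liang's
       method as described in The TeXbook, Appendix H; it is not code from TeX or patgen. Each
       pattern adds its digits to the gaps of the word framed by dots, the highest digit in each
       gap wins, and an odd result permits a hyphen, subject to left_hyphen_min and
       right_hyphen_min. It assumes single-character letters; the names split_pattern,
       hyphenate, and PATTERNS, and the small pattern set, are hypothetical.

```python
# Sketch of Liang-style pattern application (illustrative, not TeX's code).
# Assumes single-character letters and patterns written like "hy3ph".


def split_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a pattern into its letters and len(letters) + 1 gap levels."""
    letters: list[str] = []
    levels = [0]  # levels[k]: digit before letters[k] (last one: after the end)
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)
        else:
            letters.append(ch)
            levels.append(0)
    return "".join(letters), levels


def hyphenate(word: str, patterns: list[str],
              left_min: int = 2, right_min: int = 3) -> list[str]:
    """Split word at every allowed gap whose highest level is odd."""
    text = "." + word.lower() + "."           # dots mark the word edges
    values = [0] * (len(text) + 1)            # values[j]: gap before text[j]
    for pattern in patterns:
        letters, levels = split_pattern(pattern)
        start = text.find(letters)
        while start != -1:                    # every (possibly overlapping) match
            for k, level in enumerate(levels):
                values[start + k] = max(values[start + k], level)
            start = text.find(letters, start + 1)
    pieces, last = [], 0
    # gap i lies between word[i-1] and word[i], i.e. before text[i + 1]
    for i in range(left_min, len(word) - right_min + 1):
        if values[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


# Hypothetical hand-picked patterns, enough for the word "hyphenation".
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na",
            "n2at", "1tio", "2io", "o2n"]
```

       With these patterns, hyphenate("hyphenation", PATTERNS) returns ['hy', 'phen', 'ation']:
       the odd 3 in hy3ph allows a break, while the even 4 in hena4 overrides the 1 in 1tio and
       suppresses a break before `tion'.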

       All filenames must be complete; no adding of default extensions or path searching is done.

FILE FORMATS

       Letters
           When initex digests hyphenation patterns, TeX first expands macros, and the result
           must consist entirely of digits (hyphenation levels), dots (`.', edge of a word), and
           letters. In pattern files for non-English languages, letters are often represented by
           macros or other expandable constructs. For the purpose of patgen these are just
           character sequences, subject to the condition that no such sequence is a prefix of
           another one.
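
           The prefix condition is what lets a program split a pattern into letters without
           ambiguity: at any position at most one representation can match. The Python sketch
           below (function names and the sample letter set are hypothetical) checks the
           condition and performs that split on literal character sequences, without any macro
           expansion.

```python
# Sketch: checking the prefix condition on letter representations and
# splitting a string into letters. Names are hypothetical.


def is_prefix_free(reps: list[str]) -> bool:
    """True if no representation is a prefix of another (duplicates included)."""
    if len(set(reps)) != len(reps):
        return False
    return not any(a != b and b.startswith(a) for a in reps for b in reps)


def split_letters(text: str, reps: list[str]) -> list[str]:
    """Split text into letter representations.

    With a prefix-free set, at most one representation can match at any
    position, so the order in which reps are tried does not matter.
    """
    tokens: list[str] = []
    i = 0
    while i < len(text):
        match = next((r for r in reps if text.startswith(r, i)), None)
        if match is None:
            raise ValueError(f"no letter matches at {text[i:]!r}")
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical letter set in which one letter is written as a macro.
GERMANISH = ["e", "g", "o", "r", r"\ss{}"]
```

           For example, split_letters(r"gro\ss{}e", GERMANISH) yields five letters, the fourth
           being \ss{}; adding both "s" and "ss" as letters would violate the condition.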

       Dictionary file
           A dictionary file contains a weighted list of hyphenated words, one word per line,
           starting in column 1. A digit in column 1 sets a global word weight (initially 1)
           that applies to all following words up to the next global word weight. A digit at
           some intercharacter position sets a weight for that position only.
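
           A minimal Python sketch of these weight rules, assuming single-digit weights,
           single-character letters, and the default hyphen characters; the function name
           parse_weights is hypothetical and the code is not taken from patgen.

```python
# Sketch of the weight rules for dictionary lines (single-digit weights,
# single-character letters, default hyphen characters assumed).


def parse_weights(lines: list[str]) -> list[tuple[str, list[int]]]:
    """Return (word, gap_weights) per line; gap_weights[i] is the weight of
    the gap between word[i] and word[i + 1]."""
    global_weight = 1                      # initial global word weight
    result = []
    for line in lines:
        line = line.rstrip()
        if line and line[0].isdigit():     # digit in column 1: new global weight
            global_weight = int(line[0])
            line = line[1:]
        letters: list[str] = []
        explicit: dict[int, int] = {}
        for ch in line:
            if ch.isdigit():               # weight for this gap only
                explicit[len(letters) - 1] = int(ch)
            elif ch in "-*.":              # hyphen markers carry no weight
                continue
            else:
                letters.append(ch)
        if letters:
            word = "".join(letters)
            weights = [explicit.get(i, global_weight) for i in range(len(word) - 1)]
            result.append((word, weights))
    return result
```

           For the lines "hy-phen", "3ex-am-ple", and "te2st", the first word gets weight 1
           everywhere, the global weight 3 then applies to "example" and "test", and the gap
           between "te" and "st" gets the local weight 2.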

           The hyphens in a word are indicated by `-', `*', or `.' (or their replacements as
           defined in the translate file) for, respectively, hyphens not yet found by the
           patterns, `good' hyphens (correctly found by the patterns), and `bad' hyphens
           (erroneously found by the patterns). When a dictionary file is read, `*' is treated
           like `-' and `.' is ignored.
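
           The marker convention can be sketched in Python as follows, using the default
           characters. A gap k means "after the first k letters"; the functions mark and
           read_hyphens are hypothetical illustrations, not patgen's own routines.

```python
# Sketch of the hyphen markers with the default characters.


def mark(word: str, wanted: set[int], found: set[int]) -> str:
    """Write a word with `*' (good), `.' (bad) and `-' (not yet found)."""
    out: list[str] = []
    for k, ch in enumerate(word):
        if k in wanted and k in found:
            out.append("*")               # correct hyphen found by the patterns
        elif k in wanted:
            out.append("-")               # hyphen still to be found
        elif k in found:
            out.append(".")               # hyphen found in error
        out.append(ch)
    return "".join(out)


def read_hyphens(line: str) -> tuple[str, set[int]]:
    """Read a marked word back: `*' counts like `-', `.' is ignored."""
    letters: list[str] = []
    hyphens: set[int] = set()
    for ch in line:
        if ch in "-*":
            hyphens.add(len(letters))
        elif ch == "." or ch.isdigit():   # bad hyphens and weights dropped here
            continue
        else:
            letters.append(ch)
    return "".join(letters), hyphens
```

           Marking "hyphenation" with wanted gaps {2, 6, 7} and found gaps {2, 4, 6} gives
           hy*ph.en*a-tion; reading that line back recovers the word and the wanted gaps
           {2, 6, 7}, since the bad hyphen is discarded.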

       Pattern file
           A pattern file contains only patterns in the format described above, e.g., from a
           previous run of patgen. It must not contain any TeX comments or control sequences.
           For instance, this is not a valid pattern file:

           % this is a pattern file read by TeX.
           \patterns{%
            ...
           }
           It may contain only the actual patterns, i.e., the `...' part.
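
           A rough Python check for this rule is sketched below: every whitespace-separated
           token must consist of digits, dots at its edges, and known letter representations,
           so a `%' comment or a \patterns control sequence is rejected. The function name and
           the exact rules are assumptions of this sketch, not patgen's parser.

```python
# Sketch of a pattern-file check: every whitespace-separated token must be
# made of digits, edge dots, and known letter representations.


def read_pattern_file(text: str, letters: list[str]) -> list[str]:
    patterns: list[str] = []
    for token in text.split():
        i = 0
        while i < len(token):
            ch = token[i]
            if ch.isdigit() or (ch == "." and i in (0, len(token) - 1)):
                i += 1
                continue
            rep = next((r for r in letters if token.startswith(r, i)), None)
            if rep is None:                 # comment, brace, control sequence...
                raise ValueError(f"not a pattern: {token!r}")
            i += len(rep)
        patterns.append(token)
    return patterns
```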

       Translate file
           A translate file starts with a line containing the value of left_hyphen_min in
           columns 1-2, the value of right_hyphen_min in columns 3-4, and, in columns 5, 6, and
           7, either a blank or a replacement for the "hyphen" characters `-', `*', and `.',
           respectively. (As in many TeX-related programs, input lines are padded with blanks.)

           Each following line defines one `letter': an arbitrary delimiter character in column
           1, followed by one or more external representations of that letter (the `lower' case
           one, used for output, comes first), each terminated by the delimiter, with the whole
           sequence terminated by a second delimiter.

           If  the translate file is empty, the values left_hyphen_min=2, right_hyphen_min=3, and
           the 26 lower case letters a...z  with  their  upper  case  representations  A...Z  are
           assumed.
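
           The layout above can be read with a short Python sketch. It assumes that a non-empty
           file always has both numbers in columns 1-4, and it skips empty letter lines, which
           is this sketch's own choice; the name read_translate is hypothetical.

```python
# Sketch of a translate-file reader following the layout described above.
import string


def read_translate(lines: list[str]):
    """Return (left_hyphen_min, right_hyphen_min, hyphen_chars, letters)."""
    if not lines:                                   # empty file: defaults
        letters = [[c, c.upper()] for c in string.ascii_lowercase]
        return 2, 3, "-*.", letters
    header = lines[0].rstrip("\n").ljust(7)         # pad with blanks
    left_min, right_min = int(header[0:2]), int(header[2:4])
    # columns 5-7: a blank keeps the default `-', `*', `.'
    hyphens = "".join(d if c == " " else c for c, d in zip(header[4:7], "-*."))
    letters = []
    for line in lines[1:]:
        line = line.rstrip("\n")
        if not line:
            continue                                # sketch's choice: skip empty lines
        delim, reps = line[0], []
        for rep in line[1:].split(delim):
            if rep == "":                           # second delimiter ends the list
                break
            reps.append(rep)
        letters.append(reps)                        # reps[0] is the output form
    return left_min, right_min, hyphens, letters
```

           A file consisting of " 1 2", " a A ", and "/ae/AE//" yields left_hyphen_min=1,
           right_hyphen_min=2, the default hyphen characters, and the letters [a, A] and
           [ae, AE]; the first uses a blank as its delimiter.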

       Terminal input
           After   reading   the   translate_file  and  any  previously-generated  patterns  from
           pattern_file, patgen requests input from the user's terminal.

           First, patgen asks for the integer values of hyph_start and hyph_finish, the lowest
           and highest hyphenation levels for which patterns are to be generated. The value of
           hyph_start should be larger than any hyphenation level already present in
           pattern_file.

           Then, for each hyphenation level, it asks for the integer values of pat_start and
           pat_finish, the smallest and largest pattern lengths to be analyzed, as well as good
           weight, bad weight, and threshold: the weights for good and bad hyphens and a weight
           threshold for useful patterns.
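
           As we read Liang's thesis and patgen.web, a candidate pattern is kept when its
           weighted score, good weight times its count of good hyphens minus bad weight times
           its count of bad hyphens, reaches the threshold. The Python sketch below shows only
           this arithmetic; which hyphens count as good or bad at a given level is defined in
           patgen.web, and the names and counts here are hypothetical.

```python
# Sketch of the acceptance test for candidate patterns; counts are made up.


def useful(good: int, bad: int, good_wt: int, bad_wt: int, thresh: int) -> bool:
    """A candidate is kept if its weighted score reaches the threshold."""
    return good_wt * good - bad_wt * bad >= thresh


def select(candidates: dict[str, tuple[int, int]],
           good_wt: int, bad_wt: int, thresh: int) -> list[str]:
    """Return the candidate patterns (sorted) that pass the test above."""
    return sorted(p for p, (good, bad) in candidates.items()
                  if useful(good, bad, good_wt, bad_wt, thresh))


COUNTS = {"1na": (10, 2), "1tio": (3, 0), "2io": (1, 5)}   # (good, bad), hypothetical
```

           With good weight 1, bad weight 3, and threshold 4, only 1na survives (10 - 6 = 4);
           lowering the threshold to 3 also admits 1tio. Raising bad weight penalizes patterns
           that produce wrong hyphens, while raising threshold demands more evidence per
           pattern.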

           Finally, it asks whether or not to produce a hyphenated word list (`y' or `Y' for
           yes, anything else for no).
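
           For scripted runs, the answers can be prepared in advance, one line per prompt in the
           order given above. The Python sketch below builds such a text; it assumes patgen reads
           its terminal input from standard input and that the pattern lengths and the three
           weights are separate prompts. The function name is hypothetical.

```python
# Sketch: build the answers to patgen's terminal prompts, one line per prompt.


def patgen_answers(hyph_start: int, hyph_finish: int,
                   levels: list[tuple[int, int, int, int, int]],
                   hyphenate: bool = False) -> str:
    """levels holds (pat_start, pat_finish, good_wt, bad_wt, thresh) per level."""
    if len(levels) != hyph_finish - hyph_start + 1:
        raise ValueError("need one parameter tuple per hyphenation level")
    lines = [f"{hyph_start} {hyph_finish}"]
    for pat_start, pat_finish, good_wt, bad_wt, thresh in levels:
        lines.append(f"{pat_start} {pat_finish}")
        lines.append(f"{good_wt} {bad_wt} {thresh}")
    lines.append("y" if hyphenate else "n")
    return "\n".join(lines) + "\n"
```

           Saved to a file such as answers.txt (a hypothetical name), the text could be supplied
           with `patgen dictionary_file pattern_file patout_file translate_file < answers.txt'.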

FILES

       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex
           The original hyphenation patterns for English, by Donald Knuth and Frank Liang.

       $TEXMFMAIN/tex/generic/hyphen/ushyphmax.tex
           Maximal hyphenation patterns for English, extended by Gerard Kuiken.

       http://www.ctan.org/tex-archive/language/
           Patterns and support for many other languages

SEE ALSO

       Frank Liang and Peter Breitenlohner, patgen.web.

       Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977, Stanford  University  Ph.D.
       thesis, 1983, http://tug.org/docs/liang.

       Donald E. Knuth, The TeXbook, Addison-Wesley, 1986, ISBN 0-201-13447-0, Appendix H.

AUTHORS

       Frank  Liang  wrote  the  first  version  of  this  program.   Peter  Breitenlohner made a
       substantial revision in 1991 for TeX 3.  The first version was published as  the  appendix
       to the TeXware technical report. Howard Trickey originally ported it to Unix.