Provided by: swath_0.5.2-1_amd64

NAME

       swath - General-purpose Thai word segmentation utility

SYNOPSIS

       swath [options] < infile > outfile

DESCRIPTION

       Thai script has no word delimiter. Applications need a Thai word list to recognize word
       boundaries before they can do useful things with Thai text, such as line wrapping.

       Swath provides a word analysis filter that inserts word delimiters into a text stream. It
       reads text from standard input, analyzes it for word boundaries by consulting a Thai word
       list, and writes the same text to standard output with the predefined word delimiters
       inserted.

       Currently, it can read plain text, HTML, RTF, LaTeX and Lambda (the Unicode version of LaTeX
       with the Omega typesetter kernel) documents, and it inserts the commonly used word delimiter
       for each format (pipe `|' for plain text). The user can always override this with a
       preferred delimiter.

OPTIONS

       -b [delimiter]
              Define a string to be used as the word delimiter code in the output text.

       -d [dict-path]
              Specify an alternative dictionary location. dict-path must be either a directory
              containing the swath dictionary file `swathdic.tri' or a path to the dictionary
              file itself. The dictionary file must be a trie file prepared using the
              trietool-0.2(1) utility from the libdatrie package.

              If this option is given, swath skips the normal dictionary search and exits on
              failure. Otherwise, it tries to open the dictionary from the location specified in
              the SWATHDICT environment variable if set, then from the current working directory,
              and finally from the usual installed location.

       -f [format]
              Specify the format of the input. Possible formats are: html, rtf, latex, lambda.

       -m [scheme]
              Choose the word matching scheme used when analyzing word boundaries. Possible
              schemes are `long' (for longest, or greedy, matching) and `max' (for maximal
              matching, which prefers the segmentation with the fewest words). Maximal matching
              is the default.

       -u input-enc,output-enc
              Specify the encodings of input and output. input-enc and output-enc can each be
              either 'u' (for UTF-8 encoding) or 't' (for TIS-620 encoding). Swath converts the
              character encoding as necessary. If omitted, TIS-620 encoding is assumed for both
              input and output.

       -v, --verbose
              Turn on verbose mode.

       -help, --help
              Show help.
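
       The two -m schemes described above can be illustrated with a minimal Python sketch. This
       is not swath's implementation: the word list is a tiny hypothetical dictionary, the
       function names are made up for illustration, and the fallback of emitting an unknown
       character on its own is an assumption.

```python
# Toy dictionary (hypothetical); swath itself reads a trie file.
WORDS = {"ไป", "หา", "หาม", "มเหสี", "เห", "สี"}
TEXT = "ไปหามเหสี"


def longest_match(text, words):
    """Greedy scheme: at each position take the longest dictionary word."""
    max_len = max(map(len, words))
    pieces, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in words:
                break
        else:
            n = 1  # assumption: an unknown character is emitted alone
        pieces.append(text[i:i + n])
        i += n
    return pieces


def maximal_match(text, words):
    """Maximal scheme: choose the segmentation with the fewest pieces."""
    max_len = max(map(len, words))
    # best[j] = (piece count, start index of the last piece) for text[:j]
    best = [(0, 0)] + [None] * len(text)
    for j in range(1, len(text) + 1):
        for i in range(max(0, j - max_len), j):
            known = text[i:j] in words
            if known or j - i == 1:  # unknown single characters are allowed
                count = best[i][0] + 1
                if best[j] is None or count < best[j][0]:
                    best[j] = (count, i)
    # Walk back from the end to recover the chosen pieces.
    pieces, j = [], len(text)
    while j > 0:
        i = best[j][1]
        pieces.append(text[i:j])
        j = i
    return pieces[::-1]


if __name__ == "__main__":
    # Join with the plain-text delimiter `|'.
    print("|".join(longest_match(TEXT, WORDS)))
    print("|".join(maximal_match(TEXT, WORDS)))
```

       With this toy word list the greedy scheme commits to the longer word early and ends up
       with four pieces, while the maximal scheme finds a three-piece segmentation of the same
       text.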

ENVIRONMENT VARIABLES

       SWATHDICT
              If set, swath searches for the dictionary in this location before the usual places
              (the current working directory, then the usual installed directory). This value is
              overridden by the -d option.
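
       The dictionary search order described for -d and SWATHDICT can be sketched in Python as
       follows. This is only an illustration of the documented order, not swath's code. The
       function names and the installed_dir default are hypothetical (this page does not name
       the installed location), and treating SWATHDICT with the same directory-or-file rule as
       -d is an assumption.

```python
import os

DICT_NAME = "swathdic.tri"  # dictionary file name given in this manual page


def as_dict_file(path):
    """Accept a directory containing the dictionary, or the file itself."""
    if os.path.isdir(path):
        path = os.path.join(path, DICT_NAME)
    return path if os.path.isfile(path) else None


def find_dictionary(dict_opt=None, environ=None, cwd=".",
                    installed_dir="/usr/share/swath"):  # hypothetical default
    """Return the dictionary path, following the documented search order."""
    environ = os.environ if environ is None else environ
    if dict_opt is not None:
        # -d overrides the normal search and fails hard if nothing is found.
        found = as_dict_file(dict_opt)
        if found is None:
            raise FileNotFoundError(dict_opt)
        return found
    # Normal search: SWATHDICT, then current directory, then installed location.
    for location in (environ.get("SWATHDICT"), cwd, installed_dir):
        if location:
            found = as_dict_file(location)
            if found is not None:
                return found
    return None
```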

EXAMPLES

       For LaTeX (to be used with thailatex package):

       $ swath -f latex < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       For  HTML  (to provide web pages to web browsers that cannot wrap Thai lines properly, but
       support the <wbr> tag):

       $ swath -f html < myweb.html > myweb-wbr.html

       To preprocess a Thai UTF-8 encoded LaTeX file  for  thailatex,  which  always  works  with
       TIS-620:

       $ swath -f latex -u u,t < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       This is equivalent to filtering with iconv(1):

       $ iconv -f UTF-8 -t TIS-620 thaifile.tex | swath -f latex > thaifile.ttex
       $ latex thaifile.ttex
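
       The re-encoding step that -u u,t and the iconv(1) pipeline both perform can be sketched
       in Python. The recode helper and the ENCODINGS table are hypothetical names used only for
       illustration; the sketch covers the character conversion alone, not word segmentation.

```python
# Map the -u letters from this page to Python codec names.
ENCODINGS = {"u": "utf-8", "t": "tis-620"}


def recode(data: bytes, input_enc: str, output_enc: str) -> bytes:
    """Convert bytes between the two encodings named by the -u letters."""
    text = data.decode(ENCODINGS[input_enc])
    return text.encode(ENCODINGS[output_enc])


if __name__ == "__main__":
    utf8_bytes = "ภาษาไทย".encode("utf-8")
    tis_bytes = recode(utf8_bytes, "u", "t")
    # TIS-620 uses one byte per Thai character; UTF-8 uses three.
    print(len(utf8_bytes), len(tis_bytes))
```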

       To use the longest matching scheme with a LaTeX document:

       $ swath -f latex -m long < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       To use an alternative dictionary from libthai:

       $ swath -f latex -d /usr/share/libthai/thbrk.tri < thaifile.tex > thaifile.ttex

AUTHOR

       This manual page was written by Theppitak Karoonboonyanan <thep@linux.thai.net>.

                                           January 2008                                  SWATH(1)