
Provided by: swath_0.5.2-1_amd64

NAME

       swath - General-purpose Thai word segmentation utility

SYNOPSIS

       swath [options] < infile > outfile

DESCRIPTION

       Thai script has no word delimiter. Applications need a Thai word list to recognize word boundaries
       before they can do useful things with Thai text, such as line wrapping.

       Swath provides a word analysis filter that inserts word delimiters into a text stream. It reads text
       from standard input, analyzes it for word boundaries by consulting a Thai word list, and writes the
       same text to standard output with the predefined word delimiters inserted.

       Currently, it can read plain text, HTML, RTF, LaTeX and Lambda (the Unicode version of LaTeX, built on
       the Omega typesetter kernel) documents, and it inserts the commonly used word delimiter for each
       format (a pipe `|' for plain text). The user can always override this with a preferred delimiter.

OPTIONS

       -b [delimiter]
              Define a string to be used as the word delimiter in the output text.
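
              As a rough sketch of how -b combines with the plain-text default, the wrapper below pipes
              UTF-8 text through swath with a caller-chosen delimiter. The function name segment_text is
              hypothetical, and the block assumes swath and its dictionary are installed; when the binary
              is missing it only reports that and returns 127.

```shell
# Hypothetical helper: segment UTF-8 plain text from stdin, using "$1" as
# the word delimiter. Only the -b and -u options from this page are used.
segment_text() {
    if ! command -v swath >/dev/null 2>&1; then
        echo "swath is not installed" >&2
        return 127
    fi
    swath -u u,u -b "$1"
}

# Example call: separate words with " / " instead of the default pipe.
printf 'ภาษาไทย\n' | segment_text ' / ' || true
```

              No -f option is passed because the input here is plain text rather than one of the listed
              markup formats.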

       -d [dict-path]
              Specify an alternative dictionary location. dict-path must be either a directory containing the
              swath dictionary file `swathdic.tri' or a path to the dictionary file itself. The dictionary
              file must be a trie file prepared using the trietool-0.2(1) utility from the libdatrie package.

              If this option is given, swath skips the normal dictionary search and exits on failure.
              Otherwise, it tries to open the dictionary from the location specified in the SWATHDICT
              environment variable if set, then in the current working directory, and finally in the usual
              installed location.
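
              The search order described above can be sketched as a small shell function. This is only an
              illustration of the documented order, not swath's own code: the names dict_in and find_dict
              are hypothetical, the installed directory /usr/share/swath is an assumption, and treating
              SWATHDICT like -d (directory or file) is also an assumption.

```shell
DICT_FILE=swathdic.tri
# Assumed install location; the real one depends on how swath was built.
INSTALLED_DICT_DIR=${INSTALLED_DICT_DIR:-/usr/share/swath}

# Print the dictionary path if "$1" is a directory holding swathdic.tri
# or is itself an existing file; fail otherwise.
dict_in() {
    if [ -d "$1" ] && [ -f "$1/$DICT_FILE" ]; then
        printf '%s\n' "$1/$DICT_FILE"
    elif [ -f "$1" ]; then
        printf '%s\n' "$1"
    else
        return 1
    fi
}

# $1 is the value given to -d, or empty when -d was not given.
find_dict() {
    if [ -n "${1:-}" ]; then
        dict_in "$1"        # -d given: no fallback, fail if not found
        return
    fi
    if [ -n "${SWATHDICT:-}" ] && dict_in "$SWATHDICT"; then
        return 0
    fi
    if dict_in .; then
        return 0
    fi
    dict_in "$INSTALLED_DICT_DIR"
}

find_dict "" || echo "no dictionary found" >&2
```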

       -f [format]
              Specify the format of the input. Possible formats are: html, rtf, latex, lambda.

       -m [scheme]
              Choose the word matching scheme used when analyzing word boundaries. Possible schemes are
              `long' (longest, or greedy, matching) and `max' (maximal matching, which prefers the fewest
              words). Maximal matching is the default.
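
              To illustrate how the two schemes can disagree, here is a toy sketch on ASCII letters with a
              made-up five-word dictionary. It is not swath's implementation (which works on a Thai trie
              dictionary and handles unknown words); longest_match, max_match and DICT are hypothetical
              names, and the maximal matcher is a simplified "fewest words" search.

```shell
# Toy dictionary of ASCII "words" (an assumption for illustration only).
DICT="ab abc cde d e"

# Longest (greedy) matching: at each position take the longest dictionary
# word that is a prefix of the remaining text.
longest_match() (
    text=$1
    out=
    while [ -n "$text" ]; do
        best=
        for w in $DICT; do
            case $text in
                "$w"*) if [ "${#w}" -gt "${#best}" ]; then best=$w; fi ;;
            esac
        done
        # No dictionary word fits: emit one character as its own token.
        if [ -z "$best" ]; then best=${text%"${text#?}"}; fi
        out="${out:+$out|}$best"
        text=${text#"$best"}
    done
    printf '%s\n' "$out"
)

# Simplified maximal matching: try every dictionary word that is a prefix,
# recurse on the rest, and keep the split with the fewest words.
# Prints "<count> <segmentation>"; fails if the text cannot be covered.
max_match() (
    text=$1
    if [ -z "$text" ]; then
        printf '0 \n'
        exit 0
    fi
    best_n=
    best_seg=
    for w in $DICT; do
        case $text in
            "$w"*)
                rest=$(max_match "${text#"$w"}") || continue
                n=$(( ${rest%% *} + 1 ))
                seg=${rest#* }
                if [ -z "$best_n" ] || [ "$n" -lt "$best_n" ]; then
                    best_n=$n
                    best_seg="$w${seg:+|$seg}"
                fi
                ;;
        esac
    done
    [ -n "$best_n" ] || exit 1
    printf '%s %s\n' "$best_n" "$best_seg"
)

longest_match abcde   # prints: abc|d|e
max_match abcde       # prints: 2 ab|cde
```

              Greedy matching commits to `abc' first and is left with two one-letter words, while the
              fewest-words search prefers the two-word split.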

       -u input-enc,output-enc
              Specify the encodings of input and output. input-enc and output-enc can each be either 'u'
              (UTF-8) or 't' (TIS-620). Swath will convert the character encoding as necessary. If omitted,
              TIS-620 is assumed for both input and output.
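
              When the encoding of a file is not known in advance, one possible heuristic is to check
              whether it is valid UTF-8 with iconv(1) and pick the input-enc letter accordingly. The
              function name guess_swath_enc is hypothetical, and the check is only a guess: a file that
              fails UTF-8 validation is assumed to be TIS-620, and a pure ASCII file is valid either way.

```shell
# Hypothetical helper: print 'u' if "$1" decodes cleanly as UTF-8,
# otherwise print 't' (assumed TIS-620).
guess_swath_enc() {
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        echo u
    else
        echo t
    fi
}

# Possible use, keeping the output in UTF-8:
#   swath -u "$(guess_swath_enc infile),u" < infile > outfile
```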

       -v, --verbose
              Turn on verbose mode.

       -help, --help
              Show help.

ENVIRONMENT VARIABLES

       SWATHDICT
              If set, swath searches for the dictionary in this location before the usual places (the
              current working directory, then the usual installed directory). This value is overridden by
              the -d option.

EXAMPLES

       For LaTeX (to be used with the thailatex package):

       $ swath -f latex < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       For HTML (to provide web pages to web browsers that cannot wrap Thai  lines  properly,  but  support  the
       <wbr> tag):

       $ swath -f html < myweb.html > myweb-wbr.html

       To preprocess a Thai UTF-8 encoded LaTeX file for thailatex, which always works with TIS-620:

       $ swath -f latex -u u,t < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       This is equivalent to filtering with iconv(1):

       $ iconv -f UTF-8 -t TIS-620 thaifile.tex | swath -f latex > thaifile.ttex
       $ latex thaifile.ttex

       To use the longest matching scheme with a LaTeX document:

       $ swath -f latex -m long < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       To use an alternative dictionary from libthai:

       $ swath -f latex -d /usr/share/libthai/thbrk.tri < thaifile.tex > thaifile.ttex

AUTHOR

       This manual page was written by Theppitak Karoonboonyanan <thep@linux.thai.net>.

                                                  January 2008                                          SWATH(1)