Provided by: swath_0.5.3-1_amd64

NAME

       swath - General-purpose Thai word segmentation utility

SYNOPSIS

       swath [options] < infile > outfile

DESCRIPTION

       Thai script has no word delimiter.  Applications need to recognize word boundaries before they can do
       useful things with Thai text, such as line wrapping.

       Swath provides a word analysis filter that inserts word delimiters into a given text stream.  It reads
       text from standard input, analyzes it for word boundaries by consulting a Thai word list, and writes
       the same text to standard output with the predefined word delimiters inserted.

       Currently, it can read plain text, HTML, RTF, LaTeX and Lambda (the Unicode version of LaTeX with the
       Omega typesetter kernel) documents, and it inserts the common word delimiter for each format (pipe `|'
       for plain text).  The user can always override this with a preferred delimiter.

OPTIONS

       -b [delimiter]
              Define a string to be used as the word delimiter code in the output text.

       -d [dict-path]
              Specify an alternative dictionary location.  dict-path must be either a directory containing the
              swath dictionary file `swathdic.tri', or a path to the dictionary file itself.  The dictionary
              file must be a trie file prepared using the trietool-0.2(1) utility from the libdatrie package.

              If this option is given, swath will override the normal dictionary search and will exit if it
              fails to find the given dictionary.  Otherwise, if the SWATHDICT environment variable is set, it
              will try to open the dictionary from the location specified by its value.  Otherwise, it will
              try the current working directory, and finally the usual installed location.

       -f [format]
              Specify the format of the input.  Possible formats are: html, rtf, latex, lambda.

       -m [scheme]
              Choose the word matching scheme used when analyzing word boundaries.  Possible schemes are
              `long' (for longest or greedy matching) and `max' (for maximal matching, with the fewest words
              preferred).  Maximal matching is the default.

       -u input-enc,output-enc
              Specify the encodings of input and output.  input-enc and output-enc can each be either 'u' (for
              UTF-8 encoding) or 't' (for TIS-620 encoding).  Swath will convert the character encoding as
              necessary.  If omitted, TIS-620 encoding is assumed for both input and output.

       -v, --verbose
              Turn on verbose mode.

       -help, --help
              Show help.
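
       The difference between the two -m schemes can be seen on a toy example.  The following bash sketch is
       not swath's implementation: it uses a hypothetical ASCII word list in place of the Thai dictionary and
       textbook versions of greedy longest matching and fewest-words matching.  The names DICT, in_dict,
       segment_long and segment_max are illustrative only.

```shell
#!/usr/bin/env bash
# Toy illustration only; swath's own algorithms and dictionary are not used.
# Hypothetical word list standing in for the Thai dictionary.
DICT=" ab abc cde d e "

# Succeeds if the argument appears as a whole word in DICT.
in_dict() { [[ "$DICT" == *" $1 "* ]]; }

# Greedy ("long"-style): at each position take the longest dictionary word.
segment_long() {
    local text=$1 out="" pos=0 len n=${#1}
    while (( pos < n )); do
        for (( len = n - pos; len > 1; len-- )); do
            in_dict "${text:pos:len}" && break
        done
        # len ends at 1 when nothing longer matches, so the loop always advances.
        out+="${out:+|}${text:pos:len}"
        (( pos += len ))
    done
    printf '%s\n' "$out"
}

# Fewest words ("max"-style): dynamic programming over prefixes of the text.
segment_max() {
    local text=$1 n=${#1} i j w
    local -a best seg   # best[i]: fewest words covering the first i characters
    best[0]=0; seg[0]=""
    for (( i = 1; i <= n; i++ )); do
        # Fallback: treat the last character as a word of its own.
        best[i]=$(( best[i-1] + 1 ))
        seg[i]="${seg[i-1]:+${seg[i-1]}|}${text:i-1:1}"
        for (( j = 0; j < i - 1; j++ )); do
            w=${text:j:i-j}
            if in_dict "$w" && (( best[j] + 1 < best[i] )); then
                best[i]=$(( best[j] + 1 ))
                seg[i]="${seg[j]:+${seg[j]}|}$w"
            fi
        done
    done
    printf '%s\n' "${seg[n]}"
}

segment_long abcde   # prints abc|d|e
segment_max abcde    # prints ab|cde
```

       In this toy case the greedy scheme commits to `abc' and is left with two single-character words, while
       the fewest-words scheme accepts the shorter `ab' so that `cde' can follow.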

ENVIRONMENT VARIABLES

       SWATHDICT
              If specified, swath will search for the dictionary in this location before the usual places
              (the current working directory and the usual installed directory, in that order).  This value
              is overridden by the -d option.
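
       The dictionary lookup order described for -d and SWATHDICT can be sketched as a small shell function.
       This is a reading of the text above, not swath's source code.  DEFAULT_DICT_DIR is a hypothetical
       stand-in for the installed location, and two points are assumptions: that SWATHDICT may name either a
       directory or a file, and that the search falls through to the usual places when nothing is found
       there.

```shell
#!/bin/sh
# Sketch of the documented lookup order; not taken from swath itself.
# DEFAULT_DICT_DIR is a hypothetical placeholder for the installed location.
DEFAULT_DICT_DIR="${DEFAULT_DICT_DIR:-/usr/share/swath}"

# Print the dictionary file for a location that is either the file itself
# or a directory containing swathdic.tri; fail if neither exists.
dict_candidate() {
    if [ -f "$1" ]; then
        printf '%s\n' "$1"
    elif [ -f "$1/swathdic.tri" ]; then
        printf '%s\n' "$1/swathdic.tri"
    else
        return 1
    fi
}

# resolve_dict [dict-path]: print the dictionary that would be chosen.
resolve_dict() {
    if [ -n "${1:-}" ]; then
        # A -d argument overrides the normal search: no fallback on failure.
        dict_candidate "$1"
        return
    fi
    # Assumption: an unusable SWATHDICT falls through to the usual places.
    if [ -n "${SWATHDICT:-}" ] && dict_candidate "$SWATHDICT"; then
        return 0
    fi
    # Current working directory first, then the installed location.
    dict_candidate . || dict_candidate "$DEFAULT_DICT_DIR"
}
```

       Calling resolve_dict with a path models the -d case, and calling it with no argument models the search
       through SWATHDICT, the current working directory and the installed location.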

EXAMPLES

       For LaTeX (to be used with the babel-thai package):

       $ swath -f latex < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       For  HTML  (to  provide  web  pages to web browsers that cannot wrap Thai lines properly, but support the
       <wbr> tag):

       $ swath -f html < myweb.html > myweb-wbr.html

       To preprocess a Thai UTF-8 encoded LaTeX file for babel-thai with tis620 inputenc:

       $ swath -f latex -u u,t < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       This is equivalent to filtering with iconv(1):

       $ iconv -f UTF-8 -t TIS-620 thaifile.tex | swath -f latex > thaifile.ttex
       $ latex thaifile.ttex

       To use the longest matching scheme with a LaTeX document:

       $ swath -f latex -m long < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       To use an alternative dictionary from libthai:

       $ swath -f latex -d /usr/share/libthai/thbrk.tri < thaifile.tex > thaifile.ttex

AUTHOR

       This manual page was written by Theppitak Karoonboonyanan <theppitak@gmail.com>.

                                                  January 2008                                          SWATH(1)