
Provided by: swath_0.6.1-2build1_amd64

NAME

       swath - General-purpose Thai word segmentation utility

SYNOPSIS

       swath [options] < infile > outfile

DESCRIPTION

       Thai script does not use word delimiters.  Applications need to recognize word boundaries
       before they can do useful things with Thai text, such as line wrapping.

       Swath provides a word analysis filter that inserts word delimiters into a given text
       stream.  It reads text from standard input, analyzes it for word boundaries by consulting
       a Thai word list, and writes the same text to standard output with the predefined word
       delimiters inserted.

       Currently, it can read plain text, HTML, RTF, LaTeX and Lambda (Unicode version of LaTeX
       with Omega typesetter kernel) documents and insert the common word delimiter for each
       format (pipe `|' for plain text).  The user can always override this with a preferred
       delimiter (see the -b option).
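
       Because the default plain-text delimiter is a single pipe character, the output is easy
       to post-process with standard tools.  The following is a minimal sketch, not part of
       swath: the helper name split_words is hypothetical, and it assumes the input contains no
       literal `|' characters of its own.

```shell
# split_words: read pipe-delimited text on standard input and print one
# word per line, dropping the empty lines produced by adjacent or
# trailing delimiters.  (Hypothetical helper, not provided by swath.)
split_words() {
    tr '|' '\n' | sed '/^$/d'
}

# Typical use, assuming swath is installed and thaifile.txt is UTF-8:
#   swath -u u,u < thaifile.txt | split_words
```

       The pipe is a plain ASCII byte and never occurs inside a TIS-620 or UTF-8 Thai
       character, so the byte-wise tr(1) call is safe for both encodings.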

OPTIONS

       -b [delimiter]
              Define the string to be inserted as the word delimiter in the output text.

       -d [dict-path]
              Specify  alternative  dictionary  location.   dict-path  must be either a directory
              containing the swath dictionary file `swathdic.tri', or a path  to  the  dictionary
              file  itself.   The  dictionary file must be a trie file prepared using trietool(1)
              utility from the libdatrie package.

              If this option is given, swath skips the normal dictionary search and exits if
              the given dictionary cannot be found.  Otherwise, if the SWATHDICT environment
              variable is set, it tries to open the dictionary from the location specified by
              its value.  Otherwise, it tries the current working directory, and finally the
              usual installed location.

       -f [format]
              Specify the format of the input.  Possible formats are: html, rtf, latex, lambda.

       -m [scheme]
              Choose the word matching scheme used when analyzing word boundaries.  Possible
              schemes are `long' (for longest or greedy matching) and `max' (for maximal
              matching, which prefers the segmentation with the fewest words).  Maximal matching
              is the default.

       -u input-enc,output-enc
              Specify the encodings of the input and the output.  input-enc and output-enc can
              each be either 'u' (for UTF-8 encoding) or 't' (for TIS-620 encoding).  Swath will
              convert the character encoding as necessary.  If this option is omitted, TIS-620
              is assumed for both input and output.

       -v, --verbose
              Turn on verbose mode.

       -help, --help
              Show help.

ENVIRONMENT VARIABLES

       SWATHDICT
              If specified, swath will search for the dictionary at this location before the
              usual places (current working directory and usual installed directory,
              respectively).  This value is overridden by the -d option.
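
       The search order described above can be sketched in shell.  This is an illustration
       only, not swath's own code.  The names dict_at, resolve_dict and INSTALLED_DIR are
       hypothetical, the default value of INSTALLED_DIR is a placeholder (the real installed
       location depends on the build), and the sketch assumes that SWATHDICT, like -d, may name
       either a directory or the dictionary file, and that the search falls through to the
       usual places when nothing is found there.

```shell
# Placeholder for the usual installed location (assumption, not taken
# from swath; adjust to the actual installation).
INSTALLED_DIR=${INSTALLED_DIR:-/usr/share/swath}

# dict_at LOCATION: print the dictionary file for LOCATION, which may be
# a directory containing swathdic.tri or a path to the file itself.
dict_at() {
    if [ -d "$1" ] && [ -f "$1/swathdic.tri" ]; then
        printf '%s\n' "$1/swathdic.tri"
    elif [ -f "$1" ]; then
        printf '%s\n' "$1"
    else
        return 1
    fi
}

# resolve_dict [DICT-PATH]: print the dictionary that would be chosen.
# DICT-PATH plays the role of the -d argument.
resolve_dict() {
    if [ -n "${1:-}" ]; then
        # -d given: no fallback, fail if the dictionary is not there.
        dict_at "$1"
        return
    fi
    if [ -n "${SWATHDICT:-}" ]; then
        dict_at "$SWATHDICT" && return 0
    fi
    dict_at . && return 0          # current working directory
    dict_at "$INSTALLED_DIR"       # usual installed location
}
```

       For example, `resolve_dict /usr/share/libthai/thbrk.tri' prints that path if the file
       exists and fails otherwise, without consulting SWATHDICT.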

EXAMPLES

       For LaTeX (to be used with the babel-thai package):

       $ swath -f latex < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       For  HTML  (to provide web pages to web browsers that cannot wrap Thai lines properly, but
       support the <wbr> tag):

       $ swath -f html < myweb.html > myweb-wbr.html
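
       To apply the same command to every HTML file in a directory, a loop along the following
       lines may help.  It is a sketch under assumptions: the function name wbr_dir and the
       `-wbr.html' output naming are illustrative, and the files are taken to be TIS-620
       encoded, which is what swath assumes when -u is omitted (add `-u u,u' for UTF-8 pages).

```shell
# wbr_dir DIR: for each NAME.html in DIR, write NAME-wbr.html beside it.
# (Hypothetical helper, not provided by swath.)
wbr_dir() {
    for f in "$1"/*.html; do
        [ -f "$f" ] || continue      # pattern matched nothing
        case $f in
            *-wbr.html) continue ;;  # skip output of an earlier run
        esac
        swath -f html < "$f" > "${f%.html}-wbr.html" || return 1
    done
}
```

       The originals are left untouched, so the loop can be rerun after the pages change.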

       To preprocess a Thai UTF-8 encoded LaTeX file for babel-thai with tis620 inputenc:

       $ swath -f latex -u u,t < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       This is equivalent to filtering with iconv(1):

       $ iconv -f UTF-8 -t TIS-620 thaifile.tex | swath -f latex > thaifile.ttex
       $ latex thaifile.ttex
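
       When the encoding of an input file is not known in advance, the input-enc part of -u can
       be guessed with iconv(1).  The sketch below is a heuristic, not a feature of swath: the
       names pick_enc and segment_to_utf8 are hypothetical, and any file that is not valid
       UTF-8 is simply assumed to be TIS-620.

```shell
# pick_enc FILE: print "u" if FILE is valid UTF-8, otherwise "t".
# Assumption: a file that is not valid UTF-8 is TIS-620.
pick_enc() {
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        printf 'u\n'
    else
        printf 't\n'
    fi
}

# segment_to_utf8 INFILE OUTFILE: segment INFILE with the guessed input
# encoding and always write UTF-8 output.
segment_to_utf8() {
    swath -u "$(pick_enc "$1"),u" < "$1" > "$2"
}
```

       A pure ASCII file is reported as UTF-8, which is harmless because ASCII is a subset of
       both encodings.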

       To use the longest matching scheme with a LaTeX document:

       $ swath -f latex -m long < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex
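
       To see where the two matching schemes disagree on a given text, both outputs can be
       compared with diff(1).  This is a minimal sketch; the function name compare_schemes is
       hypothetical, and the input is assumed to be plain text in the default TIS-620 encoding
       (add -f and -u to the swath calls as needed).

```shell
# compare_schemes FILE: segment FILE with `long' and `max' matching and
# print the differing lines.  Returns the exit status of diff: 0 when
# both schemes agree, 1 when they differ.
compare_schemes() {
    tmpdir=$(mktemp -d) || return 2
    swath -m long < "$1" > "$tmpdir/long.txt"
    swath -m max  < "$1" > "$tmpdir/max.txt"
    diff "$tmpdir/long.txt" "$tmpdir/max.txt"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

       Lines prefixed with `<' come from longest matching and lines prefixed with `>' from
       maximal matching.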

       To use an alternative dictionary from libthai:

       $ swath -f latex -d /usr/share/libthai/thbrk.tri < thaifile.tex > thaifile.ttex

AUTHOR

       This manual page was written by Theppitak Karoonboonyanan <theppitak@gmail.com>.

                                           January 2008                                  SWATH(1)