Provided by: swath_0.6.0-2_amd64 

NAME
swath - General-purpose Thai word segmentation utility
SYNOPSIS
swath [options] < infile > outfile
DESCRIPTION
Thai script has no word delimiter. Applications need to recognize word boundaries before they can do
useful things with Thai text, such as line wrapping.
Swath provides a word analysis filter that inserts word delimiters into a given text stream. It reads text
from standard input, analyzes it for word boundaries by consulting a Thai word list, and writes the same
text to standard output with the predefined word delimiters inserted.
Currently, it can read plain text, HTML, RTF, LaTeX and Lambda (the Unicode version of LaTeX with the Omega
typesetter kernel) documents and inserts the common word delimiter for each format (pipe `|' for plain text).
The user can always override this with a preferred delimiter.
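As a small illustration of consuming the delimited output, the sketch below turns a pipe-delimited stream into one word per line. It assumes the default `|' delimiter for plain text described above; the function name split_words is hypothetical and not part of swath. A literal `|' already present in the input would also be split, so this is only suitable for text that does not contain that character.

```shell
#!/bin/sh
# split_words: read pipe-delimited text on standard input and print
# one word per line. Hypothetical helper, not shipped with swath.
split_words() {
    # '|' is a single byte in both TIS-620 and UTF-8, so a byte-wise
    # translation is safe for either encoding.
    tr '|' '\n'
}
```

It could be placed at the end of a pipeline, for example `swath < infile | split_words > words.txt`.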
OPTIONS
-b [delimiter]
Define the string to be inserted as the word delimiter in the output text.
-d [dict-path]
Specify an alternative dictionary location. dict-path must be either a directory containing the
swath dictionary file `swathdic.tri', or a path to the dictionary file itself. The dictionary
file must be a trie file prepared using the trietool(1) utility from the libdatrie package.
If this option is given, swath will override the normal dictionary search and will exit if the given
dictionary cannot be found. Otherwise, if the SWATHDICT environment variable is set, it will try to open
the dictionary from the location specified by its value. Otherwise, it will try the current working
directory, and finally the usual installed location.
-f [format]
Specify the format of the input. Possible formats are: html, rtf, latex, lambda.
-m [scheme]
Choose the word matching scheme used when analyzing word boundaries. Possible schemes are `long' (for
longest or greedy matching) and `max' (for maximal matching, with the fewest words preferred). Maximal
matching is the default.
-u input-enc,output-enc
Specify the encodings of the input and the output. input-enc and output-enc can each be either 'u' (for
UTF-8 encoding) or 't' (for TIS-620 encoding). Swath will convert the character encoding as
necessary. If this option is omitted, TIS-620 is assumed for both input and output.
-v, --verbose
Turn on verbose mode.
-help, --help
Show help.
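The option values listed above can be checked before swath is invoked. The following is a minimal sketch of such a check, not part of swath itself: the function name build_swath_args is hypothetical, and treating an empty format as "plain text, omit -f" is an assumption drawn from the fact that plain text is supported but not listed among the -f values.

```shell
#!/bin/sh
# build_swath_args FORMAT SCHEME IN_ENC OUT_ENC
# Hypothetical helper: prints a swath option string after checking each
# value against the values listed in this manual page.
# Pass '' as FORMAT to omit -f (assumed here to mean plain text input).
build_swath_args() {
    format=$1
    scheme=$2
    in_enc=$3
    out_enc=$4
    args=''

    case $format in
        '') ;;
        html|rtf|latex|lambda) args="-f $format" ;;
        *) echo "unknown format: $format" >&2; return 1 ;;
    esac

    case $scheme in
        long|max) args="$args -m $scheme" ;;
        *) echo "unknown scheme: $scheme" >&2; return 1 ;;
    esac

    for enc in "$in_enc" "$out_enc"; do
        case $enc in
            u|t) ;;
            *) echo "unknown encoding: $enc" >&2; return 1 ;;
        esac
    done
    args="$args -u $in_enc,$out_enc"

    # Drop the leading space left when -f was omitted.
    printf '%s\n' "${args# }"
}
```

The result is meant to be word-split by the shell, for example `swath $(build_swath_args latex long u t) < thaifile.tex > thaifile.ttex`.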
ENVIRONMENT VARIABLES
SWATHDICT
If specified, swath will search for the dictionary in this location before the usual places (the current
working directory and the usual installed directory, respectively). This value is overridden by the -d
option.
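The dictionary search order described for -d and SWATHDICT can be sketched as a shell function. This mirrors only the documented order and is not taken from the swath source. The names dict_at and resolve_swath_dict are hypothetical, the installed directory is passed in as a parameter because its actual path is not stated here, and it is an assumption that SWATHDICT, like -d, may name either a directory or a dictionary file and that the search continues when SWATHDICT yields nothing.

```shell
#!/bin/sh
# Illustrative only: follows the search order documented in this page.

DICT_NAME='swathdic.tri'

# dict_at LOCATION
# Print the dictionary file for LOCATION (a directory holding swathdic.tri,
# or a path to a dictionary file); return 1 if nothing usable is there.
dict_at() {
    if [ -d "$1" ] && [ -f "$1/$DICT_NAME" ]; then
        printf '%s\n' "$1/$DICT_NAME"
    elif [ -f "$1" ]; then
        printf '%s\n' "$1"
    else
        return 1
    fi
}

# resolve_swath_dict EXPLICIT_PATH INSTALLED_DIR
#   EXPLICIT_PATH: the value that would be given to -d ('' if -d is not used)
#   INSTALLED_DIR: hypothetical installed dictionary directory (an assumption)
resolve_swath_dict() {
    explicit=$1
    installed=$2

    if [ -n "$explicit" ]; then
        # -d overrides the normal search; a failure here is final.
        dict_at "$explicit"
        return
    fi

    # Assumption: fall through to the usual places if SWATHDICT has no dictionary.
    if [ -n "${SWATHDICT:-}" ] && dict_at "$SWATHDICT"; then
        return 0
    fi

    # Current working directory first, then the installed location.
    dict_at . || dict_at "$installed"
}
```

A call such as `resolve_swath_dict "" /some/installed/dir` prints the first dictionary found, or returns a non-zero status if none exists.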
EXAMPLES
For LaTeX (to be used with the babel-thai package):
$ swath -f latex < thaifile.tex > thaifile.ttex
$ latex thaifile.ttex
For HTML (to provide web pages to web browsers that cannot wrap Thai lines properly, but support the
<wbr> tag):
$ swath -f html < myweb.html > myweb-wbr.html
To preprocess a Thai UTF-8 encoded LaTeX file for babel-thai with tis620 inputenc:
$ swath -f latex -u u,t < thaifile.tex > thaifile.ttex
$ latex thaifile.ttex
This is equivalent to filtering with iconv(1):
$ iconv -f UTF-8 -t TIS-620 thaifile.tex | swath -f latex > thaifile.ttex
$ latex thaifile.ttex
To use the longest matching scheme with a LaTeX document:
$ swath -f latex -m long < thaifile.tex > thaifile.ttex
$ latex thaifile.ttex
To use an alternative dictionary from libthai:
$ swath -f latex -d /usr/share/libthai/thbrk.tri < thaifile.tex > thaifile.ttex
AUTHOR
This manual page was written by Theppitak Karoonboonyanan <theppitak@gmail.com>.
January 2008 SWATH(1)