Ubuntu Manpage: dictfmt - formats a DICT protocol dictionary database

Provided by: dictfmt_1.12.1+dfsg-2_amd64

NAME

       dictfmt - formats a DICT protocol dictionary database

SYNOPSIS

       dictfmt  -c5|-t|-e|-f|-h|-j|-p [options]  basename
       dictfmt  -i|-I [options]

DESCRIPTION

dictfmt reads a file, FILE, on stdin and creates a dictionary database named
basename.dict that conforms to the DICT protocol. It also creates an index file named
basename.index. By default, the index is sorted according to the C locale, and only
alphanumeric characters and spaces are used in sorting; this can be changed with
the --locale and --allchars options. (basename is commonly chosen to correspond to the
basename of FILE, but this is not mandatory.)

Unless the database is extremely small, it is highly recommended that basename.dict be
compressed with /usr/bin/dictzip to create basename.dict.dz. (dictzip is included in the
dictd source package.)

FILE may be in any of the several formats described by the format options -c5, -t, -e, -f,
-h, -j, -p, -i or -I. Exactly one of these options must be given.

dictfmt prepends several headers to the .dict file. The 00-database-url header gives
the value of the -u option as the URL of the site from which the original database was
obtained. The 00-database-short header gives the value of the -s option as the short name
of the dictionary. (This "short name" is the identifying name listed by the "dict -D"
option.) If the -u and/or -s options are omitted, these values are shown as
"unknown", which is undesirable for a publicly distributed database.

The date of conversion (formatting) is given in the 00-database-info header. All text in
the input file prior to the first headword (as defined by the appropriate formatting
option) is appended to this header. All text in the input file following a headword, up
to the next headword, is copied unchanged to the .dict file.
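
A typical run might look like the sketch below. The file name, URL, and short name are placeholders invented for illustration, and the sample input uses the -j layout described under FORMATTING OPTIONS. The dictfmt and dictzip steps run only if those programs are installed.

```shell
# Sketch only: file names, the URL, and the short name are placeholders.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Tiny input in the -j layout: text before the first headword goes to the
# headers; each headword starts in column 0 and is enclosed in colons.
cat > sample.txt <<'EOF'
Sample word list, for illustration only.
:autumn: The season between summer and winter.
:winter: The coldest season of the year.
EOF

if command -v dictfmt >/dev/null 2>&1; then
    # FILE arrives on stdin; sample.dict and sample.index are written.
    dictfmt -j -u "http://example.org/sample" -s "Sample Word List" sample \
        < sample.txt || echo "dictfmt reported an error" >&2
    # Compress the database as recommended; this produces sample.dict.dz.
    if command -v dictzip >/dev/null 2>&1 && [ -f sample.dict ]; then
        dictzip sample.dict || echo "dictzip reported an error" >&2
    fi
fi
```

Passing both -u and -s keeps the 00-database-url and 00-database-short headers from being shown as "unknown".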

FORMATTING OPTIONS

-c5 FILE is formatted with headwords preceded by 5 or more underscore characters (_)
and a blank line. All text until the next headword is considered the definition.
Any leading `@' characters are stripped out, but the file is otherwise unchanged.
This option was written to format the CIA WORLD FACTBOOK 1995.

-t Implies the -c5, --without-info and --without-headword options. Use this option
if the input database comes from the dictunformat utility.

-e FILE is in html format, with the headword tagged as bold. (<B>headword - </B>)
This option was written to format EASTON'S 1897 BIBLE DICTIONARY. A typical entry
from Easton is:

<A NAME="T0000005">
<B>Abagtha - </B>
one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).

This is converted to:
Abagtha
one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).

The heading <A NAME="T0000005"> is omitted, and the headword `Abagtha' is indexed.

NOTE: This option should be used with caution. It removes several html tags
(enough to format Easton properly), but not all. The Makefile that was originally
written to format dict-easton uses sed scripts to modify certain cross reference
tags. It may be necessary to pipe the input file through a sed script, or hack the
source of dictfmt in order to properly format other html databases.

-f FILE is formatted with the headwords starting in column 0, with the definition
indented at least one space (or tab character) on subsequent lines. The third line
starting in column 0 is taken as the first headword, and the first two lines
starting in column 0 are treated as part of the 00-database-info header. This
option was written to format the F.O.L.D.O.C.
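
The -f layout can be pictured with a small made-up input. In this sketch the file name and its contents are invented for illustration; the grep/sed line only previews which line would be taken as the first headword according to the rule above, and dictfmt is invoked only if it is installed.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Two column-0 lines for the 00-database-info header, then headwords in
# column 0 with indented definitions.
cat > wordlist.txt <<'EOF'
Sample Word List
Compiled for illustration only
autumn
   The season between summer and winter.
winter
   The coldest season of the year.
EOF

# Lines that do not begin with a blank or tab start in column 0; the third
# such line is the first headword. Prints: autumn
grep -v '^[[:space:]]' wordlist.txt | sed -n '3p'

if command -v dictfmt >/dev/null 2>&1; then
    dictfmt -f -s "Sample Word List" wordlist < wordlist.txt \
        || echo "dictfmt reported an error" >&2
fi
```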

-h FILE is formatted with the headwords starting in column 0, followed by a comma,
with the definition continuing on the same line. All text before the first single
character line is included in 00-database-info header, and lines with only one
character are omitted from the .dict file. The first headword is on the line
following the first single character line. The headword is indexed; the text of
the file is not changed. This option was written to format HITCHCOCK'S BIBLE NAMES
DICTIONARY.

-j FILE is formatted with headwords starting in column 0, enclosed in colons, followed by
the definition. The colons surrounding the headword are removed, and the headword
is indexed. Lines beginning with '*', '=', or '-' are also removed. All text
before the first headword is included in the headers. This option was written to
format the JARGON FILE.
NOTE: Some recent versions of the JARGON FILE had three blanks inserted before the
first colon at each headword. These must be removed before processing with
dictfmt. (sed scripts have been used for this purpose. ed, awk, or perl scripts
are also possible.)
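
A sed filter for this clean-up could be as small as the following sketch. It assumes the affected headword lines begin with exactly three spaces followed by a colon and that no other line has that shape; the function name is made up for the example.

```shell
# Remove exactly three leading blanks in front of a headword's first colon.
# All other lines pass through unchanged.
strip_headword_indent() {
    sed 's/^   :/:/'
}

# Prints ":foo: A sample entry." followed by the untouched second line.
printf '   :foo: A sample entry.\n   continuation text\n' | strip_headword_indent
```

The filter can be placed in a pipeline in front of dictfmt -j, since dictfmt reads FILE on stdin.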

-p FILE is formatted with '%h' in column 0, followed by a blank, followed by the
headword, optionally followed by a line containing '%d' in column 0. The
definition starts on the following line. The first line beginning '%h' and any
lines beginning '%d' are stripped from the .dict file, and '%h ' is stripped from
in front of the headword. All text before the first headword is included in the
headers. The second line beginning '%h' is taken as the first headword.
This option was written to format Jay Kominek's elements database.
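
As a rough illustration of the -p layout, the sketch below writes a made-up input file (its name and contents are invented, not taken from the elements database) and previews the line that would be taken as the first headword. dictfmt is invoked only if it is installed.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

cat > elements.txt <<'EOF'
%h sample elements list
Text here precedes the first headword and goes to the headers.
%h hydrogen
%d
Symbol H, atomic number 1.
%h helium
%d
Symbol He, atomic number 2.
EOF

# The second line beginning '%h' holds the first headword.
# Prints: %h hydrogen
grep '^%h' elements.txt | sed -n '2p'

if command -v dictfmt >/dev/null 2>&1; then
    dictfmt -p -s "Sample Elements" elements < elements.txt \
        || echo "dictfmt reported an error" >&2
fi
```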

-i -I These two options differ from all other formatting options. They
re-sort an .index file given on stdin into the order dictd requires. No .dict file
is generated; only the index is re-sorted. The input is expected to look like an
.index file with three or four columns. -i expects the offset and length in decimal,
while -I expects them in base64 format.
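
The relationship between the two number forms can be illustrated by decoding one into the other. This sketch assumes the base64 number encoding used in dictd index files: digits drawn from A-Z, a-z, 0-9, + and /, most significant digit first, with no padding. The function name b64_to_dec is made up for the example.

```shell
# Decode a dictd-style base64 number to decimal.
# Assumption: digits are A-Z a-z 0-9 + /, most significant first, no padding.
b64_to_dec() {
    alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    rest=$1
    value=0
    while [ -n "$rest" ]; do
        digit=${rest%"${rest#?}"}        # first character of $rest
        rest=${rest#?}                   # drop that character
        case $alphabet in
            *"$digit"*) ;;
            *) echo "invalid base64 digit: $digit" >&2; return 1 ;;
        esac
        before=${alphabet%%"$digit"*}    # alphabet text preceding the digit
        value=$(( value * 64 + ${#before} ))
    done
    echo "$value"
}

b64_to_dec A     # prints 0
b64_to_dec B2    # prints 118
b64_to_dec BA    # prints 64
```

Under this assumption, an offset written as 118 in input for -i would appear as B2 in input for -I.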

OPTIONS

-u url Specifies the URL of the site from which the raw database was obtained. If this
option is specified, any 00-database-url headword in the input and its definition are
ignored.

-s name
Specifies the name and, optionally, the version and date, of the database. (If
this contains spaces, it must be quoted.) If this option is specified, any
00-database-short headword in the input and its definition are ignored.

-L display license and copyright information

-V display version information

-D output debugging information

--help display a help message

--locale locale
Specifies the locale used for sorting. If no locale is specified, the "C" locale
is used. UTF-8 mode also requires the --utf8 option.

--8bit Generates the database in 8-bit mode; see also the --locale option.
Note: This option is deprecated. Use it only for creating 8-bit (non-UTF-8) dictionaries.
To create a UTF-8 dictionary, use the --utf8 option instead.

--utf8 If specified, a UTF-8 database is created.

--allchars
Specifies that all characters should be used for the search. By default, only
alphabetic characters, numeric characters and spaces are put into the .index file
and are therefore used in searches. Creates the special entry 00-database-allchars.

--case-sensitive
Makes the search case sensitive. Creates the special entry
00-database-case-sensitive.

--headword-separator sep
Sets the headword separator, which allows several words to have the same
definition. For example, if '--headword-separator %%%' is given, and the input
file contains 'autumn%%%fall', both 'autumn' and 'fall' will be indexed as
headwords, with the same definition.
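
The effect can be previewed without dictfmt. In this sketch the input file, its -j layout, and the helper name preview_headwords are choices made for illustration; the helper only mimics the split described above and is not dictfmt's own code. dictfmt is invoked only if it is installed.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

cat > seasons.txt <<'EOF'
Sample word list, for illustration only.
:autumn%%%fall: The season between summer and winter.
EOF

# Print each headword that the separator yields, one per line.
# $1 = input file in -j layout, $2 = separator. awk treats a separator
# longer than one character as a regular expression; '%%%' has no
# special characters.
preview_headwords() {
    sed -n 's/^:\([^:]*\):.*/\1/p' "$1" |
        awk -v sep="$2" '{ n = split($0, w, sep); for (i = 1; i <= n; i++) print w[i] }'
}

preview_headwords seasons.txt '%%%'    # prints autumn, then fall

if command -v dictfmt >/dev/null 2>&1; then
    dictfmt -j --headword-separator '%%%' -s "Seasons" seasons < seasons.txt \
        || echo "dictfmt reported an error" >&2
fi
```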

--index-data-separator sep
Sets the index/data separator, which allows the first and fourth columns
of the .index file to be set independently. The first column is treated as an index
column (which the MATCH command searches) and the fourth column as a result column
(from which MATCH takes the text it returns); the two columns are
completely independent of each other. The default value for this separator is
the ASCII character "\034".

--break-headwords
Multiple headwords will be written on separate lines in the .dict file. For use
with --headword-separator.

--index-keep-orig
When --utf8 is specified, headwords are lowercased and non-alphanumeric characters
are removed from them before they are saved to the .index file, in order to simplify
searching. When the --index-keep-orig option is used, a fourth column is created (if
necessary) in the .index file; it contains the original headword, which is returned
by the MATCH command. This option may be useful to prevent converting "AT&T" to "ATT"
or to keep proper nouns with an uppercased first letter.
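
The kind of normalization described here can be approximated in a few lines. This is a rough, ASCII-only sketch of the behavior as described above, not dictfmt's actual code; the function name is made up for the example.

```shell
# Approximate the index normalization: keep only letters, digits and
# spaces, then lowercase. ASCII-only; real UTF-8 handling is not modeled.
normalize_headword() {
    printf '%s\n' "$1" |
        LC_ALL=C tr -cd '[:alnum:] \n' |
        LC_ALL=C tr '[:upper:]' '[:lower:]'
}

normalize_headword 'AT&T'        # prints att
normalize_headword 'New York'    # prints new york
```

With --index-keep-orig, the original spelling ("AT&T", "New York") is what the fourth column preserves for MATCH results.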

--without-headword
headwords will not be included in .dict file

--without-header
header will not be copied to DB info entry

--without-url
URL will not be copied to DB info entry

--without-time
time of creation will not be copied to DB info entry

--without-ver
By default dictfmt creates a special entry 00-database-dictfmt-X.Y.Z that contains
(in .dict file) dictfmt version in format dictfmt-X.Y.Z. This option suppresses
this.

--without-info
DB info entry will not be created. This may be useful if 00-database-info headword
is expected from stdin (dictunformat outputs it).

--columns columns
By default, dictfmt wraps strings read from stdin to 72 columns. This option
changes this default. If it is set to zero or a negative value, wrapping is off.

--default-strategy strategy
Sets the default search strategy for the database. It will be used instead of
strategy '.'. The special entry 00-database-default-strategy is created for this
purpose. This option may be useful, for example, for dictionaries containing
mainly phrases rather than single words. Use this option only if you are
absolutely sure of what you are doing.

--mime-header mime_header
When a client sends the OPTION MIME command to dictd, definitions found in this
database are preceded by the specified MIME header. Creates the special entry
00-database-mime-header.

CREDITS

       dictfmt  was  written  by  Rik  Faith (faith@cs.unc.edu) as part of the dict-misc package.
       dictfmt is distributed under the terms of the GNU General Public License.  If you need  to
       distribute under other terms, write to the author.

AUTHOR

       This manual page was written by Robert D. Hilliard <hilliard@debian.org>.