Provided by: dictfmt_1.12.1+dfsg-8_amd64

NAME

       dictfmt - formats a DICT protocol dictionary database

SYNOPSIS

       dictfmt  -c5|-t|-e|-f|-h|-j|-p [options]  basename
       dictfmt  -i|-I [options]

DESCRIPTION

       dictfmt takes a file, FILE, on stdin, and creates a dictionary database named basename.dict that
       conforms to the DICT protocol.  It also creates an index file named basename.index.  By default, the
       index is sorted according to the C locale, and only alphanumeric characters and spaces are used in
       sorting; this may be changed with the --locale and --allchars options.  (basename is commonly
       chosen to correspond to the basename of FILE, but this is not mandatory.)

       Unless  the  database  is extremely small, it is highly recommended that basename.dict be compressed with
       /usr/bin/dictzip to create basename.dict.dz.  (dictzip is included in the dictd source package.)

       FILE may be in any of the several formats described by the format options -c5, -t, -e, -f, -h, -j, -p, -i
       or -I.  Exactly one of these options must be given.

       dictfmt prepends several headers to the .dict file.  The 00-database-url header gives the value of
       the -u option as the URL of the site from which the original database was obtained.  The
       00-database-short header gives the value of the -s option as the short name of the dictionary.
       (This "short name" is the identifying name listed by the "dict -D" option.)  If the -u and/or -s
       options are omitted, these values will be shown as "unknown", which is undesirable for a publicly
       distributed database.

       The  date of conversion (formatting) is given in the 00-database-info header.  All text in the input file
       prior to the first headword (as defined by the appropriate formatting option) is appended to this header.
       All  text  in  the  input  file following a headword, up to the next headword, is copied unchanged to the
       .dict file.
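
       As a rough illustration of the workflow described above, the following sketch builds a tiny
       database and compresses it.  The file name sample.txt, the URL and the short name are made-up
       placeholders, the input uses the -f layout described under FORMATTING OPTIONS, and the dictfmt and
       dictzip calls are skipped when the tools are not installed.

```shell
# Scratch directory, so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Hypothetical input in the -f layout: two header lines, then headwords in
# column 0 with indented definitions.
cat > "$workdir/sample.txt" <<'EOF'
Sample Glossary
Version 0.1
apple
   A round fruit.
banana
   A long yellow fruit.
EOF

if command -v dictfmt >/dev/null 2>&1; then
    (
        cd "$workdir" || exit 1
        # -u and -s fill the 00-database-url and 00-database-short headers.
        dictfmt -f -u "http://www.example.org/sample" -s "Sample Glossary 0.1" \
            sample < sample.txt
        # Compress the result as recommended above (creates sample.dict.dz).
        if command -v dictzip >/dev/null 2>&1 && [ -f sample.dict ]; then
            dictzip sample.dict
        fi
    )
fi
```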

FORMATTING OPTIONS

       -c5    FILE is formatted with headwords preceded by 5 or more underscore characters (_) and a blank line.
              All  text  until  the  next headword is considered the definition.  Any leading `@' characters are
              stripped out, but the file is otherwise unchanged. This option was written to format the CIA WORLD
              FACTBOOK 1995.
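
              A minimal sketch of what such input might look like, assuming the blank line falls between
              the underscore line and the headword.  The file contents are invented, and the awk command
              only approximates how headwords are located; it is not dictfmt's own parser.

```shell
c5_input=$(mktemp)

# Hypothetical -c5 style input: underscore line, blank line, headword, definition.
cat > "$c5_input" <<'EOF'
Sample header text
_____

Afghanistan
Definition text for the first entry.
_____

Albania
Definition text for the second entry.
EOF

# Skip the underscore line and the blank line, then print the headword line.
c5_headwords=$(awk '/^_____/ { getline; getline; print }' "$c5_input")
printf '%s\n' "$c5_headwords"
```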

       -t     Implies the -c5, --without-info and --without-headword options.  Use this option if the input
              database comes from the dictunformat utility.

       -e     FILE is in html format, with the headword tagged as bold.  (<B>headword - </B>)
              This option was written to format EASTON'S 1897 BIBLE DICTIONARY.  A typical entry from Easton is:

              <A NAME="T0000005">
              <B>Abagtha - </B>
              one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).

              This is converted to:
              Abagtha
                 one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).

              The heading <A NAME="T0000005"> is omitted, and the headword `Abagtha' is indexed.

              NOTE: This option should be used with caution.  It removes several html  tags  (enough  to  format
              Easton  properly),  but  not  all.  The Makefile that was originally written to format dict-easton
              uses sed scripts to modify certain cross reference tags.  It may be necessary to  pipe  the  input
              file  through  a  sed script, or hack the source of dictfmt in order to properly format other html
              databases.

       -f     FILE is formatted with the headwords starting in column 0, with the definition indented  at  least
              one space (or tab character) on subsequent lines.  The third line starting in column 0 is taken as
              the first headword, and the first two lines starting in column 0 are treated as part of the
              00-database-info header.  This option was written to format the F.O.L.D.O.C.
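
              The column-0 rule above can be sketched with awk.  The sample text is invented, and the
              classification below only mirrors the rule as stated here; dictfmt itself may handle edge
              cases differently.

```shell
f_input=$(mktemp)

# Hypothetical -f style input: two header lines, then headwords in column 0.
cat > "$f_input" <<'EOF'
Sample Glossary
Version 0.1
apple
   A round fruit.
banana
   A long yellow fruit.
EOF

# Lines starting in column 0: the first two are header text, the rest headwords.
# Indented and empty lines do not match the pattern and are left out.
f_classes=$(awk '/^[^ \t]/ { n++; print (n <= 2 ? "header: " : "headword: ") $0 }' "$f_input")
printf '%s\n' "$f_classes"
```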

       -h     FILE  is  formatted  with  the  headwords  starting  in  column  0,  followed by a comma, with the
              definition continuing on the same line.  All text  before  the  first  single  character  line  is
              included  in 00-database-info header, and lines with only one character are omitted from the .dict
              file.  The first headword is on the line following the first single character line.  The  headword
              is  indexed;  the  text of the file is not changed.  This option was written to format HITCHCOCK'S
              BIBLE NAMES DICTIONARY.

       -j     FILE is formatted with  headwords  starting  in  col  0,  enclosed  in  colons,  followed  by  the
              definition.   The colons surrounding the headword are removed, and the headword is indexed.  Lines
              beginning with '*', '=', or '-' are also removed.  All text before the first headword is  included
              in the headers.  This option was written to format the JARGON FILE.
              NOTE:  Some recent versions of the JARGON FILE had three blanks inserted before the first colon at
              each headword.  These must be removed before processing with dictfmt.  (sed scripts have been used
              for this purpose. ed, awk, or perl scripts are also possible.)
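
              One possible form of such a sed script, shown on an invented two-entry sample.  It assumes
              the three blanks appear at the very start of each headword line; real Jargon File releases
              may need a different pattern.

```shell
j_dir=$(mktemp -d)

# Hypothetical input with three blanks before the first colon of each headword.
cat > "$j_dir/jargon.txt" <<'EOF'
   :foo: n.
A sample metasyntactic variable.
   :bar: n.
Another sample metasyntactic variable.
EOF

# Remove exactly three leading blanks when they are followed by a colon.
sed 's/^   :/:/' "$j_dir/jargon.txt" > "$j_dir/jargon-fixed.txt"

if command -v dictfmt >/dev/null 2>&1; then
    ( cd "$j_dir" && dictfmt -j jargon-sample < jargon-fixed.txt )
fi
```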

       -p     FILE is formatted with `%h' in column 0, followed by a blank, followed by the headword, optionally
              followed by a line containing `%d' in column 0.  The definition starts on the following line.  The
              first line beginning `%h' and any lines beginning `%d' are stripped from the .dict file, and `%h '
              is stripped from in front of the headword.  All text before the first headword is included in the
              headers.  The second line beginning `%h' is taken as the first headword.
              This option was written to format Jay Kominek's elements database.
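
              A small sketch of the layout just described, with invented contents.  The awk command only
              imitates the stated rule (the first `%h' line is header material, later ones carry
              headwords) and is not taken from dictfmt.

```shell
p_input=$(mktemp)

# Hypothetical -p style input.
cat > "$p_input" <<'EOF'
%h Sample elements list
%d
Free-form header text.
%h hydrogen
%d
Symbol: H
%h helium
%d
Symbol: He
EOF

# From the second "%h " line onward, print the headword without the "%h " prefix.
p_headwords=$(awk '/^%h / { n++; if (n >= 2) print substr($0, 4) }' "$p_input")
printf '%s\n' "$p_headwords"
```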

       -i -I  These two options differ from all other formatting options.  They re-sort an .index file
              given on stdin (as dictd requires); no .dict file is generated.  Input resembling a three- or
              four-column .index file is expected.  -i expects decimal offsets and lengths, while -I expects
              them in base64 format.
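
              To show how the two number notations relate, here is a sketch that converts a decimal
              value to base64 digits.  It assumes the notation used in dictd index files: the standard
              base64 alphabet used as base-64 digits, most significant digit first, with no padding.
              The function name b64_encode is chosen for this example only.

```shell
# Convert a non-negative decimal number to base-64 digits (assumed index notation).
b64_encode() {
    alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    n=$1
    out=''
    while :; do
        digit=$((n % 64))
        # cut counts characters from 1, so shift the digit value by one.
        out="$(printf '%s' "$alphabet" | cut -c "$((digit + 1))")$out"
        n=$((n / 64))
        [ "$n" -gt 0 ] || break
    done
    printf '%s\n' "$out"
}

b64_encode 0
b64_encode 64
b64_encode 1000
```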

OPTIONS

       -u url Specifies the URL of the site from which the raw database was obtained.  If this option is
              specified, any 00-database-url headword and its definition will be ignored.

       -s name
              Specifies the name and, optionally, the version and date of the database.  (If this contains
              spaces, it must be quoted.)  If this option is specified, any 00-database-short headword and
              its definition will be ignored.

       -L     display license and copyright information

       -V     display version information

       -D     output debugging information

       --help display a help message

       --locale locale
              Specifies the locale used for sorting.  If no locale is specified, the "C" locale is used.
              UTF-8 mode additionally requires --utf8.

       --8bit Generates the database in 8-bit mode; see also the --locale option.
              Note: This option is deprecated.  Use it only for creating 8-bit (non-UTF-8) dictionaries.
              To create a UTF-8 dictionary, use the --utf8 option instead.

       --utf8 If specified, a UTF-8 database is created.

       --allchars
              Specifies that all characters should be used for the search.  By default, only alphabetic and
              numeric characters and spaces are written to the .index file and therefore used in searching.
              Creates the special entry 00-database-allchars.

       --case-sensitive
              makes the search case sensitive.  Creates the special entry 00-database-case-sensitive.

       --headword-separator sep
              sets the headword separator, which allows several words to have the same definition.  For example,
              if `--headword-separator %%%' is given, and the input file contains `autumn%%%fall', both `autumn'
              and `fall' will be indexed as headwords, with the same definition.
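
              The effect on a single headword line can be imitated with awk, using the separator from
              the example above.  This is only an illustration of the splitting; it does not reproduce
              how dictfmt writes the index.

```shell
# Split a combined headword line on the literal separator "%%%".
split_headwords=$(printf '%s\n' 'autumn%%%fall' |
    awk -F'%%%' '{ for (i = 1; i <= NF; i++) print $i }')
printf '%s\n' "$split_headwords"
```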

       --index-data-separator sep
              sets the index/data separator, which allows the first and fourth columns of the .index file
              to be set independently.  The first column is then treated as an index column (where the
              MATCH command searches) and the fourth column as a result column (from which MATCH takes the
              text it returns); the two columns are completely independent of each other.  The default
              value for this separator is the ASCII character "\034".

       --break-headwords
              multiple headwords will be written on separate lines in the .dict file.  For use with
              --headword-separator.

       --index-keep-orig
              When --utf8 is specified, headwords are lowercased and non-alphanumeric characters are removed
              from them before they are saved to the .index file, in order to simplify searching.  When the
              --index-keep-orig option is used, a fourth column is created (if necessary) in the .index file;
              it contains the original headword, which is returned by the MATCH command.  This option may be
              useful to prevent converting "AT&T" to "ATT" or to keep proper nouns with an uppercase first
              letter.
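
              The kind of normalization described here can be approximated with tr.  This is a rough
              stand-in for illustration, not dictfmt's actual routine, and it ignores locale and UTF-8
              details.

```shell
# Keep only alphanumeric characters and spaces, then lowercase the result.
normalized=$(printf '%s' 'AT&T' | tr -cd '[:alnum:] ' | tr '[:upper:]' '[:lower:]')
printf '%s\n' "$normalized"
```

              Without --index-keep-orig only a normalized form of this kind is stored in the index, so
              the original spelling cannot be returned by MATCH.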

       --without-headword
              headwords will not be included in .dict file

       --without-header
              header will not be copied to DB info entry

       --without-url
              URL will not be copied to DB info entry

       --without-time
              time of creation will not be copied to DB info entry

       --without-ver
              By default dictfmt creates a special entry 00-database-dictfmt-X.Y.Z whose definition in the
              .dict file gives the dictfmt version in the form dictfmt-X.Y.Z.  This option suppresses it.

       --without-info
              DB  info  entry  will not be created.  This may be useful if 00-database-info headword is expected
              from stdin (dictunformat outputs it).

       --columns columns
              By default dictfmt wraps strings read from stdin to 72 columns.  This option changes this default.
              If it is set to zero or a negative value, wrapping is turned off.

       --default-strategy strategy
              Sets the default search strategy for the database.  It will be used instead of strategy '.'.
              The special entry 00-database-default-strategy is created for this purpose.  This option may be
              useful, for example, for dictionaries containing mainly phrases rather than single words.  In
              any case, use this option only if you are absolutely sure of what you are doing.

       --mime-header mime_header
              When a client sends the OPTION MIME command to dictd, definitions found in this database are
              prefixed with the specified MIME header.  Creates the special entry 00-database-mime-header.

CREDITS

       dictfmt  was  written  by  Rik  Faith  (faith@cs.unc.edu)  as  part of the dict-misc package.  dictfmt is
       distributed under the terms of the GNU General Public License.  If you need  to  distribute  under  other
       terms, write to the author.

AUTHOR

       This manual page was written by Robert D. Hilliard <hilliard@debian.org>.

SEE ALSO

       dict(1), dictd(8), dictzip(1), dictunformat(1), http://www.dict.org, RFC 2229

                                                25 December 2000                                      DICTFMT(1)