Provided by: dictfmt_1.13.1+dfsg-1build1_amd64

NAME

       dictfmt - formats a DICT protocol dictionary database

SYNOPSIS

       dictfmt  -c5|-t|-e|-f|-h|-j|-p [options]  basename
       dictfmt  -i|-I [options]

DESCRIPTION

       dictfmt reads a file, FILE, from standard input and creates a dictionary database named
       basename.dict that conforms to the DICT protocol.  It also creates an index file named
       basename.index.  By default, the index is sorted according to the C locale, and only
       alphanumeric characters and spaces are used in sorting; this can be changed with the
       --locale and --allchars options.  (basename is commonly chosen to match the basename of
       FILE, but this is not mandatory.)
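
       The following Python sketch illustrates this default ordering under stated assumptions.
       It is not taken from dictfmt: sort_key is a hypothetical helper, the C locale is modelled
       as plain byte-by-byte comparison, only ASCII letters and digits count as alphanumeric,
       and case handling is ignored.

```python
# Hypothetical sketch of the default index ordering (not dictfmt's code):
# keep only ASCII letters, digits and spaces unless allchars is set, then
# compare the raw bytes, which is how the C locale orders strings.


def sort_key(headword: str, allchars: bool = False) -> bytes:
    if allchars:
        kept = headword
    else:
        kept = "".join(c for c in headword
                       if c.isascii() and (c.isalnum() or c == " "))
    return kept.encode("utf-8")


words = ["b-c", "a z", "bb"]
print(sorted(words, key=sort_key))                              # ['a z', 'bb', 'b-c']
print(sorted(words, key=lambda w: sort_key(w, allchars=True)))  # ['a z', 'b-c', 'bb']
```

       With --allchars the hyphen takes part in the comparison, so "b-c" moves ahead of "bb".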

       Unless the database is extremely small, it is highly  recommended  that  basename.dict  be
       compressed  with /usr/bin/dictzip to create basename.dict.dz.  (dictzip is included in the
       dictd source package.)

       FILE may be in any of several formats, selected by the format options -c5, -t, -e, -f,
       -h, -j, -p, -i or -I.  Exactly one of these options must be given.

       dictfmt prepends several headers to the .dict file.  The 00-database-url header gives the
       value of the -u option as the URL of the site from which the original database was
       obtained.  The 00-database-short header gives the value of the -s option as the short name
       of the dictionary.  (This "short name" is the identifying name displayed by the "dict -D"
       option.)  If the -u and/or -s options are omitted, these values are shown as "unknown",
       which is undesirable for a publicly distributed database.

       The date of conversion (formatting) is given in the 00-database-info header.  All text  in
       the  input  file  prior  to  the  first headword (as defined by the appropriate formatting
       option) is appended to this header.  All text in the input file following a  headword,  up
       to the next headword, is copied unchanged to the .dict file.
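
       To illustrate how a .dict file and its index fit together, the Python sketch below
       concatenates entries and records each entry's byte offset and length.  It is a
       simplification, not dictfmt's code: header_entries and build_database are hypothetical
       helpers, the exact header text dictfmt writes differs, and numbers are written in decimal
       for readability (.index files may also use base64; see the -i and -I options).

```python
def header_entries(url=None, short=None, info=""):
    """Hypothetical header entries; missing -u/-s values become "unknown"."""
    return [
        ("00-database-url", f"00-database-url\n     {url or 'unknown'}\n"),
        ("00-database-short", f"00-database-short\n     {short or 'unknown'}\n"),
        ("00-database-info", f"00-database-info\n{info}"),
    ]


def build_database(entries):
    """Concatenate entry texts into .dict data; return (dict bytes, index text).

    Each index line is: headword TAB byte offset TAB byte length.
    """
    data = bytearray()
    index = []
    for headword, text in entries:
        body = text.encode("utf-8")
        index.append((headword, len(data), len(body)))  # offset = bytes so far
        data += body
    # Sort rows by the raw bytes of the headword (C-locale style ordering).
    index.sort(key=lambda row: row[0].encode("utf-8"))
    lines = "".join(f"{h}\t{off}\t{size}\n" for h, off, size in index)
    return bytes(data), lines


dict_data, index_text = build_database(
    header_entries(short="Demo") + [("bee", "bee\n   An insect.\n")])
print(index_text)
```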

FORMATTING OPTIONS

       -c5    FILE  is  formatted  with headwords preceded by 5 or more underscore characters (_)
              and a blank line.  All text until the next headword is considered  the  definition.
              Any  leading  `@' characters are stripped out, but the file is otherwise unchanged.
              This option was written to format the CIA WORLD FACTBOOK 1995.
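
              A simplified Python sketch of this layout follows.  It is not dictfmt's parser;
              parse_c5 is a hypothetical name, and it assumes that the headword is the line
              right after the blank line and that `@' is stripped from the headword only.

```python
import re

# Five or more underscores on a line of their own introduce the next entry.
SEPARATOR = re.compile(r"_{5,}\s*")


def parse_c5(text):
    """Return (info_text, [(headword, definition), ...]) for -c5 style input."""
    lines = text.splitlines()
    info, entries = [], []
    i = 0
    while i < len(lines):
        if (SEPARATOR.fullmatch(lines[i]) and i + 2 < len(lines)
                and lines[i + 1].strip() == ""):
            # Underscores, blank line, then the headword with leading '@' removed.
            entries.append([lines[i + 2].lstrip("@").strip(), []])
            i += 3
            continue
        # Text before the first headword goes to the info header.
        (entries[-1][1] if entries else info).append(lines[i])
        i += 1
    return "\n".join(info), [(h, "\n".join(body)) for h, body in entries]


sample = "intro\n_____\n\n@Albania\nA country.\n______\n\nBelgium\nAnother.\n"
print(parse_c5(sample))
# ('intro', [('Albania', 'A country.'), ('Belgium', 'Another.')])
```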

       -t     Implies -c5, --without-info and --without-headword.  Use this option if the input
              database was produced by the dictunformat utility.

       -e     FILE is in html format, with the headword tagged as bold.  (<B>headword - </B>)
              This  option was written to format EASTON'S 1897 BIBLE DICTIONARY.  A typical entry
              from Easton is:

              <A NAME="T0000005">
              <B>Abagtha - </B>
              one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).

              This is converted to:
              Abagtha
                 one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).

              The anchor <A NAME="T0000005"> is omitted, and the headword `Abagtha' is indexed.

              NOTE: This option should be used  with  caution.   It  removes  several  html  tags
              (enough  to format Easton properly), but not all.  The Makefile that was originally
              written to format dict-easton uses sed scripts to modify  certain  cross  reference
              tags.  It may be necessary to pipe the input file through a sed script, or hack the
              source of dictfmt in order to properly format other html databases.
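
              The conversion of the Easton entry shown above can be sketched in Python as
              follows.  This is a hypothetical illustration (parse_easton is not part of
              dictfmt) that handles only the <A NAME=...> anchor and the <B>headword - </B>
              tag, and it drops text before the first headword instead of adding it to the
              00-database-info header.

```python
import re

# Hypothetical patterns for the Easton-style markup shown above.
HEADWORD = re.compile(r"<B>(.+?) - </B>")
ANCHOR = re.compile(r'<A NAME="[^"]*">')


def parse_easton(text):
    """Return (headword, converted entry text) pairs."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if ANCHOR.fullmatch(stripped):
            continue  # anchors such as <A NAME="T0000005"> are dropped
        match = HEADWORD.fullmatch(stripped)
        if match:
            entries.append([match.group(1), []])
        elif entries:
            # Definition lines are indented under the headword.
            entries[-1][1].append("   " + stripped if stripped else "")
    return [(h, "\n".join([h] + body)) for h, body in entries]


sample = '''<A NAME="T0000005">
<B>Abagtha - </B>
one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).'''
for headword, entry in parse_easton(sample):
    print(entry)
```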

       -f     FILE is formatted with the headwords starting in column 0 and the definition
              indented by at least one space (or tab character) on subsequent lines.  The third
              line starting in column 0 is taken as the first headword, and the first two lines
              starting in column 0 are treated as part of the 00-database-info header.  This
              option was written to format the F.O.L.D.O.C.
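
              A minimal Python sketch of the column-0 rule, assuming that the first two column-0
              lines and everything before the third one go into the info header; parse_foldoc
              is a hypothetical name, not a dictfmt function.

```python
def parse_foldoc(text):
    """Return (info_text, [(headword, definition), ...]) for -f style input."""
    info, entries = [], []
    column0_lines = 0
    for line in text.splitlines():
        starts_in_column0 = bool(line) and not line[0].isspace()
        if starts_in_column0:
            column0_lines += 1
        if starts_in_column0 and column0_lines >= 3:
            entries.append([line.strip(), []])   # new headword
        elif entries:
            entries[-1][1].append(line)          # indented definition line
        else:
            info.append(line)                    # part of 00-database-info
    return "\n".join(info), [(h, "\n".join(body)) for h, body in entries]


sample = ("Title line\nVersion 1\n\nabbrev\n\tAn abbreviation.\n"
          "acronym\n  A word formed from initials.\n")
print(parse_foldoc(sample))
```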

       -h     FILE is formatted with the headwords starting in column 0, followed by a comma,
              with the definition continuing on the same line.  All text before the first
              single-character line is included in the 00-database-info header, and lines
              containing only one character are omitted from the .dict file.  The first headword
              is on the line following the first single-character line.  The headword is indexed;
              the text of the file is not changed.  This option was written to format HITCHCOCK'S
              BIBLE NAMES DICTIONARY.
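
              The rules above can be sketched in Python as follows.  The sketch is hypothetical
              (parse_hitchcock is not dictfmt code) and assumes that a headword line is any
              column-0 line containing a comma, with other lines treated as continuations.

```python
def parse_hitchcock(text):
    """Return (info_text, [(headword, entry_text), ...]) for -h style input."""
    info, entries = [], []
    seen_marker = False
    for line in text.splitlines():
        if len(line.strip()) == 1:
            seen_marker = True   # single-character lines ("A", "B", ...) are omitted
            continue
        if not seen_marker:
            info.append(line)    # goes into 00-database-info
        elif line[:1].strip() and "," in line:
            # Headword in column 0, ended by the first comma; line kept unchanged.
            entries.append([line.split(",", 1)[0], [line]])
        elif entries:
            entries[-1][1].append(line)
    return "\n".join(info), [(h, "\n".join(body)) for h, body in entries]


sample = ("Hitchcock's Bible Names\n\nA\nAaron, a teacher; lofty\n"
          "Abaddon, the destroyer\nB\nBaal, master; lord\n")
print(parse_hitchcock(sample))
```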

       -j     FILE is formatted with headwords starting in column 0, enclosed in colons, followed by
              the  definition.  The colons surrounding the headword are removed, and the headword
              is indexed.  Lines beginning with '*', '=', or '-'  are  also  removed.   All  text
              before  the  first headword is included in the headers.  This option was written to
              format the JARGON FILE.
              NOTE: Some recent versions of the JARGON FILE had three blanks inserted before  the
              first  colon  at  each  headword.   These  must  be  removed before processing with
              dictfmt.  (sed scripts have been used for this purpose. ed, awk,  or  perl  scripts
              are also possible.)
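
              A hypothetical Python sketch of the -j rules follows (parse_jargon is not part of
              dictfmt).  Because the headword pattern is anchored at column 0, a line with
              blanks before the first colon is not recognized, which is why such blanks must be
              removed first.

```python
import re

HEADWORD = re.compile(r"^:([^:]+):(.*)$")   # ":headword:" at column 0


def parse_jargon(text):
    """Return (info_text, [(headword, entry_text), ...]) for -j style input."""
    info, entries = [], []
    for line in text.splitlines():
        if line[:1] in ("*", "=", "-"):
            continue                      # such lines are removed
        match = HEADWORD.match(line)
        if match:
            headword, rest = match.groups()
            # Drop the two colons around the headword, keep the rest.
            entries.append([headword, [headword + rest]])
        elif entries:
            entries[-1][1].append(line)
        else:
            info.append(line)             # text before the first headword
    return "\n".join(info), [(h, "\n".join(body)) for h, body in entries]


sample = ("= Jargon File =\nintro text\n:hack: n. 1. Originally, a quick job.\n"
          "   more text\n:hacker: n. A person.\n")
print(parse_jargon(sample))
```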

       -p     FILE is formatted with '%h' in column 0, followed by a blank, followed by the
              headword, optionally followed by a line containing `%d' in column 0.  The
              definition starts on the following line.  The first line beginning with '%h' and
              any lines beginning with '%d' are stripped from the .dict file, and '%h ' is
              stripped from the front of each headword.  All text before the first headword is
              included in the headers.  The second line beginning with '%h' is taken as the first
              headword.  This option was written to format Jay Kominek's elements database.
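
              The following Python sketch, assuming the rules exactly as stated above, shows
              which lines are kept; parse_percent is a hypothetical name and not dictfmt's
              implementation.

```python
def parse_percent(text):
    """Return (info_text, [(headword, entry_text), ...]) for -p style input."""
    info, entries = [], []
    h_lines = 0
    for line in text.splitlines():
        if line.startswith("%d"):
            continue                          # %d lines are stripped
        if line.startswith("%h"):
            h_lines += 1
            if h_lines == 1:
                continue                      # the first %h line is stripped
            headword = line[2:].strip()       # drop the "%h " prefix
            entries.append([headword, [headword]])
        elif entries:
            entries[-1][1].append(line)
        else:
            info.append(line)                 # goes into the headers
    return "\n".join(info), [(h, "\n".join(body)) for h, body in entries]


sample = "Elements database\n%h\n%h Hydrogen\n%d\nSymbol: H\n%h Helium\nSymbol: He\n"
print(parse_percent(sample))
```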

       -i -I  These two options differ from all other formatting options.  They re-sort an .index
              file read from stdin into the order that dictd requires.  No .dict file is generated;
              only the sorting is performed.  The input is expected to be a three- or four-column
              .index file.  -i expects decimal offsets and lengths, while -I expects them in base64
              format.
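
              The re-sorting step can be sketched in Python as follows.  Both the base64 number
              encoding (the standard base64 alphabet, most significant digit first, believed to
              match dictd's index files) and the sort order (plain byte order of the headword,
              then offset) are assumptions here; b64_to_int, int_to_b64 and resort_index are
              hypothetical helpers.

```python
# Assumed alphabet: standard base64 digits, most significant digit first.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_to_int(digits):
    value = 0
    for ch in digits:
        value = value * 64 + B64.index(ch)
    return value


def int_to_b64(value):
    digits = B64[value % 64]
    value //= 64
    while value:
        digits = B64[value % 64] + digits
        value //= 64
    return digits


def resort_index(lines, base64=False):
    """Sort .index lines by headword bytes, then by decoded offset (-i / -I)."""
    parse = b64_to_int if base64 else int

    def key(line):
        columns = line.split("\t")
        return columns[0].encode("utf-8"), parse(columns[1])

    return sorted(lines, key=key)


print(resort_index(["zeta\tBA\tC", "alpha\tA\tBA"], base64=True))
```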

OPTIONS

       -u url Specifies the URL of the site from which the raw database was obtained.  If this
              option is specified, any 00-database-url headword and its definition in the input
              are ignored.

       -s name
              Specifies the name and, optionally, the version and date of the database.  (If this
              contains spaces, it must be quoted.)  If this option is specified, any
              00-database-short headword and its definition in the input are ignored.

       -L     display license and copyright information

       -V     display version information

       -D     output debugging information

       --help display a help message

       --locale locale
              Specifies the locale used for sorting.  If no locale is specified, the "C" locale
              is used.  To create a UTF-8 database, --utf8 must also be given.

       --8bit Generates the database in 8-bit mode; see also the --locale option.
              Note: This option is deprecated.  Use it only for creating 8-bit (non-UTF-8)
              dictionaries.  To create a UTF-8 dictionary, use the --utf8 option instead.

       --utf8 If specified, UTF-8 database is created.

       --allchars
              Specifies that all characters should be used for searching.  By default, only
              alphabetic and numeric characters and spaces are written to the .index file and
              are therefore used in searches.  Creates the special entry 00-database-allchars.

       --case-sensitive
              Makes searches case-sensitive.  Creates the special entry
              00-database-case-sensitive.

       --headword-separator sep
              sets  the  headword  separator,  which  allows  several  words  to  have  the  same
              definition.  For example, if '--headword-separator %%%' is  given,  and  the  input
              file  contains  'autumn%%%fall',  both  'autumn'  and  'fall'  will  be  indexed as
              headwords, with the same definition.
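
              A minimal Python sketch of this splitting, with index_rows as a hypothetical
              helper: each name produces its own .index row pointing at the same offset and
              length.

```python
def index_rows(headword_field, offset, length, separator=None):
    """Return one .index row per headword sharing the same definition."""
    names = headword_field.split(separator) if separator else [headword_field]
    return [(name, offset, length) for name in names if name]


print(index_rows("autumn%%%fall", 0, 42, separator="%%%"))
# [('autumn', 0, 42), ('fall', 0, 42)]
```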

       --index-data-separator sep
              Sets the index/data separator, which allows the first and fourth columns of the
              .index file to be set independently.  The first column is treated as the index
              column (where the MATCH command searches) and the fourth column as the result
              column (what MATCH returns); the two columns are completely independent of each
              other.  The default value for this separator is the ASCII character "\034".
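
              A hypothetical Python sketch of how the separator could map one input headword
              onto the first and fourth .index columns.  It assumes that the text before the
              separator is the search column and the text after it is the result column;
              index_columns is not a dictfmt function, and its default separator is the
              character with octal code 034.

```python
def index_columns(headword, offset, length, separator="\x1c"):
    """Build the tab-separated .index columns for one headword.

    offset and length are already-encoded .index fields (strings).
    Assumption: before the separator = search column, after it = result column.
    """
    if separator in headword:
        search, result = headword.split(separator, 1)
        return "\t".join([search, offset, length, result])
    return "\t".join([headword, offset, length])


print(index_columns("colour\x1cColour (British spelling)", "A", "B"))
```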

       --break-headwords
              Multiple headwords will be written on separate lines in the .dict file.  For use
              with --headword-separator.

       --index-keep-orig
              When --utf8 is specified, headwords are lowercased and non-alphanumeric characters
              are removed from them before they are saved to the .index file, in order to simplify
              searching.  When the --index-keep-orig option is used, a fourth column is created
              (if necessary) in the .index file; it contains the original headword, which is
              returned by the MATCH command.  This option may be useful to prevent "AT&T" from
              being converted to "ATT", or to keep proper nouns with an uppercase first letter.
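
              The effect of this option can be sketched in Python as follows.  normalize and
              index_line are hypothetical helpers; the sketch assumes Python's str.isalnum() as
              the definition of "alphanumeric" and adds the fourth column only when the
              normalized headword differs from the original.

```python
def normalize(headword):
    """Lowercase and keep only alphanumerics and whitespace (simplified rule)."""
    return "".join(c for c in headword.lower() if c.isalnum() or c.isspace())


def index_line(headword, offset, length, keep_orig=False):
    """offset and length are already-encoded .index fields."""
    columns = [normalize(headword), offset, length]
    if keep_orig and columns[0] != headword:
        columns.append(headword)   # fourth column: what MATCH returns
    return "\t".join(columns)


print(index_line("AT&T", "A", "Bc"))                  # att<TAB>A<TAB>Bc
print(index_line("AT&T", "A", "Bc", keep_orig=True))  # att<TAB>A<TAB>Bc<TAB>AT&T
```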

       --without-headword
              headwords will not be included in .dict file

       --without-header
              header will not be copied to DB info entry

       --without-url
              URL will not be copied to DB info entry

       --without-time
              time of creation will not be copied to DB info entry

       --without-ver
              By default, dictfmt creates a special entry 00-database-dictfmt-X.Y.Z whose
              definition in the .dict file gives the dictfmt version in the form dictfmt-X.Y.Z.
              This option suppresses that entry.

       --without-info
              DB info entry will not be created.  This may be useful if 00-database-info headword
              is expected from stdin (dictunformat outputs it).

       --columns columns
              By default, dictfmt wraps text read from stdin to 72 columns.  This option changes
              the default.  A value of zero or less disables wrapping.
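
              A minimal sketch of this wrapping behavior using Python's textwrap module,
              assuming each input line is wrapped independently; wrap_definition is a
              hypothetical helper, and dictfmt's own line-breaking rules may differ.

```python
import textwrap


def wrap_definition(text, columns=72):
    """Wrap each input line to `columns`; zero or negative disables wrapping."""
    if columns <= 0:
        return text
    return "\n".join(
        textwrap.fill(line, width=columns) if line.strip() else line
        for line in text.split("\n")
    )


print(wrap_definition("word " * 40, columns=20))
```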

       --default-strategy strategy
              Sets the default search strategy for the database.  It will be used instead of
              strategy '.'.  The special entry 00-database-default-strategy is created for this
              purpose.  This option may be useful, for example, for dictionaries containing mainly
              phrases rather than single words.  In any case, use this option only if you are
              absolutely sure what you are doing.

       --mime-header mime_header
              When a client sends the OPTION MIME command to dictd, definitions found in this
              database are prepended with the specified MIME header.  Creates the special entry
              00-database-mime-header.

CREDITS

       dictfmt  was  written  by  Rik  Faith (faith@cs.unc.edu) as part of the dict-misc package.
       dictfmt is distributed under the terms of the GNU General Public License.  If you need  to
       distribute under other terms, write to the author.

AUTHOR

       This manual page was written by Robert D. Hilliard <hilliard@debian.org>.

SEE ALSO

       dict(1), dictd(8), dictzip(1), dictunformat(1), http://www.dict.org, RFC 2229

                                         25 December 2000                              DICTFMT(1)