Provided by: dictd_1.13.1+dfsg-1build1_amd64 

NAME
dictd - a dictionary database server
SYNOPSIS
dictd [options]
DESCRIPTION
dictd is a server for the Dictionary Server Protocol (DICT), a TCP transaction based query/response
protocol that allows a client to access dictionary definitions from a set of natural language dictionary
databases.
For security reasons, dictd drops root permissions after startup. If user dictd exists on the system,
the daemon will run as that user and group dictd; otherwise it will run as user nobody and group nobody or
nogroup (depending on the operating system distribution).
Since startup time is significant, the server is designed to run continuously, and should not be run from
inetd(8). (However, with a fast processor, it is feasible to do so.)
Databases are distributed separately from the server.
By default, dictd assumes that the index files are sorted alphabetically, and only alphanumeric
characters from the 7-bit ASCII character set are used for search. This default may be overridden by a
header in the data file. The only such features implemented at this time are the header
"00-database-allchars", which tells dictd that non-alphanumeric characters may also be used for search, the header
"00-database-utf8", which indicates that the database uses UTF-8 encoding, and the header "00-database-8bit-new",
which indicates that the database is encoded and sorted according to a locale that uses an 8-bit
encoding.
BACKGROUND
For many years, the Internet community has relied on the "webster" protocol for access to natural
language definitions. The webster protocol supports access to a single dictionary and (optionally) to a
single thesaurus. In recent years, the number of publicly available webster servers on the Internet has
dramatically decreased.
Fortunately, several freely-distributable dictionaries and lexicons have recently become available on the
Internet. However, these freely-distributable databases are not accessible via a uniform interface, and
are not accessible from a single site. They are often small and incomplete individually, but would
collectively provide an interesting and useful database of English words. Examples include the Jargon
file, the WordNet database, MICRA's version of the 1913 Webster's Revised Unabridged Dictionary, and the
Free Online Dictionary of Computing. (See the DICT protocol specification (RFC) for references.)
Translating and non-English dictionaries are also becoming available (for example, the FOLDOC dictionary
is being translated into Spanish).
The webster protocol is not suitable for providing access to a large number of separate dictionary
databases, and extensions to the current webster protocol were not felt to be a clean solution to the
dictionary database problem.
The DICT protocol is designed to provide access to multiple databases. Word definitions can be
requested, the word index can be searched (using an easily extended set of algorithms), information about
the server can be provided (e.g., which index search strategies are supported, or which databases are
available), and information about a database can be provided (e.g., copyright, citation, or distribution
information). Further, the DICT protocol has hooks that can be used to restrict access to some or all of
the databases.
dictd(8) is a server that implements the DICT protocol. Bret Martin implemented another server, and
several people (including Bret and myself) have implemented clients in a variety of languages.
OPTIONS
-V or --version
Display version information.
--license
Display copyright and license information.
-h or --help
Display help information.
-v or --verbose or -dverbose
Be verbose.
-c file or --config file
Specify configuration file. The default is /etc/dictd/dictd.conf, but may be changed in the
defs.h file at compile time (DICTD_CONFIG_FILE).
-p port or --port port
Overrides the keyword port in Global Settings Specification section of configuration file.
-i or --inetd
Communicate on standard input/output, suitable for use from inetd. Although, due to its rather
large startup time, this daemon was not intended to run from inetd, with a fast processor it is
feasible to do so. This option also implies --fast-start.
--pp prog
Sets a preprocessor for the configuration file, such as m4 or cpp. See the examples/dictd_complex.conf
file from the distribution. By default, the configuration file is parsed without a preprocessor.
--depth length
Overrides the keyword depth in Global Settings Specification section of configuration file.
--delay seconds
Overrides the keyword delay in Global Settings Specification section of configuration file.
--facility facility
The same as syslog_facility keyword in Global Settings Specification of configuration files.
-f or --force
Force the daemon to start even if an instance of the daemon is already running. (This is of
little value unless a non-default port is specified with -p, since, if one instance is bound to a
port, the second one fails when it cannot bind to the port.)
--limit children
Overrides the keyword limit in Global Settings Specification section of configuration file.
--listen-to host
Overrides the keyword listen_to in Global Settings Specification section of configuration file.
--address-family family
Overrides the keyword address_family in Global Settings Specification section of configuration
file.
--locale locale
Overrides the keyword locale in Global Settings Specification section of configuration file.
-s The same as syslog keyword in Global Settings Specification of configuration files.
-L file or --logfile file
The same as log_file keyword in Global Settings Specification of configuration files.
--pid-file file
The same as pid_file keyword in Global Settings Specification of configuration files.
-m minutes or --mark minutes
Overrides the keyword timestamp in Global Settings Specification section of configuration file.
--default-strategy strategy
Overrides the keyword default_strategy in Global Settings Specification section of configuration
file.
--without-strategy strat1,strat2,...
The same as without_strategy keyword in Global Settings Specification of configuration files.
--add-strategy strategy_name:description
The same as add_strategy keyword in Global Settings Specification of configuration files.
--fast-start
The same as fast_start keyword in Global Settings Specification of configuration files.
--without-mmap
The same as without_mmap keyword in Global Settings Specification of configuration files.
--stdin2stdout
When applied with --inetd, each command obtained from stdin is output to stdout. This option is
useful for debugging.
-l option or --log option
The same as log_option keyword in Global Settings Specification of configuration files.
-d option
The same as debug_option keyword in Global Settings Specification of configuration files.
CONFIGURATION FILE
Introduction
The configuration file defaults to /etc/dictd/dictd.conf but can be specified on the command line
with the -c option (see above).
The configuration file is read into memory at startup, and is not referenced again by dictd unless
a signal 1 (SIGHUP) is received, which will cause dictd to reread the configuration file.
The file is divided into sections. The Access Section should come first, followed by the Database
Section, and the User Section. The Database Section is required; the others are optional, but
they must be in the order listed here.
Syntax The following keywords are valid in a configuration file: access, allow, deny, group, database,
data, index, filter, prefilter, postfilter, name, include, user, authonly, site. Keywords are
case sensitive. String arguments that contain spaces should be surrounded by double quotes.
Without quoting, strings may contain alphanumeric characters and _, -, ., and *, but not spaces.
Strings can be continued between lines. \", \\, \n, \<NL> are treated as a double quote, a backslash,
a newline, and nothing (the line break is dropped), respectively. Comments start with # and extend to the end of the line.
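As an illustration of the escape rules above, the following Python sketch decodes the body of a double-quoted string. It is not dictd's scanner: the function name unescape is hypothetical, and keeping any other backslash sequence unchanged is an assumption, since the text does not cover that case.

```python
def unescape(body: str) -> str:
    """Decode the escapes listed above inside a double-quoted string body."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt == '"':
                out.append('"')        # \" is a double quote
            elif nxt == "\\":
                out.append("\\")       # \\ is a backslash
            elif nxt == "n":
                out.append("\n")       # \n is a newline
            elif nxt == "\n":
                pass                   # backslash-newline produces nothing
            else:
                out.append(ch + nxt)   # assumption: other escapes kept as-is
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this sketch, a string split over two lines with a trailing backslash decodes to a single line.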
Global Settings Section
global { global settings specification }
Used to set global dictd settings such as the log file, syslog facility, locale and so on.
EXAMPLE:
See examples/dictd4.conf file from the distribution.
Access Section
access { access specification }
This section contains access restrictions for the server and all of the databases
collectively. Per-database control is specified in the Database Section.
EXAMPLE:
See examples/dictd3.conf file from the distribution.
Database Section
database string { database specification }
The string specifies the name of the database (e.g., wn or web1913). (This is an arbitrary
name selected by the administrator, and is not necessarily related to the file name or any
name listed in the data file. A short, easy to type name is often selected for easy use
with dict -d.)
EXAMPLE: See examples/dictd*.conf files from the distribution.
NOTE: If the files specified in the database specification do not exist on the system,
dictd may silently fail.
database_virtual string { virtual database specification }
This section specifies the virtual database. The string specifies the name of the database
(e.g., en-ru or fren).
EXAMPLE: See examples/dictd_virtual.conf or examples/dictd_complex.conf files from the
distribution.
database_plugin string { plugin specification }
This section specifies the plugin. The string specifies the name of the database.
EXAMPLE: See examples/dictd_plugin_dbi.conf or examples/dictd_complex.conf files from the
distribution.
database_mime string { mime specification }
Traditionally, databases created for dictd contained plain text only, because dictd releases
before 1.10.0 did not fully support the OPTION MIME command (see RFC 2229). This
section describes a special database that behaves differently depending on whether the
OPTION MIME command was received from the client. The database created by this
section can return either plain text or specially formatted
content, depending on whether the DICT client supports (or wants to receive) MIME content.
The string specifies the name of the database.
NOTE: This applies to the DEFINE command only. For the MATCH, SHOW DB, SHOW STRAT, SHOW INFO, SHOW
SERVER and HELP commands, the only difference is that the returned text is preceded by an empty line.
EXAMPLE: See examples/dictd_mime.conf file from the distribution.
database_exit
Excludes following databases from the '*' database. By default '*' means all databases
available. Look at 'examples/dictd_virtual.conf' file for example configuration.
NOTE: If you use 'virtual' dictionaries, you should use this directive, otherwise you will
search the same dictionary twice.
User Section
user string string
The first string specifies the username, and the second string specifies the shared
secret for this username. When the AUTH command is used, the client will provide
the username and a hashed version of the shared secret. If the shared secret
matches, the user is said to have authenticated, and will have access to databases
whose access specifications allow that user (by name, or by wildcard). If present,
this section must appear last in the configuration file. There may be many user
entries. The shared secret should be kept secret, as anyone who has access to it
can access the shared databases (assuming access is not denied by domain name).
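For illustration, RFC 2229 defines the AUTH authentication string in the same way as APOP: the MD5 digest, written in lowercase hexadecimal, of the msg-id from the server banner (angle brackets included) concatenated with the shared secret. A minimal Python sketch follows. The function name, the sample msg-id, the username and the secret are all made up, and the UTF-8 encoding of the input is an assumption.

```python
import hashlib

def auth_string(msg_id: str, shared_secret: str) -> str:
    """Return the hex MD5 of msg-id + shared secret, as sent with AUTH."""
    data = (msg_id + shared_secret).encode("utf-8")  # assumption: UTF-8 bytes
    return hashlib.md5(data).hexdigest()

# Hypothetical values: the msg-id is whatever the server printed in its banner.
banner_msg_id = "<1234.5678@dict.example.org>"
command = "AUTH alice " + auth_string(banner_msg_id, "s3cret")
```

Because the msg-id changes with every connection, the string sent over the wire differs each time even though the shared secret stays the same.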
Access Specification
Access specifications may occur in the Access Section or in the Database Section. The
access specification will be described here.
For allow, deny, and authonly, a star (*) may be used as a wild card that matches any
number of characters. A question mark (?) may be used as a wildcard that matches a single
character. For example, 10.0.0.* and *.edu are valid strings.
Further, a range of IP addresses and an IP address followed by a netmask may be specified.
For example, 10.0.0.0:10.0.0.255, 10.0.0.0/24, and 10.0.0.* all specify the same range of
IP numbers. Notation cannot be combined on the same line. If the notation does not make
sense, access will be denied by default. Use the --debug auth option to debug related
problems.
Note that these specifications take only one string per specification line. However, you
can have multiple lines of each type.
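The equivalence of the three notations above can be checked with a short Python sketch. It covers IPv4 addresses only and is not dictd's matching code: the function name matches is hypothetical, and fnmatch is used as a stand-in for the * and ? wildcards (it also treats square brackets specially, which the text above does not mention).

```python
import fnmatch
import ipaddress

def matches(spec: str, ip: str) -> bool:
    """Check an IPv4 address against one of the three notations shown above."""
    addr = ipaddress.ip_address(ip)
    if ":" in spec:                       # range notation: first:last
        first, last = spec.split(":")
        return ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last)
    if "/" in spec:                       # address followed by a netmask length
        return addr in ipaddress.ip_network(spec)
    return fnmatch.fnmatchcase(ip, spec)  # wildcard notation with * and ?

specs = ["10.0.0.0:10.0.0.255", "10.0.0.0/24", "10.0.0.*"]
inside = [matches(s, "10.0.0.7") for s in specs]    # all True
outside = [matches(s, "10.0.1.7") for s in specs]   # all False
```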
The syntax is as follows:
allow string
The string specifies a domain name or IP address which is allowed access to the
server (in the Access Section) or to a database (in the Database Section). Note
that more than one string is not permitted for a single "allow" line, but more than
one "allow" line is permitted in the configuration file.
deny string
The string specifies a domain name or IP address which is denied access to the
server (in the Access Section) or to a database (in the Database Section). Note
that if reverse DNS is not working, then only the IP number will be checked.
Therefore, it is essential to deny networks based on IP number, since a denial based
on domain name may not always be checked.
authonly string
This form is only useful in the Access Section. The string specifies a domain name
or IP address which is allowed access to the server but not to any of the databases.
All commands are valid except DEFINE, MATCH, and SHOW DB. More specifically AUTH is
a valid command, and commands which access the databases are not allowed.
user string
This form is only useful in the Database Section. The string specifies a username
that is allowed to access this database after a successful AUTH command is executed.
Global Settings Specification
This section describes the following parameters:
port string_or_number
Specifies the port or service name (e.g., 2628). The default is 2628, as specified in the
DICT Protocol RFC, but may be changed in the defs.h file at compile time
(DICT_DEFAULT_SERVICE).
site string
Used to specify the filename for the site information file, a flat text file which will be
displayed in response to the SHOW SERVER command.
EXAMPLE: See examples/dictd4.conf file from the distribution.
site_no_banner boolean
By default, the SHOW SERVER command outputs information about the dictd version and the operating
system type. This option disables this.
site_no_uptime boolean
By default, the SHOW SERVER command outputs information about the uptime of dictd, the number of
forks since startup and forks per hour. This option disables this.
site_no_dblist boolean
By default, the SHOW SERVER command outputs internal information about databases, such as the
number of headwords, index size and so on. This option disables this.
delay number
Specifies the number of seconds a client may be idle before the server will close the
connection. Idle time is defined to be the time the server is waiting for input and does
not include the time the server spends searching the database. The default is 0 seconds (no
limit), but may be changed in the defs.h file at compile time (DICT_DEFAULT_DELAY).
NOTE: Setting the delay option disables the limit_time option. Only one of them (the last one specified in
dictd.conf) is in effect.
NOTE: Connections are closed without warning since no provision for premature connection
termination is specified in the DICT protocol RFC.
depth number
Specify the queue length for listen(2). Specifies the number of pending socket connections
which are queued by the operating system. Some operating systems may silently limit this
value to 5 (older BSD systems) or 128 (Linux). The default is 10 but may be changed in the
defs.h file at compile time (DICT_QUEUE_DEPTH).
limit_childs number
Specifies the number of daemons that may be running simultaneously. Each daemon services a
single connection. If the limit is exceeded, a (serialized) connection will be made by the
server process, and a response code 420 (server temporarily unavailable) will be sent to
the client. This parameter should be adjusted to prevent the server machine from being
overloaded by dict clients, but should not be set so low that many clients are denied
useful connections. The default is 100, but may be changed in the defs.h file at compile
time (DICT_DAEMON_LIMIT_CHILDS).
limit number
Synonym for limit_childs. For backward compatibility only.
limit_matches number
Specifies the maximum number of matches that can be returned by MATCH query. Zero means no
limit. The default is 2000.
limit_definitions number
Specifies the maximum number of definitions that can be returned by DEFINE query. Zero
means no limit. The default is 200.
limit_time number
Specifies the number of seconds a client may talk to the server before the server will
close the connection. The default is 600 seconds (10 minutes), but may be changed in the
defs.h file at compile time (DICT_DEFAULT_LIMIT_TIME).
NOTE: Setting the limit_time option disables the delay option. Only one of them (the last one specified in
dictd.conf) is in effect.
NOTE: Connections are closed without warning since no provision for premature connection
termination is specified in the DICT protocol RFC.
limit_queries number
Specifies the number of queries (MATCH, DEFINE, SHOW DB etc.) that client may send to the
server before the server will close the connection. Zero means no limit. The default is
2000, but may be changed in the defs.h file at compile time (DICT_DEFAULT_LIMIT_QUERIES).
timestamp number
How often a timestamp should be logged (in minutes). (This is effective only if logging
has been enabled with the -s or -L option, or with a debugging option.)
log_option option
Specify a logging option. This is effective only if logging has been enabled with the -s
or -L option or in configuration file, or logging to the console has been activated with a
debugging option (e.g., --debug nodetach). Only one option may be set with each invocation
of this option; however, multiple invocations of this option may be made in configuration
file or dictd command line. For instance:
dictd -s --log stats --log found --log notfound
is a valid command line, and sets three logging options.
Some of the more verbose logging options are used primarily for debugging the server code,
and are not practical for normal use.
server Log server diagnostics. This is extremely verbose.
connect
Log all connections.
stats Log all children terminations.
command
Log all commands. This is extremely verbose.
client Log results of CLIENT command.
found Log all words found in the databases.
notfound
Log all words not found in the databases.
timestamp
When logging to a file, use a full timestamp like that which syslog would produce.
Otherwise, no timestamp is made, making the files shorter.
host Log name of foreign host.
auth Log authentication failures.
min Set a minimal number of options. If logging is activated (to a file, or via
syslog), and no options are set, then the minimal set of options will be used. If
options are set, then only those options specified will be used.
all Set all of the options.
none Clear all of the options.
To facilitate location of interesting information in the log file, entries are marked with
initial letters indicating the class of the line being logged:
I Information about the server, connections, or termination statistics. These lines
are generally not designed to be parsed automatically.
E Error messages.
C CLIENT command information.
D Definitions found in the databases searched.
M Matches found in the database searched.
N Matches which were not found in the databases searched.
T Trace of exact line sent by client.
A Authentication information.
To preserve anonymity of the client, do not use the connect or host options. Clients may
or may not send host information using the CLIENT command, but this should be an option
that is selectable on the client side.
debug_option string
Activate a debugging option. There are several, all of which are only useful to
developers. They are documented here for completeness. A list can be obtained
interactively by using -d with an illegal option.
verbose
The same as -v or --verbose. Adds verbosity to other options.
scan Debug the scanner for the configuration file.
parse Debug the parser for the configuration file.
search Debug the character folding and binary search routines.
init Report database initialization.
port Log client-side port number to the log file.
lev Debug Levenshtein search algorithm.
auth Debug the authorization routines.
nodetach
Do not detach as a background process. Implies that a copy of the log file will
appear on the standard output.
nofork Do not fork daemons to service requests. Be a single-threaded server. This option
implies nodetach, and is most useful for using a debugger to find the point at which
daemon processes are dumping core.
alt Debugs altcompare in index.c.
locale string
Specifies the locale used for searching. If no locale is specified, the "C" locale is
used. The locale used for the server should be the same as that used for dictfmt when the
database was built (specifically, the locale under which the index was sorted). The locale
should be specified for both 8-bit and UTF-8 formats. If locale contains utf8 or utf-8
substring, UTF-8 format is expected. Note that if your database is not in ASCII7 or UTF-8
format, then the dictd server will not be compliant to RFC 2229.
NOTE: If utf-8 or 8-bit dictionaries are included in the configuration file, and the
appropriate --locale has not been specified, dictd will fail to start. This implies that
dictd will not run with both utf-8 and 8-bit dictionaries in the configuration file.
add_strategy strategy_name description
Adds strategy strategy_name with the description description. This new search strategy may
be implemented with the help of plugins. Both strategy_name and description are strings.
default_strategy string
Set the server's default search strategy for MATCH search type. The compiled-in default is
'lev'. It is also possible to set default strategy per database. See default_strategy
keyword in Database specification section.
disable_strategy string
Disable specified strategies. By default all implemented search strategies are enabled.
It is also possible to disable strategies per database. See disable_strategy keyword in
Database specification section.
listen_to host
Local host name or IP address for bind. If unspecified or *, dictd will bind to all
interfaces. Otherwise, dictd will bind to this address only.
address_family family
If 4, the address family is IPv4 (the default); if 6, the address family is IPv6.
syslog string
Log using the syslog(3) facility.
syslog_facility string
Specifies the syslog facility to use. The use of this option implies the -s option to turn
on logging via syslog. When the operating system libraries support SYSLOG_NAMES, the names
used for this option should be those listed in syslog.conf(5). Otherwise, the following
names are used (assuming the particular facility is defined in the header files): auth,
authpriv, cron, daemon, ftp, kern, lpr, mail, news, syslog, user, uucp, local0, local1,
local2, local3, local4, local5, local6, and local7.
log_file string
Specify the file for logging. The filename specified is recomputed on each use using the
strftime(3) call. For example, a filename ending in ".%Y%m%d" will write to log files
ending in the year, month, and date that the log entry was written.
NOTE: If dictd does not have write permission for this file, it will silently fail.
pid_file string
The specified filename will be created to contain the process id of the main dictd process.
The default is /var/run/dictd.pid.
fast_start
By default, dictd creates (in memory) additional index to make the search faster. This
option disables this behaviour and makes startup faster.
without_mmap
Do not use the mmap(2) function and read entire files into memory instead. Use this
option, if you know exactly what you are doing.
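As an illustration of the log_file keyword described above, the following Python sketch expands a file name pattern with strftime-style conversions. The path and the date are made-up examples, and the helper name log_path is hypothetical; dictd itself uses the C strftime(3) call.

```python
from datetime import datetime

def log_path(pattern: str, when: datetime) -> str:
    """Expand strftime(3)-style conversions in a log_file pattern."""
    return when.strftime(pattern)

# Hypothetical pattern ending in ".%Y%m%d", evaluated for 9 March 2024.
print(log_path("/var/log/dictd.log.%Y%m%d", datetime(2024, 3, 9)))
# prints /var/log/dictd.log.20240309
```

Since the name is recomputed on each use, a pattern like this yields one log file per day without any external rotation.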
Database Specification
The database specification describes the database:
data string
Specifies the filename for the flat text database. If the filename does not begin with '.'
or '/', it is prepended with $datadir/. The value of $datadir is set at compile time; you can change
it by editing the Makefile or running ./configure --datadir=...
index string
Specifies the filename for the index file. The path is resolved in the same way as described above
for the "data" option.
index_suffix string
This is an optional index file that makes the 'suffix' search strategy faster (binary search). It is
generated by 'dictfmt_index2suffix'. Run "dictfmt_index2suffix --help" for more
information. The path is resolved in the same way as described above for the "data" option.
index_word string
This is an optional index file that makes the 'word' search strategy faster (binary search). It is
generated by 'dictfmt_index2word'. Run "dictfmt_index2word --help" for more information.
The path is resolved in the same way as described above for the "data" option.
prefilter string
Specifies the prefilter command. When a chunk of the compressed database is read, it
will be filtered with this filter before being decompressed. This may be used to
provide some additional compression that knows about the data and can provide better
compression than the LZ77 algorithm used by zlib.
postfilter string
Specifies the postfilter command. When a chunk of the compressed database is read, it will
be filtered with this filter before the offset and length for the entry are used to access
data. This is provided for symmetry with the prefilter command, and may also be useful for
providing additional database compression.
filter string
Specifies the filter command. After the entry is extracted from the database, it will be
filtered with this filter. This may be used to provide formatting for the entry (e.g., for
html).
name string
Specifies the short name of the database (e.g., "1913 Webster's"). If the string begins
with @, then it specifies the headword to look up in the dictionary to find the short name
of the database. The default is "@00-database-short", but this may be changed in the
defs.h file at compile time (DICT_SHORT_ENTRY_NAME).
info string
Specifies the information about database. If the string begins with @, then it specifies
the headword to look up in the dictionary to find information. The default is
"@00-database-info", but this may be changed in the defs.h file at compile time
(DICT_INFO_ENTRY_NAME).
invisible
Makes the dictionary invisible to clients, i.e. this dictionary will not be recognized or
shown by the DEFINE, MATCH, SHOW INFO, SHOW SERVER and SHOW DB commands. If some definitions or
matches are found in an invisible dictionary, the name of the upper visible virtual dictionary
is returned. Dictionaries '*' and '!' don't include invisible ones. NOTE: Invisible
dictionaries are completely inaccessible (and invisible) to the client unless they are
included in a virtual or MIME dictionary (see the database_virtual and database_mime
sections).
disable_strategy string
Disables the specified strategy for the database. This may be useful for slow dictionaries
(plugins) or for dictionaries included in virtual ones. For an example, see the file
examples/dictd_complex.conf.
default_strategy string
Specifies the strategy which will be used if the database is accessed using the strategy
'.', i.e. this directive sets the preferred search strategy per database. For
example, instead of the strategy lev, the strategy word may be preferred for databases that mainly
contain multiword phrases rather than single words.
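The path rule given above for the data keyword (and reused by the index keywords) can be sketched in Python as follows. The function name resolve and the datadir value /usr/share/dictd are assumptions made for the example; the real directory is whatever was chosen at compile time.

```python
def resolve(filename: str, datadir: str = "/usr/share/dictd") -> str:
    """Prepend datadir unless the name begins with '.' or '/'."""
    if filename.startswith((".", "/")):
        return filename                 # explicit relative or absolute path
    return datadir + "/" + filename     # bare name: looked up under datadir

print(resolve("wn.dict.dz"))            # /usr/share/dictd/wn.dict.dz
print(resolve("/srv/dict/wn.index"))    # /srv/dict/wn.index
print(resolve("./local.index"))         # ./local.index
```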
Virtual Database Specification
The virtual database specification describes the virtual database:
database_list string
Specifies a list of databases which are included into the virtual database. Database names
are in the string and are separated by commas.
name string
Specifies the short name of the database. See Database Specification for more information.
info string
Specifies the information about database. See Database Specification for more information.
invisible
Makes dictionary invisible to the clients. See Database Specification for more information.
disable_strategy string
Disables the specified strategy for database. See Database Specification for more information.
Plugin Specification
plugin string
Specifies a filename of the plugin.
data string
Specifies data for initializing plugin.
name string
Specifies the short name of the database. See Database Specification for more information.
info string
Specifies the information about database. See Database Specification for more information.
invisible
Makes dictionary invisible to the clients. See Database Specification for more
information.
disable_strategy string
Disables the specified strategy for database. See Database Specification for more
information.
default_strategy string
Sets the default search strategy for database. See Database Specification for more
information.
Mime Specification
dbname_nomime string
Specifies the real database name which is used in case OPTION MIME command was NOT received
from a client.
dbname_mime string
Specifies the real database name which is used in case OPTION MIME command WAS received
from a client. A necessary MIME header is set while creating a database. See dictfmt(1)
for option --mime-header.
name string
Specifies the short name of the database. See Database Specification for more information.
info string
Specifies the information about database. See Database Specification for more information.
invisible
Makes dictionary invisible to the clients. See Database Specification for more
information.
disable_strategy string
Disables the specified strategy for database. See Database Specification for more
information.
default_strategy string
Sets the default search strategy for database. See Database Specification for more
information.
include string
The text of the file "string" (usually a database specification) will be read as if it appeared at
this location in the configuration file. Nested includes are not permitted.
DETERMINATION OF ACCESS LEVEL
When a client connects, the global access specification is scanned, in order, until a specification
matches. If no access specification exists, all access is allowed (e.g., the action is the same as if
"allow *" was the only item in the specification). For each item, both the hostname and IP are checked.
For example, consider the following access specification:
allow 10.42.*
authonly *.edu
deny *
With this specification, all clients in the 10.42 network will be allowed access to unrestricted
databases; all clients from *.edu sites will be allowed to authenticate, but will be denied access to all
databases, even those which are otherwise unrestricted; and all other clients will have their connection
terminated immediately. The 10.42 network clients can send an AUTH command and gain access to restricted
databases. The *.edu clients must send an AUTH command to gain access to any databases, restricted or
unrestricted.
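The first-match scan described above can be sketched in Python using the example specification. This is an illustration, not dictd's code: the function name access_level is hypothetical, fnmatch stands in for the * and ? wildcards, and the result when a non-empty specification contains no matching item is an assumption, since the text only defines the case where no specification exists.

```python
import fnmatch

RULES = [("allow", "10.42.*"), ("authonly", "*.edu"), ("deny", "*")]

def access_level(hostname, ip, rules=RULES):
    """Return the action of the first item matching the hostname or the IP."""
    if not rules:
        return "allow"                  # no specification: same as "allow *"
    # hostname may be None when reverse DNS is not working
    names = [ip] + ([hostname.lower()] if hostname else [])
    for action, pattern in rules:
        if any(fnmatch.fnmatchcase(name, pattern) for name in names):
            return action
    return "deny"                       # assumption: not stated in the text

print(access_level("pc7.example.com", "10.42.1.2"))   # allow
print(access_level("cs.example.edu", "192.0.2.5"))    # authonly
print(access_level(None, "192.0.2.5"))                # deny
```

The last call shows why the order and the choice of patterns matter: with no host name available, only the IP address can match, so the *.edu item is skipped and the final deny applies.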
When the AUTH command is sent, the access list for each database is scanned, in order, just as the global
access list is scanned. However, after authentication, the client has an associated username. For
example, consider the following access specification:
user u1
deny *.com
user u2
allow *
If the client authenticated as u1, then the client will have access to this database, even if the client
comes from a *.com site. In contrast, if the client authenticated as u2, the client will only have
access if it does not come from a *.com site. In this case, the "user u2" is redundant, since that
client would also match "allow *".
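The per-database scan for an authenticated client can be sketched the same way, using the four-item example above. The function name db_access is hypothetical, fnmatch is a stand-in for the wildcard matching, and denying access when nothing matches is an assumption.

```python
import fnmatch

DB_RULES = [("user", "u1"), ("deny", "*.com"), ("user", "u2"), ("allow", "*")]

def db_access(username, hostname, ip, rules=DB_RULES):
    """Scan a database access list in order; the first matching item decides."""
    for kind, value in rules:
        if kind == "user":
            # a user item matches only a client authenticated under that name
            if username is not None and fnmatch.fnmatchcase(username, value):
                return True
        elif any(fnmatch.fnmatchcase(n, value) for n in (hostname, ip) if n):
            return kind == "allow"      # allow grants access, deny refuses it
    return False                        # assumption: no match means no access

print(db_access("u1", "shop.example.com", "192.0.2.1"))  # True
print(db_access("u2", "shop.example.com", "192.0.2.1"))  # False
print(db_access("u2", "lab.example.org", "192.0.2.1"))   # True
```

Removing the ("user", "u2") item does not change any of these results, which matches the remark that it is redundant.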
Warning: Checks are performed for domain names and for IP addresses. However, if reverse DNS for a
specific site is not working, it is possible that a domain name may not be available for checking. Make
sure that all denials use IP addresses. (And consider a future enhancement: if a domain name is not
available, should denials that depend on a domain name match anything? This is the more conservative
viewpoint, but it is not currently implemented.)
SEARCH ALGORITHMS
The DICT standard specifies a few search algorithms that must be implemented, and permits others to be
supported on a server-dependent basis. The following search strategies are supported by this server.
Note that all strategies are case insensitive. Most ignore non-alphanumeric, non-whitespace characters.
exact An exact match. This algorithm uses a binary search and is one of the fastest search algorithms
available.
lev The Levenshtein algorithm (string edit distance of one). This algorithm searches for all words
which are within an edit distance of one from the target word. An "edit" means an insertion,
deletion, or transposition. This is a rapid algorithm for correcting spelling errors, since many
spelling errors are within a Levenshtein distance of one from the original word.
prefix Prefix match. This algorithm also uses a binary search and is very fast.
nprefix
Like prefix, but returns only the specified range of matches. For example, if the prefix strategy
would return 1000 matches, you can skip the first 800 and retrieve only the next 100. The limits are
given in the query itself, as in 800#100#app, where 800 is the skip count, 100 is the number of
matches to return, and "app" is the search string. This strategy makes it possible to implement a
DICT client with fast autocompletion (although doing so is not trivial), as many standalone
dictionary programs do.
NOTE: If you access the dictionary "*" (or a virtual one) with the nprefix strategy, the range is
applied to each database in it separately, not globally to the matches found in all databases.
NOTE: If you access a non-English dictionary, the returned matches may not be (and usually will
not be) in alphabetical order.
re POSIX 1003.2 (modern) regular expression search. Modern regular expressions are the ones used by
egrep(1). These regular expressions allow predefined character classes (e.g., [[:alnum:]],
[[:alpha:]], [[:digit:]], and [[:xdigit:]] are useful for this application); use * to match a
sequence of 0 or more matches of the previous atom; use + to match a sequence of 1 or more matches
of the previous atom; use ? to match a sequence of 0 or 1 matches of the previous atom; use ^ to
match the beginning of a word; use $ to match the end of a word; and allow nested subexpressions
and alternation with () and |. For example, "(foo|bar)" matches all words that contain either
"foo" or "bar". To match these special characters literally, they must be quoted with two
backslashes (due to the quoting characteristics of the server). Warning: Regular expression
matches can take 10 to 300 times longer than substring matches. On a busy server, with many
databases, this can require more than 5 minutes of waiting time, depending on the complexity of
the regular expression.
regexp Old (basic) regular expressions. These regular expressions don't support |, +, or ?. Groups use
escaped parentheses. While modern regular expressions are generally easier to use, basic regular
expressions have a back reference feature. This can be used to match a second occurrence of
something that was already matched. For example, the following expression finds all words that
begin and end with the same three letters:
^\\(...\\).*\\1$
Note the use of the double backslashes to escape the special characters. This is required by the
DICT protocol string specification (a single backslash quotes the next character -- we use two to
get a single backslash through to the regular expression engine). Warning: The backtracking
required by back references makes these searches even slower than general regular expression
searches.
soundex
The Soundex algorithm, a classic algorithm for finding words that sound similar to each other.
The algorithm encodes each word using the first letter of the word and up to three digits. Since
the first letter is known, this search is relatively fast, and it is sometimes good for correcting
spelling errors when the Levenshtein algorithm doesn't help.
substring
Match a substring anywhere in the headword. This search strategy uses a modified Boyer-Moore-
Horspool algorithm. Since it must search the whole index file, it is not as fast as the exact and
prefix matches.
suffix Suffix match. This search strategy also uses a modified Boyer-Moore-Horspool algorithm, and is as
fast as the substring search. If the optional index_suffix string file is listed in the
configuration file this search is much faster.
word Match any single word, even if part of a multi-word entry. If the optional index_word string file
is listed in the configuration file this search strategy works much faster.
first Match the first word that begins a multi-word entry.
last Match the last word that ends a multi-word entry. If the optional index_suffix string file is
listed in the configuration file this search strategy works much faster.
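As an illustration of the nprefix query format described above (skip#count#word), the following sketch parses such a query and applies it to a sorted list of headwords. The function names parse_nprefix and nprefix and the in-memory list are hypothetical, and the sketch ignores the case folding and character filtering that the server performs, so it should be read as a model of the range semantics only.

```python
from bisect import bisect_left


def parse_nprefix(query):
    """Split 'skip#count#word' into (skip, count, word)."""
    skip, count, word = query.split("#", 2)
    return int(skip), int(count), word


def nprefix(sorted_headwords, query):
    """Return the requested slice of the prefix matches."""
    skip, count, prefix = parse_nprefix(query)
    # Binary search for the first headword that is >= the prefix.
    start = bisect_left(sorted_headwords, prefix)
    matches = []
    for word in sorted_headwords[start:]:
        if not word.startswith(prefix):
            break  # sorted input: no later headword can match
        matches.append(word)
    return matches[skip:skip + count]


words = sorted(["apple", "application", "apply", "apt", "banana"])
page = nprefix(words, "1#2#app")
```

Here the query 1#2#app skips the first match for "app" and returns the next two, so a client can page through a long match list without fetching all of it.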
DATABASE FORMAT
Databases for dictd are distributed separately. A database consists of two files. One is a flat text
file, the other is the index.
The flat text file contains dictionary entries (or any other suitable data), and the index contains tab-
delimited tuples consisting of the headword, the byte offset at which this entry begins in the flat text
file, and the length of the entry in bytes. The offset and length are encoded using base 64 encoding
using the 64-character subset of International Alphabet IA5 discussed in RFC 1421 (printable encoding)
and RFC 1522 (base64 MIME). Encoding the offsets in base 64 saves considerable space when compared with
the usual base 10 encoding, while still permitting tab characters (ASCII 9) to be used for delimiting
fields in a record. Each record ends with a newline (ASCII 10), so the index file is human readable.
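A record in the index file could be decoded roughly as follows. The sketch assumes that each number is written as a string of base-64 digits, most significant digit first, using the alphabet A-Z, a-z, 0-9, +, / (this differs from byte-oriented MIME base64, and is an assumption about how the text above is to be read). The names b64_to_int and parse_index_line are hypothetical.

```python
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_to_int(text):
    """Interpret text as a base-64 number, most significant digit first."""
    value = 0
    for ch in text:
        value = value * 64 + B64_ALPHABET.index(ch)
    return value


def parse_index_line(line):
    """Return (headword, offset, length) for one tab-delimited index record."""
    fields = line.rstrip("\n").split("\t")
    headword, offset, length = fields[0], fields[1], fields[2]
    return headword, b64_to_int(offset), b64_to_int(length)


# A made-up record: the entry starts at byte 64 and is 46 bytes long.
entry = parse_index_line("apple\tBA\tu\n")
```

The offset and length returned this way identify the byte range of the entry in the uncompressed flat text file.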
Some headwords have a special meaning to dictd:
00-database-info Contains the information about the database that is returned by the SHOW INFO command,
unless it is specified in the configuration file.
00-database-short Contains the short name of the database that is returned by the SHOW DB command, unless it
is specified in the configuration file. See dictfmt -s.
00-database-url URL from which the original dictionary sources were obtained. See dictfmt -u. This headword
is not used by dictd.
00-database-utf8 Present if the dictionary is encoded using UTF-8. See dictfmt --utf8.
00-database-8bit-new Present if the dictionary is encoded using an 8-bit character set (neither ASCII nor
UTF-8). See dictfmt --locale.
The flat text file may be compressed using gzip(1) (not recommended) or dictzip(1) (highly recommended).
Optimal speed will be obtained using an uncompressed file. However, the gzip compression algorithm works
very well on plain text, and can result in space savings typically between 60 and 80%. Using a file
compressed with gzip(1) is not recommended, however, because random access on the file can only be
accomplished by serially decompressing the whole file, a process which is prohibitively slow. dictzip(1)
uses the same compression algorithm and file format as does gzip(1), but provides a table that can be
used to randomly access compressed blocks in the file. The use of 50-64kB blocks for compression
typically degrades compression by less than 10%, while maintaining acceptable random access capabilities
for all data in the file. As an added benefit, files compressed with dictzip(1) can be decompressed with
gzip(1) or zcat(1). (Note: recompressing a dictzip'd file using, for example, znew(1) will destroy the
random access characteristics of the file. Always compress data files using dictzip(1).)
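The benefit of a block table can be illustrated with a small model. The sketch below is not the dictzip file format: it simply compresses fixed-size chunks independently with zlib and keeps them in a list, so that a byte range can be read by decompressing only the chunks that cover it. The chunk size and the names compress_chunks and read_range are assumptions chosen for the example.

```python
import zlib

CHUNK = 58315  # hypothetical size, within the 50-64kB range mentioned above


def compress_chunks(data, chunk=CHUNK):
    """Compress data in independent fixed-size chunks."""
    return [zlib.compress(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def read_range(chunks, offset, length, chunk=CHUNK):
    """Return data[offset:offset+length], decompressing only the needed chunks."""
    if length <= 0:
        return b""
    first = offset // chunk
    last = (offset + length - 1) // chunk
    buf = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk
    return buf[start:start + length]


data = bytes(range(256)) * 1000
chunks = compress_chunks(data, 1000)
piece = read_range(chunks, 1500, 1200, 1000)
```

A plain gzip stream offers no such entry points, which is why reading an entry near the end of a gzip'd file means decompressing everything before it.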
SIGNALS
SIGHUP causes dictd to reread its configuration file and reinitialize the databases.
SIGUSR1 causes dictd to unload the databases. dictd then returns status 420 (instead of 220). To load the
databases again, send the SIGHUP signal. Because database files are mmap(2)'ed, they cannot be updated
while dictd is running. So, to update the database files and reread the configuration file, first send
SIGUSR1 to dictd to unload the databases, then update the files, and finally send SIGHUP to load them
again.
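The update procedure above might be scripted along these lines. This is a minimal sketch: the helper names read_pid and update_databases and the replace_files callback are hypothetical, the pid file path is the one listed under FILES, and the sketch does not wait for dictd to finish unloading before replacing files (whether such a wait is needed is not stated here).

```python
import os
import signal


def read_pid(path="/var/run/dictd.pid"):
    """Read the daemon's pid from its pid file."""
    with open(path) as f:
        return int(f.read().strip())


def update_databases(pid, replace_files, send=os.kill):
    """Unload the databases, replace the files, then reload.

    replace_files is a caller-supplied function that installs the new
    database files. send defaults to os.kill and can be swapped out
    for a dry run.
    """
    send(pid, signal.SIGUSR1)  # unload databases; server answers 420 meanwhile
    replace_files()            # safe to replace files now that they are unmapped
    send(pid, signal.SIGHUP)   # reread configuration and load databases again
```

A typical call would be update_databases(read_pid(), install_new_files), where install_new_files is whatever routine copies the new .index and .dict files into place.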
COPYING
The main source files for the dictd server and the dictzip compression program were written by Rik Faith
(faith@dict.org) and are distributed under the terms of the GNU General Public License. If you need to
distribute under other terms, write to the author.
The main libraries used by these programs (zlib, regex, libmaa) are distributed under different terms, so
you may be able to use the libraries for applications which are incompatible with the GPL -- please see
the copyright notices and license information that come with the libraries for more information, and
consult with your attorney to resolve these issues.
BUGS
The regular expression searches do not ignore non-whitespace, non-alphanumeric characters as do the other
searches. In practice, this isn't much of a problem.
WARNINGS
Conformance of regular expressions (used by the 're' and 'regexp' search strategies) to ERE and BRE
depends on the library dictd is built with. Whether the 're' and 'regexp' strategies support UTF-8 also
depends on that library.
FILES
/etc/dictd/dictd.conf
dictd configuration file
/usr/sbin/dictd
dictd daemon itself
/var/run/dictd.pid
File for storing pid of dictd daemon
/usr/share/dictd
The default directory for dictd databases (.index and .dict[.dz] files)
SEE ALSO
examples/dictd*.conf, dictfmt(1), dict(1), dictzip(1), gunzip(1), zcat(1), webster(1), RFC 2229
29 March 2002 DICTD(8)