Ubuntu Manpage: dspamc - DSPAM Anti-Spam Agent (client)

NAME

       dspamc - DSPAM Anti-Spam Agent (client)

SYNOPSIS

       dspamc [--mode=[teft|toe|tum|notrain|unlearn]] [--user user1 user2 ... userN]
       [--feature=[ch,no,wh,tb=N,sb]] [--class=[spam|innocent]]
       [--source=[error|corpus|inoculation] ] [--profile=[PROFILE] ] --deliver=[spam,innocent] ]
       [--help ] [--process ] [--classify ] [--signature=[signature] ] [--stdout] [--debug]
       [--daemon] [--client] [--rcpt-to] [--mail-from] [ delivery_arguments ]

DESCRIPTION

       The  DSPAM  agent  provides  a  direct  interface  to  mail  servers for command-line spam
       filtering. The agent can masquerade as the mail server's local  delivery  agent  and  will
       process  any  email  passed  to  it.  The agent will then call whatever delivery agent was
       specified at compile time or quarantine/tag/drop messages identified as  spam.  The  DSPAM
       agent  can  function  locally  or  as  a  proxy.  It  is  also  responsible for processing
       classification errors so that DSPAM can learn from its mistakes.   This  version  (dspamc)
       uses a connection to a dspam server rather than re-create contexts on each execution.

OPTIONS

--user user1 user2 ... userNSpecifies the destination users of the incoming message. In
most cases this is
the local user on the system, however some implementations may call for virtual
usernames, specific to DSPAM, to be assigned. The agent processes an incoming
message once for each user specified. If the message is to be delivered, the $u (or
%u) parameters of the argument string will be interpolated for the current user
being processed.

--mode=[toe|tum|teft|notrain]Configures the training mode to be used for this process,
overriding any
defaults in dspam.conf:

teft : Train-Everything. Trains on all messages processed. This is a very
thorough training approach and should be considered the standard training approach
for most users. TEFT may, however, prove too volatile on installations with
extremely high per-user traffic, or prove not very scalable on systems with
extremely large user-bases. In the event that TEFT is proving ineffective, one of
the other modes is recommended.

toe : Train-on-Error. Trains only on a classification error, once the user's
metadata has matured to 2500 innocent messages. This training mode is much less
resource intensive, as only occasional metadata writes are necessary. It is also
far less volatile than the TEFT mode of training. One drawback, however, is that
TOE only learns when DSPAM has made a mistake - which means the data is sometimes
too static, and unable to "ease into" a different type of behavior.

tum : Train-until-Mature. This training mode is a hybrid between the other two
training modes and provides a great balance between volatility and static metadata.
TuM will train on a per-token basis only tokens which have had fewer than 25 "hits"
on them, unless an error is being retrained in which case all tokens are trained.
This training mode provides a solid core of stable tokens to keep accuracy
consistent, but also allows for dynamic adaptation to any new types of email
behavior a user might be experiencing.

notrain : No training. Do not train the user's data, and do not keep totals. This
should only be used in cases where you want to process mail for a particular user
(based on a group, for example), but don't want the user to accumulate any learning
data.

unlearn : Unlearn original training. Use this if you wish to unlearn a previously
learned message. Be sure to specify --source=error and --class to whatever the
original classification the message was learned under. If not using TrainPristine,
this will require the original signature from training.

--feature=[chained,noise,tb=N,whitelist]Specifies the features that should be activated
for this filter instance. The following features may be used individually or combined
using a comma as a delimiter:

chained : Chained Tokens (also known as biGrams). Chained Tokens combines adjacent
tokens, presently with a window size of 2, to form token "chains". Chained tokens
uses additional storage resources, but greatly improves accuracy. Recommended as a
default feature.

noise : Bayesian Noise Reduction (BNR). Bayesian Noise Reduction kicks in at 2500
innocent messages and provides an advanced progressive noise logic to reduce
Bayesian Noise (wordlist attacks) in spams. See http://bnr.nuclearelephant.com for
more information.

tb=N : Sets the training loop buffering level. Training loop buffering is the
amount of statistical sedation performed to water down statistics and avoid false
positives during the user's training loop. The training buffer sets the buffer
sensitivity, and should be a number between 0 (no buffering whatsoever) to 10
(heavy buffering). The default is 5, half of what previous versions of DSPAM used.
To avoid dulling down statistics at all during the training loop, set this to 0.

whitelist : Automatic whitelisting. DSPAM will keep track of the entire "From:"
line for each message received per user, and automatically whitelist messages from
senders with more than 20 innocent messages and zero spams. Once the user reports
a spam from the sender, automatic whitelisting will automatically be deactivated
for that sender. Since DSPAM uses the entire "From:" line, and not just the
sender's email address, automatic whitelisting is a very safe approach to improving
accuracy especially during initial training.

sbph : Sparse Binary Polynomial Hashing. Bill Yerazunis' tokenizer method from
CRM114. Tokenizer method only - works with existing combination algorithms.

--class=[spam|innocent]Identifies the disposition (if any) of the message being presented.
This flag
should be used when a misclassification has occured, when the user is
corpus-feeding a message, or when an inoculation is being presented. This flag
should not be used for standard processing. This flag must be used in conjunction
with the --source flag. Omitting this flag causes DSPAM to determine the
disposition of the message on its own (the standard operating mode).

--source=[error|corpus|inoculation]Where
--class is used, the source of the classification must also be provided. The source
tells dspam how to learn the message being presented:

error : The message being presented was a message previously misclassified by
DSPAM. When 'error' is provided as a source, DSPAM requires that the DSPAM
signature be present in the message, and will use the signature to recall the
original training metadata. If the signature is not present, the message will be
rejected. In this source mode, DSPAM will also decrement each token's previous
classification's count as well as the user totals.

You should use error only when DSPAM has made an error in classifying the message,
and should present the modified version of the message with the DSPAM signature
when doing so.

corpus : The message being presented is from a mail corpus, and should be trained
as a new message, rather than re-trained based on a signature. The message's full
headers and body will be analyzed and the correct classification will be
incremented, without its opposite being decremented.

You should use corpus only when feeding messages in from corpus.

inoculation : The message being presented is in pristine form, and should be
trained as an inoculation. Inoculations are a more intense mode of training
designed to cause DSPAM to train the user's metadata repeatedly on previoulsy
unknown tokens, in an attepmt to vaccinate the user from future messages similar to
the one being presented. You should use inoculation only on honeypots and the
like.

--profile=[PROFILE]Specify a storage profile from dspam.conf. The storage profile selected
will be used for all database connectivity. See dspam.conf for more information.

--deliver=[innocent,spam]Tells
DSPAM to deliver the message if its result falls within the criteria specified. For
example, --deliver=innocent will cause DSPAM to only deliver the message if its
classification has been determined as innocent. Providing --deliver=innocent,spam
will cause DSPAM to deliver the message regardless of its classification. This flag
provides a significant amount of flexibility for nonstandard implementations.

--stdout If the message is indeed deemed "deliverable" by the
--deliver flag, this flag will cause DSPAM to deliver the message to stdout, rather
than the configured delivery agent.

--process Tells
DSPAM to process the message. This is the default behavior, and the flag is implied
unless --classify is used.

--classifyTells
DSPAM to only classify the message, and not perform any writes to the user's data
or attempt to deliver/quarantine the message. The results of a classification are
printed to stdout in the following format:

X-DSPAM-Result: User; result="Spam"; probability=1.0000; confidence=0.80

NOTE : The output of the classification is specific to a user's own data, and does
not include the output of any groups they might be affiliated with, so it is
entirely possible that the message would be caught as spam by a group the user
belongs to, and appear as innocent in the output of a classification. To get the
classification for the group , use the group name as the user instead of an
individual.

--signature=[signature]
If only the signature is available for training, and not the entire message, the
--signature flag may be used to feed the signature into DSPAM and forego the
reading of stdin. DSPAM will process the signature with whatever commandline
classification was specified. NOTE: This should only be used with --source=error

--debugIf
DSPAM was compiled with --enable-debug then using --debug will turn on debugging
messages to /tmp/dspam.debug.

--daemonIf
DSPAM was compiled with --enable-daemon then using --daemon will cause DSPAM to
enter daemon mode, where it will listen for DSPAM clients to connect and actively
service requests.

--clientIf
DSPAM was compiled with --enable-daemon then using --client will cause DSPAM to act
as a client and attempt to connect to the DSPAM server specified in the client's
configuration within dspam.conf. If client behavior is desired, this option must be
specified, otherwise the agent simply operate as self-contained and processes the
message on its own, eliminating any benefit of using the daemon.

--rcpt-toIf
DSPAM will be configured to deliver via LMTP or SMTP, this flag may be used to
define the RCPT TOs which will be used for the delivery of each user specified with
--user. If no recipients are provided, the RCPT TOs will match the username. NOTE:
The recipient list should always be balanced with the user list, or empty.
Specifying an unbalanced number of recipients to users will result in undefined
behavior.

--mail-fromIf
DSPAM will be cofigured to deliver via LMTP or SMTP, this flag will set the MAIL
FROM sent on delivery of the message. The default MAIL FROM depends on how the
message was originally relayed to DSPAM. If it was relayed via the commandline, an
empty MAIL FROM will be used. If it was relayed via LMTP, the original MAIL FROM
will be used.

EXIT VALUE

       0      Operation was successful.
       other  Operation  resulted  in  an  error.  If  the error involved an error in calling the
              delivery agent, the exit value of the delivery agent will be returned.

AUTHORS