
Provided by: dspam_3.10.2+dfsg-13_amd64

NAME

       dspamc - DSPAM Anti-Spam Agent (client)

SYNOPSIS

       dspamc [--mode=[teft|toe|tum|notrain|unlearn]] [--user user1 user2 ... userN]
       [--feature=[ch,no,wh,tb=N,sb]] [--class=[spam|innocent]] [--source=[error|corpus|inoculation] ]
       [--profile=[PROFILE] ] [--deliver=[spam,innocent] ] [--help ] [--process ] [--classify ]
       [--signature=[signature] ] [--stdout] [--debug] [--daemon] [--client] [--rcpt-to] [--mail-from]
       [ delivery_arguments ]

DESCRIPTION

       The  DSPAM  agent  provides a direct interface to mail servers for command-line spam filtering. The agent
       can masquerade as the mail server's local delivery agent and will process any email  passed  to  it.  The
       agent  will  then  call  whatever  delivery  agent  was  specified at compile time or quarantine/tag/drop
       messages identified as spam. The DSPAM agent can function locally or as a proxy. It is  also  responsible
       for  processing  classification  errors so that DSPAM can learn from its mistakes.  This version (dspamc)
       uses a connection to a dspam server rather than re-create contexts on each execution.

OPTIONS

       --user user1 user2 ... userN
              Specifies the destination users of the incoming message. In most cases this is
              the  local  user  on  the  system,  however  some  implementations may call for virtual usernames,
              specific to DSPAM, to be assigned.  The agent processes an incoming message  once  for  each  user
              specified.  If  the  message  is to be delivered, the $u (or %u) parameters of the argument string
              will be interpolated for the current user being processed.

       --mode=[toe|tum|teft|notrain|unlearn]
              Configures the training mode to be used for this process, overriding any
              defaults in dspam.conf:

              teft : Train-Everything.  Trains on all messages processed.  This  is  a  very  thorough  training
              approach  and  should  be  considered  the  standard  training approach for most users.  TEFT may,
              however, prove too volatile on installations with extremely high per-user traffic,  or  prove  not
              very  scalable  on  systems  with  extremely  large user-bases.  In the event that TEFT is proving
              ineffective, one of the other modes is recommended.

              toe : Train-on-Error.  Trains only on a classification error, once the user's metadata has matured
              to 2500 innocent messages.  This training mode is much less resource intensive, as only occasional
              metadata writes are necessary.  It is also far less volatile than the TEFT mode of training.   One
              drawback, however, is that TOE only learns when DSPAM has made a mistake - which means the data is
              sometimes too static, and unable to "ease into" a different type of behavior.

              tum : Train-until-Mature.  This training mode is a hybrid between the other two training modes and
              provides a good balance between volatility and static metadata.  On a per-token basis, TuM
              trains only tokens which have had fewer than 25 "hits" on them, unless an error is being
              retrained, in which case all tokens are trained.  This training mode provides a solid core of stable tokens
              to keep accuracy consistent, but also allows for dynamic adaptation to  any  new  types  of  email
              behavior a user might be experiencing.

              notrain : No training.  Do not train the user's data, and do not keep totals.  This should only be
              used in cases where you want to process mail  for  a  particular  user  (based  on  a  group,  for
              example), but don't want the user to accumulate any learning data.

              unlearn : Unlearn original training. Use this if you wish to unlearn a previously learned message.
              Be sure to specify --source=error and set --class to the classification the message was
              originally learned under. If not using TrainPristine, this will require the original signature
              from training.
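
              The tum rule described above can be illustrated with a minimal Python sketch. The function
              name and the token_hits dictionary are hypothetical and do not reflect DSPAM's internal
              storage; only the threshold of 25 hits and the error-retraining exception come from this page.

```python
def tum_tokens_to_train(token_hits, retraining_error, hit_limit=25):
    """Return the tokens a train-until-mature pass would update.

    token_hits maps each token in the message to its current hit count
    (a hypothetical structure, not DSPAM's storage format).
    """
    if retraining_error:
        # Retraining an error updates every token, mature or not.
        return sorted(token_hits)
    # Otherwise only tokens with fewer than hit_limit hits are trained.
    return sorted(t for t, hits in token_hits.items() if hits < hit_limit)


hits = {"viagra": 40, "meeting": 3, "invoice": 25}
print(tum_tokens_to_train(hits, retraining_error=False))  # ['meeting']
print(tum_tokens_to_train(hits, retraining_error=True))
```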

       --feature=[chained,noise,tb=N,whitelist]
              Specifies the features that should be activated for this filter instance. The following
              features may be used individually or combined using a comma as a delimiter:

              chained  :  Chained  Tokens  (also  known  as  biGrams).  Chained Tokens combines adjacent tokens,
              presently with a window size of 2, to form token "chains".  Chained tokens uses additional storage
              resources, but greatly improves accuracy.  Recommended as a default feature.

              noise  :   Bayesian  Noise  Reduction  (BNR).   Bayesian Noise Reduction kicks in at 2500 innocent
              messages and provides an advanced progressive noise  logic  to  reduce  Bayesian  Noise  (wordlist
              attacks) in spams.  See http://bnr.nuclearelephant.com for more information.

              tb=N  :   Sets  the  training  loop  buffering  level.   Training  loop buffering is the amount of
              statistical sedation performed to water down statistics  and  avoid  false  positives  during  the
              user's  training  loop.   The  training buffer sets the buffer sensitivity, and should be a number
              between 0 (no buffering whatsoever) and 10 (heavy buffering).  The  default  is  5,  half  of  what
              previous  versions  of  DSPAM  used.   To avoid dulling down statistics at all during the training
              loop, set this to 0.

              whitelist :  Automatic whitelisting.  DSPAM will keep track of the entire "From:"  line  for  each
              message  received  per  user,  and automatically whitelist messages from senders with more than 20
              innocent messages and zero spams.  Once the  user  reports  a  spam  from  the  sender,  automatic
              whitelisting is deactivated for that sender.  Since DSPAM uses the entire
              "From:" line, and not just the sender's email address,  automatic  whitelisting  is  a  very  safe
              approach to improving accuracy especially during initial training.

              sbph  :  Sparse Binary Polynomial Hashing. Bill Yerazunis' tokenizer method from CRM114. Tokenizer
              method only - works with existing combination algorithms.
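
              Two of the features above lend themselves to a short illustration. The Python sketch below
              pairs adjacent tokens with a window size of 2 and applies the automatic whitelisting
              threshold (more than 20 innocent messages, zero spams). The function names and the "+"
              joiner are assumptions made for the example, not DSPAM's actual token format.

```python
def chained_tokens(tokens):
    """Pair each token with its successor (window size 2)."""
    # The "+" joiner is an illustrative choice, not confirmed by this page.
    return [f"{a}+{b}" for a, b in zip(tokens, tokens[1:])]


def is_auto_whitelisted(innocent_count, spam_count):
    """True when a From: line has more than 20 innocents and zero spams."""
    return innocent_count > 20 and spam_count == 0


print(chained_tokens(["buy", "cheap", "pills"]))  # ['buy+cheap', 'cheap+pills']
print(is_auto_whitelisted(21, 0))  # True
```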

       --class=[spam|innocent]
              Identifies the disposition (if any) of the message being presented. This flag
              should be used when a misclassification has occurred, when the user is corpus-feeding a message, or
              when an inoculation is being presented. This flag should not be used for standard processing. This
              flag must be used in conjunction with the --source  flag.  Omitting  this  flag  causes  DSPAM  to
              determine the disposition of the message on its own (the standard operating mode).

       --source=[error|corpus|inoculation]
              Where --class is used, the source of the classification must also be provided. The source
              tells dspam how to learn the message being presented:

              error : The message being presented was a message previously misclassified by DSPAM.  When 'error'
              is  provided  as  a source, DSPAM requires that the DSPAM signature be present in the message, and
              will use the signature to recall the original training metadata.  If the signature is not present,
              the  message  will  be  rejected.   In  this  source  mode, DSPAM will also decrement each token's
              previous classification's count as well as the user totals.

              You should use error only when DSPAM has made an error in  classifying  the  message,  and  should
              present the modified version of the message with the DSPAM signature when doing so.

              corpus  :  The  message  being  presented  is  from  a mail corpus, and should be trained as a new
              message, rather than re-trained based on a signature.  The message's full headers and body will be
              analyzed  and  the  correct  classification  will  be  incremented,  without  its  opposite  being
              decremented.

              You should use corpus only when feeding messages in from a corpus.

              inoculation : The message being presented is in  pristine  form,  and  should  be  trained  as  an
              inoculation.   Inoculations  are  a more intense mode of training designed to cause DSPAM to train
              the user's metadata repeatedly on previously unknown tokens, in an attempt to vaccinate  the  user
              from  future  messages  similar  to  the  one being presented.  You should use inoculation only on
              honeypots and the like.
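
              As a rough illustration of how --class and --source travel together, the Python sketch
              below assembles an argument list for retraining a single user. The helper name is
              hypothetical; the flag spellings and accepted values are the ones listed on this page,
              and the message itself would be supplied on stdin.

```python
def retrain_argv(user, message_class, source):
    """Build a dspamc argument list for retraining one user."""
    if message_class not in ("spam", "innocent"):
        raise ValueError("class must be 'spam' or 'innocent'")
    if source not in ("error", "corpus", "inoculation"):
        raise ValueError("source must be 'error', 'corpus' or 'inoculation'")
    # --class is always paired with --source, as the text above requires.
    return ["dspamc", "--user", user,
            f"--class={message_class}", f"--source={source}"]


# The message would be fed on stdin, for example with
# subprocess.run(argv, input=raw_message_bytes).
print(" ".join(retrain_argv("alice", "spam", "error")))
# dspamc --user alice --class=spam --source=error
```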

       --profile=[PROFILE]
              Specify a storage profile from dspam.conf. The storage profile selected will be used for all
              database connectivity. See dspam.conf for more information.

       --deliver=[innocent,spam]
              Tells DSPAM to deliver the message if its result falls within the criteria specified. For example,
              --deliver=innocent will cause DSPAM to only deliver the message if  its  classification  has  been
              determined  as innocent. Providing --deliver=innocent,spam will cause DSPAM to deliver the message
              regardless of its classification. This flag provides  a  significant  amount  of  flexibility  for
              nonstandard implementations.
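
              The delivery criteria described above amount to a simple membership test, sketched here in
              Python. The function name is hypothetical, and treating the result case-insensitively is an
              assumption of the example rather than documented behavior.

```python
def should_deliver(result, deliver):
    """Decide whether a classified message matches the --deliver criteria.

    result  : the classification, e.g. "Spam" or "Innocent"
    deliver : the comma-separated value given to --deliver
    """
    criteria = {c.strip().lower() for c in deliver.split(",") if c.strip()}
    return result.lower() in criteria


print(should_deliver("Innocent", "innocent"))   # True
print(should_deliver("Spam", "innocent"))       # False
print(should_deliver("Spam", "innocent,spam"))  # True
```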

       --stdout
              If the message is deemed "deliverable" by the --deliver flag, this flag will cause DSPAM to
              deliver the message to stdout, rather than the configured delivery agent.

       --process
              Tells DSPAM to process the message. This is the default behavior, and the flag is implied unless
              --classify is used.

       --classify
              Tells DSPAM to only classify the message, and not perform any writes to the user's data or
              attempt to deliver/quarantine the message. The results of a classification are printed to
              stdout in the following format:

              X-DSPAM-Result: User; result="Spam"; probability=1.0000; confidence=0.80

              NOTE : The output of the classification is specific to a user's own data, and does not include the
              output of any groups they might be affiliated with, so it is entirely possible  that  the  message
              would  be caught as spam by a group the user belongs to, and appear as innocent in the output of a
              classification. To get the classification for the group, use the group name as the  user  instead
              of an individual.
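
              A script that consumes --classify output needs to split the result line into its fields.
              The following Python sketch parses the exact format shown above; the function name is
              hypothetical, and lines that deviate from that format are not handled.

```python
def parse_dspam_result(line):
    """Parse an X-DSPAM-Result line into a dictionary."""
    header, _, rest = line.partition(":")
    if header.strip() != "X-DSPAM-Result":
        raise ValueError("not an X-DSPAM-Result line")
    fields = [part.strip() for part in rest.split(";") if part.strip()]
    # The first field is the user; the rest are key=value pairs.
    parsed = {"user": fields[0]}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        parsed[key.strip()] = value.strip().strip('"')
    parsed["probability"] = float(parsed["probability"])
    parsed["confidence"] = float(parsed["confidence"])
    return parsed


sample = 'X-DSPAM-Result: User; result="Spam"; probability=1.0000; confidence=0.80'
print(parse_dspam_result(sample)["result"])  # Spam
```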

       --signature=[signature]
              If  only the signature is available for training, and not the entire message, the --signature flag
              may be used to feed the signature into DSPAM and forego the reading of stdin. DSPAM  will  process
              the  signature  with  whatever commandline classification was specified. NOTE: This should only be
              used with --source=error.

       --debug
              If DSPAM was compiled with --enable-debug then using --debug will turn on debugging messages to
              /tmp/dspam.debug.

       --daemon
              If DSPAM was compiled with --enable-daemon then using --daemon will cause DSPAM to enter daemon
              mode, where it will listen for DSPAM clients to connect and actively service requests.

       --client
              If DSPAM was compiled with --enable-daemon then using --client will cause DSPAM to act as a
              client and attempt to connect to the DSPAM server specified in the client's configuration within
              dspam.conf. If client behavior is desired, this option must be specified; otherwise the agent
              simply operates as a self-contained process and handles the message on its own, eliminating any
              benefit of using the daemon.

       --rcpt-to
              If DSPAM is configured to deliver via LMTP or SMTP, this flag may be used to define the RCPT TOs
              which  will  be  used  for  the  delivery of each user specified with --user. If no recipients are
              provided, the RCPT TOs will match the  username.   NOTE:  The  recipient  list  should  always  be
              balanced with the user list, or empty. Specifying an unbalanced number of recipients to users will
              result in undefined behavior.
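
              Because an unbalanced recipient list results in undefined behavior, a wrapper script may
              want to check the lists before invoking the agent. This Python sketch shows one way to do
              that; the function name is hypothetical, and the fallback to the username mirrors the
              default described above.

```python
def pair_recipients(users, rcpt_to=None):
    """Pair each --user with its RCPT TO, refusing unbalanced lists."""
    if not rcpt_to:
        # No recipients given: the RCPT TO matches the username.
        return [(user, user) for user in users]
    if len(rcpt_to) != len(users):
        raise ValueError("recipient list must be empty or match the user list")
    return list(zip(users, rcpt_to))


print(pair_recipients(["alice", "bob"]))
# [('alice', 'alice'), ('bob', 'bob')]
print(pair_recipients(["alice"], ["alice@example.org"]))
# [('alice', 'alice@example.org')]
```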

       --mail-from
              If DSPAM is configured to deliver via LMTP or SMTP, this flag will set the MAIL FROM sent on
              delivery  of  the message. The default MAIL FROM depends on how the message was originally relayed
              to DSPAM. If it was relayed via the commandline, an empty MAIL  FROM  will  be  used.  If  it  was
              relayed via LMTP, the original MAIL FROM will be used.

EXIT VALUE

       0      Operation was successful.
       other  Operation resulted in an error. If the error occurred while calling the delivery agent, the
              exit value of the delivery agent will be returned.
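
       A calling script can act on these exit values as in the Python sketch below. The helper name is
       hypothetical, and a Python one-liner stands in for dspamc so the example runs without DSPAM
       installed; in practice argv would be a dspamc command line.

```python
import subprocess
import sys


def run_agent(argv, message):
    """Run a dspamc-style command, feeding the message on stdin.

    Returns (ok, exit_code), where ok is True only for exit value 0.
    """
    completed = subprocess.run(argv, input=message)
    return completed.returncode == 0, completed.returncode


# Stand-in for dspamc: reads stdin, then exits with a non-zero value.
stand_in = [sys.executable, "-c", "import sys; sys.stdin.read(); sys.exit(3)"]
ok, code = run_agent(stand_in, b"Subject: test\n\n")
print(ok, code)  # False 3
```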

AUTHORS

       Jonathan A. Zdziarski

       For more information, see http://dspam.nuclearelephant.com.

SEE ALSO

       dspam_stats(1), dspam_corpus(1), dspam_clean(1), dspam_dump(1), dspam_merge(1)

Jonathan A. Zdziarski <jonathan@nuclearelephant.com>         Sep 29, 2004                              DSPAMC(1)