Provided by: bmf_0.9.4-9build1_amd64

NAME

       bmf - efficient Bayesian mail filter

SYNOPSIS

       bmf [-t] [-n] [-s] [-N] [-S] [-f fmt] [-d db] [-i file] [-k n] [-m type] [-p]
           [-v] [-V] [-h]

DESCRIPTION

       bmf  is  a Bayesian mail filter. In its normal mode of operation, it takes an email message or other text
       on standard input, does a statistical check against lists of "good" and "spam" words, registers  the  new
       data,  and returns a status code indicating whether or not the message is spam. BMF is written with fast,
       zero-copy algorithms, coded directly in C, and tuned for speed. It aims to be faster, smaller,  and  more
       versatile than similar applications.

       bmf  supports both mbox and maildir mail storage formats. It will automatically process multiple messages
       within an mbox file separately.

OPTIONS

       Without command-line options, bmf processes the input, registers it  as  either  "good"  or  "spam",  and
       returns  the  appropriate  error  code.  The  wordlist directory and nonexistent wordfiles are created if
       absent.

       -t Test to see if the input is spam. The word lists are not updated. A report is written to stdout
       showing the final score and the tokens with the highest deviation from a mean of 0.5.

       -n Register the input as non-spam.

       -s Register the input as spam.

       -N Register the input as non-spam and undo a prior registration as spam.

       -S Register the input as spam and undo a prior registration as non-spam.

       -f  fmt  Specify database format. Valid formats are text, db, and mysql. Text is always valid. The others
       may not be available if the corresponding option was not enabled at compile time. The default  is  db  if
       available, else text.

       -d  db  Specify  database  or  directory for loading and saving word lists. The default is ~/.bmf in text
       mode.

       -i file Use file for input instead of stdin.

       -k n Specify the number of extrema (keepers) to use in the Bayes calculation. The default is 15.

       -m type Specify mail storage format. Valid types are mbox and maildir. The default is to automatically
       detect the mail storage format. This option is deprecated.

       -p Copy the input to the output (passthrough) and insert spam headers in the style of SpamAssassin. An
       X-Spam-Status header is always inserted with processing details. The contents of this header always
       begin with either "Yes" or "No". If the input is judged to be spam, the header "X-Spam-Flag: YES" is
       also inserted.

       -v Be more verbose. This option is not well supported yet.

       -V Display version information.

       -h Display usage information.
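
       The headers inserted by -p can be inspected by a downstream script. The following is a minimal
       sketch, not part of bmf: the function name spam_verdict is hypothetical, and the only facts it relies
       on are the ones stated for -p above (X-Spam-Status begins with "Yes" or "No", and "X-Spam-Flag: YES"
       is added for spam). The format of the remaining processing details is not assumed.

```python
from email import message_from_string


def spam_verdict(raw_message):
    """Return True (spam), False (non-spam), or None if no bmf headers are found.

    Hypothetical helper: it reads only the leading Yes/No of X-Spam-Status
    and the X-Spam-Flag header described for the -p option.
    """
    msg = message_from_string(raw_message)

    # "X-Spam-Flag: YES" is inserted only when the input is judged to be spam.
    flag = msg.get("X-Spam-Flag")
    if flag is not None and flag.strip().upper() == "YES":
        return True

    # X-Spam-Status is always inserted and begins with "Yes" or "No".
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    return status.strip().lower().startswith("yes")
```

       A value of None means the message apparently did not pass through bmf -p, which a caller may want to
       treat differently from a "No" verdict.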

THEORY OF OPERATION

       bmf treats its input as a bag of tokens. Each token is checked against "good" and "bad" wordlists,  which
       maintain  counts  of  the  numbers of times it has occurred in non-spam and spam mails. These numbers are
       used to compute the probability that a mail in which the token occurs is spam.  After  probabilities  for
       all  input  tokens  have  been  computed,  a fixed number of the probabilities that deviate furthest from
       average are combined using Bayes's theorem on conditional probabilities.
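
       The calculation described above can be sketched as follows. This is an illustration based on the
       formulas in Paul Graham's essay, simplified, and not a transcription of bmf's C code. The function
       names, the 0.4 score for unseen tokens, the clamping to [0.01, 0.99], and the use of distinct tokens
       are all assumptions made for the example.

```python
from math import prod

UNKNOWN_TOKEN_PROB = 0.4  # assumed score for tokens in neither wordlist


def token_spam_probability(good_count, spam_count, n_good_msgs, n_spam_msgs):
    """Estimate the probability that a mail containing this token is spam."""
    if good_count + spam_count == 0:
        return UNKNOWN_TOKEN_PROB
    good_freq = min(1.0, good_count / max(1, n_good_msgs))
    spam_freq = min(1.0, spam_count / max(1, n_spam_msgs))
    p = spam_freq / (good_freq + spam_freq)
    # Clamp so that a single token can never force a verdict on its own.
    return max(0.01, min(0.99, p))


def select_extrema(probs, keepers=15):
    """Keep the probabilities that deviate furthest from 0.5 (cf. option -k)."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:keepers]


def combine(probs):
    """Combine per-token probabilities with the naive Bayes product rule."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def score_message(tokens, good, spam, n_good_msgs, n_spam_msgs, keepers=15):
    """Score a message given two dicts mapping token -> occurrence count."""
    distinct = dict.fromkeys(tokens)  # assumption: each token counted once
    probs = [
        token_spam_probability(good.get(t, 0), spam.get(t, 0),
                               n_good_msgs, n_spam_msgs)
        for t in distinct
    ]
    return combine(select_extrema(probs, keepers))
```

       A score near 1 indicates spam and a score near 0 indicates non-spam. The threshold that bmf applies
       to the final score is not stated here, so the sketch leaves it to the caller.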

       While this method sounds crude compared to the more usual pattern-matching approach, it turns out to be
       extremely effective. Paul Graham's paper "A Plan for Spam" (http://www.paulgraham.com/spam.html) is
       recommended reading.

       bmf improves on Graham's proposal by doing smarter lexical analysis. In particular, hostnames and IP
       addresses are retained, while certain types of MTA information (such as message IDs and dates) are
       discarded.

       MIME and other attachments are not decoded. Experience from watching the token streams suggests that spam
       with  enclosures  invariably  gives  itself  away  through  cues  in the headers and non-enclosure parts.
       Nonetheless, I would like to add the ability to decode quoted-printable and perhaps base64 encodings  for
       textual attachments.

INTEGRATION WITH OTHER TOOLS

       Please see /usr/share/doc/bmf/README.gz for samples and suggestions.

RETURN VALUES

       In passthrough mode: zero for success, nonzero for failure.

       In non-passthrough mode: 0 for spam; 1 for non-spam; 2 for I/O or other errors.
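
       Note that in non-passthrough mode a zero status means spam, which is the reverse of the usual
       "zero is success" reading. A calling script might map the status explicitly, as in the sketch below.
       The names interpret_status and classify are hypothetical, and the sketch assumes that bmf is on
       PATH and that -t reports its verdict through the exit status listed above.

```python
import subprocess

# Exit statuses documented for non-passthrough mode.
VERDICTS = {0: "spam", 1: "non-spam"}


def interpret_status(returncode):
    """Translate a non-passthrough exit status into a verdict string."""
    if returncode in VERDICTS:
        return VERDICTS[returncode]
    # 2 is documented as "I/O or other errors"; anything else is unexpected.
    raise RuntimeError(f"bmf failed with exit status {returncode}")


def classify(message_bytes, runner=subprocess.run):
    """Run `bmf -t` on one message and return "spam" or "non-spam".

    Assumption: -t (test only, word lists not updated) uses the same
    exit statuses as the other non-passthrough modes.
    """
    result = runner(
        ["bmf", "-t"],
        input=message_bytes,
        stdout=subprocess.DEVNULL,  # discard the report written by -t
    )
    return interpret_status(result.returncode)
```

       The runner parameter exists only so the function can be exercised without bmf installed.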

FILES

       ~/.bmf/goodlist.txt
              List of good tokens for text mode.

       ~/.bmf/spamlist.txt
              List of bad tokens for text mode.

       ~/.bmf/goodlist.db
              List of good tokens for libdb mode.

       ~/.bmf/spamlist.db
              List of bad tokens for libdb mode.

BUGS

       Only one instance of bmf(1) can access the database at a time (see options -d and -f). In Procmail
       recipes, ensure sequential access with a lock file:

               :0 fw: bmf.lock
               | bmf -p
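
       Callers other than Procmail need their own serialization. One possible approach, sketched below
       for Unix systems, is an advisory flock(2) lock held around each bmf invocation. The helper names and
       the lock file path are hypothetical. This mechanism does not interoperate with Procmail lock files,
       so it only helps when every caller uses the same helper.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the enclosed block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def run_serialized(argv, message_bytes, lock_path, runner=subprocess.run):
    """Run one command (for example ["bmf", "-p"]) while holding the lock."""
    with exclusive_lock(lock_path):
        return runner(argv, input=message_bytes, stdout=subprocess.PIPE)
```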

       The lexer does not recognize multiline headers.

       The lexer does not recognize MIME attachments.

       Content-Transfer-Encoding is not decoded.

AUTHOR

       Tom Marshall <tommy@tig-grr.com>.

       The Bayes algorithm is from bogofilter by Eric S. Raymond <esr@thyrsus.com>. bogofilter can be  found  at
       the bogofilter project page: http://bogofilter.sourceforge.net/.

                                                                                                          BMF(1)