Provided by: qsf_1.2.7-1build3_amd64

NAME

       qsf - quick spam filter

SYNOPSIS

       Filtering:       qsf [-snrAtav] [-d DB] [-g DB]
                            [-L LVL] [-S SUBJ] [-H MARK] [-Q NUM]
                            [-X NUM]
       Training:        qsf -T SPAM NONSPAM [MAXROUNDS] [-d DB]
       Retraining:      qsf -[m|M] [-d DB] [-w WEIGHT] [-ayN]
       Database:        qsf -[p|D|R|O] [-d DB]
       Database merge:  qsf -E OTHERDB [-d DB]
       Allowlist query: qsf -e EMAIL [-m|-M|-t] [-d DB] [-g DB]
       Denylist query:  qsf -y -e EMAIL [-m -m|-M -M|-t] [-d DB] [-g DB]
       Help:            qsf -[h|V]

DESCRIPTION

       qsf  reads a single email on standard input, and by default outputs it on standard output.
       If the email is determined to be spam, an additional header ("X-Spam: YES") will be added,
       and optionally the subject line can have "[SPAM]" prepended to it.

       qsf is intended to be used in a procmail(1) recipe, in a ruleset such as this:

               :0 wf
               | qsf -ra

               :0 H:
               * X-Spam: YES
               $HOME/mail/spam

       For more examples, including sample procmail(1) recipes, see the EXAMPLES section below.

TRAINING

       Before  qsf  can  be used properly, it needs to be trained.  A good way to train qsf is to
       collect a copy of all your email into two folders - one for spam, and  one  for  non-spam.
       Once you have done this, you can use the training function, like this:

               qsf -aT spam-folder non-spam-folder

       This  will  generate a database that can be used by qsf to guess whether email received in
       the future is spam or not.  Note that this initial training run may take a long time,  but
       you should only need to do it once.

       To  mark  a  single  message  as spam, pipe it to qsf with the --mark-spam or -m ("mark as
       spam") option.  This will update the database accordingly and discard the email.

       To mark a single message as non-spam, pipe it to qsf with the --mark-nonspam or -M  ("mark
       as non-spam") option.  Again, this will discard the email.

       If  a  message has been mis-tagged, simply send it to qsf as the opposite type, i.e. if it
       has been mistakenly tagged as spam, pipe it into qsf --mark-nonspam --weight=2 to  add  it
       to the non-spam side of the database with double the usual weighting.

OPTIONS

       The qsf options are listed below.

       -d, --database [TYPE:]FILE
              Use  FILE as the spam/non-spam database.  The default is to use /var/lib/qsfdb and,
              if that is not available or is read-only, $HOME/.qsfdb.  This option  can  also  be
              useful  if  there  is  a  system-wide  database  but  you  do  not want to use it -
              specifying your own here will override the default.

              If you prefix the filename with a TYPE, of the form btree:$HOME/.qsfdb,  then  this
              will  specify  what kind of database FILE is, such as list, btree, gdbm, sqlite and
              so on.  Check the output of qsf -V to see which database  backends  are  available.
              The default is to auto-detect the type, or, if the file does not already exist, use
              list.  Note that TYPE is not case-sensitive.
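
              As a rough illustration of the [TYPE:]FILE form, the sketch below splits  such
              a  value  into its two parts.  The function name and the rule used to recognise
              a TYPE prefix (letters and digits before the first colon) are  assumptions  made
              for this example; qsf's own parsing may differ.

```python
import re


def parse_db_spec(spec):
    """Split '[TYPE:]FILE' into (type, file); type is None when absent.

    Hypothetical helper: the prefix rule below is an assumption, not qsf's code.
    """
    match = re.match(r"^([A-Za-z0-9]+):(.*)$", spec, re.DOTALL)
    if match:
        # TYPE is documented as not case-sensitive, so normalise it.
        return match.group(1).lower(), match.group(2)
    # No prefix: the type would be auto-detected (or "list" for a new file).
    return None, spec
```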

       -g, --global [TYPE:]FILE
              Use FILE as the default global database, instead of /var/lib/qsfdb.   If  you  also
              specify  a  database with -d, then this "global" database will be used in read-only
              mode in conjunction with the read-write database specified with -d.  The -g  option
              can  be  used a second time to specify a third database, which will also be used in
              read-only mode.  Again, the filename can optionally be prefixed with a  TYPE  which
              specifies the database type.

       -P, --plain-map FILE
              Maintain a mapping of all database tokens to their non-hashed counterparts in FILE,
              one token per line.  This can be useful if you want to be able to list the contents
              of  your database at a later date, for instance to get a list of email addresses in
              your allow-list.  Note that using this option may slow qsf down, and  only  entries
              written to the database while this option is active will be stored in FILE.

       -s, --subject
              Rewrite the Subject line of any email that turns out to be spam, adding "[SPAM]" to
              the start of the line.

       -S, --subject-marker SUBJECT
              Instead of adding "[SPAM]", add SUBJECT to the Subject line of any email that turns
              out to be spam.  Implies -s.

       -H, --header-marker MARK
              Instead of setting the X-Spam header to "YES", set it to MARK if email turns out to
              be spam.  This can be useful if your email client can only search all headers for a
              string,  rather than one particular header (so searching for "YES" might match more
              than just the output of qsf).

       -n, --no-header
              Do not add an X-Spam header to messages.

       -r, --add-rating
              Insert an additional header X-Spam-Rating which is a rating of the "spamminess"  of
              a message from 0 to 100; 90 and above are counted as spam, anything under 90 is not
              considered spam.  If combined with -t, then the rating (0-100) will be  output,  on
              its own, on standard output.

       -A, --asterisk
              Insert  an  additional  header  X-Spam-Level  which  will  contain between 0 and 20
              asterisks (*), depending on the spam rating.

       -t, --test
              Instead of passing the message out on standard output, output nothing, and  exit  0
              if the message is not spam, or exit 1 if the message is spam.  If combined with -r,
              then the spam rating will be output on standard output.
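
              A script can act on the exit status described above.  The following is a minimal
              sketch, assuming qsf is on the PATH; the function name is hypothetical, and
              treating any status other than 0 or 1 as an error is an assumption, since  other
              statuses are not described here.

```python
import subprocess


def is_spam(message, command=("qsf", "-t")):
    """Pipe one raw email (bytes) to `qsf -t` and interpret its exit status.

    Returns False for exit 0 (not spam) and True for exit 1 (spam).
    Any other status is treated as an error (an assumption of this sketch).
    """
    result = subprocess.run(list(command), input=message, stdout=subprocess.DEVNULL)
    if result.returncode == 0:
        return False
    if result.returncode == 1:
        return True
    raise RuntimeError(f"unexpected exit status {result.returncode}")
```

              The command parameter exists only so that options such as -a or -d DB  can  be
              appended, or a stand-in command substituted when testing.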

       -a, --allowlist
              Enable the allow-list.  This causes the email  addresses  given  in  the  message's
              "From:"  and  "Return-Path:"  headers  to  be checked against a list; if either one
              matches, then the message is always treated as non-spam,  regardless  of  what  the
              token  database  says.  When specified with a retraining flag, -a -m (mark as spam)
              will remove that address from the allow-list as well  as  marking  the  message  as
              spam,  and -a -M (mark as non-spam) will add that address to the allow-list as well
              as marking the message as non-spam.  The idea is that you add all of  your  friends
              to the allow-list, and then none of their messages ever get marked as spam.

       -y, --denylist
              Enable  the  deny-list.   This  causes  the  email addresses given in the message's
              "From:" and "Return-Path:" headers to be checked against a second list;  if  either
              one  matches,  then  the message is always treated as spam.  Training works in the
              same way as with -a, except that you must specify -m or  -M  twice  to  modify  the
              deny-list instead of the allow-list, and with the reverse syntax: -y -m -m (mark as
              spam) will add that address to the deny-list, whereas -y -M -M (mark  as  non-spam)
              will  remove that address from the deny-list.  This double specification is so that
              the usual retraining process never touches the deny-list; the deny-list  should  be
              carefully maintained rather than automatically generated.

              Normally you would not need to use the deny-list.

       -L, --level, --threshold LEVEL
              Change  the  spam  scoring threshold level which must be reached before an email is
              classified as spam.  The default is 90.
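
              The comparison is inclusive: a rating that reaches the threshold counts as spam.
              A minimal sketch of that rule (the names are hypothetical):

```python
DEFAULT_THRESHOLD = 90  # documented default for -L


def rating_is_spam(rating, threshold=DEFAULT_THRESHOLD):
    """Return True when a 0-100 rating reaches the threshold."""
    return rating >= threshold
```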

       -Q, --min-tokens NUM
              Only give a score if more than NUM tokens are found in the message - otherwise  the
              message  is assumed to be non-spam, and it is not modified in any way.  The default
              is 0.  This option might be useful if you find that very short messages  are  being
              frequently miscategorised.

       -e, --email, --email-only EMAIL
              Query  or  update  the allow-list entry for the email address EMAIL.  With no other
              options, this will simply output "YES" if EMAIL is in the allow-list, or "NO" if it
              is not. With -t, it will not output anything, but will exit 0 (success) if EMAIL is
              in the allow-list, or 1 (failure) if it is not. With the -m (mark-spam) option, any
              previous  allow-list  entry  for EMAIL will be removed. Finally, with the -M (mark-
              nonspam) option, EMAIL will be added to the allow-list if it is not already on it.

              If EMAIL is just the word MSG on its own, then an email will be read from  standard
              input, and the email addresses given in the "From:" and "Return-Path:" headers will
              be used.

              Using -e automatically switches on -a.

              If you also specify -y, then the deny-list will be operated on.  Remember  that  -m
              and -M are reversed with the deny-list.

              If  you  specify  an email address of the form @domain (nothing before the @), then
              the whole domain will be allow or deny listed.

       -v, --verbose
              Add extra X-QSF-Info headers to any filtered email, containing error  messages  and
              so on if applicable.  Specify -v more than once to increase verbosity.

       -T, --train SPAM NONSPAM [MAXROUNDS]
              Train  the  database  using  the two mbox folders SPAM and NONSPAM, by testing each
              message  in  each  folder  and  updating  the  database  each  time  a  message  is
              miscategorised.   This is done several times, and may take a while to run.  Specify
              the -a (allow-list) flag to add every sender in the NONSPAM folder to  your  allow-
              list as a side-effect of the training process.  If MAXROUNDS is specified, training
              will end after this number of rounds if the results are still not good enough.  The
              default is a maximum of 200 rounds.

       -m, --mark-spam
              Instead  of  passing  the message out on standard output, mark its contents as spam
              and update the database accordingly.   If  the  allow-list  (-a)  is  enabled,  the
              message's "From:" and "Return-Path:" addresses are removed from the allow-list.  If
              the deny-list (-y) is enabled and you specify -m twice, the message's addresses are
              added to the deny-list instead.

       -M, --mark-nonspam
              Instead  of  passing  the message out on standard output, mark its contents as non-
              spam and update the database accordingly.  If the allow-list (-a) is  enabled,  the
              message's "From:" and "Return-Path:" addresses are added to the allow-list (see the
              -a option above).  If the deny-list (-y) is enabled and you specify -M  twice,  the
              message's addresses are removed from the deny-list instead.

       -w, --weight WEIGHT
              When  marking  as  spam or non-spam, update the database with a weighting of WEIGHT
              per token instead of the default of 1.   Useful  when  correcting  mistakes,  e.g. a
              message  that  has  been  mistakenly  detected as spam should be marked as non-spam
              using a weighting of 2, i.e. double the usual weighting, to counteract the error.

       -D, --dump [FILE]
              Dump the contents of the database as a platform-independent text file, suitable for
              archival,  transfer to another machine, and so on.  The data is output on stdout or
              into the given FILE.

       -R, --restore [FILE]
              Rebuild the database from scratch from the text file on stdin.  If a FILE is given,
              data is read from there instead of from stdin.

       -O, --tokens
              Instead  of  filtering,  output a list of the tokens found in the message read from
              standard input, along with the number of times each token was found.  This is  only
              useful if you want to use qsf as a general tokeniser for use with another filtering
              package.

       -E, --merge OTHERDB
              Merge the OTHERDB database into the current database.  This can be  useful  if  you
              want  to  merge  one  user's  database  into  the  system-wide one: as root, run
              qsf -d /var/lib/qsfdb -E /home/user/.qsfdb and then remove /home/user/.qsfdb.

       -B, --benchmark SPAM NONSPAM [MAXROUNDS]
              Benchmark  the  training  process  using  the two mbox folders SPAM and NONSPAM.  A
              temporary database is created and trained using the first 75% of  the  messages  in
              each  folder, and then the entire contents of each folder is tested to see how many
              false positives  and  false  negatives  occur.  Some  timing  information  is  also
              displayed.

              This  can be used to decide which backend is best on your system.  Use -d to select
              a backend, e.g. qsf -B spam nonspam -d GDBM; this will create a temporary  database
              which is removed afterwards.

              The  exception  to  this  is the MySQL backend, where a full database specification
              must be given (-d MySQL:database=db;host=localhost;...)   and  the  database  table
              given will not be wiped beforehand or dropped afterwards.

              As  with  -T,  if MAXROUNDS is specified, training will never be done for more than
              this number of rounds; the default is 200.

       -h, --help
              Print a usage message on standard output and exit successfully.

       -V, --version
              Print version information, including a list  of  available  database  backends,  on
              standard output and exit successfully.

DEPRECATED OPTIONS

       The  following  options  are only for use with the old binary tree database backend or old
       databases that haven't been upgraded to the new format that came in with version 1.1.0.

       -N, --no-autoprune
              When marking as spam or nonspam, never automatically prune the  database.   Usually
              the database is pruned after every 500 marks; if you would rather --prune manually,
              use -N to disable automatic pruning.

       -p, --prune
              Remove redundant entries from the database and clean  it  up  a  little.   This  is
              automatically done after several calls to --mark-spam or --mark-nonspam, and during
              training with --train if the training takes a large number of rounds, so it  should
              rarely   be   necessary  to  use  --prune  manually  unless  you  are  using  -N  /
              --no-autoprune.

       -X, --prune-max NUM
              When the database is being pruned, no more than NUM entries will be considered  for
              removal.   This  is  to  prevent  CPU  and  memory resources being taken over.  The
              default is 100,000 but in some circumstances (if you find that  pruning  takes  too
              long) this option may be used to reduce it to a more manageable number.

FILES

       /var/lib/qsfdb
              The  default  (system-wide) spam database.  If you wish to install qsf system-wide,
              this should be read-only to everyone; there should be one user  with  write  access
              who  can update the spam database with qsf --mark-spam and qsf --mark-nonspam  when
              necessary.

       /var/lib/qsfdb2
              A second, read-only, system-wide database. This can be useful when  installing  qsf
              system-wide  and using third-party spam databases; the first global database can be
              updated with system-specific changes, and this second database can be  periodically
              updated when the third-party spam database is updated.

       $HOME/.qsfdb
              The  default  spam  database  for per-user data.  Users without write access to the
              system-wide database will have their data written here, and the two databases  will
              be read together.  The per-user database will be given a weighting equivalent to 10
              times the weighting of the global database.
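
              One way to picture the 10-times weighting is as a weighted sum of the  per-token
              counts  from  the two databases.  This is only a sketch under that assumption; how
              qsf actually combines the databases is not described here, and  the  names  below
              are hypothetical.

```python
USER_WEIGHT = 10  # per-user data counts ten times as much as global data


def combined_counts(global_counts, user_counts, user_weight=USER_WEIGHT):
    """Combine (spam, nonspam) counts for one token from both databases.

    Assumption: the weighting is applied as a simple multiplier on the
    per-user counts before adding them to the global counts.
    """
    g_spam, g_nonspam = global_counts
    u_spam, u_nonspam = user_counts
    return (g_spam + user_weight * u_spam, g_nonspam + user_weight * u_nonspam)
```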

NOTES

       Currently, you cannot use qsf to check for spam while the database is being updated.  This
       means that while an update is in progress, all email is passed through as non-spam.

       There is an upper size limit of 512KB on incoming email; anything larger than this is just
       passed through as non-spam, to avoid tying up machine resources.

       The plaintext token mapping maintained by --plain-map will never shrink, only grow.  It is
       intended  for  use by housekeeping and user interface scripts that, for instance, the user
       can use to list all email addresses on their allow-list.  These scripts should  take  care
       of weeding out entries for tokens that are no longer in the database.  If you have no such
       scripts, there is probably no point in using --plain-map anyway.

       Avoid using the deny-list (-y) in any automated retraining, as it  can  cause  the  filter
       to  reject  mail  unnecessarily.   In  general  the deny-list is probably best left unused
       unless explicitly required by your particular setup.

       If both the allow-list and the deny-list are enabled, then email addresses will  first  be
       checked  against  the deny-list, then the allow-list, then the domain of the email address
       will be checked for matching "@domain" entries in the deny-list and  then  in  the  allow-
       list.
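
       The lookup order just described can be sketched as follows.  Plain strings in Python
       sets stand in for the real (hashed) list entries, and the function name is
       hypothetical; this illustrates the ordering only.

```python
def list_verdict(address, allow_list, deny_list):
    """Apply the documented order: deny, allow, then @domain deny, @domain allow.

    Returns "spam", "nonspam", or None when no list entry matches.
    """
    domain = "@" + address.rsplit("@", 1)[-1]
    if address in deny_list:
        return "spam"
    if address in allow_list:
        return "nonspam"
    if domain in deny_list:
        return "spam"
    if domain in allow_list:
        return "nonspam"
    return None
```

       Note that a full-address entry in the allow-list is reached before  a  "@domain"
       entry in the deny-list, so a single sender can be allowed from a denied domain.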

EXAMPLES

       To  filter all of your mail through qsf, with the allow-list enabled and the "spam rating"
       header being added, add this to your .procmailrc file:

               :0 wf
               | qsf -ra

       If you want qsf to add "[SPAM]" to the subject line of any messages it thinks are spam, do
       this instead:

               :0 wf
               | qsf -sra

       To  automatically  mark  any  email  sent  to  spambox@yourdomain.com as spam (this is the
       "naive" version):

               :0 H
               * ^To:.*spambox@yourdomain.com
               | qsf -am

       To do the same, but cleverly, so that only email to spambox@yourdomain.com which qsf  does
       NOT  already classify as spam gets marked as spam in the database (this stops the database
       getting too heavily weighted):

               # If sent to spambox@yourdomain.com:
               :0
               * ^To:.*spambox@yourdomain.com
               {
                  :0 wf
                  | qsf -a

                  # The above two lines can be skipped if you've
                  # already piped the message through qsf.

                  # If the qsf database says it's not spam,
                  # mark it as spam!
                  :0 H
                  * ^X-Spam: NO
                  | qsf -am
               }

       Remove the -a option in the above examples if you don't want to use the allow-list.

       A more complicated filtering example - this will only run qsf on messages which don't have
       a  subject line saying "your <something> is on fire" and which don't have a sender address
       ending in "@foobar.com", meaning that messages with  that  subject  line  OR  that  sender
       address will NEVER be marked as spam, no matter what:

               :0 wf
               * ! ^Subject: Your .* is on fire
               * ! ^From: .*@foobar.com
               | qsf -ra

       For more on procmail(1) recipes, see the procmailrc(5) and procmailex(5) manual pages.

       A couple of macros to add to your .muttrc file, if you use mutt(1) as a mail user agent:

               # Press F5 to mark a message as spam and delete it
               macro index <f5> "<pipe-message>qsf -am\n<delete-message>"
               macro pager <f5> "<pipe-message>qsf -am\n<delete-message>"

               # Press F9 to mark a message as non-spam
               macro index <f9> "<pipe-message>qsf -aM\n"
               macro pager <f9> "<pipe-message>qsf -aM\n"

       Again, remove the -a option in the above examples if you don't want to use the allow-list.

       Note,  however,  that  the  above  macros  won't  work  when  operating on multiple tagged
       messages. For that, you'd need something like this:

               macro index <f5> ":set pipe_split\n<tag-prefix><pipe-message>qsf -am\n<tag-prefix><delete-message>\n:unset pipe_split\n"

       If  you  use  qmail(7),  then  to get procmail working with it you will need to put a line
       containing just DEFAULT=./Maildir/ at the top of your ~/.procmailrc file, so that procmail
       delivers to your Maildir folder instead of trying to deliver to /var/spool/mail/$USER, and
       you will need to put this in your ~/.qmail file:

               | preline procmail

       This will cause all your mail to be delivered via  procmail  instead  of  being  delivered
       directly into your mail directory.

       See the qmail(7) documentation for more about mail delivery with qmail.

       If you use postfix(1), you can set up a system-wide mail filter by creating a user account
       for the purpose of filtering mail, populating that account's .qsfdb, and then  creating  a
       shell  script,  to  run  as  that  user,  which  runs  qsf  on  stdin and passes stdout to
       sendmail(8).

       Doing this requires some knowledge of postfix configuration and care needs to be taken  to
       avoid  mail  loops.  One qsf user's full HOWTO is included in the doc/ directory with this
       package.

THE ALLOW-LIST

       A feature called the "allow-list" can be switched on by specifying the --allowlist  or  -a
       option.   This causes messages' "From:" and "Return-Path:" addresses to be checked against
       a list of people you have said to allow all messages from, and if a message's  "From:"  or
       "Return-Path:" address is in the list, it is never marked as spam.  This means you can add
       all your friends to an "allow-list" and qsf will then never mis-file their  messages  -  a
       quick  way  to  do this is to use -a with -T (train); everyone in your non-spam folder who
       has sent you an email will be added to the allow-list automatically during training.

       You can manually add and remove addresses to and from the allow-list using the -e  (email)
       option. For instance, to add foo@bar.com to the allow-list, do this:

               qsf -e foo@bar.com -M

       To remove bad@nasty.com from the allow-list, do this:

               qsf -e bad@nasty.com -m

       And to see whether someone@somewhere.com is in the allow-list or not, just do this:

               qsf -e someone@somewhere.com

       In  general,  you  probably always want to enable the allow-list, so always specify the -a
       option when using qsf.  This will automatically maintain the allow-list based on what  you
       classify as spam or non-spam.

       The  only times you might want to turn it off are when people on your allow-list are prone
       to getting viruses or if a virus is causing email to be sent to you that is pretending  to
       be from someone on your allow-list.

BACKUP AND RESTORE

       Because  the  database format is platform-specific, it is a good idea to periodically dump
       the database to a text file using qsf -D so that, if necessary, it can be  transferred  to
       another machine and restored with qsf -R later on.

       Also  note  that  since  the  actual  contents  of  email messages are never stored in the
       database (see TECHNICAL DETAILS), you can safely share your qsf database  with  friends  -
       simply dump your database to a file, like this:

               qsf -D > your-database-dump.txt

       Once you have sent your-database-dump.txt to another person, they can do this:

               qsf -R < your-database-dump.txt

       They will then have an identical database to yours.

TECHNICAL DETAILS

       When  a  message  is  passed  to  qsf,  any attachments are decoded, all HTML elements are
       removed, and the message text is then broken up into "tokens", where a "token" is a single
       word  or  URL.  Each token is hashed using the MD5 algorithm (see below for why), and that
       hash is then used to look up each token in the qsf database.
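
       As a rough illustration of the hashing step, the sketch below hashes each token with
       MD5.  Splitting on whitespace is a stand-in for qsf's real tokeniser, and the use of
       a hexadecimal digest as the lookup key is an assumption; the function names are
       hypothetical.

```python
import hashlib


def hash_token(token):
    """Return the MD5 digest of one token as a hex string."""
    return hashlib.md5(token.encode("utf-8")).hexdigest()


def hash_tokens(text):
    """Hash every whitespace-separated word (a simplified tokeniser)."""
    return [hash_token(word) for word in text.split()]
```

       Identical tokens always produce identical hashes, which is what allows the hash to
       serve as a database key without storing the token itself.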

       For full details of which parts of an email (headers, body, attachments, etc) are used  to
       calculate the spam rating, see the TOKENISATION section below.

       Within  the  database,  each token has two numbers associated with it: the number of times
       that token has been seen in spam, and the number of times it has been  seen  in  non-spam.
       These  two  numbers,  along  with the total number of spam and non-spam messages seen, are
       then used to give a "spamminess" value for that particular token.  This "spamminess" value
       ranges  from  "definitely  not  spammy"  at one end of the scale, through "neutral" in the
       middle, up to "definitely spammy" at the other end.

       Once a "spamminess" value has been calculated for all of the  tokens  in  the  message,  a
       summary calculation is made to give an overall "is this spam?"  probability rating for the
       message.  If the overall probability is 0.9 or above, the message is flagged as spam.
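
       The exact formulas are not given here.  The sketch below substitutes a simple
       relative-frequency ratio per token and a naive Bayes-style combination for the
       message, purely to show how two counters and two totals can yield a 0 to 1 value;
       the function names, the clamping to 0.01-0.99, and the formulas themselves are
       assumptions, not qsf's actual calculation.

```python
import math


def token_spamminess(spam_hits, nonspam_hits, total_spam, total_nonspam):
    """Estimate how spammy one token is, from 0.01 to 0.99 (0.5 = neutral)."""
    if spam_hits == 0 and nonspam_hits == 0:
        return 0.5  # never seen before: neutral
    spam_freq = spam_hits / max(total_spam, 1)
    nonspam_freq = nonspam_hits / max(total_nonspam, 1)
    ratio = spam_freq / (spam_freq + nonspam_freq)
    # Clamp so that no single token is treated as absolute proof.
    return min(0.99, max(0.01, ratio))


def message_probability(values):
    """Combine per-token values into one overall probability."""
    spam = math.prod(values)
    nonspam = math.prod(1.0 - value for value in values)
    return spam / (spam + nonspam)
```

       With values like these, a message would be flagged when message_probability()
       returns 0.9 or more.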

       In addition to the probability test is the "allow-list".  If enabled (with the -a option),
       the  whole  probability  check  is  skipped  if the sender of the message is listed in the
       allow-list, and the message is not marked as spam.

       When training the database, a message is split up into tokens as described above, and then
       the  numbers  in  the  database for each token are simply added to: if you tell qsf that a
       message is spam, it adds one to the "number of times seen in spam" counter for each token,
       and if you tell it a message is not spam, it adds one to the "number of times seen in non-
       spam" counter for each token.  If you specify a weight,  with  -w,  then  the  number  you
       specify is added instead of one.
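
       The counter update can be sketched with a dictionary that maps each token (in
       practice, its hash) to a pair of counters.  The data layout and names are
       hypothetical, and whether a token repeated within one message is counted once or
       several times is not stated here; this sketch counts every occurrence it is given.

```python
def train(database, tokens, is_spam, weight=1):
    """Add `weight` to the spam or non-spam counter of each token.

    `database` maps token -> [times seen in spam, times seen in non-spam].
    """
    index = 0 if is_spam else 1
    for token in tokens:
        counts = database.setdefault(token, [0, 0])
        counts[index] += weight
    return database
```

       Marking a mis-tagged message as non-spam with a weight of 2 therefore adds 2 to
       the non-spam counter of each of its tokens.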

       To  stop the database growing uncontrollably, the database keeps track of when a token was
       last used.  Underused tokens are automatically removed from the database.  (The old method
       was to "prune" every 500 updates).

       Finally,  the  reason  MD5  hashes  were  used  is privacy.  If the actual tokens from the
       messages, and the actual email addresses in the allow-list, were  stored,  you  could  not
       share  a  single  qsf  database between multiple users because bits of everyone's messages
       would be in the database - things like emailed passwords, keywords  relating  to  personal
       gossip,  and  so  on.  So a hash is stored instead.  A hash is a "one-way" function; it is
       easy to turn a token into a hash but very hard (some might say impossible) to turn a  hash
       back  into  the token that created it.  This means that you end up with a database with no
       personal information in it.

TOKENISATION

       When a message is broken up into tokens, various parts  of  the  message  are  treated  in
       different ways.

       First,  all header fields are discarded, except for the important ones: From, Return-Path,
       Sender, To, Reply-To, and Subject.

       Next, any MIME-encoded attachments are decoded.  Any attachments whose  MIME  type  starts
       with "text/" (i.e. HTML and text) are tokenised, after having any HTML tags stripped.  Any
       non-textual attachments are  replaced  with  their  MD5  hash  (such  that  two  identical
       attachments will have the same hash), and that hash is then used as a token.

       In  addition  to single-word tokens from textual message parts, qsf adds doubled-up tokens
       so that word pairs get added to the database.   This  makes  the  database  a  bit  bigger
       (although the automatic pruning tends to take care of that) but makes matching more exact.
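
       A minimal sketch of the doubled-up tokens, assuming that a pair is formed from each
       two adjacent words; joining the pair with a space is an assumption of this example,
       and the function name is hypothetical.

```python
def with_word_pairs(words):
    """Return the single-word tokens followed by adjacent word pairs."""
    pairs = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return list(words) + pairs
```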

SPECIAL FILTERS

       As  well  as  using  the  textual  content  of email to detect spam, qsf also uses special
       filters which create "pseudo-tokens" based on various rules.   This  means  that  specific
       patterns, not just individual words, can be used to determine whether a message is spam or
       not.

       For example, if a message contains lots of words with multiple consonants, like
       "ashjkbnxcsdjh", then each time a word like that is seen the special token
       ".GIBBERISH-CONSONANTS." is added to the list of tokens found in the message.  If it
       turns out that most messages with words that trigger this filter rule are spam, then
       other messages with gibberish consonant strings will be more likely to be flagged as
       spam.

       Currently the special filters are:

       GTUBE  Flags any message containing the string
              XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X
              as spam - useful for testing that your qsf installation is working.

       ATTACH-SCR

       ATTACH-PIF

       ATTACH-EXE

       ATTACH-VBS

       ATTACH-VBA

       ATTACH-LNK

       ATTACH-COM

       ATTACH-BAT
              Adds a token for every attachment whose filename ends in  ".scr",  ".pif",  ".exe",
              ".vbs", ".vba", ".lnk", ".com", and ".bat" respectively (these are often viruses).

       ATTACH-GIF

       ATTACH-JPG

       ATTACH-PNG
              Adds a token for every attachment whose filename ends in ".gif", ".jpg" or ".jpeg",
              and ".png" respectively.

       ATTACH-DOC

       ATTACH-XLS

       ATTACH-PDF
              Adds a token for every attachment whose filename ends in ".doc", ".xls", or  ".pdf"
              respectively (these tend to indicate a non-spam email).

       SINGLE-IMAGE
              Adds a token if the message contains exactly one attached image.

       MULTIPLE-IMAGES
              Adds a token if the message contains more than one attached image.

       GIBBERISH-CONSONANTS
              Adds  a  token  for  every  word  found  that  has multiple consonants in a row, as
              described above.  Spam often contains strings of gibberish.

       GIBBERISH-VOWELS
              Adds a token for every word found that has multiple vowels in a row, e.g.
              "aeaiaiaeeio".

       GIBBERISH-FROMCONS
              Like GIBBERISH-CONSONANTS, but only for the "From:" and "Return-Path:" addresses on
              their own.

       GIBBERISH-FROMVOWL
              Like GIBBERISH-VOWELS, but only for the "From:"  and  "Return-Path:"  addresses  on
              their own.

       GIBBERISH-BADSTART
              Adds a token for every word that starts with a bad character such as %.

       GIBBERISH-HYPHENS
              Adds a token for every word with more than three hyphens or underscores in it.

       GIBBERISH-LONGWORDS
              Adds a token for every word with over 30 characters in it (but fewer than 60).

       HTML-COMMENTS-IN-WORDS
              Adds  a  token  for  every  HTML comment found in the middle of a word.  Spam often
              contains HTML inside words, like this: w<!--dsgfhsdgjgh-->ord

       HTML-EXTERNAL-IMG
              Adds a token for every HTML <img> (image) tag found that contains :// (i.e. it
              refers to an external image).

       HTML-FONT
              Adds a token for every HTML <font> tag found.

       HTML-IP-IN-URLS
              Adds a token for every URL found containing an IP address.

       HTML-INT-IN-URL
              Adds a token for every URL found containing an integer in its hostname.

       HTML-URLENCODED-URL
              Adds a token for every URL found containing a % sign in its hostname.

       Normally, filters just cause a token to be added, and these tokens are processed by the
       normal weighting algorithm.  However, the GTUBE filter immediately flags any matching
       message as spam, bypassing the token matching.
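
       As a rough illustration of two of the simpler filters above, the following shell sketch
       tests single words against the thresholds given for GIBBERISH-HYPHENS and
       GIBBERISH-LONGWORDS.  The function names are hypothetical and are not part of qsf; how
       qsf itself splits a message into words is not described here, so the sketch takes one
       word per call.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not part of qsf.

# count_separators WORD
# Print the number of hyphens and underscores in WORD.
count_separators() {
    stripped=$(printf '%s' "$1" | tr -d '_-')
    echo $(( ${#1} - ${#stripped} ))
}

# has_many_separators WORD
# Succeed if WORD has more than three hyphens or underscores
# (the GIBBERISH-HYPHENS threshold).
has_many_separators() {
    [ "$(count_separators "$1")" -gt 3 ]
}

# is_long_word WORD
# Succeed if WORD is over 30 but under 60 characters long
# (the GIBBERISH-LONGWORDS range).
is_long_word() {
    [ "${#1}" -gt 30 ] && [ "${#1}" -lt 60 ]
}
```

       For example, has_many_separators "a-b_c-d_e" succeeds (four separators), while
       has_many_separators "a-b-c-d" does not (three).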

DATABASE BACKENDS

       The inbuilt "list" database backend will not necessarily provide the best performance, but
       is provided because using it requires no external libraries.

       If the required libraries were available when qsf was compiled, it can also use
       alternative database backends.  To find out which backends you have available, run
       qsf -V (capital V) and read the second line of output.  To see how well a backend
       performs, collect some spam and non-spam and use qsf -d BACKEND -B SPAM NONSPAM
       (see the entry for -B above).
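
       The two steps just described can be wrapped in small shell functions.  This is only a
       sketch: the QSF variable and the function names are hypothetical conveniences, the
       folder names are the ones used in the TRAINING section, and only the -V, -d and -B
       options documented in this manual are used.

```shell
#!/bin/sh
# QSF is a hypothetical override for the command name; it defaults to "qsf".
QSF=${QSF:-qsf}

# list_backends
# Print the second line of "qsf -V", which names the available backends.
list_backends() {
    "$QSF" -V | sed -n '2p'
}

# bench_backend BACKEND SPAM NONSPAM
# Run the benchmark described under -B against one backend.
bench_backend() {
    "$QSF" -d "$1" -B "$2" "$3"
}

# Example (not run here):
#   list_backends
#   bench_backend gdbm spam-folder non-spam-folder
```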

       Some people find that they get the best performance out of the gdbm backend; gdbm is a
       library that is available on many systems.

       To  efficiently  share  a  qsf  database  across multiple machines, you may find the MySQL
       backend useful.  However, using it is a little more complicated.

       To use the MySQL backend you will need to create a table with the fields key1, key2,
       token, value1, value2 and value3.  The token field must be VARCHAR(64), and the value1,
       value2 and value3 fields must each be BIGINT or INT; indexing on the token field is a
       good idea.  The key1 and key2 fields can be of any type, but they must be present.

       For example:

                USE mydatabase;
                CREATE TABLE qsfdb (
                  key1      BIGINT UNSIGNED NOT NULL,
                  key2      BIGINT UNSIGNED NOT NULL,
                  token     VARCHAR(64) DEFAULT '' NOT NULL,
                  value1    INT UNSIGNED NOT NULL,
                  value2    INT UNSIGNED NOT NULL,
                  value3    INT UNSIGNED NOT NULL,
                  PRIMARY KEY (key1,key2,token),
                  KEY (key1),
                  KEY (key2),
                  KEY (token)
                );

       The  key1  and  key2  fields  allow  you  to  have multiple qsf databases in one table, by
       specifying different key1 and key2 values on invocation.

       With the MySQL backend, the --database / -d option does not take a database file.
       Instead, give it either a specification string as described below, or the name of a
       file containing such a string on its first line.

       The specification string is as follows:

                database=DATABASE;host=HOST;port=PORT;
                user=USER;pass=PASS;table=TABLE;
                key1=KEY1;key2=KEY2

       This string must be all on one line, with no spaces (it is split above only to fit the page).

       DATABASE
              is the name of the MySQL database.

       HOST   is the hostname of the database server (eg "localhost").

       PORT   is the TCP port to connect on (eg 3306).

       USER   is the username to connect with.

       PASS   is the password to connect with.

       TABLE  is the database table to use.  If no table with this name exists when qsf is
              called in update or training mode, qsf will create it if permissions allow.

       KEY1   is the value to use for the key1 field.

       KEY2   is the value to use for the key2 field.

       Since command lines can be seen in the process list, it is  probably  best  to  specify  a
       filename (eg qsf -d mysql:qsfdb.spec) and put the specification string inside that file.
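
       One way to create such a file is sketched below.  The function name write_qsf_spec is
       hypothetical, and the user name, password and key values in the example are
       placeholders; only the field order and the one-line, no-spaces format come from the
       description above.  The file is made readable by its owner only, since it contains the
       password.

```shell
#!/bin/sh
# write_qsf_spec FILE DATABASE HOST PORT USER PASS TABLE KEY1 KEY2
# Hypothetical helper: write a one-line MySQL specification string to FILE.
write_qsf_spec() {
    if [ "$#" -ne 9 ]; then
        echo "usage: write_qsf_spec FILE DATABASE HOST PORT USER PASS TABLE KEY1 KEY2" >&2
        return 1
    fi
    spec_file=$1
    shift
    # The specification string must not contain spaces.
    for value in "$@"; do
        case $value in
            *" "*)
                echo "write_qsf_spec: values must not contain spaces" >&2
                return 1
                ;;
        esac
    done
    (
        # Create the file without group or world access; it holds a password.
        umask 077
        printf 'database=%s;host=%s;port=%s;user=%s;pass=%s;table=%s;key1=%s;key2=%s\n' \
            "$@" > "$spec_file"
    ) && chmod 600 "$spec_file"
}

# Example with placeholder credentials (not run here):
#   write_qsf_spec qsfdb.spec mydatabase localhost 3306 qsfuser secret qsfdb 1 1
#   qsf -d mysql:qsfdb.spec
```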

TROUBLESHOOTING

       If  you  have problems with qsf, please check the list below; if this does not help, go to
       the qsf home page and investigate the mailing lists, or email the author.

       Nothing is being marked as spam.
              First, use the -r option to switch on the X-Spam-Rating header, and check that this
              header appears in email passed through qsf.  If it does not, then it is likely that
              qsf is not being run at all -  check  your  configuration  of  procmail(1)  or  its
              equivalent.

              If  you  are  seeing  X-Spam-Rating  headers,  and  different emails have different
              scores, then you may simply need to retrain your database a little more.  Take more
              spam email and pass it to qsf -m.

              If you are seeing X-Spam-Rating headers but they all give the same spam rating,
              then the most likely reason is that qsf is not reading any database.  Make sure
              that whatever is processing the email has read permission on /var/lib/qsfdb
              and/or ~/.qsfdb.  If you are using ~/.qsfdb, also make sure that ~ ($HOME)
              resolves to the same directory for whatever is processing the email as it did
              for whoever created the database.
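
              The checks above can be run by hand along the following lines.  This is a
              sketch only: report_readable is a hypothetical helper, not part of qsf, and it
              should be run as the same user and in the same environment that processes the
              email, so that $HOME expands as it does there.

```shell
#!/bin/sh
# report_readable PATH...
# Hypothetical helper: say whether each PATH is readable by the current user.
report_readable() {
    for db in "$@"; do
        if [ -r "$db" ]; then
            echo "readable: $db"
        else
            echo "NOT readable: $db"
        fi
    done
}

# Show what ~ means in this environment, then check the two database
# locations named above.
echo "HOME=$HOME"
report_readable /var/lib/qsfdb "$HOME/.qsfdb"

# To confirm that qsf adds a rating header (not run here; "message" is any
# saved email):
#   qsf -r < message | grep '^X-Spam-Rating:'
```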

       Retraining sometimes takes a very long time.
              With the obtree backend or 2-column MySQL or SQLite tables, every 500th retrain (-m
              or  -M),  the  database  is  pruned.   On some systems this may take some time, and
              during this time the database is locked (except when  using  the  MySQL  or  SQLite
              backends).  If you do a lot of retraining and want to avoid this, use the -N
              option to suppress auto-pruning, and have a cron(8) job or similar run a manual
              prune (qsf -p) periodically.
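
              A sketch of that arrangement follows.  The function names, the QSF variable and
              the weekly schedule in the crontab comment are assumptions made for
              illustration; only the -m, -N and -p options come from this manual.

```shell
#!/bin/sh
# QSF is a hypothetical override for the command name; it defaults to "qsf".
QSF=${QSF:-qsf}

# mark_spam_no_prune FILE
# Mark the email in FILE as spam, suppressing the automatic prune with -N.
mark_spam_no_prune() {
    "$QSF" -m -N < "$1"
}

# prune_db
# Prune the database by hand; intended to be run from cron(8).
prune_db() {
    "$QSF" -p
}

# Example crontab(5) entry running the prune at 03:15 every Sunday
# (the schedule is arbitrary):
#   15 3 * * 0  qsf -p
```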

       Running qsf from procmail fails with an error.
              If  you  can  run  qsf from the command line, but in your procmail log file you get
              errors  about  "qsf:  cannot  execute  binary  file",  then  contact  your   system
              administrator  for  help.  It  may be that incoming email is handled by a different
              server to the one you normally shell into, and  either  they  are  of  a  different
              architecture  or  operating  system, or the mail server is not permitted to execute
              user-owned binaries.

ACKNOWLEDGEMENTS

       The following people have contributed suggestions, comments, patches, and testing:

              Tom Parker <http://www.bits.bris.ac.uk/palfrey/>
              Dr Kelly A. Parker
              Vesselin Mladenov <http://www.antipodes.bg/>
              Glyn Faulkner
              Mark Reynolds
              Sam Roberts
              Scott Allen
              Karsten Kankowski
              M. Kolbl
              Micha Holzmann
              Jef Poskanzer <http://www.acme.com/jef/>
              Clemens Fischer <http://ino-waiting.gmxhome.de/>
              Nelson A. de Oliveira
              Michal Vitecek
              Tommy Pettersson <http://www.lysator.liu.se/~ptp/>

AUTHOR

       The author:

              Andrew Wood <andrew.wood@ivarch.com>
              http://www.ivarch.com/

       Project home page:

              http://www.ivarch.com/programs/qsf/

BUGS

       If you find any bugs, please contact the author, either by email or by using  the  contact
       form on the web site.

SEE ALSO

       procmail(1), procmailrc(5), procmailex(5)

       Someone has written a guide to using qsf with KMail that can be found at:
       http://www.softwaredesign.co.uk/Information.SpamFilters.html

LICENSE

       This is free software, distributed under the ARTISTIC 2.0 license.