Provided by: sa-learn-cyrus_0.3.5-1.1_all

NAME

       sa-learn-cyrus - Train SpamAssassin with spam/ham from users' IMAP mailboxes

USAGE

       sa-learn-cyrus [ options ] user-name(s)

         user-name(s)              One or more user/mailbox name(s).

         options:
           --help                  Prints a brief help message and exits.
           -h

           --man                   Prints the manual page and exits.

           --verbose level         Be verbose if level > 0
           -v level

           --config file           Use a configuration file other than the default
           -c file                 one.

           --sa-debug              Run sa-learn in debug mode.
           -d

           --simulate              Run in simulation mode (show commands only).
           -s

           --imap-domains domains  Search mailboxes in list of domains.
           -D domains

DESCRIPTION

       sa-learn-cyrus feeds spam and non-spam (ham) messages to SpamAssassin's database. Its main purpose is to
       train SA's Bayes database with spam/ham messages sorted by the mailbox owners into special subfolders.

       It is intended to be used on small mail systems (e.g. home office) with a single server-wide SA
       configuration.

       Launching sa-learn-cyrus at regular intervals (e.g. as a cron job) may improve SA's hit rate considerably,
       provided that the users are well instructed about what to move to their ham/spam folders and what not.

FUNCTION

       sa-learn-cyrus scans local mail spools as used by Cyrus IMAPd for special subfolders. These subfolders
       are supposed to contain mails which have been classified as spam or ham by the mailbox owners.

       Example: The users move spam mails which have not been tagged as spam by SpamAssassin (false negatives)
       to a subfolder INBOX.Learn.Spam. Other mails, which SA may classify as spam in the future because of
       certain characteristics, are copied to a subfolder INBOX.Learn.Ham.

       sa-learn-cyrus feeds the content of these spam/ham folders to SA's Bayes database using the sa-learn tool,
       which is shipped with the SpamAssassin package.

       Afterwards these mails are optionally deleted by means of ipurge, a helper tool that comes with the
       Cyrus IMAPd package.

ARGUMENTS

       sa-learn-cyrus optionally takes a list of mailbox/user names as arguments:

         sa-learn-cyrus fred wilma fritz hjb

       If no names are supplied, all mailboxes found will be processed.

OPTIONS

       All options supplied on the command line override the corresponding parameters given in the configuration
       file.

       Please note that the basic parameters of sa-learn-cyrus have to be defined in a configuration file.
       sa-learn-cyrus cannot be controlled solely by means of command-line options.

       --config file, -c file
           Use a configuration file other than the default one. Always adapt the configuration file to your
           needs before using sa-learn-cyrus on a live system. Otherwise you may lose data or corrupt your SA
           database!

       --verbose level, -v level
           Specify level of verbosity. (Default = 0)

       --sa-debug, -d
           Run sa-learn in debug mode. This may be useful to examine problems with sa-learn.

       --simulate, -s
           Run sa-learn-cyrus in simulation mode. This is useful for first tests after initial configuration or
           if problems are encountered. In simulation mode sa-learn-cyrus neither executes any system commands
           nor touches any data. It just displays what it would do.

       --imap-domains list-of-domains, -D list-of-domains
           If your Cyrus installation uses domain support, you may use this option to specify which domains
           are to be searched.

             --imap-domains example.com,another.org

           is equivalent to

             [imap]
             ...
             domains = example.com another.org
             ...

           in the configuration file.

CONFIGURATION

       By default sa-learn-cyrus expects its configuration file as /etc/spamassassin/sa-learn-cyrus.conf.

       This default can only be changed in the code. A file other than the default one can always be chosen
       with the "--config" option.

       A sample configuration file is shipped with sa-learn-cyrus.

   Format
       The configuration file uses a format known from rsync or Samba, which is very similar to the format of
       Windows ini files. The file consists of a sequence of sections. The beginning of each section is marked
       by a section name, a word in square brackets, e.g. "[global]". The section entries consist of parameters,
       which are key/value pairs, each on a single line. Key and value are separated by an equal sign, like

         key = value

       The value is a single word or a list of words, each of them representing a number or a string. Words may
       be surrounded by any number of spaces for better readability. Empty lines and lines with a leading hash
       character "#" are ignored.
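
       The format rules above can be illustrated with a short Python sketch. It is not part of sa-learn-cyrus
       (which is a Perl script); it uses Python's standard configparser module, restricted to "=" as separator
       and "#" as comment character, to read a small hypothetical sample. The sample values and the helper
       name parse_config are assumptions made for illustration only.

```python
import configparser

# Hypothetical sample text, not the configuration file shipped with sa-learn-cyrus.
SAMPLE = """\
# lines with a leading hash are ignored
[global]
verbose = 1
simulate = yes

[mailbox]
include_list = fred   wilma fritz hjb
spam_folder = Learn.Spam
"""


def parse_config(text):
    """Return {section: {key: [words]}} for text in the format described above."""
    parser = configparser.ConfigParser(
        delimiters=("=",),        # key and value are separated by an equal sign
        comment_prefixes=("#",),  # only '#' starts a comment line
        interpolation=None,       # treat values as plain text
    )
    parser.read_string(text)
    # Every value is a single word or a list of words separated by spaces.
    return {
        section: {key: value.split() for key, value in parser.items(section)}
        for section in parser.sections()
    }


config = parse_config(SAMPLE)
print(config["mailbox"]["include_list"])
```

       Splitting each value into words mirrors the statement that a value is a single word or a list of words;
       how sa-learn-cyrus itself parses the file may differ in detail.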

   Section [global]
       The [global] section contains all global control parameters.

       tmp_dir = temporary-directory
           sa-learn-cyrus creates some temporary files during each run. This is the directory where these files
           are created.

       lock_file = full-path-to-lock-file
           To avoid race conditions, sa-learn-cyrus uses a simple file locking mechanism. Each new
           sa-learn-cyrus process looks for this file before it really does anything. If this file exists, the
           process exits with a warning, assuming that another sa-learn-cyrus process is running.
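
           A lock file of this kind can be sketched in Python as follows. This is an illustration only, not
           the code used by sa-learn-cyrus: the function names are hypothetical, and creating the file
           atomically with O_EXCL is an assumption of this sketch rather than a documented detail of the tool.

```python
import os


def acquire_lock(path):
    """Create the lock file; return False if it already exists."""
    try:
        # O_EXCL makes creation fail if the file exists, so checking and
        # creating happen in one step (an assumption of this sketch).
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for troubleshooting
    return True


def release_lock(path):
    """Remove the lock file at the end of a run."""
    os.remove(path)
```

           A caller would exit with a warning when acquire_lock() returns False. If a run is interrupted
           before the lock is released, the stale file has to be removed by hand before the next run.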

       verbose = level
           The level of verbosity. Values range from 0 (low) to 3 (high). A reasonable level to start with is 1.

       simulate = yes|no
           sa-learn-cyrus  should  be run in simulation mode ("simulate = yes") after the first customization of
           the configuration to avoid loss of data or corruption of SA's database in case of wrongly  configured
           parameters.

       log_with_tag = yes|no
           Prepend the output (log) with a tag (date, time, pid). Set to "no" to avoid additional tagging when
           piped to syslog. Default is "yes".

   Section [mailbox]
       Section [mailbox] contains all parameters to select the mailboxes, to specify the special subfolders, and
       to define the actions to apply.

       include_list = list-of-mailboxes
           Only spam/ham mails of these mailboxes are fed to SpamAssassin's database. If this list is empty,
           all mailboxes will be used. "include_list" may be used instead of the list on the command line.

           Example:

             include_list = fred wilma fritz hjb

       include_regexp = regular-expression
           If  include_list  is empty, a regular expression given here is applied to all mailbox names to select
           mailboxes. This parameter is ignored if include_list is not empty.

           Example: Include all mailboxes beginning with 'knf-'.

             include_regexp = ^knf-

       exclude_list = list-of-mailboxes
           A list of mailboxes which will be excluded. If include_list is not empty, this parameter is ignored.

       exclude_regexp = regular-expression
           Mailbox names which match this regular expression are excluded from processing.

           Example: Ignore all mailboxes ending with '.beie'

             exclude_regexp = \.beie$
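
           The interplay of the four selection parameters can be summarised in a short Python sketch. It is
           one reading of the descriptions above, not the tool's own code: the function name is hypothetical,
           and the sketch assumes that exclude_regexp is applied even when include_list is set, which the
           text above does not state explicitly.

```python
import re


def select_mailboxes(all_mailboxes, include_list=(), include_regexp=None,
                     exclude_list=(), exclude_regexp=None):
    """Return the mailboxes to process, following the rules described above."""
    if include_list:
        # A non-empty include_list wins: include_regexp and exclude_list are ignored.
        selected = [m for m in all_mailboxes if m in include_list]
    else:
        selected = list(all_mailboxes)
        if include_regexp:
            selected = [m for m in selected if re.search(include_regexp, m)]
        selected = [m for m in selected if m not in exclude_list]
    if exclude_regexp:
        # Assumption of this sketch: exclude_regexp always applies.
        selected = [m for m in selected if not re.search(exclude_regexp, m)]
    return selected


boxes = ["fred", "knf-a", "knf-b", "joe.beie"]
print(select_mailboxes(boxes, include_regexp=r"^knf-"))
print(select_mailboxes(boxes, exclude_regexp=r"\.beie$"))
```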

       spam_folder = folder-name
           The name of the special subfolder in each mailbox which contains spam. The name should be a  complete
           folder  path  relative  to  the  root  folder  INBOX. The Cyrus nomenclature is applied (same as with
           cyradm).

           Example:

             spam_folder = Learn.Spam

           This is a subfolder in a folder tree like this:

               INBOX
               +--Drafts
               +--Templates
               +--Sent
               +--Learn
               |  +--Ham
               |  +--Spam  <-- spam subfolder
               |

       ham_folder = folder-name
           The name of the special subfolder in each mailbox which contains ham.  (Same naming  scheme  as  with
           "spam_folder", see above.)

       remove_spam = yes|no
           Are  the  spam  messages  in the "spam_folder" to be removed after feeding them to the SA database or
           not?

       remove_ham = yes|no
           Are the ham messages in the "ham_folder" to be removed after feeding them to the SA database or not?

   Section [sa]
       SpamAssassin (SA) configuration items.

       site_config_path = path
           Path to system-wide SA preferences.

           Example:

             site_config_path = /etc/spamassassin

       bayes_storage = berkely|sql
           Bayes storage mechanism (berkely|sql)

           berkely: Berkeley DB (default)

           sql: SQL Database

       prefs_file = file
           Path of the system-wide SA configuration file.

           Example:

             prefs_file = /etc/spamassassin/local.cf

       learn_cmd = path
           Path to the sa-learn utility.

           Example:

             learn_cmd = /usr/bin/sa-learn

       fix_db_permissions = yes|no
           Should permissions of DB files be fixed? Ignored unless "bayes_storage = berkely"

       user = user-id
           The user id SA runs with. Required if "fix_db_permissions = yes".

           Example:

             user = mail

       group = group-id
           The group id SA runs with. Required if "fix_db_permissions = yes".

           Example:

             group = mail

       sync_once = yes|no
           Skip synchronization after every change of database, but sync  once  after  all  messages  have  been
           learned. May speed up learning from many folders.  Default is "yes".
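
           The effect of this setting can be pictured with a Python sketch that only builds command lines and
           runs nothing. It relies on the "--spam", "--ham", "--no-sync" and "--sync" options of sa-learn;
           the function name and the directory arguments are hypothetical, and the commands that
           sa-learn-cyrus actually issues may carry further options.

```python
def build_learn_commands(learn_cmd, spam_dirs, ham_dirs, sync_once=True):
    """Return the sa-learn command lines (as argument lists) for the given folders."""
    commands = []
    for kind, dirs in (("--spam", spam_dirs), ("--ham", ham_dirs)):
        for directory in dirs:
            cmd = [learn_cmd]
            if sync_once:
                cmd.append("--no-sync")  # postpone the database sync
            cmd += [kind, directory]
            commands.append(cmd)
    if sync_once and commands:
        commands.append([learn_cmd, "--sync"])  # one sync after all folders
    return commands


# Hypothetical folder paths, for illustration only.
for command in build_learn_commands("/usr/bin/sa-learn",
                                    ["/spool/j/user/joe/Learn/Spam"],
                                    ["/spool/j/user/joe/Learn/Ham"]):
    print(" ".join(command))
```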

       virtual_config_dir = pattern
           Use this if you use the "--virtual-config-dir" option of "spamd" (it needs to match exactly). See the
           "spamd" man page for more information.

       debug = yes|no
           Run sa-learn in debug mode or not. "debug = yes" may be useful to examine problems.

   Section [imap]
       The section [imap] contains the necessary configuration parameters to locate and manage the (Cyrus)
       IMAPd spool files.

       base_dir = dir
           The base directory of the IMAP spool, below which the mailboxes are located.

       initial_letter = yes|no
           If base_dir is divided into subdirectories named after the initial letters of the mailbox names,
           set "initial_letter = yes" (default), otherwise set "initial_letter = no".

           Examples for joe's mailbox:

             <base_dir>/j/user/joe/ : initial_letter = yes
             <base_dir>/user/joe/ : initial_letter = no

       domains = list-of-domains
           If your Cyrus spool uses a domain hierarchy, supply a list of domains. If domain support is not
           used, leave this entry empty. The "initial_letter" option (see above) is applied to domains, too.

           Example for mailboxes fritz@bar.org and joe@foo.com :

           The mail files within the Cyrus spool are located at

             <base_dir>/domain/b/bar.org/f/fritz
             <base_dir>/domain/f/foo.com/j/joe

           List the domains as

             domains = foo.com bar.org
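
           How "initial_letter" shapes the directory names in the examples above can be sketched in Python.
           The sketch reproduces only the mailbox path without domain support and the per-domain directory
           prefix; the function names and the base directory are hypothetical, and the tool's own path
           handling may differ.

```python
import posixpath


def mailbox_dir(base_dir, mailbox, initial_letter=True):
    """Spool directory of a mailbox when domain support is not used."""
    parts = [base_dir]
    if initial_letter:
        parts.append(mailbox[0])  # e.g. 'j' for joe
    parts += ["user", mailbox]
    return posixpath.join(*parts)


def domain_dir(base_dir, domain, initial_letter=True):
    """Directory below which the mailboxes of one domain are located."""
    parts = [base_dir, "domain"]
    if initial_letter:
        parts.append(domain[0])  # e.g. 'b' for bar.org
    parts.append(domain)
    return posixpath.join(*parts)


BASE = "/var/spool/cyrus/mail"  # hypothetical base_dir
print(mailbox_dir(BASE, "joe"))
print(mailbox_dir(BASE, "joe", initial_letter=False))
print(domain_dir(BASE, "bar.org"))
```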

       unixhierarchysep = yes|no
           Choose   "unixhierarchysep   =   yes"   if   Cyrus   is   configured   to   accept   usernames   like
           'hans.mueller.somedomain.tld'. Otherwise set "unixhierarchysep = no".

       purge_cmd = path-to-command
           The path to the Cyrus ipurge utility for purging mail messages.

           Example:

             purge_cmd = /usr/sbin/ipurge

       user = user
           The user Cyrus-IMAPd runs as.

           Example:

             user = cyrus

FILES

       /etc/spamassassin/sa-learn-cyrus.conf

SEE ALSO

       sa-learn(1), spamassassin(1), Mail::SpamAssassin(3), Mail::SpamAssassin::Conf(3), imapd(8), spamd(8)

       The current version of this script is available at
       http://www.pollux.franken.de/mail-server-tools/sa-learn-cyrus/

PREREQUISITES

       sa-learn (part of the SpamAssassin package), ipurge (part of Cyrus IMAPd)

AUTHOR

       Hans-Juergen Beie <hjb@pollux.franken.de>

COPYRIGHT AND LICENSE

       Copyright 2004-2011 by Hans-Juergen Beie.

       This program is free software; you can redistribute it and/or modify it under the terms of the Artistic
       License 2.0 (http://foundation.perl.org/legal/licenses/artistic-2_0-plain.html) or the GNU General Public
       License as published by the Free Software Foundation; either version 2 of the license
       (http://www.gnu.org/licenses/old-licenses/gpl-2.0.html), or (at your option) any later version.

DISCLAIMER

       This program is distributed in the hope that it will be useful, but WITHOUT ANY  WARRANTY;  without  even
       the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ACKNOWLEDGMENTS

       Thanks to Robert Carnecky and Jan Hauke Rahm for testing and suggestions for the implementation of the
       domain support. David Caldwell contributed the virtual_config_dir feature. Some other contributors
       are listed in the CHANGELOG. Many thanks to them for their help and suggestions.

perl v5.14.2                                       2011-11-10                                  SA-LEARN-CYRUS(8)