Provided by: mon_0.99.2-9ubuntu1_i386

NAME

       mon  - monitor services for availability, sending alarms upon failures.

SYNOPSIS

       mon [-dfhlMSv] [-a dir] [-A authfile] [-b dir] [-B dir] [-c config] [-D
       dir] [-i secs] [-k num] [-L dir] [-m num] [-p num] [-P pidfile] [-r
       delay] [-s dir]

DESCRIPTION

       mon is a general-purpose scheduler for monitoring service  availability
       and  triggering alerts upon detecting failures.  mon was designed to be
       open in the sense that it supports arbitrary monitoring facilities  and
       alert  methods  via  a  common  interface, which are easily implemented
       through programs (in C, Perl, shell, etc.), SNMP traps, and special Mon
       (UDP packet) traps.

OPTIONS

       -a dir Path        to       alert       scripts.       Default       is
              /usr/local/lib/mon/alert.d:alert.d.  Multiple alert paths may be
              specified  by  separating them with a colon.  Non-absolute paths
              are taken to be relative to the base directory (/usr/lib/mon  by
              default).

       -b dir Base  directory  for  mon. scriptdir, alertdir, and statedir are
              all relative to this directory unless specified from /.  Default
              is /usr/lib/mon.

       -B dir Configuration  file base directory. All config files are located
              here, including mon.cf, monusers.cf, and auth.cf.

       -A authfile
              Authentication  configuration   file.   By   default   this   is
              /etc/mon/auth.cf   if   the   /etc/mon   directory   exists,  or
              /usr/lib/mon/auth.cf otherwise.

       -c file
              Read configuration from file.  This defaults to  /etc/mon/mon.cf
              if the /etc/mon directory exists, otherwise to /etc/mon.cf.

       -d     Enable debugging mode.

       -D dir Path   to   state   directory.    Default   is   the   first  of
              /var/state/mon,  /var/lib/mon,  and  /usr/lib/mon/state.d  which
              exists.

       -f     Fork  and  run as a daemon process. This is the preferred way to
              run mon.

       -h     Print help information.

       -i secs
              Sleep interval, in seconds. Defaults to 1. This  shouldn’t  need
              to be adjusted for any reason.

       -k num Set log history to a maximum of num entries. Defaults to 100.

       -l     Load  state  from  the last saved state file. Currently the only
              supported saved state is disabled watches, services, and  hosts.

       -L dir Sets  the  log  dir.  See also logdir in the configuration file.
              The default is /var/log/mon if that directory exists,  otherwise
              log.d in the base directory.

       -M     Pre-process  the  configuration  file  with  the macro expansion
              package m4.

       -m num Set the throttle for the maximum number of processes to num.

       -p num Make server listen on port num.  This defaults to 2583.

       -S     Start with the scheduler stopped.

       -P pidfile
              Store the server’s pid in pidfile, the default is the  first  of
              /var/run/mon/mon.pid,  /var/run/mon.pid,  and /etc/mon.pid whose
              directory exists.  An empty value tells mon not  to  use  a  pid
              file.

       -r delay
              Sets  the  number of seconds used to randomize the startup delay
              before each service is scheduled. Refer to the global  randstart
              variable in the configuration file.

       -s dir Path to monitor scripts. Default is
              /usr/local/lib/mon/mon.d:mon.d.  Multiple monitor paths may be
              specified by separating them with a colon.  Non-absolute paths
              are taken to be relative to the base directory (/usr/lib/mon by
              default).

       -v     Print version information.

DEFINITIONS

       monitor
              A  program  which  tests for a certain condition, returns either
              true or false, and optionally produces output to be passed  back
              to  the scheduler.  Common monitors detect host reachability via
              ICMP echo messages, or connection to TCP services.

       period A period in time as interpreted by the Time::Period module.

       alert  A program which sends a message when invoked by  the  scheduler.
              The scheduler calls upon an alert when it detects a failure from
              a monitor.  An alert  program  accepts  a  set  of  command-line
              arguments  from  the scheduler, in addition to data via standard
              input.

       hostgroup
              A single host or  list  of  hosts,  specified  as  names  or  IP
              addresses.

       service
              A  collection  of  parameters  used  to  deal  with monitoring a
              particular resource which is provided by a group.  Services  are
              usually  modeled  after things such as an SMTP server, ICMP echo
              capability, server disk space availability, or SNMP events.

       watch  A collection of services which apply to a particular group.

OPERATION

       When the mon scheduler starts, it reads a configuration file to
       determine the services it needs to monitor. The configuration file
       defaults to /etc/mon/mon.cf if the /etc/mon directory exists, otherwise
       to /etc/mon.cf, and can be specified using the -c parameter.  If the -M
       option is specified, then the configuration file is pre-processed with
       m4.  If the configuration file name ends with .m4, the file is also
       processed by m4 automatically.

       The  scheduler  enters a loop which handles client connections, monitor
       invocations, and failure alerts. Each service has a timer, specified in
       the  configuration  file  as  the  interval  variable,  which tells the
       scheduler how frequently to invoke a monitor  process.   The  scheduler
       may be temporarily stopped. While it is stopped, client access still
       functions, but nothing is scheduled. This is useful while resetting
       the server, because you can save the hosts and services which are
       disabled, reset the server with the scheduler stopped, re-disable
       those hosts and services, then start the scheduler. It also allows
       making atomic changes across several client connections.  See the
       moncmd man page for more information.

MONITOR PROGRAMS

       Monitor processes are invoked with the arguments specified in the
       configuration file, followed by the hosts from the applicable host
       group.  For example, if the watch group is "servers", which contains
       the hostnames "smtp", "nntp", and "ns", and the monitor line reads as
       follows,

       monitor fping.monitor -t 4000 -r 2

       then the executable "fping.monitor" will be executed with these
       parameters:

       MONITOR_DIR/fping.monitor -t 4000 -r 2 smtp nntp ns

       MONITOR_DIR    is    actually    a    search    path,    by     default
       /usr/local/lib/mon/mon.d   then   /usr/lib/mon/mon.d,  but  it  can  be
       overridden by the -s option or in the configuration file.  If all hosts
       in  the  hostgroup have been disabled, then a warning is sent to syslog
       and the monitor is not run. This behavior may be  overridden  with  the
       "allow_empty_group"  option  in  the  service definition.  If the final
       argument to the  "monitor"  line  is  ";;"  (it  must  be  preceded  by
       whitespace),  then  the host list will not be appended to the parameter
       list.
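
       The argument assembly described above can be sketched in Python. This
       is an illustration only, not mon's actual (Perl) implementation; the
       function name build_monitor_argv is hypothetical, and shlex is used
       here as a stand-in for the shell-like quoting that mon applies.

```python
import shlex

def build_monitor_argv(monitor_line, hosts):
    """Return the argument list for a monitor line and a host list.

    monitor_line is the text after the "monitor" keyword; hosts is the
    expanded host group. Hypothetical helper, for illustration only.
    """
    words = shlex.split(monitor_line)
    if words and words[-1] == ";;":
        # A trailing ";;" word means the host list is not appended.
        return words[:-1]
    return words + list(hosts)
```

       With the example above, build_monitor_argv("fping.monitor -t 4000 -r 2",
       ["smtp", "nntp", "ns"]) yields the monitor name, its four option words,
       and then the three hosts.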

       In addition to environment variables defined by the user in the service
       definition, mon passes certain variables to the monitor process:

       MON_LAST_SUMMARY
              The  first  line  of  the  output from the last time the monitor
              exited.

       MON_LAST_OUTPUT
              The entire output of the monitor from the last time it exited.

       MON_LAST_FAILURE
              The time(2) of the last failure for this service.

       MON_FIRST_FAILURE
              The time(2) of the first time this service failed.

       MON_LAST_SUCCESS
              The time(2) of the last time this service passed.

       MON_DESCRIPTION
              The description of this service, as defined in the configuration
              file using the description tag.

       MON_DEPEND_STATUS
              The depend status, "0" if dependency failure, "1" otherwise.

       MON_LOGDIR
              The directory where log files should be placed, as indicated by
              the logdir global configuration variable.

       MON_STATEDIR
              The directory where state files should be kept, as indicated  by
              the statedir global configuration variable.
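
       A monitor written in Python might collect these variables as follows.
       This is a sketch under assumptions: the function name and the returned
       dictionary keys are hypothetical, unset or non-numeric timestamps are
       treated as "never", and a dependency failure is assumed to be signalled
       by the digit zero.

```python
import os
from datetime import datetime, timezone

def read_mon_context(env=os.environ):
    """Collect the MON_* variables a monitor receives (illustrative only)."""
    def as_time(name):
        value = env.get(name, "")
        # time(2) values are seconds since the epoch; anything else is "never".
        if value.isdigit():
            return datetime.fromtimestamp(int(value), tz=timezone.utc)
        return None

    return {
        "last_summary": env.get("MON_LAST_SUMMARY", ""),
        "last_output": env.get("MON_LAST_OUTPUT", ""),
        "last_failure": as_time("MON_LAST_FAILURE"),
        "first_failure": as_time("MON_FIRST_FAILURE"),
        "last_success": as_time("MON_LAST_SUCCESS"),
        "description": env.get("MON_DESCRIPTION", ""),
        # Assumption: only the value "0" signals a dependency failure.
        "dependency_ok": env.get("MON_DEPEND_STATUS", "1") != "0",
        "log_dir": env.get("MON_LOGDIR"),
        "state_dir": env.get("MON_STATEDIR"),
    }
```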

       "fping.monitor"  should  return  an  exit  status  of 0 if it completed
       successfully (found no problems), or nonzero if a problem was detected.
       The first line of output from the monitor script has a special meaning:
       it is used as a brief summary of the exact failure which was  detected,
       and is passed to the alert program. All remaining output is also passed
       to the alert program, but it has no required interpretation.

       If a monitor for a particular service is still running,  and  the  time
       comes  for  mon  to  run  another monitor for that service, it will not
       start another monitor. For example, if the interval  is  10s,  and  the
       monitor  does  not finish running within 10 seconds, then mon will wait
       until the first monitor exits before running another one.
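
       The exit status and output conventions can be illustrated with a
       minimal monitor sketch in Python. The TCP check, the default port, and
       all function names are assumptions chosen for the example; only the
       conventions (exit status 0 on success, nonzero on failure, first
       output line as the summary) come from the text above.

```python
import socket
import sys

def tcp_check(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def run_monitor(hosts, check=tcp_check):
    """Return (exit_status, output) following the monitor conventions."""
    failed = [host for host in hosts if not check(host)]
    if not failed:
        return 0, ""
    summary = " ".join(failed)  # first line: brief summary of the failure
    detail = "\n".join(f"{host}: connection failed" for host in failed)
    return 1, summary + "\n" + detail + "\n"

def main(argv):
    """Entry point: argv holds the hosts appended by the scheduler."""
    status, output = run_monitor(argv)
    sys.stdout.write(output)
    return status
```

       A standalone script would finish with sys.exit(main(sys.argv[1:])).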

ALERT DECISION LOGIC

       Upon a non-zero or zero exit status, the associated  alert  or  upalert
       program (respectively) is started, pending the following conditions: If
       an alert for a specific service is disabled, do not send an alert.   If
       dep_behavior  is  set  to a, and a parent dependency is failing, then
       suppress the alert.  If the alert has previously been acknowledged,  do
       not send the alert, unless it is an upalert.  If an alert is not within
       the specified period, record the failure via syslog(3) and do not  send
       an alert.  If the failure does not fall within a defined period, do not
       send an alert.  No upalerts are sent without corresponding down alerts,
       unless  no_comp_alerts  is  defined in the period section.  If an alert
       was already sent within the  last  alertevery  interval,  do  not  send
       another  alert,  unless  the  summary  output  from the current monitor
       program differs from the last monitor process. Otherwise, send an alert
       using  each  alert  program  listed for that period. The observe_detail
       argument to alertevery affects this behavior by observing  the  changes
       in the detail part of the output in addition to the summary line.  If a
       monitor has successive failures and the summary output changes in  each
       of them, alertevery will not suppress multiple consecutive alerts.  The
       reasoning is that if the summary output  changes,  then  a  significant
       event occurred and the user should be alerted.
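
       The alertevery rule at the end of this logic can be sketched as a
       small Python predicate. It deliberately covers only the alertevery
       step, not the disabled, acknowledged, dependency, or period checks,
       and the function and parameter names are hypothetical.

```python
def should_send_alert(now, last_alert_time, alertevery, summary, last_summary,
                      detail="", last_detail="", observe_detail=False):
    """Decide whether alertevery suppresses an alert (illustrative only).

    Times are in seconds; last_alert_time is None if no alert was sent yet.
    """
    if last_alert_time is None:
        return True                      # nothing sent so far
    if now - last_alert_time >= alertevery:
        return True                      # the alertevery interval has passed
    if summary != last_summary:
        return True                      # summary changed: significant event
    if observe_detail and detail != last_detail:
        return True                      # detail changed and is being observed
    return False
```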

ALERT PROGRAMS

       Alert programs are found in the path supplied with the -a parameter, or
       in the /usr/local/lib/mon/alert.d and /usr/lib/mon/alert.d  directories
       if  not  specified.   They  are invoked with the following command-line
       parameters:

       -s service
              Service tag from the configuration file.

       -g group
              Host group name from the configuration file.

       -h hosts
              The expanded version of the host  group,  space  delimited,  but
              contained in one shell "word".

       -l alertevery
              The number of seconds until the next alarm will be sent.

       -O     This option is supplied to an alert only if the alert is
              being generated as a result of an expected trap timing out.

       -t time
              The time (in time(2) format) of when this failure condition  was
              detected.

       -T     This option is supplied to an alert only if the alert was
              triggered by a trap.

       -u     This option is supplied to an alert only if it is  being  called
              as an upalert.

       The  remaining  arguments  are supplied from the trailing parameters in
       the configuration file, after the "alert" service parameter.

       As with monitor programs, alert programs are invoked  with  environment
       variables defined by the user in the service definition, in addition to
       the following which are explicitly set by the server:

       MON_LAST_SUMMARY
              The first line of the output from  the  last  time  the  monitor
              exited.

       MON_LAST_OUTPUT
              The entire output of the monitor from the last time it exited.

       MON_LAST_FAILURE
              The time(2) of the last failure for this service.

       MON_FIRST_FAILURE
              The time(2) of the first time this service failed.

       MON_LAST_SUCCESS
              The time(2) of the last time this service passed.

       MON_DESCRIPTION
              The description of this service, as defined in the configuration
              file using the description tag.

       MON_GROUP
              The watch group which triggered this alarm.

       MON_SERVICE
              The service heading which generated this alert.

       MON_RETVAL
              The exit value of the failed monitor program, or return value as
              accepted from a trap.

       MON_OPSTATUS
              The operational status of the service.

       MON_ALERTTYPE
              Has  one  of  the  following values: "failure", "up", "startup",
              "trap", or "traptimeout", and signifies the type of alert  which
              was triggered.

       MON_TRAP_INTENDED
              This is only set when an unknown mon trap is received and caught
              by the default/default watch/service.  This contains
              colon-separated entries of the trap’s intended watch group and
              service name.

       MON_LOGDIR
              The directory where log files should be placed, as indicated by
              the logdir global configuration variable.

       MON_STATEDIR
              The  directory where state files should be kept, as indicated by
              the statedir global configuration variable.

       The first line from standard input must be used as a brief  summary  of
       the problem, normally supplied as the subject line of an email, or text
       sent to an alphanumeric pager. Interpretation of all  subsequent  lines
       read  from  stdin  is  left  up  to  the  alerting  program.  The usual
       parameters are a list of recipients to  deliver  the  notification  to.
       The interpretation of the recipients is not specified, and is up to the
       alert program.
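
       An alert program's handling of these arguments and of standard input
       can be sketched in Python. This is not one of the alert scripts
       shipped with mon; the function names and dictionary keys are
       hypothetical. getopt is used instead of argparse because -h carries
       the host list here rather than requesting help.

```python
import getopt

def parse_alert_args(argv):
    """Parse the options the scheduler passes to an alert (illustrative)."""
    opts, recipients = getopt.getopt(argv, "s:g:h:l:Ot:Tu")
    parsed = {"timeout": False, "trap": False, "upalert": False,
              "recipients": recipients}  # trailing words from the config file
    valued = {"-s": "service", "-g": "group", "-l": "alertevery", "-t": "time"}
    flags = {"-O": "timeout", "-T": "trap", "-u": "upalert"}
    for flag, value in opts:
        if flag == "-h":
            parsed["hosts"] = value.split()  # space delimited, one shell word
        elif flag in flags:
            parsed[flags[flag]] = True
        else:
            parsed[valued[flag]] = value
    return parsed

def read_alert_input(stream):
    """Split alert input into the summary line and the remaining detail."""
    summary = stream.readline().rstrip("\n")
    detail = stream.read()
    return summary, detail
```

       A mail-style alert would typically use the summary as the subject
       line and the detail as the message body.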

CONFIGURATION FILE

       The configuration file consists of zero or more hostgroup  definitions,
       and  one  or more watch definitions. Each watch definition may have one
       or more service definitions. A line  beginning  with  optional  leading
       whitespace  and a pound ("#") is regarded as a comment, and is ignored.

       Lines are parsed as they are read.  Long  lines  may  be  continued  by
       ending  them  with a backslash ("\").  If a line is continued, then the
       backslash, the trailing whitespace after the backslash, and the leading
       whitespace  of  the  following  line  are  removed.  The  end result is
       assembled into a single line.
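
       The comment and continuation rules can be sketched as a small Python
       pre-processing step. The function name is hypothetical, and the order
       in which mon itself applies the two rules is an assumption: here a
       comment is only recognized at the start of a logical line.

```python
import re

_CONTINUED = re.compile(r"\\\s*$")   # backslash plus trailing whitespace
_COMMENT = re.compile(r"^\s*#")      # optional whitespace, then a pound sign

def logical_lines(lines):
    """Drop comment lines and join backslash-continued lines."""
    result = []
    pending = None
    for raw in lines:
        line = raw.rstrip("\n")
        if pending is None and _COMMENT.match(line):
            continue
        if pending is not None:
            # Remove the leading whitespace of the continuation line.
            line = pending + line.lstrip()
            pending = None
        if _CONTINUED.search(line):
            pending = _CONTINUED.sub("", line)
            continue
        result.append(line)
    if pending is not None:
        result.append(pending)       # file ended on a continued line
    return result
```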

   Global Variables
       The following variables may be set to  override  compiled-in  defaults.
       Command-line   options   will  have  a  higher  precedence  than  these
       definitions.

       alertdir = dir
              dir is the full path to the alert scripts. This is the value set
              by the -a command-line parameter.

              Multiple  alert paths may be specified by separating them with a
              colon.  Non-absolute paths are taken to be relative to the  base
              directory (/usr/lib/mon by default).

              When  the configuration file is read, all alerts referenced from
              the configuration will be looked up in each of these paths,  and
              the full path to the first instance of the alert found is stored
              in a hash. This hash is only generated upon startup or  after  a
              "reset"  command,  so  newly  added  alert  scripts  will not be
              recognized until a "reset" is performed.

       mondir = dir
              dir is the full path to the monitor scripts. This value may also
              be set by the -s command-line parameter.

              Multiple  monitor paths may be specified by separating them with
              a colon.  Non-absolute paths are taken to  be  relative  to  the
              base directory (/usr/lib/mon by default).

              When  the  configuration  file  is read, all monitors referenced
              from the configuration will be looked up in each of these paths,
              and  the full path to the first instance of the monitor found is
              stored in a hash. This hash is only generated  upon  startup  or
              after a "reset" command, so newly added monitor scripts will not
              be recognized until a "reset" is performed.

       statedir = dir
              dir is the full path to the  state  directory.   mon  uses  this
              directory to save various state information.

       logdir = dir
              dir  is  the  full  path  to  the  log directory.  mon uses this
              directory to save various logs, including the downtime log.

       basedir = dir
              dir is the full path for the state, script, and alert directory.

       cfbasedir = dir
              dir  is  the  full  path where all the config files can be found
              (monusers.cf, auth.cf, etc.).

       authfile = file
              file is the full path to the authentication file.

       authtype = type [type...]
              type is the type of authentication to use. A space-separated
              list of types may be specified, and they will be checked in the
              order they are listed. As soon as a successful authentication is
              performed, the user is considered authenticated by mon for the
              duration of the session and no more authentication checks are
              performed.

              If  type  is  getpwnam,  then  the  standard  Unix  passwd  file
              authentication method will be used  (calls  getpwnam(3)  on  the
              user  and  compares  the crypt(3)ed version of the password with
              what it gets from  getpwnam).  This  will  not  work  if  shadow
              passwords are enabled on the system.

              If  type  is  userfile,  then usernames and hashed passwords are
              read  from  userfile,  which  is  defined   via   the   userfile
              configuration variable.

              If type is pam, then PAM (pluggable authentication modules) will
              be used  for  authentication.   The  service  specified  by  the
              pamservice  global  will be used. If no global is given, the PAM
              passwd service will be used.

       userfile = file
              This file is used when authtype is set to userfile.  It consists
              of  a  sequence  of  lines  of the format ’username : password’.
              password is stored as the hash returned  by  the  standard  Unix
              crypt(3)  function.  NOTE: the format of this file is compatible
              with the Apache file based username/password file format. It  is
              possible  to  use  the  htpasswd program supplied with Apache to
              manage the mon userfile.

              Blank lines and lines beginning with # are ignored.
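
              Reading such a file can be sketched in Python as follows. The
              function name is hypothetical, and the sketch only parses the
              file; it does not verify passwords. As a small liberty, it
              also tolerates whitespace before a comment's #.

```python
def parse_userfile(text):
    """Map usernames to hashed passwords from userfile contents."""
    users = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                 # blank line or comment
        name, sep, hashed = stripped.partition(":")
        if not sep:
            continue                 # not a "username : password" line
        users[name.strip()] = hashed.strip()
    return users
```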

       pamservice = service
              The PAM service used for authentication. This is applicable only
              if "pam" is specified as a parameter to the authtype setting. If
              this global is not defined, it defaults to passwd.

       snmpport = portnum
              Set the SNMP port that the server binds to.

       serverbind = addr

       trapbind = addr

              serverbind and trapbind specify which address to bind the server
              and  trap ports to, respectively.  If these are not defined, the
              default address is INADDR_ANY, which allows connections  on  all
              interfaces.  For  security  reasons,  it could be a good idea to
              bind only to the loopback interface.

       snmp = {yes|no}
              Turn on/off SNMP support (currently unimplemented).

       dtlogfile = file
              file is a file which will be used to record the downtime log.
              Whenever a service fails for some amount of time and then stops
              failing, this event is written to the log. If this parameter is
              not set, no logging is done. The format of the file is as
              follows (# is a comment and may be ignored):

              timenoticed group service firstfail downtime interval summary.

              timenoticed is the time(2) the service came back up.

              group service is the group and service which failed.

              firstfail is the time(2) when the service began to fail.

              downtime is the number of seconds the service failed.

              interval is the frequency  (in  seconds)  that  the  service  is
              polled.

              summary is the summary line from when the service was failing.
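
              A reader for this format can be sketched in Python. The class
              and function names are hypothetical, and the numeric types are
              assumptions: the time(2) fields are read as integers, while
              downtime and interval are read as floats in case fractional
              values appear.

```python
from typing import NamedTuple

class DowntimeEntry(NamedTuple):
    timenoticed: int
    group: str
    service: str
    firstfail: int
    downtime: float
    interval: float
    summary: str

def parse_dtlog_line(line):
    """Parse one downtime log line; return None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # The summary is the rest of the line and may contain spaces.
    fields = line.split(None, 6)
    if len(fields) < 6:
        return None
    summary = fields[6] if len(fields) > 6 else ""
    return DowntimeEntry(int(fields[0]), fields[1], fields[2], int(fields[3]),
                         float(fields[4]), float(fields[5]), summary)
```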

       dtlogging = yes/no

              Turns downtime logging on or off. The default is off.

       histlength = num
              num is the maximum number of events to be retained in the
              history list. The default is 100.  This value may also be set by
              the -k command-line parameter.

       historicfile = file
              If  this  variable  is  set, then alerts are logged to file, and
              upon startup, some (or all) of the past  history  is  read  into
              memory.

       historictime = timeval
              timeval is the amount of the history file to read upon startup.
              "Now" - timeval is read. See the explanation of interval in the
              "Service Definitions" section for a description of timeval.

       serverport = port
              port is the TCP port number that the server should bind to. This
              value may also be set by the -p command-line parameter. Normally
              this  port is looked up via getservbyname(3), and it defaults to
              2583.

       trapport = port
              port is the UDP port number that the trap server should bind to.
              Normally  this  port  is  looked up via getservbyname(3), and it
              defaults to 2583.

       pidfile = path
              path is the file the server will store its pid in.  This value
              may also be set by the -P command-line parameter.

       maxprocs = num
              Throttles  the  number  of concurrently forked processes to num.
              The intent is to provide a safety net for the unlikely situation
              when  the  server tries to take on too many tasks at once.  Note
              that this situation has only been reported to happen when trying
              to  use  a  garbled  configuration file! You don’t want to use a
              garbled configuration file now, do you?

       cltimeout = secs
              Sets the client inactivity timeout to secs.  This  is  meant  to
              help  thwart  denial  of service attacks or recover from crashed
              clients.  secs is interpreted as a "1h/1m/1s" string, where "1m"
              = 60 seconds.

       randstart = interval
              When the server starts, normally no service is scheduled until
              the interval defined in its service section has elapsed.  This
              can cause long delays before the first check of a service, and
              possibly a high load on the server if multiple things are
              scheduled at the same intervals.  This option is used to
              randomize the scheduling of the first test for all services
              during the startup period, and immediately after the reset
              command.  If randstart is defined, the scheduled run time of all
              services of all watch groups will be a random number between
              zero and randstart seconds.

       dep_recur_limit = depth
              Limit dependency recursion level to depth.  If dependency
              recursion (dependencies which depend on other dependencies)
              tries to go beyond depth, then the recursion is aborted and a
              message is logged to syslog.  The default limit is 10.

       dep_behavior = {a|m}
              dep_behavior   controls   whether   the   dependency  expression
              suppresses either the running of alerts or monitors when a  node
              in  the  dependency graph fails. Read more about the behavior in
              the "Service Definitions" section below.

              This is a global setting which controls the default settings for
              the service-specified variable.

       syslog_facility = facility
              Specifies  the  syslog facility used for logging.  daemon is the
              default.

       startupalerts_on_reset = {yes|no}

              If set to "yes", startupalerts will be invoked  when  the  reset
              client command is executed. The default is "no".

   Hostgroup Entries
       Hostgroup entries begin with the keyword hostgroup, and are followed by
       a hostgroup tag and one or more hostnames or IP addresses, separated by
       whitespace.   The  hostgroup  tag  must  be  composed  of  alphanumeric
       characters, a dash ("-"), a period ("."), or an underscore ("_").  Non-
       blank  lines following the first hostgroup line are interpreted as more
       hostnames.  The hostgroup  definition  ends  with  a  blank  line.  For
       example:

              hostgroup servers nameserver smtpserver nntpserver
                   nfsserver httpserver smbserver

              hostgroup router_group cisco7000 agsplus
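
       The hostgroup rules can be sketched as a small Python parser. The
       function name is hypothetical, and the sketch assumes that comments
       and line continuations have already been handled and that the tag is
       valid.

```python
def parse_hostgroups(text):
    """Return a dict mapping each hostgroup tag to its list of hosts."""
    groups = {}
    current = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            current = None           # a blank line ends the definition
            continue
        if words[0] == "hostgroup" and len(words) >= 2:
            current = groups.setdefault(words[1], [])
            current.extend(words[2:])
        elif current is not None:
            current.extend(words)    # further non-blank lines add hosts
    return groups
```

       Applied to the example above, this yields six hosts for "servers" and
       two for "router_group".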

   Watch Group Entries
       Watch  entries  begin  with  a line that starts with the keyword watch,
       followed by whitespace and a single word which  normally  refers  to  a
       pre-defined  hostgroup.  If  the  second  word  is  not recognized as a
       hostgroup tag, a new hostgroup is created whose tag is that  word,  and
       that word is its only member.

       Watch entries consist of one or more service definitions.

       There  is  a  special  watch group entry called "default". If a default
       watch group is defined  with  a  "default"  service  entry,  then  this
       definition will be used in handling unknown mon traps.

   Service Definitions
       service servicename
              A service definition begins with the keyword service followed
              by a word which is the tag for this service.

              The components of a service are an interval, monitor, and one or
              more time period definitions, as defined below.

              If a service name of "default" is defined within a watch group
              called "default" (see above), then the default/default
              definition will be used for handling unknown mon traps.

       interval timeval
              The keyword interval followed by a time value specifies the
              frequency that a monitor script will be triggered.  Time values
              are defined as "30s", "5m", "1h", or "1d", meaning 30 seconds, 5
              minutes, 1 hour, or 1 day. The numeric portion may be a
              fraction, such as "1.5h", meaning an hour and a half. This
              format of a time specification will be referred to as timeval.
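
              The timeval format can be sketched as a conversion to seconds
              in Python. The function name is hypothetical, and rejecting a
              bare number without a unit suffix is an assumption of this
              sketch rather than documented mon behavior.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_TIMEVAL = re.compile(r"^(\d+(?:\.\d+)?)([smhd])$")

def parse_timeval(text):
    """Convert a timeval such as "30s" or "1.5h" to seconds."""
    match = _TIMEVAL.match(text.strip())
    if match is None:
        raise ValueError(f"not a timeval: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]
```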

       failure_interval timeval
              Adjusts the polling interval to timeval when the  service  check
              is failing. Resets the interval to the original when the service
              succeeds.

       traptimeout timeval
              This keyword takes  the  same  time  specification  argument  as
              interval,  and  makes the service expect a trap from an external
              source at least that often, else a failure will  be  registered.
              This is used for a heartbeat-style service.

       trapduration timeval
              If  a  trap  is received, the status of the service the trap was
              delivered to will normally remain constant. If  trapduration  is
              specified,  the  status  of the service will remain in a failure
              state for the duration specified by timeval, and then it will be
              reset to "success".

       randskew timeval
              Rather than schedule the monitor script to run at the start of
              each interval, randomly adjust the interval specified by the
              interval parameter by plus-or-minus randskew.  The skew value
              is specified in the same format as the interval parameter:
              "30s", "5m", etc.  For example, if interval is "1m" and
              randskew is "5s", then mon will schedule the monitor script
              some time between every 55 seconds and 65 seconds.  The intent
              is to help distribute the load on the server when many services
              are scheduled at the same intervals.
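
              The effect of randskew can be sketched in Python. The function
              name is hypothetical, both values are taken in seconds, and a
              uniform distribution is an assumption; the text above only
              states the plus-or-minus range.

```python
import random

def next_delay(interval, randskew, rng=random):
    """Return the delay in seconds until the next monitor run."""
    # Skew the interval by a random amount within +/- randskew.
    return interval + rng.uniform(-randskew, randskew)
```

              With interval 60 and randskew 5, every returned delay lies
              between 55 and 65 seconds.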

       monitor monitor-name [arg...]
              The keyword monitor followed by a script name and arguments
              specifies the monitor to run when the timer expires.  Shell-like
              quoting conventions are followed when specifying the arguments
              to send to the monitor script.  The script is invoked from the
              directory given with the -s argument, and all following words
              are supplied as arguments to the monitor program, followed by
              the list of hosts in the group referred to by the current watch
              group.  If the monitor line ends with ";;" as a separate word,
              the list of hosts is not appended to the argument list when the
              program is invoked.

       allow_empty_group
              The allow_empty_group option will allow a monitor to be  invoked
              even  when  the  hostgroup  for  that  watch is empty because of
              disabled hosts. The  default  behavior  is  not  to  invoke  the
              monitor when all hosts in a hostgroup have been disabled.

       description descriptiontext
              The text following description is queried by client programs
              and passed to alerts and monitors via the MON_DESCRIPTION
              environment variable. It should contain a brief description of
              the service, suitable for inclusion in an email or on a web
              page.

       exclude_hosts host [host...]
              Any hosts listed after exclude_hosts will be excluded  from  the
              service check.

       exclude_period periodspec
              Do  not  run  a  scheduled monitor during the time identified by
              periodspec.

       depend dependexpression
              The depend keyword is used to specify a  dependency  expression,
              which  evaluates  to either true or false, in the boolean sense.
              Dependencies are actual Perl  expressions,  and  must  obey  all
              syntactical  rules.  The  expressions are evaluated in their own
              package space so as to not accidentally have some unwanted side-
              effect.   If  a  syntax  error  is  found  when  evaluating  the
              expression, it is logged via syslog.

              Before evaluation, the following substitutions on the expression
              occur:  phrases  which look like "group:service" are substituted
              with the  value  of  the  current  operational  status  of  that
              specified  service.  These  opstatus  substitutions are computed
              recursively, so if service A depends upon service B, and service
              B depends upon service C, then service A depends upon service C.
              Successful operational statuses  (which  evaluate  to  "1")  are
              "STAT_OK",      "STAT_COLDSTART",      "STAT_WARMSTART",     and
              "STAT_UNKNOWN".  The word "SELF" (in all caps) can be  used  for
              the  group (e.g. "SELF:service"), and is an abbreviation for the
              current watch group.

              This feature can be used to control alerts  for  services  which
              are  dependent  on  other  services,  e.g. an SMTP test which is
              dependent upon the machine being ping-reachable.
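
              A minimal sketch of that SMTP case follows.  The  watch  group
              name  mailhosts, the intervals, and the mail address are assumed
              for illustration, and fping.monitor and smtp.monitor are assumed
              to be present in the monitor directory:

                     # mailhosts is a hypothetical hostgroup
                     watch mailhosts
                          service ping
                               interval 5m
                               monitor fping.monitor
                               period wd {Sun-Sat}
                                    alert mail.alert someone@your.org
                          service smtp
                               interval 10m
                               monitor smtp.monitor
                               depend SELF:ping
                               period wd {Sun-Sat}
                                    alert mail.alert someone@your.org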

       dep_behavior {a|m}
              The evaluation of dependency graphs can control the  suppression
              of either alert or monitor invocations.

              Alert  suppression.   If  this  option  is  set to "a", then the
              dependency expression will be evaluated after  the  monitor  for
              the  service  exits  or after a trap is received.  An alert will
              only be sent if the evaluation succeeds, meaning  that  none  of
              the nodes in the dependency graph indicate failure.

              Monitor  suppression.   If it is set to "m", then the dependency
              expression will be evaluated before the monitor for the  service
              is  about  to run.  If the evaluation succeeds, then the monitor
              will be run. Otherwise, the monitor will  not  be  run  and  the
              status of the service will remain the same.
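
              As an illustrative fragment, assuming a service named  ping  is
              defined  in  the same watch group and that smtp.monitor is
              installed, monitor suppression could be requested like this:

                     # "ping" is assumed to exist in the same watch group
                     service smtp
                          interval 10m
                          monitor smtp.monitor
                          depend SELF:ping
                          dep_behavior m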

   Period Definitions
       Periods  are used to define the conditions which should allow alerts to
       be delivered.

       period [label:] periodspec
              A period groups one or more alarms and variables  which  control
              how  often an alert happens when there is a failure.  The period
              keyword has two forms. The first takes an argument  which  is  a
              period  specification  from  Patrick  Ryan’s Time::Period Perl 5
              module. Refer to "perldoc Time::Period" for more information.

              The  second  form  requires  a  label  followed  by   a   period
              specification,  as  defined above. The label is a tag consisting
              of an alphabetic character or underscore  followed  by  zero  or
              more  alphanumerics or underscores and ending with a colon. This
              form allows multiple periods with the  same  period  definition.
              One  use  is to have a period definition which has no alertafter
              or alertevery parameters  for  a  particular  time  period,  and
              another  for the same time period with a different set of alerts
              that does contain those parameters.
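
              A sketch of that use, with  hypothetical  labels  (URGENT  and
              ROUTINE) and hypothetical recipients:

                     # labels and recipients are made up for illustration
                     period URGENT: wd {Mon-Fri} hr {9am-5pm}
                          alert mail.alert oncall@your.org
                     period ROUTINE: wd {Mon-Fri} hr {9am-5pm}
                          alertafter 3
                          alertevery 1h
                          alert mail.alert someone@your.org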

       alertevery timeval [observe_detail]
              The alertevery keyword (within a period  definition)  takes  the
              same  type  of argument as the interval variable, and limits the
              number of times an alert is sent when the service  continues  to
              fail.   For  example,  if  the  interval  is  "1h", then the
              alerts in the period section will be triggered only  once  every
              hour. If the alertevery keyword is omitted in a period entry, an
              alert will be sent out every time  a  failure  is  detected.  By
              default,  if  the  summary  output  of  two  successive failures
              changes, then the alertevery  interval  is  overridden,  and  an
              alert  will be sent.  If the string "observe_detail" is the last
              argument, then both the summary and detail output lines will  be
              considered  when  comparing  the  output of successive failures.
              Please refer to the ALERT DECISION LOGIC section for a  detailed
              explanation of how alerts are suppressed.
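
              For illustration only (the recipient is assumed),  a  period
              which  sends  at  most  one alert per hour unless the summary or
              detail output changes might read:

                     period wd {Sun-Sat}
                          alertevery 1h observe_detail
                          alert mail.alert someone@your.org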

       alertafter num

       alertafter num timeval

       alertafter timeval
              The  alertafter  keyword  (within  a  period  section) has three
              forms: only with the "num" argument, or with the  "num  timeval"
              arguments,  or  only  with the "timeval" argument.  In the first
              form, an alert will only  be  invoked  after  "num"  consecutive
              failures.

              In  the  second  form,  the  arguments  are  a  positive integer
              followed by an interval, as described by the  interval  variable
              above.   If  these parameters are specified, then the alerts for
              that period will only be called after that many failures  happen
              within  that  interval.  For example, if alertafter is given the
              arguments "3 30m", then the alert will be called if  3  failures
              happen within 30 minutes.

              In  the third form, the argument is an interval, as described by
              the interval variable above.  Alerts for that period  will  only
              be  called  if  the service has been in a failure state for more
              than the length of time described by the interval, regardless of
              the number of failures noticed within that interval.
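
              The three forms are shown together below  only  for  comparison;
              the values are examples, not recommendations:

                     # first form: after 3 consecutive failures
                     alertafter 3
                     # second form: after 3 failures within 30 minutes
                     alertafter 3 30m
                     # third form: after more than 30 minutes of failure
                     alertafter 30m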

       numalerts num
              This  variable  tells the server to call no more than num alerts
              during a failure. The alert counter  is  kept  on  a  per-period
              basis, and is reset upon each success.

       no_comp_alerts
              If  this  option  is  specified,  then  upalerts  will be called
              whenever the service state  changes  from  failure  to  success,
              rather than only after a corresponding "down" alert.

       alert alert [arg...]
              A  period  may contain multiple alerts, which are triggered upon
              failure of the service. An alert is  specified  with  the  alert
              keyword,  followed  by an optional exit parameter, and arguments
              which are interpreted the same as the  monitor  definition,  but
              without the ";;" exception. The exit parameter takes the form of
              exit=x or exit=x-y and has the effect that  the  alert  is  only
              called if the exit status of the monitor script falls within the
              range of the exit parameter. If, for example, the alert line  is
              "alert exit=10-20 mail.alert mis", then mail.alert will only  be
              invoked  with mis as its argument if the monitor program’s exit
              value  is  between 10 and 20. This feature allows you to trigger
              different alerts at different severity levels  (like  when  free
              disk space goes from 8% to 3%).

              See  the  ALERT  PROGRAMS  section  above  for  a  list  of  the
              parameters mon will pass automatically to alert programs.
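
              As a sketch, assuming a hypothetical monitor which exits with  a
              value  from  1 to 9 for a warning and from 10 to 20 for a
              critical condition, two severity levels could be  routed  to
              different recipients:

                     # exit ranges depend on the (hypothetical) monitor
                     period wd {Sun-Sat}
                          alert exit=1-9 mail.alert someone@your.org
                          alert exit=10-20 mail.alert mis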

       upalert alert [arg...]
              An upalert is the complement of an alert.  An upalert is  called
              when  a  service  makes  the  state  transition  from failure to
              success, if a corresponding "down" alert  was  previously  sent.
              The  upalert  script  is called supplying the same parameters as
              the alert script, with the addition of the -u parameter which is
              simply  used to let an alert script know that it is being called
              as an upalert. Multiple  upalerts  may  be  specified  for  each
              period  definition.  Set the per-period no_comp_alerts option to
              send an upalert regardless of whether a "down" alert was sent.
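
              An illustrative period (recipient assumed) in which  the  first
              two  failures  raise  no alert, but a recovery is still reported
              because no_comp_alerts is set:

                     period wd {Sun-Sat}
                          alertafter 3
                          no_comp_alerts
                          alert mail.alert someone@your.org
                          # mon itself adds -u when calling an upalert
                          upalert mail.alert someone@your.org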

       startupalert alert [arg...]
              A  startupalert  is  only  called  when  the  mon  server starts
              execution.

       upalertafter timeval
              The upalertafter parameter is specified as a string that follows
              the  syntax  of  the interval parameter ("30s", "1m", etc.), and
              controls the triggering of an upalert.  If a service comes  back
              up  after  being  down  for  a time greater than or equal to the
              value of this option, an upalert will be called. Use this option
              to prevent upalerts from being called because of "blips" (brief
              outages).
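
              For example, with an assumed threshold of five minutes and  an
              assumed recipient:

                     period wd {Sun-Sat}
                          alert mail.alert someone@your.org
                          upalert mail.alert someone@your.org
                          # outages shorter than 5 minutes send no upalert
                          upalertafter 5m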

AUTHENTICATION CONFIGURATION FILE

       The file specified by the authfile variable in the  configuration  file
       (or  passed  via  the  -A parameter) will be loaded upon startup.  This
       file defines restrictions upon which client commands may be executed by
       which  users.  It  is  a  text file which consists of comments, command
       definitions, and trap authentication parameters.  A comment line begins
       with optional whitespace followed by a pound sign.   Blank  lines  are
       ignored.

       The file is separated into  a  command  section  and  a  trap  section.
       Sections are specified by a single line containing one of the following
       statements:

                   command section

       or

                   trap section

       Lines following one of the above statements apply to that section until
       either the end of the file or another section begins.

       A  command  definition  consists  of  a  command,  followed by a colon,
       followed by a  comma-separated  list  of  users  who  may  execute  the
       command.   The default is that no users may execute any commands unless
       they are explicitly allowed in this configuration file. For clarity,  a
       user  can  be  denied  by prefixing the user name with "!". If the word
       "AUTH_ANY" is used for a username, then any authenticated user will  be
       allowed to execute the command.

       The  trap  section  allows  configuration of which users may send traps
       from which hosts. The syntax is a source host  (name  or  ip  address),
       whitespace,  a  username, whitespace, and a plaintext password for that
       user. If the source host is "*", then allow traps from any host. If the
       username  is  "*", then accept traps without regard for the username or
       password. If no hosts or users are specified, then  no  traps  will  be
       accepted.

       An example configuration file:

              command section
              list:          all
              reset:         root,admin
              loadstate:          root
              savestate:          root

              trap section
              127.0.0.1 root r@@tp4sswrd

       This  means  that  all  clients  are  able to perform the list command,
       "root" is able to perform "reset", "loadstate", and  "savestate",  and
       "admin" is able to execute the "reset" command.

CLIENT-SERVER INTERFACE

       The  server listens on TCP port 2583, which may be overridden using the
       -p port option. Commands are  a  single  line  each,  terminated  by  a
       newline.   Currently the server is iterative, accepting a single client
       at a time. This will change in future releases.

CLIENT INTERFACE COMMANDS

       See manual page for moncmd.

MON TRAPPING

       Mon has the facility to receive special "mon traps" from any  local  or
       remote  machine.  Currently,  the only available method for sending mon
       traps is through the Mon::Client Perl interface, though the UDP  packet
       format  is  defined well enough to permit the writing of traps in other
       languages.

       Traps are handled similarly to monitors: a trap  sends  an  operational
       status,  summary line, and description text, and mon generates an alert
       or upalert as necessary.

       Traps can be caught by any  watch/service  group  set  up  in  the  mon
       configuration  file;  however,  it  is  suggested  that  you  configure
       watch/service groups specifically for the traps you expect to  receive.
       When defining a special watch/service group for traps, do not include a
       "monitor" directive (as no monitor need be invoked). Since a monitor is
       not being invoked, it is not necessary for the watch definition to have
       a hostgroup which contains real host names.   Just  make  up  a  useful
       name, and mon will automatically create the watch group for you.

       Here is a simple config file example:

              watch trap-service
                   service host1-disks
                        description TRAP: for host1 disk status
                        period wd {Sun-Sat}
                             alert mail.alert someone@your.org
                             upalert mail.alert -u someone@your.org

       Since  mon  listens  on  a UDP port for any trap, a default facility is
       available for handling traps to unknown groups or services.  To  enable
       this  facility,  you  must  include  a  "default"  watch  group  with a
       "default" service entry containing  the  specifics  of  alarms.   If  a
       default/default  watch  group  and  service  are  not  configured, then
       unknown traps get logged via syslog, and no alarm is sent.   NOTE:  The
       default/default  facility  is  a single entity as far as accounting and
       alarming go. Alarm programs which are not aware of this fact  may  send
       confusing  information  when  a  failure  trap  comes from one machine,
       followed by a success (ok) trap from a different machine. See the alarm
       environment  variable MON_TRAP_INTENDED above for a possible way around
       this. It is intended that default/default be  used  as  a  facility  to
       catch  unknown  traps, and should not be relied upon to catch all traps
       in a production environment. If you are  lazy  and  only  want  to  use
       default/default  for  catching  all  traps, it would be best to disable
       upalerts, and use the MON_TRAP_INTENDED environment variable  in  alert
       scripts to make the alerts more meaningful to you.

       Here is an example default facility:

              watch default
                   service default
                        description Default trap service
                        period wd {Sun-Sat}
                             alert mail.alert someone@your.org
                             upalert mail.alert -u someone@your.org

EXAMPLES

       The  mon  distribution  comes  with  an  example  configuration  called
       example.cf.  Refer to that file for more information.

SEE ALSO

       moncmd(1), Time::Period(3pm), Mon::Client(3pm)

HISTORY

       mon was written because I couldn’t find anything  out  there  that  did
       just what I needed, and nothing was worth modifying to add the features
       I wanted. It doesn’t have a cool name, and that bothers  me  because  I
       couldn’t think of one.

BUGS

       Report bugs to the email address below.

AUTHOR

       Jim Trocki <trockij@transmeta.com>