Provided by: hobbit_4.2.0.dfsg-16build1_i386

NAME

       hobbit-clients.cfg - Configuration file for the hobbitd_client module

SYNOPSIS

       ~hobbit/server/etc/hobbit-clients.cfg

DESCRIPTION

       The  hobbit-clients.cfg  file  controls  what  color is assigned to the
       status-messages that are  generated  from  the  Hobbit  client  data  -
       typically  the  cpu,  disk,  memory,  procs- and msgs-columns. Color is
       decided on the basis of some settings defined in  this  file;  settings
       apply to specific hosts through a set of rules.

       Note:  This  file is only used on the Hobbit server - it is not used by
       the Hobbit client, so there is no need to distribute it to your  client
       systems.

FILE FORMAT

       Blank  lines  and  lines  starting  with a hash mark (#) are treated as
       comments and ignored.
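
       As a minimal sketch of this layout (the threshold values are  arbitrary
       and  chosen  only  for  illustration),  a fragment with a comment line,
       a blank line and one setting could look like this:

       ```cfg
       # Load thresholds applied to all hosts

       LOAD 5.0 10.0
       ```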

CPU STATUS COLUMN SETTINGS

       LOAD warnlevel paniclevel

       If the system load  exceeds  "warnlevel"  or  "paniclevel",  the  "cpu"
       status  will go yellow or red, respectively. These are decimal numbers.

       Defaults: warnlevel=5.0, paniclevel=10.0

       UP bootlimit toolonglimit

       The cpu status goes yellow if the system has  been  up  for  less  than
       "bootlimit"  time,  or  longer  than  "toolonglimit".  The  time  is in
       minutes, or you can add h/d/w for hours/days/weeks - e.g. "2h" for  two
       hours, or "4w" for four weeks.

       Defaults: bootlimit=1h, toolonglimit=-1 (infinite).

       CLOCK max.offset

       The  cpu  status  goes yellow if the system clock on the client differs
       more than "max.offset" seconds from that of  the  Hobbit  server.  Note
       that  this is not a particularly accurate test, since it is affected by
       network delays between the client and the server, and the load on  both
       systems.  You  should therefore not rely on this being accurate to more
       than +/- 5 seconds, but it will let you catch a client clock that  goes
       completely wrong. The default is NOT to check the system clock.
       NOTE: Correct operation of this test obviously requires that the system
       clock of the Hobbit server is correct. You should therefore  make  sure
       that the Hobbit server is synchronized to the real clock, e.g. by using
       NTP.

       Example: Go yellow if the load average exceeds 5, and red if it exceeds
       10.  Also,  go  yellow for 10 minutes after a reboot, and after 4 weeks
       uptime. Finally, check that the system clock  is  at  most  15  seconds
       offset from the clock of the Hobbit system.

              LOAD 5 10
              UP 10m 4w
              CLOCK 15

DISK STATUS COLUMN SETTINGS

       DISK filesystem warnlevel paniclevel
       DISK filesystem IGNORE

       If the utilization of "filesystem" is reported to exceed "warnlevel" or
       "paniclevel", the "disk" status will go yellow  or  red,  respectively.
       "warnlevel"  and  "paniclevel"  are  either the percentage used, or the
       space available as reported by the local "df" command on the host.  For
       the  latter  type  of  check,  the  "warnlevel" must be followed by the
       letter "U", e.g. "1024U".

       The special keyword "IGNORE"  causes  this  filesystem  to  be  ignored
       completely,  i.e. it will not appear in the "disk" status column and it
       will not be tracked in a graph.  This  is  useful  for  e.g.  removable
       devices, backup-disks and similar hardware.

       "filesystem"  is  the mount-point where the filesystem is mounted, e.g.
       "/usr"  or  "/home".  A  filesystem-name  that  begins  with   "%"   is
       interpreted    as    a   Perl-compatible   regular   expression;   e.g.
       "%^/oracle.*/" will match any filesystem whose mountpoint  begins  with
       "/oracle".

       Defaults: warnlevel=90%, paniclevel=95%
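
       The following sketch  illustrates  the  percentage,  regular-expression
       and IGNORE forms described above. The mount points "/home" and
       "/mnt/backup" and all threshold values are hypothetical; only the
       "%^/oracle.*/" pattern is taken from the text.

       ```cfg
       # Yellow above 80% used, red above 90% used
       DISK /home 80 90
       # Any filesystem whose mountpoint begins with /oracle
       DISK %^/oracle.*/ 95 98
       # Do not report or graph this filesystem at all
       DISK /mnt/backup IGNORE
       ```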

MEMORY STATUS COLUMN SETTINGS

       MEMPHYS warnlevel paniclevel
       MEMACT warnlevel paniclevel
       MEMSWAP warnlevel paniclevel

       If  the memory utilization exceeds the "warnlevel" or "paniclevel", the
       "memory" status will change to yellow or red, respectively.  Note:  The
       words "PHYS", "ACT" and "SWAP" are also recognized.

       Example:  Go yellow if more than 20% swap is used, and red if more than
       40% swap is used or the actual memory utilisation  exceeds  90%.  Don't
       alert on physical memory usage.

              MEMSWAP 20 40
              MEMACT 90 90
              MEMPHYS 101 101

       Defaults:

              MEMPHYS warnlevel=100 paniclevel=101 (i.e. it will never go red).
              MEMSWAP warnlevel=50 paniclevel=80
              MEMACT  warnlevel=90 paniclevel=97

PROCS STATUS COLUMN SETTINGS

       PROC processname minimumcount maximumcount color [TRACK=id] [TEXT=text]

       The "ps" listing sent by the  client  will  be  scanned  for  how  many
       processes  containing  "processname"  are  running,  and  this  is then
       matched against the min/max settings defined here. If the running count
       is  outside  the thresholds, the color of the "procs" status changes to
       "color".

       To check for a process that  must  NOT  be  running:  Set  minimum  and
       maximum to 0.

       "processname"  can  be  a simple string, in which case this string must
       show up in the "ps" listing as a command. The scanner will find  a  ps-
       listing  of  e.g. "/usr/sbin/cron" if you only specify "processname" as
       "cron".   "processname"   can  also  be  a  Perl-compatible   regular
       expression,  e.g.   "%java.*inst[0123]"  can be used to find entries in
       the ps-listing for "java -Xmx512m inst2" and "java -Xmx256  inst3".  In
       that  case,  "processname"  must begin with "%" followed by the regular
       expression.  Note that  Hobbit  defaults  to  case-insensitive  pattern
       matching; if that is not what you want, put "(?-i)" between the "%" and
       the regular expression to turn this off. E.g. "%(?-i)HTTPD" will  match
       the word HTTPD only when it is upper-case.
       If  "processname" contains whitespace (blanks or TAB), you must enclose
       the full string in double quotes - including the "%" if you use regular
       expression matching. E.g.

           PROC "%hobbitd_channel --channel=data.*hobbitd_rrd" 1 1 yellow

       or

           PROC "java -DCLASSPATH=/opt/java/lib" 2 5

       You  can  have  multiple  "PROC"  entries for the same host, all of the
       checks are merged into the "procs" status and  the  most  severe  check
       defines the color of the status.

       The  optional  TRACK=id  setting  causes  Hobbit to track the number of
       processes found in an RRD file, and put this  into  a  graph  which  is
       shown  on  the  "procs" status display. The id setting is a simple text
       string which will be used as the legend for the graph, and also as part
       of  the  RRD  filename. It is recommended that you use only letters and
       digits for the ID.
       Note that the tracked process counts are sampled once per  client  poll
       cycle - i.e. the counts represent snapshots of the system state, not an
       average value over the poll cycle. Therefore there may be peaks or dips
       in  the  actual  process  counts which will not show up in the graphs,
       because they happen between polls.

       The  optional  TEXT=text  setting is used in the summary of the "procs"
       status. Normally, the summary will show the "processname"  to  identify
       the process and the related count and limits. But this may be a regular
       expression which is not easily recognizable, so if  defined,  the  text
       setting  string  will  be  used  instead. This only affects the "procs"
       status display - it has no effect on how the rule counts or  recognizes
       processes in the "ps" output.

       Example: Check that "cron" is running:
            PROC cron

       Example:  Check  that at least 5 "httpd" processes are running, but not
       more than 20:
            PROC httpd 5 20

       Defaults:
            mincount=1, maxcount=-1 (unlimited), color="red".
            Note that no processes are checked by default.
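
       As a sketch of the  less  common  forms  described  above:  the  first
       line  checks  for  a process that must NOT be running, the second uses
       the regular expression from the text together with TRACK and TEXT. The
       process name "telnetd", the counts, and the id "javainst" are
       hypothetical, and the option order simply follows the synopsis.

       ```cfg
       # Red if any telnetd process is found (minimum and maximum both 0)
       PROC telnetd 0 0 red
       # Yellow unless 1 to 4 matching java processes run; graph the count
       PROC %java.*inst[0123] 1 4 yellow TRACK=javainst TEXT=Java
       ```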

MSGS STATUS COLUMN SETTINGS

       LOG logfilename pattern [COLOR=color] [IGNORE=excludepattern]

       The Hobbit client extracts interesting lines from one or more  logfiles
       -  see  the  client-local.cfg(5)  man-page for information about how to
       configure which logs a client should look at.

       The LOG setting determines  how  these  extracts  of  log  entries  are
       processed, and what warnings or alerts trigger as a result.

       "logfilename"  is  the  name  of the logfile. Only logentries from this
       filename will be matched against this rule.   Note  that  "logfilename"
       can be a regular expression (if prefixed with a ’%’ character).

       "pattern"  is  a  string  or  regular  expression.  If the logfile data
       matches "pattern", it will trigger the "msgs" column to  change  color.
       If no "color" parameter is present, the default is to go "red" when the
       pattern is matched. To match against a  regular  expression,  "pattern"
       must begin with a ’%’ sign - e.g. "%WARNING|NOTICE" will match any lines
       containing either of these two words.  Note  that  Hobbit  defaults  to
       case-insensitive  pattern  matching;  if that is not what you want, put
       "(?-i)" between the "%" and the regular expression to  turn  this  off.
       E.g. "%(?-i)WARNING" will match the word WARNING only when it is upper-
       case.

       "excludepattern" is a string or regular expression that can be used  to
       filter out any unwanted strings that happen to match "pattern".

       Example:  Trigger  a  red  alert when the string "ERROR" appears in the
       "/var/adm/syslog" file:
            LOG /var/adm/syslog ERROR

       Example: Trigger a yellow  warning  on  all  occurrences  of  the  word
       "WARNING"  or  "NOTICE" in the "daemon.log" file, except those from the
       "lpr" system:
            LOG /var/log/daemon.log %WARNING|NOTICE COLOR=yellow IGNORE=lpr

       Defaults:
            color="red", no "excludepattern".

       Note that no logfiles are checked by default. Any log data reported  by
       a client will just show up on the "msgs" column with status OK (green).
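
       A further sketch, combining a regular expression for "logfilename" with
       case-sensitive  pattern  matching. The log path, the words FATAL and
       PANIC, and the "selftest" exclusion are hypothetical and only serve to
       illustrate the syntax described above.

       ```cfg
       # Red when FATAL or PANIC (upper-case only) appears in any matching
       # log file, except on lines that also contain "selftest"
       LOG %^/var/log/app.*\.log %(?-i)FATAL|PANIC COLOR=red IGNORE=selftest
       ```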

FILES STATUS COLUMN SETTINGS

       FILE filename [color] [things to check] [TRACK]

       DIR directoryname [color] [size<MAXSIZE] [size>MINSIZE] [TRACK]

       These entries control the status of the "files" column. They allow  you
       to check on various data for files and directories.

       filename  and  directoryname  are names of files or directories, with a
       full path. You can use a regular expression to match the names of files
       and  directories  reported  by the client, if you prefix the expression
       with a ’%’ character.

       color is the color that triggers when one or more of the checks fail.

       The TRACK keyword causes the size  of  the  file  or  directory  to  be
       tracked  in an RRD file, and presented in a graph on the "files" status
       display.

       For files, you can check one or more of the following:

       noexist
              triggers a warning if the file exists. By default, a warning  is
              triggered  for  files  that  have a FILE entry, but which do not
              exist.

       type=TYPE
              where TYPE is one of "file", "dir", "char", "block", "fifo",  or
              "socket".  Triggers  warning if the file is not of the specified
              type.

       ownerid=OWNER
              triggers a warning if the owner does not match  what  is  listed
              here.   OWNER  is  specified either with the numeric uid, or the
              user name.

       groupid=GROUP
              triggers a warning if the group does not match  what  is  listed
              here.   GROUP  is  specified either with the numeric gid, or the
              group name.

       mode=MODE
              triggers a warning if the file permissions are  not  as  listed.
              MODE  is written in the standard octal notation, e.g.  "644" for
              the rw-r--r-- permissions.

       size<MAX.SIZE and size>MIN.SIZE
              triggers a warning if the file size is greater than MAX.SIZE  or
              less than MIN.SIZE, respectively. For filesizes, you can use the
              letters "K", "M", "G" or "T" to indicate that the filesize is in
              Kilobytes,  Megabytes,  Gigabytes or Terabytes, respectively. If
              there is no such modifier, Kilobytes is assumed. E.g. to warn if
              a  file grows larger than 1MB, use size<1M (or, equivalently,
              size<1024K).

       mtime>MIN.MTIME mtime<MAX.MTIME
              checks  how  long  ago  the file was last modified (in seconds).
              E.g.  to check if a file was updated within the past 10  minutes
              (600  seconds):  mtime<600. Or to check that a file has NOT been
              updated in the past 24 hours: mtime>86400.

       mtime=TIMESTAMP
              checks if a file was last modified at TIMESTAMP.  TIMESTAMP is a
              unix epoch time (seconds since midnight Jan 1 1970 UTC).

       ctime>MIN.CTIME, ctime<MAX.CTIME, ctime=TIMESTAMP
              act  like the mtime checks, but for the ctime timestamp (when the
              directory entry of the file was last  changed,  e.g.  by  chown,
              chgrp or chmod).

       md5=MD5SUM, sha1=SHA1SUM, rmd160=RMD160SUM
              triggers a warning if the file checksum, computed with the  MD5,
              SHA1  or  RMD160  message digest algorithm, does not match the
              one configured here. Note: The "file" entry in the
              client-local.cfg(5) file must specify which algorithm to use.

       For directories, you can check one or more of the following:

       size<MAX.SIZE and size>MIN.SIZE
              triggers a  warning  if  the  directory  size  is  greater  than
              MAX.SIZE  or  less  than MIN.SIZE, respectively. Directory sizes
              are reported in whatever unit the du command on the client  uses
              -  often  KB  or  diskblocks  - so MAX.SIZE and MIN.SIZE must be
              given in the same unit.

       Experience shows that it can be difficult to get  these  rules  right,
       especially  when  defining  minimum/maximum  values for file sizes or
       modification times. The one thing you must remember  when  setting  up
       these  checks  is  that the rules describe criteria that must be met -
       only when they are met will the status be green.

       So "mtime<600" means "the difference between current time and the mtime
       of  the  file  must  be less than 600 seconds - if not, the file status
       will go red".
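
       The following sketch puts several of the checks above  together.  All
       file and directory names and all limits are hypothetical; remember
       that each check states a condition that must hold for the status to
       stay green.

       ```cfg
       # Must be a regular file with mode 644, modified within the last
       # 10 minutes; track its size in a graph
       FILE /var/log/app.log yellow type=file mode=644 mtime<600 TRACK
       # Must NOT exist
       FILE /etc/nologin red noexist
       # Directory must stay below 50000, in the unit reported by the
       # client's du command
       DIR /var/spool/mqueue yellow size<50000 TRACK
       ```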

PORTS STATUS COLUMN SETTINGS

       PORT criteria [MIN=mincount]  [MAX=maxcount]  [COLOR=color]  [TRACK=id]
       [TEXT=displaytext]

       The  "netstat"  listing sent by the client will be scanned for how many
       sockets match the criteria listed.  The criteria you can use are:

       LOCAL=addr
              "addr" is a (partial) local address specification in the  format
              used on the output from netstat.

       EXLOCAL=addr
              Exclude certain local addresses from the rule.

       REMOTE=addr
              "addr" is a (partial) remote address specification in the format
              used on the output from netstat.

       EXREMOTE=addr
              Exclude certain remote addresses from the rule.

       STATE=state
              Causes only the sockets in the specified state to  be  included.
              "state"  is  usually LISTEN or ESTABLISHED but can be any socket
              state reported by the client's "netstat" command.

       EXSTATE=state
              Exclude certain states from the rule.

       "addr" is typically "10.0.0.1:80" for the IP  10.0.0.1,  port  80.   Or
       "*:80"  for  any  local  address, port 80. Note that the Hobbit clients
       normally report only the numeric data for IP addresses and port numbers,
       so  you must specify the port number (e.g. "80") instead of the service
       name ("www").
       "addr" and "state" can also be a  Perl-compatible  regular  expression,
       e.g.   "LOCAL=%[.:](80|443)" can be used to find entries in the netstat
       local port for both http (port 80) and https (port 443). In that  case,
       "addr"  or  "state"  must  begin  with  "%"  followed  by  the regular
       expression.

       The  socket  count  found  is then matched against the min/max settings
       defined here. If the count is outside the thresholds, the color of  the
       "ports" status changes to "color".  To check for a socket that must NOT
       exist: Set minimum and maximum to 0.

       The optional TRACK=id setting causes Hobbit  to  track  the  number  of
       sockets  found in an RRD file, and put this into a graph which is shown
       on the "ports" status display. The id setting is a simple  text  string
       which will be used as the legend for the graph, and also as part of the
       RRD filename. It is recommended that you use only  letters  and  digits
       for the ID.
       Note that the tracked socket counts are sampled once  per  client  poll
       cycle - i.e. the counts represent snapshots of the system state, not an
       average value over the poll cycle. Therefore there may be peaks or dips
       in  the  actual  socket  counts  which will not show up in the graphs,
       because they happen between polls.

       The TEXT=displaytext option affects how the port appears on the "ports"
       status page. By default, the port is listed with the local/remote/state
       rules  as  identification,  but  this  may  be  somewhat  difficult  to
       understand.  You  can  then  use e.g. "TEXT=Secure Shell" to make these
       ports appear with the name "Secure Shell" instead.

       Defaults: mincount=1, maxcount=-1 (unlimited), color="red".   Note:  No
       ports are checked by default.

       Example:  Check  that the SSH daemon is listening on port 22. Track the
       number of active SSH connections, and warn if there are more than 5.
               PORT LOCAL=%[.:]22$ STATE=LISTEN "TEXT=SSH listener"
               PORT LOCAL=%[.:]22$ STATE=ESTABLISHED MAX=5 TRACK=ssh TEXT=SSH

CHANGING THE DEFAULT SETTINGS

       If you would like to use different defaults for the settings  described
       above,  then you can define the new defaults after a DEFAULT line. E.g.
       this would explicitly define all of the default settings:

              DEFAULT
                   UP      1h
                   LOAD    5.0 10.0
                   DISK    * 90 95
                   MEMPHYS 100 101
                   MEMSWAP 50 80
                   MEMACT  90 97

RULES TO SELECT HOSTS

       All of the settings can be applied to a group of  hosts,  by  preceding
       them  with  rules.  A rule consists of one or more filters using these
       keywords (note that this is identical to the rule definitions  used  in
       the hobbit-alerts.cfg(5) file).

       PAGE=targetstring
              Rule matching a host by the name of the page in BB.
              "targetstring" is the path of the page as defined in the
              bb-hosts file. E.g. if you have this setup:

                     page servers All Servers
                     subpage web Webservers
                     10.0.0.1 www1.foo.com
                     subpage db Database servers
                     10.0.0.2 db1.foo.com

              then the "All Servers" page is found with PAGE=servers, the
              "Webservers" page is PAGE=servers/web and the "Database servers"
              page is PAGE=servers/db. Note that you can also use regular
              expressions to specify the page name, e.g. PAGE=%.*/db would
              find the "Database servers" page regardless of where this page
              was placed in the hierarchy.

              The top-level page has the fixed name /, e.g. PAGE=/ would match
              all hosts on the Hobbit frontpage. If you need it in a regular
              expression, use PAGE=%^/ to avoid matching the forward-slash
              present in subpage-names.

       EXPAGE=targetstring
              Rule excluding a host if the pagename matches.

       HOST=targetstring
              Rule matching a host by the hostname. "targetstring" is either a
              comma-separated list of hostnames (from the bb-hosts file), "*"
              to indicate "all hosts", or a Perl-compatible regular
              expression. E.g. "HOST=dns.foo.com,www.foo.com" identifies two
              specific hosts; "HOST=%www.*.foo.com EXHOST=www-test.foo.com"
              matches all hosts with a name beginning with "www", except the
              "www-test" host.

       EXHOST=targetstring
              Rule excluding a host by matching the hostname.

       CLASS=classname
              Rule matching by the client class-name. You specify the
              class-name for a host when starting the client through the
              "--class=NAME" option to the runclient.sh script. If no class is
              specified, the host by default goes into a class named by the
              operating system.

       EXCLASS=classname
              Exclude all hosts belonging to "classname" from this rule.

       TIME=timespecification
              Rule matching by the time-of-day. This is specified as the
              DOWNTIME time specification in the bb-hosts file. E.g.
              "TIME=W:0800:2200" applied to a rule will make this rule active
              only on week-days between 8AM and 10PM.

DIRECTING ALERTS TO GROUPS

       For some tests - e.g. "procs" or "msgs" - the right group of people  to
       alert  in case of a failure may be different, depending on which of the
       client rules actually detected a problem. E.g. if you have  PROC  rules
       for  a  host  checking  both "httpd" and "sshd" processes, then the Web
       admins  should  handle  httpd-failures,  whereas  "sshd"  failures  are
       handled by the Unix admins.

       To handle this, all rules can have a "GROUP=groupname" setting.  When a
       rule with this setting triggers a yellow or red status,  the  groupname
       is  passed  on  to  the  Hobbit alerts module, so you can use it in the
       alert rule definitions in hobbit-alerts.cfg(5) to direct alerts to  the
       correct group of people.
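
       A sketch of the httpd/sshd case above. The host name, the process
       counts and the group names "webadmins" and "unixadmins" are
       hypothetical, and placing GROUP= at the end of the line next to the
       HOST= rule is an assumption based on the description, not a confirmed
       requirement.

       ```cfg
       # httpd problems are flagged for the Web admins
       PROC httpd 5 20 HOST=www1.foo.com GROUP=webadmins
       # sshd problems are flagged for the Unix admins
       PROC sshd 1 5 HOST=www1.foo.com GROUP=unixadmins
       ```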

RULES: APPLYING SETTINGS TO SELECTED HOSTS

       Rules must be placed after the settings, e.g.

              LOAD 8.0 12.0  HOST=db.foo.com TIME=*:0800:1600

       If you have multiple settings that you want to apply the same rules to,
       you  can  write  the rules alone on one line, followed by the settings
       on the lines below it. E.g.

              HOST=%db.*.foo.com TIME=W:0800:1600
                   LOAD 8.0 12.0
                   DISK /db  98 100
                   PROC mysqld 1

       will  apply  the  three  settings to all of the "db" hosts on week-days
       between 8AM and 4PM. This can be combined with  per-setting  rules,  in
       which case the per-setting rule overrides the general rule; e.g.

              HOST=%.*.foo.com
                   LOAD 7.0 12.0 HOST=bax.foo.com
                   LOAD 3.0 8.0

       will  result  in  the  load-limits being 7.0/12.0 for the "bax.foo.com"
       host, and 3.0/8.0 for all other foo.com hosts.

       The entire file is evaluated from the top  to  bottom,  and  the  first
       match found is used. So you should put the specific settings first, and
       the generic ones last.
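
       As a sketch of this ordering, the fragment below lists  the  narrowest
       rule  first and the catch-all last. The host name and load limits are
       hypothetical; the page path reuses the bb-hosts example shown earlier,
       and "HOST=*" is used as the "all hosts" filter.

       ```cfg
       # Most specific: one database host, week-days during working hours
       LOAD 8.0 12.0 HOST=db1.foo.com TIME=W:0800:1600

       # Less specific: every host on the "Database servers" page
       PAGE=servers/db
            LOAD 6.0 10.0
            PROC mysqld 1

       # Generic fallback for all remaining hosts
       HOST=*
            LOAD 5.0 10.0
       ```

       With this order, db1.foo.com gets the 8.0/12.0 limits during the
       stated hours, because its own line is matched before the page-wide
       and catch-all rules are reached.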

NOTES

       For the LOG, FILE and DIR checks, it is necessary also to configure the
       actual  file-  and  directory-names in the client-local.cfg(5) file. If
       the filenames are not listed there, the clients will  not  collect  any
       data  about  these  files/directories,  and the settings in the hobbit-
       clients.cfg file will be silently ignored.

       The ability to compute file checksums with MD5, SHA1 or  RMD160  should
       not  be  used  for  general-purpose  file integrity checking, since the
       overhead of calculating these  on  a  large  number  of  files  can  be
       significant.  If you need this, look at tools designed for this purpose
       - e.g. Tripwire or AIDE.

       At the time of writing (April 2006), the SHA-1  and  RMD160  algorithms
       are considered cryptographically safe. The MD5 algorithm has been shown
       to have some weaknesses, and is not considered  strong  enough  when  a
       high level of security is required.

SEE ALSO

       hobbitd_client(8), client-local.cfg(5), hobbitd(8), hobbit(7)