Provided by: watchdog_5.16-1.1_amd64

NAME

       watchdog.conf - configuration file for the watchdog daemon

DESCRIPTION

       This file carries all configuration options for the Linux watchdog daemon. Each option
       must be written on its own line. Comments start with '#'. Blanks are ignored except after
       the '=' sign. Leaving the value after the '=' sign empty disables the feature, where that
       makes sense.

OPTIONS

       interval = <interval>
              Set the highest possible interval between two writes to the watchdog device. The
              device is triggered after each check regardless of the time it took. After
              finishing all checks, watchdog sleeps for a full cycle of <interval> seconds.
              Default value is 1 second. Kernel drivers typically expect a write every minute;
              otherwise the system is rebooted. Therefore an interval of more than a minute can
              only be used with the force command-line option [--force | -f].

       logtick = <logtick>
              If you enable verbose logging, a message is written to the syslog or a logfile.
              While this is useful, logging a message every interval is unnecessary: it fills
              up the disk and costs CPU time. logtick sets the number of intervals skipped
              before a log message is written. With logtick = 60 and interval = 10, a message
              is written only every 10 minutes (600 seconds). This may make the exact time of a
              crash harder to find, but it greatly reduces disk usage and spares the
              administrator's nerves when looking for a particular syslog entry among the
              watchdog messages.
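
              The arithmetic in the example above can be checked with a small shell helper.
              This is only a sketch; the function name log_period is hypothetical and not part
              of the watchdog package.

```shell
#!/bin/sh
# Seconds between verbose log messages, given the values of
# "interval" (seconds) and "logtick" (number of intervals).
log_period() {
    interval=$1
    logtick=$2
    echo $(( interval * logtick ))
}

log_period 10 60   # the example from the text: prints 600
```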

       max-load-1 = <load1>
              Set the maximum allowed load average over a 1 minute span. Once this load average
              is reached, the system is rebooted. Default value is 0, which means the load
              average check is disabled. Be careful not to set this parameter too low. To set a
              value less than the predefined minimum of 2, you have to use the -f command-line
              option.

       max-load-5 = <load5>
              Set the maximum allowed load average over a 5 minute span. Once this load average
              is reached, the system is rebooted. Default value is 3/4*max-load-1. Be careful not
              to set this parameter too low. To set a value less than the predefined minimum of
              2, you have to use the -f command-line option.

       max-load-15 = <load15>
              Set the maximum allowed load average over a 15 minute span. Once this load average
              is reached, the system is rebooted. Default value is 1/2*max-load-1. Be careful not
              to set this parameter too low. To set a value less than the predefined minimum of
              2, you have to use the -f command-line option.
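
              The relationship between the three load limits can be illustrated as follows.
              This is a sketch only: load_defaults is a hypothetical helper, and the use of
              integer arithmetic (rounding down) is an assumption, not something this page
              states about the daemon.

```shell
#!/bin/sh
# Print the load limits implied by a given max-load-1 value, using the
# documented defaults of 3/4 and 1/2 of max-load-1.
# Assumption: integer arithmetic that rounds down.
load_defaults() {
    load1=$1
    printf 'max-load-1 = %d\n'  "$load1"
    printf 'max-load-5 = %d\n'  $(( load1 * 3 / 4 ))
    printf 'max-load-15 = %d\n' $(( load1 / 2 ))
}

load_defaults 24
```

              For max-load-1 = 24 this prints limits of 18 and 12 for the 5 and 15 minute
              averages.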

       min-memory = <minpage>
              Set the minimum amount of memory that has to stay free. Note that this is in memory
              pages (4kB on x86). Default value is 0 pages, which means this test is disabled. The
              page size is taken from the system include files. The usable memory is computed
              as MemFree + Buffers + Cached, since buffers and cache typically expand to use
              most free memory but the kernel reclaims them as needed. NOTE: If this measure
              drops below a few tens of MB, the system will swap aggressively and have poorer
              file system performance due to the lack of caching. This is a 'passive' test and
              works by reading /proc/meminfo.
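
              To get a feel for suitable values, the usable memory described above can be
              estimated from the shell. This is a rough sketch, not the daemon's code;
              usable_pages is a hypothetical helper, and it assumes the kB units that
              /proc/meminfo normally reports.

```shell
#!/bin/sh
# Usable memory in pages: (MemFree + Buffers + Cached) kB * 1024 / page size.
# $1: meminfo-format file (default /proc/meminfo)
# $2: page size in bytes (default: value of getconf PAGESIZE)
usable_pages() {
    meminfo=${1:-/proc/meminfo}
    pagesize=${2:-$(getconf PAGESIZE)}
    awk -v ps="$pagesize" '
        $1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { kb += $2 }
        END { printf "%d\n", kb * 1024 / ps }
    ' "$meminfo"
}

# Show the current figure on systems that provide /proc/meminfo.
if [ -r /proc/meminfo ]; then
    usable_pages
fi
```

              A min-memory value well below the number printed on a healthy system leaves
              headroom before the test triggers.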

       allocatable-memory = <minpage>
              Set  the  minimum  amount of allocatable memory available on the system.  Note that
              this is in pages.  Default value is 0 pages which means the test is  disabled.   As
              with  min-memory,  the page size is taken from the system include files. This is an
              'active' test and it works by attempting to memory-map a block  of  the  configured
              size.

       max-swap = <maxpage>
              Set the maximum amount of swap use. Note that this is in memory pages (4kB on x86).
              Default value is 0 pages, which means this test is disabled. Often this should be a
              large portion of available swap, but remember that paging 1GB of swap can take
              from several seconds to tens of seconds. This is a 'passive' test and works by
              reading /proc/meminfo.
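
              Because the limit is given in pages rather than bytes, a conversion helper can be
              handy. The following is a sketch; mb_to_pages is a hypothetical name, and the 512
              MB figure is only an example.

```shell
#!/bin/sh
# Convert a size in MB to memory pages.
# $1: size in MB
# $2: page size in bytes (default: value of getconf PAGESIZE)
mb_to_pages() {
    mb=$1
    pagesize=${2:-$(getconf PAGESIZE)}
    echo $(( mb * 1024 * 1024 / pagesize ))
}

# 512 MB of swap with 4 kB pages: prints "max-swap = 131072"
printf 'max-swap = %d\n' "$(mb_to_pages 512 4096)"
```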

       watchdog-device = <device>
              Set  the  watchdog device name, typically /dev/watchdog. Default is to disable keep
              alive support. This should be tested by running the daemon from  the  command  line
              before configuring it to start automatically on booting.

       watchdog-refresh-use-settimeout = <auto|yes|no>
              Refresh  watchdog  timer  by setting its timeout instead of using a normal watchdog
              refresh operation. Might help if your watchdog  trips  by  itself  when  the  first
              timeout  interval  elapses.  Default  is  'auto'  for  IT87  fix-up but this can be
              disabled with 'no' or forced for other modules with 'yes'.

       watchdog-refresh-ignore-errors = <yes|no>
              Ignore errors reported by writing to the watchdog device. Typically  this  is  used
              for  systems  that have broken implementations of the IPMI driver to avoid a reboot
              loop.

       watchdog-timeout = <timeout>
              Set the watchdog device timeout during startup.  If not set, a default is used that
              should be set to the kernel timer margin at compile time.

       temperature-sensor = <temp-virtual-file>
              Set  the  temperature sensor name. This is normally a 'virtual file' under /sys and
              it contains the temperature in milli-Celsius. Usually these are  generated  by  the
              sensors  package,  but take care as device enumeration may not be fixed. Default is
              to disable temperature checking. Multiple sensors can be used  by  having  repeated
              temperature-sensor  entries. Due to the enumeration problem any missing temp sensor
              is simply ignored and not treated as a reboot trigger.
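
              Before adding a sensor, it can help to confirm that the file really holds a
              milli-Celsius reading. A minimal sketch follows; read_temp_c is a hypothetical
              helper, and the hwmon path is only an example whose numbering differs between
              systems.

```shell
#!/bin/sh
# Print the temperature in degrees Celsius from a file holding milli-Celsius.
# Returns non-zero (and prints nothing) if the file cannot be read.
read_temp_c() {
    [ -r "$1" ] || return 1
    awk 'NR == 1 { printf "%.1f\n", $1 / 1000 }' "$1"
}

# Example path only; device enumeration may differ on your system.
read_temp_c /sys/class/hwmon/hwmon0/temp1_input || echo "sensor not found"
```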

       max-temperature = <temp>
              Set the maximum allowed temperature in Celsius. Once this temperature is reached,
              the system is stopped. Default value is 90 C. Watchdog issues warnings once the
              temperature reaches 90%, 95% and 98% of this value.
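
              The warning levels for a given limit can be worked out as below. This is an
              illustrative sketch; warn_thresholds is a hypothetical helper, and how the daemon
              rounds these values is not specified here.

```shell
#!/bin/sh
# Print the 90%, 95% and 98% warning levels for a max-temperature value.
warn_thresholds() {
    awk -v max="$1" 'BEGIN {
        n = split("90 95 98", pct, " ")
        for (i = 1; i <= n; i++)
            printf "%d%% of %d C = %.1f C\n", pct[i], max, max * pct[i] / 100
    }'
}

warn_thresholds 90
```

              For the default limit of 90 C the warning levels are 81.0 C, 85.5 C and 88.2 C.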

       temp-power-off = <yes|no>
              Set the watchdog action on overheating. 'yes' (the default) powers the machine
              off; 'no' halts the machine and allows a Ctrl-Alt-Del reboot.

       file = <filename>
              Set  file  name  for  file  mode.  This option can be given as often as you like to
              check several files.

       change = <mtime>
              Set the change interval time for file mode. This option always belongs to the
              active filename; that is, when watchdog finds a 'change =' line it assumes the line
              belongs to the most recently read 'file =' line. The two do not have to follow
              each other directly, but you cannot specify a 'change =' before a 'file ='. The
              default is to only stat the file and not look for changes. Using this feature to
              monitor changes in /var/log/messages might require some special syslog daemon
              configuration, e.g. rsyslog needs "$ActionWriteAllMarkMessages on" to be set to
              make sure the marks are written no matter what.
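
              To judge whether a file changes often enough for a given 'change' value, its age
              can be checked by hand. The sketch below assumes that the check amounts to
              comparing the file's modification time against the interval; changed_within is a
              hypothetical helper that relies on GNU stat.

```shell
#!/bin/sh
# Succeeds if FILE exists and was modified within the last MAXAGE seconds.
# Uses GNU stat: -c %Y prints the mtime as seconds since the epoch.
changed_within() {
    file=$1
    maxage=$2
    mtime=$(stat -c %Y "$file" 2>/dev/null) || return 1
    now=$(date +%s)
    [ $(( now - mtime )) -le "$maxage" ]
}

# Example: has the log been written to in the last 20 minutes?
changed_within /var/log/messages 1200 || echo "stale or missing"
```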

       pidfile = <pidfilename>
              Set  pidfile  name  for daemon test mode.  This option can be given as often as you
              like to check several daemons, assuming they write their post-forking  PID  to  the
              specified files.
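
              A pidfile can be verified by hand before listing it here. The following sketch
              shows one common way to test whether the recorded process exists; pid_alive is a
              hypothetical helper and is not claimed to match the daemon's own check.

```shell
#!/bin/sh
# Succeeds if PIDFILE holds the PID of a process that currently exists.
# kill -0 sends no signal; it only tests whether the process can be signalled,
# so run this as root (or as the owner of the process).
pid_alive() {
    [ -r "$1" ] || return 1
    # read fails on a file without a trailing newline but still sets pid
    read -r pid < "$1" || [ -n "$pid" ] || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # not a plain number
    esac
    kill -0 "$pid" 2>/dev/null
}

# Example path only.
pid_alive /var/run/sshd.pid || echo "no live process for this pidfile"
```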

       ping = <ip-addr>
              Set  IPv4  address  for ping mode.  This option can be used more than once to check
              different connections.

       ping-count = <ping-per-interval>
              Set the number of ping attempts in each 'interval' of time. Default  is  3  and  it
              completes on the first successful ping.

       interface = <if-name>
              Set  interface  name  for  network mode.  This option can be used more than once to
              check different interfaces. Note it is only possible to check physical  interfaces,
              and not aliased IP interfaces.

       test-binary = <testbin>
              Execute the given binary to do some user defined tests.

       test-timeout = <timeout in seconds>
              User defined tests may only run for <timeout> seconds. Set to 0 for unlimited.
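
              A user defined test for test-binary can be as simple as a short shell script. The
              sketch below assumes, as described in watchdog(8), that an exit status of zero
              means the test passed and a non-zero status signals a fault; check_writable is a
              hypothetical example check.

```shell
#!/bin/sh
# Succeeds if a temporary file can be created in, and removed from, DIR.
check_writable() {
    tmp=$(mktemp "$1/wd-test.XXXXXX" 2>/dev/null) || return 1
    rm -f "$tmp"
}

# A script used as test-binary would end with something like:
#   check_writable /var/tmp || exit 1
#   exit 0
```

              Keep such tests short so they finish well within test-timeout.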

       repair-binary = <repbin>
              Execute the given binary in case of a problem instead of shutting down the system.

       repair-timeout = <timeout in seconds>
              The repair command may only run for <timeout> seconds. Set to 0 for 'unlimited', but
              note that the hardware timer is not refreshed in this case, so the system will
              hard-reset at some point.

       retry-timeout = <timeout in seconds>
              Allow  most  error  conditions  to  persist  for  <timeout>  seconds.  Set to 0 for
              immediate action (like softboot behaviour).

       repair-maximum = <count>
              Allow no more than <count> repair attempts against a given fault that report
              success (i.e. return 0) but fail to clear the fault, before a reboot is initiated
              anyway. If set to zero, a repairable fault can be blocked indefinitely by a repair
              program reporting success (previous daemon behaviour).

       softboot-option = <yes|no>
              This acts like the -b / --softboot command-line option and simply sets the retry
              timeout to zero.

       admin = <mail-address>
              Email address to send admin mail to, that is, who is notified that the machine is
              being halted or rebooted. Default is 'root'. To disable notification via email,
              set admin to an empty string.

       realtime = <yes|no>
              If set to yes watchdog will lock itself into memory so it is never swapped out.

       priority = <schedule priority>
              Set the schedule priority for realtime mode passed to sched_setscheduler().

       test-directory = <test directory>
              Set the directory to run user test/repair scripts. Default is '/etc/watchdog.d'.
              See the Test Directory section in watchdog(8) for more information.

       log-dir = <log directory>
              Set  the  log  directory  to  capture  the  standard output and standard error from
              repair-binary and test-binary execution. Default is '/var/log/watchdog'.

       sigterm-delay = <time in seconds>
              Set the time on shut down between first sending SIGTERM to all processes, and  then
              sending  SIGKILL.  Default is 5 seconds which is generally enough, but systems with
              large databases or virtual machines might need longer.
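
              The effect of the delay can be pictured for a single process as follows. This is
              a simplified sketch of the TERM-then-KILL pattern, not the daemon's shutdown
              code, which acts on all processes; stop_with_grace is a hypothetical helper.

```shell
#!/bin/sh
# Send SIGTERM to PID, wait DELAY seconds, then send SIGKILL if the
# process still exists.
stop_with_grace() {
    pid=$1
    delay=$2
    kill -TERM "$pid" 2>/dev/null || return 0   # already gone
    sleep "$delay"
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null
    fi
    return 0
}
```

              A service that needs longer than <time in seconds> to flush its data on SIGTERM
              would be cut short by the SIGKILL, which is why such systems may need a larger
              value.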

       verbose = <level>
              This overrides the command line --verbose option. Generally  the  verbose  mode  is
              only  enabled  for  debugging  as  it  creates a lot of syslog chatter, so use this
              option with consideration. Zero is "normal" operation (quiet), while 1 is typically
              used for debugging. Values of 2 or more usually generate far too many messages.

       heartbeat-file = <filename>
              For debugging, this allows a rolling set of status values to be kept on disk.

       heartbeat-stamps = <interval>
              For debugging, this sets the number of entries in the <heartbeat-file>.

       log-killed-pids = <yes|no>
              This  acts  like enabling 'verbose' logging, but only for a system reboot, where it
              enables the logging of the PID values for all processes that are being killed.  The
              results  are  written  to  the  killall5.log  file  in the log directory (if at all
              possible) in this case.  Intended for debugging cases where you would like to  know
              what  was  running  at the point the machine triggered the watchdog, but don't want
              syslog filling up with the usual chatter of activity.

FILES

       /etc/watchdog.conf
              The watchdog configuration file

       /etc/watchdog.d
              A directory containing test-or-repair commands. See the Test Directory  section  in
              watchdog(8) for more information.

SEE ALSO

       watchdog(8)