Ubuntu Manpage: watchdog.conf - configuration file for the watchdog daemon

NAME

       watchdog.conf - configuration file for the watchdog daemon

DESCRIPTION

       This file carries all configuration options for the Linux watchdog daemon. Each option
       has to be written on a line by itself. Comments start with '#'. Blanks are ignored
       except after the '=' sign. An empty text after the '=' sign disables the feature as long
       as that makes sense.
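
       As a minimal sketch of this syntax (the values are illustrative, not recommendations):

```conf
# A comment line: everything on this line is ignored.
interval = 10

# An empty value after '=' disables the feature, here the admin mail.
admin =
```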

OPTIONS

interval = <interval>
Set the highest possible interval between two writes to the watchdog device. The
device is triggered after each check regardless of the time it took. After
finishing all checks watchdog goes to sleep for a full cycle of <interval> seconds.
Default value is 1 second. The kernel drivers typically expect a write command
every minute; otherwise the system will be rebooted. Therefore an interval of more
than a minute can only be used with the force command-line option [--force | -f].

logtick = <logtick>
If you enable verbose logging, a message is written into the syslog or a logfile.
While this is nice, it is not necessary to get a message every interval which
really fills up disk and needs CPU. logtick allows adjustment of the number of
intervals skipped before a log message is written. If you use logtick = 60 and
interval = 10, a message is written only every 10 minutes (600 seconds). This may
make the exact time of a crash harder to find but greatly reduces disk usage and
spares the administrator's nerves when looking for a particular syslog entry in
between watchdog messages.
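
The combination used in the text, written out as a configuration fragment:

```conf
# Sleep 10 seconds between check cycles.
interval = 10
# Log only every 60th interval: 60 * 10 s = 600 s = 10 minutes.
logtick = 60
```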

max-load-1 = <load1>
Set the maximal allowed load average for a 1 minute span. Once this load average is
reached the system is rebooted. Default value is 0. That means the load average
check is disabled. Be careful not to set this parameter too low. To set a value
less than the predefined minimal value of 2, you have to use the -f command-line
option.

max-load-5 = <load5>
Set the maximal allowed load average for a 5 minute span. Once this load average is
reached the system is rebooted. Default value is 3/4*max-load-1. Be careful not to
set this parameter too low. To set a value less than the predefined minimal value
of 2, you have to use the -f command-line option.

max-load-15 = <load15>
Set the maximal allowed load average for a 15 minute span. Once this load average
is reached the system is rebooted. Default value is 1/2*max-load-1. Be careful not
to set this parameter too low. To set a value less than the predefined minimal
value of 2, you have to use the -f command-line option.
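
A sketch of a load-average setup. The threshold of 24 is an assumed value chosen
only for illustration and has to be adapted to the machine. The commented-out lines
show what the defaults for the 5 and 15 minute spans would work out to.

```conf
max-load-1 = 24
# If left unset, these default to 3/4 and 1/2 of max-load-1:
#max-load-5 = 18
#max-load-15 = 12
```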

min-memory = <minpage>
Set the minimal amount of memory that has to stay free. Note that this is in memory
pages (4kB on x86). Default value is 0 pages which means this test is disabled. The
page size is taken from the system include files. The usable memory is computed
from MemFree + Buffers + Cached since buffer and cache use typically expand to use
most free memory but the kernel will reclaim this as needed. NOTE: If this measure
gets below a few tens of MB then the system will swap aggressively and have poorer
file system performance due to the lack of caching. This is a 'passive' test and
works by reading /proc/meminfo.

allocatable-memory = <minpage>
Set the minimum amount of allocatable memory available on the system. Note that
this is in pages. Default value is 0 pages which means the test is disabled. As
with min-memory, the page size is taken from the system include files. This is an
'active' test and it works by attempting to memory-map a block of the configured
size.

max-swap = <maxpage>
Set the maximum amount of swap use. Note that this is in memory pages (4kB on x86).
Default value is 0 pages which means this test is disabled. Often this should be a
large portion of available swap, but remember that paging 1GB of swap can take
several seconds or even tens of seconds. This is a 'passive' test and works by
reading /proc/meminfo.
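
To illustrate the unit conversion for the three memory options, the fragment below
assumes 4 kB pages (as on x86) and example limits of 64 MB of usable memory and 1 GB
of swap use. The limits are assumptions, not recommendations.

```conf
# 64 MB / 4 kB per page = 16384 pages
min-memory = 16384
# Check that at least one page can actually be memory-mapped.
allocatable-memory = 1
# 1 GB / 4 kB per page = 262144 pages
max-swap = 262144
```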

watchdog-device = <device>
Set the watchdog device name, typically /dev/watchdog. Default is to disable keep
alive support. This should be tested by running the daemon from the command line
before configuring it to start automatically on booting.

watchdog-refresh-use-settimeout = <auto|yes|no>
Refresh watchdog timer by setting its timeout instead of using a normal watchdog
refresh operation. Might help if your watchdog trips by itself when the first
timeout interval elapses. Default is 'auto' for IT87 fix-up but this can be
disabled with 'no' or forced for other modules with 'yes'.

watchdog-refresh-ignore-errors = <yes|no>
Ignore errors reported by writing to the watchdog device. Typically this is used
for systems that have broken implementations of the IPMI driver to avoid a reboot
loop.

watchdog-timeout = <timeout>
Set the watchdog device timeout during startup. If not set, a default is used that
should be set to the kernel timer margin at compile time.
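
A minimal sketch for enabling keep-alive support. The timeout of 60 seconds is an
assumed value; whether it is accepted depends on the watchdog driver in use.

```conf
watchdog-device = /dev/watchdog
watchdog-timeout = 60
```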

temperature-sensor = <temp-virtual-file>
Set the temperature sensor name. This is normally a 'virtual file' under /sys and
it contains the temperature in milli-Celsius. Usually these are generated by the
sensors package, but take care as device enumeration may not be fixed. Default is
to disable temperature checking. Multiple sensors can be used by having repeated
temperature-sensor entries. Due to the enumeration problem any missing temp sensor
is simply ignored and not treated as a reboot trigger.

max-temperature = <temp>
Set the maximal allowed temperature in Celsius. Once this temperature is reached
the system is stopped. Default value is 90 C. Watchdog will issue warnings once the
temperature reaches 90%, 95% and 98% of this temperature.

temp-power-off = <yes|no>
Set the watchdog action on overheating. The 'yes' option (default) powers the
machine off; the 'no' option halts the machine and allows a Ctrl-Alt-Del reboot.
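
A sketch of a temperature check with two sensors. The sensor paths are hypothetical;
the actual file names under /sys depend on the hardware and on device enumeration.
The limit of 80 is likewise only an example.

```conf
# Hypothetical sensor paths; check which files exist on the target system.
temperature-sensor = /sys/class/hwmon/hwmon0/temp1_input
temperature-sensor = /sys/class/hwmon/hwmon1/temp1_input
# Warnings at 90%, 95% and 98% of 80 C, i.e. at 72 C, 76 C and 78.4 C.
max-temperature = 80
temp-power-off = yes
```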

file = <filename>
Set file name for file mode. This option can be given as often as you like to
check several files.

change = <mtime>
Set the change interval time for file mode. This option always belongs to the
active filename, that is, when finding a 'change =' line watchdog assumes it belongs
to the most recently read 'file =' line. They don't necessarily have to follow
each other directly, but you cannot specify a 'change =' before a 'file ='. The
default is to only stat the file and not look for changes. Using this feature to
monitor changes in /var/log/messages might require some special syslog daemon
configuration, e.g. rsyslog needs "$ActionWriteAllMarkMessages on" to be set to
make sure the marks are written no matter what.
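
The pairing of 'file =' and 'change =' lines might look as follows. The first path
is hypothetical and the change value is an arbitrary example.

```conf
# Only stat this file (hypothetical path); no 'change =' belongs to it.
file = /var/run/example.status

# This 'change =' belongs to /var/log/messages, the most recently read
# 'file =' line.
file = /var/log/messages
change = 1407
```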

pidfile = <pidfilename>
Set pidfile name for daemon test mode. This option can be given as often as you
like to check several daemons, assuming they write their post-forking PID to the
specified files.
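
For example, to check two daemons. The pidfile paths are assumptions; use the files
your daemons actually write.

```conf
pidfile = /var/run/sshd.pid
pidfile = /var/run/rsyslogd.pid
```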

ping = <ip-addr>
Set IPv4 address for ping mode. This option can be used more than once to check
different connections.

ping-count = <ping-per-interval>
Set the number of ping attempts in each 'interval' of time. Default is 3 and it
completes on the first successful ping.

interface = <if-name>
Set interface name for network mode. This option can be used more than once to
check different interfaces. Note it is only possible to check physical interfaces,
and not aliased IP interfaces.
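
A sketch combining the network checks. The addresses and the interface name are
assumed for illustration.

```conf
# Example addresses from a private range.
ping = 192.168.1.1
ping = 192.168.1.2
# Three attempts per interval (the default).
ping-count = 3
# A physical interface, not an aliased IP interface.
interface = eth0
```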

test-binary = <testbin>
Execute the given binary to do some user defined tests.

test-timeout = <timeout in seconds>
User defined tests may only run for <timeout> seconds. Set to 0 for unlimited.

repair-binary = <repbin>
Execute the given binary in case of a problem instead of shutting down the system.

repair-timeout = <timeout in seconds>
The repair command may only run for <timeout> seconds. Set to 0 for 'unlimited', but
note that the hardware timer is not refreshed in this case so the system will
hard-reset at some point.

retry-timeout = <timeout in seconds>
Allow most error conditions to persist for <timeout> seconds. Set to 0 for
immediate action (like softboot behaviour).

repair-maximum = <count>
This allows no more than <count> repair attempts against a given fault that report
success (i.e. return 0), but fail to clear the fault, before a reboot is initiated
anyway. If set to zero then a repairable fault can always be blocked by a repair
program reporting success (previous daemon behaviour).
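
The test and repair options are typically set together. In the sketch below the
script paths are hypothetical and all numeric values are illustrative choices.

```conf
# Hypothetical script paths.
test-binary = /usr/local/sbin/check-service.sh
test-timeout = 30
repair-binary = /usr/local/sbin/repair-service.sh
repair-timeout = 30
# Allow most error conditions to persist for up to 60 seconds.
retry-timeout = 60
# Allow at most 2 repair attempts that report success without
# clearing the fault before rebooting anyway.
repair-maximum = 2
```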

softboot-option = <yes|no>
This acts like the -b / --softboot command-line option and simply sets the retry
timeout to zero.

admin = <mail-address>
Email address to send admin mail to. That is, who shall be notified that the
machine is being halted or rebooted. Default is 'root'. If you want to disable
notification via email just set admin to an empty string.

realtime = <yes|no>
If set to yes watchdog will lock itself into memory so it is never swapped out.

priority = <schedule priority>
Set the schedule priority for realtime mode passed to sched_setscheduler().
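
For example, to lock the daemon into memory and give it a realtime priority. The
priority value of 1 is an assumption for illustration.

```conf
realtime = yes
priority = 1
```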

test-directory = <test directory>
Set the directory to run user test/repair scripts. Default is '/etc/watchdog.d'.
See the Test Directory section in watchdog(8) for more information.

log-dir = <log directory>
Set the log directory to capture the standard output and standard error from
repair-binary and test-binary execution. Default is '/var/log/watchdog'.

sigterm-delay = <time in seconds>
Set the time on shut down between first sending SIGTERM to all processes, and then
sending SIGKILL. Default is 5 seconds which is generally enough, but systems with
large databases or virtual machines might need longer.

verbose = <level>
This overrides the command line --verbose option. Generally the verbose mode is
only enabled for debugging as it creates a lot of syslog chatter, so use this
option with consideration. Zero is "normal" operation (quiet), while 1 is typically
used for debugging. Values of 2 or more usually generate far too many messages.

heartbeat-file = <filename>
For debugging this allows a rolling set of status values to be kept on disk.

heartbeat-stamps = <interval>
For debugging this sets the number of entries in the <heartbeat-file>.

log-killed-pids = <yes|no>
This acts like enabling 'verbose' logging, but only for a system reboot, where it
enables the logging of the PID values for all processes that are being killed. The
results are written to the killall5.log file in the log directory (if at all
possible) in this case. Intended for debugging cases where you would like to know
what was running at the point the machine triggered the watchdog, but don't want
syslog filling up with the usual chatter of activity.
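
A sketch of the logging and debugging options together. The heartbeat file path and
the number of entries are assumed values.

```conf
log-dir = /var/log/watchdog
# Quiet, "normal" operation.
verbose = 0
# Hypothetical heartbeat file keeping a rolling set of 300 entries.
heartbeat-file = /var/log/watchdog/heartbeat.log
heartbeat-stamps = 300
# Record the PIDs of killed processes on reboot without verbose logging.
log-killed-pids = yes
```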

FILES

       /etc/watchdog.conf
              The watchdog configuration file

       /etc/watchdog.d
              A directory containing test-or-repair commands. See the Test Directory  section  in
              watchdog(8) for more information.
