Provided by: pcp_6.3.0-1_amd64

NAME

       pmie - inference engine for performance metrics

SYNOPSIS

       pmie  [-bCdeFfPqvVWxXz?]   [-a  archive]  [-A  align]  [-c filename] [-h host] [-l logfile] [-m note] [-j
       stompfile] [-n pmnsfile] [-o format] [-O offset] [-S starttime] [-t interval] [-T endtime] [-U  username]
       [-Z timezone] [filename ...]

DESCRIPTION

       pmie  accepts  a  collection  of  arithmetic,  logical, and rule expressions to be evaluated at specified
       frequencies.  The base data for the expressions consists of performance metrics values delivered in real-
       time  from  any  host  running the Performance Metrics Collection Daemon (PMCD), or using historical data
       from Performance Co-Pilot (PCP) archives.

       As well as computing arithmetic and logical values, pmie can execute actions (popup alarms, write  system
       log  messages,  and  launch  programs)  in  response to specified conditions.  Such actions are extremely
       useful in detecting, monitoring and correcting performance related problems.

       The expressions to be evaluated are read from configuration files  specified  by  one  or  more  filename
       arguments.  In the absence of any filename, expressions are read from standard input.

       Output from pmie is directed to standard output and standard error as follows:

       stdout
            Expression values printed in the verbose -v mode and the output of print actions.

       stderr
            Error and warning messages for any syntactic or semantic problems during expression parsing, and any
            semantic or performance metrics availability problems during expression evaluation.

OPTIONS

       The available command line options are:

       -a archive, --archive=archive
            archive is a comma-separated list of names, each of which may be the base name of an archive
            or  the  name  of  a  directory  containing  one  or more archives written by pmlogger(1).  Multiple
            instances of the -a flag may appear on the command line to specify a list of sets of  archives.   In
            this  case,  it  is  required  that only one set of archives be present for any one host.  Also, any
            explicit host names occurring in a pmie expression must match the host name recorded in one  of  the
            archive  labels.   In the case of multiple sets of archives, timestamps recorded in the archives are
            used to ensure temporal consistency.

       -A align, --align=align
            Force the initial time window to be aligned on the boundary of a natural time unit align.  Refer  to
            PCPIntro(1) for a complete description of the syntax for align.

       -b, --buffer
            Output will be line buffered and standard output is attached to standard error.  This is most useful
            for background execution in conjunction with the -l option.  The -b option is always used  for  pmie
            instances launched from pmie_check(1).

       -c config, --config=config
            An alternative to specifying filename at the end of the command line.

       -C, --check
            Parse  the  configuration  file(s)  and  exit  before performing any evaluations.  Any errors in the
            configuration file are reported.

       -d, --interact
            Normally pmie would be launched as a non-interactive process to monitor and manage  the  performance
            of one or more hosts.  Given the -d flag however, execution is interactive and the user is presented
            with a menu of options.  Interactive mode is useful mainly for debugging new expressions.

       -e, --timestamp
            When used with -V, -v or -W, this option forces timestamps to be reported with each expression.  The
            timestamps are in ctime(3) format, enclosed in parentheses, and appear after the expression name and
            before the expression value, e.g.
                 expr_1 (Tue Feb  6 19:55:10 2001): 12

       -f, --foreground
            If the -l option is specified and there is no -a option (i.e. real-time monitoring) then pmie is run
            as  a  daemon in the background (in all other cases foreground is the default).  The -f (and -F, see
            below) options force pmie to be run in the foreground, independent of any other options.

       -F, --systemd
            Like -f, the -F option runs pmie in the foreground, but also does some housekeeping (such as creating
            a pid file, changing user id and notifying systemd(1) when pmie has started or is shutting down).  This is
            intended for use when pmie is launched from systemd(1) and the daemonising has  already  been  done.
            The -f and -F options are mutually exclusive.

       -h host, --host=host
            By  default  performance data is fetched from the local host (in real-time mode) or the host for the
            first named set of archives on the command line (in archive mode).  The host argument overrides this
            default.   It does not override hosts explicitly named in the expressions being evaluated.  The host
            argument is interpreted as a connection specification for pmNewContext, and is later mapped  to  the
            remote  pmcd's self-reported host name for reporting purposes.  See also the %h vs. %c substitutions
            in rule action strings below.

       -j stompfile
            An alternative STOMP protocol configuration is loaded from stompfile.  If this option is  not  used,
            and  the  stomp  action is used in any rule, the default location $PCP_SYSCONF_DIR/pmie/config/stomp
            will be used.

       -l logfile, --logfile=logfile
            Standard error is sent to logfile.

       -m note, --note=note
            Used to indicate where pmie has been launched from, e.g.  pmie_check(1)  and  pmie_daily(1)  use  -m
            pmie_check  and  this  is  used  by  pmie  to  determine if it needs to be restarted should the PMCD
            hostname change, as described in the HOSTNAME CHANGES section below.

       -n pmnsfile, --namespace=pmnsfile
            An alternative Performance Metrics Name Space (PMNS) is loaded from the file pmnsfile.

       -o format, --format=format
            When processing performance data from an archive, the -o option may be used to specify an alternate
            output format when a rule action is executed.  See the DIFFERENCES IN HOST AND ARCHIVE MODES section
            for a description of how the output format may be constructed.

       -O origin, --origin=origin
            Specify the origin of the time window.  See PCPIntro(1) for a complete description of this option.

       -P, --primary
            Identifies this as the primary pmie instance for a host.   See  the  ``AUTOMATIC  RESTART''  section
            below for further details.

       -q, --quiet
            Suppresses  diagnostic  messages that would be printed to standard output by default, especially the
            "evaluator exiting" message as this can confuse scripts.

       -S starttime, --start=starttime
            Specify the starttime of the time window.  See PCPIntro(1) for a complete description of this option.

       -t interval, --interval=interval
            The interval argument follows the syntax described in PCPIntro(1), and in the simplest form  may  be
            an  unsigned  integer  (the implied units in this case are seconds).  The value is used to determine
            the sample interval for expressions that do not explicitly set their sample interval using the  pmie
            variable delta described below.  The default is 10.0 seconds.

       -T endtime, --finish=endtime
            Specify the endtime of the time window.  See PCPIntro(1) for a complete description of this option.

       -U username, --username=username
            User  account under which to run pmie.  The default is the current user account for interactive use.
            When run as a daemon, the unprivileged "pcp" account is used in current  versions  of  PCP,  but  in
            older versions the superuser account ("root") was used by default.

       -v   Unless one of the verbose options -V, -v or -W appears on the command line, expressions are
            evaluated silently; the only output is the result of any actions being executed.  In the verbose
            mode,  specified using the -v flag, the value of each expression is printed as it is evaluated.  The
            values are in canonical units; bytes in the dimension of ``space'',  seconds  in  the  dimension  of
            ``time'' and events in the dimension of ``count''.  See pmLookupDesc(3) for details of the supported
            dimension and scaling mechanisms for performance metrics.  The verbose mode is useful in  monitoring
            the  value  of given expressions, evaluating derived performance metrics, passing these values on to
            other tools for further processing and in debugging new expressions.

       -V, --verbose
            This option has the same effect as the -v option, except that the name of the host and instance  (if
            applicable) are printed as well as expression values.

       -W   This  option  has  the  same  effect  as  the  -V  option  described  above, except that for boolean
            expressions, only those names and values that make the expression true are printed.  These  are  the
            same  names  and  values  accessible to rule actions as the %h, %i, %c and %v bindings, as described
            below.

       -x, --secret-agent
            Execute in domain agent mode.  This mode is used within the Performance Co-Pilot product  to  derive
            values  for summary metrics, see pmdasummary(1).  Only restricted functionality is available in this
            mode (expressions with actions may not be used).

       -X, --secret-applet
            Run in secret applet mode (thin client).

       -z, --hostzone
            Change the reporting timezone to the timezone of the host that is  the  source  of  the  performance
            metrics,  as  identified  via  either the -h option or the first named set of archives (as described
            above for the -a option).

       -Z timezone, --timezone=timezone
            Change the reporting timezone to timezone in the format of the environment variable TZ as  described
            in environ(7).

       -?, --help
            Display usage message and exit.

EXAMPLES

       The following example expressions demonstrate some of the capabilities of the inference engine.

       The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated examples of pmie expressions.

       The  variable  delta  controls  expression  evaluation frequency.  Specify that subsequent expressions be
       evaluated once a second, until further notice:

            delta = 1 sec;

       If the total context switch rate exceeds 10000 per second per CPU, then display an alarm notifier:

            kernel.all.pswitch / hinv.ncpu > 10000 count/sec
            -> alarm "high context switch rate %v";

       If the high context switch rate is sustained for  10  consecutive  samples,  then  launch  top(1)  in  an
       xterm(1) window to monitor processes, but do this at most once every 5 minutes:

            all_sample (
                kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
            ) -> shell 5 min "xterm -e 'top'";

       The following rules are evaluated once every 20 seconds:

            delta = 20 sec;

       If any disk is performing more than 60 I/Os per second, then print a message identifying the busy disk to
       standard output and launch dkvis(1):

            some_inst (
                disk.dev.total > 60 count/sec
            ) -> print "busy disks:" " %i" &
                 shell 5 min "dkvis";

       Refine the preceding rule to apply only between the hours  of  9am  and  5pm,  and  to  require  3  of  4
       consecutive samples to exceed the threshold before executing the action:

            $hour >= 9 && $hour <= 17 &&
            some_inst (
              75 %_sample (
                disk.dev.total @0..3 > 60 count/sec
              )
            ) -> print "disks busy for 20 sec:" " [%h]%i";

       The following two rules are evaluated once every 10 minutes:

            delta = 10 min;

       If  either  the / or the /usr filesystem is more than 95% full, display an alarm popup, but not if it has
       already been displayed during the last 4 hours:

            filesys.free #'/dev/root' /
                filesys.capacity #'/dev/root' < 0.05
            -> alarm 4 hour "root filesystem (almost) full";

            filesys.free #'/dev/usr' /
                filesys.capacity #'/dev/usr' < 0.05
            -> alarm 4 hour "/usr filesystem (almost) full";

       The following rule requires a machine that supports the lmsensors metrics.  If  the  machine  environment
       temperature rises more than 2 degrees over a 10 minute interval, write an entry in the system log:

            lmsensors.coretemp_isa.temp1 @0 - lmsensors.coretemp_isa.temp1 @1 > 2
            -> alarm "temperature rising fast" &
               syslog "machine room temperature rise alarm";

       And something interesting if you have performance problems with your Oracle database:

            // back to 30sec evaluations
            delta = 30 sec;
            sid = "ptg1";       # $ORACLE_SID setting
            lid = "223";        # latch ID from v$latch
            lru = "#'$sid/$lid cache buffers lru chain'";
            host = ":moomba.melbourne.sgi.com";
            gets = "oracle.latch.gets $host $lru";
            total = "oracle.latch.gets $host $lru +
                     oracle.latch.misses $host $lru +
                     oracle.latch.immisses $host $lru";

            $total > 100 && $gets / $total < 0.2
            -> alarm "high lru latch contention in database $sid";

       The  following  ruleset  will  emit  exactly  one  message depending on the availability and value of the
       1-minute load average.

            delta = 1 minute;
            ruleset
                 kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
                     print "extreme load average %v"
            else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
                     print "moderate load average %v"
            unknown ->
                     print "load average unavailable"
            otherwise ->
                     print "load average OK"
            ;

       The following rule will emit a message when some filesystem is more than 75% full and  is  filling  at  a
       rate that if sustained would fill the filesystem to 100% in less than 30 minutes.

            some_inst (
                100 * filesys.used / filesys.capacity > 75 &&
                filesys.used + 30min * (rate filesys.used) > filesys.capacity
            ) -> print "filesystem will be full within 30 mins:" " %i";

       If  the  metric  mypmda.errors  counts  errors then the following rule will emit a message if the rate of
       errors exceeds 1 per second provided the error count is less than 100.

            mypmda.errors > 1 && instant mypmda.errors < 100
            -> print "high error rate: %v";

QUICK START

       The pmie specification language is powerful and large.

       To expedite rapid development of pmie rules, the pmieconf(1) tool provides a facility  for  generating  a
       pmie  configuration  file  from a set of generalized pmie rules.  The supplied set of rules covers a wide
       range of performance scenarios.

       The Performance Co-Pilot User's and Administrator's Guide  provides  a  detailed  tutorial-style  chapter
       covering pmie.

EXPRESSION SYNTAX

       This  description  is  terse  and informal.  For a more comprehensive description see the Performance Co-
       Pilot User's and Administrator's Guide.

       A pmie specification is a sequence of semicolon terminated expressions.

       Basic operators are modeled on the arithmetic, relational and Boolean  operators  of  the  C  programming
       language.   Precedence  rules  are  as expected, although the use of parentheses is encouraged to enhance
       readability and remove ambiguity.

       Operands are performance metric names (see PMNS(5)) and the normal literal constants.

       Operands involving performance metrics may produce sets of values, as a  result  of  enumeration  in  the
       dimensions  of  hosts, instances and time.  Special qualifiers may appear after a performance metric name
       to define the enumeration in each dimension.  For example,

           kernel.percpu.cpu.user :foo :bar #cpu0 @0..2

       defines 6 values corresponding to the time spent executing in user mode on CPU 0 on the hosts ``foo'' and
       ``bar''  over  the  last 3 consecutive samples.  The default interpretation in the absence of : (host), #
       (instance) and @ (time) qualifiers is all instances at the most recent sample time for the default source
       of PCP performance metrics.
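
       As a brief sketch of how each qualifier narrows the set of values (the host name ``foo'' is the same
       placeholder used above, and the instance name cpu0 is assumed to exist on that host), the following
       three expressions enumerate progressively fewer dimensions:

            // all CPUs, most recent sample, default source of metrics
            kernel.percpu.cpu.user;

            // CPU 0 only, most recent sample, default source of metrics
            kernel.percpu.cpu.user #cpu0;

            // CPU 0 on host foo over the last 3 samples: 3 values
            kernel.percpu.cpu.user :foo #cpu0 @0..2;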

       Host  and  instance  names  that  do  not  follow  the rules for variables in programming languages, i.e.
       alphabetic optionally followed by alphanumerics, should be enclosed in single quotes.

       Expression evaluation follows the law  of  ``least  surprises''.   Where  performance  metrics  have  the
       semantics  of a counter, pmie will automatically convert to a rate based upon consecutive samples and the
       time interval between these samples.  All numeric expressions are  evaluated  in  double  precision,  and
       where appropriate, automatically scaled into canonical units of ``bytes'', ``seconds'' and ``counts''.

       A  rule  is  a  special  form  of  expression that specifies a condition or logical expression, a special
       operator (->) and actions to be performed when the condition is found to be true.

       The following table summarizes the basic pmie operators:

                          ┌────────────────┬────────────────────────────────────────────────┐
                          │   Operators    │                  Explanation                   │
                          ├────────────────┼────────────────────────────────────────────────┤
                          │+ - * /         │ Arithmetic                                     │
                          │< <= == >= > != │ Relational (value comparison)                  │
                          │! && ||         │ Boolean                                        │
                          │->              │ Rule                                           │
                          │rising          │ Boolean, false to true transition              │
                          │falling         │ Boolean, true to false transition              │
                          │rate            │ Explicit rate conversion (rarely required)     │
                          │instant         │ No automatic rate conversion (rarely required) │
                          └────────────────┴────────────────────────────────────────────────┘
       All operators are supported for numeric-valued operands and  expressions.   For  string-valued  operands,
       namely  literal  string  constants  enclosed  in  double  quotes  or  metrics  with a data type of string
       (PM_TYPE_STRING), only the operators == and != are supported.

       The rate and instant operators are the logical inverse of one another, so an arithmetic  expression  expr
       is  equal  to  rate  instant  expr.  The more useful cases involve using rate with a metric that is not a
       counter to determine the rate of change over time or instant with a metric that is a counter to determine
       if the current value is above or below some threshold.
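
       The three cases can be sketched as follows.  The thresholds and messages are arbitrary, mypmda.errors
       is the same hypothetical counter metric used in the EXAMPLES section, and filesys.used is assumed not
       to be a counter:

            // counter metric: automatically converted to a rate before the comparison
            kernel.all.pswitch > 10000 count/sec
            -> print "context switch rate: %v";

            // counter metric with instant: the current counter value is compared instead
            instant mypmda.errors > 1000
            -> print "errors counted so far: %v";

            // non-counter metric with rate: rate of change between consecutive samples
            some_inst (
                (rate filesys.used) > 1024 Kbyte/sec
            ) -> print "filesystems growing quickly:" " %i";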

       Aggregate operators may be used to aggregate or summarize along one dimension of a set-valued expression.
       The following aggregate operators map from  a  logical  expression  to  a  logical  expression  of  lower
       dimension.

                          ┌─────────────────────────┬─────────────┬──────────────────────────┐
                          │       Operators         │    Type     │       Explanation        │
                          ├─────────────────────────┼─────────────┼──────────────────────────┤
                          │some_inst                │ Existential │ True if at least one set │
                          │some_host                │             │ member is true in the    │
                          │some_sample              │             │ associated dimension     │
                          ├─────────────────────────┼─────────────┼──────────────────────────┤
                          │all_inst                 │ Universal   │ True if all set members  │
                          │all_host                 │             │ are true in the          │
                          │all_sample               │             │ associated dimension     │
                          ├─────────────────────────┼─────────────┼──────────────────────────┤
                          │N%_inst                  │ Percentile  │ True if at least N       │
                          │N%_host                  │             │ percent of set members   │
                          │N%_sample                │             │ are true in the          │
                          │                         │             │ associated dimension     │
                          └─────────────────────────┴─────────────┴──────────────────────────┘
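       As an illustrative sketch (the thresholds and messages are arbitrary, and disk.dev.total is borrowed
       from the EXAMPLES section), the logical aggregate operators might be combined as follows:

            // true only when every disk exceeds the threshold
            all_inst (
                disk.dev.total > 60 count/sec
            ) -> print "all disks busy";

            // true when at least half of the disks exceed the threshold
            50 %_inst (
                disk.dev.total > 60 count/sec
            ) -> print "at least half of the disks are busy";

            // true for any disk that exceeds the threshold in each of the last 3 samples
            some_inst (
                all_sample (
                    disk.dev.total @0..2 > 60 count/sec
                )
            ) -> print "disks busy for 3 samples:" " %i";
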
       The following instantial operators may be used to filter or limit a set-valued logical expression,  based
       on  regular  expression  matching  of instance names.  The logical expression must be a set involving the
       dimension of instances, and the regular expression is of the  form  used  by  egrep(1)  or  the  Extended
       Regular Expressions of regcomp(3).

                               ┌─────────────┬──────────────────────────────────────────┐
                               │ Operators   │               Explanation                │
                               ├─────────────┼──────────────────────────────────────────┤
                               │match_inst   │ For each value of the logical expression │
                               │             │ that is ``true'', the result is ``true'' │
                               │             │ if the associated instance name matches  │
                               │             │ the regular expression.  Otherwise the   │
                               │             │ result is ``false''.                     │
                               ├─────────────┼──────────────────────────────────────────┤
                               │nomatch_inst │ For each value of the logical expression │
                               │             │ that is ``true'', the result is ``true'' │
                               │             │ if the associated instance name does not │
                               │             │ match the regular expression.  Otherwise │
                               │             │ the result is ``false''.                 │
                               └─────────────┴──────────────────────────────────────────┘
       For  example,  the  expression below will be ``true'' for disks attached to controllers 2 or 3 performing
       more than 20 operations per second:
            match_inst "^dks[23]d" disk.dev.total > 20;

       The following aggregate operators map from an arithmetic expression to an arithmetic expression of  lower
       dimension.

                           ┌─────────────────────────┬───────────┬──────────────────────────┐
                           │       Operators         │   Type    │       Explanation        │
                           ├─────────────────────────┼───────────┼──────────────────────────┤
                           │min_inst                 │ Extrema   │ Minimum value across all │
                           │min_host                 │           │ set members in the       │
                           │min_sample               │           │ associated dimension     │
                           ├─────────────────────────┼───────────┼──────────────────────────┤
                           │max_inst                 │ Extrema   │ Maximum value across all │
                           │max_host                 │           │ set members in the       │
                           │max_sample               │           │ associated dimension     │
                           ├─────────────────────────┼───────────┼──────────────────────────┤
                           │sum_inst                 │ Aggregate │ Sum of values across all │
                           │sum_host                 │           │ set members in the       │
                           │sum_sample               │           │ associated dimension     │
                           ├─────────────────────────┼───────────┼──────────────────────────┤
                           │avg_inst                 │ Aggregate │ Average value across all │
                           │avg_host                 │           │ set members in the       │
                           │avg_sample               │           │ associated dimension     │
                           └─────────────────────────┴───────────┴──────────────────────────┘
       The  aggregate  operators  count_inst,  count_host  and  count_sample map from a logical expression to an
       arithmetic expression of lower dimension by counting the number of set members for which  the  expression
       is true in the associated dimension.
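
       A short sketch of the arithmetic and counting aggregate operators follows.  The thresholds are
       arbitrary and disk.dev.total is again borrowed from the EXAMPLES section:

            // more than two disks over the threshold at the same time
            count_inst ( disk.dev.total > 60 count/sec ) > 2
            -> print "several disks busy";

            // aggregate I/O rate summed across all disks
            sum_inst ( disk.dev.total ) > 500 count/sec
            -> print "high total disk I/O rate";

            // per-disk average over the last 4 samples
            some_inst (
                avg_sample ( disk.dev.total @0..3 ) > 60 count/sec
            ) -> print "disks busy on average:" " %i";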

       For action rules, the following actions are defined:

                                 ┌──────────┬────────────────────────────────────────┐
                                 │Operators │              Explanation               │
                                 ├──────────┼────────────────────────────────────────┤
                                 │alarm     │ Raise a visible alarm with xconfirm(1) │
                                 │print     │ Display on standard output             │
                                 │shell     │ Execute with sh(1)                     │
                                 │stomp     │ Send a STOMP message to a JMS server   │
                                 │syslog    │ Append a message to system log file    │
                                 └──────────┴────────────────────────────────────────┘
       Multiple actions may be separated by the & and | operators to specify respectively sequential execution
       (both actions are executed) and alternate execution (the second action will only be executed if the
       execution of the first action returns a non-zero error status).
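
       For instance, in the following sketch the syslog action is attempted only if the alarm action returns
       a non-zero status (which might happen, as an assumption, when xconfirm(1) cannot open a display).
       Replacing | with & would execute both actions each time the condition is true:

            some_inst (
                disk.dev.total > 60 count/sec
            ) -> alarm 5 min "busy disks:" " %i" |
                 syslog 5 min "busy disks:" " %i";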

       Arguments  to  actions are an optional suppression time, and then one or more expressions (a string is an
       expression in this context).  Strings appearing as arguments to  an  action  may  include  the  following
       special selectors that will be replaced at the time the action is executed.

       %h  Host name(s) that make the left-most top-level expression in the condition true.

       %c  Connection  specification  string(s) or files for a PCP tool to reach the hosts or archives that make
           the left-most top-level expression in the condition true.

       %i  Instance(s) that make the left-most top-level expression in the condition true.

       %v  One value from the left-most top-level expression in the condition for each host  and  instance  pair
           that makes the condition true.

       Note that expansion of the special selectors is done by repeating the whole argument once for each unique
       binding to any of the qualifying special selectors.  For example if a rule were true for the host  mumble
       with  instances  grunt  and  snort,  and  for host fumble the instance puff makes the rule true, then the
       action
            ...
            -> shell myscript "Warning: %h:%i busy ";
       will execute myscript with the argument string "Warning: mumble:grunt  busy  Warning:  mumble:snort  busy
       Warning: fumble:puff busy".

       By comparison, if the action
            ...
            -> shell myscript "Warning! busy:" " %h:%i";
       were  executed  under  the  same  circumstances, then myscript would be executed with the argument string
       "Warning! busy: mumble:grunt mumble:snort fumble:puff".

       The semantics of the expansion of the special selectors leads to a common usage  pattern  in  an  action,
       where one argument is a constant (contains no special selectors), the second argument contains the desired
       special selectors with minimal separator characters, and an optional third argument provides  a  constant
       postscript  (e.g.  to  terminate  any  argument  quoting  from  the  first argument).  If necessary post-
       processing (e.g. in myscript) can provide the necessary enumeration over each  unique  expansion  of  the
       string containing just the special selectors.
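
       The three-argument pattern might look like the following sketch, in which the bracket characters are
       an arbitrary choice.  Under the same circumstances as the two examples above, and following the
       expansion rule described there, the output would be expected to read "busy: [ mumble:grunt
       mumble:snort fumble:puff ]":
            ...
            -> print "busy: [" " %h:%i" " ]";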

       For complex conditions, the bindings to these selectors are not obvious.  It is strongly recommended that
       pmie be used in the debugging mode (specify the  -W  command  line  option  in  particular)  during  rule
       development.

BOOLEAN EXPRESSIONS

       pmie  expressions  that have the semantics of a Boolean, e.g.  foo.bar > 10 or some_inst ( my.table < 0 )
       are assigned the values true or false or unknown.  A value is unknown if one or more  of  the  underlying
       metric values is unavailable, e.g.  pmcd(1) on the host cannot be contacted, the metric is not in the PCP
       archive, no values are currently available, insufficient  values  have  been  fetched  to  allow  a  rate
       converted  value  to  be  computed  or  insufficient values have been fetched to instantiate the required
       number of samples in the temporal domain.

       Boolean operators follow the normal rules of Kleene logic (aka 3-valued logic) when combining values that
       include unknown:

                                       ┌────────────┬───────────────────────────┐
                                       │            │             B             │
                                       │  A and B   ├─────────┬───────┬─────────┤
                                       │            │  true   │ false │ unknown │
                                       ├──┬─────────┼─────────┼───────┼─────────┤
                                       │  │  true   │  true   │ false │ unknown │
                                       │  ├─────────┼─────────┼───────┼─────────┤
                                       │A │  false  │  false  │ false │  false  │
                                       │  ├─────────┼─────────┼───────┼─────────┤
                                       │  │ unknown │ unknown │ false │ unknown │
                                       └──┴─────────┴─────────┴───────┴─────────┘
                                       ┌────────────┬──────────────────────────┐
                                       │            │            B             │
                                       │  A or B    ├──────┬─────────┬─────────┤
                                       │            │ true │  false  │ unknown │
                                       ├──┬─────────┼──────┼─────────┼─────────┤
                                       │  │  true   │ true │  true   │  true   │
                                       │  ├─────────┼──────┼─────────┼─────────┤
                                       │A │  false  │ true │  false  │ unknown │
                                       │  ├─────────┼──────┼─────────┼─────────┤
                                       │  │ unknown │ true │ unknown │ unknown │
                                       └──┴─────────┴──────┴─────────┴─────────┘
                                                  ┌────────┬─────────┐
                                                  │   A    │  not A  │
                                                  ├────────┼─────────┤
                                                  │  true  │  false  │
                                                  ├────────┼─────────┤
                                                  │ false  │  true   │
                                                  ├────────┼─────────┤
                                                  │unknown │ unknown │
                                                  └────────┴─────────┘
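
       The three truth tables above can be expressed compactly in code.  The following is a minimal Python
       sketch, not pmie's implementation; it assumes None stands in for unknown, and the function names
       k_and, k_or and k_not are purely illustrative.

```python
from typing import Optional

# Assumption: None models pmie's "unknown"; True and False are as usual.
Tri = Optional[bool]


def k_and(a: Tri, b: Tri) -> Tri:
    """Kleene AND: a single false decides the result, otherwise unknown propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a: Tri, b: Tri) -> Tri:
    """Kleene OR: a single true decides the result, otherwise unknown propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def k_not(a: Tri) -> Tri:
    """Kleene NOT: unknown stays unknown."""
    return None if a is None else not a
```

       Note that false and unknown yields false, and true or unknown yields true, because the known operand
       alone determines the result in those cases.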

RULESETS

       The  ruleset  clause  is used to define a set of rules and actions that are evaluated in order until some
       action is executed, at which point the remaining rules and actions are skipped until the ruleset is again
       scheduled  for  evaluation.  The keyword else is used to separate rules.  After one or more regular rules
       (with a predicate and an action), a ruleset may include an optional
            unknown -> action
       clause, optionally followed by a
            otherwise -> action
       clause.

       If all of the predicates in the rules evaluate to unknown and an unknown clause has been specified, then
       the action associated with the unknown clause will be executed.

       If  no  rule  predicate  is  true  and  the unknown action is either not specified or not executed and an
       otherwise clause has been specified, then the  action  associated  with  the  otherwise  clause  will  be
       executed.
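
       The selection order described above can be modelled as follows.  This is a hedged Python sketch of the
       documented semantics only; the function evaluate_ruleset, its return labels, and the use of None for
       unknown are assumptions made for illustration, and it further assumes that a true predicate always
       results in its action being executed.

```python
from typing import Callable, List, Optional, Tuple

Predicate = Callable[[], Optional[bool]]  # None models "unknown"
Action = Callable[[], None]


def evaluate_ruleset(
    rules: List[Tuple[Predicate, Action]],
    unknown: Optional[Action] = None,
    otherwise: Optional[Action] = None,
) -> str:
    """Run at most one action and report which kind of clause fired."""
    results = []
    for predicate, action in rules:
        value = predicate()
        if value is True:
            # First true predicate wins; remaining rules are skipped.
            action()
            return "rule"
        results.append(value)

    # No predicate was true. The unknown clause applies only when
    # every predicate evaluated to unknown.
    if unknown is not None and results and all(v is None for v in results):
        unknown()
        return "unknown"

    # Fall back to the otherwise clause when one was specified.
    if otherwise is not None:
        otherwise()
        return "otherwise"
    return "none"
```

       A ruleset with a mix of false and unknown predicates therefore falls through to the otherwise clause,
       not the unknown clause.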

SCALE FACTORS

       Scale  factors  may  be  appended  to  arithmetic  expressions  and  force linear scaling of the value to
       canonical units.  Simple scale factors are constructed from  the  keywords:  nanosecond,  nanosec,  nsec,
       microsecond,  microsec,  usec,  millisecond, millisec, msec, second, sec, minute, min, hour, byte, Kbyte,
       Mbyte, Gbyte, Tbyte, count, Kcount and Mcount, and the operator /, for example ``Kbytes / hour''.
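
       To illustrate what linear scaling to canonical units means, the Python sketch below computes the
       multiplier implied by a scale factor.  It rests on several assumptions that are not stated in this
       section: that the canonical units are bytes, seconds and counts, that Kbyte means 1024 bytes, that
       Kcount means 1000 counts, and that a trailing ``s'' (as in ``Kbytes'') is accepted.  The names UNITS,
       factor and scale_multiplier are hypothetical.

```python
# Assumed canonical units: byte, second, count (see the lead-in).
UNITS = {
    # space, assuming binary multiples
    "byte": 1, "Kbyte": 1024, "Mbyte": 1024 ** 2,
    "Gbyte": 1024 ** 3, "Tbyte": 1024 ** 4,
    # time, in seconds
    "nanosecond": 1e-9, "nanosec": 1e-9, "nsec": 1e-9,
    "microsecond": 1e-6, "microsec": 1e-6, "usec": 1e-6,
    "millisecond": 1e-3, "millisec": 1e-3, "msec": 1e-3,
    "second": 1.0, "sec": 1.0,
    "minute": 60.0, "min": 60.0, "hour": 3600.0,
    # count, assuming decimal multiples
    "count": 1, "Kcount": 1000, "Mcount": 1000000,
}


def factor(unit: str) -> float:
    """Look up one keyword, tolerating a plural 's' (an assumption)."""
    name = unit.strip()
    if name not in UNITS and name.endswith("s"):
        name = name[:-1]
    return UNITS[name]


def scale_multiplier(scale: str) -> float:
    """Multiplier that converts a value in `scale` to canonical units."""
    numerator, _, denominator = scale.partition("/")
    result = factor(numerator)
    if denominator:
        result /= factor(denominator)
    return result
```

       Under these assumptions, a value of 1 with the scale factor ``Kbytes / hour'' corresponds to
       1024 / 3600 bytes per second.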

MACROS

       Macros are defined using expressions of the form:

            name = constexpr;

       Where name follows the normal rules for variables in programming languages,  i.e.  alphabetic  optionally
       followed  by alphanumerics.  constexpr must be a constant expression, either a string (enclosed in double
       quotes) or an arithmetic expression optionally followed by a scale factor.

       Macros are expanded when their name, prefixed by a dollar ($), appears in an expression, and macros may
       be nested within a constexpr string.

       The following reserved macro names are understood.

       minute    Current minute of the hour.

       hour      Current hour of the day, in the range 0 to 23.

       day       Current day of the month, in the range 1 to 31.

       month     Current month of the year, in the range 0 (January) to 11 (December).

       year      Current year.

       day_of_week
                 Current day of the week, in the range 0 (Sunday) to 6 (Saturday).

       delta     Sample interval in effect for this expression.

       Dates  and  times  are  presented  in  the reporting time zone (see description of -Z and -z command line
       options above).
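
       The ranges of the reserved macros differ from common calendar conventions (months start at 0, and the
       week starts on Sunday).  The Python sketch below shows one way to derive the same values from a
       timestamp; the function reserved_macros is hypothetical, the timestamp is assumed to already be in the
       reporting time zone, and year is assumed to be the full four-digit year.

```python
from datetime import datetime


def reserved_macros(now: datetime, delta_seconds: float) -> dict:
    """Values of the reserved macro names for one evaluation time."""
    return {
        "minute": now.minute,
        "hour": now.hour,                         # 0 to 23
        "day": now.day,                           # 1 to 31
        "month": now.month - 1,                   # 0 (January) to 11 (December)
        "year": now.year,                         # assumed four-digit year
        # datetime.weekday() is 0 for Monday; shift so that 0 is Sunday.
        "day_of_week": (now.weekday() + 1) % 7,
        "delta": delta_seconds,                   # sample interval in seconds
    }
```

       For example, a timestamp of Monday 4 September 2017 gives month 8 and day_of_week 1.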

AUTOMATIC RESTART

       It is often useful for pmie processes to be started and stopped when the local host is booted or shut
       down, or when they have been detected as no longer running (when they have unexpectedly exited for
       some reason).  Refer to pmie_check(1) for details on automating this process.

       Optionally, each system running pmcd(1) may also be configured to run a ``primary'' pmie instance.   This
       pmie    instance    is    launched    by    $PCP_RC_DIR/pmie,    and    is    affected   by   the   files
       $PCP_SYSCONF_DIR/pmie/control, $PCP_SYSCONF_DIR/pmie/control.d (use chkconfig(8), systemctl(1) or similar
       platform-specific    commands    to    activate    or    disable   the   primary   pmie   instance)   and
       $PCP_VAR_DIR/config/pmie/config.default (the default initial configuration file for the primary pmie).

       The primary pmie instance is identified by the -P option.  There may be  at  most  one  ``primary''  pmie
       instance  on  each  system.   The  primary pmie instance (if any) must be running on the same host as the
       pmcd(1) to which it connects (if any), so the -h and -P options are mutually exclusive.

EVENT MONITORING

       It is common for production systems to be monitored in a central location.  Traditionally on UNIX systems
       this  has  been  performed  by  the  system  log facilities - see logger(1), and syslogd(1).  On Windows,
       communication with the system event log is handled by pcp-eventlog(1).

       pmie fits into this model when rules use the syslog action.  Note that if the action string  begins  with
       -p  (priority) and/or -t (tag) then these are extracted from the string and treated in the same way as in
       logger(1) and pcp-eventlog(1).

       However, it is common to have other event monitoring frameworks as well, into which you may wish to
       incorporate performance events from pmie.  You can often use the shell action to send events to these
       frameworks, as they usually provide a program for injecting events into the framework from external
       sources.

       A final option is the use of the stomp (Streaming Text Oriented Messaging Protocol) action, which allows
       pmie to connect to a central JMS (Java Message Service) server and send events to the PMIE topic.  Tools
       can be written to extract these text messages and present them to operations people (via desktop popup
       windows, etc).  Use of the stomp action requires a stomp configuration file to be set up, which specifies
       the location of the JMS server host, port number, and username/password.

       The format of this file is as follows:

            host=messages.sgi.com   # this is the JMS server (required)
            port=61616              # and it's listening here (required)
            timeout=2               # seconds to wait for server (optional)
            username=joe            # (required)
            password=j03ST0MP       # (required)
            topic=PMIE              # JMS topic for pmie messages (optional)
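
       A file in the format shown above is straightforward to validate before handing it to pmie with the -j
       option.  The following Python sketch is not how pmie reads the file; it assumes one key=value pair per
       line, that everything after a # is a comment (so a value containing # would be truncated), and the
       name parse_stomp_config is hypothetical.

```python
REQUIRED_KEYS = ("host", "port", "username", "password")


def parse_stomp_config(text: str) -> dict:
    """Parse key=value lines, ignoring blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        # Assumption: '#' always starts a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value, got: " + line)
        settings[key.strip()] = value.strip()

    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    return settings
```

       Optional keys such as timeout and topic are returned only when present; the sketch does not guess at
       pmie's defaults for them.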

       The timeout value specifies the time (in seconds) that pmie should wait for acknowledgements from the JMS
       server after sending a message (as required by the STOMP protocol).  Note that on startup, pmie will wait
       indefinitely  for a connection, and will not begin rule evaluation until that initial connection has been
       established.  Should the connection to the JMS server be lost at any time while  pmie  is  running,  pmie
       will  attempt  to reconnect on each subsequent truthful evaluation of a rule with a stomp action, but not
       more than once per minute.  This is to avoid contributing to  network  congestion.   In  this  situation,
       where  the  STOMP  connection to the JMS server has been severed, the stomp action will return a non-zero
       error value.
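
       The reconnection policy described above (retry only when a rule with a stomp action fires, and at most
       once per minute) is a simple rate limit.  The Python sketch below illustrates the idea; it is not taken
       from the pmie sources, and the class ReconnectThrottle and its injectable clock are assumptions made
       so the behaviour can be exercised without waiting.

```python
import time


class ReconnectThrottle:
    """Allow a reconnect attempt at most once per min_interval seconds."""

    def __init__(self, min_interval: float = 60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable for testing
        self.last_attempt = None    # time of the most recent attempt

    def may_attempt(self) -> bool:
        """Call when a stomp action fires while disconnected."""
        now = self.clock()
        if (self.last_attempt is not None
                and now - self.last_attempt < self.min_interval):
            return False            # too soon; skip this attempt
        self.last_attempt = now
        return True
```

       A caller would check may_attempt() on each truthful evaluation of a rule with a stomp action, and
       report a failed action whenever it returns False or the attempt itself fails.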

DIFFERENCES IN HOST AND ARCHIVE MODES

       When running in host mode, the delta interval for each rule determines a  real-time  delay  between  rule
       evaluation, so pmie spends most of its time sleeping and waiting for the next scheduled rule evaluation.

       When  running in archive mode, pmie uses the delta interval for each rule to determine how frequently the
       rules are evaluated against the archive data, but unlike host mode there are no real-time delays  as  the
       archive is ``replayed'' as fast as possible.

       In  archive  mode  when  a rule predicate evaluates true then the action is modified, so that rather than
       posting to syslog or raising a visible alarm or running a shell command or sending a stomp message,  pmie
       prints  the  name  of  the  action, the timestamp from the archive when the rule predicate triggering the
       action was true and all of the arguments that would have been passed to the real action in host mode.

       For example, given the rule:
            delta = 10 sec;
            kernel.all.nprocs > 10 * hinv.ncpu -> print "lotsaprocs:" " %v";
       when run against an archive, the output appears as:
            print Mon Sep  4 00:10:21 2017: lotsaprocs: 1292
            print Mon Sep  4 00:10:31 2017: lotsaprocs: 1294
            print Mon Sep  4 00:10:41 2017: lotsaprocs: 1291
            ...

       The rationale is that the context in which the action would have been executed (in host mode) was at a
       time in the past and possibly on a different host (if the archive was collected from one host, but
       pmie is being run on a different host).  So flooding syslog with misleading messages, or an avalanche
       of visual alarms, or a lot of STOMP messages, or a shell command that might not even work on the host
       where pmie is being run, are all examples of ``badness'' to be avoided.  Rather the output is text in a
       regular format suitable for post-processing with a range of filters and performance analysis tools.
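
       As an illustration of such post-processing, the Python sketch below parses lines in the default
       archive-mode layout (action name, ctime(3) timestamp, colon, message).  The function parse_line is
       hypothetical, and the sketch assumes English day and month names and the default output format.

```python
import re
from datetime import datetime

# action name, a 24-character ctime(3) timestamp, then ": " and the message
LINE = re.compile(r"^(?P<action>\S+) (?P<when>.{24}): (?P<message>.*)$")


def parse_line(line: str):
    """Return (action, datetime, message), or None if the line does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    try:
        when = datetime.strptime(match.group("when"), "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None
    return match.group("action"), when, match.group("message")
```

       Lines that do not follow the layout, such as the ``...'' in the example output, are returned as None so
       that a filter can skip them.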

       The  output  format  can  be  changed  using  the -o option which consists of literal characters with the
       following embedded ``meta-field'' tokens:

       %a  The name of the action, e.g.  print, syslog, etc.

       %d  The date and time in ctime(3) format when the action would have been executed.

       %f  The name of the configuration file containing the action being executed, else <stdin>  if  the  rules
           were read from standard input.

       %l  The (approximate) line number in the configuration file for the action being executed.

       %m  The message component of the action.

       %u  The  date  and  time  when  the  action  would  have  been  executed in extended ctime(3) format with
           microsecond precision for the time.

       %%  A literal percent character.

       The default output format is equivalent to a format of %a %d: %m.
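
       The token substitution can be pictured with the following Python sketch.  It is an illustration only,
       not pmie's code: the function expand_format and its parameters are hypothetical, and %u is deliberately
       left unexpanded because the exact layout of the extended ctime(3) format is not specified here.

```python
import time
from datetime import datetime


def expand_format(fmt: str, action: str, when: datetime, message: str,
                  filename: str = "<stdin>", line: int = 0) -> str:
    """Expand the -o meta-field tokens (except %u) in fmt."""
    fields = {
        "a": action,
        "d": time.asctime(when.timetuple()),  # ctime(3) style, no newline
        "f": filename,
        "l": str(line),
        "m": message,
        "%": "%",
    }
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt) and fmt[i + 1] in fields:
            out.append(fields[fmt[i + 1]])
            i += 2
        else:
            # Literal character (unrecognized tokens pass through unchanged).
            out.append(fmt[i])
            i += 1
    return "".join(out)
```

       With the default format %a %d: %m, a print action at 00:10:21 on 4 September 2017 with the message
       ``lotsaprocs: 1292'' expands to the first line of the example output shown earlier.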

SIGNALS

       If pmie is sent a SIGHUP signal, the logfile will be closed, unlinked and re-opened.   This  is  used  by
       pmie_daily(1) to achieve nightly log rotation.

       Most  of  the  time pmie is sleeping, waiting until the next set of rules needs to be evaluated.  Sending
       pmie a SIGUSR1 signal will cause the details for the next set of rules to be dumped on logfile, including
       how  long the current sleep is and how much time remains.  The scheduling of rules is not changed by this
       action.

HOSTNAME CHANGES

       The hostname of the PMCD that is providing metrics to pmie is used in several ways.

       PMCD's hostname is used internally to provide a value for the %h substitutions in rule action strings.

       For pmie instances using a local PMCD that are launched and managed by pmie_check(1)  and  pmie_daily(1),
       (or  the  systemd(1)  or cron(8) services that use these scripts), the local hostname may also be used to
       construct  the  name  of  a  directory  where  the  pmie   logs   for   one   host   are   stored,   e.g.
       $PCP_LOG_DIR/pmie/<hostname>.

       The  hostname  of  the PMCD host may change during boot time when the system transitions from a temporary
       hostname to a persistent hostname, or by explicit administrative action anytime after the system has been
       booted.   When  this happens, pmie may need to take special action, specifically if the pmie instance was
       launched from pmie_check(1) or pmie_daily(1), then pmie must exit.  Under normal circumstances systemd(1)
       or cron(8) will launch a new pmie shortly thereafter, and this new pmie instance will be operating in the
       context of the new hostname for the host where PMCD is running.

BUGS

       The lexical scanner and parser will attempt to recover after an error in the input expressions.  Parsing
       resumes after skipping input up to the next semi-colon (;).  However, during this skipping process the
       scanner is ignorant of comments and strings, so an embedded semi-colon may cause parsing to resume at an
       unexpected place.  This behavior is largely benign because, until the initial syntax error is corrected,
       pmie will not attempt any expression evaluation.

FILES

       $PCP_DEMOS_DIR/pmie/*
            annotated example rules

       $PCP_VAR_DIR/pmns/*
            default PMNS specification files

       $PCP_TMP_DIR/pmie
            pmie maintains files in this directory to identify the running pmie instances and to export  runtime
            information about each instance - this data forms the basis of the pmcd.pmie performance metrics

       $PCP_PMIECONTROL_PATH
            the default set of pmie instances to start at boot time - refer to pmie_check(1) for details

PCP ENVIRONMENT

       Environment  variables with the prefix PCP_ are used to parameterize the file and directory names used by
       PCP.  On each installation, the file /etc/pcp.conf contains the local values for  these  variables.   The
       $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf(5).

       When  executing  shell  actions,  pmie overrides two variables - IFS and PATH - in the environment of the
       child process.  IFS is set to "\t\n".  The PATH is set to  a  combination  of  a  default  path  for  all
       platforms  ("/usr/sbin:/sbin:/usr/bin:/bin")  and  several  configurable  components.  These are (in this
       order): $PCP_BIN_DIR, $PCP_BINADM_DIR and $PCP_PLATFORM_PATHS.

       When executing popup alarm actions,  pmie  will  use  the  value  of  $PCP_XCONFIRM_PROG  as  the  visual
       notification program to run.  This is typically set to pmconfirm(1), a cross-platform dialog box.

UNIX SEE ALSO

       logger(1).

WINDOWS SEE ALSO

       pcp-eventlog(1).

SEE ALSO

       PCPIntro(1),  pmcd(1),  pmconfirm(1), pmie_check(1), pmieconf(1), pmie_daily(1), pminfo(1), pmlogdump(1),
       pmlogger(1), pmval(1), systemd(1), ctime(3), PMAPI(3), pcp.conf(5), pcp.env(5) and PMNS(5).

USER GUIDE

       For a more complete description of the pmie  language,  refer  to  the  Performance  Co-Pilot  Users  and
       Administrators Guide.  This is available online from:
           https://pcp.readthedocs.io/en/latest/UAG/PerformanceMetricsInferenceEngine.html