Provided by: pcp_3.8.12ubuntu1_amd64

NAME

       pmie - inference engine for performance metrics

SYNOPSIS

       pmie  [-bCdefHVvWxz]  [-A  align]  [-a  archive]  [-c filename] [-h host] [-l logfile] [-j stompfile] [-n
       pmnsfile] [-O offset] [-S starttime] [-T endtime] [-t interval] [-U  username]  [-Z  timezone]  [filename
       ...]

DESCRIPTION

       pmie  accepts  a  collection  of  arithmetic,  logical, and rule expressions to be evaluated at specified
       frequencies.  The base data for the expressions consists of performance metrics values delivered in real-
       time from any host running the Performance Metrics Collection Daemon (PMCD),  or  using  historical  data
       from Performance Co-Pilot (PCP) archive logs.

       As well as computing arithmetic and logical values, pmie can execute actions (raise popup alarms, write
       system log messages, and launch programs) in response to specified conditions.  Such actions are
       extremely useful in detecting, monitoring and correcting performance-related problems.

       The  expressions  to  be  evaluated  are  read from configuration files specified by one or more filename
       arguments.  In the absence of any filename, expressions are read from standard input.

       A description of the command line options specific to pmie follows:

       -a   archive is the base name of a PCP archive log written by pmlogger(1).  Multiple instances of the  -a
            flag may appear on the command line to specify a set of archives.  In this case, it is required that
            only  one  archive  be  present for any one host.  Also, any explicit host names occurring in a pmie
            expression must match the host name recorded in one of the archive labels.  In the case of  multiple
            archives, timestamps recorded in the archives are used to ensure temporal consistency.

       -b   Output will be line buffered and standard output is attached to standard error.  This is most useful
            for  background  execution in conjunction with the -l option.  The -b option is always used for pmie
            instances launched from pmie_check(1).

       -C   Parse the configuration file(s) and exit before performing  any  evaluations.   Any  errors  in  the
            configuration file are reported.

       -c   An alternative to specifying filename at the end of the command line.

       -d   Normally  pmie  would be launched as a non-interactive process to monitor and manage the performance
            of one or more hosts.  Given the -d flag however, execution is interactive and the user is presented
            with a menu of options.  Interactive mode is useful mainly for debugging new expressions.

       -e   When used with -V, -v or -W, this option forces timestamps to be reported with each expression.  The
            timestamps are in ctime(3) format, enclosed in parentheses, and appear after the expression name and
            before the expression value, e.g.
                 expr_1 (Tue Feb  6 19:55:10 2001): 12

       -f   If the -l option is specified and there is no -a option (i.e. real-time monitoring) then pmie is run
            as a daemon in the background (in all other cases foreground is the default).  The -f option  forces
            pmie to be run in the foreground, independent of any other options.

       -H   The default hostname written to the stats file will not be looked up via gethostbyname(3), rather it
            will  be  written as-is.  This option can be useful when host name aliases are in use at a site, and
            the logical name is more important than the physical host name.

       -h   By default performance data is fetched from the local host (in real-time mode) or the host  for  the
            first  named  archive  on  the  command  line  (in  archive mode).  The host argument overrides this
            default.  It does not override hosts explicitly named in the expressions being evaluated.

       -l   Standard error is sent to logfile.

       -j   An alternative STOMP protocol configuration is loaded from stompfile.  If this option is  not  used,
            and  the  stomp  action is used in any rule, the default location $PCP_SYSCONF_DIR/pmie/config/stomp
            will be used.

       -n   An alternative Performance Metrics Name Space (PMNS) is loaded from the file pmnsfile.

       -t   The interval argument follows the syntax described in PCPIntro(1), and in the simplest form  may  be
            an  unsigned  integer  (the implied units in this case are seconds).  The value is used to determine
            the sample interval for expressions that do not explicitly set their sample interval using the  pmie
            variable delta described below.  The default is 10.0 seconds.

       -U username
            User  account under which to run pmie.  The default is the current user account for interactive use.
            When run as a daemon, the unprivileged "pcp" account is used in current  versions  of  PCP,  but  in
            older versions the superuser account ("root") was used by default.

       -v   Unless one of the verbose options -V, -v or -W appears on the command line, expressions are
            evaluated silently; the only output is that produced by any actions being executed.  In the verbose
            mode, specified using the -v flag, the value of each expression is printed as it is evaluated.  The
            values are in canonical units; bytes in the dimension of ``space'',  seconds  in  the  dimension  of
            ``time'' and events in the dimension of ``count''.  See pmLookupDesc(3) for details of the supported
            dimension  and scaling mechanisms for performance metrics.  The verbose mode is useful in monitoring
            the value of given expressions, evaluating derived performance metrics, passing these values  on  to
            other tools for further processing and in debugging new expressions.

       -V   This  option has the same effect as the -v option, except that the name of the host and instance (if
            applicable) are printed as well as expression values.

       -W   This option has the same  effect  as  the  -V  option  described  above,  except  that  for  boolean
            expressions,  only  those names and values that make the expression true are printed.  These are the
            same names and values accessible to rule actions as the %h, %i and %v bindings, as described below.

       -x   Execute in domain agent mode.  This mode is used within the Performance Co-Pilot product  to  derive
            values  for summary metrics, see pmdasummary(1).  Only restricted functionality is available in this
            mode (expressions with actions may not be used).

       -Z   Change the reporting timezone to timezone in the format of the environment variable TZ as  described
            in environ(5).

       -z   Change  the  reporting  timezone  to  the timezone of the host that is the source of the performance
            metrics, as identified via either the -h option or the first named archive (as described  above  for
            the -a option).

       The -S, -T, -O, and -A options may be used to define a time window to restrict the samples retrieved, set
       an  initial  origin within the time window, or specify a ``natural'' alignment of the sample times; refer
       to PCPIntro(1) for a complete description of these options.

       Output from pmie is directed to standard output and standard error as follows:

       stdout
            Expression values printed in the verbose -v mode and the output of print actions.

       stderr
            Error and warning messages for any syntactic or semantic problems during expression parsing, and any
            semantic or performance metrics availability problems during expression evaluation.

EXAMPLES

       The following example expressions demonstrate some of the capabilities of the inference engine.

       The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated examples of pmie expressions.

       The variable delta controls expression evaluation frequency.   Specify  that  subsequent  expressions  be
       evaluated once a second, until further notice:

            delta = 1 sec;

       If total syscall rate exceeds 5000 per second per CPU, then display an alarm notifier:

            kernel.all.syscall / hinv.ncpu > 5000 count/sec
            -> alarm "high syscall rate";

       If  the  high  syscall  rate  is  sustained for 10 consecutive samples, then launch top(1) in an xwsh(1G)
       window to monitor processes, but do this at most once every 5 minutes:

            all_sample (
                kernel.all.syscall @0..9 > 5000 count/sec * hinv.ncpu
            ) -> shell 5 min "xwsh -e 'top'";

       The following rules are evaluated once every 20 seconds:

            delta = 20 sec;

       If any disk is performing more than 60 I/Os per second, then print a message identifying the busy disk to
       standard output and launch dkvis(1):

            some_inst (
                disk.dev.total > 60 count/sec
            ) -> print "disk %i busy " &
                 shell 5 min "dkvis";

       Refine the preceding rule to apply only between the hours  of  9am  and  5pm,  and  to  require  3  of  4
       consecutive samples to exceed the threshold before executing the action:

            $hour >= 9 && $hour <= 17 &&
            some_inst (
              75 %_sample (
                disk.dev.total @0..3 > 60 count/sec
              )
            ) -> print "disk %i busy ";

       The following rules are evaluated once every 10 minutes:

            delta = 10 min;

       If  either  the / or the /usr filesystem is more than 95% full, display an alarm popup, but not if it has
       already been displayed during the last 4 hours:

            filesys.free #'/dev/root' /
                filesys.capacity #'/dev/root' < 0.05
            -> alarm 4 hour "root filesystem (almost) full";

            filesys.free #'/dev/usr' /
                filesys.capacity #'/dev/usr' < 0.05
            -> alarm 4 hour "/usr filesystem (almost) full";

       The following rule requires a machine  that  supports  the  PCP  environment  metrics.   If  the  machine
       environment temperature rises more than 2 degrees over a 10 minute interval, write an entry in the system
       log:

            environ.temp @0 - environ.temp @1 > 2
            -> alarm "temperature rising fast" &
               syslog "machine room temperature rise alarm";

       And last, something interesting if you have performance problems with your Oracle database:

            db = "oracle.ptg1";
            host = ":moomba.melbourne.sgi.com";
            lru = "#'cache buffers lru chain'";
            gets = "$db.latch.gets $host $lru";
            total = "$db.latch.gets $host $lru +
                     $db.latch.misses $host $lru +
                     $db.latch.immisses $host $lru";

            $total > 100 && $gets / $total < 0.2
            -> alarm "high lru latch contention";

QUICK START

       The pmie specification language is powerful and large.

       To  expedite  rapid  development of pmie rules, the pmieconf(1) tool provides a facility for generating a
       pmie configuration file from a set of generalized pmie rules.  The supplied set of rules  covers  a  wide
       range of performance scenarios.

       The  Performance  Co-Pilot  User's  and  Administrator's Guide provides a detailed tutorial-style chapter
       covering pmie.

EXPRESSION SYNTAX

       This description is terse and informal.  For a more comprehensive description  see  the  Performance  Co-
       Pilot User's and Administrator's Guide.

       A pmie specification is a sequence of semicolon terminated expressions.

       Basic  operators  are  modeled  on  the arithmetic, relational and Boolean operators of the C programming
       language.  Precedence rules are as expected, although the use of parentheses  is  encouraged  to  enhance
       readability and remove ambiguity.

       Operands are performance metric names (see pmns(5)) and the normal literal constants.

       Operands  involving  performance  metrics  may  produce sets of values, as a result of enumeration in the
       dimensions of hosts, instances and time.  Special qualifiers may appear after a performance  metric  name
       to define the enumeration in each dimension.  For example,

           kernel.percpu.cpu.user :foo :bar #cpu0 @0..2

       defines 6 values corresponding to the time spent executing in user mode on CPU 0 on the hosts ``foo'' and
       ``bar''  over  the  last 3 consecutive samples.  The default interpretation in the absence of : (host), #
       (instance) and @ (time) qualifiers is all instances at the most recent sample time for the default source
       of PCP performance metrics.
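
       The count of 6 values in the example above follows from enumerating every combination of host,
       instance and sample.  The following Python fragment is only an illustrative sketch of that
       enumeration; it does not fetch any metrics, and the variable names are invented for the example.

```python
from itertools import product

# Qualifiers from the example: two hosts, one instance, samples @0..2.
hosts = ["foo", "bar"]
instances = ["cpu0"]
samples = range(0, 3)  # @0..2 is inclusive: sample offsets 0, 1 and 2

# One value slot per (host, instance, sample) combination.
value_slots = list(product(hosts, instances, samples))

for host, inst, sample in value_slots:
    print(f"kernel.percpu.cpu.user :{host} #{inst} @{sample}")

print(len(value_slots))
```

       Adding a second instance to the list would double the number of value slots to 12.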

       Host and instance names that do not follow the rules for variables in programming languages, i.e. an
       alphabetic character optionally followed by alphanumerics, should be enclosed in single quotes.

       Expression  evaluation  follows  the  law  of  ``least  surprises''.   Where performance metrics have the
       semantics of a counter, pmie will automatically convert to a rate based upon consecutive samples and  the
       time  interval  between  these  samples.   All  expressions  are evaluated in double precision, and where
       appropriate, automatically scaled into canonical units of ``bytes'', ``seconds'' and ``counts''.
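
       The automatic rate conversion described above amounts to dividing the change in a counter by the
       time between two consecutive samples.  The sketch below shows only that arithmetic; the function
       name and the sample readings are hypothetical, and counter wrap and missing samples are ignored.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Return the per-second rate between two consecutive counter samples."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_value - prev_value) / elapsed


# Hypothetical kernel.all.syscall readings taken 10 seconds apart.
rate = counter_rate(prev_value=120_000, prev_time=0.0,
                    curr_value=185_000, curr_time=10.0)
print(rate)  # system calls per second over the interval
```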

       A rule is a special form of expression that specifies  a  condition  or  logical  expression,  a  special
       operator (->) and actions to be performed when the condition is found to be true.

       The following table summarizes the basic pmie operators:

                           ┌─────────────────┬────────────────────────────────────────────┐
                           │    Operators    │                Explanation                 │
                           ├─────────────────┼────────────────────────────────────────────┤
                           │ + - * /         │ Arithmetic                                 │
                           │ < <= == >= > != │ Relational (value comparison)              │
                           │ ! && ||         │ Boolean                                    │
                           │ ->              │ Rule                                       │
                           │ rising          │ Boolean, false to true transition          │
                           │ falling         │ Boolean, true to false transition          │
                           │ rate            │ Explicit rate conversion (rarely required) │
                           └─────────────────┴────────────────────────────────────────────┘

       Aggregate operators may be used to aggregate or summarize along one dimension of a set-valued expression.
       The  following  aggregate  operators  map  from  a  logical  expression  to a logical expression of lower
       dimension.

                        ┌──────────────────────────┬─────────────┬──────────────────────────┐
                        │        Operators         │    Type     │       Explanation        │
                        ├──────────────────────────┼─────────────┼──────────────────────────┤
                        │ some_inst                │ Existential │ True if at least one set │
                        │ some_host                │             │ member is true in the    │
                        │ some_sample              │             │ associated dimension     │
                        ├──────────────────────────┼─────────────┼──────────────────────────┤
                        │ all_inst                 │ Universal   │ True if all set members  │
                        │ all_host                 │             │ are true in the          │
                        │ all_sample               │             │ associated dimension     │
                        ├──────────────────────────┼─────────────┼──────────────────────────┤
                        │ N%_inst                  │ Percentile  │ True if at least N       │
                        │ N%_host                  │             │ percent of set members   │
                        │ N%_sample                │             │ are true in the          │
                        │                          │             │ associated dimension     │
                        └──────────────────────────┴─────────────┴──────────────────────────┘
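
       The three kinds of logical aggregate in the table above can be modelled over a plain list of truth
       values taken along one dimension.  This is a sketch of the documented semantics only, not of how
       pmie is implemented; the function names are invented and empty sets are not considered.

```python
def some(values):
    """Existential: true if at least one set member is true."""
    return any(values)


def all_of(values):
    """Universal: true if every set member is true."""
    return all(values)


def percent(n, values):
    """Percentile: true if at least n percent of set members are true."""
    values = list(values)
    true_count = sum(1 for v in values if v)
    return 100 * true_count >= n * len(values)


# Four consecutive samples of a condition, three of which are true.
samples = [True, True, False, True]
print(some(samples), all_of(samples), percent(75, samples))
```

       With 3 of 4 samples true, a 75 percent threshold is met, which is the situation the
       ``75 %_sample'' rule in the EXAMPLES section is written for.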

       The following instantial operators may be used to filter or limit a set-valued logical expression,  based
       on  regular  expression  matching  of instance names.  The logical expression must be a set involving the
       dimension of instances, and the regular expression is of the  form  used  by  egrep(1)  or  the  Extended
       Regular Expressions of regcomp(3G).

                             ┌──────────────┬──────────────────────────────────────────┐
                             │  Operators   │               Explanation                │
                             ├──────────────┼──────────────────────────────────────────┤
                             │ match_inst   │ For each value of the logical expression │
                             │              │ that is ``true'', the result is ``true'' │
                             │              │ if the associated instance name matches  │
                             │              │ the regular expression.  Otherwise the   │
                             │              │ result is ``false''.                     │
                             ├──────────────┼──────────────────────────────────────────┤
                             │ nomatch_inst │ For each value of the logical expression │
                             │              │ that is ``true'', the result is ``true'' │
                             │              │ if the associated instance name does not │
                             │              │ match the regular expression.  Otherwise │
                             │              │ the result is ``false''.                 │
                             └──────────────┴──────────────────────────────────────────┘

       For  example,  the  expression below will be ``true'' for disks attached to controllers 2 or 3 performing
       more than 20 operations per second:
            match_inst "^dks[23]d" disk.dev.total > 20;
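
       The filtering performed by match_inst and nomatch_inst can be pictured as follows.  This sketch
       assumes the logical expression has already been evaluated into a truth value per instance, and it
       uses Python's re module as a stand-in for POSIX Extended Regular Expressions (the two dialects
       agree for the simple pattern used here).  The instance names are hypothetical.

```python
import re


def match_inst(pattern, truth_by_instance):
    """Keep a true value only where the instance name matches pattern."""
    regex = re.compile(pattern)
    return {inst: bool(value and regex.search(inst))
            for inst, value in truth_by_instance.items()}


def nomatch_inst(pattern, truth_by_instance):
    """Keep a true value only where the instance name does not match."""
    regex = re.compile(pattern)
    return {inst: bool(value and not regex.search(inst))
            for inst, value in truth_by_instance.items()}


# Hypothetical result of evaluating "disk.dev.total > 20" per disk.
busy = {"dks1d1": True, "dks2d1": True, "dks3d4": False}

print(match_inst("^dks[23]d", busy))
print(nomatch_inst("^dks[23]d", busy))
```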

       The following aggregate operators map from an arithmetic expression to an arithmetic expression of  lower
       dimension.

                         ┌──────────────────────────┬───────────┬──────────────────────────┐
                         │        Operators         │   Type    │       Explanation        │
                         ├──────────────────────────┼───────────┼──────────────────────────┤
                         │ min_inst                 │ Extrema   │ Minimum value across all │
                         │ min_host                 │           │ set members in the       │
                         │ min_sample               │           │ associated dimension     │
                         ├──────────────────────────┼───────────┼──────────────────────────┤
                         │ max_inst                 │ Extrema   │ Maximum value across all │
                         │ max_host                 │           │ set members in the       │
                         │ max_sample               │           │ associated dimension     │
                         ├──────────────────────────┼───────────┼──────────────────────────┤
                         │ sum_inst                 │ Aggregate │ Sum of values across all │
                         │ sum_host                 │           │ set members in the       │
                         │ sum_sample               │           │ associated dimension     │
                         ├──────────────────────────┼───────────┼──────────────────────────┤
                         │ avg_inst                 │ Aggregate │ Average value across all │
                         │ avg_host                 │           │ set members in the       │
                         │ avg_sample               │           │ associated dimension     │
                         └──────────────────────────┴───────────┴──────────────────────────┘

       The  aggregate  operators  count_inst,  count_host  and  count_sample map from a logical expression to an
       arithmetic expression of lower dimension by counting the number of set members for which  the  expression
       is true in the associated dimension.

       For action rules, the following actions are defined:
                                ┌───────────┬────────────────────────────────────────┐
                                │ Operators │              Explanation               │
                                ├───────────┼────────────────────────────────────────┤
                                │ alarm     │ Raise a visible alarm with xconfirm(1) │
                                │ print     │ Display on standard output             │
                                │ shell     │ Execute with sh(1)                     │
                                │ stomp     │ Send a STOMP message to a JMS server   │
                                │ syslog    │ Append a message to system log file    │
                                └───────────┴────────────────────────────────────────┘

       Multiple actions may be separated by the & and | operators to specify, respectively, sequential execution
       (both actions are executed) and alternate execution (the second action will only be executed if the
       execution of the first action returns a non-zero error status).
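
       The difference between the two operators can be sketched by treating each action as a function
       that returns its exit status.  The helper and action names below are hypothetical and stand in
       for real pmie actions.

```python
def run_sequential(first, second):
    """Model of `first & second`: both actions are always executed."""
    first()
    second()


def run_alternate(first, second):
    """Model of `first | second`: second runs only if first fails."""
    if first() != 0:
        second()


log = []


def failing_stomp():
    log.append("stomp")
    return 1  # non-zero: the action reported an error


def syslog_action():
    log.append("syslog")
    return 0


# The fallback runs because the first action returned a non-zero status.
run_alternate(failing_stomp, syslog_action)
print(log)
```

       Alternate execution is a natural fit for a fallback, for example a syslog action that is used
       only when a preferred action cannot be delivered.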

       Arguments  to  actions are an optional suppression time, and then one or more expressions (a string is an
       expression in this context).  Strings appearing as arguments to  an  action  may  include  the  following
       special selectors that will be replaced at the time the action is executed.

       %h  Host(s) that make the left-most top-level expression in the condition true.

       %i  Instance(s) that make the left-most top-level expression in the condition true.

       %v  One  value  from  the left-most top-level expression in the condition for each host and instance pair
           that makes the condition true.

       Note that expansion of the special selectors is done by repeating the whole argument once for each unique
       binding to any of the qualifying special selectors.  For example if a rule were true for the host  mumble
       with  instances  grunt  and  snort,  and  for host fumble the instance puff makes the rule true, then the
       action
            ...
            -> shell myscript "Warning: %h:%i busy ";
       will execute myscript with the argument string "Warning: mumble:grunt  busy  Warning:  mumble:snort  busy
       Warning: fumble:puff busy".

       By comparison, if the action
            ...
            -> shell myscript "Warning! busy:" " %h:%i";
       were  executed  under  the  same  circumstances, then myscript would be executed with the argument string
       "Warning! busy: mumble:grunt mumble:snort fumble:puff".

       The semantics of the expansion of the special selectors leads to a common usage pattern in an action,
       where the first argument is a constant (contains no special selectors), the second argument contains the
       desired special selectors with minimal separator characters, and an optional third argument provides a
       constant postscript (e.g. to terminate any argument quoting from the first argument).  If necessary,
       post-processing (e.g. in myscript) can provide the required enumeration over each unique expansion of
       the string containing just the special selectors.
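
       The expansion rule described above, and the two myscript examples, can be reproduced with a short
       sketch.  It handles only %h and %i (not %v), takes the bindings as a ready-made list of
       (host, instance) pairs, and uses invented function names; it is a model of the documented
       behaviour rather than the pmie implementation.

```python
def expand_argument(template, bindings):
    """Expand one action argument.

    An argument without selectors is emitted once.  Otherwise the whole
    argument is repeated once for each unique expansion of its selectors.
    """
    if "%h" not in template and "%i" not in template:
        return template
    expansions = [
        template.replace("%h", host).replace("%i", inst)
        for host, inst in bindings
    ]
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return "".join(dict.fromkeys(expansions))


def expand_action(arguments, bindings):
    """Concatenate the expansion of every argument of an action."""
    return "".join(expand_argument(arg, bindings) for arg in arguments)


# Bindings from the example: (host, instance) pairs that make the rule true.
bindings = [("mumble", "grunt"), ("mumble", "snort"), ("fumble", "puff")]

first = expand_action(["Warning: %h:%i busy "], bindings)
second = expand_action(["Warning! busy:", " %h:%i"], bindings)
print(repr(first))
print(repr(second))
```

       Splitting the constant text into its own argument, as in the second call, is what keeps
       ``Warning! busy:'' from being repeated for every binding.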

       For complex conditions, the bindings to these selectors are not obvious.  It is strongly recommended that
       pmie be used in the debugging mode (specify the -W command line option in particular) during rule
       development.

SCALE FACTORS

       Scale factors may be appended to arithmetic  expressions  and  force  linear  scaling  of  the  value  to
       canonical  units.   Simple  scale  factors  are constructed from the keywords: nanosecond, nanosec, nsec,
       microsecond, microsec, usec, millisecond, millisec, msec, second, sec, minute, min,  hour,  byte,  Kbyte,
       Mbyte, Gbyte, Tbyte, count, Kcount and Mcount, and the operator /, for example ``Kbytes / hour''.
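
       The effect of a scale factor can be illustrated with a small conversion table.  The sketch below
       covers only a handful of the keywords, assumes that Kbyte and Mbyte are powers of 1024, and
       accepts a trailing ``s'' so that the ``Kbytes / hour'' form works; the table and function names
       are invented for the example.

```python
# Multipliers to canonical units (bytes, seconds, counts) for a few keywords.
MULTIPLIERS = {
    "byte": 1, "Kbyte": 1024, "Mbyte": 1024 ** 2,   # assumed binary multiples
    "sec": 1, "second": 1, "min": 60, "minute": 60, "hour": 3600,
    "count": 1,
}


def multiplier(keyword):
    """Look up a keyword, tolerating a plural form such as 'Kbytes'."""
    if keyword not in MULTIPLIERS and keyword.endswith("s"):
        keyword = keyword[:-1]
    return MULTIPLIERS[keyword]


def to_canonical(value, scale):
    """Scale value, given a factor such as 'Kbytes / hour', to canonical units."""
    parts = [part.strip() for part in scale.split("/")]
    result = value * multiplier(parts[0])
    for part in parts[1:]:
        result /= multiplier(part)
    return result


# 3600 Kbytes per hour expressed in bytes per second.
rate = to_canonical(3600, "Kbytes / hour")
print(rate)
```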

MACROS

       Macros are defined using expressions of the form:

            name = constexpr;

       where name follows the normal rules for variables in programming languages, i.e. an alphabetic character
       optionally followed by alphanumerics.  constexpr must be a constant expression, either a string (enclosed
       in double quotes) or an arithmetic expression optionally followed by a scale factor.

       Macros are expanded when their name, prefixed by a dollar ($) appears in an expression, and macros may be
       nested within a constexpr string.
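
       Nested expansion can be sketched as repeated textual substitution.  The fragment below uses the
       string macros from the Oracle example in the EXAMPLES section; the regular expression for macro
       names (including the underscore) and the point at which expansion happens are assumptions of the
       sketch, not statements about the pmie parser.

```python
import re

# Assumed shape of a macro reference: '$' followed by an identifier.
MACRO_REF = re.compile(r"\$([A-Za-z][A-Za-z0-9_]*)")


def expand(text, macros, active=()):
    """Replace each $name in text by its (recursively expanded) value."""
    def substitute(match):
        name = match.group(1)
        if name in active:
            raise ValueError(f"macro {name} refers to itself")
        if name not in macros:
            raise KeyError(f"undefined macro: {name}")
        return expand(macros[name], macros, active + (name,))

    return MACRO_REF.sub(substitute, text)


macros = {
    "db": "oracle.ptg1",
    "host": ":moomba.melbourne.sgi.com",
    "lru": "#'cache buffers lru chain'",
    "gets": "$db.latch.gets $host $lru",
}

expanded = expand("$gets", macros)
print(expanded)
```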

       The following reserved macro names are understood.

       minute    Current minute of the hour.

       hour      Current hour of the day, in the range 0 to 23.

       day       Current day of the month, in the range 1 to 31.

       month     Current month of the year, in the range 0 (January) to 11 (December).

       year      Current year.

       day_of_week
                 Current day of the week, in the range 0 (Sunday) to 6 (Saturday).

       delta     Sample interval in effect for this expression.

       Dates  and  times  are  presented  in  the reporting time zone (see description of -Z and -z command line
       options above).

AUTOMATIC RESTART

       It is often useful for pmie processes to be started and stopped when the local host is booted or shut
       down, or when they have been detected as no longer running (when they have unexpectedly exited for
       some reason).  Refer to pmie_check(1) for details on automating this process.

EVENT MONITORING

       It is common for production systems to be monitored in a central location.  Traditionally on UNIX systems
       this has been performed by the system log facilities  -  see  logger(1),  and  syslogd(1).   On  Windows,
       communication with the system event log is handled by pcp-eventlog(1).

       pmie  fits  into this model when rules use the syslog action.  Note that if the action string begins with
       -p (priority) and/or -t (tag) then these are extracted from the string and treated in the same way as  in
       logger(1) and pcp-eventlog(1).

       However, it is common to have other event monitoring frameworks as well, into which you may wish to
       incorporate performance events from pmie.  You can often use the shell action to send events to these
       frameworks, as they usually provide their own program for injecting events into the framework from
       external sources.

       A final option is use of the stomp (Streaming Text Oriented Messaging Protocol) action, which allows pmie
       to connect to a central JMS (Java Message Service) server and send events to the PMIE topic.  Tools can
       be written to extract these text messages and present them to operations people (via desktop popup
       windows, etc).  Use of the stomp action requires a stomp configuration file to be set up, which specifies
       the location of the JMS server host, port number, and username/password.

       The format of this file is as follows:

            host=messages.sgi.com   # this is the JMS server (required)
            port=61616              # and it is listening here (required)
            timeout=2               # seconds to wait for server (optional)
            username=joe            # (required)
            password=j03ST0MP       # (required)
            topic=PMIE              # JMS topic for pmie messages (optional)

       The timeout value specifies the time (in seconds) that pmie should wait for acknowledgements from the JMS
       server after sending a message (as required by the STOMP protocol).  Note that on startup, pmie will wait
       indefinitely for a connection, and will not begin rule evaluation until that initial connection has  been
       established.   Should  the  connection  to the JMS server be lost at any time while pmie is running, pmie
       will attempt to reconnect on each subsequent truthful evaluation of a rule with a stomp action,  but  not
       more  than  once  per  minute.   This is to avoid contributing to network congestion.  In this situation,
       where the STOMP connection to the JMS server has been severed, the stomp action will  return  a  non-zero
       error value.
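
       A tool that needs to read the same stomp configuration file, for instance to check it before
       deploying a rule, could parse it along the following lines.  This is a sketch under assumptions:
       that ``#'' always starts a comment, that a missing topic defaults to PMIE, and that the four
       keys marked ``required'' above are mandatory.  The function name is invented.

```python
REQUIRED_KEYS = ("host", "port", "username", "password")


def parse_stomp_config(text):
    """Parse key=value lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw_line!r}")
        settings[key.strip()] = value.strip()

    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))

    settings["port"] = int(settings["port"])
    if "timeout" in settings:
        settings["timeout"] = float(settings["timeout"])
    settings.setdefault("topic", "PMIE")  # assumed default topic
    return settings


SAMPLE = """\
host=messages.sgi.com   # this is the JMS server (required)
port=61616              # and it is listening here (required)
timeout=2               # seconds to wait for server (optional)
username=joe            # (required)
password=j03ST0MP       # (required)
"""

config = parse_stomp_config(SAMPLE)
print(config)
```

       Note that with this comment rule a password containing ``#'' would be truncated, so the sketch
       should not be taken as a description of how pmie itself reads the file.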

FILES

       $PCP_DEMOS_DIR/pmie/*
                 annotated example rules
       $PCP_VAR_DIR/pmns/*
                 default PMNS specification files
       $PCP_TMP_DIR/pmie
                 pmie  maintains  files  in  this directory to identify the running pmie instances and to export
                 runtime information about  each  instance  -  this  data  forms  the  basis  of  the  pmcd.pmie
                 performance metrics
       $PCP_PMIECONTROL_PATH
                 the default set of pmie instances to start at boot time - refer to pmie_check(1) for details
       $PCP_SYSCONF_DIR/pmie/*
                 the predefined alarm action scripts (email, log, popup and syslog), the example action script
                 (sample) and the concurrent action control file (control.master).

BUGS

       The lexical scanner and parser will attempt to recover after an error in the input expressions.   Parsing
       resumes  after  skipping  input  up  to the next semi-colon (;), however during this skipping process the
       scanner is ignorant of comments and strings, so an embedded semi-colon may cause parsing to resume at  an
       unexpected  place.  This behavior is largely benign, as until the initial syntax error is corrected, pmie
       will not attempt any expression evaluation.
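
       The recovery behaviour described above can be pictured with a deliberately naive skip routine.
       It is only a model of ``skip to the next semicolon without regard to strings or comments''; the
       function name and the faulty input are invented for the illustration.

```python
def resume_offset(source, error_offset):
    """Return the offset just past the next ';' at or after error_offset."""
    index = source.find(";", error_offset)
    return len(source) if index < 0 else index + 1


# A faulty rule whose string argument happens to contain a semicolon.
source = 'bad rule -> print "a; b";\ndelta = 1 sec;\n'

resume = resume_offset(source, 0)
print(repr(source[resume:]))  # text the parser would see next
```

       Here scanning resumes in the middle of the quoted string rather than at the start of the next
       expression, which is the ``unexpected place'' the paragraph above warns about.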

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the file and directory names used  by
       PCP.   On  each  installation, the file /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf(5).

UNIX SEE ALSO

       logger(1).

WINDOWS SEE ALSO

       pcp-eventlog(1).

SEE ALSO

       PCPIntro(1),  pmcd(1),  pmdumplog(1),  pmieconf(1),  pmie_check(1),  pminfo(1),  pmlogger(1),   pmval(1),
       PMAPI(3), pcp.conf(5) and pcp.env(5).

USER GUIDE

       For a more complete description of the pmie language, refer to the Performance Co-Pilot User's and
       Administrator's Guide.  This is available online from:
           http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?\
               db=bks&fname=/SGI_Admin/books/PCP_IRIX/sgi_html/ch05.html

Performance Co-Pilot                                   PCP                                               PMIE(1)