Provided by: systemtap_0.0.20090214-1ubuntu1_i386

NAME

       stap - systemtap script translator/driver

SYNOPSIS

       stap [ OPTIONS ] FILENAME [ ARGUMENTS ]
       stap [ OPTIONS ] - [ ARGUMENTS ]
       stap [ OPTIONS ] -e SCRIPT [ ARGUMENTS ]
       stap [ OPTIONS ] -l PROBE [ ARGUMENTS ]
       stap [ OPTIONS ] -L PROBE [ ARGUMENTS ]

DESCRIPTION

       The  stap  program  is the front-end to the Systemtap tool.  It accepts
       probing  instructions  (written  in  a  simple   scripting   language),
       translates  those  instructions  into C code, compiles this C code, and
       loads the resulting kernel  module  into  a  running  Linux  kernel  to
       perform the requested system trace/probe functions.  You can supply the
       script in a named file, from standard input, or from the command  line.
       The program runs until it is interrupted by the user, until the script
       voluntarily invokes the exit() function, or until a sufficient number of
       soft errors has accumulated.

       The language, which is described in a later section, is strictly typed,
       declaration free, procedural, and inspired by awk.   It  allows  source
       code  points  or  events  in the kernel to be associated with handlers,
       which are subroutines that are executed synchronously.  It is  somewhat
       similar conceptually to "breakpoint command lists" in the gdb debugger.

       This manual corresponds to version 0.8.

OPTIONS

       The systemtap translator supports the  following  options.   Any  other
       option prints a list of supported options.

       -h     Show help message.

       -V     Show version message.

       -p NUM Stop  after  pass  NUM.   The  passes  are  numbered 1-5: parse,
              elaborate, translate, compile, run.  See the PROCESSING  section
              for details.

       -v     Increase verbosity for all passes.  Produce a larger volume of
              informative (?) output each time the option is repeated.

       --vp ABCDE
              Increase verbosity on a per-pass basis.  For example, "--vp 002"
              adds  2  units  of  verbosity  to  pass 3 only.  The combination
              "-v --vp 00004" adds 1 unit of verbosity for all passes,  and  4
              more for pass 5.

       -k     Keep  the temporary directory after all processing.  This may be
              useful in order to examine the generated C code, or to reuse the
              compiled kernel object.

       -g     Guru  mode.   Enable  parsing  of unsafe expert-level constructs
              like embedded C.

       -P     Prologue-searching mode.  Activate  heuristics  to  work  around
              incorrect debugging information for $target variables.

       -u     Unoptimized   mode.    Disable   unused   code   elision  during
              elaboration.

       -w     Suppressed warnings mode.  Disable warning messages  for  elided
              code in user script.

       -b     Use bulk mode (percpu files) for kernel-to-user data transfer.

       -t     Collect timing information on the number of times each probe
              executes and the average amount of time spent in each probe.

       -sNUM  Use NUM megabyte buffers for kernel-to-user data transfer.  On a
              multiprocessor in bulk mode, this is a per-processor amount.

       -I DIR Add the given directory to the tapset search path.  See the
              description of pass 1 in the PROCESSING section for details.

       -D NAME=VALUE
              Add the given C preprocessor directive to the  module  Makefile.
              These  can be used to override limit parameters described below.

       -R DIR Look for the systemtap runtime sources in the given directory.

       -r /DIR
              Build for kernel in given build tree.

       -r RELEASE
              Build for kernel in build tree /lib/modules/RELEASE/build.

       -m MODULE
              Use the given name  for  the  generated  kernel  object  module,
              instead  of  a  unique  randomized  name.   The generated kernel
              object module is copied to the current directory.

       -d MODULE
              Add symbol/unwind information for  the  given  module  into  the
              kernel  object module.  This may enable symbolic tracebacks from
              those modules/programs, even if they do  not  have  an  explicit
              probe placed into them.

       -o FILE
              Send  standard  output to named file. In bulk mode, percpu files
              will start with FILE_ followed by the cpu number.

       -c CMD Start the probes, run CMD, and exit when CMD finishes.

       -x PID Sets target() to PID. This allows scripts  to  be  written  that
              filter on a specific process.

       -l PROBE
              Instead of running a probe script, just list all available probe
              points matching the given  pattern.   The  pattern  may  include
              wildcards and aliases.

       -L PROBE
              Similar  to  "-l",  but list probe points and script-level local
              variables.

       -F     Load module and  start  probes,  then  detach  from  the  module
              leaving the probes running.

       --kelf For  names  and  addresses  of  functions  to probe, consult the
              symbol tables in the kernel and modules.  This can be useful  if
              your  kernel  and/or  modules  were  compiled  without debugging
              information, or  the  function  you  want  to  probe  is  in  an
              assembly-language file built without debugging information.  See
              the MAKING DO WITH SYMBOL TABLES section for more information.

       --kmap[=FILE]
              For names and addresses of kernel functions  to  probe,  consult
              the  symbol  table  in  the indicated text file.  The default is
              /boot/System.map-VERSION.  The contents of this file  should  be
              in  the  form of the default output from nm(1).  Only symbols of
              type T or t are used.  If you  specify  /proc/kallsyms  or  some
              other  file  in  that  format,  where  lines  for module symbols
              contain a fourth column, reading of the symbol table stops  with
              the  first  module  symbol (which should be right after the last
              kernel symbol).  As  with  --kelf,  the  symbol  table  in  each
              module’s  .ko  file  will  also be consulted.  See the MAKING DO
              WITH SYMBOL TABLES section for more information.

       --ignore-vmlinux
              For testing, act  as  though  neither  the  uncompressed  kernel
              (vmlinux) nor the kernel debugging information can be found.

       --ignore-dwarf
              For  testing,  act  as though vmlinux and modules lack debugging
              information.
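
       The symbol-table filtering rules given for --kmap can be sketched in
       Python as follows.  This only illustrates the rules as stated above
       and is not systemtap's own code; the function name read_kmap and the
       sample addresses and symbol names are hypothetical.

```python
def read_kmap(lines):
    """Collect text symbols from nm(1)-style lines: ADDRESS TYPE NAME [MODULE]."""
    symbols = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue              # blank line, or a symbol with no address
        if len(fields) >= 4:
            break                 # first module symbol: stop reading here
        address, sym_type, name = fields
        if sym_type in ("T", "t"):
            symbols[name] = int(address, 16)
    return symbols


# Hypothetical /proc/kallsyms-style input, for illustration only.
sample = [
    "c0100000 T _text",
    "c0100210 t rest_init",
    "c0454000 D jiffies",
    "f8a01000 t helper_fn\t[ext3]",
    "c0100300 T after_module",
]
print(read_kmap(sample))
```

       Only the two text symbols before the first module line are kept; the
       data symbol is skipped and nothing after the module line is read.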

ARGUMENTS

       Any additional arguments on the command line are passed to  the  script
       parser for substitution.  See below.

SCRIPT LANGUAGE

       The  systemtap  script  language  resembles  awk.   There  are two main
       outermost constructs: probes and functions.  Within  these,  statements
       and expressions use C-like operator syntax and precedence.

   GENERAL SYNTAX
       Whitespace is ignored.  Three forms of comments are supported:
              # ... shell style, to the end of line, except for $# and @#
              // ... C++ style, to the end of line
              /* ... C style ... */
       Literals  are either strings enclosed in double-quotes (passing through
       the usual C escape codes with backslashes), or  integers  (in  decimal,
       hexadecimal,  or  octal, using the same notation as in C).  All strings
       are limited in length to some reasonable value (a few  hundred  bytes).
       Integers are 64-bit signed quantities, although the parser also accepts
       (and wraps around) values above positive 2**63.
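
       The wrap-around of oversized integer literals can be modelled with
       the short Python sketch below.  It assumes ordinary two's-complement
       wrapping into the signed 64-bit range, which is how the sentence
       above is read here; the helper name wrap_signed_64 is made up for
       the illustration.

```python
def wrap_signed_64(value):
    """Reduce an arbitrary integer to the signed 64-bit range by wrapping."""
    value &= (1 << 64) - 1            # keep only the low 64 bits
    if value >= 1 << 63:              # top bit set: negative in two's complement
        value -= 1 << 64
    return value


print(wrap_signed_64(2**63 - 1))           # largest positive value, unchanged
print(wrap_signed_64(2**63))               # wraps to the most negative value
print(wrap_signed_64(0xFFFFFFFFFFFFFFFF))  # wraps to -1
```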

       In addition, script arguments given at the end of the command line  may
       be inserted.  Use $1 ... $<NN> for insertion unquoted, @1 ... @<NN> for
       insertion as a string literal.  The number of arguments may be accessed
       through  $# (as an unquoted number) or through @# (as a quoted number).
       These may be used at any place a token may begin, including within  the
       preprocessing  stage.   Reference to an argument number beyond what was
       actually given is an error.

   PREPROCESSING
       A simple conditional preprocessing stage is run as a part  of  parsing.
       The general form is similar to the cond ? exp1 : exp2 ternary operator:
              %( CONDITION %? TRUE-TOKENS %)
              %( CONDITION %? TRUE-TOKENS %: FALSE-TOKENS %)
       The CONDITION is either an expression whose format is determined by its
       first keyword, a comparison of two string literals, or a comparison of
       two numeric literals.

       If the first part is the identifier kernel_vr or kernel_v to  refer  to
       the  kernel  version  number,  with  ("2.6.13-1.322FC3smp")  or without
       ("2.6.13") the release code suffix, then the second part is one of  the
       six standard numeric comparison operators <, <=, ==, !=, >, and >=, and
       the third part is a string literal that contains an RPM-style version-release
       value.  The condition is deemed satisfied if the version of the target
       kernel (as optionally overridden by the -r option) compares as specified
       to the given version string.  The comparison is performed by the glibc
       function strverscmp.  As a special case, if the operator is for  simple
       equality  (==),  or  inequality  (!=),  and the third part contains any
       wildcard characters (* or ? or [), then the expression is treated as  a
       wildcard (mis)match as evaluated by fnmatch.

       If, on the other hand, the first part is the identifier arch to refer
       to the processor architecture, then the second part is one of the two
       string comparison operators == or !=, and the third part is a string
       literal for matching it.  This comparison is a wildcard (mis)match.
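
       The way a kernel_v, kernel_vr or arch condition is decided can be
       approximated in Python as shown below.  This is a sketch under stated
       assumptions: Python's fnmatch module stands in for the C fnmatch
       function, and version_key is a simplified natural-order comparison
       that only approximates strverscmp (it differs, for instance, in how
       leading zeros are treated).  The names condition_holds and
       version_key are hypothetical.

```python
import fnmatch
import operator
import re

OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, ">=": operator.ge}


def version_key(text):
    """Split into alternating text and digit runs; digit runs compare as numbers."""
    parts = re.split(r"(\d+)", text)
    # re.split with a capturing group puts the digit runs at odd indexes.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def condition_holds(actual, op, literal):
    """Decide ACTUAL OP LITERAL the way the preprocessor text describes."""
    if op in ("==", "!=") and any(c in literal for c in "*?["):
        matched = fnmatch.fnmatchcase(actual, literal)   # wildcard (mis)match
        return matched if op == "==" else not matched
    return OPS[op](version_key(actual), version_key(literal))


print(condition_holds("2.6.13-1.322FC3smp", "==", "2.6.13*smp"))
print(condition_holds("2.6.13", "<=", "2.6.5"))
print(condition_holds("ia64", "!=", "i?86"))
```

       Comparing digit runs numerically is what makes 2.6.13 sort after
       2.6.5; a plain string comparison would order them the other way.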

       Otherwise,  the  CONDITION  is  expected to be a comparison between two
       string literals or two numeric literals.  In this case,  the  arguments
       are the only variables usable.

       The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser tokens
       (possibly including nested preprocessor conditionals), and  are  pasted
       into  the input stream if the condition is true or false.  For example,
       the following code induces a  parse  error  unless  the  target  kernel
       version is newer than 2.6.5:
              %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence
       The following code might adapt to hypothetical kernel version drift:
              probe kernel.function (
                %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
                   %( kernel_vr == "2.6.13*smp" %? "do_page_fault" %:
                      UNSUPPORTED %) %)
              ) { /* ... */ }

              %( arch == "ia64" %?
                 probe syscall.vliw = kernel.function("vliw_widget") {}
              %)

   VARIABLES
       Identifiers  for  variables and functions are an alphanumeric sequence,
       and may include "_" and "$" characters.  They  may  not  start  with  a
       plain digit, as in C.  Each variable is by default local to the probe
       or function statement block within which it is mentioned, and therefore
       its scope and lifetime are limited to a particular probe or function
       invocation.

       Scalar variables are implicitly typed as either string or integer.
       Associative arrays also have a string or integer value, and a tuple of
       strings and/or integers serving as a key.  Here are a few basic
       expressions.
              var1 = 5
              var2 = "bar"
              array1 [pid()] = "name"     # single numeric key
              array2 ["foo",4,i++] += 5   # vector of string/num/num keys
              if (["hello",5,4] in array2) println ("yes")  # membership test

       The  translator  performs  type inference on all identifiers, including
       array indexes and function parameters.  Inconsistent  type-related  use
       of identifiers signals an error.

       Variables  may  be declared global, so that they are shared amongst all
       probes and live as long as the entire systemtap session.  There is  one
       namespace  for  all  global  variables, regardless of which script file
       they are found within.  A global declaration  may  be  written  at  the
       outermost level anywhere, not within a block of code.  Global variables
       which are written but never read will  be  displayed  automatically  at
       session  shutdown.   The following declaration marks a few variables as
       global.  The translator will infer for each its value type, and  if  it
       is  used as an array, its key types.  Optionally, scalar globals may be
       initialized with a string or number literal.
              global var1, var2, var3=4

       Arrays are limited in size by the MAXMAPENTRIES  variable  --  see  the
       SAFETY AND SECURITY section for details.  Optionally, global arrays may
       be declared with a maximum size in brackets,  overriding  MAXMAPENTRIES
       for  that array only.  Note that this doesn’t indicate the type of keys
       for the array, just the size.
              global tiny_array[10], normal_array, big_array[50000]

   STATEMENTS
       Statements enable procedural  control  flow.   They  may  occur  within
       functions  and probe handlers.  The total number of statements executed
       in response to any single probe event is limited to some number defined
       by  a  macro  in  the translated C code, and is in the neighbourhood of
       1000.

       EXP    Execute the string- or integer-valued expression and throw  away
              the value.

       { STMT1 STMT2 ... }
              Execute  each  statement  in  sequence in this block.  Note that
              separators or terminators are generally  not  necessary  between
              statements.

       ;      Null  statement,  do  nothing.   It  is  useful  as  an optional
              separator between statements to improve  syntax-error  detection
              and to handle certain grammar ambiguities.

       if (EXP) STMT1 [ else STMT2 ]
              Compare  integer-valued  EXP  to  zero.  Execute the first (non-
              zero) or second STMT (zero).

       while (EXP) STMT
              While integer-valued EXP evaluates to non-zero, execute STMT.

       for (EXP1; EXP2; EXP3) STMT
              Execute EXP1 as initialization.  While EXP2 is non-zero, execute
              STMT, then the iteration expression EXP3.

       foreach (VAR in ARRAY [ limit EXP ]) STMT
              Loop  over  each  element  of  the named global array, assigning
              current key to VAR.  The array may not be  modified  within  the
              statement.   By adding a single + or - operator after the VAR or
              the ARRAY identifier, the iteration will  proceed  in  a  sorted
              order,  by  ascending  or  descending index or value.  Using the
              optional limit keyword limits the number of loop iterations to
              EXP.  EXP is evaluated once at the beginning of the loop.

       foreach ([VAR1, VAR2, ...] in ARRAY [ limit EXP ]) STMT
              Same  as  above,  used when the array is indexed with a tuple of
              keys.  A sorting suffix may be used on at most one VAR or  ARRAY
              identifier.

       break, continue
              Exit  or  iterate  the  innermost  nesting loop (while or for or
              foreach) statement.

       return EXP
              Return EXP value from enclosing  function.   If  the  function’s
              value  is  not  taken  anywhere,  then a return statement is not
              needed, and the function will have a special "unknown" type with
              no return value.

       next   Return now from enclosing probe handler.

       delete ARRAY[INDEX1, INDEX2, ...]
              Remove from ARRAY the element specified by the index tuple.  The
              value will no longer be  available,  and  subsequent  iterations
              will  not  report  the element.  It is not an error to delete an
              element that does not exist.

       delete ARRAY
              Remove all elements from ARRAY.

       delete SCALAR
              Removes the value of SCALAR.  Integers and strings  are  cleared
              to  0  and  ""  respectively,  while statistics are reset to the
              initial empty state.
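
       The sorting suffixes and the limit clause of foreach, described
       above, can be modelled with a Python dictionary whose keys are
       tuples.  This is only an illustration of the visiting order; the
       function foreach_order, its parameters and the sample data are
       hypothetical.

```python
def foreach_order(array, sort_by=None, descending=False, limit=None):
    """Return the keys of `array` in the order a foreach loop would visit them.

    sort_by=None     no sorting suffix (plain dictionary order in this model)
    sort_by="value"  suffix on ARRAY: sort by the stored value
    sort_by=N        suffix on the N-th VAR (0-based): sort by that key part
    """
    keys = list(array)
    if sort_by == "value":
        keys.sort(key=lambda k: array[k], reverse=descending)
    elif sort_by is not None:
        keys.sort(key=lambda k: k[sort_by], reverse=descending)
    return keys if limit is None else keys[:limit]


reads = {("bash", 3): 17, ("sshd", 5): 420, ("cron", 1): 2}

# Models: foreach ([name, fd] in reads- limit 2)
print(foreach_order(reads, sort_by="value", descending=True, limit=2))

# Models: foreach ([name+, fd] in reads)
print(foreach_order(reads, sort_by=0))
```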

   EXPRESSIONS
       Systemtap supports a number of operators that  have  the  same  general
       syntax,  semantics,  and  precedence  as  in  C and awk.  Arithmetic is
       performed as per typical C rules for signed integers.  Division by zero
       or overflow is detected and results in an error.

       binary numeric operators
              * / % + - >> << & ^ | && ||

       binary string operators
              .  (string concatenation)

       numeric assignment operators
              = *= /= %= += -= >>= <<= &= ^= |=

       string assignment operators
              = .=

       unary numeric operators
              + - ! ~ ++ --

       binary numeric or string comparison operators
              < > <= >= == !=

       ternary operator
              cond ? exp1 : exp2

       grouping operator
              ( exp )

       function call
              fn ([ arg1, arg2, ... ])

       array membership check
              exp in array
              [exp1, exp2, ...] in array

   PROBES
       The main construct in the scripting language identifies probes.  Probes
       associate abstract events with a statement block ("probe handler") that
       is  to  be executed when any of those events occur.  The general syntax
       is as follows:
              probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }

       Events are specified in a special syntax called "probe points".   There
       are  several  varieties  of probe points defined by the translator, and
       tapset scripts may define further ones using aliases.  These are listed
       in the stapprobes(5) manual pages.

       The probe handler is interpreted relative to the context of each event.
       For events associated  with  kernel  code,  this  context  may  include
       variables  defined  in  the  source  code  at that spot.  These "target
       variables" are presented to the script as  variables  whose  names  are
       prefixed  with "$".  They may be accessed only if the kernel’s compiler
       preserved them despite optimization.  This is the same constraint  that
       a  debugger  user  faces  when working with optimized code.  Some other
       events have very little context.

       New probe points may be defined using "aliases".  A probe point alias
       definition looks similar to a probe definition, but instead of
       activating a probe at the given point, it just defines a new probe
       point name as an alias to an existing one.  There are two types of
       alias, the prologue style and the epilogue style, which are identified
       by "=" and "+=" respectively.

       For a prologue style alias, the statement block that follows the alias
       definition is implicitly added as a prologue to any probe that refers
       to the alias.  For an epilogue style alias, the statement block is
       instead added as an epilogue to any probe that refers to the alias.
       For example:

              probe syscall.read = kernel.function("sys_read") {
                fildes = $fd
                if (execname() == "init") next  # skip rest of probe
              }
       defines   a   new   probe   point   syscall.read,   which   expands  to
       kernel.function("sys_read"), with the given statement  as  a  prologue,
       which  is  useful to predefine some variables for the alias user and/or
       to skip probe processing entirely based on some conditions.  And
              probe syscall.read += kernel.function("sys_read") {
                if (tracethis) println ($fd)
              }
       defines a new probe point with the  given  statement  as  an  epilogue,
       which  is  useful to take actions based upon variables set or left over
       by the alias user.

       An alias is used just like a built-in probe type.
              probe syscall.read {
                printf("reading fd=%d\n", fildes)
                if (fildes > 10) tracethis = 1
              }
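
       How the two alias styles combine with the handler of a probe that
       uses the alias can be pictured with the following Python sketch.  It
       treats handler bodies as plain lists of statement strings, which is a
       simplification made for the illustration; expand_alias is a
       hypothetical name, not a systemtap function.

```python
def expand_alias(style, alias_body, user_body):
    """Combine alias statements and user statements into one handler body.

    style "="  : prologue alias, the alias statements run first
    style "+=" : epilogue alias, the alias statements run last
    """
    if style == "=":
        return alias_body + user_body
    if style == "+=":
        return user_body + alias_body
    raise ValueError("unknown alias style: " + style)


user = ['printf("reading fd=%d\\n", fildes)', "if (fildes > 10) tracethis = 1"]
prologue = ["fildes = $fd", 'if (execname() == "init") next']
epilogue = ["if (tracethis) println ($fd)"]

for statement in expand_alias("=", prologue, user):
    print(statement)
```

       With the prologue style, fildes is already set when the user's
       statements run; with the epilogue style, the alias statements see
       whatever the user's statements left behind, such as tracethis.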

   FUNCTIONS
       Systemtap scripts may define subroutines to  factor  out  common  work.
       Functions  take any number of scalar (integer or string) arguments, and
       must return a single scalar (integer or string).  An  example  function
       declaration looks like this:
              function thisfn (arg1, arg2) {
                 return arg1 + arg2
              }
       Note  the  general  absence  of  type  declarations,  which are instead
       inferred by the translator.  However, if desired, a function definition
       may  include explicit type declarations for its return value and/or its
       arguments.  This is especially helpful for  embedded-C  functions.   In
       the following example, the type inference engine need only infer the
       type of arg2 (a string).
              function thatfn:string (arg1:long, arg2) {
                 return sprint(arg1) . arg2
              }
       Functions may call others or themselves  recursively,  up  to  a  fixed
       nesting  limit.   This  limit is defined by a macro in the translated C
       code and is in the neighbourhood of 10.

   PRINTING
       There is a set of function names that are specially treated by the
       translator.   They format values for printing to the standard systemtap
       output stream in a more convenient way.  The  sprint*  variants  return
       the formatted string instead of printing it.

       print, sprint
              Print  one  or  more  values  of any type, concatenated directly
              together.

       println, sprintln
              Print values like print and sprint, but also append a newline.

       printd, sprintd
              Take a string delimiter and two or more values of any type,  and
              print  the  values with the delimiter interposed.  The delimiter
              must be a literal string constant.

       printdln, sprintdln
              Print values with a delimiter like printd and sprintd, but  also
              append a newline.

       printf, sprintf
              Take a formatting string and a number of values of corresponding
              types, and print them all.  The format must be a literal  string
              constant.

       The printf formatting directives are similar to those of C, except that
       they are fully type-checked by the translator:

              %b     Writes a binary blob of the value given, instead of ASCII
                     text.  The width specifier determines the number of bytes
                     to write; valid  specifiers  are  %b  %1b  %2b  %4b  %8b.
                     Default (%b) is 8 bytes.

              %c     Character.

              %d,%i  Signed decimal.

              %m     Safely  reads kernel memory at the given address, outputs
                     its content.   The  precision  specifier  determines  the
                     number of bytes to read.  Default is 1 byte.

              %M     Same  as  %m,  but outputs in hexadecimal.  The precision
                     specifier determines the number of hexadecimal digits  to
                     output.  Default is 1 digit.

              %o     Unsigned octal.

              %p     Unsigned pointer address.

              %s     String.

              %u     Unsigned decimal.

              %x     Unsigned hex value, in all lower-case.

              %X     Unsigned hex value, in all upper-case.

              %%     Writes a %.

       Examples:
                   a = "alice", b = "bob", p = 0x1234abcd, i = 123, j = -1, id[a] = 1234, id[b] = 4567
                   print("hello")
                        Prints: hello
                   println(b)
                        Prints: bob\n
                   println(a . " is " . sprint(16))
                        Prints: alice is 16\n
                   foreach (name in id)  printdln("|", strlen(name), name, id[name])
                        Prints: 5|alice|1234\n3|bob|4567\n
                   printf("%c is %s; %x or %X or %p; %d or %u\n",97,a,p,p,p,j,j)
                        Prints: a is alice; 1234abcd or 1234ABCD or 0x1234abcd; -1 or 18446744073709551615\n
                   printf("2 bytes of kernel buffer at address %p: %2m", p, p)
                        Prints: 2 bytes of kernel buffer at address 0x1234abcd: <binary data>
                   printf("%4b", p)
                        Prints (these values as binary data): 0x1234abcd

   STATISTICS
       It is often desirable to collect statistics in a way that avoids the
       penalty of repeatedly taking exclusive locks on the global variables
       into which those numbers are put.  Systemtap provides a solution using
       a special operator to accumulate values, and several pseudo-functions
       to extract the statistical aggregates.

       The  aggregation operator is <<<, and resembles an assignment, or a C++
       output-streaming operation.  The left operand  specifies  a  scalar  or
       array-index  lvalue,  which must be declared global.  The right operand
       is a numeric expression.  The  meaning  is  intuitive:  add  the  given
       number  to the pile of numbers to compute statistics of.  (The specific
       list of statistics to gather is given  separately,  by  the  extraction
       functions.)
                  foo <<< 1
                  stats[pid()] <<< memsize

       The  extraction  functions  are also special.  For each appearance of a
       distinct extraction function  operating  on  a  given  identifier,  the
       translator  arranges  to  compute  a set of statistics that satisfy it.
       The statistics system is thereby "on-demand".   Each  execution  of  an
       extraction  function  causes  the  aggregation  to be computed for that
       moment across all processors.

       Here is the set of extractor functions.  The first argument of each  is
       the  same  style of lvalue used on the left hand side of the accumulate
       operation.  The @count(v), @sum(v), @min(v), @max(v), @avg(v) extractor
       functions   compute  the  number/total/minimum/maximum/average  of  all
       accumulated values.  The resulting values are all simple integers.
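
       The accumulate-then-extract behaviour can be modelled in Python as
       below.  The per-processor record layout is an assumption made for the
       sketch (the text above only says that locking is avoided and that
       extraction aggregates across all processors), and the class and
       method names are hypothetical.  Extraction from an empty aggregate is
       not modelled.

```python
class Aggregate:
    """Toy model of a statistics variable fed with the <<< operator."""

    def __init__(self, ncpus):
        # One private record per processor: [count, sum, min, max].
        self.per_cpu = [[0, 0, None, None] for _ in range(ncpus)]

    def add(self, cpu, value):                 # models: v <<< value
        rec = self.per_cpu[cpu]
        rec[0] += 1
        rec[1] += value
        rec[2] = value if rec[2] is None else min(rec[2], value)
        rec[3] = value if rec[3] is None else max(rec[3], value)

    def count(self):                           # models: @count(v)
        return sum(rec[0] for rec in self.per_cpu)

    def total(self):                           # models: @sum(v)
        return sum(rec[1] for rec in self.per_cpu)

    def minimum(self):                         # models: @min(v)
        return min(rec[2] for rec in self.per_cpu if rec[0])

    def maximum(self):                         # models: @max(v)
        return max(rec[3] for rec in self.per_cpu if rec[0])

    def average(self):                         # models: @avg(v), an integer
        total, n = self.total(), self.count()
        quotient = abs(total) // n             # truncate toward zero, as C does
        return quotient if total >= 0 else -quotient


x = Aggregate(ncpus=2)
for cpu, value in [(0, 10), (1, 25), (0, 4), (1, 7)]:
    x.add(cpu, value)
print(x.count(), x.total(), x.minimum(), x.maximum(), x.average())
```

       Each add touches only the record of the processor it runs on; the
       records are combined only when an extractor is called.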

       Histograms are also available, but are more  complicated  because  they
       have       a       vector      rather      than      scalar      value.
       @hist_linear(v,start,stop,interval) represents a linear histogram  from
       "start"  to  "stop"  by increments of "interval".  The interval must be
       positive.  Similarly,  @hist_log(v)  represents  a  base-2  logarithmic
       histogram.  Printing  a  histogram  with  the print family of functions
       renders a histogram object as a tabular "ASCII art" bar chart.
              probe foo {
                x <<< $value
              }
              probe end {
                printf ("avg %d = sum %d / count %d\n",
                        @avg(x), @sum(x), @count(x))
                print (@hist_log(x))
              }

   EMBEDDED C
       When in guru mode, the translator accepts embedded code in the  script.
       Such  code  is  enclosed  between %{ and %} markers, and is transcribed
       verbatim, without analysis, in some  sequence,  into  the  generated  C
       code.   At  the  outermost  level,  this  may be useful to add #include
       instructions, and any auxiliary definitions for use by  other  embedded
       code.

       The other place where embedded code is permitted is as a function body.
       In this case, the script language body is replaced entirely by a  piece
       of C code enclosed again between %{ and %} markers.  This C code may do
       anything reasonable and safe.  There are a number of  undocumented  but
       complex   safety   constraints   on  atomicity,  concurrency,  resource
       consumption, and run time limits, so this is an advanced technique.

       The memory locations set aside for input and  output  values  are  made
       available to it using a macro THIS.  Here are some examples:
              function add_one (val) %{
                THIS->__retvalue = THIS->val + 1;
              %}
              function add_one_str (val) %{
                strlcpy (THIS->__retvalue, THIS->val, MAXSTRINGLEN);
                strlcat (THIS->__retvalue, "one", MAXSTRINGLEN);
              %}
       The function argument and return value types have to be inferred by the
       translator from the call sites in order for this  to  work.   The  user
       should  examine C code generated for ordinary script-language functions
       in order to write compatible embedded-C ones.

   BUILT-INS
       A set of builtin functions and probe point aliases are provided by  the
       scripts  installed  under  the  /usr/share/systemtap/tapset  directory.
       These are described in the stapfuncs(5) and stapprobes(5) manual pages.

PROCESSING

       The translator begins pass 1 by parsing the given input script, and all
       scripts  (files  named  *.stp)  found  in  a  tapset  directory.    The
       directories listed with -I are processed in sequence, each processed in
       "guru mode".  For each directory, a number of subdirectories  are  also
       searched.   These  subdirectories  are derived from the selected kernel
       version (the -r option), in order to allow more kernel-version-specific
       scripts  to  override  less  specific  ones.  For example, for a kernel
       version 2.6.12-23.FC3 the following  patterns  would  be  searched,  in
       sequence: 2.6.12-23.FC3/*.stp, 2.6.12/*.stp, 2.6/*.stp, and finally
       *.stp.  Stopping the translator after pass 1 causes it to print the
       parse trees.
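
       The search order within one tapset directory can be reproduced with
       the Python sketch below.  The derivation rule (drop the release
       suffix after the first "-", then drop the last version component) is
       inferred from the single example above and is an assumption, not a
       statement of what the translator does for every release string; the
       function name tapset_patterns is hypothetical.

```python
def tapset_patterns(kernel_release):
    """List the glob patterns searched in a tapset directory, most specific first."""
    patterns = [kernel_release + "/*.stp"]
    base = kernel_release.split("-", 1)[0]        # drop the release suffix
    if base != kernel_release:
        patterns.append(base + "/*.stp")
    if base.count(".") >= 2:
        major_minor = base.rsplit(".", 1)[0]      # drop the last version component
        patterns.append(major_minor + "/*.stp")
    patterns.append("*.stp")                      # version-independent scripts
    return patterns


print(tapset_patterns("2.6.12-23.FC3"))
```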

       In  pass 2, the translator analyzes the input script to resolve symbols
       and types.  References to variables, functions, and probe aliases  that
       are unresolved internally are satisfied by searching through the parsed
       tapset scripts.  If any tapset script is selected because it defines an
       unresolved  symbol,  then  the  entirety of that script is added to the
       translator’s resolution queue.  This process iterates until all symbols
       are resolved and a subset of tapset scripts is selected.
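
       The iterative selection of tapset scripts amounts to a worklist
       algorithm, sketched here in Python.  The data layout (each script
       mapped to the symbols it defines and the symbols it references), the
       function name select_tapsets and the script names are all
       hypothetical; the sketch also reports symbols that no script
       defines, which the real translator would treat as an error.

```python
from collections import deque


def select_tapsets(user_refs, tapsets):
    """tapsets maps a script name to (symbols it defines, symbols it references)."""
    selected = []
    resolved = set()
    unresolved = set()
    queue = deque(sorted(user_refs))
    while queue:
        symbol = queue.popleft()
        if symbol in resolved or symbol in unresolved:
            continue
        for name, (defines, references) in tapsets.items():
            if symbol in defines:
                if name not in selected:
                    selected.append(name)
                    # The whole script is pulled in, so everything it
                    # references has to be resolved as well.
                    queue.extend(sorted(references))
                resolved.add(symbol)
                break
        else:
            unresolved.add(symbol)    # no tapset script defines this symbol
    return selected, unresolved


tapsets = {
    "syscalls.stp": ({"syscall.read"}, {"execname"}),
    "context.stp": ({"execname", "pid"}, set()),
    "unused.stp": ({"never_called"}, set()),
}
print(select_tapsets({"syscall.read"}, tapsets))
```

       In this example the user script refers only to syscall.read, yet
       context.stp is selected too, because syscalls.stp refers to execname.
       The script unused.stp is never selected.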

       Next,  all  probe  point  descriptions  are  validated against the wide
       variety supported by the translator.  Probe points that refer  to  code
       locations  ("synchronous  probe points") require the appropriate kernel
       debugging  information  to  be  installed.   In  the  associated  probe
       handlers,  target-side variables (whose names begin with "$") are found
       and have their run-time locations decoded.

       Next,  all  probes  and  functions  are   analyzed   for   optimization
       opportunities, in order to remove variables, expressions, and functions
       that have no useful value and no side-effect.  Embedded-C functions are
       assumed  to  have  side-effects  unless  they  include the magic string
       /* pure */.  Since this optimization can hide latent code  errors  such
       as  type  mismatches  or invalid $target variables, it sometimes may be
       useful to disable the optimizations with the -u option.

       Finally, all variable, function, parameter, array, and index types  are
       inferred   from   context   (literals  and  operators).   Stopping  the
       translator after pass 2 causes it to list all  the  probes,  functions,
       and  variables,  along  with  all  inferred types.  Any inconsistent or
       unresolved types cause an error.

       In pass 3, the translator writes C code that represents the actions  of
       all  selected script files, and creates a Makefile to build that into a
       kernel object.  These files are  placed  into  a  temporary  directory.
       Stopping  the  translator at this point causes it to print the contents
       of the C file.

       In pass 4, the translator invokes the  Linux  kernel  build  system  to
       create  the  actual  kernel object file.  This involves running make in
       the temporary directory, and requires  a  kernel  module  build  system
       (headers,  config  and  Makefiles)  to  be  installed in the usual spot
       /lib/modules/VERSION/build.  Stopping the translator after  pass  4  is
       the  last  chance before running the kernel object.  This may be useful
       if you want to archive the file.

       In pass 5, the translator invokes the systemtap auxiliary program
       staprun for the given kernel object.  This program arranges to load
       the module, then communicates with it, copying trace data from the
       kernel into temporary files, until the user sends an interrupt signal.
       Any run-time error encountered by the probe handlers, such as running
       out of memory, division by zero, or exceeding nesting or runtime
       limits, results in a soft error indication.  Soft errors in excess of
       MAXERRORS block all subsequent probes, and terminate the session.
       Finally, staprun unloads the module, and cleans up.

EXAMPLES

       See the stapex(5) manual page for a collection of samples.

CACHING

       The systemtap translator caches the pass  3  output  (the  generated  C
       code)  and  the  pass  4  output (the compiled kernel module) if pass 4
       completes successfully.  This cached  output  is  reused  if  the  same
       script  is  translated  again  assuming the same conditions exist (same
       kernel version, same systemtap version, etc.).  Cached files are stored
       in the $SYSTEMTAP_DIR/cache directory.  The cache size can be limited
       by placing a file named cache_mb_limit in the cache directory (shown
       above), containing only an ASCII integer giving the number of MiB the
       cache should not exceed.  Note that this is a ’soft’ limit, in that the
       cache is cleaned after a new entry is added, so the total cache size
       may temporarily exceed this limit.  In the absence of this file, a
       default one will be created with the limit set to 64 MiB.
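
       As a minimal sketch, the limit file could be written with a small
       shell helper.  The function name set_stap_cache_limit is hypothetical;
       the sketch assumes SYSTEMTAP_DIR falls back to ~/.systemtap when
       unset, as described under FILES.

```shell
#!/bin/sh
# Hypothetical helper: set the soft cache limit, in MiB.
# The cache directory is $SYSTEMTAP_DIR/cache; SYSTEMTAP_DIR is assumed
# to default to ~/.systemtap when it is not set.
set_stap_cache_limit() {
    limit_mb=$1
    cache_dir=${SYSTEMTAP_DIR:-$HOME/.systemtap}/cache
    # Reject anything other than a plain non-negative integer.
    case $limit_mb in
        ''|*[!0-9]*) echo "limit must be an integer" >&2; return 1 ;;
    esac
    mkdir -p "$cache_dir" || return 1
    # Write the integer and nothing else (apart from the final newline).
    printf '%s\n' "$limit_mb" > "$cache_dir/cache_mb_limit"
}
```

       For example, "set_stap_cache_limit 128" would ask for a 128 MiB soft
       limit.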

SAFETY AND SECURITY

       Systemtap  is  an administrative tool.  It exposes kernel internal data
       structures and  potentially  private  user  information.   It  acquires
       either root privileges

       To actually run the kernel objects it builds, a user must be one of the
       following:

       ·   the root user;

       ·   a member of the stapdev group; or

       ·   a member of the stapusr group.  Members of the  stapusr  group  can
           only  use  modules  located  in  the /lib/modules/VERSION/systemtap
           directory.  This directory must be owned by root and not  be  world
           writable.
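
       The list above can be expressed as a small shell check.  This is a
       sketch only: the function name stap_run_access is hypothetical, and it
       inspects nothing but the user id and group names passed to it.

```shell
#!/bin/sh
# Hypothetical helper: classify a user according to the list above.
# $1 is a numeric user id, $2 a space-separated list of group names
# (for the current user: "$(id -u)" and "$(id -Gn)").
stap_run_access() {
    uid=$1
    groups=" $2 "
    if [ "$uid" -eq 0 ]; then
        echo "root"
        return 0
    fi
    case $groups in
        *" stapdev "*) echo "stapdev"; return 0 ;;
        # stapusr members are limited to /lib/modules/VERSION/systemtap
        *" stapusr "*) echo "stapusr"; return 0 ;;
    esac
    echo "none"
    return 1
}
```

       A call such as stap_run_access "$(id -u)" "$(id -Gn)" reports which of
       the three cases, if any, applies to the invoking user.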

       The kernel modules generated by the stap program are run by the staprun
       program.  The latter is a part of the Systemtap package,  dedicated  to
       module  loading and unloading (but only in the white zone), and kernel-
       to-user data transfer.  Since staprun does not perform  any  additional
       security  checks  on the kernel objects it is given, it would be unwise
       for a system administrator to add untrusted users  to  the  stapdev  or
       stapusr groups.

       The translator asserts certain safety constraints.  It aims to ensure
       that no handler routine can run for very long, allocate memory, perform
       unsafe operations, or unintentionally interfere with the kernel.  Use
       of script global variables is suitably locked to protect against
       manipulation by concurrent probe handlers.  Use of guru mode constructs
       such as embedded C can violate these constraints, leading to a kernel
       crash or data corruption.

       The  resource  use  limits  are  set by macros in the generated C code.
       These may be overridden with the -D flag.  A selection of these  is  as
       follows:

       MAXNESTING
              Maximum number of recursive function call levels, default 10.

       MAXSTRINGLEN
              Maximum length of strings, default 128.

       MAXTRYLOCK
              Maximum  number  of  iterations  to  wait  for  locks  on global
              variables before declaring possible deadlock  and  skipping  the
              probe, default 1000.

       MAXACTION
              Maximum  number of statements to execute during any single probe
              hit (with interrupts disabled), default 1000.

       MAXACTION_INTERRUPTIBLE
              Maximum number of statements to execute during any single  probe
              hit which is executed with interrupts enabled (such as begin/end
              probes), default (MAXACTION * 10).

       MAXMAPENTRIES
              Maximum number of rows in any single global array, default 2048.

       MAXERRORS
              Maximum  number  of  soft  errors  before  an exit is triggered,
              default 0, which means  that  the  first  error  will  exit  the
              script.

       MAXSKIPPED
              Maximum number of skipped probes before an exit is triggered,
              default 100.  Running systemtap with -t (timing) mode gives
              more details about skipped probes.

       MINSTACKSPACE
              Minimum number of free kernel stack bytes required in order to
              run a probe handler, default 1024.  This number should be large
              enough for the probe handler’s own needs, plus a safety margin.

       MAXUPROBES
              Maximum   number   of   concurrently   armed  user-space  probes
              (uprobes), default 100 times  the  number  of  user-space  probe
              points  named  in  the  script.   This  pool  is  large  because
              individual uprobe objects are allocated  for  each  process  for
              each script-level probe.

       Multiple scripts can write data into a relay buffer concurrently.  A
       host script provides an interface for accessing its relay buffer to
       guest scripts.  Then, the output of the guests is merged into the
       output of the host.  To run a script as a host, execute stap with the
       -DRELAYHOST[=name] option.  The name identifies your host script among
       several hosts.  While the host is running, execute stap with
       -DRELAYGUEST[=name] to add a guest script to the host.  Note that you
       must unload guests before unloading a host.  If any guests are still
       connected to the host, unloading the host will fail.

       In  case  something  goes  wrong with stap or staprun after a probe has
       already started running, one may safely kill both user  processes,  and
       remove  the  active  probe kernel module with rmmod.  Any pending trace
       messages may be lost.

       In addition to the methods outlined above, the generated kernel  module
       also  uses  overload  processing to make sure that probes can’t run for
       too  long.   If  more  than  STP_OVERLOAD_THRESHOLD   cycles   (default
       500000000) have been spent in all the probes on a single cpu during the
       last STP_OVERLOAD_INTERVAL cycles (default 1000000000), the probes have
       overloaded the system and an exit is triggered.
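
       With the default values, the threshold is half of the interval.  The
       arithmetic can be sketched in shell as follows.  This illustrates the
       comparison only and is not the runtime's own code; the function name
       probes_overloaded is hypothetical.

```shell
#!/bin/sh
# Default values quoted above, in CPU cycles.
STP_OVERLOAD_THRESHOLD=500000000
STP_OVERLOAD_INTERVAL=1000000000

# Hypothetical helper: succeed if the cycles spent in probes on one CPU
# during the last interval exceed the threshold.
probes_overloaded() {
    cycles_in_probes=$1
    [ "$cycles_in_probes" -gt "$STP_OVERLOAD_THRESHOLD" ]
}

# Share of each interval that probes may use before an exit is triggered.
echo "$((STP_OVERLOAD_THRESHOLD / (STP_OVERLOAD_INTERVAL / 100)))%"
```

       With the defaults this prints 50%, so probes may use up to half of the
       cycles on a cpu in each interval.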

       By  default,  overload processing is turned on for all modules.  If you
       would like to disable overload processing, define STP_NO_OVERLOAD.

MAKING DO WITH SYMBOL TABLES

       Systemtap performs best when it has access to the debugging information
       associated  with your kernel and modules.  However, if this information
       is not available, systemtap  can  still  support  probing  of  function
       entries  and returns using symbols read from vmlinux and/or the modules
       in /lib/modules.  Systemtap can also read the kernel symbol table  from
       a  text  file  such  as  /boot/System.map  or  /proc/kallsyms.  See the
       --kelf and --kmap options.

       If systemtap finds relevant debugging information, it will use it  even
       if you specify --kelf or --kmap.

       Without  debugging  information, systemtap cannot support the following
       types of language constructs:

       ·   probe specifications that refer to source files or line numbers

       ·   probe specifications that refer to inline functions

       ·   statements that refer to $target variables

       ·   tapset-defined variables defined using any of the above constructs.
           In  particular,  at  this  writing, the prologue blocks for certain
           aliases in the syscall tapset  (e.g.,  syscall.open)  contain  "if"
           statements  that refer to $target variables.  If your script refers
           to any such aliases, systemtap must have  access  to  the  kernel’s
           debugging information.

       Most  T  and t symbols correspond to function entry points, but some do
       not.  Based only  on  the  symbol  table,  systemtap  cannot  tell  the
       difference.   Placing return probes on symbols that aren’t entry points
       will most likely lead to kernel stack corruption.
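
       To see which symbols fall into this category, the T and t entries of a
       symbol table can be listed with awk.  This is a sketch under stated
       assumptions: the function name text_symbols is hypothetical, and the
       input is assumed to use the "address type name" column layout of
       /boot/System.map.

```shell
#!/bin/sh
# Hypothetical helper: list the names of text (T and t) symbols from a
# symbol table with "address type name" columns.  Any further columns
# on a line are ignored.
text_symbols() {
    awk '$2 == "T" || $2 == "t" { print $3 }' "$1"
}
```

       For example, "text_symbols /boot/System.map" prints one symbol name
       per line.  Appearing in this list does not by itself show that a
       symbol is a function entry point.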

FILES

       ~/.systemtap
              Systemtap data directory  for  cached  systemtap  files,  unless
              overridden by the SYSTEMTAP_DIR environment variable.

       /tmp/stapXXXXXX
              Temporary  directory for systemtap files, including translated C
              code and kernel object.

       /usr/share/systemtap/tapset
              The automatic tapset search directory, unless overridden by  the
              SYSTEMTAP_TAPSET environment variable.

       /usr/share/systemtap/runtime
              The  runtime sources, unless overridden by the SYSTEMTAP_RUNTIME
              environment variable.

       /lib/modules/VERSION/build
              The location of kernel module building infrastructure.

       /usr/lib/debug/lib/modules/VERSION
              The location of kernel debugging information when packaged  into
              the    kernel-debuginfo    RPM,   unless   overridden   by   the
              SYSTEMTAP_DEBUGINFO_PATH  environment  variable.   The   default
              value   for   this  variable  is  +:.debug:/usr/lib/debug:build.
              Elfutils searches for vmlinux in this path, and interprets the
              path as a base directory whose various subdirectories are
              searched for modules.

       /usr/bin/staprun
              The auxiliary program supervising module  loading,  interaction,
              and unloading.

SEE ALSO

       stapprobes(5), stapfuncs(5), stapvars(5), stapex(5), awk(1), gdb(1)

BUGS

       Use the Bugzilla link on the project web page, or our mailing list:
       http://sources.redhat.com/systemtap/, <systemtap@sources.redhat.com>.