Provided by: systemtap-doc_0.0.20090214-1ubuntu1_all


       stapprobes - systemtap probe points


       The  following sections enumerate the variety of probe points supported
       by the systemtap translator, and additional aliases defined by standard
       tapset scripts.

       The  general  probe  point  syntax  is  a dotted-symbol sequence.  This
       allows a breakdown of the event namespace into parts, somewhat like the
       Domain Name System does on the Internet.  Each component identifier may
       be parametrized by a string or number literal, with  a  syntax  like  a
       function call.  A component may include a "*" character, to expand to a
       set of matching probe points.  Probe aliases likewise expand  to  other
       probe  points.   Each  and  every  resulting  probe  point  is normally
       resolved to some low-level system  instrumentation  facility  (e.g.,  a
       kprobe  address,  marker,  or  a  timer  configuration),  otherwise the
       elaboration phase will fail.

       However, a probe point may be followed by a "?" character, to  indicate
       that  it  is  optional,  and that no error should result if it fails to
       resolve.  Optionalness passes down through all levels of alias/wildcard
       expansion.   Alternately,  a  probe  point  may  be  followed  by a "!"
       character, to indicate that it is both optional and sufficient.  (Think
       vaguely  of  the  Prolog  cut  operator.)  If  it does resolve, then no
       further probe points in the same comma-separated list will be resolved.
       Therefore,  the  "!"   sufficiency  mark  only makes sense in a list of
       probe point alternatives.

       Additionally, a probe point may be followed by an "if (expr)" statement,
       in order to enable/disable the probe point on-the-fly.  With the "if"
       statement, if the "expr" is false when the probe point is hit, the
       whole probe body, including any alias's body, is skipped.  The condition
       is stacked up through all levels of alias/wildcard expansion, so the
       final condition becomes the logical-and of the conditions of all
       expanded aliases and wildcards.

       These are all syntactically valid probe points:

              kernel.function("no_such_function") ?
              module("awol").function("no_such_function") !
              signal.*? if (switch)
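
       As a minimal sketch of how these suffixes combine in a complete
       script, the following assumes a kernel with debuginfo installed.  The
       function names do_sys_open and sys_open and the global "enabled" are
       illustrative assumptions, not names taken from this page.

```systemtap
# Sketch only: the kernel function names below are assumptions.
global enabled = 1

# Optional: silently dropped if the function cannot be resolved.
probe kernel.function("no_such_function") ?
{
    printf("unexpectedly resolved\n")
}

# Alternatives: if the first point resolves, the second is not resolved.
probe kernel.function("do_sys_open") !,
      kernel.function("sys_open")
{
    printf("open path entered by %s\n", execname())
}

# Conditional: the handler body is skipped while "enabled" is zero.
probe timer.s(1) if (enabled)
{
    printf("tick\n")
}

probe timer.s(3)  { enabled = 0 }   # turn the conditional probe off
probe timer.s(10) { exit() }
```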

       Probes may be broadly classified into "synchronous" and "asynchronous".
       A "synchronous" event is deemed to occur when any processor executes an
       instruction matched by the specification.  This gives  these  probes  a
       reference  point  (instruction address) from which more contextual data
       may  be  available.   Other  families  of   probe   points   refer   to
       "asynchronous" events such as timers/counters rolling over, where there
       is no  fixed  reference  point  that  is  related.   Each  probe  point
       specification   may   match  multiple  locations  (for  example,  using
       wildcards  or  aliases),  and  all of them are then probed.   A  probe
       declaration  may  also  contain several comma-separated specifications,
       all of which are probed.

       The probe points begin and end are defined by the translator  to  refer
       to  the  time  of  session  startup  and  shutdown.   All "begin" probe
       handlers are run, in some sequence, during the startup of the  session.
       All  global  variables  will have been initialized prior to this point.
       All "end" probes are run, in some sequence, during the normal  shutdown
       of  a session, such as in the aftermath of an exit() function call, or
       an interruption from the user.   In  the  case  of  an  error-triggered
       shutdown,  "end"  probes  are  not  run.  There are no target variables
       available in either context.

       If the order of execution among "begin" or "end" probes is significant,
       then an optional sequence number may be provided:

              begin(N)
              end(N)

       The  number  N may be positive or negative.  The probe handlers are run
       in increasing order, and the  order  between  handlers  with  the  same
       sequence  number  is  unspecified.   When  "begin"  or  "end" are given
       without a sequence, they are effectively sequence zero.
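
       The ordering rule can be illustrated with a short script.  This is a
       sketch; the messages are arbitrary, and the comments only restate the
       ordering described above.

```systemtap
# begin handlers run in increasing sequence order: -1, then 0, then 1.
probe begin(-1) { printf("begin(-1): runs first\n") }
probe begin     { printf("begin: sequence zero\n") }
probe begin(1)  { printf("begin(1): last begin handler\n"); exit() }

# end handlers follow the same rule after exit(): 0, then 10.
probe end       { printf("end: sequence zero\n") }
probe end(10)   { printf("end(10): runs after the plain end probe\n") }
```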

       The error probe point is similar to the end probe, except that each
       such probe handler is run when the session ends after errors have
       occurred.  In such cases, "end" probes are skipped, but each "error"
       probe is still attempted.  This kind of probe can be used to clean up
       or emit a "final gasp".  It may also be numerically parametrized to set
       a sequence.
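
       A small sketch of the difference between "end" and "error" probes
       follows.  It assumes the standard tapset error() function is available
       to force an error-triggered shutdown; the message text is arbitrary.

```systemtap
probe begin
{
    # Assumption: error() from the standard tapset raises a run-time error.
    error("simulated failure")
}

# Skipped, because the session ends after an error.
probe end   { printf("normal shutdown\n") }

# Still attempted, so it can clean up or report.
probe error { printf("final gasp: session ended after an error\n") }
```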

       The  probe  point  never is specially defined by the translator to mean
       "never".  Its probe handler is never run,  though  its  statements  are
       analyzed  for symbol / type correctness as usual.  This probe point may
       be useful in conjunction with optional probes.

       The syscall.*  aliases define  several  hundred  probes,  too  many  to
       summarize here.  They are:

              syscall.NAME
              syscall.NAME.return

       Generally, two probes are defined for each normal system call as listed
       in the syscalls(2) manual page, one  for  entry  and  one  for  return.
       Those  system  calls  that  never  return  do  not have a corresponding
       .return probe.

       Each probe alias defines a variety of variables.  Looking at the tapset
       source code is the most reliable way to find them.  Generally, each
       variable listed in the standard manual page is made available as a
       script-level variable, so syscall.open exposes filename, flags, and
       mode.  In addition, a standard suite of variables is available at most
       aliases:

       argstr A pretty-printed form of the entire argument list, without
              parentheses.

       name   The name of the system call.

       retstr For return probes, a pretty-printed form of the system-call
              result.
       Not all probe aliases obey all of  these  general  guidelines.   Please
       report any bothersome ones you encounter as a bug.
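
       As a hedged illustration of the standard variables, the sketch below
       traces the open system call for five seconds.  It assumes the
       installed tapset defines syscall.open for the running kernel; the
       five-second limit is an arbitrary choice.

```systemtap
# Entry probe: name and argstr are the standard variables.
probe syscall.open
{
    printf("%s(%s) by %s\n", name, argstr, execname())
}

# Return probe: retstr is the pretty-printed result.
probe syscall.open.return
{
    printf("%s returned %s\n", name, retstr)
}

probe timer.s(5) { exit() }
```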

       Intervals defined by the standard kernel "jiffies" timer may be used to
       trigger probe handlers asynchronously.  Two probe  point  variants  are
       supported by the translator:

              timer.jiffies(N)
              timer.jiffies(N).randomize(M)

       The  probe  handler  is  run  every N jiffies (a kernel-defined unit of
       time, typically between 1 and 60 ms).  If the "randomize" component  is
       given,  a  uniformly distributed random value in the range [-M..+M] is
       added to N every time the  handler  is  run.   N  is  restricted  to  a
       reasonable  range  (1  to  around a million), and M is restricted to be
       smaller than N.  There are  no  target  variables  provided  in  either
       context.   It  is  possible for such probes to be run concurrently on a
       multi-processor computer.

       Alternatively, intervals may be specified in units of time.  There  are
       two probe point variants similar to the jiffies timer:

              timer.ms(N)
              timer.ms(N).randomize(M)

       Here,  N  and M are specified in milliseconds, but the full options for
       units  are  seconds  (s/sec),  milliseconds   (ms/msec),   microseconds
       (us/usec), nanoseconds (ns/nsec), and hertz (hz).  Randomization is not
       supported for hertz timers.

       The actual resolution of the timers depends on the target kernel.   For
       kernels  prior  to 2.6.17, timers are limited to jiffies resolution, so
       intervals are rounded  up  to  the  nearest  jiffies  interval.   After
       2.6.17,  the implementation uses hrtimers for tighter precision, though
       the actual resolution will be arch-dependent.  In either case,  if  the
       "randomize"  component is given, then the random value will be added to
       the interval before any rounding occurs.
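
       The timer variants can be combined in one script.  The following is a
       sketch only; the intervals and the counter names are arbitrary
       assumptions chosen for illustration.

```systemtap
global jiffy_ticks, ms_ticks, hz_ticks

# Every 100 jiffies, plus a random offset in [-20..+20] each time.
probe timer.jiffies(100).randomize(20) { jiffy_ticks++ }

# Every 250 milliseconds.
probe timer.ms(250) { ms_ticks++ }

# Ten times per second; randomize is not supported for hertz timers.
probe timer.hz(10) { hz_ticks++ }

# Report and stop after five seconds.
probe timer.s(5)
{
    printf("jiffies=%d ms=%d hz=%d\n", jiffy_ticks, ms_ticks, hz_ticks)
    exit()
}
```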

       Profiling timers are also available to provide probes that  execute  on
       all  CPUs at the rate of the system tick (CONFIG_HZ).  This probe takes
       no parameters.

              timer.profile

       Full context information  of  the  interrupted  process  is  available,
       making this probe suitable for a time-based sampling profiler.
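
       A time-based sampling profiler along these lines might look like the
       sketch below.  The ten-second duration, the array name, and the
       top-ten cutoff are assumptions made for the example.

```systemtap
global samples

# Runs on every CPU at each system tick; count by executable name.
probe timer.profile { samples[execname()]++ }

probe timer.s(10) { exit() }

probe end
{
    # Print the ten most frequently sampled names, busiest first.
    foreach (name in samples- limit 10)
        printf("%-20s %d\n", name, samples[name])
}
```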

       This family of probe points uses symbolic debugging information for the
       target  kernel/module/program,  as   may   be   found   in   unstripped
       executables,  or the separate debuginfo packages.  They allow placement
       of probes logically into the execution path of the target  program,  by
       specifying  a  set  of  points  in  the  source or object code.  When a
       matching statement executes on any processor, the probe handler is  run
       in that context.

       Points in a kernel are identified by module, source file, line number,
       function name, or some combination of these.

       Here is a list  of  probe  point  families  currently  supported.   The
       .function  variant  places  a  probe  near  the  beginning of the named
       function, so that parameters are available as context  variables.   The
       .return  variant places a probe at the moment after the return from the
       named function, so the return  value  is  available  as  the  "$return"
       context  variable.   The  .inline  modifier  for  .function filters the
       results to include only instances  of  inlined  functions.   The  .call
       modifier  selects the opposite subset.  Inline functions do not have an
       identifiable return point, so  .return  is  not  supported  on  .inline
       probes.  The  .statement  variant  places  a  probe  at the exact spot,
       exposing those local variables that are visible there.


       In the above list, MPATTERN stands for a string literal  that  aims  to
       identify the loaded kernel module of interest and LPATTERN stands for a
       source program label.  Both MPATTERN and LPATTERN may include the "*",
       "[]", and "?" wildcards.  PATTERN stands for a string literal that aims
       to identify a point in the program.  It is made up of three parts:

       ·   The first part is the name of a function, as would appear in the nm
           program’s  output.   This  part may use the "*" and "?" wildcarding
           operators to match multiple names.

       ·   The second part is optional and begins with the "@" character.   It
           is followed by the path to the source file containing the function,
           which may include a wildcard pattern, such as mm/slab*.  If it does
           not  match  as  is, an implicit "*/" is optionally added before the
           pattern, so that a script need only name the last few components of
           a possibly long source directory path.

       ·   Finally,  the  third  part  is  optional  if the file name part was
           given, and identifies the line number in the source  file  preceded
           by  a  ":"  or a "+".  The line number is assumed to be an absolute
           line number if preceded by a ":", or relative to the entry  of  the
           function  if  preceded by a "+".  All the lines in the function can
           be matched with ":*".  A range of lines x through y can be  matched
           with ":x-y".

       As  an  alternative,  PATTERN  may be a numeric constant, indicating an
       address.  Such an address may  be  found  from  symbol  tables  of  the
       appropriate  kernel / module object file.  It is verified against known
       statement code boundaries, and will be relocated for use at run time.

       In guru mode only, absolute kernel-space  addresses  may  be  specified
       with  the  ".absolute"  suffix.   Such an address is considered already
       relocated, as if it came from /proc/kallsyms, so it cannot  be  checked
       against statement/instruction boundaries.
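
       The PATTERN forms described above might be used as in the following
       sketch.  The function, file, and module names (vfs_read,
       fs/read_write.c, ext3) are assumptions about the target kernel and
       need matching debuginfo; the "?" suffix keeps the script usable where
       a point does not resolve.

```systemtap
# Function name only.
probe kernel.function("vfs_read")
{
    printf("vfs_read entered by %s\n", execname())
}

# Function name plus the trailing components of its source path.
probe kernel.function("vfs_read@fs/read_write.c").return ?
{
    printf("vfs_read returned\n")
}

# Wildcards: every function of one source file in a module.
probe module("ext3").function("*@fs/ext3/inode.c") ?
{
    printf("ext3 inode code entered\n")
}

# A statement one line past the entry of the function.
probe kernel.statement("vfs_read@fs/read_write.c+1") ?
{
    printf("statement probe hit\n")
}

probe timer.s(2) { exit() }
```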

       Some of the source-level context variables, such as function
       parameters, locals, and globals visible in the compilation unit, may be
       visible to probe handlers.  Handlers may refer to these variables by
       prefixing their name with "$" within the scripts.  In addition, a
       special syntax allows limited traversal of structures, pointers, and
       arrays.

       $var   refers to an in-scope variable "var".  If it's an integer-like
              type, it will be cast to a 64-bit int for systemtap script use.
              String-like pointers (char *) may be copied to systemtap string
              values using the kernel_string or user_string functions.

       $var->field
              traversal to a structure's field.  The indirection operator may
              be repeated to follow more levels of pointers.

       $return
              is available in return probes only for functions that are
              declared with a return value.

       $var[N]
              indexes into an array.  The index is given with a literal
              number.

       $$vars expands to a character string that is equivalent to
              sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
              parm1, ..., parmN, var1, ..., varN)

       $$locals
              expands to a subset of $$vars for only local variables.

       $$parms
              expands to a subset of $$vars for only function parameters.

       $$return
              is available in return probes only.  It expands to a
              string that is equivalent to sprintf("return=%x",
              $return) if the probed function has a return value, or
              else an empty string.

       For ".return" probes, context variables other than the "$return"
       value   itself   are   only  available  for  the  function  call
       parameters.  The expressions evaluate to the  entry-time  values
       of  those  variables,  since  that  is when a snapshot is taken.
       Other local variables are not generally accessible, since by the
       time  a  ".return"  probe  hits,  the  probed function will have
       already returned.
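
       The sketch below shows context variables at function entry and
       return.  It assumes the target kernel has a vfs_read function with
       parameters named file and count, and that struct file has an f_flags
       field; adjust the names to the kernel being probed.

```systemtap
probe kernel.function("vfs_read")
{
    # Plain parameter and one level of pointer traversal.
    printf("count=%d flags=0x%x\n", $count, $file->f_flags)
    # All parameters, pretty-printed as one string.
    printf("parms: %s\n", $$parms)
}

probe kernel.function("vfs_read").return
{
    # $count here is the entry-time snapshot of the parameter.
    printf("return=%d requested=%d\n", $return, $count)
}

probe timer.s(2) { exit() }
```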

       Early prototype support for user-space probing is available in
       the form of a non-symbolic probe point:

              process(PID).statement(ADDRESS).absolute

       This is analogous to kernel.statement(ADDRESS).absolute in that both
       use raw (unverified) virtual addresses and provide no
       $variables.  The target PID parameter must identify a running
       process, and ADDRESS should identify a valid instruction
       address.  All threads of that process will be probed.

       Additional user-space probing is available through the following
       process probe variants:

       .begin is called when a new process described by PID or PATH is
              created.

       .thread.begin
              is called when a new thread described by PID or PATH is
              created.

       .end   is called when a process described by PID or PATH dies.

       .thread.end
              is called when a thread described by PID or PATH dies.

       .syscall
              is called when a thread described by PID or PATH makes a
              system call.  The system call number is available in the
              $syscall context variable, and the first 6 arguments of
              the system call are available in the $argN (e.g. $arg1,
              $arg2, ...) context variables.

       .syscall.return
              is called when a thread described by PID or PATH returns
              from a system call.  The system call number is available
              in the $syscall context variable, and the return value of
              the system call is available in the $return context
              variable.

       .itrace
              is called for every single step of the process described
              by PID or PATH.

       .mark  is called via a static probe which is defined in the
              application by STAP_PROBE1(handle,LABEL,arg1), which is
              defined in sdt.h.  The handle is an application handle,
              LABEL corresponds to the .mark argument, and arg1 is the
              argument.  STAP_PROBE1 is used for probes with 1
              argument, STAP_PROBE2 is used for probes with 2
              arguments, and so on.  The arguments of the probe are
              available in the context variables $arg1, $arg2, ...  An
              alternative to using the STAP_PROBE macros is to use the
              dtrace script to create custom macros.
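
       A sketch of process probes in use follows.  The path /bin/ls is an
       assumption chosen for illustration, and the script presumes a kernel
       with the user-space probing support that systemtap requires.  It runs
       until interrupted by the user.

```systemtap
probe process("/bin/ls").begin
{
    printf("%s (pid %d) started\n", execname(), pid())
}

probe process("/bin/ls").syscall
{
    printf("syscall %d, arg1=0x%x\n", $syscall, $arg1)
}

probe process("/bin/ls").syscall.return
{
    printf("syscall %d returned %d\n", $syscall, $return)
}

probe process("/bin/ls").end
{
    printf("pid %d exited\n", pid())
}
```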

       Note that PATH names refer to executables that are searched for the
       same way shells do: relative to the working directory if they
       contain a "/" character, otherwise in $PATH.  If a process probe
       is specified without a PID or PATH, all user threads are probed.

       These    probe    points     allow     procfs     "files"     in
       /proc/systemtap/MODNAME to be created, read and written (MODNAME
       is the name of the systemtap module). The proc filesystem  is  a
       pseudo-filesystem  which  is used as an interface to kernel data
       structures.  There are four probe point  variants  supported  by
       the translator:

              procfs("PATH").read
              procfs("PATH").write
              procfs.read
              procfs.write

       PATH  is  the file name (relative to /proc/systemtap/MODNAME) to
       be created.  If no  PATH  is  specified  (as  in  the  last  two
       variants above), PATH defaults to "command".

       When    a    user    reads   /proc/systemtap/MODNAME/PATH,   the
       corresponding procfs read probe is triggered.  The  string  data
       to be read should be assigned to a variable named $value, like this:

              procfs("PATH").read { $value = "100\n" }

       When  a  user  writes  into  /proc/systemtap/MODNAME/PATH,   the
       corresponding  procfs  write  probe  is triggered.  The data the
       user wrote is available in the  string  variable  named  $value,
       like this:

              procfs("PATH").write { printf("user wrote: %s", $value) }
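
       The read and write variants can share a global so that a value
       written by the user is returned on the next read.  This is a minimal
       sketch; the file name "threshold" and the global of the same name are
       hypothetical.

```systemtap
# Hypothetical setting, exposed as /proc/systemtap/MODNAME/threshold.
global threshold = "100\n"

# Reading the file returns the current setting.
probe procfs("threshold").read
{
    $value = threshold
}

# Writing the file replaces the setting.
probe procfs("threshold").write
{
    threshold = $value
    printf("threshold set to %s", threshold)
}
```

       The script keeps running until interrupted, so the file can be read
       and written from another shell in the meantime.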

       This  family  of probe points hooks up to static probing markers
       inserted into the kernel or modules.  These markers are  special
       macro calls inserted by kernel developers to make probing faster
       and more reliable than with DWARF-based probes.  Further,  DWARF
       debugging information is not required to probe markers.

       Marker  probe points begin with kernel.  The next part names the
       marker itself: mark("name").  The marker name string, which  may
       contain  the  usual  wildcard characters, is matched against the
       names given to the marker macros when the kernel  and/or  module
       was compiled.  Optionally, you can specify format("format").
       Specifying the marker format string allows differentiation
       between two markers with the same name but different marker
       format strings.

       The handler associated with a marker-based probe  may  read  the
       optional parameters specified at the macro call site.  These are
       named $arg1 through $argNN, where NN is the number of parameters
       supplied  by the macro.  Number and string parameters are passed
       in a type-safe manner.

       The marker format string associated with a marker is available
       in $format, and the marker name string is available in $name.
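
       As a hedged sketch, the script below counts hits on all kernel
       markers by their format string.  It assumes the target kernel was
       built with markers; the "?" suffix lets the script load even if none
       are found, and the array name is arbitrary.

```systemtap
global hits

# Wildcard over every marker name; count hits per format string.
probe kernel.mark("*") ?
{
    hits[$format]++
}

probe timer.s(5) { exit() }

probe end
{
    foreach (fmt in hits-)
        printf("%6d  %s\n", hits[fmt], fmt)
}
```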

       The perfmon family of probe points is used to access the
       performance monitoring hardware available in modern processors.
       This family of probe points needs the perfmon2 support in the
       kernel to access the performance monitoring hardware.

       Performance monitor hardware points begin with perfmon.  The
       next part names the event being counted: counter("event").
       The event names are processor implementation specific, with the
       exception of the generic cycles and instructions events, which
       are available on all processors.  This sets up a counter on the
       processor to count the number of events occurring on the
       processor.  For more details on the performance monitoring events
       available on a specific processor, use the perfmon2 command:

              pfmon -l

       $counter
              is a handle used in the body of the probe for operations
              involving the counter associated with the probe.

       read_counter
              is a function that is passed the handle for the perfmon
              probe and returns the current count for the event.


       Here are some example probe points, defining the associated events.

       begin, end, end
              refers to the startup and normal shutdown of the session.
              In  this  case, the handler would run once during startup
              and twice during shutdown.

       timer.jiffies(1000).randomize(200)
              refers to a periodic interrupt, every 1000 +/- 200 jiffies.

       kernel.function("*init*"), kernel.function("*exit*")
              refers  to  all kernel functions with "init" or "exit" in
              the name.

       kernel.function("*@kernel/sched.c:240")
              refers to any functions within the "kernel/sched.c" file
              that span line 240.

       kernel.mark("getuid")
              refers to an STAP_MARK(getuid, ...) macro call in the
              kernel.

       module("usb*").function("*sync*").return
              refers to the moment of return from all functions with
              "sync" in the name in any of the USB drivers.

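       kernel.statement(0xc0044852)
              refers to the first byte of the statement whose compiled
              instructions include the given address in the kernel.

       kernel.statement("*@kernel/sched.c:2917")
              refers to the statement of line 2917 within
              "kernel/sched.c".

       kernel.statement("bio_init@fs/bio.c+3")
              refers to the statement at line bio_init+3 within
              "fs/bio.c".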

              refers to the group of probe aliases with any name in the
              third position


       stap(1),       stapprobes.iosched(5),      stapprobes.netdev(5),
       stapprobes.nfs(5), stapprobes.nfsd(5),  stapprobes.pagefault(5),
       stapprobes.process(5),   stapprobes.rpc(5),  stapprobes.scsi(5),
       stapprobes.signal(5),  stapprobes.socket(5),  stapprobes.tcp(5),
       stapprobes.udp(5), proc(5)