Provided by: systemtap_0.0.20080705-2ubuntu1_i386


       stapprobes - systemtap probe points


       The  following sections enumerate the variety of probe points supported
       by the systemtap translator, and additional aliases defined by standard
       tapset scripts.

       The  general  probe  point  syntax  is  a dotted-symbol sequence.  This
       allows a breakdown of the event namespace into parts, somewhat like the
       Domain Name System does on the Internet.  Each component identifier may
       be parametrized by a string or number literal, with  a  syntax  like  a
       function call.  A component may include a "*" character, to expand to a
       set of matching probe points.  Probe aliases likewise expand  to  other
       probe  points.   Each  and  every  resulting  probe  point  is normally
       resolved to some low-level system  instrumentation  facility  (e.g.,  a
       kprobe  address,  marker,  or  a  timer  configuration),  otherwise the
       elaboration phase will fail.

       However, a probe point may be followed by a "?" character, to  indicate
       that  it  is  optional,  and that no error should result if it fails to
       resolve.  Optionalness passes down through all levels of alias/wildcard
       expansion.  Alternatively, a probe point may be followed by a "!"
       character, to indicate that it is both optional and sufficient.  (Think
       vaguely of the Prolog cut operator.)  If it does resolve, then no
       further probe points in the same comma-separated list will be resolved.
       Therefore,  the  "!"   sufficiency  mark  only makes sense in a list of
       probe point alternatives.

       Additionally, a probe point may be followed by an "if (expr)" statement,
       in order to enable or disable the probe point on the fly.  With the "if"
       statement, if "expr" is false when the probe point is hit, the whole
       probe body, including any alias’s body, is skipped.  The condition is
       stacked up through all levels of alias/wildcard expansion, so the final
       condition becomes the logical-and of the conditions of all expanded
       aliases and wildcards.

       These are all syntactically valid probe points:

              kernel.function("no_such_function") ?
              module("awol").function("no_such_function") !
              signal.*? if (switch)
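
       As a minimal sketch of how the "!" and "?" marks combine in one probe
       declaration, the script below lists two alternatives for the same
       event.  The function names sys_open and do_sys_open are assumptions
       chosen for illustration and may not exist in a given kernel.

              # Assumed function names; adjust for the target kernel.
              # If the first point resolves, the second is never tried.
              probe kernel.function("sys_open") !,
                    kernel.function("do_sys_open") ?
              {
                printf("open attempted by %s\n", execname())
              }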

       Probes may be broadly classified into "synchronous" and "asynchronous".
       A "synchronous" event is deemed to occur when any processor executes an
       instruction matched by the specification.  This gives  these  probes  a
       reference  point  (instruction address) from which more contextual data
       may  be  available.   Other  families  of   probe   points   refer   to
       "asynchronous" events such as timers/counters rolling over, where there
       is no  fixed  reference  point  that  is  related.   Each  probe  point
       specification   may   match  multiple  locations  (for  example,  using
       wildcards or aliases), and all of them are then probed.  A probe
       declaration  may  also  contain several comma-separated specifications,
       all of which are probed.

       The probe points begin and end are defined by the translator  to  refer
       to  the  time  of  session  startup  and  shutdown.   All "begin" probe
       handlers are run, in some sequence, during the startup of the  session.
       All  global  variables  will have been initialized prior to this point.
       All "end" probes are run, in some sequence, during the normal  shutdown
       of a session, such as in the aftermath of an exit() function call, or
       an interruption from the user.   In  the  case  of  an  error-triggered
       shutdown,  "end"  probes  are  not  run.  There are no target variables
       available in either context.

       If the order of execution among "begin" or "end" probes is significant,
       then an optional sequence number may be provided:


       The  number  N may be positive or negative.  The probe handlers are run
       in increasing order, and the  order  between  handlers  with  the  same
       sequence  number  is  unspecified.   When  "begin"  or  "end" are given
       without a sequence, they are effectively sequence zero.
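
       The ordering rule can be sketched with a short script.  The begin(N)
       and end(N) forms are assumed here from the function-call-like
       parametrization of components described earlier; the messages are
       illustrative only.

              # Handlers run in increasing order of sequence number.
              probe begin(-1) { printf("runs first\n") }
              probe begin     { printf("runs second, as sequence zero\n") }
              probe begin(1)  { printf("runs third\n"); exit() }
              probe end(5)    { printf("runs at shutdown\n") }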

       The error probe point is similar to the end probe, except that each
       such probe handler is run when the session ends after errors have
       occurred.  In such cases, "end" probes are skipped, but each "error"
       probe is still attempted.  This kind of probe can be used to clean up
       or emit a "final gasp".  It may also be numerically parametrized to set
       a sequence.
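
       A hedged sketch of the difference between "end" and "error" probes
       follows.  It assumes that the tapset function error() is used to raise
       the error and that a one-second timer is an acceptable trigger; both
       choices are for illustration only.

              # Assumption: error() from the standard tapset ends the session
              # with an error under the default error limit.
              probe timer.s(1) { error("something went wrong") }
              probe end        { printf("skipped after an error\n") }
              probe error      { printf("final gasp: cleaning up\n") }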

       The  probe  point  never is specially defined by the translator to mean
       "never".  Its probe handler is never run,  though  its  statements  are
       analyzed  for symbol / type correctness as usual.  This probe point may
       be useful in conjunction with optional probes.
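
       One possible pairing with optional probes is sketched below.  The
       function name is hypothetical.  Because never is listed as well, the
       handler body is still analyzed for symbol and type correctness even if
       the optional point does not resolve.

              # "no_such_function" is a hypothetical name.
              probe kernel.function("no_such_function") ?, never
              {
                printf("hit %s\n", pp())
              }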

       Intervals defined by the standard kernel "jiffies" timer may be used to
       trigger  probe  handlers  asynchronously.  Two probe point variants are
       supported by the translator:


       The probe handler is run every N  jiffies  (a  kernel-defined  unit  of
       time,  typically between 1 and 60 ms).  If the "randomize" component is
       given, a linearly distributed random value in  the  range  [-M..+M]  is
       added  to  N  every  time  the  handler  is  run.  N is restricted to a
       reasonable range (1 to around a million), and M  is  restricted  to  be
       smaller  than  N.   There  are  no  target variables provided in either
       context.  It is possible for such probes to be run  concurrently  on  a
       multi-processor computer.
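
       As a sketch, a handler that fires roughly every 1000 jiffies, give or
       take up to 200, might look like the following.  The spelling
       timer.jiffies(N).randomize(M) is assumed from the description above,
       and the global name ticks is made up for this example.

              global ticks    # hypothetical counter

              # Assumed syntax: N = 1000, M = 200 (M must be smaller than N).
              probe timer.jiffies(1000).randomize(200)
              {
                ticks++
                printf("tick %d\n", ticks)
              }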

       Alternatively,  intervals may be specified in units of time.  There are
       two probe point variants similar to the jiffies timer:


       Here, N and M are specified in milliseconds, but the full  options  for
       units   are   seconds  (s/sec),  milliseconds  (ms/msec),  microseconds
       (us/usec), nanoseconds (ns/nsec), and hertz (hz).  Randomization is not
       supported for hertz timers.
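
       The unit suffixes can be illustrated as follows.  The forms
       timer.ms(N), timer.hz(N) and timer.s(N) are assumed from the unit
       names listed above; the intervals are arbitrary.

              # About every 250 ms, varied by up to 50 ms either way.
              probe timer.ms(250).randomize(50) { printf("ms timer\n") }

              # Ten times per second; hertz timers cannot be randomized.
              probe timer.hz(10) { printf("hz timer\n") }

              # Stop the session after five seconds.
              probe timer.s(5) { exit() }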

       The  actual resolution of the timers depends on the target kernel.  For
       kernels prior to 2.6.17, timers are limited to jiffies  resolution,  so
       intervals  are  rounded  up  to  the  nearest  jiffies interval.  After
       2.6.17, the implementation uses hrtimers for tighter precision,  though
       the  actual  resolution will be arch-dependent.  In either case, if the
       "randomize" component is given, then the random value will be added  to
       the interval before any rounding occurs.

       Profiling timers are also available to provide probes that execute on
       all CPUs at the rate of the system tick.  This probe takes no
       parameters.


       Full  context  information  of  the  interrupted  process is available,
       making this probe suitable for a time-based sampling profiler.
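
       A minimal sampling profiler along these lines could be sketched as
       below.  The probe point name timer.profile is an assumption, as are
       the global name samples and the ten-second run time.

              global samples    # hypothetical: sample count per program name

              # Assumed probe point name for the profiling timer.
              probe timer.profile { samples[execname()]++ }

              probe timer.s(10) { exit() }

              # Print program names, most frequently sampled first.
              probe end
              {
                foreach (name in samples-)
                  printf("%s %d\n", name, samples[name])
              }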

       This family of probe points uses symbolic debugging information for the
       target   kernel/module/program,   as   may   be   found  in  unstripped
       executables, or the separate debuginfo packages.  They allow  placement
       of  probes  logically into the execution path of the target program, by
       specifying a set of points in  the  source  or  object  code.   When  a
       matching  statement executes on any processor, the probe handler is run
       in that context.

       Points in a kernel are identified by module, source file, line number,
       function name, or some combination of these.

       Here  is  a  list  of  probe  point  families currently supported.  The
       .function variant places a  probe  near  the  beginning  of  the  named
       function,  so  that parameters are available as context variables.  The
       .return variant places a probe at the moment after the return from  the
       named  function,  so  the  return  value  is available as the "$return"
       context variable.  The  .inline  modifier  for  .function  filters  the
       results  to  include  only  instances  of inlined functions.  The .call
       modifier selects the opposite subset.  Inline functions do not have  an
       identifiable  return  point,  so  .return  is  not supported on .inline
       probes. The .statement variant  places  a  probe  at  the  exact  spot,
       exposing those local variables that are visible there.


       In  the  above  list, MPATTERN stands for a string literal that aims to
       identify the loaded kernel module of interest.   It  may  include  "*",
       "[]", and "?" wildcards.  PATTERN stands for a string literal that aims
       to identify a point in the program.  It is made up of three parts:

       ·   The first part is the name of a function, as would appear in the nm
           program’s  output.   This  part may use the "*" and "?" wildcarding
           operators to match multiple names.

       ·   The second part is optional and begins with the "@" character.   It
           is followed by the path to the source file containing the function,
           which may include a wildcard pattern, such as  mm/slab*.   In  most
           cases, the path should be relative to the top of the Linux source
           directory, although an absolute path  may  be  necessary  for  some
           kernels.  If a relative pathname doesn’t work, try absolute.

       ·   Finally,  the  third  part  is  optional  if the file name part was
           given, and identifies the line number in the source  file  preceded
           by  a  ":"  or a "+".  The line number is assumed to be an absolute
           line number if preceded by a ":", or relative to the entry  of  the
           function  if  preceded by a "+".  All the lines in the function can
           be matched with ":*".  A range of lines x through y can be  matched
           with ":x-y".
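
       For illustration, the declaration below combines patterns built from
       the three parts described above.  The function and file names follow
       the examples later in this page and may not match a given kernel; the
       "function@file:line" arrangement is assumed from the order of the
       parts.

              probe
                # function name only, with wildcards
                kernel.function("*init*"),
                # any function in the file that spans absolute line 240
                kernel.function("*@kernel/sched.c:240"),
                # the statement three lines past the entry of bio_init
                kernel.statement("bio_init@fs/bio.c+3")
              {
                printf("%s\n", pp())    # print the resolved probe point
              }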

       As an alternative, PATTERN may be a numeric constant, indicating a
       (module-relative or kernel-_stext-relative) address.  In guru mode
       only, absolute kernel addresses may be specified with the ".absolute"
       suffix.

       Some of the source-level context variables, such as function
       parameters, locals, and globals visible in the compilation unit, may be
       visible to probe handlers.  Handlers may refer to these variables by
       prefixing their name with "$" within the scripts.  In addition, a
       special syntax allows limited traversal of structures, pointers, and
       arrays.

       $var   refers  to  an in-scope variable "var".  If it’s an integer-like
              type, it will be cast to a 64-bit int for systemtap script  use.
              String-like  pointers (char *) may be copied to systemtap string
              values using the kernel_string or user_string functions.

              traversal to a structure’s field.  The indirection operator  may
              be repeated to follow more levels of pointers.

              indexes  into  an  array.   The  index  is  given with a literal

       For ".return" probes, context variables other than the "$return"  value
       itself  are  only  available  for  the  function  call parameters.  The
       expressions evaluate to the entry-time values of those variables, since
       that  is  when  a  snapshot  is  taken.   Other local variables are not
       generally accessible, since by the time a  ".return"  probe  hits,  the
       probed function will have already returned.
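
       A sketch of a ".return" probe that reads both a call parameter and the
       return value is given below.  The function sys_open and its parameter
       name filename are assumptions about the target kernel, not something
       this page guarantees.

              # Assumed: sys_open exists and has a user-space string
              # parameter named "filename".
              probe kernel.function("sys_open").return
              {
                # $filename holds its entry-time value here.
                printf("%s(%s) = %d\n", probefunc(),
                       user_string($filename), $return)
              }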

       Early prototype support for user-space probing is available in the form
       of a non-symbolic probe point:
       This probe point is analogous to kernel.statement(ADDRESS).absolute in
       that both use raw (unverified) virtual addresses and provide no
       $variables.  The target PID parameter must identify a running process,
       and ADDRESS should identify a valid instruction address.  All threads
       of that process will be probed.

       Additional user-space probing is available in the following forms:

       A .begin probe gets called when a new process described by PID or PATH
       gets created.  A .thread.begin probe gets called when a new thread
       described by PID or PATH gets created.  A .end probe gets called when a
       process described by PID or PATH dies.  A .thread.end probe gets called
       when a thread described by PID or PATH dies.  A .syscall probe gets
       called when a thread described by PID or PATH makes a system call.  A
       .syscall.return probe gets called when a thread described by PID or
       PATH returns from a system call.  In both syscall variants, the system
       call number is available in the "$syscall" context variable.

       Note that PATH pathnames must be absolute.
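
       These variants might be combined as in the sketch below.  The
       process("PATH") spelling is assumed from the description above, and
       /bin/ls is only an example target.

              # Assumed syntax; the path must be absolute.
              probe process("/bin/ls").begin
              {
                printf("ls started, pid %d\n", pid())
              }

              probe process("/bin/ls").syscall
              {
                printf("pid %d made system call %d\n", pid(), $syscall)
              }

              probe process("/bin/ls").end
              {
                printf("ls exited, pid %d\n", pid())
              }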

       These  probe  points allow procfs "files" in /proc/systemtap/MODNAME to
       be created, read and written (MODNAME is  the  name  of  the  systemtap
       module). The proc filesystem is a pseudo-filesystem which is used as an
       interface to kernel  data  structures.   There  are  four  probe  point
       variants supported by the translator:


       PATH  is  the  file  name  (relative  to /proc/systemtap/MODNAME) to be
       created.  If no PATH is specified (as in the last two variants  above),
       PATH defaults to "command".

       When  a  user  reads  /proc/systemtap/MODNAME/PATH,  the  corresponding
       procfs read probe is triggered.  The string data to be read  should  be
       assigned to a variable named $value, like this:

              procfs("PATH").read { $value = "100\n" }

       When a user writes into /proc/systemtap/MODNAME/PATH, the corresponding
       procfs write probe is triggered.  The data the user wrote is  available
       in the string variable named $value, like this:

              procfs("PATH").write { printf("user wrote: %s", $value) }
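
       The two handlers can be paired so that a value written by the user is
       returned on the next read.  This is only a sketch; the file name
       "setting" and the global of the same name are hypothetical.

              global setting    # hypothetical: last string written

              probe begin { setting = "100\n" }

              # Reading /proc/systemtap/MODNAME/setting returns the string.
              probe procfs("setting").read  { $value = setting }

              # Writing to the same file replaces the string.
              probe procfs("setting").write { setting = $value }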

       This family of probe points hooks up to static probing markers inserted
       into the kernel or modules.  These  markers  are  special  macro  calls
       inserted  by kernel developers to make probing faster and more reliable
       than with DWARF-based probes.  Further, DWARF debugging information  is
       not required to probe markers.

       Marker probe points begin with kernel.  The next part names the marker
       itself: mark("name").  The marker name string, which may contain the
       usual wildcard characters, is matched against the names given to the
       marker macros when the kernel and/or module was compiled.  Optionally,
       you can specify format("format").  Specifying the marker format string
       allows differentiation between two markers with the same name but
       different marker format strings.

       The  handler associated with a marker-based probe may read the optional
       parameters specified at the macro call site.   These  are  named  $arg1
       through  $argNN,  where  NN is the number of parameters supplied by the
       macro.  Number and string parameters are passed in a type-safe  manner.
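
       As a sketch, a handler for a marker named "getuid" (the name used in
       the examples later in this page) could read its first parameter as
       shown.  Whether such a marker exists, and whether its first parameter
       is numeric, are assumptions.

              # Assumed: a "getuid" marker with a numeric first parameter.
              probe kernel.mark("getuid")
              {
                printf("getuid marker hit, first argument %d\n", $arg1)
              }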

       The  marker  format  string  associated  with  a marker is available in

       The perfmon family of probe points is used to access the performance
       monitoring hardware available in modern processors.  This family of
       probe points needs perfmon2 support in the kernel to access the
       performance monitoring hardware.

       Performance monitor hardware points begin with perfmon.  The next part
       names the event being counted: counter("event").  The event names are
       processor implementation specific, with the exception of the generic
       cycles and instructions events, which are available on all processors.
       This sets up a counter on the processor to count the number of events
       occurring on the processor.  For more details on the performance
       monitoring events available on a specific processor, use the perfmon2
       command:

              pfmon -l

              is a handle used  in  the  body  of  the  probe  for  operations
              involving the counter associated with the probe.

              is  a  function  that is passed the handle for the perfmon probe
              and returns the current count for the event.


       Here are some example probe points, defining the associated events.

       begin, end, end
              refers to the startup and normal shutdown of  the  session.   In
              this  case,  the handler would run once during startup and twice
              during shutdown.

              refers to a periodic interrupt, every 1000 +/- 200 jiffies.

       kernel.function("*init*"), kernel.function("*exit*")
              refers to all kernel functions with "init" or "exit" in the
              name.

              refers  to  any  functions within the "kernel/sched.c" file that
              span line 240.

              refers to an STAP_MARK(getuid, ...) macro call in the kernel.

              refers to the moment of return from all functions with "sync" in
              the name in any of the USB drivers.

              refers  to  the  first  byte  of  the  statement  whose compiled
              instructions include the given address in the kernel.

              refers to the statement of line 2917 within "kernel/sched.c".

              refers to the statement at line bio_init+3 within "fs/bio.c".

              refers to the group of probe aliases with any name in the  third


       stap(1),          stapprobes.iosched(5),          stapprobes.netdev(5),
       stapprobes.nfs(5),     stapprobes.nfsd(5),     stapprobes.pagefault(5),
       stapprobes.process(5),      stapprobes.rpc(5),      stapprobes.scsi(5),
       stapprobes.signal(5),     stapprobes.socket(5),      stapprobes.tcp(5),
       stapprobes.udp(5), proc(5)