Provided by: systemtap-doc_5.0-2ubuntu1_amd64 

NAME
stapprobes - systemtap probe points
DESCRIPTION
The following sections enumerate the variety of probe points supported by the systemtap translator, and
some of the additional aliases defined by standard tapset scripts. Many are individually documented in
the 3stap manual section, with the probe:: prefix.
SYNTAX
probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
A probe declaration may list multiple comma-separated probe points in order to attach a handler to all of
the named events. Normally, the handler statements are run whenever any of the events occurs. Depending on
the type of probe point, the handler statements may refer to context variables (denoted with a dollar-sign
prefix like $foo) to read or write state. This may include function parameters for function probes,
or local variables for statement probes.
The syntax of a single probe point is a general dotted-symbol sequence. This allows a breakdown of the
event namespace into parts, somewhat like the Domain Name System does on the Internet. Each component
identifier may be parametrized by a string or number literal, with a syntax like a function call. A
component may include a "*" character, to expand to a set of matching probe points. It may also include
"**" to match multiple sequential components at once. Probe aliases likewise expand to other probe
points.
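The difference between "*" and "**" can be sketched with a small Python model. It is a simplification
that assumes "*" matches any run of characters inside one dotted component and "**" matches across
component boundaries. It ignores quoted arguments and is not the translator's matcher. The helper names
pattern_to_regex and matches are hypothetical.

```python
import re

def pattern_to_regex(pattern):
    """Translate a dotted probe-point pattern into a regex (simplified model).

    "**" may span several dotted components; "*" stays inside one component.
    All other characters, including quotes and parentheses, match literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # may cross "." boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^.]*")   # confined to a single component
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def matches(pattern, name):
    """True if the whole probe-point name matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(name) is not None

# Candidate names checked against patterns used on this page.
for pat, name in [("syscall.*", "syscall.open"),
                  ("syscall.*", "syscall.open.return"),
                  ("sys**open", "syscall.open")]:
    print(pat, name, matches(pat, name))
```

In this model "syscall.*" does not reach "syscall.open.return", because a single "*" stops at the next
dot. That is why the return probes are written as syscall.*.return.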
Probe aliases can be given on their own, or with a suffix. The suffix attaches to the underlying probe
point that the alias is expanded to. For example,
syscall.read.return.maxactive(10)
expands to
kernel.function("sys_read").return.maxactive(10)
with the component maxactive(10) being recognized as a suffix.
Normally, each and every probe point resulting from wildcard- and alias-expansion must be resolved to
some low-level system instrumentation facility (e.g., a kprobe address, marker, or a timer
configuration), otherwise the elaboration phase will fail.
However, a probe point may be followed by a "?" character, to indicate that it is optional, and that no
error should result if it fails to resolve. Optionalness passes down through all levels of alias/wildcard
expansion. Alternatively, a probe point may be followed by a "!" character, to indicate that it is
both optional and sufficient. (Think vaguely of the Prolog cut operator.) If it does resolve, then no
further probe points in the same comma-separated list will be resolved. Therefore, the "!" sufficiency
mark only makes sense in a list of probe point alternatives.
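The effect of "?" and "!" on one comma-separated list can be modeled as follows. This Python sketch
assumes a hypothetical set of names that the elaborator is able to resolve, and hypothetical helper
names (resolve_list, ElaborationError). It only illustrates the rules stated above.

```python
class ElaborationError(Exception):
    """Raised when a mandatory probe point cannot be resolved."""

def resolve_list(points, resolvable):
    """Model of how "?" and "!" affect one comma-separated probe list.

    points: sequence of (name, mark) pairs, where mark is "", "?" or "!".
    resolvable: set of names a (hypothetical) elaborator can resolve.
    Returns the names that end up probed, in list order.
    """
    probed = []
    for name, mark in points:
        if name in resolvable:
            probed.append(name)
            if mark == "!":         # sufficient: stop resolving later entries
                break
        elif mark in ("?", "!"):    # optional: a failure is not an error
            continue
        else:
            raise ElaborationError("cannot resolve " + name)
    return probed

# Prefer a module's function, fall back to the kernel's (names are illustrative).
alternatives = [('module("awol").function("foo")', "!"),
                ('kernel.function("foo")', "")]
print(resolve_list(alternatives, {'kernel.function("foo")'}))
```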
Additionally, a probe point may be followed by an "if (expr)" clause, in order to enable/disable the
probe point on-the-fly. With the "if" clause, if "expr" is false when the probe point is hit, the
whole probe body, including the alias's body, is skipped. The condition is stacked up through all levels
of alias/wildcard expansion, so the final condition becomes the logical AND of the conditions of all
expanded aliases/wildcards. The expressions are necessarily restricted to global variables.
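The stacking rule amounts to a logical AND over every expansion level. The Python sketch below models
each level's condition as a function of the script's globals. The helper effective_condition and the
second, alias-level condition "verbose" are hypothetical.

```python
def effective_condition(conditions):
    """Combine the 'if (expr)' conditions found at every expansion level.

    conditions: one entry per alias/wildcard level, either None (no
    condition) or a function of the script's global variables.
    The combined predicate is the logical AND of all present conditions.
    """
    present = [c for c in conditions if c is not None]
    return lambda globals_: all(c(globals_) for c in present)

# "signal.*? if (switch)" expanding through an alias assumed to carry its own
# "if (verbose)"; the middle level has no condition at all.
cond = effective_condition([lambda g: g["switch"], None, lambda g: g["verbose"]])
print(cond({"switch": 1, "verbose": 1}))   # body runs
print(cond({"switch": 0, "verbose": 1}))   # body (and alias body) skipped
```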
These are all syntactically valid probe points. (They are generally semantically invalid, depending on
the contents of the tapsets, and the versions of kernel/user software installed.)
kernel.function("foo").return
process("/bin/vi").statement(0x2222)
end
syscall.*
syscall.*.return.maxactive(10)
syscall.{open,close}
sys**open
kernel.function("no_such_function") ?
module("awol").function("no_such_function") !
signal.*? if (switch)
kprobe.function("foo")
Probes may be broadly classified into "synchronous" and "asynchronous". A "synchronous" event is deemed
to occur when any processor executes an instruction matched by the specification. This gives these
probes a reference point (instruction address) from which more contextual data may be available. Other
families of probe points refer to "asynchronous" events such as timers/counters rolling over, where there
is no related fixed reference point. Each probe point specification may match multiple locations
(for example, using wildcards or aliases), and all of them are then probed. A probe declaration may also
contain several comma-separated specifications, all of which are probed.
Brace expansion is a mechanism which allows a list of probe points to be generated. It is very similar to
shell expansion. A component may be surrounded by a pair of curly braces to indicate that the
comma-separated sequence of one or more subcomponents will each constitute a new probe point. The braces
may be arbitrarily nested. The ordering of expanded results is based on product order.
The question mark (?) and exclamation mark (!) indicators, as well as probe point conditions, may not be
placed in any expansion that precedes the last component.
The following is an example of brace expansion.
syscall.{write,read}
# Expands to
syscall.write, syscall.read
{kernel,module("nfs")}.function("nfs*")!
# Expands to
kernel.function("nfs*")!, module("nfs").function("nfs*")!
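Brace expansion in product order can be sketched in Python as below. This is a model, not the
translator's code. It assumes that commas and braces inside quoted string literals, and commas inside
parentheses, do not split alternatives. The helper names expand and split_top_level are hypothetical.

```python
def split_top_level(text):
    """Split on commas that are not nested in braces, parentheses or quotes."""
    parts, current, depth, in_str = [], [], 0, False
    for ch in text:
        if ch == '"':
            in_str = not in_str
        elif not in_str:
            if ch in "{(":
                depth += 1
            elif ch in "})":
                depth -= 1
            elif ch == "," and depth == 0:
                parts.append("".join(current))
                current = []
                continue
        current.append(ch)
    parts.append("".join(current))
    return parts

def expand(point):
    """Expand the first top-level brace group, then recurse (product order)."""
    depth, in_str, start = 0, False, None
    for i, ch in enumerate(point):
        if ch == '"':
            in_str = not in_str
        elif in_str:
            continue
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                prefix, inner, suffix = point[:start], point[start + 1:i], point[i + 1:]
                return [prefix + tail
                        for alt in split_top_level(inner)
                        for tail in expand(alt + suffix)]
    return [point]

print(expand("syscall.{write,read}"))
print(expand('{kernel,module("nfs")}.function("nfs*")!'))
```

Because each alternative is combined with the not-yet-expanded suffix before recursing, earlier brace
groups vary slowest, which gives the product ordering described above.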
DWARF DEBUGINFO
Resolving some probe points requires DWARF debuginfo or "debug symbols" for the specific program being
instrumented. For some others, DWARF is automatically synthesized on the fly from source code header
files. For others, it is not needed at all. Since a systemtap script may use any mixture of probe
points together, the union of their DWARF requirements has to be met on the computer where script
compilation occurs. (See the --use-server option and the stap-server(8) man page for information about the
remote compilation facility, which allows these requirements to be met on a different machine.)
The following table lists many of the available probe point families, classifying them by their need
for DWARF debuginfo for the specific program being probed.
DWARF                          NON-DWARF                    SYMBOL-TABLE

kernel.function, .statement    kernel.mark                  kernel.function*
module.function, .statement    process.mark, process.plt    module.function*
process.function, .statement   begin, end, error, never     process.function*
process.mark*                  timer
.function.callee               perf
python2, python3               procfs
debuginfod                     kernel.statement.absolute
                               kernel.data
AUTO-GENERATED-DWARF           kprobe.function
kernel.trace                   process.statement.absolute
                               process.begin, .end
                               netfilter
                               java
Probe types marked with an asterisk (*) are fallbacks, where systemtap can sometimes infer subset or
substitute information. In general, the more symbolic / debugging information is available, the higher
the quality of probing that is possible.
ON-THE-FLY ARMING
The following types of probe points may be armed/disarmed on-the-fly to save overheads during
uninteresting times. Arming conditions may also be added to other types of probes, but will be treated
as a wrapping conditional and won't benefit from overhead savings.
DISARMABLE                                 exceptions
kernel.function, kernel.statement
module.function, module.statement
process.*.function, process.*.statement
process.*.plt, process.*.mark
timer.                                     timer.profile
java
PROBE POINT FAMILIES
BEGIN/END/ERROR
The probe points begin and end are defined by the translator to refer to the time of session startup and
shutdown. All "begin" probe handlers are run, in some sequence, during the startup of the session. All
global variables will have been initialized prior to this point. All "end" probes are run, in some
sequence, during the normal shutdown of a session, such as in the aftermath of an exit() function call,
or an interruption from the user. In the case of an error-triggered shutdown, "end" probes are not run.
There are no target variables available in either context.
If the order of execution among "begin" or "end" probes is significant, then an optional sequence number
may be provided:
begin(N)
end(N)
The number N may be positive or negative. The probe handlers are run in increasing order, and the order
between handlers with the same sequence number is unspecified. When "begin" or "end" are given without a
sequence, they are effectively sequence zero.
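The ordering rule can be sketched in Python: a plain "begin" or "end" counts as sequence 0, and handlers
run in increasing sequence order. The function name run_order and the handler labels are hypothetical.

```python
def run_order(handlers):
    """Order "begin" (or "end") handlers by their sequence numbers.

    handlers: list of (label, sequence) pairs; sequence is None for a plain
    "begin"/"end", which counts as sequence 0.  Handlers that share a
    sequence number have no specified relative order in SystemTap; Python's
    stable sort simply picks one permitted order.
    """
    return [label for label, seq in
            sorted(handlers, key=lambda h: 0 if h[1] is None else h[1])]

# begin(5), begin, begin(-10)
print(run_order([("late", 5), ("plain", None), ("early", -10)]))
```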
The error probe point is similar to the end probe, except that each such probe handler is run when the
session ends after errors have occurred. In such cases, "end" probes are skipped, but each "error" probe
is still attempted. This kind of probe can be used to clean up or emit a "final gasp". It may also be
numerically parametrized to set a sequence.
NEVER
The probe point never is specially defined by the translator to mean "never". Its probe handler is never
run, though its statements are analyzed for symbol / type correctness as usual. This probe point may be
useful in conjunction with optional probes.
SYSCALL and ND_SYSCALL
The syscall.* and nd_syscall.* aliases define several hundred probes, too many to detail here. They are
of the general form:
syscall.NAME
nd_syscall.NAME
syscall.NAME.return
nd_syscall.NAME.return
Generally, a pair of probes is defined for each normal system call as listed in the syscalls(2) manual
page, one for entry and one for return. Those system calls that never return do not have a corresponding
.return probe. The nd_* family of probes is about the same, except that it uses non-DWARF based searching
mechanisms, which may result in a lower quality of symbolic context data (parameters), and may miss some
system calls. You may want to try them first, in case kernel debugging information is not immediately
available.
Each probe alias provides a variety of variables; the tapset source code is the most reliable reference
for them. Generally, each variable listed in the standard manual page is made available as a
script-level variable, so syscall.open exposes filename, flags, and mode. In addition, a standard suite
of variables is available at most aliases:
argstr A pretty-printed form of the entire argument list, without parentheses.
name The name of the system call.
retval For return probes, the raw numeric system-call result.
retstr For return probes, a pretty-printed string form of the system-call result.
As usual for probe aliases, these variables are all initialized once from the underlying $context
variables, so that later changes to $context variables are not automatically reflected. Not all probe
aliases obey all of these general guidelines. Please report any bothersome ones you encounter as a bug.
Note that on some kernel/userspace architecture combinations (e.g., 32-bit userspace on 64-bit kernel),
the underlying $context variables may need explicit sign extension / masking. When this is an issue,
consider using the tapset-provided variables instead of raw $context variables.
If debuginfo availability is a problem, you may try using the non-DWARF syscall probe aliases instead.
Use the nd_syscall. prefix instead of syscall. The same context variables are available, as far as
possible.
nd_syscall probes on kernels that use syscall wrappers to pass arguments via pt_regs (currently 4.17+ on
x86_64 and 4.19+ on aarch64) support syscall argument writing when guru mode is enabled. If a syscall
parameter is modified in the probe body, then immediately before the probe exits, the parameter's
current value will be written to pt_regs, overwriting the previous value. nd_syscall probes also
include two parameters for each of the syscall's string parameters. One holds a quoted version of the
string passed to the syscall. The other holds an unquoted version of the string, intended to be used
when modifying the parameter. If the probe modifies the unquoted string variable, then as the probe is
about to exit, the contents of this variable will be written to the user space buffer passed to the
syscall. It is the user's responsibility to ensure that this buffer is large enough to hold the modified
string and that it is located in a writable memory segment.
TIMERS
There are two main types of timer probes: "jiffies" timer probes and time interval timer probes.
Intervals defined by the standard kernel "jiffies" timer may be used to trigger probe handlers
asynchronously. Two probe point variants are supported by the translator:
timer.jiffies(N)
timer.jiffies(N).randomize(M)
The probe handler is run every N jiffies (a kernel-defined unit of time, typically between 1 and 60 ms).
If the "randomize" component is given, a linearly distributed random value in the range [-M..+M] is added
to N every time the handler is run. N is restricted to a reasonable range (1 to around a million), and M
is restricted to be smaller than N. There are no target variables provided in either context. It is
possible for such probes to be run concurrently on a multi-processor computer.
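The randomized interval can be sketched in Python as below. The sketch assumes integer jiffies and reads
"linearly distributed" as a uniform draw over -M..+M. The function name next_interval is hypothetical.

```python
import random

def next_interval(n, m=0, rng=random):
    """Jiffies until the next run of timer.jiffies(N).randomize(M).

    Each time, a value drawn uniformly from the integers -M..+M is added to N.
    Integer jiffies and a uniform draw are assumptions of this sketch.
    """
    if not 0 <= m < n:
        raise ValueError("M must be non-negative and smaller than N")
    return n + rng.randint(-m, m)

rng = random.Random(0)            # seeded only to make the sketch repeatable
print([next_interval(100, 10, rng) for _ in range(5)])
```

Requiring M < N keeps every interval at one jiffy or more.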
Alternatively, intervals may be specified in units of time. There are two probe point variants similar
to the jiffies timer:
timer.ms(N)
timer.ms(N).randomize(M)
Here, N and M are specified in milliseconds, but the full options for units are seconds (s/sec),
milliseconds (ms/msec), microseconds (us/usec), nanoseconds (ns/nsec), and hertz (hz). Randomization is
not supported for hertz timers.
The actual resolution of the timers depends on the target kernel. For kernels prior to 2.6.17, timers
are limited to jiffies resolution, so intervals are rounded up to the nearest jiffies interval. From
2.6.17 on, the implementation uses hrtimers for tighter precision, though the actual resolution will be
arch-dependent. In either case, if the "randomize" component is given, then the random value will be
added to the interval before any rounding occurs.
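The unit conversion and the pre-2.6.17 rounding can be sketched in Python as follows. CONFIG_HZ=250 is
only an assumed example value. The helper names interval_ns and round_up_to_jiffies are hypothetical.
The sketch shows the randomize offset being applied before rounding.

```python
# Nanoseconds per unit accepted by timer.<unit>(N); "hz" is a rate, handled apart.
NS_PER_UNIT = {"s": 10**9, "sec": 10**9, "ms": 10**6, "msec": 10**6,
               "us": 10**3, "usec": 10**3, "ns": 1, "nsec": 1}

def interval_ns(unit, n, offset=0):
    """Interval in ns for timer.<unit>(N); offset is an already-drawn
    randomize value in the same unit as N."""
    if unit == "hz":
        if offset:
            raise ValueError("randomize is not supported for hz timers")
        return 10**9 // n                # period of an N-per-second timer
    return (n + offset) * NS_PER_UNIT[unit]

def round_up_to_jiffies(ns, hz):
    """Pre-2.6.17 behaviour: round an interval up to whole jiffies (1/hz s)."""
    ns_per_jiffy = 10**9 // hz
    return -(-ns // ns_per_jiffy)        # integer ceiling division

HZ = 250                                  # assumed CONFIG_HZ for the example
print(round_up_to_jiffies(interval_ns("ms", 10), HZ))      # 10 ms -> 2.5 -> 3 jiffies
print(round_up_to_jiffies(interval_ns("ms", 10, -3), HZ))  # offset applied first: 7 ms -> 2
```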
Profiling timers are also available to provide probes that execute on all CPUs at the rate of the system
tick (CONFIG_HZ) or at a given frequency (hz). On some kernels, this is a one-concurrent-user-only or
disabled facility, resulting in error -16 (EBUSY) during probe registration.
timer.profile.tick
timer.profile.freq.hz(N)
Full context information of the interrupted process is available, making this probe suitable for a
time-based sampling profiler.
It is recommended to use the tapset probe timer.profile rather than timer.profile.tick. This probe point
behaves identically to timer.profile.tick when the underlying functionality is available, and falls back
to using perf.sw.cpu_clock on some recent kernels which lack the corresponding profile timer facility.
Profiling timers with specified frequencies are only accurate up to around 100 hz. You may need to
provide a larger value to achieve the desired rate.
Note that if a timer probe is set to fire at a very high rate and if the probe body is complex,
succeeding timer probes can get skipped, since the time for them to run has already passed. Normally
systemtap reports missed probes, but it will not report these skipped probes.
DWARF
This family of probe points uses symbolic debugging information for the target kernel/module/program, as
may be found in unstripped executables, or the separate debuginfo packages. They allow placement of
probes logically into the execution path of the target program, by specifying a set of points in the
source or object code. When a matching statement executes on any processor, the probe handler is run in
that context.
Probe points in the DWARF family can be identified by the target kernel module (or user process), source
file, line number, function name, or some combination of these.
Here is a list of DWARF probe points currently supported:
kernel.function(PATTERN)
kernel.function(PATTERN).call
kernel.function(PATTERN).callee(PATTERN)
kernel.function(PATTERN).callee(PATTERN).return
kernel.function(PATTERN).callee(PATTERN).call
kernel.function(PATTERN).callees(DEPTH)
kernel.function(PATTERN).return
kernel.function(PATTERN).inline
kernel.function(PATTERN).label(LPATTERN)
module(MPATTERN).function(PATTERN)
module(MPATTERN).function(PATTERN).call
module(MPATTERN).function(PATTERN).callee(PATTERN)
module(MPATTERN).function(PATTERN).callee(PATTERN).return
module(MPATTERN).function(PATTERN).callee(PATTERN).call
module(MPATTERN).function(PATTERN).callees(DEPTH)
module(MPATTERN).function(PATTERN).return
module(MPATTERN).function(PATTERN).inline
module(MPATTERN).function(PATTERN).label(LPATTERN)
kernel.statement(PATTERN)
kernel.statement(PATTERN).nearest
kernel.statement(ADDRESS).absolute
module(MPATTERN).statement(PATTERN)
process("PATH").function("NAME")
process("PATH").statement("*@FILE.c:123")
process("PATH").library("PATH").function("NAME")
process("PATH").library("PATH").statement("*@FILE.c:123")
process("PATH").library("PATH").statement("*@FILE.c:123").nearest
process("PATH").function("*").return
process("PATH").function("myfun").label("foo")
process("PATH").function("foo").callee("bar")
process("PATH").function("foo").callee("bar").return
process("PATH").function("foo").callee("bar").call
process("PATH").function("foo").callees(DEPTH)
process(PID).function("NAME")
process(PID).function("myfun").label("foo")
process(PID).plt("NAME")
process(PID).plt("NAME").return
process(PID).statement("*@FILE.c:123")
process(PID).statement("*@FILE.c:123").nearest
process(PID).statement(ADDRESS).absolute
debuginfod.process("PATH").**
(See the USER-SPACE section below for more information on the process probes.)
The list above includes multiple variants and modifiers which provide additional functionality or
filters. They are:
.function
Places a probe near the beginning of the named function, so that parameters are available
as context variables.
.return
Places a probe at the moment after the return from the named function, so the return value
is available as the "$return" context variable.
.inline
Filters the results to include only instances of inlined functions. Note that inlined
functions do not have an identifiable return point, so .return is not supported on .inline
probes.
.call Filters the results to include only non-inlined functions (the opposite set of .inline).
.exported
Filters the results to include only exported functions.
.statement
Places a probe at the exact spot, exposing those local variables that are visible there.
.statement.nearest
Places a probe at the nearest available line number for each line number given in the
statement.
.callee
Places a probe on the callee function given in the .callee modifier, where the callee must
be a function called by the target function given in .function. The advantage of doing this
over directly probing the callee function is that this probe point is run only when the
callee is called from the target function (add the -DSTAP_CALLEE_MATCHALL directive to
override this when calling stap(1)).
Note that only callees that can be statically determined are available. For example, calls
through function pointers are not available. Additionally, calls to functions located in
other objects (e.g. libraries) are not available (instead use another probe point). This
feature will only work for code compiled with GCC 4.7+.
.callees
Shortcut for .callee("*"), which places a probe on all callees of the function.
.callees(DEPTH)
Recursively places probes on callees. For example, .callees(2) will probe both the callees of
the target function and the callees of those callees. And .callees(3) goes one level
deeper, etc. A callee probe at depth N is only triggered when the N callers in the
callstack match those that were statically determined during analysis (this also may be
overridden using -DSTAP_CALLEE_MATCHALL).
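The set of sites probed by .callees(DEPTH) can be sketched in Python from a statically known call graph.
The graph, its function names and the helper callee_probes are hypothetical. Each result is the chain of
callers that must match at run time for that callee probe to fire.

```python
def callee_probes(call_graph, target, depth):
    """List the call chains that .function(target).callees(depth) would probe.

    call_graph maps a function name to the callees that can be determined
    statically (calls through function pointers are absent).  Each result is
    the chain from the target down to a probed callee; at run time the probe
    at depth N fires only if the actual N callers match this chain.
    """
    chains = []
    frontier = [[target]]
    for _ in range(depth):
        next_frontier = []
        for chain in frontier:
            for callee in call_graph.get(chain[-1], ()):
                new_chain = chain + [callee]
                chains.append(tuple(new_chain))
                next_frontier.append(new_chain)
        frontier = next_frontier
    return chains

graph = {"foo": ["bar", "baz"], "bar": ["qux"]}
print(callee_probes(graph, "foo", 2))
```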
In the above list of probe points, MPATTERN stands for a string literal that aims to identify the loaded
kernel module of interest. For in-tree kernel modules, the name suffices (e.g. "btrfs"). The name may
also include the "*", "[]", and "?" wildcards to match multiple in-tree modules. Out-of-tree modules are
also supported by specifying the full path to the .ko file; wildcards are not supported in such paths.
The file must follow the convention of being named <module_name>.ko (characters ',' and '-' are replaced
by '_').
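The naming convention for out-of-tree modules can be sketched in Python. The sketch derives the module
name a .ko path corresponds to by replacing ',' and '-' with '_' in the file's base name. The function
module_name_from_ko and the example path are hypothetical.

```python
import os

def module_name_from_ko(path):
    """Module name implied by an out-of-tree .ko path.

    The file must be named <module_name>.ko; any ',' or '-' characters in
    the file name appear as '_' in the module name.
    """
    base = os.path.basename(path)
    if not base.endswith(".ko"):
        raise ValueError("expected a .ko file: " + path)
    return base[:-len(".ko")].replace(",", "_").replace("-", "_")

print(module_name_from_ko("/home/user/build/my-driver.ko"))
```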
LPATTERN stands for a source program label. It may also contain "*", "[]", and "?" wildcards. PATTERN
stands for a string literal that aims to identify a point in the program. It is made up of three parts:
• The first part is the name of a function, as would appear in the nm program's output. This part may
use the "*" and "?" wildcarding operators to match multiple names.
• The second part is optional and begins with the "@" character. It is followed by the path to the
source file containing the function, which may include a wildcard pattern, such as mm/slab*. If it
does not match as is, an implicit "*/" is optionally added before the pattern, so that a script need
only name the last few components of a possibly long source directory path.
• Finally, the third part is optional if the file name part was given, and identifies the line number
in the source file preceded by a ":" or a "+". The line number is assumed to be an absolute line
number if preceded by a ":", or relative to the declaration line of the function if preceded by a
"+". All the lines in the function can be matched with ":*". A range of lines x through y can be
matched with ":x-y". Ranges and specific lines can be mixed using commas, e.g. ":x,y-z".
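The three-part PATTERN syntax can be sketched as a small Python parser. The regex, the helper names
(parse_pattern, parse_lines, file_matches) and the dictionary layout are hypothetical. Wildcards are
kept verbatim. Python's fnmatch stands in for the real wildcard matcher, and its "*" also matches "/".

```python
import fnmatch
import re

# FUNCTION[@FILE[(:|+)LINES]]
PATTERN_RE = re.compile(r"(?P<func>[^@]*)(?:@(?P<file>[^:+]*)(?:(?P<kind>[:+])(?P<lines>.+))?)?")

def parse_lines(spec):
    """Parse a ':' line spec: '*', 'N', 'x-y', or a comma mix such as 'x,y-z'."""
    if spec == "*":
        return "*"
    lines = []
    for part in spec.split(","):
        lo, dash, hi = part.partition("-")
        lines.extend(range(int(lo), int(hi) + 1) if dash else [int(lo)])
    return lines

def parse_pattern(pattern):
    """Split a PATTERN into its function, file and line parts."""
    m = PATTERN_RE.fullmatch(pattern)
    if m is None:
        raise ValueError("malformed PATTERN: " + pattern)
    parsed = {"function": m["func"], "file": m["file"], "lines": None, "relative": False}
    if m["kind"] == ":":
        parsed["lines"] = parse_lines(m["lines"])
    elif m["kind"] == "+":               # offset from the function's declaration line
        parsed["lines"] = [int(m["lines"])]
        parsed["relative"] = True
    return parsed

def file_matches(file_pattern, path):
    """Match the file part as given, else retry with an implicit "*/" prefix."""
    return (fnmatch.fnmatchcase(path, file_pattern)
            or fnmatch.fnmatchcase(path, "*/" + file_pattern))

print(parse_pattern("*@mm/slab.c:120,130-132"))
print(file_matches("mm/slab*", "/usr/src/linux/mm/slab.c"))
```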
As an alternative, PATTERN may be a numeric constant, indicating an address. Such an address may be
found from symbol tables of the appropriate kernel / module object file. It is verified against known
statement code boundaries, and will be relocated for use at run time.
In guru mode only, absolute kernel-space addresses may be specified with the ".absolute" suffix. Such an
address is considered already relocated, as if it came from /proc/kallsyms, so it cannot be checked
against statement/instruction boundaries.
CONTEXT VARIABLES
Many of the source-level context variables, such as function parameters, locals, and globals visible in
the compilation unit, may be visible to probe handlers. Handlers may refer to these variables by
prefixing their name with "$" within the scripts. In addition, a special syntax allows limited traversal
of structures, pointers, and arrays. More syntax allows pretty-printing of individual variables or their
groups. See also @cast. Note that variables may be inaccessible due to them being paged out, or for a few
other reasons. See also man error::fault(7stap).
Functions called from DWARF class probe points and from process.mark probes may also refer to context
variables.
$var refers to an in-scope variable or thread local storage variable "var". If it's an integer-like
type, it will be cast to a 64-bit int for systemtap script use. String-like pointers (char *) may
be copied to systemtap string values using the kernel_string or user_string functions.
@var("varname")
an alternative syntax for $varname
@var("varname","module")
The global variable or global thread local storage variable in scope of the given module already
loaded into the current probed process. Useful to get an exported variable in a shared library
loaded into the process being probed, or a global variable in a process while a shared library
probe is being executed. For user-space modules only. For example:
@var("_r_debug","/lib/ld-linux.so.2")
@var("varname@src/file.c")
refers to the global (either file local or external) variable varname defined when the file
src/file.c was compiled. The CU in which the variable is resolved is the first CU in the module of
the probe point which matches the given file name at the end and has the shortest file name path
(e.g. given @var("foo@bar/baz.c") and CUs with file name paths src/sub/module/bar/baz.c and
src/bar/baz.c, the second CU will be chosen to resolve the (file) global variable foo).
@var("varname@src/file.c","module")
The global variable in scope of the given CU, defined in the given module, even if the variable is
static (so the name is not unique without the CU name).
$var->field traversal via a structure's or a pointer's field. This
generalized indirection operator may be repeated to follow more levels. Note that the . operator
is not used for plain structure members, only -> for both purposes. (This is because "." is
reserved for string concatenation.) Also note that for direct dereferencing of a $var pointer,
{kernel,user}_{char,int,...}($var) should be used. (Refer to stapfuncs(5) for more details.)
$return
is available in return probes only for functions that are declared with a return value, which can
be determined using @defined($return).
$var[N]
indexes into an array. The index may be given as a literal number or even an arbitrary numeric
expression.
A number of operators exist for such basic context variable expressions:
$$vars expands to a character string that is equivalent to
sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
parm1, ..., parmN, var1, ..., varN)
for each variable in scope at the probe point. Some values may be printed as =? if their
run-time location cannot be found.
$$locals
expands to a subset of $$vars for only local variables.
$$parms
expands to a subset of $$vars for only function parameters.
$$return
is available in return probes only. It expands to a string that is equivalent to
sprintf("return=%x", $return) if the probed function has a return value, or else an empty string.
& $EXPR
expands to the address of the given context variable expression, if it is addressable.
@defined($EXPR)
expands to 1 or 0 iff the given context variable expression is resolvable, for use in conditionals
such as
@defined($foo->bar) ? $foo->bar : 0
@probewrite($VAR)
see the PROBES section of stap(1).
$EXPR$ expands to a string with all of $EXPR's members, equivalent to
sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}",
$EXPR->a, $EXPR->b)
$EXPR$$
expands to a string with all of $EXPR's members and submembers, equivalent to
sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}",
$EXPR->a, $EXPR->b, $EXPR->c->x, $EXPR->c->y, $EXPR->d[0])
@errno expands to the last value the C library global variable errno was set to.
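The CU-selection rule given above for @var("varname@src/file.c") can be sketched in Python. The sketch
applies a plain "ends with" test, as the text describes. Whether the real resolver also requires a
path-component boundary is not stated here, so treat that detail as unknown. The function pick_cu is
hypothetical.

```python
def pick_cu(file_suffix, cu_paths):
    """Choose the CU used to resolve @var("name@file_suffix").

    cu_paths: CU file name paths in module order.  Among those ending with
    file_suffix, the shortest path wins; on a tie, the first one wins
    (min() returns the first minimal element).
    """
    candidates = [p for p in cu_paths if p.endswith(file_suffix)]
    if not candidates:
        return None
    return min(candidates, key=len)

print(pick_cu("bar/baz.c", ["src/sub/module/bar/baz.c", "src/bar/baz.c"]))
```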
MORE ON RETURN PROBES
For the kernel ".return" probes, only a certain fixed number of returns may be outstanding. The default
is a relatively small number, on the order of a few times the number of physical CPUs. If many different
threads concurrently call the same blocking function, such as futex(2) or read(2), this limit could be
exceeded, and skipped "kretprobes" would be reported by "stap -t". To work around this, specify a
probe FOO.return.maxactive(NNN)
suffix, with a large enough NNN to cover all expected concurrently blocked threads. Alternatively, use the
stap -DKRETACTIVE=NNNN
stap command line macro setting to override the default for all ".return" probes.
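Why too small a limit drops returns can be seen in a toy Python model of a fixed pool of outstanding-return
slots. This model is not the kernel's kretprobe code. The class name ReturnProbeModel is hypothetical.
The model shows that every call blocked beyond the limit loses its return probe.

```python
class ReturnProbeModel:
    """Toy model of a fixed pool of outstanding-return slots (maxactive)."""

    def __init__(self, maxactive):
        self.maxactive = maxactive
        self.active = 0      # calls currently blocked inside the function
        self.missed = 0      # returns that will not be probed

    def enter(self):
        """A call enters the function; True if its return will be probed."""
        if self.active < self.maxactive:
            self.active += 1
            return True
        self.missed += 1
        return False

    def leave(self, tracked):
        """A call returns; a tracked call frees its slot."""
        if tracked:
            self.active -= 1

# Five threads block in the same function while only two slots exist.
model = ReturnProbeModel(maxactive=2)
tracked = [model.enter() for _ in range(5)]
print(tracked, "missed:", model.missed)
```

Raising maxactive to at least the number of concurrently blocked threads makes missed stay at zero in
this model.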
For ".return" probes, context variables other than "$return" may be accessible, as a convenience for
a script programmer wishing to access function parameters. These values are snapshots taken at the time
of function entry. (Local variables within the function are not generally accessible, since those
variables did not exist in allocated/initialized form at the snapshot moment.) These entry-snapshot
variables should be accessed via @entry($var).
In addition, arbitrary entry-time expressions can also be saved for ".return" probes using the
@entry(expr) operator. For example, one can compute the elapsed time of a function:
probe kernel.function("do_filp_open").return {
println( get_timeofday_us() - @entry(get_timeofday_us()) )
}
The following table summarizes how values related to a function parameter context variable, a pointer
named addr, may be accessed from a .return probe.
at-entry value     past-exit value
$addr              not available
$addr->x->y        @cast(@entry($addr),"struct zz")->x->y
$addr[0]           {kernel,user}_{char,int,...}(& $addr[0])
DWARFLESS
In the absence of debugging information, entry & exit points of kernel & module functions can be probed
using the "kprobe" family of probes. However, these do not permit looking up the arguments / local
variables of the function. The following constructs are supported:
kprobe.function(FUNCTION)
kprobe.function(FUNCTION).call
kprobe.function(FUNCTION).return
kprobe.module(NAME).function(FUNCTION)
kprobe.module(NAME).function(FUNCTION).call
kprobe.module(NAME).function(FUNCTION).return
kprobe.statement(ADDRESS).absolute
Probes of type function are recommended for kernel functions, whereas probes of type module are
recommended for probing functions of the specified module. In case the absolute address of a kernel or
module function is known, statement probes can be utilized.
Note that FUNCTION and MODULE names must not contain wildcards, or the probe will not be registered.
Also, statement probes can only be used in guru mode.
USER-SPACE
Support for user-space probing is available for kernels that are configured with the utrace extensions,
or that have the uprobes facility introduced in Linux 3.5. (Various kernel build configuration options
need to be enabled; systemtap will advise if these are missing.)
There are several forms. First, a non-symbolic probe point:
process(PID).statement(ADDRESS).absolute
is analogous to kernel.statement(ADDRESS).absolute in that both use raw (unverified) virtual addresses
and provide no $variables. The target PID parameter must identify a running process, and ADDRESS should
identify a valid instruction address. All threads of that process will be probed.
Second, non-symbolic user-kernel interface events handled by utrace may be probed:
process(PID).begin
process("FULLPATH").begin
process.begin
process(PID).thread.begin
process("FULLPATH").thread.begin
process.thread.begin
process(PID).end
process("FULLPATH").end
process.end
process(PID).thread.end
process("FULLPATH").thread.end
process.thread.end
process(PID).syscall
process("FULLPATH").syscall
process.syscall
process(PID).syscall.return
process("FULLPATH").syscall.return
process.syscall.return
A process.begin probe gets called when a new process described by PID or FULLPATH gets created. In
addition, it is called once from the context of each preexisting process, at systemtap script startup.
This is useful to track live processes. A process.thread.begin probe gets called when a new thread
described by PID or FULLPATH gets created. A process.end probe gets called when a process described by
PID or FULLPATH dies. A process.thread.end probe gets called when a thread described by PID or FULLPATH
dies. A process.syscall probe gets called when a thread described by PID or FULLPATH makes a system call.
The system call number is available in the $syscall context variable, and the first 6 arguments of the
system call are available in the $argN (e.g. $arg1, $arg2, ...) context variables. A
process.syscall.return probe gets called when a thread described by PID or FULLPATH returns from a system
call. The system call number is available in the $syscall context variable, and the return value of the
system call is available in the $return context variable.
If a process probe is specified without a PID or FULLPATH, all user threads will be probed. However, if
systemtap was invoked with the -c or -x options, then process probes are restricted to the process
hierarchy associated with the target process. If a process probe is unspecified (i.e. without a PID or
FULLPATH), but with the -c option, the PATH of the -c cmd will be heuristically filled into the process
PATH. In that case, only command parameters are allowed in the -c command (i.e. no command substitution
allowed and no occurrences of any of these characters: '|&;<>(){}').
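The restriction on the -c command can be sketched as a Python check that returns the program word when
the heuristic applies and None otherwise. The backquote is an addition of this sketch, covering `...`
command substitution; "$(...)" is already excluded by "(". The function c_command_program is
hypothetical, and shlex.split only approximates shell word splitting.

```python
import shlex

# Characters listed above, plus "`" added by this sketch for backquote substitution.
FORBIDDEN = set("|&;<>(){}`")

def c_command_program(cmd):
    """Program word of a -c command, or None if the command is too complex."""
    if FORBIDDEN & set(cmd):
        return None
    words = shlex.split(cmd)
    return words[0] if words else None

print(c_command_program("ls -l /tmp"))
print(c_command_program("ls | wc -l"))
```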
Third, symbolic static instrumentation compiled into programs and shared libraries may be probed:
process("PATH").mark("LABEL")
process("PATH").provider("PROVIDER").mark("LABEL")
process(PID).mark("LABEL")
process(PID).provider("PROVIDER").mark("LABEL")
A .mark probe gets called via a static probe which is defined in the application by
STAP_PROBE1(PROVIDER,LABEL,arg1), one of the macros defined in sys/sdt.h. The PROVIDER is an arbitrary
application identifier, LABEL is the marker site identifier, and arg1 is the integer-typed argument.
STAP_PROBE1 is used for probes with 1 argument, STAP_PROBE2 is used for probes with 2 arguments, and so
on. The arguments of the probe are available in the context variables $arg1, $arg2, ... An alternative
to using the STAP_PROBE macros is to use the dtrace script to create custom macros. Additionally, the
variables $$name and $$provider are available as parts of the probe point name. The sys/sdt.h macro
names DTRACE_PROBE* are available as aliases for STAP_PROBE*.
Finally, full symbolic source-level probes in user-space programs and shared libraries are supported.
These are exactly analogous to the symbolic DWARF-based kernel/module probes described above. They
expose the same sorts of context $variables for function parameters, local variables, and so on.
process("PATH").function("NAME")
process("PATH").statement("*@FILE.c:123")
process("PATH").plt("NAME")
process("PATH").library("PATH").plt("NAME")
process("PATH").library("PATH").function("NAME")
process("PATH").library("PATH").statement("*@FILE.c:123")
process("PATH").function("*").return
process("PATH").function("myfun").label("foo")
process("PATH").function("foo").callee("bar")
process("PATH").plt("NAME").return
debuginfod.process("PATH").**
process(PID).function("NAME")
process(PID).statement("*@FILE.c:123")
process(PID).plt("NAME")
Note that for all process probes, PATH names refer to executables that are searched for the same way
shells do: relative to the working directory if they contain a "/" character, otherwise in $PATH. If PATH
names refer to scripts, the actual interpreters (named on the first line of the script, after the #!
characters) are probed. In the debuginfod probe family, PATH names likewise refer to executables, but are
searched for in the currently defined $DEBUGINFOD_URLS.
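The lookup rule above can be sketched in Python as follows. This is an illustration of the described behaviour, not systemtap's code. The helper names resolve_probe_path and probed_binary are hypothetical. The sketch also does not follow "#!/usr/bin/env" indirection: for such a script it returns /usr/bin/env, and how systemtap handles that case is not covered here.

```python
import os
import shutil


def resolve_probe_path(name, cwd=None):
    """Find the executable NAME the way a shell would (sketch)."""
    if "/" in name:
        # Contains a slash: taken relative to the working directory
        # (os.path.join leaves an absolute NAME unchanged).
        path = os.path.abspath(os.path.join(cwd or os.getcwd(), name))
        return path if os.path.isfile(path) else None
    # No slash: search the directories listed in $PATH for an executable.
    return shutil.which(name)


def probed_binary(path):
    """Return the #! interpreter of a script, or PATH itself otherwise."""
    with open(path, "rb") as f:
        first_line = f.readline()
    if first_line.startswith(b"#!"):
        words = first_line[2:].split()
        if words:
            return os.fsdecode(words[0])
    return path
```

For a shell script whose first line is "#!/bin/sh", probed_binary returns "/bin/sh", which is the binary a process probe on that script would actually instrument.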
Process probes with relative paths in tapsets placed under the special directory
$prefix/share/systemtap/tapset/PATH/ have their process parameter prefixed with the tapset's location
below that directory. For example,
process("foo").function("NAME")
expands to
process("/usr/bin/foo").function("NAME")
when placed in $prefix/share/systemtap/tapset/PATH/usr/bin/
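The prefixing rule in this example can be modelled with a short Python sketch. Two details are assumptions of the sketch: $prefix is taken to be /usr, as in distribution packages, and absolute process paths are treated as unchanged, since the rule above mentions only relative paths. The function name is hypothetical.

```python
import posixpath

# Assumption: $prefix is /usr, as in distribution packages.
TAPSET_PATH_ROOT = "/usr/share/systemtap/tapset/PATH"


def expand_tapset_process_path(tapset_dir, proc_path, root=TAPSET_PATH_ROOT):
    """Prefix a relative process("...") path with the tapset's location below ROOT."""
    if posixpath.isabs(proc_path):
        return proc_path                      # absolute paths are left alone
    rel = posixpath.relpath(tapset_dir, root)
    if rel == ".." or rel.startswith("../"):
        return proc_path                      # tapset is not under the special directory
    return posixpath.normpath(posixpath.join("/", rel, proc_path))


print(expand_tapset_process_path("/usr/share/systemtap/tapset/PATH/usr/bin", "foo"))
```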
If the PATH of a process component refers to a shared library, then all processes that map that library
at runtime are selected for probing. If the PATH of a library component refers to a shared library, then
only the process specified by the process component is selected. Note that the PATH pattern in a library
component always applies to libraries statically determined to be in use by the process. However, you
may also specify the full path to any library file, even if the process does not statically need it.
A .plt probe will probe functions in the program linkage table corresponding to the rest of the probe
point. .plt can be specified as a shorthand for .plt("*"). The symbol name is available as a $$name
context variable; function arguments are not available, since PLTs are processed without debuginfo. A
.plt.return probe places a probe at the moment after the return from the named function.
If the PATH string contains wildcards as in the MPATTERN case, then standard globbing is performed to
find all matching paths. In this case, the $PATH environment variable is not used.
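As a rough illustration of the globbing case, the Python sketch below expands a wildcarded PATH against the filesystem without consulting $PATH. It assumes that "*", "?" and "[" are what mark a PATH as a pattern, which is the standard glob set, and the function name is hypothetical.

```python
import glob
import os


def expand_process_paths(pattern):
    """Expand a wildcarded process("PATH") argument by filesystem globbing (sketch)."""
    if any(ch in pattern for ch in "*?["):
        # Wildcards present: glob the filesystem; $PATH plays no part here.
        return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
    return [pattern]    # no wildcards: resolved with the shell-style lookup instead
```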
If systemtap was invoked with the -c or -x options, then process probes are restricted to the process
hierarchy associated with the target process.
DEBUGINFOD
These probes take the form
debuginfod.process("PATH").**
They are very similar to the process("PATH").** probe family. The key difference is that the process
probes search for PATH in the host filesystem, while debuginfod probes search the current federation of
debuginfod servers, using the currently defined $DEBUGINFOD_URLS (see debuginfod(8)).
To probe the contents of one or more ELF/archive files and/or directories containing them, the commands
below start a debuginfod server that scans and processes the ELF files within and prepares them for
systemtap, then point stap at that server.
$ debuginfod [options] [-F -R -Z etc.] /path1 /path2
$ env DEBUGINFOD_URLS=http://localhost:8002/ stap ...
JAVA
Support for probing Java methods is available using Byteman as a backend. Byteman is an instrumentation
tool from the JBoss project which systemtap can use to monitor invocations for a specific method or line
in a Java program.
Systemtap does so by generating a Byteman script listing the probes to instrument and then invoking the
Byteman bminstall utility.
This Java instrumentation support is currently a prototype feature with major limitations. Moreover, Java
probing currently does not work across users; the stap script must run (with appropriate permissions)
under the same user as the Java process being probed. (Thus a stap script running as root currently
cannot probe Java methods in a non-root user's Java process.)
The first probe type refers to Java processes by the name of the Java process:
java("PNAME").class("CLASSNAME").method("PATTERN")
java("PNAME").class("CLASSNAME").method("PATTERN").return
The PNAME argument must name a pre-existing JVM process that is identifiable via a jps listing.
The PATTERN parameter specifies the signature of the Java method to probe. The signature must consist of
the exact name of the method, followed by a bracketed list of the types of the arguments, for instance
"myMethod(int,double,Foo)". Wildcards are not supported.
The probe can be set to trigger at a specific line within the method by appending a colon and a line
number, just as in other types of probes: "myMethod(int,double,Foo):245".
The CLASSNAME parameter identifies the Java class the method belongs to, either with or without the
package qualification. By default, the probe only triggers on descendants of the class that do not override
the method definition of the original class. However, CLASSNAME can take an optional caret prefix, as in
^org.my.MyClass, which specifies that the probe should also trigger on all descendants of MyClass that
override the original method. For instance, every method with signature foo(int) in program org.my.MyApp
can be probed at once using
java("org.my.MyApp").class("^java.lang.Object").method("foo(int)")
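The PATTERN and CLASSNAME rules above can be summarized by two small Python parsers. These are hypothetical helpers that only model the syntax described on this page, not systemtap's own parsing. They do not handle generic types such as Map<K,V>, whose commas would split incorrectly.

```python
import re

# NAME(TYPE,TYPE,...) optionally followed by :LINE, per the PATTERN rules above.
_METHOD_SIG = re.compile(r"(?P<name>[A-Za-z_$][\w$]*)\((?P<args>[^()]*)\)(?::(?P<line>\d+))?")


def parse_java_method(pattern):
    """Split a Java method PATTERN into (name, argument types, line or None)."""
    if "*" in pattern or "?" in pattern:
        raise ValueError("wildcards are not supported in Java method patterns")
    m = _METHOD_SIG.fullmatch(pattern.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a method signature: {pattern!r}")
    args = m["args"].split(",") if m["args"] else []
    line = int(m["line"]) if m["line"] else None
    return m["name"], args, line


def parse_java_class(classname):
    """Return (class name, whether overriding descendants also match)."""
    if classname.startswith("^"):
        return classname[1:], True
    return classname, False
```

For instance, "myMethod(int,double,Foo):245" splits into the name myMethod, the argument types int, double and Foo, and line 245.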
The second probe type works analogously, but refers to Java processes by PID:
java(PID).class("CLASSNAME").method("PATTERN")
java(PID).class("CLASSNAME").method("PATTERN").return
(PIDs for an already running process can be obtained using the jps(1) utility.)
Context variables defined within java probes include $arg1 through $arg10 (for up to the first 10
arguments of a method), represented as character-pointers for the toString() form of each actual argument.
The arg1 through arg10 script variables provide access to these as ordinary strings, fetched via
user_string_warn().
Prior to systemtap version 3.1, $arg1 through $arg10 could contain either integers or character pointers,
depending on the types of the objects being passed to each particular java method. This previous
behaviour may be invoked with the stap --compatible=3.0 flag.
PROCFS
These probe points allow procfs "files" in /proc/systemtap/MODNAME to be created, read and written, with
permissions that may be modified by specifying an appropriate umask value. (MODNAME is the name of the
systemtap module.) Default permissions are 0400 for read probes and 0200 for write probes. If both a read
and a write probe are used on the same file, a default permission of 0600 is used. Using
procfs.umask(0040).read results in a 0404 permission set for the file. The proc filesystem is a
pseudo-filesystem which is used as an interface to kernel data structures. There are several probe point
variants supported by the translator:
procfs("PATH").read
procfs("PATH").umask(UMASK).read
procfs("PATH").read.maxsize(MAXSIZE)
procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE)
procfs("PATH").write
procfs("PATH").umask(UMASK).write
procfs.read
procfs.umask(UMASK).read
procfs.read.maxsize(MAXSIZE)
procfs.umask(UMASK).read.maxsize(MAXSIZE)
procfs.write
procfs.umask(UMASK).write
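The permission values quoted above fit a simple rule: start from 0444 for a read probe, 0222 for a write probe (or their union for both), then clear the umask bits. The Python sketch below models that rule. It is inferred from the numbers on this page rather than taken from systemtap's source. DEFAULT_UMASK = 0066 is an assumption, chosen because it reproduces the 0400/0200/0600 defaults.

```python
READ_BITS = 0o444    # r--r--r--
WRITE_BITS = 0o222   # -w--w--w-
# Assumption: a default umask of 0066 reproduces the documented defaults
# (0400 read-only, 0200 write-only, 0600 read+write).
DEFAULT_UMASK = 0o066


def procfs_mode(read, write, umask=None):
    """Permission bits of a procfs file, given its probe kinds and an optional umask."""
    base = (READ_BITS if read else 0) | (WRITE_BITS if write else 0)
    return base & ~(DEFAULT_UMASK if umask is None else umask)


print(oct(procfs_mode(read=True, write=False, umask=0o040)))
```

The call at the end models procfs.umask(0040).read and yields the 0404 permission set mentioned above.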
Note that there are a few differences when procfs probes are used in the stapbpf runtime. FIFO special
files are used instead of proc filesystem files. These files are created in
/var/tmp/systemtap-USER/MODNAME, where USER is the name of the user. Additionally, users cannot create
both read and write probes on the same file.
PATH is the file name (relative to /proc/systemtap/MODNAME or /var/tmp/systemtap-USER/MODNAME) to be
created. If no PATH is specified (as in the procfs.read and procfs.write variants above), PATH defaults to
"command". The file name "__stdin" is used internally by systemtap for input probes and should not be used
as a PATH for procfs probes; see the input probe section below.
When a user reads /proc/systemtap/MODNAME/PATH (normal runtime) or /var/tmp/systemtap-USER/MODNAME/PATH
(stapbpf runtime), the corresponding procfs read probe is triggered. The string data to be read should
be assigned to a variable named $value, like this:
procfs("PATH").read { $value = "100\n" }
When a user writes into /proc/systemtap/MODNAME/PATH (normal runtime) or
/var/tmp/systemtap-USER/MODNAME/PATH (stapbpf runtime), the corresponding procfs write probe is
triggered. The data the user wrote is available in the string variable named $value, like this:
procfs("PATH").write { printf("user wrote: %s", $value) }
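From user space, a procfs probe is driven by ordinary file I/O on the paths described above. The Python sketch below shows which file to open in each runtime. The helper names and the module name "mymod" are hypothetical. In the stapbpf runtime the file is a FIFO, so opening it follows FIFO semantics: an open may block until the other end is opened.

```python
import getpass
import posixpath


def procfs_file(modname, path="command", runtime="kernel", user=None):
    """Return the file that a procfs("PATH") probe of module MODNAME is attached to."""
    if path == "__stdin":
        raise ValueError('"__stdin" is reserved for input probes')
    if runtime == "stapbpf":
        # stapbpf uses FIFO special files under /var/tmp/systemtap-USER/MODNAME.
        user = user or getpass.getuser()
        return posixpath.join(f"/var/tmp/systemtap-{user}", modname, path)
    return posixpath.join("/proc/systemtap", modname, path)


def read_probe(modname, path="command", **kw):
    """Reading the file triggers the procfs read probe; returns the $value it set."""
    with open(procfs_file(modname, path, **kw)) as f:
        return f.read()


def write_probe(modname, data, path="command", **kw):
    """Writing the file triggers the procfs write probe, which sees the data in $value."""
    with open(procfs_file(modname, path, **kw), "w") as f:
        f.write(data)
```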
MAXSIZE is the size of the procfs read buffer. Specifying MAXSIZE allows larger procfs output. If no
MAXSIZE is specified, the procfs read buffer defaults to STP_PROCFS_BUFSIZE (which defaults to
MAXSTRINGLEN, the maximum length of a string). If larger read buffers are needed for more than one file,
it may be easiest to override the STP_PROCFS_BUFSIZE definition. Here is an example of using MAXSIZE:
procfs.read.maxsize(1024) {
$value = "long string..."
$value .= "another long string..."
$value .= "another long string..."
$value .= "another long string..."
}
INPUT
These probe points make input from stdin available to the script during runtime. The translator
currently supports two variants of this family:
input.char
input.line
input.char is triggered each time a character is read from stdin. The current character is available in
the string variable named char. There is no newline buffering; the next character is read from stdin as
soon as it becomes available.
input.line causes all characters read from stdin to be buffered until a newline is read, at which point
the probe will be triggered. The current line of characters (including the newline) is made available in
a string variable named line. Note that no more than MAXSTRINGLEN characters will be buffered. Any
additional characters will not be included in line.
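The input.line buffering described above can be modelled as a small Python generator. It is a sketch, not the runtime's code. Three points are assumptions: the newline counts toward the limit like any other character, a truncated line therefore loses its newline, and trailing characters with no final newline never trigger the probe. The limit is passed in explicitly because the actual MAXSTRINGLEN depends on how the module was built.

```python
def input_lines(chars, maxstringlen):
    """Yield the `line` values an input.line probe would see (sketch).

    Characters are buffered until a newline arrives; at most MAXSTRINGLEN of
    them are kept, and any further characters on that line are dropped.
    """
    buf = []
    for ch in chars:
        if len(buf) < maxstringlen:
            buf.append(ch)
        if ch == "\n":
            yield "".join(buf)
            buf = []
```

With a limit of 3, the input "ab\ncdef\n" produces two triggers: "ab\n" and the truncated "cde".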
Input probes are aliases for procfs("__stdin").write. Systemtap reconfigures stdin if the presence of
this procfs probe is detected, therefore "__stdin" should not be used as a path argument for procfs
probes. Additionally, input probes will not work with the -F and --remote options.
NETFILTER HOOKS
These probe points allow observation of network packets using the netfilter mechanism. A netfilter probe
in systemtap corresponds to a netfilter hook function in the original netfilter probes API. It is
probably more convenient to use tapset::netfilter(3stap), which wraps the primitive netfilter hooks and
does the work of extracting useful information from the context variables.
There are several probe point variants supported by the translator:
netfilter.hook("HOOKNAME").pf("PROTOCOL_F")
netfilter.pf("PROTOCOL_F").hook("HOOKNAME")
netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY")
netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY")
PROTOCOL_F is the protocol family to listen for, currently one of NFPROTO_IPV4, NFPROTO_IPV6,
NFPROTO_ARP, or NFPROTO_BRIDGE.
HOOKNAME is the point, or 'hook', in the protocol stack at which to intercept the packet. The available
hook names for each protocol family are taken from the kernel header files <linux/netfilter_ipv4.h>,
<linux/netfilter_ipv6.h>, <linux/netfilter_arp.h> and <linux/netfilter_bridge.h>. For instance, allowable
hook names for NFPROTO_IPV4 are NF_INET_PRE_ROUTING, NF_INET_LOCAL_IN, NF_INET_FORWARD,
NF_INET_LOCAL_OUT, and NF_INET_POST_ROUTING.
PRIORITY is an integer priority giving the order in which the probe point should be triggered relative to
any other netfilter hook functions which trigger on the same packet. Hook functions execute on each
packet in order from smallest priority number to largest priority number. If no PRIORITY is specified (as
in the first two probe point variants above), PRIORITY defaults to "0".
There are a number of predefined priority names of the form NF_IP_PRI_* and NF_IP6_PRI_* which are
defined in the kernel header files <linux/netfilter_ipv4.h> and <linux/netfilter_ipv6.h> respectively. The
script is permitted to use these instead of specifying an integer priority. (The probe points for
NFPROTO_ARP and NFPROTO_BRIDGE currently do not expose any named hook priorities to the script writer.)
Thus, allowable ways to specify the priority include:
priority("255")
priority("NF_IP_PRI_SELINUX_LAST")
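The ordering rule can be pictured with a short Python sketch that turns priority arguments into integers and sorts hook functions from smallest to largest priority. The NAMED_PRIORITIES table is a small illustrative subset of the NF_IP_PRI_* values found in <linux/netfilter_ipv4.h>; check the header of the kernel you are using. The handling of equal priorities (listed order kept) is an assumption of this sketch.

```python
# Illustrative subset of NF_IP_PRI_* values from <linux/netfilter_ipv4.h>;
# verify against the header of the kernel you are using.
NAMED_PRIORITIES = {
    "NF_IP_PRI_FILTER": 0,
    "NF_IP_PRI_SELINUX_LAST": 225,
}


def priority_value(spec=None):
    """Turn a priority("...") argument into an integer; no argument means 0."""
    if spec is None:
        return 0
    if spec in NAMED_PRIORITIES:
        return NAMED_PRIORITIES[spec]
    return int(spec)


def hook_order(hooks):
    """Order (name, priority spec) pairs from smallest to largest priority.

    sorted() is stable, so equal priorities keep their listed order; that tie
    rule is an assumption of this sketch, not something this page specifies.
    """
    return [name for name, spec in sorted(hooks, key=lambda h: priority_value(h[1]))]
```

For example, probes with priority("255"), no priority, priority("NF_IP_PRI_SELINUX_LAST") and priority("-10") would run in the order -10, 0, 225, 255.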
A script using guru mode is permitted to specify any identifier or number as the parameter for hook, pf,
and priority. This feature should be used with caution, as the parameter is inserted verbatim into the C
code generated by systemtap.
The netfilter probe points define the following context variables:
$hooknum
The hook number.
$skb The address of the sk_buff struct representing the packet. See <linux/skbuff.h> for details on how
to use this struct, or alternatively use the tapset tapset::netfilter(3stap) for easy access to
key information.
$in The address of the net_device struct representing the network device on which the packet was
received (if any). May be 0 if the device is unknown or undefined at that stage in the protocol
stack.
$out The address of the net_device struct representing the network device on which the packet will be
sent (if any). May be 0 if the device is unknown or undefined at that stage in the protocol stack.
$verdict
(Guru mode only.) Assigning one of the verdict values defined in <linux/netfilter.h> to this
variable alters the further progress of the packet through the protocol stack. For instance, the
following guru mode script forces all IPv6 packets reaching the pre-routing hook to be dropped:
probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") {
$verdict = 0 /* nf_drop */
}
For convenience, unlike the primitive probe points discussed here, the probes defined in
tapset::netfilter(3stap) export the lowercase names of the verdict constants (e.g. NF_DROP becomes
nf_drop) as local variables.
KERNEL TRACEPOINTS
This family of probe points hooks up to static probing tracepoints inserted into the kernel or modules.
As with markers, these tracepoints are special macro calls inserted by kernel developers to make probing
faster and more reliable than with DWARF-based probes, and DWARF debugging information is not required to
probe tracepoints. Tracepoints have an extra advantage of more strongly-typed parameters than markers.
Tracepoint probes look like: kernel.trace("name"). The tracepoint name string, which may contain the
usual wildcard characters, is matched against the names defined by the kernel developers in the
tracepoint header files. To restrict the search to specific subsystems (e.g. sched, ext3, etc.), the
following syntax can be used: kernel.trace("system:name"). The tracepoint system string may also contain
the usual wildcard characters.
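The name and "system:name" matching can be sketched in Python with the standard fnmatch module. Here fnmatch's glob rules stand in for systemtap's wildcard matching; they also accept [...] classes, which this page does not mention. The (system, name) pairs in the usage line are hypothetical sample data.

```python
from fnmatch import fnmatchcase


def match_tracepoints(spec, available):
    """Select "system:name" tracepoints matching a kernel.trace() argument.

    AVAILABLE is a list of (system, name) pairs. A spec without ":" matches
    tracepoints of any subsystem.
    """
    system_pat, sep, name_pat = spec.partition(":")
    if not sep:
        system_pat, name_pat = "*", spec
    return sorted(
        f"{system}:{name}"
        for system, name in available
        if fnmatchcase(system, system_pat) and fnmatchcase(name, name_pat)
    )


sample = [("sched", "sched_switch"), ("sched", "sched_wakeup"), ("ext4", "ext4_sync_file_enter")]
print(match_tracepoints("sched:sched_switch", sample))
```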
The handler associated with a tracepoint-based probe may read the optional parameters specified at the
macro call site. These are named according to the declaration by the tracepoint author. For example,
the tracepoint probe kernel.trace("sched:sched_switch") provides the parameters $prev and $next. If the
parameter is a complex type, as in a struct pointer, then a script can access fields with the same syntax
as DWARF $target variables. Also, tracepoint parameters cannot be modified, but in guru-mode a script
may modify fields of parameters.
The subsystem and name of the tracepoint are available in $$system and $$name and a string of name=value
pairs for all parameters of the tracepoint is available in $$vars or $$parms.
KERNEL MARKERS (OBSOLETE)
This family of probe points hooks up to an older style of static probing markers inserted into older
kernels or modules. These markers are special STAP_MARK macro calls inserted by kernel developers to make
probing faster and more reliable than with DWARF-based probes. Further, DWARF debugging information is
not required to probe markers.
Marker probe points begin with kernel. The next part names the marker itself: mark("name"). The marker
name string, which may contain the usual wildcard characters, is matched against the names given to the
marker macros when the kernel and/or module was compiled. Optionally, you can specify
format("format"). Specifying the marker format string allows differentiation between two markers with the same name
but different marker format strings.
The handler associated with a marker-based probe may read the optional parameters specified at the macro
call site. These are named $arg1 through $argNN, where NN is the number of parameters supplied by the
macro. Number and string parameters are passed in a type-safe manner.
The marker format string associated with a marker is available in $format, and the marker name string
is available in $name.
HARDWARE BREAKPOINTS
This family of probes is used to set hardware watchpoints for a given (global) kernel symbol. The probes
take three components as inputs:
1. The virtual address or name of the kernel symbol to be traced is supplied as the argument to this class
of probes. (Only data segment variables are supported; local variables of a function cannot be probed.)
2. The nature of the access to be probed: a. a .write probe is triggered when a write happens at the
specified address/symbol name; b. a .rw probe is triggered when either a read or a write happens.
3. .length (optional): users may specify the address interval to be probed using the "length" construct.
The user-specified length is approximated to the closest address length that the architecture can
support. If the specified length exceeds the limits imposed by the architecture, an error message is
flagged and probe registration fails. Wherever "length" is not specified, the translator requests a
hardware breakpoint probe of length 1. Note that the "length" construct is not valid with symbol names.
The following constructs are supported:
probe kernel.data(ADDRESS).write
probe kernel.data(ADDRESS).rw
probe kernel.data(ADDRESS).length(LEN).write
probe kernel.data(ADDRESS).length(LEN).rw
probe kernel.data("SYMBOL_NAME").write
probe kernel.data("SYMBOL_NAME").rw
This set of probes makes use of the processor's debug registers, which are a scarce resource (4 on x86,
1 on powerpc). The translator flags a warning if a user requests more hardware breakpoint probes than the
architecture supports. For example, a pass-2 warning is issued when a script requests 5 hardware
breakpoint probes on an x86 system, since x86 supports a maximum of 4 breakpoints. Users are cautioned
to set probes judiciously.
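The rules above (access types, no length with symbol names, and the per-architecture debug-register budget) can be collected into a Python checking sketch. It is hypothetical: it mirrors only what this page states and does not reproduce the translator's messages. Probe requests are represented as (target, access, length) tuples purely for illustration.

```python
# Debug-register counts given on this page.
DEBUG_REGISTERS = {"x86": 4, "powerpc": 1}


def check_hwbkpt_probes(probes, arch):
    """Check kernel.data() probe requests against the rules on this page.

    PROBES is a list of (target, access, length) tuples: TARGET is an int
    address or a str symbol name, ACCESS is "write" or "rw", and LENGTH is
    an int or None.
    """
    problems = []
    for target, access, length in probes:
        if access not in ("write", "rw"):
            problems.append(f"{target!r}: unknown access type {access!r}")
        if isinstance(target, str) and length is not None:
            problems.append(f"{target!r}: length() is not valid with symbol names")
    limit = DEBUG_REGISTERS[arch]
    if len(probes) > limit:
        problems.append(f"{len(probes)} hardware breakpoints requested; {arch} supports at most {limit}")
    return problems
```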
PERF
This family of probe points interfaces to the kernel "perf event" infrastructure for controlling hardware
performance counters. The events being attached to are described by the "type" and "config" fields of
the perf_event_attr structure, and are sampled at an interval governed by the "sample_period" and
"sample_freq" fields.
These fields are made available to systemtap scripts using the following syntax:
probe perf.type(NN).config(MM).sample(XX)
probe perf.type(NN).config(MM).hz(XX)
probe perf.type(NN).config(MM)
probe perf.type(NN).config(MM).process("PROC")
probe perf.type(NN).config(MM).counter("COUNTER")
probe perf.type(NN).config(MM).process("PROC").counter("NAME")
The systemtap probe handler is called once per XX increments of the underlying performance counter when
using the .sample field, or at a frequency of XX hertz when using the .hz field. When neither is
specified, the default behavior is to sample at a count of 1000000. The range of valid type/config values
is described by the perf_event_open(2) system call and/or the linux/perf_event.h file. Invalid
combinations or exhausted hardware counter resources result in errors during systemtap script startup.
Systemtap does not sanity-check the values: it merely passes them through to the kernel for error- and
safety-checking. By default the perf event probe is systemwide unless .process is specified, which binds
the probe to a specific task. If the process name is omitted, it is inferred from the stap -c argument. A
perf event can be read on demand using .counter. The body of the perf probe handler is not invoked for a
.counter probe; instead, the counter is read in a user space probe via:
process("PROC").statement("func@file") {stat <<< @perf("NAME")}
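To make the mapping concrete, the Python sketch below turns a perf probe point into the perf_event_attr fields it governs. In linux/perf_event.h, sample_period and sample_freq share a union, and the freq bit selects the frequency meaning. Setting freq for .hz is therefore presumably what happens, but the function is a hypothetical illustration rather than systemtap's code, and it ignores .process and .counter suffixes. As an example of type/config values, perf.type(0).config(0) selects PERF_TYPE_HARDWARE with PERF_COUNT_HW_CPU_CYCLES.

```python
import re

DEFAULT_SAMPLE_PERIOD = 1000000  # default sampling count stated on this page

_PERF = re.compile(r"perf\.type\((\d+)\)\.config\((\d+)\)(?:\.(sample|hz)\((\d+)\))?")


def perf_attr(probe_point):
    """Map perf.type(NN).config(MM)[.sample(XX)|.hz(XX)] onto perf_event_attr fields.

    Any further suffixes (.process, .counter) are ignored by this sketch.
    """
    m = _PERF.match(probe_point)
    if m is None:
        raise ValueError(f"not a perf probe point: {probe_point!r}")
    attr = {"type": int(m[1]), "config": int(m[2])}
    if m[3] == "hz":
        attr["freq"] = 1                      # sample_freq is a frequency in Hz
        attr["sample_freq"] = int(m[4])
    else:
        attr["sample_period"] = int(m[4]) if m[3] else DEFAULT_SAMPLE_PERIOD
    return attr
```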
PYTHON
Support for probing python 2 and python 3 functions is available with the help of an extra python
support module. Note that the debuginfo for the version of python being probed is required. To run a
python script with the extra python support module, add the '-m HelperSDT' option to your python
command, like this:
stap foo.stp -c "python -m HelperSDT foo.py"
Python probes look like the following:
python2.module("MPATTERN").function("PATTERN")
python2.module("MPATTERN").function("PATTERN").call
python2.module("MPATTERN").function("PATTERN").return
python3.module("MPATTERN").function("PATTERN")
python3.module("MPATTERN").function("PATTERN").call
python3.module("MPATTERN").function("PATTERN").return
The list above includes multiple variants and modifiers which provide additional functionality or
filters. They are:
.function
Places a probe at the beginning of the named function by default, unless modified by
PATTERN. Parameters are available as context variables.
.call Places a probe at the beginning of the named function. Parameters are available as context
variables.
.return
Places a probe at the moment before the return from the named function. Parameters and
local/global python variables are available as context variables.
PATTERN stands for a string literal that aims to identify a point in the python program. It is made up
of three parts:
• The first part is the name of a function (e.g. "foo") or class method (e.g. "bar.baz"). This part may
use the "*" and "?" wildcarding operators to match multiple names.
• The second part is optional and begins with the "@" character. It is followed by the path to the
source file containing the function, which may include a wildcard pattern. The python path is
searched for a matching filename.
• Finally, the third part is optional if the file name part was given, and identifies the line number
in the source file preceded by a ":" or a "+". The line number is assumed to be an absolute line
number if preceded by a ":", or relative to the declaration line of the function if preceded by a
"+". All the lines in the function can be matched with ":*". A range of lines x through y can be
matched with ":x-y". Ranges and specific lines can be mixed using commas, e.g. ":x,y-z".
In the above list of probe points, MPATTERN stands for a python module or script name that names the
python module of interest. This part may use the "*" and "?" wildcarding operators to match multiple
names. The python path is searched for a matching filename.
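The three-part PATTERN syntax described above can be illustrated with a Python parser sketch. The helper names are hypothetical and the code models only the grammar given on this page. It does not search the python path. Line ranges are returned as inclusive (first, last) pairs, and "relative" records whether the line spec was introduced by "+".

```python
import re

# FILE, then optionally ":" (absolute) or "+" (relative) and a line spec.
_FILE_AND_LINES = re.compile(r"(?P<file>.*?)(?:(?P<sep>[:+])(?P<lines>[0-9*,\-]+))?")


def parse_line_spec(spec):
    """'*' -> '*'; '10,20-25' -> [(10, 10), (20, 25)]."""
    if spec == "*":
        return "*"
    ranges = []
    for item in spec.split(","):
        first, _, last = item.partition("-")
        ranges.append((int(first), int(last) if last else int(first)))
    return ranges


def parse_python_pattern(pattern):
    """Split a python probe PATTERN into its function, file and line parts."""
    function, at, rest = pattern.partition("@")
    result = {"function": function, "file": None, "lines": None, "relative": False}
    if not at:
        return result                     # function part only
    m = _FILE_AND_LINES.fullmatch(rest)
    result["file"] = m["file"]
    if m["sep"]:
        result["relative"] = m["sep"] == "+"
        result["lines"] = parse_line_spec(m["lines"])
    return result
```

For example, "bar.baz@foo.py:10,20-25" names the class method bar.baz in foo.py, at absolute line 10 and lines 20 through 25.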
EXAMPLES
Here are some example probe points, defining the associated events.
begin, end, end
refers to the startup and normal shutdown of the session. In this case, the handler would run
once during startup and twice during shutdown.
timer.jiffies(1000).randomize(200)
refers to a periodic interrupt, every 1000 +/- 200 jiffies.
kernel.function("*init*"), kernel.function("*exit*")
refers to all kernel functions with "init" or "exit" in the name.
kernel.function("*@kernel/time.c:240")
refers to any functions within the "kernel/time.c" file that span line 240. Note that this is not
a probe at the statement at that line number. Use the kernel.statement probe instead.
kernel.trace("sched_*")
refers to all scheduler-related (strictly, all "sched_"-prefixed) tracepoints in the kernel.
kernel.mark("getuid")
refers to an obsolete STAP_MARK(getuid, ...) macro call in the kernel.
module("usb*").function("*sync*").return
refers to the moment of return from all functions with "sync" in the name in any of the USB
drivers.
kernel.statement(0xc0044852)
refers to the first byte of the statement whose compiled instructions include the given address in
the kernel.
kernel.statement("*@kernel/time.c:296")
refers to the statement of line 296 within "kernel/time.c".
kernel.statement("bio_init@fs/bio.c+3")
refers to the statement at line bio_init+3 within "fs/bio.c".
kernel.data("pid_max").write
refers to a hardware breakpoint of type "write" set on pid_max
syscall.*.return
refers to the group of probe aliases with any name in the second position
SEE ALSO
stap(1),
probe::*(3stap),
tapset::*(3stap)
STAPPROBES(3stap)