Ubuntu Manpage: libunwind-dynamic -- libunwind-support for runtime-generated code

Provided by: libunwind-dev_1.2.1-8ubuntu0.1_amd64

NAME

       libunwind-dynamic -- libunwind-support for runtime-generated code

INTRODUCTION

For libunwind to do its job, it needs to be able to reconstruct the frame state of each
frame in a call-chain. The frame state describes the subset of the machine-state that
consists of the frame registers (typically the instruction-pointer and the stack-pointer)
and all callee-saved registers (preserved registers). The frame state describes each
register either by providing its current value (for frame registers) or by providing the
location at which the current value is stored (callee-saved registers).

For statically generated code, the compiler normally takes care of emitting unwind-info
which provides the minimum amount of information needed to reconstruct the frame-state for
each instruction in a procedure. For dynamically generated code, the runtime code
generator must use the dynamic unwind-info interface provided by libunwind to supply the
equivalent information. This manual page describes the format of this information in
detail.

For the purpose of this discussion, a procedure is defined to be an arbitrary piece of
contiguous code. Normally, each procedure directly corresponds to a function in the
source-language but this is not strictly required. For example, a runtime code-generator
could translate a given function into two separate (discontiguous) procedures: one for
frequently-executed (hot) code and one for rarely-executed (cold) code. Similarly, simple
source-language functions (usually leaf functions) may get translated into code for which
the default unwind-conventions apply and for such code, it is not strictly necessary to
register dynamic unwind-info.

A procedure logically consists of a sequence of regions. Regions are nested in the sense
that the frame state at the end of one region is, by default, assumed to be the frame
state for the next region. Each region is thought of as being divided into a prologue, a
body, and an epilogue. Each of them can be empty. If non-empty, the prologue sets up the
frame state for the body. For example, the prologue may need to allocate some space on the
stack and save certain callee-saved registers. The body performs the actual work of the
procedure but does not change the frame state in any way. If non-empty, the epilogue
restores the previous frame state and as such it undoes or cancels the effect of the
prologue. In fact, a single epilogue may undo the effect of the prologues of several
(nested) regions.

We should point out that even though the prologue, body, and epilogue are logically
separate entities, optimizing code-generators will generally interleave instructions from
all three entities. For this reason, the dynamic unwind-info interface of libunwind makes
no distinction whatsoever between prologue and body. Similarly, the exact set of
instructions that make up an epilogue is also irrelevant. The only point in the epilogue
that needs to be described explicitly by the dynamic unwind-info is the point at which the
stack-pointer gets restored. The reason this point needs to be described is that once the
stack-pointer is restored, all values saved in the deallocated portion of the stack frame
become invalid and hence libunwind needs to know about it. The portion of the frame state
not saved on the stack is assumed to remain valid through the end of the region. For this
reason, there is usually no need to describe instructions which restore the contents of
callee-saved registers.

Within a region, each instruction that affects the frame state in some fashion needs to be
described with an operation descriptor. For this purpose, each instruction in the region
is assigned a unique index. Exactly how this index is derived depends on the
architecture. For example, on RISC and EPIC-style architectures, instructions have a fixed
size so it's possible to simply number the instructions. In contrast, most CISC architectures use
variable-length instruction encodings, so it is usually necessary to use a byte-offset as
the index. Given the instruction index, the operation descriptor specifies the effect of
the instruction in an abstract manner. For example, it might express that the instruction
stores callee-saved register r1 at offset 16 in the stack frame.
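
The two indexing schemes described above can be sketched as follows. This is only an illustration: the helper names insn_index_fixed() and insn_index_bytes() are hypothetical and not part of libunwind, and the instruction size is passed in as a parameter rather than taken from any particular architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the instruction at ip when every
   instruction in the region has the same size (RISC/EPIC style). */
uint32_t insn_index_fixed(uintptr_t region_start, uintptr_t ip, size_t insn_size)
{
    assert(insn_size > 0 && ip >= region_start);
    /* ip must sit on an instruction boundary. */
    assert((ip - region_start) % insn_size == 0);
    return (uint32_t)((ip - region_start) / insn_size);
}

/* Hypothetical helper: with variable-length encodings (CISC style),
   the byte offset from the region start serves as the index. */
uint32_t insn_index_bytes(uintptr_t region_start, uintptr_t ip)
{
    assert(ip >= region_start);
    return (uint32_t)(ip - region_start);
}
```

With 4-byte instructions, an instruction 12 bytes into the region has index 3 under the first scheme and index 12 under the second.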

PROCEDURES

A runtime code-generator registers the dynamic unwind-info of a procedure by setting up a
structure of type unw_dyn_info_t and calling _U_dyn_register(), passing the address of the
structure as the sole argument. The members of the unw_dyn_info_t structure are described
below:

void *next
Private to libunwind. Must not be used by the application.

void *prev
Private to libunwind. Must not be used by the application.

unw_word_t start_ip
The start-address of the instructions of the procedure (remember: procedures are
defined to be contiguous pieces of code, so a single code-range is sufficient).

unw_word_t end_ip
The end-address of the instructions of the procedure (non-inclusive, that is,
end_ip-start_ip is the size of the procedure in bytes).

unw_word_t gp
The global-pointer value in use for this procedure. The exact meaning of the
global-pointer is architecture-specific and on some architectures, it is not used at
all.

int32_t format
The format of the unwind-info. This member can be one of UNW_INFO_FORMAT_DYNAMIC,
UNW_INFO_FORMAT_TABLE, or UNW_INFO_FORMAT_REMOTE_TABLE.

union u
This union contains one sub-member structure for every possible unwind-info
format:

unw_dyn_proc_info_t pi
This member is used for format UNW_INFO_FORMAT_DYNAMIC.

unw_dyn_table_info_t ti
This member is used for format UNW_INFO_FORMAT_TABLE.

unw_dyn_remote_table_info_t rti
This member is used for format UNW_INFO_FORMAT_REMOTE_TABLE.

The format of these sub-members is described in detail below.
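
The half-open code range formed by start_ip and end_ip can be sketched as below. The types here are stand-ins invented for illustration (demo_word_t in place of unw_word_t, struct demo_code_range in place of the corresponding members of unw_dyn_info_t); real code would include <libunwind.h> and use the real types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for unw_word_t (assumption: a 64-bit target). */
typedef uint64_t demo_word_t;

/* Hypothetical subset of unw_dyn_info_t: only the code range. */
struct demo_code_range {
    demo_word_t start_ip;  /* address of the first byte of the procedure */
    demo_word_t end_ip;    /* one past the last byte (non-inclusive) */
};

/* Size of the procedure in bytes: end_ip - start_ip. */
demo_word_t demo_proc_size(const struct demo_code_range *r)
{
    assert(r->end_ip >= r->start_ip);
    return r->end_ip - r->start_ip;
}

/* True if ip lies inside the half-open range [start_ip, end_ip). */
bool demo_covers_ip(const struct demo_code_range *r, demo_word_t ip)
{
    return ip >= r->start_ip && ip < r->end_ip;
}
```

Because end_ip is non-inclusive, the address end_ip itself is not covered by the procedure.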

PROC-INFO FORMAT
This is the preferred dynamic unwind-info format and it is generally the one used by
full-blown runtime code-generators. In this format, the details of a procedure are
described by a structure of type unw_dyn_proc_info_t. This structure contains the
following members:

unw_word_t name_ptr
The address of a (human-readable) name of the procedure or 0 if no such name is
available. If non-zero, the string stored at this address must be ASCII NUL
terminated. For source languages that use name-mangling (such as C++ or Java) the
string stored at this address should be the demangled version of the name.

unw_word_t handler
The address of the personality-routine for this procedure. Personality-routines
are used in conjunction with exception handling. See the C++ ABI draft
(http://www.codesourcery.com/cxx-abi/) for an overview and a description of the
personality routine. If the procedure has no personality routine, handler must be
set to 0.

uint32_t flags
A bitmask of flags. At the moment, no flags have been defined and this member must
be set to 0.

unw_dyn_region_info_t *regions
A NULL-terminated linked list of region-descriptors. See section ``Region
descriptors'' below for more details.

TABLE-INFO FORMAT
This format is generally used when the dynamically generated code was derived from static
code and the unwind-info for the dynamic and the static versions is identical. For
example, this format can be useful when loading statically-generated code into an
address-space in a non-standard fashion (i.e., through some means other than dlopen()).
In this format, the details of a group of procedures are described by a structure of type
unw_dyn_table_info_t. This structure contains the following members:

unw_word_t segbase
The segment-base value that needs to be added to the segment-relative values
stored in the unwind-info. The exact meaning of this value is
architecture-specific.

unw_word_t table_len
The length of the unwind-info (table_data) counted in units of words (unw_word_t).

unw_word_t *table_data
A pointer to the actual data encoding the unwind-info. The exact format is
architecture-specific (see architecture-specific sections below).

REMOTE TABLE-INFO FORMAT
The remote table-info format has the same basic purpose as the regular table-info format.
The only difference is that when libunwind uses the unwind-info, it will keep the table
data in the target address-space (which may be remote). Consequently, the type of the
table_data member is unw_word_t rather than a pointer. This implies that libunwind will
have to access the table-data via the address-space's access_mem() call-back, rather than
through a direct memory reference.

From the point of view of a runtime-code generator, the remote table-info format offers no
advantage and it is expected that such generators will describe their procedures either
with the proc-info format or the normal table-info format. The main reason that the remote
table-info format exists is to enable the address-space-specific find_proc_info() callback
(see unw_create_addr_space(3)) to return unwind tables whose data remains in remote
memory. This can speed up unwinding (e.g., for a debugger) because it reduces the amount
of data that needs to be loaded from remote memory.

REGION DESCRIPTORS

A region descriptor is a variable length structure that describes how each instruction in
the region affects the frame state. Of course, most instructions in a region usually do not
change the frame state and for those, nothing needs to be recorded in the region
descriptor. A region descriptor is a structure of type unw_dyn_region_info_t and has the
following members:

unw_dyn_region_info_t *next
A pointer to the next region. If this is the last region, next is NULL.

int32_t insn_count
The length of the region in instructions. Each instruction is assumed to have a
fixed size (see architecture-specific sections for details). The value of
insn_count may be negative in the last region of a procedure (i.e., it may be
negative only if next is NULL). A negative value indicates that the region covers
the last N instructions of the procedure, where N is the absolute value of
insn_count.

uint32_t op_count
The (allocated) length of the op array.

unw_dyn_op_t op[]
An array of dynamic unwind directives. See Section ``Dynamic unwind directives''
for a description of the directives.

A region descriptor with an insn_count of zero is an empty region and such regions are
perfectly legal. In fact, empty regions can be useful to establish a particular frame
state before the start of another region.

A single region list can be shared across multiple procedures provided those procedures
share a common prologue and epilogue (their bodies may differ, of course). Normally, such
procedures consist of a canned prologue, the body, and a canned epilogue. This could be
described by two regions: one covering the prologue and one covering the epilogue. Since
the body length is variable, the latter region would need to specify a negative value in
insn_count such that libunwind knows that the region covers the end of the procedure (up
to the address specified by end_ip).
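
The way a negative insn_count anchors the last region to the end of the procedure can be sketched as follows. This is a minimal illustration, not libunwind's implementation: struct demo_region_hdr is a hypothetical stand-in carrying only the next and insn_count members, and the procedure length is assumed to be known in instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the next and insn_count members of
   unw_dyn_region_info_t; the op array is omitted. */
struct demo_region_hdr {
    struct demo_region_hdr *next;
    int32_t insn_count;
};

/* Returns the procedure-relative index of the first instruction covered
   by target, or -1 if target is not in the list. proc_len is the length
   of the procedure in instructions. */
int64_t demo_region_start(const struct demo_region_hdr *head,
                          const struct demo_region_hdr *target,
                          int64_t proc_len)
{
    int64_t pos = 0;
    for (const struct demo_region_hdr *r = head; r != NULL; r = r->next) {
        if (r->insn_count < 0) {
            /* Only the last region may count back from the end. */
            assert(r->next == NULL);
            assert(-(int64_t)r->insn_count <= proc_len);
            pos = proc_len + r->insn_count;
        }
        if (r == target)
            return pos;
        pos += r->insn_count;
    }
    return -1;
}
```

With a 3-instruction prologue region followed by a region whose insn_count is -2, the same list places the epilogue region at instruction 8 of a 10-instruction procedure and at instruction 18 of a 20-instruction procedure.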

The region descriptor is a variable length structure to make it possible to allocate all
the necessary memory with a single memory-allocation request. To facilitate the allocation
of a region descriptor, libunwind provides a helper routine with the following synopsis:

size_t _U_dyn_region_size(int op_count);

This routine returns the number of bytes needed to hold a region descriptor with space for
op_count unwind directives. Note that the length of the op array does not have to match
exactly with the number of directives in a region. Instead, it is sufficient if the op
array contains at least as many entries as there are directives, since the end of the
directives can always be indicated with the UNW_DYN_STOP directive.
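
The single-allocation idea can be sketched with a C99 flexible array member. The structures below are stand-ins that mirror the members documented on this page; their exact layout, and the names demo_region_size() and demo_region_alloc(), are assumptions made for illustration. Real code would call _U_dyn_region_size() and use unw_dyn_region_info_t instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for unw_dyn_op_t; a tag of 0 plays the role of UNW_DYN_STOP. */
struct demo_op {
    int8_t tag;
    int8_t qp;
    int16_t reg;
    int32_t when;
    uint64_t val;
};

/* Stand-in for unw_dyn_region_info_t. */
struct demo_region {
    struct demo_region *next;
    int32_t insn_count;
    uint32_t op_count;
    struct demo_op op[];  /* flexible array member */
};

/* Bytes needed for a descriptor with room for op_count directives. */
size_t demo_region_size(int op_count)
{
    assert(op_count >= 0);
    return offsetof(struct demo_region, op)
         + (size_t)op_count * sizeof(struct demo_op);
}

/* Allocate a zeroed descriptor with one allocation request.
   Returns NULL if the allocation fails. */
struct demo_region *demo_region_alloc(int32_t insn_count, int op_count)
{
    struct demo_region *r = calloc(1, demo_region_size(op_count));
    if (r != NULL) {
        r->insn_count = insn_count;
        r->op_count = (uint32_t)op_count;
    }
    return r;
}
```

Since the stop tag has the value 0, zero-filling the allocation leaves every unused op slot reading as a stop directive, so the array may be larger than the number of directives actually stored.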

DYNAMIC UNWIND DIRECTIVES

A dynamic unwind directive describes how the frame state changes at a particular point
within a region. The description is in the form of a structure of type unw_dyn_op_t. This
structure has the following members:

int8_t tag
The operation tag. Must be one of the unw_dyn_operation_t values described below.

int8_t qp
The qualifying predicate that controls whether or not this directive is active.
This is useful for predicated architectures such as IA-64 or ARM, where the contents
of another (callee-saved) register determine whether or not an instruction is
executed (takes effect). If the directive is always active, this member should be
set to the manifest constant _U_QP_TRUE (this constant is defined for all
architectures, predicated or not).

int16_t reg
The number of the register affected by the instruction.

int32_t when
The region-relative number of the instruction to which this directive applies. For
example, a value of 0 means that the effect described by this directive has taken
place once the first instruction in the region has executed.

unw_word_t val
The value to be applied by the operation tag. The exact meaning of this value
varies by tag. See Section ``Operation tags'' below.

It is perfectly legitimate to specify multiple dynamic unwind directives with the same
when value, if a particular instruction has a complex effect on the frame state.

Empty regions by definition contain no actual instructions and as such the directives are
not tied to a particular instruction. By convention, the when member should be set to 0,
however.

There is no need for the dynamic unwind directives to appear in order of increasing when
values. If the directives happen to be sorted in that order, it may result in slightly
faster execution, but a runtime code-generator should not go to extra lengths just to
ensure that the directives are sorted.

IMPLEMENTATION NOTE: should libunwind implementations for certain architectures prefer the
list of unwind directives to be sorted, it is recommended that such implementations first
check whether the list happens to be sorted already and, if not, sort the directives
explicitly before the first use. With this approach, the overhead of explicit sorting is
only paid when there is a real benefit and if the runtime code-generator happens to
generate sorted lists naturally, the performance penalty is limited to a simple O(N)
check.
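
The check-then-sort approach from the implementation note can be sketched as below. This is an illustration under stated assumptions, not code from libunwind: struct demo_directive is a hypothetical stand-in that carries only the tag and when members, and qsort() is used as the sorting routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tag and when members of unw_dyn_op_t. */
struct demo_directive {
    int8_t tag;
    int32_t when;
};

/* O(N) scan: true if when values never decrease. */
bool demo_is_sorted(const struct demo_directive *ops, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (ops[i - 1].when > ops[i].when)
            return false;
    return true;
}

/* qsort comparator ordering directives by increasing when. */
int demo_cmp_when(const void *a, const void *b)
{
    int32_t wa = ((const struct demo_directive *)a)->when;
    int32_t wb = ((const struct demo_directive *)b)->when;
    return (wa > wb) - (wa < wb);
}

/* Sorts only if needed; returns true if a sort was performed. */
bool demo_ensure_sorted(struct demo_directive *ops, size_t n)
{
    assert(ops != NULL || n == 0);
    if (demo_is_sorted(ops, n))
        return false;
    qsort(ops, n, sizeof ops[0], demo_cmp_when);
    return true;
}
```

Note that qsort() is not guaranteed to be stable, so directives sharing the same when value may change their relative order; an implementation for which that order matters would need a stable sort.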

OPERATION TAGS
The possible operation tags are defined by enumeration type unw_dyn_operation_t which
defines the following values:

UNW_DYN_STOP
Marks the end of the dynamic unwind directive list. All remaining entries in the
op array of the region-descriptor are ignored. This tag is guaranteed to have a
value of 0.

UNW_DYN_SAVE_REG
Marks an instruction which saves register reg to register val.

UNW_DYN_SPILL_FP_REL
Marks an instruction which spills register reg to a frame-pointer-relative
location. The frame-pointer-relative offset is given by the value stored in member
val. See the architecture-specific sections for a description of the stack frame
layout.

UNW_DYN_SPILL_SP_REL
Marks an instruction which spills register reg to a stack-pointer-relative
location. The stack-pointer-relative offset is given by the value stored in member
val. See the architecture-specific sections for a description of the stack frame
layout.

UNW_DYN_ADD
Marks an instruction which adds the constant value val to register reg. To
subtract a constant value, store the two's-complement of the value in val. The set
of registers that can be specified for this tag is described in the
architecture-specific sections below.
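
The two's-complement encoding used for subtraction can be sketched as follows. The names are hypothetical and demo_word_t is a stand-in for unw_word_t, assumed here to be a 64-bit unsigned type.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for unw_word_t (assumption: 64-bit unsigned). */
typedef uint64_t demo_word_t;

/* Encode a signed adjustment as the unsigned val of an add directive.
   Converting a negative value to an unsigned type yields its
   two's-complement representation. */
demo_word_t demo_encode_add(int64_t delta)
{
    return (demo_word_t)delta;
}

/* Apply the encoded value to a register value. Unsigned addition wraps
   modulo 2^64, so a two's-complement val subtracts. */
demo_word_t demo_apply_add(demo_word_t reg_value, demo_word_t val)
{
    return reg_value + val;
}

/* Worked example: allocating a 32-byte frame lowers sp by 32. */
void demo_add_example(void)
{
    demo_word_t sp = 0x7000;
    sp = demo_apply_add(sp, demo_encode_add(-32));
    assert(sp == 0x6fe0);
}
```

Here a val of 0xffffffffffffffe0 describes an instruction that subtracts 32 from the register.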

UNW_DYN_POP_FRAMES

UNW_DYN_LABEL_STATE

UNW_DYN_COPY_STATE

UNW_DYN_ALIAS

Helper routines for filling in unw_dyn_op_t structures:

_U_dyn_op_save_reg(), _U_dyn_op_spill_fp_rel(), _U_dyn_op_spill_sp_rel(), _U_dyn_op_add(),
_U_dyn_op_pop_frames(), _U_dyn_op_label_state(), _U_dyn_op_copy_state(),
_U_dyn_op_alias(), _U_dyn_op_stop()

IA-64 SPECIFICS

       -  meaning of segbase member in table-info/table-remote-info format

       -  format of table_data in table-info/table-remote-info format

       -  instruction size: each bundle is counted as 3 instructions, regardless of
          template (MLX)

       -  describe stack-frame layout, especially with regards to sp-relative and
          fp-relative addressing

       -  UNW_DYN_ADD can only add to ``sp'' (always a negative value); use POP_FRAMES
          otherwise

AUTHOR

       David Mosberger-Tang
       Email: dmosberger@gmail.com
       WWW: http://www.nongnu.org/libunwind/.