Provided by:
yasm_0.4.0-3_i386 
NAME
yasm_arch - YASM Architectures
SYNOPSIS
yasm -a arch [-m machine] ...
DESCRIPTION
The standard YASM distribution includes a number of loadable modules
for different target architectures. Additional target architectures may
be installed as third-party modules. Each target architecture can
support one or more machine architectures.
The architecture and machine are selected on the yasm(1) command line
with the -a arch and -m machine options, respectively.
X86 ARCHITECTURE
The "x86" architecture supports the IA-32 instruction set and its
derivatives, as well as the AMD64 instruction set. It consists of two
machines: "x86" (for IA-32 and derivatives) and "amd64" (for AMD64 and
derivatives). The default machine for the "x86" architecture is the
"x86" machine.
BITS Setting
The x86 architecture BITS setting tells YASM the processor mode in
which the generated code is intended to execute. x86 processors can
run in three major execution modes: 16-bit, 32-bit, and, on
AMD64-supporting processors, 64-bit. Because parts of the x86
instruction set behave differently depending on the execution mode
(such as the operand-size and address-size override prefixes), YASM
cannot assemble x86 instructions correctly unless the user tells it in
which processor mode the code will execute.
The BITS setting can be changed in several ways. With the
NASM-compatible parser, it can be changed directly with the BITS xx
assembler directive. The default BITS setting is determined by the
object format in use.
BITS 64 Extensions
When an AMD64-supporting processor is executing in 64-bit mode, a
number of additional extensions are available, including extra general
purpose registers, extra SSE2 registers, and RIP-relative addressing.
The additional 64-bit general purpose registers are named r8-r15. There
are also 8-bit (rXb), 16-bit (rXw), and 32-bit (rXd) subregisters that
map to the least significant 8, 16, or 32 bits of the 64-bit register.
The original 8 general purpose registers have also been extended to
64 bits: eax, edx, ecx, ebx, esi, edi, esp, and ebp have new 64-bit
versions called rax, rdx, rcx, rbx, rsi, rdi, rsp, and rbp
respectively. The old 32-bit registers map to the least significant
32 bits of the new 64-bit registers.
New 8-bit registers are also available that map to the 8 least
significant bits of rsi, rdi, rsp, and rbp. These are called sil, dil,
spl, and bpl respectively. Unfortunately, due to the way instructions
are encoded, these new 8-bit registers share their encodings with the
old 8-bit registers ah, dh, ch, and bh. The processor distinguishes the
two sets by the presence of the new REX prefix, which is also used to
specify the other extended registers. It is therefore illegal to use
ah, dh, ch, or bh in an instruction that requires the REX prefix for
other reasons. For instance:
add ah, [r10]
(NASM syntax) is not a legal instruction because the use of r10
requires a REX prefix, making it impossible to use ah.
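The rule above can be sketched as a small check. This is only an illustrative model in Python, not YASM code: the helper names needs_rex and mixes_high_byte_with_rex are hypothetical, and the check looks at register names only (it ignores other reasons an instruction may need a REX prefix, such as a 64-bit operand size).

```python
# Hypothetical helpers for illustration; not part of YASM.
LEGACY_HIGH_BYTE = {"ah", "bh", "ch", "dh"}   # unusable when REX is present
NEW_LOW_BYTE = {"sil", "dil", "spl", "bpl"}   # only reachable with REX


def needs_rex(reg: str) -> bool:
    """Return True if naming this register forces a REX prefix (simplified)."""
    reg = reg.lower()
    if reg in NEW_LOW_BYTE:
        return True
    # r8-r15 and their rXb / rXw / rXd subregisters
    if reg.startswith("r"):
        number = reg[1:].rstrip("bwd")
        if number.isdigit():
            return 8 <= int(number) <= 15
    # xmm8-xmm15
    if reg.startswith("xmm") and reg[3:].isdigit():
        return 8 <= int(reg[3:]) <= 15
    return False


def mixes_high_byte_with_rex(regs) -> bool:
    """Return True if the operand registers combine ah/bh/ch/dh with a
    register that requires REX, which makes the instruction unencodable."""
    regs = [r.lower() for r in regs]
    uses_high_byte = any(r in LEGACY_HIGH_BYTE for r in regs)
    uses_rex_reg = any(needs_rex(r) for r in regs)
    return uses_high_byte and uses_rex_reg
```

For the example above, mixes_high_byte_with_rex(["ah", "r10"]) reports a conflict, while the same check with al in place of ah does not.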
In 64-bit mode, an additional 8 SSE2 registers are also available.
These are named xmm8-xmm15.
By default, most operations in 64-bit mode remain 32-bit; operations
that are 64-bit usually require a REX prefix (one bit in the REX prefix
determines whether an operation is 64-bit or 32-bit). Thus, essentially
all 32-bit instructions have a 64-bit version, and the 64-bit versions
of instructions can use extended registers "for free" (as the REX
prefix is already present). Examples in NASM syntax:
mov eax, 1 ; 32-bit instruction
mov rcx, 1 ; 64-bit instruction
Instructions that modify the stack (push, pop, call, ret, enter, and
leave) are implicitly 64-bit. Their 32-bit counterparts are not
available, but their 16-bit counterparts are. Examples in NASM syntax:
push eax ; illegal instruction
push rbx ; 1-byte instruction
push r11 ; 2-byte instruction with REX prefix
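The byte counts in the push examples can be reproduced with a short sketch. It assumes the standard x86-64 encoding (a REX prefix of the form 0100WRXB, and push of a 64-bit register encoded as opcode 0x50 plus the low three bits of the register number), which comes from the processor manuals rather than from this page. The function names are hypothetical.

```python
def rex_byte(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX prefix byte: binary 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def encode_push_r64(regnum: int) -> bytes:
    """Encode 'push <64-bit register>' for register number 0-15.

    Register numbers follow the hardware order: rax=0, rcx=1, rdx=2,
    rbx=3, rsp=4, rbp=5, rsi=6, rdi=7, r8-r15 = 8-15.
    """
    if not 0 <= regnum <= 15:
        raise ValueError("register number must be in 0..15")
    opcode = 0x50 | (regnum & 7)
    if regnum >= 8:
        # The fourth register bit does not fit in the opcode, so it is
        # carried in REX.B, which costs one extra byte.
        return bytes([rex_byte(b=1), opcode])
    # No REX.W is needed: push is implicitly 64-bit in 64-bit mode.
    return bytes([opcode])
```

With these assumptions, encode_push_r64(3) (push rbx) yields one byte and encode_push_r64(11) (push r11) yields two, matching the comments in the examples.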
Results of 32-bit operations are implicitly zero-extended to the upper
32 bits of the corresponding 64-bit register. 16-bit and 8-bit
operations, on the other hand, do not affect the upper bits of the
register (just as in 32-bit and 16-bit modes). This can be used to
generate smaller code in some instances. Examples in NASM syntax:
mov ecx, 1 ; shorter than mov rcx, 1 (no REX prefix needed)
and edx, 3 ; equivalent to and rdx, 3
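The write semantics described above can be modeled in a few lines. This is a minimal sketch in Python, with a hypothetical helper name write_gpr. The 8-bit case models only the low-byte registers (al, sil, r8b, and so on), not ah/bh/ch/dh.

```python
MASK64 = (1 << 64) - 1


def write_gpr(old: int, value: int, width: int) -> int:
    """Return the new 64-bit register contents after writing `value`
    with the given operand width (8, 16, 32, or 64 bits)."""
    if width == 64:
        return value & MASK64
    if width == 32:
        # 32-bit results are zero-extended: the upper 32 bits are cleared.
        return value & 0xFFFFFFFF
    if width in (8, 16):
        # Narrow writes leave all upper bits of the register untouched.
        mask = (1 << width) - 1
        return (old & ~mask & MASK64) | (value & mask)
    raise ValueError("width must be 8, 16, 32, or 64")
```

For example, starting from 0x1122334455667788, a 32-bit write of 1 leaves the register holding 1, whereas a 16-bit write of 1 leaves 0x1122334455660001.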
For most instructions in 64-bit mode, immediate values remain 32 bits;
their value is sign-extended into the upper 32 bits of the target
register prior to being used. The exception is the mov instruction,
which can take a 64-bit immediate when the destination is a 64-bit
register. Examples in NASM syntax:
add rax, 1 ; legal
add rax, 0xffffffff ; sign-extended
add rax, -1 ; same as above
add rax, 0xffffffffffffffff ; warning (>32 bit)
mov eax, 1 ; 5 byte instruction
mov rax, 1 ; 10 byte instruction
mov rbx, 0x1234567890abcdef ; 10 byte instruction
mov rcx, 0xffffffff ; 10 byte instruction
mov ecx, -1 ; 5 byte instruction equivalent to above
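The immediate handling in these examples can be checked with a small model. The sketch below is in Python and uses hypothetical helper names. The fits_in_32_bits check is only an assumption about what triggers the ">32 bit" warning shown above, not a description of YASM's actual logic.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def sign_extend_32(imm: int) -> int:
    """Sign-extend the low 32 bits of `imm` to an unsigned 64-bit value."""
    imm &= MASK32
    if imm & 0x80000000:
        imm |= 0xFFFFFFFF00000000
    return imm


def fits_in_32_bits(value: int) -> bool:
    """True if `value` can be written as a signed or unsigned 32-bit
    number (assumed condition for assembling without a warning)."""
    return -(1 << 31) <= value <= MASK32


def mov_r32_imm(imm: int) -> int:
    """Register contents after 'mov r32, imm': zero-extended to 64 bits."""
    return imm & MASK32


def mov_r64_imm(imm: int) -> int:
    """Register contents after 'mov r64, imm' with a full 64-bit immediate."""
    return imm & MASK64
```

Under this model, sign_extend_32(0xffffffff) and sign_extend_32(-1) give the same 64-bit value, which is why the two add examples are equivalent. Likewise mov_r32_imm(-1) equals mov_r64_imm(0xffffffff), matching the last two mov examples.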
Just like immediates, displacements, for the most part, remain 32 bits
and are sign-extended prior to use. Again, the exception is one
restricted form of the mov instruction: between the al/ax/eax/rax
register and a 64-bit absolute address (no registers allowed in the
effective address). In NASM syntax, use of the 64-bit absolute form
requires [qword]. Examples in NASM syntax:
mov eax, [1] ; 32 bit, with sign extension
mov al, [rax-1] ; 32 bit, with sign extension
mov al, [qword 0x1122334455667788] ; 64-bit absolute
mov al, [0x1122334455667788] ; truncated to 32-bit (warning)
In 64-bit mode, a new form of effective addressing is available to make
it easier to write position-independent code. Any memory reference may
be made RIP relative (RIP is the instruction pointer register, which
contains the address of the location immediately following the current
instruction).
In NASM syntax, there are two ways to specify RIP-relative addressing:
mov dword [rip+10], 1
stores the value 1 ten bytes after the end of the instruction. The 10
can also be a symbolic constant, and will be treated the same way. On
the other hand,
mov dword [symb wrt rip], 1
stores the value 1 into the address of symbol symb. This is distinctly
different from the behavior of:
mov dword [symb+rip], 1
which takes the address of the end of the instruction, adds the address
of symb to it, then stores the value 1 there. If symb is a variable,
this will NOT store the value 1 into the symb variable!
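The difference between the two forms comes down to which displacement ends up in the instruction. The following Python sketch models the address arithmetic only. The function names and the example addresses are hypothetical, and next_rip stands for the address immediately following the instruction.

```python
MASK64 = (1 << 64) - 1


def ea_rip_plus(disp: int, next_rip: int) -> int:
    """Effective address of [rip+disp]: the displacement is used as-is."""
    return (next_rip + disp) & MASK64


def ea_wrt_rip(symb: int, next_rip: int) -> int:
    """Effective address of [symb wrt rip]: the stored displacement is
    the distance from the end of the instruction to symb."""
    disp = symb - next_rip
    return ea_rip_plus(disp, next_rip)


# Made-up addresses for illustration.
NEXT_RIP = 0x401010
SYMB = 0x601000

target_wrt = ea_wrt_rip(SYMB, NEXT_RIP)    # lands on symb itself
target_plus = ea_rip_plus(SYMB, NEXT_RIP)  # symb added to rip: elsewhere
target_ten = ea_rip_plus(10, NEXT_RIP)     # ten bytes past the instruction
```

With these values, target_wrt equals SYMB, while target_plus is NEXT_RIP + SYMB, an unrelated address.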
LC3B ARCHITECTURE
The "lc3b" architecture supports the LC-3b ISA as used in the ECE 312
(now ECE 411) course at the University of Illinois at Urbana-Champaign,
as well as in other university courses. See
http://courses.ece.uiuc.edu/ece411/ for more details and example code.
The "lc3b" architecture consists of only one machine: "lc3b".
SEE ALSO
yasm(1)
BUGS
When using the "x86" architecture, it is all too easy to generate
AMD64 code (using the BITS 64 directive) but emit a 32-bit object file
(by failing to specify -m amd64 on the command line). Similarly,
specifying -m amd64 does not default the BITS setting to 64.
AUTHOR
Peter Johnson <peter@tortall.net>.