Provided by: yasm_1.3.0-5_amd64

NAME

       yasm_arch - Yasm Supported Target Architectures

SYNOPSIS

       yasm -a arch [-m machine] ...

DESCRIPTION

       The standard Yasm distribution includes a number of modules for different target
       architectures. Each target architecture can support one or more machine architectures.

       The architecture and machine are selected on the yasm(1) command line by use of the -a
       arch and -m machine command line options, respectively.

       The machine architecture may also automatically be selected by certain object formats. For
       example, the “elf32” object format selects the “x86” machine architecture by default,
       while the “elf64” object format selects the “amd64” machine architecture by default.

X86 ARCHITECTURE

       The “x86” architecture supports the IA-32 instruction set and derivatives and the AMD64
       instruction set. It consists of two machines: “x86” (for the IA-32 and derivatives) and
       “amd64” (for the AMD64 and derivatives). The default machine for the “x86” architecture is
       the “x86” machine.

   BITS Setting
       The x86 architecture BITS setting specifies to Yasm the processor mode in which the
       generated code is intended to execute. x86 processors can run in three different major
       execution modes: 16-bit, 32-bit, and on AMD64-supporting processors, 64-bit. As the x86
       instruction set contains portions whose function is execution-mode dependent (such as
       operand-size and address-size override prefixes), Yasm cannot assemble x86 instructions
       correctly unless it is told by the user in what processor mode the code will execute.

       The BITS setting can be changed in a variety of ways. When using the NASM-compatible
       parser, the BITS setting can be changed directly via the use of the BITS xx assembler
       directive. The default BITS setting is determined by the object format in use.
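
       As a minimal sketch of why the mode matters (NASM syntax; the prefix notes reflect
       standard x86 encoding rather than anything Yasm-specific), the same two instructions
       need the operand-size override prefix in opposite cases depending on the BITS setting:

       ```nasm
           bits 16
           mov ax, bx      ; native 16-bit operand: no prefix needed
           mov eax, ebx    ; 32-bit operand: 66h operand-size prefix emitted

           bits 32
           mov ax, bx      ; 16-bit operand: 66h operand-size prefix emitted
           mov eax, ebx    ; native 32-bit operand: no prefix needed
       ```

       Code assembled under one setting and executed in the other mode would therefore be
       decoded with the wrong operand sizes.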

   BITS 64 Extensions
       The AMD64 architecture is a 64-bit architecture developed by AMD, based on the 32-bit
       x86 architecture. It extends the original x86 architecture by doubling the number of
       general purpose and SIMD registers and by extending the arithmetic operations and address
       space to 64 bits, among other features.

       Intel later introduced an essentially identical version of AMD64, originally called EM64T
       and now known as Intel 64.

       When an AMD64-supporting processor is executing in 64-bit mode, a number of additional
       extensions are available, including extra general purpose registers, extra SSE2 registers,
       and RIP-relative addressing.

       Yasm extends the base NASM syntax to support AMD64 as follows. To enable assembly of
       instructions for the 64-bit mode of AMD64 processors, use the directive BITS 64. As with
       NASM's BITS directive, this does not change the format of the output object file to 64
       bits; it only changes the assembler mode to assume that the instructions being assembled
       will be run in 64-bit mode. To specify an AMD64 object file, use -m amd64 on the Yasm
       command line, or explicitly target a 64-bit object format such as -f win64 or -f elf64.
       -f elfx32 can be used to select 32-bit ELF object format for AMD64 processors.
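
       A minimal sketch of a source file whose BITS setting and object format agree. The file
       name example.asm and the symbol name answer are hypothetical; the command line uses only
       the -f elf64 option described above.

       ```nasm
           ; example.asm (hypothetical file name)
           ; Assemble with: yasm -f elf64 example.asm
           bits 64                 ; assemble for 64-bit mode (stated explicitly here)

           section .text
           global answer           ; hypothetical exported symbol
           answer:
               mov eax, 42         ; 32-bit write; result is zero-extended into rax
               ret
       ```

       Stating both the directive and a 64-bit object format keeps the instruction encoding and
       the object file format from disagreeing.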

       Register Changes
           The additional 64-bit general purpose registers are named r8-r15. There are also 8-bit
           (rXb), 16-bit (rXw), and 32-bit (rXd) subregisters that map to the least significant
           8, 16, or 32 bits of the 64-bit register. The original 8 general purpose registers
           have also been extended to 64 bits: eax, edx, ecx, ebx, esi, edi, esp, and ebp have
           new 64-bit versions called rax, rdx, rcx, rbx, rsi, rdi, rsp, and rbp respectively.
           The old 32-bit registers map to the least significant bits of the new 64-bit
           registers.

           New 8-bit registers are also available that map to the 8 least significant bits of
           rsi, rdi, rsp, and rbp. These are called sil, dil, spl, and bpl respectively.
           Unfortunately, due to the way instructions are encoded, these new 8-bit registers are
           encoded the same as the old 8-bit registers ah, dh, ch, and bh. The processor tells
           which is being used by the presence of the new REX prefix that is used to specify the
           other extended registers. This means it is illegal to mix the use of ah, dh, ch, and
           bh with an instruction that requires the REX prefix for other reasons. For instance:

               add ah, [r10]

           (NASM syntax) is not a legal instruction because the use of r10 requires a REX prefix,
           making it impossible to use ah.

           In 64-bit mode, an additional 8 SSE2 registers are also available. These are named
           xmm8-xmm15.
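
           The register names above can be sketched in use as follows (NASM syntax; the
           particular registers are chosen arbitrarily for illustration):

           ```nasm
               bits 64
               mov r8, rax         ; full 64-bit extended register
               mov r9d, eax        ; low 32 bits of r9
               mov r10w, ax        ; low 16 bits of r10
               mov r11b, al        ; low 8 bits of r11
               mov sil, al         ; low 8 bits of rsi (requires a REX prefix)
               add ah, bl          ; legal: no operand requires a REX prefix
               ; add ah, r10b      ; illegal: r10b requires REX, so ah cannot be encoded
               movaps xmm8, xmm0   ; one of the additional SIMD registers
           ```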

       64 Bit Instructions
           By default, most operations in 64-bit mode remain 32-bit; operations that are 64-bit
           usually require a REX prefix (one bit in the REX prefix determines whether an
           operation is 64-bit or 32-bit). Thus, essentially all 32-bit instructions have a
           64-bit version, and the 64-bit versions of instructions can use extended registers
           “for free” (as the REX prefix is already present). Examples in NASM syntax:

               mov eax, 1  ; 32-bit instruction

               mov rcx, 1  ; 64-bit instruction

           Instructions that modify the stack (push, pop, call, ret, enter, and leave) are
           implicitly 64-bit. Their 32-bit counterparts are not available, but their 16-bit
           counterparts are. Examples in NASM syntax:

               push eax  ; illegal instruction

               push rbx  ; 1-byte instruction

               push r11  ; 2-byte instruction with REX prefix
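
           As a further sketch of the REX prefix cost (NASM syntax; the byte counts follow from
           the standard x86 encoding of a register-to-register add):

           ```nasm
               bits 64
               add eax, ebx    ; 2 bytes: 32-bit operation, no REX prefix
               add rax, rbx    ; 3 bytes: REX prefix selects 64-bit operand size
               add rax, r9     ; 3 bytes: the same REX prefix also selects r9
               add eax, r9d    ; 3 bytes: REX prefix needed only for r9d
               push ax         ; 16-bit push: still available in 64-bit mode
           ```

           The third line shows the "for free" case: once the prefix is present for the 64-bit
           operand size, using an extended register adds no further bytes.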

       Implicit Zero Extension
           Results of 32-bit operations are implicitly zero-extended into the full 64-bit
           register, clearing its upper 32 bits. 16-bit and 8-bit operations, on the other
           hand, do not affect the upper bits of the register (just as in 32-bit and 16-bit
           modes). This can be used to generate smaller code in some instances. Examples in
           NASM syntax:

               mov ecx, 1  ; shorter than mov rcx, 1 (no REX prefix needed)

               and edx, 3  ; equivalent to and rdx, 3
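
           The difference between 32-bit writes and narrower writes can be traced with a short
           sketch (NASM syntax; the register values in the comments follow from the rule
           above):

           ```nasm
               bits 64
               mov rax, -1     ; rax = 0xffffffffffffffff
               mov eax, 2      ; rax = 0x0000000000000002 (upper 32 bits cleared)

               mov rbx, -1     ; rbx = 0xffffffffffffffff
               mov bx, 2       ; rbx = 0xffffffffffff0002 (upper 48 bits kept)
               mov bl, 3       ; rbx = 0xffffffffffff0003 (upper 56 bits kept)

               xor ecx, ecx    ; clears all 64 bits of rcx
           ```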

       Immediates
           For most instructions in 64-bit mode, immediate values remain 32 bits; their value is
           sign-extended into the upper 32 bits of the target register prior to being used. The
           exception is the mov instruction, which can take a 64-bit immediate when the
           destination is a 64-bit register. Examples in NASM syntax:

               add rax, 1           ; optimized down to signed 8-bit

               add rax, dword 1     ; force size to 32-bit

               add rax, 0xffffffff  ; sign-extended 32-bit

               add rax, -1          ; same as above

               add rax, 0xffffffffffffffff ; truncated to 32-bit (warning)

               mov eax, 1           ; 5 byte

               mov rax, 1           ; 7 byte (optimized to signed 32-bit)

               mov rax, qword 1     ; 10 byte (forced 64-bit)

               mov rbx, 0x1234567890abcdef ; 10 byte

               mov rcx, 0xffffffff  ; 10 byte (does not fit in signed 32-bit)

               mov ecx, -1          ; 5 byte, equivalent to above

               mov rcx, sym         ; 7 byte, 32-bit size default for symbols

               mov rcx, qword sym   ; 10 byte, override default size

           The handling of mov reg64 with an unsized immediate differs between Yasm and NASM
           2.x; Yasm follows the above behavior, while NASM 2.x does the following:

               add rax, 0xffffffff  ; sign-extended 32-bit immediate

               add rax, -1          ; same as above

               add rax, 0xffffffffffffffff ; truncated 32-bit (warning)

               add rax, sym         ; sign-extended 32-bit immediate

               mov eax, 1           ; 5 byte (32-bit immediate)

               mov rax, 1           ; 10 byte (64-bit immediate)

               mov rbx, 0x1234567890abcdef ; 10 byte instruction

               mov rcx, 0xffffffff  ; 10 byte instruction

               mov ecx, -1          ; 5 byte, equivalent to above

               mov ecx, sym         ; 5 byte (32-bit immediate)

               mov rcx, sym         ; 10 byte instruction

               mov rcx, qword sym   ; 10 byte (64-bit immediate)
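
           Because only mov accepts a 64-bit immediate, a constant that does not fit in a
           sign-extended 32 bits is typically loaded into a register first. A minimal sketch
           (NASM syntax; the register choices are arbitrary):

           ```nasm
               bits 64
               mov rdx, 0x1234567890abcdef  ; 64-bit immediate, allowed for mov only
               add rax, rdx                 ; add the full 64-bit value from a register

               ; An immediate of 0xffffffff on a 64-bit "and" would be sign-extended,
               ; so the upper half of rax is cleared with a 32-bit operation instead:
               mov eax, eax                 ; 32-bit write zero-extends into rax
           ```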

       Displacements
           Just like immediates, displacements, for the most part, remain 32 bits and are sign
           extended prior to use. Again, the exception is one restricted form of the mov
           instruction: between the al/ax/eax/rax register and a 64-bit absolute address (no
           registers allowed in the effective address). In NASM syntax, use of the 64-bit
           absolute form requires [qword]. Examples in NASM syntax:

               mov eax, [1]    ; 32 bit, with sign extension

               mov al, [rax-1] ; 32 bit, with sign extension

               mov al, [qword 0x1122334455667788] ; 64-bit absolute

               mov al, [0x1122334455667788] ; truncated to 32-bit (warning)
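
           Since the 64-bit absolute form exists only for the al/ax/eax/rax register, other
           registers can reach such an address by loading it into a register first. A minimal
           sketch (NASM syntax; the address and register choices are arbitrary):

           ```nasm
               bits 64
               mov al, [qword 0x1122334455667788]  ; accumulator: 64-bit absolute form

               mov rbx, 0x1122334455667788         ; load the address as a 64-bit immediate
               mov cl, [rbx]                       ; any register can then use it indirectly
           ```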

       RIP Relative Addressing
           In 64-bit mode, a new form of effective addressing is available to make it easier to
           write position-independent code. Any memory reference may be made RIP relative (RIP is
           the instruction pointer register, which contains the address of the location
           immediately following the current instruction).

           In NASM syntax, there are two ways to specify RIP-relative addressing:

               mov dword [rip+10], 1

           stores the value 1 ten bytes after the end of the instruction.  10 can also be a
           symbolic constant, and will be treated the same way. On the other hand,

               mov dword [symb wrt rip], 1

           stores the value 1 into the address of symbol symb. This is quite different from
           the behavior of:

               mov dword [symb+rip], 1

           which takes the address of the end of the instruction, adds the address of symb to it,
           then stores the value 1 there. If symb is a variable, this will not store the value 1
           into the symb variable!

           Yasm also supports the following syntax for RIP-relative addressing:

               mov [rel sym], rax  ; RIP-relative

               mov [abs sym], rax  ; not RIP-relative

           The behavior of:

               mov [sym], rax

           depends on a mode set by the DEFAULT directive, as follows. The default mode is
           "abs". In "rel" mode, use of registers, an fs or gs segment override, or an
           explicit "abs" override still results in a non-RIP-relative effective address.

               default rel

               mov [sym], rbx      ; RIP-relative

               mov [abs sym], rbx  ; not RIP-relative (explicit override)

               mov [rbx+1], rbx    ; not RIP-relative (register use)

               mov [fs:sym], rbx   ; not RIP-relative (fs or gs use)

               mov [ds:sym], rbx   ; RIP-relative (segment, but not fs or gs)

               mov [rel sym], rbx  ; RIP-relative (redundant override)

               default abs

               mov [sym], rbx      ; not RIP-relative

               mov [abs sym], rbx  ; not RIP-relative

               mov [rbx+1], rbx    ; not RIP-relative

               mov [fs:sym], rbx   ; not RIP-relative

               mov [ds:sym], rbx   ; not RIP-relative

               mov [rel sym], rbx  ; RIP-relative (explicit override)
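
           Put together, a position-independent routine might look like the following sketch.
           The names counter and bump are hypothetical, and the section layout is only one
           possible arrangement:

           ```nasm
               bits 64
               default rel             ; bare symbol references become RIP-relative

               section .data
               counter: dq 0           ; hypothetical 64-bit variable

               section .text
               global bump             ; hypothetical routine name
               bump:
                   mov rax, [counter]      ; RIP-relative because of "default rel"
                   add rax, 1
                   mov [counter], rax      ; RIP-relative store
                   lea rdx, [rel counter]  ; explicit form: address of counter in rdx
                   ret
           ```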

       Memory references
           Usually the size of a memory reference can be deduced from the registers used in
           the instruction: for example, "mov [rax],ecx" is a 32-bit move, because ecx is 32
           bits. Yasm currently gives the non-obvious "invalid combination of opcode and
           operands" error if it cannot determine the size of the memory operand. The fix in
           this case is to add a memory size specifier: qword, dword, word, or byte.

           Here's a 64-bit memory move, which sets 8 bytes starting at rax:

               mov qword [rax], 1

           Here's a 32-bit memory move, which sets 4 bytes:

               mov dword [rax], 1

           Here's a 16-bit memory move, which sets 2 bytes:

               mov word [rax], 1

           Here's an 8-bit memory move, which sets 1 byte:

               mov byte [rax], 1
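
           The same rule applies to instructions other than mov. A short sketch contrasting
           cases where the size is implied with cases where it must be given (NASM syntax):

           ```nasm
               bits 64
               mov [rax], ecx          ; size implied by ecx: 32-bit store
               mov [rax], cl           ; size implied by cl: 8-bit store
               inc dword [rax]         ; no register operand: specifier required
               movzx ecx, byte [rax]   ; source size stated: load 1 byte, zero-extend
               movzx ecx, word [rax]   ; source size stated: load 2 bytes, zero-extend
               ; mov [rax], 1          ; rejected: operand size cannot be deduced
           ```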

LC3B ARCHITECTURE

       The “lc3b” architecture supports the LC-3b ISA as used in the ECE 312 (now ECE 411) course
       at the University of Illinois, Urbana-Champaign, as well as other university courses. See
       http://courses.ece.uiuc.edu/ece411/ for more details and example code. The “lc3b”
       architecture consists of only one machine: “lc3b”.

SEE ALSO

       yasm(1)

BUGS

       When using the “x86” architecture, it is all too easy to generate AMD64 code (using the
       BITS 64 directive) and yet produce a 32-bit object file (by failing to specify either
       -m amd64 or a 64-bit object format on the command line). Similarly, specifying -m amd64
       does not default the BITS setting to 64. An easy way to avoid both problems is to specify
       a 64-bit object format directly, such as -f elf64.

AUTHOR

       Peter Johnson <peter@tortall.net>
           Author.

       Copyright © 2004, 2005, 2006, 2007 Peter Johnson