Ubuntu Manpage: inline_asm - Inline Assembler Cookbook

NAME

       inline_asm - Inline Assembler Cookbook

       AVR-GCC
        Inline Assembler Cookbook

       • About this Document

       • Building Blocks

         • The Anatomy of a GCC asm Statement

         • Special Sequences

         • Constraints

           • Constraint Modifiers

           • Instructions and Constraints

         • Print Modifiers

         • Operand Modifiers

       • Examples

         • Swapping Nibbles

         • Swapping Bytes

         • Accessing Memory

         • Accessing Bytes of wider Expressions

         • Inline Functions and __builtin_constant_p

         • Jumping and Branching

       • Binding local Variables to Registers

         • Interfacing non-ABI Functions

       • Specifying the Assembly Name of Static Objects

       • What won't work

About this Document

       The GNU C/C++ compiler for AVR RISC processors allows assembly language code to be embedded into C/C++
       programs. This cool feature may be used for manually optimizing time critical parts of the software, or
       to use specific processor instructions which are not available in the C language.

       It's assumed that you are familiar with writing AVR assembler programs, because this is not an AVR
       assembler programming tutorial. It's not a C/C++ tutorial either.

       Note that this document does not cover files written completely in assembly language; refer to AVR-LibC
       and Assembler Programs for this.

       Copyright (C) 2001-2002 by egnite Software GmbH

       Permission is granted to copy and distribute verbatim copies of this manual provided that the copyright
       notice and this permission notice are preserved on all copies. Permission is granted to copy and
       distribute modified versions of this manual provided that the entire resulting derived work is
       distributed under the terms of a permission notice identical to this one.

       This document describes version 4.7 of the compiler or newer.

       Herne, 17th of May 2002 Harald Kipp harald.kipp-at-egnite.de

The Anatomy of a GCC asm Statement

A GCC inline assembly statement starts with the keyword asm, __asm or __asm__, where the first one is not
available in strict ANSI mode.

In its simplest form, the inline assembly statement has no operands and injects just one instruction into
the code stream, like in

__asm ("nop");

In its generic form, an asm statement can have one of the following three forms:

A simple asm without operands

__asm (code-string);

code-string is a string literal that will be added as is into the generated assembly code. This even
applies to the % character. The only replacement is that \n and \t are interpreted as newline and TAB
character, respectively.

This type of asm statement may occur at top level, outside any function as global asm. When its placement
relative to functions is important, consider -fno-toplevel-reorder.

An asm with operands

__asm volatile (code-string : output-operands : input-operands : clobbers);

This is the most widely used form of an asm statement. It must be located in a function.

output-operands, input-operands and clobbers are comma-separated lists of operands and clobber
specifications, respectively. Any of them may be empty, for example when the asm has no outputs. At least
one : (colon) must be present, otherwise it will be a simple asm without operands and without % replacements.

An asm goto statement

__asm goto (code-string : : input-operands : clobbers : labels);

Like the asm above, but labels is a comma-separated list of C/C++ code labels which would be valid in a
goto statement. And output-operands must be empty, because it is impossible to generate output reloads
after the code has transferred control to one of the labels.
As there are no output operands, asm goto is implicitly volatile. When volatile is specified explicitly,
the goto keyword may be placed after or before the volatile.

Notes on the various parts:

Volatility
Keyword volatile is optional and means that the asm statement has side effects that are not expressed
in terms of the operands or clobbers. The asm statement must not be optimized away or reordered with
respect to other volatile statements like volatile memory accesses or other volatile asm.

Any asm statement without output-operands is implicitly volatile.

A non-volatile asm statement may be optimized away when all of its output operands are unused.

Instead of volatile, __volatile or __volatile__ can be used.

code-string
A string literal that contains the code that is to be injected in the assembly code generated by the
compiler. %-expressions are replaced by the string representations of the operands, and the number of
lines is determined to estimate the code size of the asm.
Apart from that, the compiler does not analyze the code provided in the code template.
This means that the code appears to the compiler as if it was executed in one parallel chunk, all at
once. It is important to keep that in mind, in particular for cases where input and output operands
may overlap.

output-operands

input-operands
A comma-separated list of operands, which may take the following forms. In any case, the first
operand can be referred to as '%0' in code-string, the second one as '%1' etc.

'constraints' (expr)
expr is a C expression that's an input or output (or both) to the asm statement. An output expression
must be an lvalue, i.e. it must be valid to assign a value to it.
'constraints' is a string literal with constraints and constraint modifiers. For example, constraint
'r' stands for general-purpose register. A simple input operand would be

"r" (value + 1)

The compiler computes value + 1 and supplies it in some general-purpose register R2...R31. In many
cases, an upper d-register R16...R31 is required for instructions like LDI or ANDI. A respective output
operand specification is

"=d" (result)

Notice that this operand may overlap with input operands!
When an operand is written before all input operands are consumed, then in almost all cases the output
operand requires an early-clobber modifier & so that it won't overlap with any input operand:

"=&d" (result)

An operand that's both an output and an input can be expressed with the + constraint modifier:

"+d" (result)

Such an operand is both output and input, and hence it won't overlap with other operands.

[name] 'constraints' (expr)
Like above. In addition, a named operand can be referred to as %[name] in code-string. This is useful
in long asm statements with many operands.
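
As a minimal sketch of the operand forms above, the following function adds two bytes with a single ADD
instruction. The names add_bytes, acc and inc are made up for this illustration, and the plain C branch is
only an assumed fallback so that the snippet also builds with a host compiler.

```c
#include <stdint.h>

/* Hypothetical helper, not part of AVR-LibC. */
static inline uint8_t add_bytes (uint8_t acc, uint8_t inc)
{
#if defined (__AVR__)
    /* [acc] is read and written ("+r"); [inc] is input only ("r"). */
    __asm ("add %[acc], %[inc]"
           : [acc] "+r" (acc)
           : [inc] "r" (inc));
#else
    /* Assumed host fallback with the same result modulo 256. */
    acc = (uint8_t) (acc + inc);
#endif
    return acc;
}
```

Because all effects are described by the operands, the asm does not need to be volatile.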

clobbers
A comma-separated list of string literals like '16', 'r16' or 'memory'.

The first two clobbers mean that the asm destroys register R16. Only the lower-case form is allowed, and
register names like Z are not recognized.

'memory' means that the asm touches memory in some way. When the asm writes to some RAM location for
example, the compiler must not optimize RAM accesses across the asm because the memory may change.

Clobbering __tmp_reg__ by means of 'r0' has no effect, but such a clobber may be added to indicate to the
reader that the asm clobbers R0.

Clobbering __zero_reg__ by means of 'r1' has no effect. When the asm destroys the zero register, for
example by means of a MUL instruction, then the code must restore the register at the end by means of
'clr __zero_reg__'
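
The following sketch illustrates the notes on R0 and R1 with an 8 x 8 bit multiplication. The function name
mul_u8 is hypothetical; the asm branch assumes a device with MUL and MOVW, which is tested here by means of
the built-in macro __AVR_HAVE_MUL__, and the plain C branch is an assumed fallback for all other targets.

```c
#include <stdint.h>

/* Hypothetical helper: 8 x 8 -> 16 bit unsigned multiplication. */
static inline uint16_t mul_u8 (uint8_t a, uint8_t b)
{
    uint16_t product;
#if defined (__AVR_HAVE_MUL__)
    /* MUL leaves its result in R1:R0 and thereby destroys __zero_reg__ (R1).
       Both inputs are consumed by MUL before the output is written, so no
       early-clobber is needed.  The "r0" clobber has no effect and only
       documents that R0 is changed. */
    __asm ("mul  %[a], %[b]"   "\n\t"
           "movw %A[p], r0"    "\n\t"
           "clr  __zero_reg__"
           : [p] "=r" (product)
           : [a] "r" (a), [b] "r" (b)
           : "r0");
#else
    /* Assumed fallback for hosts and for devices without MUL. */
    product = (uint16_t) ((uint16_t) a * b);
#endif
    return product;
}
```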

The size of an asm
The code size of an asm statement is the number of lines multiplied by 4 bytes, the maximal possible
AVR instruction length. The length is needed when (conditional) jumps cross the asm statement in
order to compute (upper bounds for) jump offsets of PC-relative jumps.

The number of lines is one plus the number of line breaks in code-string. These may be physical line
breaks from \n characters and logical line breaks from $ characters.
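
The rule above can be modelled in a few lines of plain C. This is only a sketch of the rule as stated in
this section, not the compiler's actual implementation, and the function name asm_size_estimate is made up
for the illustration.

```c
#include <stddef.h>

/* Hypothetical model of the size estimate: 4 bytes per line, where the
   number of lines is one plus the number of '\n' and '$' characters. */
static size_t asm_size_estimate (const char *code_string)
{
    size_t lines = 1;
    for (const char *p = code_string; *p != '\0'; p++)
        if (*p == '\n' || *p == '$')
            lines++;
    return 4 * lines;
}
```

Under this model a trailing \n\t after the last instruction adds 4 bytes to the estimate without adding
any code.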

Before we start with the first examples, we list all the bells and whistles that can be used to compose
an inline assembly statement: special sequences, constraints, constraint modifiers, print modifiers and
operand modifiers.

Special Sequences

       There are special sequences that can be used in the assembly template.

       Sequence       Description
       __SREG__       The I/O address of the status register SREG at 0x3F
       __tmp_reg__    The temporary register R0 (R16 on reduced Tiny)
       __zero_reg__   The zero register R1, always zero (R17 on reduced Tiny)
       $              A logical line separator, used to separate multiple instructions in one physical line
       \n             A physical newline, used to separate multiple instructions
       \t             A TAB character, can be used for better legibility of the generated asm
       \"             A " character (double quote)
       \\             A \ character (backslash)
       %%             A % character (percent)
       %~             'r' or '', used to construct call or rcall by means of '%~call', depending on the
                      architecture
       %!             '' or 'e', used to construct indirect calls like icall or eicall by means of
                      '%!icall', depending on the architecture
       %=             A number that's unique for the compilation unit and the respective inline asm code,
                      used to construct unique labels

       Comment        Description
       ; text         A single-line assembly comment that extends to the end of the physical line
       /* text */     A multi-line C comment

       • Moreover, the following I/O addresses are defined provided the device supports the respective SFR:
         __SP_L__, __SP_H__, __CCP__, __RAMPX__, __RAMPY__, __RAMPZ__, __RAMPD__.

       • Register __tmp_reg__ may be freely used by inline assembly code and need not be restored at the end of
         the code.

       • Register __zero_reg__ contains a value of zero. When that value is destroyed, for example by a MUL
         instruction, its value has to be restored at the end of the code by means of

       clr __zero_reg__

       • In inline asm without operands (i.e without a single colon), a % will always insert a single %. No
         %-codes are available.

       Sequences like __SREG__ are not evaluated as part of the inline asm, they are just copied to the asm code
       as they are. At the top of each assembly file, the compiler prints definitions like

       __SREG__ = 0x3f

        so that they can also be used in inline assembly.

Constraints

The most up-to-date and detailed information on constraints for the AVR can be found in the avr-gcc Wiki.

Constraint   Registers                                      Range
a            Simple upper registers that support FMUL       R16 ... R23
b            Base pointer registers that support LDD, STD   Y, Z (R28 ... R31)
d            Upper registers                                R16 ... R31
e            Pointer registers that support LD, ST          X, Y, Z (R26 ... R31)
l            Lower registers                                R2 ... R15
r            Any register                                   R2 ... R31
w            Upper registers that support ADIW              R24 ... R31
x            X pointer registers                            R26, R27
y            Y pointer registers                            R28, R29
z            Z pointer registers                            R30, R31

Constraint   Constant                                       Range
I            6-bit unsigned integer constant                0 to 63
J            6-bit negative integer constant                -63 to 0
M            8-bit unsigned integer constant                0 to 255
n            Integer constant
i            Immediate value known at link-time, like the address of a variable in static storage
E, F         Floating-point constant
Ynn          Fixed-point or integer constant

Constraint   Explanation                                    Notes
m            A memory location
X            Any valid operand
0 ... 9      Matches the respective operand number

• Constraints without a modifier specify input operands.

• Constraints with a modifier specify output operands.

• More than one constraint like in 'rn' specifies the union of the specified constraints; 'r' and 'n' in
this case.

• All constraints listed above are single-letter constraints, except Ynn which is a 3-letter constraint.

Constraint modifiers are:

Modifier   Meaning
=          Output-only operand. Without & it may overlap with input operands
+          Output operand that's also an input
=&         'Early-clobber'. Register should be used for output only and won't overlap with any input
           operand(s)

The selection of the proper constraint depends on the range of the constants or registers, which must be
acceptable to the AVR instruction they are used with. The C compiler doesn't check any line of your
assembler code. But it is able to check the constraint against your C expression. However, if you specify
the wrong constraints, then the compiler may silently pass wrong code to the assembler. And, of course,
the assembler will fail with some cryptic output or internal errors, or in the worst case wrong code may
be the result.

For example, if you specify the constraint 'r' and you are using this register with an ORI instruction,
then the compiler may select any register. This will fail if the compiler chooses R2 to R15. (It will
never choose R0 or R1, because these are used for special purposes.) That's why the correct constraint in
that case is 'd'. On the other hand, if you use the constraint 'M', the compiler will make sure that you
don't pass anything else but an 8-bit unsigned integer value known at compile-time.
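
A minimal sketch of the ORI case described above: the helper name set_bit7 is hypothetical, and the plain C
branch is an assumed fallback so that the snippet also builds on a host.

```c
#include <stdint.h>

/* Hypothetical helper: set bit 7 of a byte. */
static inline uint8_t set_bit7 (uint8_t value)
{
#if defined (__AVR__)
    /* ORI only accepts R16...R31, hence "d" rather than "r".
       "M" makes the compiler reject anything but a constant in 0...255. */
    __asm ("ori %0, %1" : "+d" (value) : "M" (0x80));
#else
    value |= 0x80;   /* Assumed host fallback. */
#endif
    return value;
}
```

With "+r" instead of "+d", the same code may or may not assemble, depending on which register the compiler
happens to allocate.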

The following table shows all AVR assembler mnemonics which require operands, and the related
constraints.

Mnemonic  Constraints   Mnemonic  Constraints
adc       r,r           add       r,r
adiw      w,I           and       r,r
andi      d,M           asr       r
bclr      I             bld       r,I
brbc      I,label       brbs      I,label
bset      I             bst       r,I
call      i             cbi       I,I
cbr       d,I           clr       r
com       r             cp        r,r
cpc       r,r           cpi       d,M
cpse      r,r           dec       r
elpm      r,z           eor       r,r
fmul      a,a           fmuls     a,a
fmulsu    a,a           in        r,I
inc       r             jmp       i
lac       z,r           las       z,r
lat       z,r           ld        r,e
ldd       r,b           ldi       d,M
lds       r,i           lpm       r,z
lsl       r             lsr       r
mov       r,r           movw      r,r
mul       r,r           muls      r,r
mulsu     a,a           neg       r
or        r,r           ori       d,M
out       I,r           pop       r
push      r             rcall     i
rjmp      i             rol       r
ror       r             sbc       r,r
sbci      d,M           sbi       I,I
sbic      I,I           sbiw      w,I
sbr       d,M           sbrc      r,I
sbrs      r,I           ser       d
st        e,r           std       b,r
sts       i,r           sub       r,r
subi      d,M           swap      r
tst       r             xch       z,r

Print Modifiers

       The %-operands in the inline assembly template can be adjusted by special print-modify characters. The
       one-letter modifier follows the % and precedes the operand number like in '%a0', or precedes the name in
       named operands like in '%a[address]'.

       Modifier  Number of  Explanation                                                 Suitable
                 Arguments                                                              Constraints
       %a0       1          Print pointer register as address X, Y or Z, like in        x, y, z, b, e
                            'LD r0, %a0+'
       %i0       1          Print compile-time RAM address as I/O address, like in      n
                            'OUT %i0, r0' with argument 'n'(&SREG)
       %n0       1          Print the negative of a compile-time integer constant       n
       %r0       1          Print the register number of a register, like in            reg
                            'CLR %r0+7' for the MSB of a 64-bit register
       %x0       1          Print a function name without gs() modifier, like in        s
                            '%~CALL %x0' with argument 's'(main)
       %A0       1          Add 0 to the register number (no effect)                    reg
       %B0       1          Add 1 to the register number                                reg
       %C0       1          Add 2 to the register number                                reg
       %D0       1          Add 3 to the register number                                reg
       %T0%t1    2          Print the register that holds bit number %1 of register %0  reg + n
       %T0%T1    2          Print operands suitable for BLD/BST, like in 'BST %T0%T1',  reg + n
                            including the required ,

       • Register constraints are: r, d, w, x, y, z, b, e, a, l.
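
       As a small illustration of the byte-selecting print modifiers, the following sketch clears the
       most significant byte of a 32-bit value. The helper name clear_top_byte is hypothetical, and the
       plain C branch is an assumed fallback for host builds.

```c
#include <stdint.h>

/* Hypothetical helper: clear bits 24...31 of a 32-bit value. */
static inline uint32_t clear_top_byte (uint32_t value)
{
#if defined (__AVR__)
    /* %D0 is the register that holds the 4th byte of operand 0. */
    __asm ("clr %D0" : "+r" (value));
#else
    value &= UINT32_C (0x00FFFFFF);   /* Assumed host fallback. */
#endif
    return value;
}
```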

Operand Modifiers

       Modifier  Explanation                                                  Purpose
       lo8()     1st Byte of a link-time constant, bits 0...7                 Getting parts of a byte-address
       hi8()     2nd Byte of a link-time constant, bits 8...15
       hlo8()    3rd Byte of a link-time constant, bits 16...23
       hhi8()    4th Byte of a link-time constant, bits 24...31
       hh8()     Same as hlo8()
       pm_lo8()  1st Byte of a link-time constant divided by 2, bits 1...8    Getting parts of a word-address
       pm_hi8()  2nd Byte of a link-time constant divided by 2, bits 9...16
       pm_hh8()  3rd Byte of a link-time constant divided by 2, bits 17...24
       pm()      Link-time constant divided by 2 in order to get a program    Word-address
                 memory (word) address, like in lo8(pm(main))
       gs()      Function address divided by 2 in order to get a (word)       Function address for [E]ICALL
                 address, like in lo8(gs(main)). Generate stub (trampoline)
                 as needed. This is required to calculate the address of a
                 code label on devices with more than 128 KiB of program
                 memory that's supposed to be used in EICALL. For rationale,
                 see the GCC documentation. On devices with less program
                 memory, gs() behaves like pm()

       When the argument of a modifier is not computable at assembler-time, then the assembler has to encode the
       expression in an abstract form using RELOCs. As a consequence, only a very limited number of argument
       expressions is supported when they are not computable at assembler-time.
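
       A minimal sketch of lo8() and hi8() applied to a constant that is known at assembler-time. The
       helper name load_0x1234 and the constant are made up for the illustration, and the plain C branch
       is an assumed fallback for host builds.

```c
#include <stdint.h>

/* Hypothetical helper: load the 16-bit constant 0x1234 byte by byte. */
static inline uint16_t load_0x1234 (void)
{
    uint16_t value;
#if defined (__AVR__)
    /* LDI needs an upper register ("d").  The assembler evaluates lo8()
       and hi8() because the argument is known at assembler-time. */
    __asm ("ldi %A0, lo8(%1)" "\n\t"
           "ldi %B0, hi8(%1)"
           : "=d" (value)
           : "n" (0x1234));
#else
    value = 0x1234;   /* Assumed host fallback. */
#endif
    return value;
}
```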

Examples

       Some examples show the assembly code as generated by the compiler. It's the code from the .s files as
       generated with option -save-temps. Adding the high-level source to the generated assembly can be turned
       on with -fverbose-asm since GCC v8.

   Swapping Nibbles
       The first example uses the swap instruction to swap the nibbles of a byte. Input and output of swap are
       located in the same general purpose register. This means the input operand, operand 1 below, must be
       located in the same register(s) as operand 0, so that the right constraint for operand 1 is '0':

       asm ("swap %0" : "=r" (value) : "0" (value));

       All side effects of the code are described by the constraints and the clobbers, so that there is no need
       for this asm to be volatile. In particular, this asm may be optimized out when the output value is
       unused.

       A shorter pattern to state that value is both input and output is by means of constraint modifier +:

       asm ("swap %0" : "+r" (value));

   Swapping Bytes
       Swapping nibbles was a piece of cake, so let's swap the bytes of a 16-bit value. In order to access the
       constituent bytes of the 16-bit input and output values, we use the print modifiers %A and %B.

       The asm is placed in a small C test case so that we can inspect the resulting assembly code as generated
       by the compiler with -save-temps.

       void callee (int, int);

       void func (int param)
       {
           int swapped;

           asm ("mov %A0, %B1" "\n\t"
                "mov %B0, %A1"
                : "=r" (swapped) : "r" (param));

           callee (param, swapped);
       }

       The '\n\t' sequence adds a line feed that is required between the two instructions, and a TAB to align
       the two instructions in the generated assembly. There is no '\n\t' after the last instruction because
       that would just increase the size of the asm.
       The generated assembly works as expected. The compiler wraps it in #APP / #NOAPP annotations:

       func:
       /* #APP */
           mov r22, r25     ;  swapped, param
           mov r23, r24     ;  swapped, param
       /* #NOAPP */
           jmp callee

       Wrong! While the generated code above is correct, the inline asm itself is not!
       We see this with a slightly adjusted test case where the arguments of callee have been swapped, but that
       uses the same inline asm:

       void func (int param)
       {
           int swapped;

           asm ("mov %A0, %B1" "\n\t"
                "mov %B0, %A1"
                : "=r" (swapped) : "r" (param));

           callee (swapped, param);
       }

       The result is the following assembly:

       func:
           movw r22,r24
       /* #APP */
           mov r24, r25     ;  swapped, param
           mov r25, r24     ;  swapped, param
       /* #NOAPP */
           jmp callee

       which is obviously wrong, because after the code from the inline asm, the low byte of swapped and the
       high byte will always have the same value of r25.

       The reason is that the output operand overlaps the input, and the output is changed before all of the
       input operands are consumed. This is a so-called early-clobber situation. There are two possible
       solutions to this predicament:

       • Mark the output operand with the early-clobber constraint modifier:

       asm ("mov %A0, %B1" "\n\t"
            "mov %B0, %A1"
            : "=&r" (swapped) : "r" (param));

       • Use constraints and a code sequence that expect input and output in the same registers:

       asm ("eor %A0, %B0" "\n\t"
            "eor %B0, %A0" "\n\t"
            "eor %A0, %B0"
            : "=r" (swapped) : "0" (param));
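
       For reference, the early-clobber variant can be wrapped in a complete function as sketched below.
       The function name swap_bytes is hypothetical, and the plain C branch is an assumed fallback that
       allows the snippet to be built and checked on a host.

```c
#include <stdint.h>

/* Hypothetical helper: swap the two bytes of a 16-bit value. */
static inline uint16_t swap_bytes (uint16_t param)
{
    uint16_t swapped;
#if defined (__AVR__)
    /* "=&r" keeps the output away from the registers of param, so the
       first MOV cannot destroy a byte that the second MOV still reads. */
    __asm ("mov %A0, %B1" "\n\t"
           "mov %B0, %A1"
           : "=&r" (swapped) : "r" (param));
#else
    /* Assumed host fallback. */
    swapped = (uint16_t) ((param << 8) | (param >> 8));
#endif
    return swapped;
}
```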

   Accessing Memory
       Accessing memory requires that the AVR instructions that perform the memory access are provided with the
       appropriate memory address.

       1.  The address can be provided directly, like __SREG__, 0x3f, as a symbol, or as a symbol plus a
           constant offset.

       2.  Provide the address by means of an inline asm operand.

       Approach 1 is simpler as it does not require an asm operand, while approach 2 is in many cases more
       powerful because macros defined per, say, #include <avr/io.h> can be used as operands, whereas such
       headers are not included in the assembly code as generated by the compiler.

       Reading a SFR like PORTB can be performed by

       asm volatile ("in %0, %1" : "=r" (result) : "I" (_SFR_IO_ADDR (PORTB)));

       Macro _SFR_IO_ADDR is provided by avr/sfr_defs.h which is included by avr/io.h.

       Since GCC v4.7, print modifier %i is supported, which prints RAM addresses like & PORTB as an I/O
       address:

       asm volatile ("in %0, %i1" : "=r" (result) : "I" (& PORTB));

       When the address is not an I/O address, then LDS or LD must be used, depending on whether the address is
       known at link-time or only at run-time. For example, the following macro provides the functionality to
       clear an SFR. The code discriminates between the possibilities that

       • The SFR address is known at compile-time and is an I/O address.

       • The SFR address is known at compile-time but is not in the I/O range.

       • The SFR address is not known at compile-time.

       #include <avr/io.h>

       #define CLEAR_REG(sfr)                          \
       do {                                            \
         if (__builtin_constant_p (& (sfr))            \
             && _SFR_IO_REG_P (sfr))                   \
           asm volatile ("out %i0, __zero_reg__"       \
                         :: "I" (& (sfr)) : "memory"); \
         else if (__builtin_constant_p (& (sfr)))      \
           asm volatile ("sts %0, __zero_reg__"        \
                         :: "n" (& (sfr)) : "memory"); \
         else                                          \
           asm volatile ("st %a0, __zero_reg__"        \
                         :: "e" (& (sfr)) : "memory"); \
       } while (0)

       The last case with constraint 'e' works because &sfr is a 16-bit value, and 16-bit values (and larger)
       start in even registers. Therefore, the address will be located in R27:R26, R29:R28 or in R31:R30, which
       print modifier %a will print as X, Y or Z, respectively. The address will never end up in, say, R30:R29.

       The test case

       void clear_3_regs (uint8_t volatile *psfr)
       {
           CLEAR_REG (PORTB);
           CLEAR_REG (UDR0);
           CLEAR_REG (*psfr);
       }

       compiles for ATmega328 and with optimization turned on to

       clear_3_regs:
           movw r30,r24
       /* #APP */
           out 0x5, __zero_reg__
           sts 198, __zero_reg__
           st Z,    __zero_reg__   ;  psfr
       /* #NOAPP */
           ret

       As __builtin_constant_p is used to infer whether the address of the SFR is known at compile-time, extra
       care must be taken when the functionality is implemented as an inline function:

       static inline __attribute__((__always_inline__))
       void clear_reg (uint8_t volatile *psfr)
       {
         // !!! The following cast is required to make __builtin_constant_p
         // !!! work as expected in the inline function.
         uintptr_t addr = (uintptr_t) psfr;

         if (__builtin_constant_p (addr)
             && _SFR_IO_REG_P (* psfr))
           asm volatile ("out %i0, __zero_reg__"
                         :: "I" (addr) : "memory");
         else if (__builtin_constant_p (addr))
           asm volatile ("sts %0, __zero_reg__"
                         :: "n" (addr) : "memory");
         else
           asm volatile ("st %a0, __zero_reg__"
                         :: "e" (addr) : "memory");
       }

       void clear_3_pregs (uint8_t volatile *psfr)
       {
         clear_reg (& PORTB);
         clear_reg (& UDR0);
         clear_reg (psfr);
       }

       Casting the address psfr to an integer type in the inline function is required so that the compiler will
       recognize constant addresses.
       Also notice that we have to pass the address of the SFR to the inline function. Passing the SFR directly
       like in the macro approach won't work, because the function would then receive the value read from the
       SFR rather than its address.

   Accessing Bytes of wider Expressions
       Finally, an example that atomically increments a 16-bit integer. The code is wrapped in IN SREG / CLI /
       OUT SREG to make it atomic. It reads the 16-bit value data from its absolute address, increments it and
       then writes it back:

       uint16_t volatile data;

       void inc_data (void)
       {
           uint16_t tmp;
           asm volatile ("in __tmp_reg__, __SREG__"   "\n\t"
                         "cli"                        "\n\t"
                         "lds %A[temp], %[addr]"      "\n\t"
                         "lds %B[temp], %[addr]+1"    "\n\t"
       #ifdef __AVR_TINY__
                         // Reduced Tiny does not have ADIW.
                         "subi %A[temp], lo8(-1)"     "\n\t"
                         "sbci %B[temp], hi8(-1)"     "\n\t"
       #else
                         "adiw %[temp], 1"            "\n\t"
       #endif
                         "sts %[addr]+1, %B[temp]"    "\n\t"
                         "sts %[addr],   %A[temp]"    "\n\t"
                         "out __SREG__, __tmp_reg__"
       #ifdef __AVR_TINY__
                         // No need to restrict tmp to a "w" register. And on
                         // avr-gcc v13.2 and older, "w" contains no regs.
                         : [temp] "=d" (tmp), "+m" (data)
       #else
                         : [temp] "=w" (tmp), "+m" (data)
       #endif
                         : [addr] "i" (& data));
       }

       Notice there are three different ways required to access the different bytes of the involved 16-bit
       entities:

       • For the 16-bit general purpose register %[temp], print modifiers %A and %B are used.

       • For the 16-bit value data in static storage, %[addr]+1 is used to access the high byte. The resulting
         expression data+1 is computable at link-time and evaluated by the linker.

       • In the compilation variant for Reduced Tiny, the bytes of the 16-bit subtrahend 1 are accessed with the
         operand modifiers lo8 and hi8 that are evaluated by the assembler because 1 is known at assembler-time.

       data is located in static storage, hence its address is known to the linker and fits constraint 'i'.

       The sole purpose of operand '+m' (data) is to describe the effect of the asm on data memory: It changes
       data. Notice that there is no 'memory' clobber, because that operand already describes all memory side
       effects, and it does this in a less intrusive way than a catch-all 'memory'. The operand is not used in
       the asm template; but in principle it would be possible to use it as operand with LDS and STS instead of
       operand [addr] 'i' (& data). However, there are many situations where a memory operand constrained by 'm'
       takes a form that cannot be used with AVR instructions because there are no matching print modifiers, or
       because it is not known a priori what specific form the memory operand takes. In such cases, one would
       take the address of the operand and supply it as address in a pointer register to the inline asm. The
       compiler generates the required instructions for address computation, and the inline asm knows that it
       can use LD and ST.

   Jumping and Branching
       When an inline asm contains jumps, then it also requires labels. When the label is inside the asm, then
       care must be taken that the label is unique in the compilation unit even when the inline asm is used
       multiple times, e.g. when the code is located in an unrolled loop or a function has multiple incarnations
       due to cloning, or simply because a macro or inline function that contains an asm statement is used more
       than once.
       There are two kinds of labels that can be used:

       • Local labels of the form n: where n is some (small, non-negative) number. They can be targeted by means
         of nb or nf, depending on whether the jump direction is backwards or forwards. Such numeric labels
         may be present more than once. The taken label is the first one with the specified number in the
         respective direction:

       // Loop until bit PORTB.7 is set.
       asm volatile ("1: sbis %i[sfr], %[bitno]"  "\n\t"
                     "rjmp 1b"
                     :: [sfr] "I" (& PORTB), [bitno] "n" (PB7));

       • Local labels that contain the sequence %= which yields some number that's unique amongst all asm
         incarnations in the respective compilation unit:

       // Loop until bit PORTB.7 is set.
       asm volatile (".Loop.%=: sbis %i[sfr], %[bitno]"  "\n\t"
                     "rjmp .Loop.%="
                     :: [sfr] "I" (& PORTB), [bitno] "n" (PB7));

       Which form is used is a matter of taste. In practice, the first variant is often preferred in short
       sequences, whereas the second form is usually seen in longer algorithms.
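
       A further sketch of a numeric local label, this time in a loop whose operand changes in each
       iteration. The helper name count_down is hypothetical, and the plain C branch is an assumed
       fallback with the same wrap-around behavior for host builds.

```c
#include <stdint.h>

/* Hypothetical helper: count a byte down to zero, return the final value. */
static inline uint8_t count_down (uint8_t count)
{
#if defined (__AVR__)
    /* "1b" refers to the nearest label "1:" in backward direction. */
    __asm volatile ("1: dec %0" "\n\t"
                    "brne 1b"
                    : "+r" (count));
#else
    /* Assumed host fallback: a start value of 0 wraps around to 255. */
    do
        count--;
    while (count != 0);
#endif
    return count;
}
```

       Since numeric labels may occur more than once, the function can be inlined several times into the
       same caller without label clashes.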

       For labels that are defined in the surrounding C/C++ code, asm goto has to be used. The print modifier
       %x0 prints panic as a raw label, not as gs(panic) as would be the case with %0.

       int main (void)
       {
           asm goto ("tst __zero_reg__" "\n\t"
                     "brne %x0"
                     :::: panic);
           /* ...Application code here... */
           return 0;
       panic:
           // __zero_reg__ is supposed to contain 0, but doesn't.
           return 1;
       }

       This assumes that the jump offset can be encoded in the brne instruction in all situations. When static
       analysis cannot prove that the jump offset fits, then the branch condition has to be inverted so that
       the branch skips over an unconditional jump:

       asm goto ("tst   __zero_reg__" "\n\t"
                 "breq  1f"           "\n\t"
                 "%~jmp %x0"          "\n"
                 "1: ;; all fine"
                 :::: panic);

       Sequence '%~jmp' yields 'rjmp' or 'jmp' depending on the architecture. Notice that a jmp can be relaxed
       to an rjmp with option -mrelax provided the jump offset fits.

Binding local Variables to Registers

       One use of GCC's asm keyword is to bind local register variables to hardware registers.
       Such bindings of local variables to registers are only guaranteed during inline asm which has these
       variables as operands.

   Interfacing non-ABI Functions
       Suppose we want to interface a non-ABI assembly function mul_8_16 that multiplies R24 with R27:R26,
       clobbers R0, R1 and R25, and returns the 24-bit result in R20:R19:R18. One way to implement such an
       interface would be to provide an assembly function that performs the required copying and call to
       mul_8_16. Such a function would destroy some of the performance gain obtained by using assembly for
       mul_8_16: Additional copying back and forth and extra CALL and RET instructions.

       The compiler comes to the rescue. We can bind local variables to the required registers:

       extern void mul_8_16 (void); // Non-ABI function. Don't call in C/C++!

       static inline __attribute__((__always_inline__))
       __uint24 mul_8_16_gccabi (uint8_t val8, uint16_t val16)
       {
           register uint8_t r24 __asm("r24") = val8;
           register __uint24 r18 __asm("r18");

           asm ("%~call %x[func]"  "\n\t"
                "clr    __zero_reg__"
                : "=r" (r18)
                : "r" (r24), "x" (val16), [func] "i" (mul_8_16)
                : "r25", "r0");

           return r18;
       }

       • The 8-bit parameter is bound to R24, and the 24-bit return value is bound to R18...R20.

       • The register keyword is mandatory.

       • The hard register is specified as a string literal for the lower case register name or register number,
         like '18' or 'r18'. Specifications like 'R18', 18 or 'Z' are not supported.

       • The 16-bit parameter of mul_8_16 happens to be required in R27:R26, which is the X register for which
         there is register constraint 'x'. Therefore, no register binding is required for val16.

       • As mul_8_16 clobbers the zero register R1, it has to be restored by means of

       clr __zero_reg__

       • The asm is pure arithmetic and hence not volatile. (It might be advisable to make it volatile anyway,
         so that it won't be reordered across sei() or cli() instructions.)

       Let's have a look at how this performs in a test case:

       void use_mul_8_16_gccabi (uint8_t val, uint8_t a, uint8_t b)
       {
           if (mul_8_16_gccabi (val, a * b) >= 0x2010)
               __builtin_abort();
       }

       For ATmega8 we get the following assembly:

       use_mul_8_16_gccabi:
           mul  r22,r20
           movw r26,r0
           clr  __zero_reg__
       /* #APP */
           rcall mul_8_16
           clr   __zero_reg__
       /* #NOAPP */
           cpi  r18,16
           sbci r19,32
           cpc  r20,__zero_reg__
           brlo .L1
           rcall abort
       .L1:
           ret

       No superfluous register moves. Great!
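
       The remark about volatile in the list above could be implemented along the following lines.
       This is a sketch, not part of AVR-LibC: the names mul_8_16_volatile and u24_t are hypothetical,
       and the non-AVR branch is a plain C stand-in that only serves to compile the code on a host.

```c
#include <stdint.h>

#ifdef __AVR__
typedef __uint24 u24_t;          /* 24-bit integer type provided by avr-gcc */
extern void mul_8_16 (void);     /* Non-ABI function. Don't call in C/C++! */
#else
typedef uint32_t u24_t;          /* Host stand-in (assumption) */
#endif

static inline __attribute__((__always_inline__))
u24_t mul_8_16_volatile (uint8_t val8, uint16_t val16)
{
#ifdef __AVR__
    register uint8_t r24 __asm("r24") = val8;
    register u24_t r18 __asm("r18");

    /* volatile: the call keeps its order relative to other volatile
       asm statements such as the ones behind sei() and cli(). */
    __asm volatile ("%~call %x[func]"  "\n\t"
                    "clr    __zero_reg__"
                    : "=r" (r18)
                    : "r" (r24), "x" (val16), [func] "i" (mul_8_16)
                    : "r25", "r0");
    return r18;
#else
    /* Plain C product, only here so the sketch builds off-target. */
    return (u24_t) val8 * val16;
#endif
}
```

       The price of volatile is that the compiler can no longer remove the call when the result
       is unused, nor merge two calls that have the same arguments.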

Specifying the Assembly Name of Static Objects

       Sometimes, it is desirable to use a different name for an object or function rather than the (mangled)
       name from the C/C++ implementation. Just add an asm specifier with the desired name as a string literal
       at the end of the declaration.

       For example, this is how avr/eeprom.h implements the eeprom_read_double() function:

       #if __SIZEOF_DOUBLE__ == 4
       double eeprom_read_double (const double*) __asm("eeprom_read_dword");
       #elif __SIZEOF_DOUBLE__ == 8
       double eeprom_read_double (const double*) __asm("eeprom_read_qword");
       #endif

       • It uses the implementation of eeprom_read_dword for eeprom_read_double, provided double is a 32-bit
         type.

       • It uses the implementation of eeprom_read_qword for 64-bit doubles.
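
       The same mechanism works for user code. The following sketch gives one function a second C-level
       name. Both names, checksum_impl and checksum, are made up for this illustration, and the sketch
       assumes a target that does not prefix C symbols with an underscore, which holds for AVR.

```c
#include <stdint.h>

/* Hypothetical implementation; its assembly name is checksum_impl. */
uint8_t checksum_impl (const uint8_t *data, uint8_t len)
{
    uint8_t sum = 0;
    while (len--)
        sum += *data++;
    return sum;
}

/* Second C name for the same code: calls to checksum() are emitted
   as calls to the symbol checksum_impl. */
uint8_t checksum (const uint8_t *data, uint8_t len) __asm("checksum_impl");
```

       No wrapper function is generated for checksum. The asm specifier only changes the symbol that
       the compiler emits for that name.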

What won't work

       GCC inline asm has some limitations.

   Setting a Register on one asm and using it in a different one
       Sequences like the following are not supposed to work:

       char var;

       void set_var (char c)
       {
           __asm ("inc r24");
           __asm ("sts var, r24");
       }

       • There is no guarantee whatsoever that the value in R24 will survive from one asm to the next. Such code
         might work in many situations, but it is still wrong: the compiler may very well insert instructions
         that change R24 before the first asm or between the two asm statements.

       • R24 is changed without informing the compiler. When R24 contains other data, then that data will be
         trashed.

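       Correct code would be

       __asm volatile ("inc %0"    "\n\t"
                       "sts var, %0"
                       : "+r" (c) :: "memory");

       or

       __asm ("inc %1"    "\n\t"
              "sts %0, %1"
              : "=m" (var), "+r" (c));

       In both variants, c is a read-write operand ('+r') because inc modifies the register; an input-only
       operand must not be changed by the asm. The first variant is volatile so that it cannot be discarded
       when c is not used afterwards.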

   Letting an Operand cross the Boundaries of the Y Register
       It is not possible to bind a value to a local register variable that crosses the boundaries of the Y
       register. For example, trying to bind a 32-bit value to R31:R28 by means of

       register uint32_t r28 __asm ("28");

       will result in an error message like

       error: register specified for 'r28' isn't suitable for data type

       Similarly, an operand described by a constraint will be located either completely below the Y register,
       as part of the Y register, or completely above it.

   Using Matching Constraints '=0'...'=9' with Output Operands
       Suppose we want an inline asm that returns the low byte of a 16-bit value val16:

       asm ("" : "=1" (lo8) : "r" (val16));

       The diagnostic will be:

       error: matching constraint not valid in output operand