Ubuntu Manpage: optimization - Compiler optimization

NAME

       optimization - Compiler optimization

Problems with reordering code

       Author
           Jan Waclawek

       Programs contain sequences of statements, and a naive compiler would execute them exactly in the order in
       which they are written. An optimizing compiler, however, is free to reorder the statements, or even parts
       of them, as long as the resulting 'net effect' is the same. The 'measure' of the 'net effect' is what the
       standard calls 'side effects', and it is determined exclusively by accesses (reads and writes) to
       variables qualified as volatile. So, as long as all volatile reads and writes go to the same addresses in
       the same order (and the writes store the same values), the program is correct, regardless of other
       operations in it. One important point to note is that the time elapsed between consecutive volatile
       accesses is not considered at all.
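
       As a minimal sketch of this rule (the names reg_a, reg_b, scratch and volatile_order_demo are
       hypothetical and only stand in for real I/O registers and application code), the two volatile writes
       and the volatile read below must happen in program order, while the non-volatile computation between
       them may be placed wherever the compiler finds convenient:

```c
#include <stdint.h>

/* Hypothetical stand-ins for two memory-mapped I/O registers. */
volatile uint8_t reg_a;
volatile uint8_t reg_b;

/* Ordinary, non-volatile variable. */
uint8_t scratch;

uint8_t volatile_order_demo (uint8_t x)
{
  reg_a = x;                      /* volatile write #1 */
  scratch = (uint8_t) (x * 3u);   /* non-volatile: may be moved or kept in a register */
  reg_b = scratch;                /* volatile write #2: never before write #1 */
  return reg_a;                   /* volatile read: never before the two writes */
}
```

       The result is the same whichever placement the compiler chooses for the multiplication; only the
       time between the volatile accesses can differ.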

       Unfortunately, there are also operations which are not covered by volatile accesses. An example of this
       in AVR-GCC/AVR-LibC are the cli() and sei() macros defined in <avr/interrupt.h>, which convert directly
       to the respective assembler mnemonics through the __asm__() statement. They constitute a variable access
       by means of their memory clobber, and they are (implicitly) volatile because they don't have an output
       operand. So the compiler may not reorder these inline asm statements with respect to other memory
       accesses or volatile actions. However, such asm statements may still be reordered with respect to other
       statements that are neither volatile nor access memory.

       Note that even a volatile asm instruction can be moved relative to other code, including across
       (expensive) arithmetic and jump instructions [...]

       See also
           http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

       However, not even a volatile memory barrier like

       __asm __volatile__ ("" ::: "memory");

       keeps GCC from reordering non-volatile, non-memory accesses across such barriers. Peter Dannegger
       provided a nice example of this effect:

       #define cli() __asm volatile( "cli" ::: "memory" )
       #define sei() __asm volatile( "sei" ::: "memory" )

       unsigned int ivar;

       void test2 (unsigned int val)
       {
         val = 65535U / val;

         cli();

         ivar = val;

         sei();
       }

       With optimizations switched on (-Os), both avr-gcc v5.4 and v14 compile this to

       00000112 <test2>:
        112:     bc 01          movw r22, r24
        114:     f8 94          cli
        116:     8f ef          ldi  r24, 0xFF ; 255
        118:     9f ef          ldi  r25, 0xFF ; 255
        11a:     0e 94 96 00    call 0x12c     ; 0x12c <__udivmodhi4>
        11e:     70 93 01 02    sts  0x0201, r23
        122:     60 93 00 02    sts  0x0200, r22
        126:     78 94          sei
        128:     08 95          ret

       where the potentially slow division is moved across cli(), so that interrupts are disabled longer than
       intended. Note that the memory access (the write to ivar) still occurs in order with respect to cli()
       and sei(); so the 'net effect' required by the standard is achieved as intended, it is 'only' the timing
       which is off. However, for most embedded applications, timing is an important, sometimes critical
       factor.

       See also
           https://www.mikrocontroller.net/topic/65923

       Unfortunately, at the moment there is no mechanism, neither in avr-gcc nor in the C standard, to enforce
       a complete match between written and executed code ordering, except maybe switching optimization off
       completely (-O0) or writing all the critical code in assembly.

       Note
           The artifact with the __udivmodhi4 function is specific to avr-gcc and to how the compiler represents
           the division internally. On other target platforms that use a library function for division or some
           other expensive operation, this effect will not occur. The reason is that avr-gcc does not represent
           the library call as a function call but rather as an ordinary instruction. The outcome is that the
           GCC middle-end concludes that the division is cheap (because the backend has an instruction for it),
           when in fact it is not.

       A workaround for the code above is to enforce that the division happens prior to the cli():

       val = 65535U / val;
       __asm __volatile__ ("" : "+r" (val));
       cli();

       • The volatile forces the asm statement to stay prior to the cli().

       • The asm has val as an input operand, hence the division must be carried out prior to the asm because val
         is set by the division.
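
       Put together as a complete function, the workaround might look like the following sketch. The name
       test2_fixed, the assert() guard against a zero divisor and the off-target stand-ins for cli() and
       sei() are assumptions added for illustration, so that the code can also be built on a host; they are
       not part of the original example:

```c
#include <assert.h>

#ifdef __AVR__
#define cli() __asm volatile( "cli" ::: "memory" )
#define sei() __asm volatile( "sei" ::: "memory" )
#else
/* Assumption: host stand-ins that keep the memory barrier
   but do not touch the interrupt flag. */
#define cli() __asm volatile( "" ::: "memory" )
#define sei() __asm volatile( "" ::: "memory" )
#endif

unsigned int ivar;

void test2_fixed (unsigned int val)
{
  /* Guard added for the sketch: division by zero is undefined. */
  assert (val != 0);

  val = 65535U / val;

  /* Empty volatile asm that reads and writes val in a register:
     the division has to be finished before this point. */
  __asm __volatile__ ("" : "+r" (val));

  cli();

  ivar = val;

  sei();
}
```

       Whether the division really ends up ahead of the cli instruction is best confirmed in the generated
       assembly, for example with avr-objdump -d.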

       Notice that this workaround does not work in general, for a variety of reasons:

       • The division might be located in an inlined function.

       • The variable might be read-only or may not be appropriate as an asm operand.

       • There may be more such instructions prior to the division, and it is not practical to treat all of
         them like this.

       To sum it up:

       • Volatile memory barriers do not prevent statements without volatile accesses from being reordered
         across the barrier.