
NAME
optimization - Compiler optimization
Problems with reordering code
Author
    Jan Waclawek

Programs contain sequences of statements, and a naive compiler would execute them exactly in the order in which they are written. An optimizing compiler, however, is free to reorder the statements, or even parts of them, as long as the resulting 'net effect' is the same. The 'measure' of the 'net effect' is what the standard calls 'side effects', and it is accomplished exclusively through accesses (reads and writes) to variables qualified as volatile. So, as long as all volatile reads and writes are to the same addresses and in the same order (and writes write the same values), the program is correct, regardless of other operations in it. One important point to note here is that the time between consecutive volatile accesses is not considered at all.

Unfortunately, there are also operations which are not covered by volatile accesses. An example of this in AVR-GCC/AVR-LibC are the cli() and sei() macros defined in <avr/interrupt.h>, which convert directly to the respective assembler mnemonics through the __asm__() statement. They constitute a variable access by means of their memory clobber, and they are (implicitly) volatile because they don't have an output operand. So the compiler may not reorder these inline asm statements with respect to other memory accesses or volatile actions.

However, such asm statements may still be reordered with other statements that are neither volatile nor access memory:

    Note that even a volatile asm instruction can be moved relative to other code, including across (expensive) arithmetic and jump instructions [...]

See also http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

However, not even a volatile memory barrier like

    __asm __volatile__ ("" ::: "memory");

keeps GCC from reordering non-volatile, non-memory accesses across such barriers.
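The ordering rule for volatile accesses can be illustrated with a minimal sketch. The names fake_port and pulse are hypothetical and not part of AVR-LibC; fake_port merely stands in for an I/O register so that the code also builds on a host compiler.

```c
#include <stdint.h>

/* Hypothetical stand-in for an I/O register such as PORTB. */
volatile uint8_t fake_port;

uint8_t pulse (uint8_t n)
{
    uint8_t sum = 0;

    fake_port = 1;                  /* volatile write #1 */

    /* Non-volatile work, written between the two volatile writes.
       The compiler may move it before write #1 or after write #2,
       because it is not a side effect in the sense of the standard. */
    for (uint8_t i = 0; i < n; i++)
        sum += i;

    fake_port = 0;                  /* volatile write #2, always after #1 */

    return sum;
}
```

Both writes to fake_port must be emitted, in this order and with these values, but nothing constrains where the loop runs or how long fake_port holds the value 1. The same freedom applies when the two volatile writes are replaced by memory barriers: a non-volatile, non-memory computation may still be moved across them.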
Peter Dannegger provided a nice example of this effect:

    #define cli() __asm volatile( "cli" ::: "memory" )
    #define sei() __asm volatile( "sei" ::: "memory" )

    unsigned int ivar;

    void test2 (unsigned int val)
    {
        val = 65535U / val;

        cli();

        ivar = val;

        sei();
    }

avr-gcc v5.4 or v14 compiles this with optimizations switched on (-Os) to

    00000112 <test2>:
     112:   bc 01           movw    r22, r24
     114:   f8 94           cli
     116:   8f ef           ldi     r24, 0xFF   ; 255
     118:   9f ef           ldi     r25, 0xFF   ; 255
     11a:   0e 94 96 00     call    0x12c       ; 0x12c <__udivmodhi4>
     11e:   70 93 01 02     sts     0x0201, r23
     122:   60 93 00 02     sts     0x0200, r22
     126:   78 94           sei
     128:   08 95           ret

where the potentially slow division is moved across cli(), so that interrupts stay disabled longer than intended. Note that the memory access to ivar still occurs in order with respect to cli() and sei(); so the 'net effect' required by the standard is achieved as intended, it is 'only' the timing which is off. However, for most embedded applications, timing is an important, sometimes critical factor.

See also https://www.mikrocontroller.net/topic/65923

Unfortunately, at the moment there is no mechanism in avr-gcc (nor in the C standard) to enforce a complete match between written and executed code ordering, except maybe switching optimization off completely (-O0) or writing all the critical code in assembly.

Note
    The artifact with the __udivmodhi4 function is specific to avr-gcc and to how the compiler represents the division internally. On other target platforms that use a library function for division or for whatever expensive operation, this effect will not occur. The reason is that avr-gcc does not represent the library call as a function call but rather like an ordinary instruction. The outcome is that the GCC middle-end concludes that the division is cheap (because the backend has an instruction for it), when in fact it is not.
A workaround for the code above is to enforce that the division happens prior to the cli():

    val = 65535U / val;
    __asm __volatile__ ("" : "+r" (val));
    cli();

• The volatile forces the asm statement to stay prior to the cli().
• The asm has val as an input operand (the "+r" constraint marks it as both read and written), hence the division must be carried out prior to the asm because val is set by the division.

Notice that this workaround does not work in general, for a variety of reasons:

• The division might be located in an inlined function.
• The variable might be read-only or may not be appropriate as an asm operand.
• There may be more such instructions prior to the division, and it is not practical to treat all of them like this.

To sum it up:

• volatile memory barriers do not prevent statements with no volatile accesses from being reordered across the barrier
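For reference, the workaround applied to the complete test2 function might look like the sketch below. It assumes a GCC-compatible compiler. The macros irq_disable() and irq_enable() are hypothetical stand-ins that keep only the memory barrier, so that the sketch also builds on a non-AVR host; on AVR they would be cli() and sei() from <avr/interrupt.h>.

```c
/* Assumption: hypothetical host-side stand-ins for cli() and sei().
   They act as compiler-level memory barriers only and do not touch
   the interrupt flag. */
#define irq_disable() __asm__ __volatile__ ("" ::: "memory")
#define irq_enable()  __asm__ __volatile__ ("" ::: "memory")

unsigned int ivar;

void test2 (unsigned int val)
{
    val = 65535U / val;

    /* Empty volatile asm that reads and writes val: the quotient has to
       be available here, so the division cannot sink below this point. */
    __asm__ __volatile__ ("" : "+r" (val));

    irq_disable();
    ivar = val;
    irq_enable();
}
```

Calling test2(5) stores 13107 in ivar. The sketch only shows where the extra asm statement goes; whether the division really ends up ahead of cli() on a given avr-gcc version is best checked in the generated assembly, and the limitations listed above still apply.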