The global constructor bug in avr-gcc

There is a major bug affecting every version of avr-gcc that I have tested, from 4.4 all the way up to 4.5.1, the most recent at the time of writing. It only affects programs that target MCUs with more than 64K of flash memory, so you need to know about this if you are programming the Arduino Mega (ATmega1280) or the new Mega 256 (ATmega2560).

About the bug

There are actually two parts to the bug. When you use global objects that have constructors, gcc stores a list of the addresses of those constructors in a table. During your program’s initialisation phase the MCU walks through that table, calling each constructor address stored there in turn. The same goes for global destructors, but you’ll never see that because an MCU program never terminates.
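
To make the table walk concrete, here is a minimal host-side sketch in C++. It is a model only, not the libgcc code: the names CtorFn, kCtorTable, g_call_log and run_global_ctors are made up for illustration, and the table is an ordinary array rather than something the linker builds. It steps backwards through the table because the real loop decrements its pointer before each call.

```cpp
#include <vector>

// Hypothetical stand-in for one table entry: the address of a
// constructor-calling function that takes no arguments.
using CtorFn = void (*)();

// Records the order in which the entries were called.
static std::vector<int> g_call_log;

static void ctor_a() { g_call_log.push_back(1); }
static void ctor_b() { g_call_log.push_back(2); }
static void ctor_c() { g_call_log.push_back(3); }

// Model of the table that sits between __ctors_start and __ctors_end.
static const CtorFn kCtorTable[] = { ctor_a, ctor_b, ctor_c };

// Walk from the end of the table towards the start, stepping the
// pointer back by one entry and then calling through it.
void run_global_ctors(const CtorFn* start, const CtorFn* end) {
    while (end != start) {
        --end;
        (*end)();
    }
}
```

Calling run_global_ctors(kCtorTable, kCtorTable + 3) invokes every entry exactly once, last entry first.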

The first bug is that register r20 is used by the initialisation code but is not preserved by your constructor code. It gets corrupted during the call, and the init code blindly assumes that r20 still holds what it did before the call. When the call returns, your program crashes.

Here is the offending section of code:

.L__do_global_ctors_loop:
        sbiw    r28, 2
        sbc     r20, __zero_reg__
        mov_h   r31, r29
        mov_l   r30, r28
        out     __RAMPZ__, r20
        XCALL   __tablejump_elpm__

The second bug is more subtle. The addresses in the global constructor table are only 16 bits wide. This is OK as long as you don’t have any global constructors physically located above the first 64K of flash. If you do then they will be unreachable and the address stored in this call table will be wrong. Result? Crash.
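
The truncation can be shown with a few lines of host-side C++. This is only a model under assumptions: table_entry_for and survives_table are hypothetical helpers, and the addresses used with them are made-up values, not taken from a real build.

```cpp
#include <cstdint>

// Model of a constructor table slot: only the low 16 bits of the
// code address are kept, the upper bits are silently dropped.
std::uint16_t table_entry_for(std::uint32_t code_address) {
    return static_cast<std::uint16_t>(code_address);
}

// True if the address read back from the table is the one that was
// stored, i.e. the constructor is still reachable through the table.
bool survives_table(std::uint32_t code_address) {
    return table_entry_for(code_address) == code_address;
}
```

An address such as 0x0087 round-trips unchanged, whereas 0x10087 comes back as 0x0087, so the init code would jump to whatever happens to live at the low address instead of the constructor.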

Fixing the register corruption bug

This one is easy to patch: push r20 onto the stack before the constructor call and pop it back afterwards. The corrected assembly code should look like this:

.L__do_global_ctors_loop:
        sbiw    r28, 2
        sbc     r20, __zero_reg__
        mov_h   r31, r29
        mov_l   r30, r28
        out     __RAMPZ__, r20
        push    r20
        XCALL   __tablejump_elpm__
        pop     r20

This code is stored in gcc/config/avr/libgcc.S. If you are able to build gcc yourself then you should get the patch source from the official location, apply the patch and rebuild the compiler.

If you do not feel up to applying this patch yourself – and that probably means that you are a Windows user – then you can download and install it from my downloads page. Follow the link to my blog post to learn more about the package.

Working around the 16 bit address bug

Ideally the fix for this would be a compiler or linker patch, rebuild and job done. Unfortunately my knowledge of how to work on this part of the gcc internals is not quite there yet, but I do have a workaround for you.

Your program is internally divided into linear sections, and the linker allows us to tell it which section each part of the code must go into. The linker internally calls the ATmega2560 architecture avr6, and it uses linker scripts to tell it how to arrange the program’s sections. These scripts can be found in the binutils package in the ld/ldscripts/avr6.* files. Here’s the relevant snippet from one of them.

  /* Internal text space or external memory.  */
  .text   :
  {
    *(.vectors)
    KEEP(*(.vectors))
    /* For data that needs to reside in the lower 64k of progmem */
    *(.progmem.gcc*)
    *(.progmem*)
    . = ALIGN(2);
     __trampolines_start = . ;
    /* The jump trampolines for the 16-bit limited relocs will
       reside here.  */
    *(.trampolines)
    *(.trampolines*)
     __trampolines_end = . ;
    /* For future tablejump instruction arrays for 3 byte pc 
      devices. We don't relax jump/call instructions within
      these sections.  */
    *(.jumptables)
    *(.jumptables*)
    /* For code that needs to reside in the lower 128k progmem. */
    *(.lowtext)
    *(.lowtext*)
     __ctors_start = . ;
     *(.ctors)
     __ctors_end = . ;
     __dtors_start = . ;
     *(.dtors)
     __dtors_end = . ;
    KEEP(SORT(*)(.ctors))
    KEEP(SORT(*)(.dtors))
    /* From this point on, we don't bother about wether the
      insns are below or above the 16 bits boundary.  */
    *(.init0)  /* Start here after reset.  */

Note the .progmem section, pulled in by the *(.progmem*) line near the top of the snippet. If we can tell gcc to put constructors into that section then we’re guaranteed that they will be placed into the low end of memory. Here’s how to do it.

Firstly, add the following definition to one of your header files that is included by every file in your project. Everyone has one of those, often called constants.h or some such name.

#define CONSTRUCTOR __attribute__ ((section (".progmem")))

Secondly, in your header files, change your constructor declarations to include the new CONSTRUCTOR macro:

class MyClass
{
public:
  CONSTRUCTOR MyClass();
};
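
Putting the two steps together, a minimal sketch might look like the following. The #if guard is my own addition, on the assumption that you may also want to compile the same sources off-target (avr-gcc predefines __AVR__); the value() accessor and value_ member are hypothetical and exist only to give the class something to do.

```cpp
// constants.h (or whatever your project-wide header is called)
#if defined(__AVR__)
#define CONSTRUCTOR __attribute__ ((section (".progmem")))
#else
#define CONSTRUCTOR  /* expands to nothing when not building for AVR */
#endif

// MyClass.h: the attribute goes on the declaration only.
class MyClass
{
public:
  CONSTRUCTOR MyClass();
  int value() const { return value_; }

private:
  int value_;
};

// MyClass.cpp: the out-of-line definition is written as usual,
// without repeating the macro.
MyClass::MyClass() : value_(42) {}
```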

That’s it, you don’t need to modify the .cpp files. But how can you be sure that it’s working? The answer is to get the linker to output a map file by supplying it with the -Wl,-Map=output.map option. After applying the above workaround to my own project that generates 80K of flash code, here’s what the map tells me:

*(.progmem.gcc*)
 .progmem.gcc_sw_table
                0x000000e4       0x2a ./impl/screens/ScreenManagerImpl.o
 *(.progmem*)
 .progmem       0x0000010e      0x1d6 ./lib/LiquidCrystal/LiquidCrystal.o
                0x0000010e                LiquidCrystal::LiquidCrystal(unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char)
                0x0000010e                LiquidCrystal::LiquidCrystal(unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char)
                0x00000170                LiquidCrystal::LiquidCrystal(unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char)
                0x00000170                LiquidCrystal::LiquidCrystal(unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char)
                0x000001dc                LiquidCrystal::LiquidCrystal(unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char)
                0x000001dc                LiquidCrystal::LiquidCrystal(unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char)
                0x00000262                LiquidCrystal::LiquidCrystal(unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char)
                0x00000262                LiquidCrystal::LiquidCrystal(unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char, unsigned char)
 

See how the constructors of the LiquidCrystal library have been located in the .progmem section? Perfect!

Conclusion

These two bugs had the potential to eliminate gcc as a C++ development platform for the ATmega1280/2560, but with the fix and the workaround we can continue development almost as normal. I hope this was helpful to you. All comments are welcome.