September 2010 GNU Toolchain Update

Hi Guys,

  Here are the highlights of this month's changes:
 
  * A new option has been added to objcopy to allow it to create split output files in multiples of more than one byte.  So, for example, if you had to create image files for two 16-bit flash RAMs that are interleaved on a 32-bit bus, you can now use:

      objcopy --byte=0 --interleave=4 --interleave-width=2 foo.exe foo.image.1
      objcopy --byte=2 --interleave=4 --interleave-width=2 foo.exe foo.image.2

    Then if the input file foo.exe contained '12345678' the output files foo.image.1 and foo.image.2 would contain '1256' and '3478' respectively.
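
    The selection rule can be modelled in a few lines of C.  This is only a sketch of the behaviour described above, not objcopy's actual code; the function name interleave_copy and its parameters are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Model of --byte/--interleave/--interleave-width: from every
   `interleave`-byte block of `in`, copy `width` bytes starting at
   offset `byte`.  Returns the number of bytes written to `out`.
   (Hypothetical helper, not part of binutils.)  */
size_t interleave_copy (const unsigned char *in, size_t in_len,
                        size_t byte, size_t interleave, size_t width,
                        unsigned char *out)
{
  size_t n = 0;

  /* The selected range has to fit inside one interleave block.  */
  assert (interleave > 0 && byte + width <= interleave);

  for (size_t base = 0; base < in_len; base += interleave)
    for (size_t k = 0; k < width && base + byte + k < in_len; k++)
      out[n++] = in[base + byte + k];

  return n;
}
```

    Calling it on the bytes '12345678' with byte=0, interleave=4, width=2 selects '1256', and with byte=2 it selects '3478', matching the two objcopy invocations above.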


  * GCC now accepts a new function attribute called "ifunc".  This marks a function as an "indirect" function, which allows the resolution of its value to be determined dynamically at load time.  (So that, for example, a version of the function optimized for the current architecture can be selected).

    To use this attribute it is first necessary to define all of the different implementations of the function that will be available.  Then you must define a resolver function which will return a pointer to the desired implementation based upon some criteria.

    The implementation functions' declarations must all have the same API, and the resolver must be declared as a function returning a pointer to a void function returning void.  Here is an example:

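      #include <stddef.h>

      /* Portable fallback: copy one byte at a time.  */
      void *slow_memcpy (void *dst, const void *src, size_t len)
      {
         char *d = dst; const char *s = src;

         while (len--)
           *d++ = *s++;

         return dst;
      }

      /* "foo" stands in for a CPU-specific copy instruction.  */
      void *fast_memcpy (void *dst, const void *src, size_t len)
      {
         __asm("foo %0 %1 %2" : "=m" (dst) :  "m" (src), "r" (len) : "memory");
         return dst;
      }

      /* Called at load time.  __cpu_has_foo () stands in for a CPU feature test.  */
      static void (* resolve_memcpy (void)) (void)
      {
         return (void (*) (void)) (__cpu_has_foo () ? fast_memcpy : slow_memcpy);
      }

      void *memcpy (void *, const void *, size_t) __attribute__ ((ifunc ("resolve_memcpy")));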

    The exported header file declaring the function the user calls would just contain:

      extern void *memcpy (void *, const void *, size_t);

    allowing the user to call this as a regular function, unaware of how it is actually implemented.

    Indirect functions cannot be weak, and they require a recent binutils (at least version 2.20.1) and GNU C library (at least version 2.11.1).


  * A new warning option has been added to gcc:

      -Wdouble-promotion

    This issues a warning when a value of type float is implicitly promoted to double.  CPUs that only have a 32-bit single-precision floating-point unit implement floats in hardware, but emulate doubles in software.  On such a machine, doing computations using double values is much more expensive because of the overhead required for software emulation.

    It is easy to accidentally do computations with doubles because floating-point literals are implicitly of type double.  For example, in:

      float area (float radius) { return 3.14159 * radius * radius; }

    the compiler will perform the entire computation with double precision because the floating-point literal is a double.
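
    The usual way to avoid the promotion is to give the literal an 'f' suffix.  Here is a small sketch contrasting the two forms; the function names are made up for illustration, and the sizeof checks assume a target where float and double have different sizes:

```c
/* Double literal: radius is promoted and both multiplications are done
   in double.  -Wdouble-promotion would be expected to warn here.  */
float area_promoted (float radius) { return 3.14159 * radius * radius; }

/* The 'f' suffix makes the literal a float, so the whole expression
   stays in single precision.  */
float area_float (float radius) { return 3.14159f * radius * radius; }

/* sizeof reports the type of each expression without evaluating it.  */
int literal_is_double (float r) { return sizeof (3.14159 * r) == sizeof (double); }
int suffixed_is_float (float r) { return sizeof (3.14159f * r) == sizeof (float); }
```

    Both area functions return a float, so callers see the same interface; only the arithmetic inside differs.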


  * A new option has been added to gcc to make it emit information about a program's stack usage: -fstack-usage

    This creates a dump file with entries for every compiled function.  Each entry is made up of three fields:

      + The name of the function.

      + A number of bytes.

      + One or more qualifiers: static, dynamic, bounded.

    The qualifier "static" means that the function uses a fixed number of bytes on the stack.  These are allocated upon entry to the function and released upon exit.  The second field in the stack dump entry specifies the number of bytes used.

    The qualifier "dynamic" means that the function uses a variable number of bytes on the stack.  In addition to the static allocation described above, stack adjustments are made in the body of the function, for example to push/pop arguments around function calls.

    If the qualifier "bounded" is also present, then the maximum amount of these dynamic adjustments is known at compile-time and the second field in the entry is the upper bound on the total amount of stack used by the function.   If "bounded" is not present, the amount of the dynamic adjustments is not computable at compile-time and the second field only represents an estimate of the total amount of stack space used.
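
    As a rough illustration, here are two functions that would be expected to land in different categories when compiled with something like "gcc -fstack-usage -c".  The function names are made up, and the qualifiers mentioned in the comments are what the description above suggests rather than captured output; the byte counts depend on the target and optimization level.  With current GCC the dump normally goes into a file with a .su suffix.

```c
#include <stddef.h>
#include <string.h>

/* Fixed-size local buffer: the frame size is known at compile time,
   so this would be expected to be reported as "static".  */
size_t fixed_frame (const char *s)
{
  char buf[64];

  strncpy (buf, s, sizeof buf - 1);
  buf[sizeof buf - 1] = '\0';
  return strlen (buf);
}

/* Variable-length array: the stack grows by an amount only known at
   run time, so this would be expected to be reported as "dynamic"
   without "bounded".  */
size_t variable_frame (const char *s, size_t n)
{
  char buf[n + 1];

  strncpy (buf, s, n);
  buf[n] = '\0';
  return strlen (buf);
}
```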


  * Another new loop optimization has been added:
 
      -ftree-loop-if-convert-stores
   
    This attempts to perform if-conversions on conditional expressions containing memory writes.  This transformation can be unsafe for multi-threaded programs as it transforms conditional memory writes into unconditional memory writes.  For example:

      for (i = 0; i < N; i++)
        if (cond)
          A[i] = expr;

    would be transformed to:

      for (i = 0; i < N; i++)
        A[i] = cond ? expr : A[i];

    potentially producing data races.  As a result this optimization is not enabled by default at any optimization level and has to be specifically enabled on the command line.  (The intent of the optimization is to remove control-flow from the innermost loops in order to improve the ability of the vectorization pass to handle them).
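
    A compilable version of the two loop shapes makes the difference easier to see.  This is a hand-written sketch, not compiler output; the condition (b[i] > 0) and the stored value (b[i] * 2) are arbitrary choices standing in for "cond" and "expr":

```c
#include <stddef.h>

/* Original form: a[i] is only written where the condition holds.  */
void store_conditionally (int *a, const int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (b[i] > 0)
      a[i] = b[i] * 2;
}

/* If-converted form: every a[i] is read and written back on every
   iteration, even where the condition is false.  */
void store_unconditionally (int *a, const int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = b[i] > 0 ? b[i] * 2 : a[i];
}
```

    In a single thread both functions leave the array in the same state.  The hazard only appears if another thread updates an element for which the condition is false, because the second form may write the old value back over that update.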


  That's all folks!

Cheers
  Nick
